Towards an ultra efficient kinetic scheme. Part III: High-performance-computing

Dimarco, Giacomo; Loubère, Raphaël; Narski, Jacek

doi:10.1016/j.jcp.2014.12.023

In this paper we demonstrate the capability of the fast semi-Lagrangian scheme developed in [20] and [21] to exploit parallel architectures. We first present the behavior of this scheme on a classical architecture using OpenMP, and then on a GPU (Graphics Processing Unit) architecture using CUDA. The goal is to show that this new scheme is well suited to these types of parallelization and, moreover, that the gain in CPU time is substantial on today's affordable computers. We begin with the sequential version of our high-order kinetic scheme, focusing on the details that matter for an effective parallel implementation. We then introduce the specific treatments and algorithms developed for the OpenMP and CUDA parallelizations. Numerical tests are shown for full 3D/3D simulations. These tests demonstrate a large speed-up of the parallel versions over the sequential code, as well as very good scalability, which makes this approach a real competitor to existing schemes for the solution of multidimensional kinetic models.