PASC 2021:  E. Wright, D. Przybylski, M. Rempel, C. Miller, S. Suresh, S. Su, R. Loft and S. Chandrasekaran

The MURaM (Max Planck University of Chicago Radiative MHD) code is a solar atmosphere radiative MHD model that has been broadly applied to solar phenomena ranging from quiet to active sun, including eruptive events such as flares and coronal mass ejections. The treatment of physics is sufficiently realistic to allow for the synthesis of emission from visible light to extreme UV and X-rays, which is critical for a detailed comparison with available and future multi-wavelength observations. This component relies critically on the radiation transport solver (RTS) of MURaM; the most computationally intensive component of the code. The benefits of accelerating RTS are multiple fold: A faster RTS allows for the regular use of the more expensive multi-band radiation transport needed for comparison with observations, and this will pave the way for the acceleration of ongoing improvements in RTS that are critical for simulations of the solar chromosphere.
We present challenges and strategies to accelerate a multi-physics, multi-band MURaM using a directive-based programming model, OpenACC in order to maintain a single source code across CPUs and GPUs. Results for a 288x288x288 test problem show that MURaM with the optimized RTS routine achieves 1.73x speedup using a single NVIDIA V100 GPU over a fully subscribed 40-core Intel Skylake CPU node and with respect to the number of simulation points (in millions) per second, a single NVIDIA V100 GPU is equivalent to 69 Skylake cores. We also measure parallel performance on up to 96 GPUs and present weak and strong scaling results.

CPU-GPU strong scaling of a multi-band 288x288x288 dataset

CPU-GPU strong scaling of a multi-band 288x288x288 dataset. Shown is the code performance (measured in million gridcell updates per second) as function of the number of V100 GPUs or fully subscribed Skylake CPU nodes (40 cores). A single NVIDIA V100 GPU equals about 69 CPU cores in compute performance.