Examination of Computational Performance and Potential Applications of a Global Numerical Weather Prediction Model MPAS Using KISTI Supercomputer NURION

To predict extreme weather events, we conducted high-resolution global atmosphere modeling and simulation using high-performance computing. Using a new-generation global weather/climate prediction model called MPAS (Model for Prediction Across Scales) with variable resolution, we tested strong scalability on the KISTI (Korea Institute of Science and Technology Information) supercomputer NURION. In addition to assessing computational performance, we simulated three typhoons that occurred in 2019 to analyze the forecast accuracy of MPAS. MPAS results were also used to force an ADCIRC (Advanced Circulation) + SWAN (Simulating Waves Nearshore) model to predict coastal flooding over southern Korea. The time integration of MPAS showed excellent scalability up to 4096 cores of NURION KNL (Knights Landing) nodes, but a serious I/O bottleneck remained even after trying two additional I/O strategies (i.e., adjusting the stripe count and using a burst buffer). On the other hand, the forecast accuracy of MPAS was very encouraging for wind and pressure during typhoons. ADCIRC+SWAN also generated a good estimate of significant wave height for typhoon Mitag. The proposed variable-resolution MPAS model, under an efficient computational environment, could be utilized to predict and understand the highly nonlinear chaotic atmosphere and coastal flooding in typhoons.


Introduction
Extreme weather events tend to occur more frequently and severely under climate change with global warming [1,2]. Indeed, three severe typhoons (Bavi, Maysak, and Haishen) hit the Korean Peninsula in 2020 and caused significant loss of life and damage to property. In addition, the longest period of Changma (Korean summer monsoon) since 1973 also occurred in that year, with extremely heavy rainfall records and the associated damage (e.g., landslides and flooding) before the landfall of these three typhoons. The summer of 2020 saw the largest number of fatalities due to the Changma in nine years, at 50 deaths. Therefore, it is essential to predict such extreme weather phenomena in advance to secure time for coping with such natural disasters. However, conventional modeling and simulation for weather and climate prediction has difficulty resolving these events, which are extremely nonlinear and chaotic in nature. To better understand and project future weather and climate, it is necessary to deal with seamless atmospheric states [3] in which small and large scales of natural behavior interact in time and space. Hence, there have been recent efforts to develop a new numerical weather prediction (NWP) model able to treat nature seamlessly at both small and large scales. Moreover, to better resolve such highly nonlinear extreme weather events with unstable climate regimes, it is necessary to operate global NWP models at high temporal and spatial resolution, which requires proportionally increasing computational resources. That is, such a detailed forecast from a fine-grid modeling system cannot be produced in a timely manner without the technical support of high-performance computing (HPC) [4,5].
Therefore, in this study we explored the computational efficiency of a new-generation global NWP model using the Korea Institute of Science and Technology Information (KISTI) supercomputer NURION and its possible application to the prediction of natural disasters caused by typhoons.
The NWP model that we chose was the Model for Prediction Across Scales (MPAS), developed by the National Center for Atmospheric Research (NCAR) in the US [6]. Several reports have confirmed that MPAS has good scalability for parallel computing [7,8]. However, it was still necessary to determine whether such scalability would hold on the system that we planned to use, the KISTI supercomputer NURION, whose specifications differ from those of many other systems. It is based on many-core processors, but each core has a relatively low clock speed of 1.4 GHz (described in Section 2.2). Therefore, we tested the strong scalability of MPAS on NURION. We also selected three typhoon cases that occurred in 2019 to see how well MPAS could predict severe weather events over the Korean Peninsula. That is, we tested both the computational efficiency and the forecast accuracy of MPAS for typhoon cases in this study. In addition, coastal flooding along the southern coast of Korea tends to be caused by the combination of rising water levels and waves during typhoons. Therefore, we estimated coastal flooding with the advanced circulation + simulating waves nearshore (ADCIRC+SWAN) model, using the MPAS forecast of atmospheric states as input data. Although the experiments could be expanded to more typhoon cases for a more robust analysis of performance, results from three typhoon cases in 2019 still gave us a reasonable examination of forecast accuracy and computational efficiency in this paper.
The methodology is presented in Section 2, including the MPAS model, the experimental design of the scalability tests, and the application of MPAS simulation to coastal flooding estimation for typhoon cases. The results obtained by the scalability test of MPAS on NURION, MPAS typhoon simulation, and its coastal flooding estimation are presented in Section 3. Then, we summarize our findings and discuss our study and future plans in Section 4.

MPAS Model
A defining feature of MPAS is the model grid structure: centroidal Voronoi meshes whose cells are mostly hexagonal but also include pentagonal and seven-sided cells. It is also possible to set smooth conforming meshes that selectively provide high resolution over the area of interest without any loss of numerical accuracy. Although running MPAS requires more computational resources than running a regional model with global boundary data, the inconsistency of grid and climatology between regional and global models is known to introduce significant boundary errors. Indeed, Park et al. (2014) [9] show how the boundary error can destroy wave simulations in a comparison between MPAS and WRF [10] under an idealized environmental setting. The smooth grid transitions of MPAS offer a clear advantage for seamless simulations of weather and climate. Figure 1 shows the horizontal meshes with 60-15 km horizontal resolution used in this study. There were 535,554 cells globally in the horizontal. We obtained the meshes shown in Figure 1 after rotating the default 60-15 km meshes given by NCAR toward the Western Pacific and East Asian regions to better reflect typhoons and the related atmospheric states there. Areas around the Korean Peninsula and typhoon tracks could be resolved with 15 km resolution, while the rest of the globe was computed with a 60 km resolution. We set the number of vertical levels to 80 with a 50 km top height. MPAS integrates the fully compressible non-hydrostatic prognostic equations for wind, temperature, humidity, and pressure [6] at every mesh cell at every integration time step (90 s in this study). All MPAS simulations in this study started from analysis data given by the NCEP Global Forecast System (GFS) [11]. Here, NCEP GFS early analysis data were used for the initial condition, and were re-gridded onto MPAS meshes. Table 1 shows the physical parameterization used for our experiments with MPAS version 7.0.
Because this research is focused on predicting the tracks and intensities of typhoons hitting the Korean Peninsula and testing the scalability of MPAS with the KISTI supercomputer NURION, we leave the additional identification of useful parameters and settings for other interesting cases as future research topics.
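The mesh description above fixes the size of the problem that every core count has to share. As a rough sketch (the function names are illustrative, not part of MPAS), the number of cell-level pairs updated per time step and the number of time steps per forecast day follow directly from the quoted values:

```python
# Values quoted in the text: horizontal cells, vertical levels, time step.
N_CELLS = 535_554
N_LEVELS = 80
DT_SECONDS = 90


def grid_points(n_cells: int, n_levels: int) -> int:
    """Number of (cell, level) pairs updated at every time step."""
    return n_cells * n_levels


def steps_per_day(dt_seconds: int) -> int:
    """Number of integration steps needed for one forecast day."""
    return 86_400 // dt_seconds


print("grid points per step:", grid_points(N_CELLS, N_LEVELS))
print("time steps per forecast day:", steps_per_day(DT_SECONDS))
```

This counts only the three-dimensional grid size; it says nothing about the cost per grid point, which depends on the dynamics and the physics schemes in Table 1.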

Scalability Tests of MPAS on NURION
KISTI's 5th supercomputer NURION has a computational speed of 25.7 petaflops, and it was the world's 11th fastest computer when it was launched in June 2018 [12]. There are two distinct computing resources in NURION: one is 8305 computing nodes with Intel Xeon Phi processors (Knights Landing, KNL hereafter), and the other comprises 132 nodes with Intel Xeon processors (Skylake). The main system of NURION is composed of KNL processors with a total computing power of about 25.3 petaflops, which is why we tested KNL nodes in this study. Each KNL node consists of 68 cores, but we used only 32 cores per node when integrating MPAS. Using all cores per node usually does not guarantee the best computing performance because memory-bound delays and administrative processes on each node can interrupt the computation of user applications [13].
We made 10-day MPAS forecasts using 512, 1024, 2048, and 4096 cores on KNL nodes to determine how scalable MPAS was with our configuration on NURION. That is, we measured the computational time needed to solve a fixed problem while varying the number of cores, which is called a strong scalability test. Profiling the MPAS computational performance gave timing information about the main time-consuming parts so that we could analyze them and share our experience with the community. When using thousands of cores for a massive computing problem, I/O (input/output; i.e., reading and writing files on a file storage system) bottlenecks can arise because all the cores try to access one file for reading or writing in a shared file system. As a possible solution for the potential I/O bottleneck, NURION introduced a burst buffer, which is an I/O buffer between an application and the file system. Thus, we also briefly investigated the I/O performance with a burst buffer. Moreover, we tested various stripe counts for the I/O on the shared file system. When building MPAS on NURION, we included the HDF5 (Hierarchical Data Format 5), NetCDF (Network Common Data Form), and PIO (high-level parallel I/O) libraries. We integrated MPAS with only MPI (Message Passing Interface) processes, and we did not use OpenMP threads.
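A strong scalability test of this kind is usually summarized by speedup and parallel efficiency relative to the smallest core count. The following is a minimal sketch of that calculation; the wall-clock times in it are hypothetical placeholders, not measurements from this study, and the function name is illustrative.

```python
from typing import Dict, Tuple


def strong_scaling(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Return {cores: (speedup, efficiency)} relative to the smallest core count.

    speedup    = T(base) / T(cores)
    efficiency = speedup / (cores / base_cores), so 1.0 means ideal scaling.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    result = {}
    for cores, seconds in sorted(timings.items()):
        speedup = base_time / seconds
        ideal = cores / base_cores
        result[cores] = (speedup, speedup / ideal)
    return result


# Hypothetical wall-clock times in seconds, NOT results from this study.
example = {512: 8000.0, 1024: 4000.0, 2048: 2500.0, 4096: 2000.0}
for cores, (speedup, efficiency) in strong_scaling(example).items():
    print(f"{cores:5d} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Applying the same calculation separately to the initialization, integration, and output timings makes it easy to see which part of a run departs from ideal scaling.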

Potential Application of MPAS to Coastal Flooding
For the three typhoon cases in 2019, we conducted five experiments per typhoon to test the predictability of MPAS with different starting times of its integration. That is, we set the start time of the MPAS integration to 6, 12, 24, 36, and 48 h before the target period of each typhoon. The target period was defined as the period when each typhoon directly affected the Korean Peninsula. The list of typhoons and experimental periods is presented in Table 2, and the typhoon tracks (records of typhoon center movement in time and space) given by JMA/RSMC (Japan Meteorological Agency/Regional Specialized Meteorological Center) are shown in Figure 2. From these experiments, we expected to find out how the forecast accuracy of MPAS changes with increasing forecast lead time.
In addition, we investigated the response of water levels and waves during storms using ADCIRC+SWAN with the MPAS forecasts of 10 m wind and sea-level pressure as input. The ADCIRC+SWAN model is a combination of the ADCIRC and SWAN models [14]. ADCIRC is a two-dimensional depth-integrated model that simulates water levels and currents [15,16], and SWAN is a third-generation spectral wave model that solves the wave action balance equation and obtains wave parameters by integrating a 2D wave energy spectrum over the frequency and direction domains [17,18]. The ADCIRC+SWAN model has often successfully simulated storm surges and waves due to historical typhoons over the western coast of Korea [19]. Therefore, we applied the model to a recent typhoon case over the southern coast of Korea using atmospheric input data from MPAS. The computing area of ADCIRC+SWAN in this study is shown in Figure 3; the figure also includes the unstructured grid points, with 30 km to 10 m variable resolution, used for the model discretization. Figure 4 shows a brief flow chart of our process with the numerical models and observations using the supercomputing resources.
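The five MPAS start times per typhoon follow mechanically from the start of the target period and the lead times of 6, 12, 24, 36, and 48 h. A minimal sketch is given below; the target time in it is a hypothetical placeholder, since the actual periods are those listed in Table 2.

```python
from datetime import datetime, timedelta, timezone

LEAD_HOURS = (6, 12, 24, 36, 48)  # lead times stated in the text


def start_times(target_start, lead_hours=LEAD_HOURS):
    """Map each lead time (h) to the MPAS start time preceding the target period."""
    return {h: target_start - timedelta(hours=h) for h in lead_hours}


# Hypothetical start of a target period; see Table 2 for the real ones.
target = datetime(2019, 10, 2, 12, tzinfo=timezone.utc)
for lead, start in start_times(target).items():
    print(f"{lead:2d} h lead -> start {start:%Y-%m-%d %H} UTC")
```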

Scalability Tests
One could expect that the computational time would be inversely proportional to the number of central processing unit (CPU) cores. In reality, however, computational performance does not necessarily follow this pattern. There could be various reasons, but performance depends mainly on whether the application code is parallelized well enough to utilize the available cores and on how well the HPC environment has been established. Figure 5 shows results from the MPAS integration using 512, 1024, 2048, and 4096 KNL cores in NURION. The total integration time tended to decrease as the number of cores increased, although the reduction in computing time was not proportional. When we looked at the computing processes involved in the initialization, integration, and other processes, we found that the main integration of model equations showed very good scalability up to 4096 cores, while initialization and other processes degraded scalability. In the MPAS simulations, the PIO library put some of the cores (not all of the cores used for computation) in charge of I/O, and there was communication among those I/O cores. Computing time for initialization mainly includes reading the initial data file of about 6.3 GB, while the time for the other processes (referred to as "other." in Figure 5) includes writing the model output files. The output files are diagnostic, history, and restart files of 226 MB, 7.2 GB, and 16.0 GB, respectively, and the current setting of MPAS lets the model generate a diagnostic output file every three hours and the other two output files every six hours. Therefore, massive computing consists of not only computing the time integration of equations but also writing many large files. Because NURION has a shared file system (Lustre), performance can degrade when many cores are used. This issue is not unique to this study.
Several studies have raised the I/O bottleneck issue and tried to overcome it [20,21]. Indeed, one report confirmed that KNL nodes like those of NURION tend to have even lower I/O performance than Xeon or other CPUs [22]. We conducted small experiments comparing the performance of the MPAS initialization and writing processes between KNL and Skylake (Xeon) nodes. Using two nodes each of KNL and Skylake with 32 cores per node, we found that the initialization and writing processes with KNL nodes were slower by factors of 7.3 and 8.0, respectively, than those with Skylake nodes. Even considering that a Skylake core has a clock speed 1.7 times faster than that of a KNL core, the I/O slowness appears to be significant.
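The last point can be made concrete with a small calculation. Assuming, purely for illustration, that run time scales inversely with clock speed (a simplification that I/O performance need not obey), dividing the measured slowdown factors by the clock ratio gives the part of the slowdown that the clock difference does not explain:

```python
CLOCK_RATIO = 1.7  # Skylake core clock relative to a KNL core (from the text)

# Measured KNL-vs-Skylake slowdown factors quoted in the text.
SLOWDOWN = {"initialization": 7.3, "writing": 8.0}


def clock_normalized(slowdown: float, clock_ratio: float = CLOCK_RATIO) -> float:
    """Residual slowdown after removing an assumed linear clock-speed effect."""
    return slowdown / clock_ratio


for process, factor in SLOWDOWN.items():
    print(f"{process}: {clock_normalized(factor):.1f}x slower beyond the clock difference")
```

Under this assumption the KNL nodes remain more than four times slower for both processes, which is consistent with the statement that the difference is not explained by clock speed alone.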
We conducted two more experiments with different I/O strategies using KNL nodes; in one we adjusted the stripe count, and in the other we used a burst buffer storage layer (Table 3). In our case, the default stripe count for KNL nodes was 4 with NURION, but we tried 16. This slightly accelerated the initialization for the cases with 512 and 1024 cores, but its effects became negative with 2048 cores. On the other hand, the use of a burst buffer also had quite a positive impact on accelerating the initialization for the cases with 512 and 1024 cores, which was even better than the cases using the stripe count of 16. However, with 2048 cores, the burst buffer did not have any positive results. Therefore, it would be necessary to optimize I/O processes with additional technical approaches in the future.
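The file sizes and output intervals quoted earlier in this section imply a large write volume for a 10-day forecast. A rough estimate is sketched below, assuming one file per output interval, decimal units (226 MB = 0.226 GB), and ignoring any file written at the initial time; these assumptions are ours and the dictionary and function names are illustrative.

```python
FORECAST_HOURS = 10 * 24  # 10-day forecast used in the scalability tests

# (file size in GB, output interval in hours), as stated in the text.
STREAMS = {
    "diagnostic": (0.226, 3),
    "history": (7.2, 6),
    "restart": (16.0, 6),
}


def output_volume_gb(forecast_hours, streams):
    """Total GB written per stream, assuming one file per output interval."""
    return {
        name: size_gb * (forecast_hours // interval_h)
        for name, (size_gb, interval_h) in streams.items()
    }


volumes = output_volume_gb(FORECAST_HOURS, STREAMS)
for name, gb in volumes.items():
    print(f"{name:10s}: {gb:7.1f} GB")
print(f"{'total':10s}: {sum(volumes.values()):7.1f} GB")
```

Under these assumptions a single run writes on the order of 1 TB, most of it in restart files, which helps explain why the output phase weighs so heavily on the total run time.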

Potential Application of MPAS to Coastal Flooding
We conducted 15 simulations of three typhoons that occurred in 2019 with MPAS variable-resolution modeling, which covered East Asia and the Western Pacific with a high resolution of 15 km. Figure 6 shows some examples of surface wind speed and sea-level pressure forecasts of each typhoon given by MPAS. We can see the maximum wind speed and the minimum sea-level pressure around the center of the typhoons located to the southwest of the Korean Peninsula. After the simulations of the typhoon cases, we evaluated the MPAS forecasts against observation data provided by the Korea Meteorological Administration (KMA) at the stations represented by red triangles in Figure 2. Here, all observational records were collected at one-hour intervals. Figure 7 compares the MPAS forecasts with observations in terms of 10 m wind speed and sea-level pressure for typhoon Mitag, which caused serious damage over southern Korea. Overall, the MPAS forecast showed very good agreement with observations for near-surface wind and pressure. Interestingly, the forecast accuracy was not degraded even when the forecast started 42 h earlier. That is, we could obtain a reliable forecast from MPAS with a relatively long lead time of about two days, although we need to verify the forecast with many more cases to confirm this conclusion.
We also verified the wind direction forecast at six observation stations (Geojedo, Geomundo, Marado, Seogwipo, Tongyeong, and Ulsan). To compare the absolute error in wind direction, we computed the absolute difference between the observed wind direction, O_wdir, and the predicted wind direction, F_wdir. As a result, we also confirmed that the forecast accuracy of wind direction did not significantly worsen as the forecast lead time increased to 30 h, as seen in Table 4. The 36 h forecast showed a difference in wind direction of about 23.14 degrees on average, which is not large enough to undermine our analysis. Therefore, we can conclude that the MPAS forecast of meteorological variables is valuable, with about two days of forecast lead time for the typhoon cases examined in this study. The number of samples in Table 4 differs between lead times because some observation data were missing during the verification period, which is common during severe weather.
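Because wind direction is a circular quantity, a plain subtraction overstates the error when the two directions lie on either side of north (for example, 350 and 10 degrees). The sketch below shows one common way to compute an absolute wind direction error with wraparound at 360 degrees. It is an assumed convention, not necessarily the exact formula used for Table 4, and the function names are illustrative.

```python
from typing import Iterable, Tuple


def wind_direction_error(observed_deg: float, forecast_deg: float) -> float:
    """Absolute angular difference in degrees, always within [0, 180]."""
    diff = abs(observed_deg - forecast_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_direction_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean absolute direction error over (observed, forecast) pairs."""
    errors = [wind_direction_error(o, f) for o, f in pairs]
    return sum(errors) / len(errors)


# Illustrative values only, not observations from this study.
print(wind_direction_error(350.0, 10.0))
print(mean_direction_error([(350.0, 10.0), (90.0, 120.0)]))
```

Pairs with a missing observation would simply be left out of the input, which is why the sample count can differ from one lead time to another.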
Using the MPAS forecast data as input, ADCIRC+SWAN modeling was performed for the case of Mitag. The 10 m wind and sea-level pressure from MPAS forced ADCIRC+SWAN to simulate the response of water level and waves. Spin-up of ADCIRC+SWAN inherited from MPAS took about 0.2 days. After the spin-up, we analyzed the result for the target period of the typhoon. We compared the estimates of ADCIRC+SWAN with observation data for significant wave height, as shown in Figure 8. Results from the experiments using three different MPAS outputs (with three different initial simulation times: 12 UTC on Sep. 30, and 00 and 12 UTC on Oct. 1) were compared hourly with observations from three moored buoy stations (Tongyeong, Geojedo, and Ulsan). Overall, ADCIRC+SWAN forced by MPAS showed very good agreement with observation data in terms of significant wave height. The magnitude and tendency of the significant wave height were accurately simulated; hence, the peak value was also very well captured at the right time. Interestingly, the experiment starting two days before the typhoon landfall gave the best estimates in the verification with three stations, possibly because ADCIRC+SWAN needs a spin-up period to some extent. This finding is encouraging because it suggests that we could have more than two days to prepare in advance of relevant disasters with the help of this prediction system.

Summary and Discussion
We assessed a new-generation weather and climate numerical prediction model, MPAS, with a variable resolution of 60-15 km to simulate the atmospheric states over East Asia and the Western Pacific. Modeling and simulation with high resolution in time and space require substantial computing resources; hence, HPC with efficient parallel computing is essential to make such a computing application feasible. Therefore, we tested the strong scalability of MPAS using the KNL compute nodes of the KISTI supercomputer NURION. The main integration of the prognostic equations showed excellent scalability up to 4096 cores, but I/O bottlenecks were found with thousands of cores. Computational performance slows down when many cores and nodes try to access the initial file and the output files stored in the shared file system; thus, the reading and writing processes become slower as the number of cores increases.
To reduce the I/O slowdown, we attempted two different I/O strategies; one was to adjust the stripe count for the Lustre file system, and the other was to utilize a burst buffer storage layer. Both methods achieved slightly better I/O performance with 512 and 1024 cores. However, these strategies did not solve the problem fundamentally, and the case with 2048 cores showed slightly worse results than the default cases. This suggests that the technical improvement of I/O within massive computer clusters should be highlighted as much as the development of better CPU cores. In the meantime, we will explore better ways to use the current file system as efficiently as possible in future research.
We investigated the forecast accuracy of the 60-15 km MPAS model for three typhoons that occurred in 2019. In addition, the forecast results from MPAS were applied to the ADCIRC+SWAN model to predict possible coastal flooding over southern Korea. The MPAS typhoon forecast showed very good agreement with the observation data of near-surface wind and sea-level pressure. Moreover, ADCIRC+SWAN forced by MPAS wind and pressure also provided accurate estimates of significant wave height around southern Korea. The forecast skill of MPAS and ADCIRC+SWAN was maintained with more than two days of forecast lead time, which could help communities prepare for such natural disasters in advance. Interestingly, new approaches such as AI (artificial intelligence)-based approaches to predicting typhoons/hurricanes utilizing both physical principles and available relevant data have recently been introduced [23][24][25], while the traditional modeling and simulation methods used in this study are still being developed. Those approaches could be considered as the next methodology to solve prognostic problems for events with a highly nonlinear and chaotic nature.