Time-Lagged Ensemble Quantitative Precipitation Forecasts for Three Landfalling Typhoons in the Philippines Using the CReSS Model, Part I: Description and Verification against Rain-Gauge Observations

Abstract: In this study, the 2.5 km Cloud-Resolving Storm Simulator was applied to forecast the rainfall by three landfalling typhoons in the Philippines at high resolution: Mangkhut (2018), Koppu (2015), and Melor (2015), using a time-lagged ensemble strategy. The three typhoons penetrated northern Luzon, central Luzon, and the middle of the Philippine Archipelago, respectively, and the present study verified the track and quantitative precipitation forecasts (QPFs) using categorical statistics against observations at 56 rain-gauge sites at seven thresholds up to 500 mm. The predictability of rainfall is the highest for Koppu, followed by Melor, and the lowest for Mangkhut, which had the highest peak rainfall amount. Targeted at the most-rainy 24 h of each case, the threat score (TS) within the short range (≤72 h) could reach 1.0 for Koppu at 350 mm in many runs (peak observation = 502 mm), and 1.0 for Mangkhut and 0.25 for Melor (peak observation = 407 mm) both at 200 mm in the best member, when the track errors were small enough. For rainfall from entire events (48 or 72 h), TS hitting 1.0 could also be achieved regularly at 500 mm for Koppu (peak observation = 695 mm), and 0.33 at 350 mm for Melor (407 mm) and 0.46 at 200 mm for Mangkhut (786 mm) in the best case. At lead times beyond the short range, one third of these earlier runs also produced good QPFs for both Koppu and Melor, but such runs were fewer for Mangkhut and the quality of QPFs was also not as high due to larger northward track biases. Overall, the QPF results are very encouraging, and comparable to the skill level for typhoon rainfall in Taiwan (with similar peak rainfall amounts). Thus, at high resolution, there is a fair chance to make decent QPFs even at lead times of 3–7 days before typhoon landfall in the Philippines, with useful information on rainfall scenarios for early preparation.


Introduction
Located at the boundary of tectonic plates in the tropical western North Pacific, the Philippines is among the nations with the highest risk of natural disasters and is under constant threat of volcanic eruptions, earthquakes, typhoons, tsunamis, and other hazards such as sea level rise, e.g., [1][2][3][4]. As the most exposed country in the world to tropical cyclones (TCs) [5], the Philippines often experiences floods, inundation, landslides, and storm surges in the category of meteorological hazards, e.g., [6][7][8][9][10], all mainly caused by the frequent landfall of TCs, or typhoons in the western North Pacific, e.g., [9][10][11][12][13][14][15]. Therefore, accurate quantitative precipitation forecasts (QPFs) for typhoons are urgently needed in order to help prevent and reduce the impacts of hazards from excess typhoon rainfall in the Philippines, especially under the current trend toward a warmer climate [12,16-20]. At the present time, however, QPFs over heavy-rainfall thresholds are still very challenging around the world, e.g., [21][22][23], and few studies have been carried out to verify typhoon QPFs objectively in the Philippines [24,25].
Just north of the Philippines across the Luzon Strait, Taiwan has similar geological characteristics and also its fair share of heavy rainfall from TCs [26]. Past studies have shown that due to its steep topography, TC rainfall is enhanced in Taiwan and its distributions over short periods are highly dictated by the location of the storm relative to the island [26][27][28][29], and therefore the total rainfall is linked to the track of the TC (including moving speed). Such an orographic phase-locking effect gives the potential for more accurate QPFs if realistic tracks can be predicted, and hence raises the predictability of TC rainfall in Taiwan. Based on this concept, the TC rainfall in Taiwan can be predicted from rainfall climatology constructed from past typhoons using forecast or projected tracks [30,31]. Similarly, the ensemble typhoon QPF model developed in [32] made use of available lagged ensemble predictions for the same TC, and thus can take into account the stochastic nature of model predictions, e.g., [33,34], as well as the specific characteristics of individual storms. In both cases, the predicted TC tracks must be accurate enough for the QPFs to be successful. The performances of such ensemble prediction systems are also evaluated for their probabilistic forecasts and potential applications, e.g., [35][36][37].
One other requirement for models to improve their typhoon QPFs over complex terrain is high resolution, preferably with a large enough fine domain size, as demonstrated in recent years in Taiwan. At the grid size (∆x) of 2.5 km, close to the recommendation for operational use [38], real-time forecasts by the Cloud-Resolving Storm Simulator (CReSS) [39,40] show improved 24 h QPFs in Taiwan at the short range (within three days) for typhoons from 2010-2015 [41][42][43][44][45], linked to a better capability to resolve both the TC convection and local topography. Using objective categorical measures, for example, the overall threat scores (TSs) at the threshold value of 350 mm (per 24 h), defined as the fraction of correct forecast of events (i.e., hits) in the union area of either the observation or prediction to reach that threshold (thus, 0 ≤ TS ≤ 1, see Section 2.2 for details), are 0.28, 0.25, and 0.15 on day 1 (0-24 h), day 2 (24-48 h), and day 3 (48-72 h), respectively [44]. In addition, such scores also exhibit a clear dependency on the observed rainfall magnitude in Taiwan, i.e., the more rain, the higher the TSs (at the same thresholds) [41,43,44], mainly for two reasons: the hits are easier to achieve when the rainfall area is larger in size over the island, and the larger (rainier) events tend to be under stronger forcing that the model can also capture at a sufficient resolution. Therefore, for the most-rainy top 5% of samples, the TSs at 350 mm on days 1-3 are roughly 0.1 higher than their counterparts for all TCs [44]. In some cases where a good track can be predicted, high TSs may be achieved even at a longer lead time, e.g., TS = 0.45 at 350 mm on day 3 for Typhoon (TY) Fanapi (2010), TS = 0.6 at 350 mm on day 3 and 0.87 at 500 mm on day 1 for TY Megi (2010) [41], TS > 0.38 at 750 mm at all three ranges of days 1-3 for TY Soulik (2013) [44], and TS ≥ 0.28 at 500 mm on days 1-3 and TS = 0.36 at 750 mm on day 1 for TY Soudelor (2015) [45].
Note that in these examples of individual forecasts, TS must drop to zero once the threshold exceeds the observed or predicted peak rainfall (as no hits can be produced), and all the TCs here were very rainy and produced a peak 24 h amount of 842-1110 mm.
As large computational resources are necessary to achieve high resolution for a sufficiently large domain [46], it is not ideal to divide the resource and run a multi-member ensemble system, e.g., [47][48][49], to obtain probability information. To cope with this situation, the time-lagged ensemble, e.g., [50][51][52][53][54], using a longer forecast range has been recommended by Wang et al. [55,56], and this allows not only for the ensemble information but also the possibility of realistic, high-quality QPFs at lead times beyond 3 days. This is particularly suitable for TCs, which in general have relatively long lifespans of at least one week, e.g., [57]. Using daily runs initialized at 0000 UTC, the system was able to produce good 24 h QPFs at lead times beyond 72 h in more than half of the six TC examples in 2012-2013 [55]. These included TS = 0.21 at 200 mm on day 8 (168-192 h) for TY Talim (observed peak = 415 mm), TS = 0.5 at 350 mm on day 6 (120-144 h) for TY Saola (peak = 884 mm), TS = 0.5 at 200 mm also on day 6 for TY Jelawat (peak = 216 mm), and TS = 0.25 at 200 mm on day 7 (144-168 h, peak = 426 mm) and also on day 4 (72-96 h, for a different day, peak = 367 mm) for TY Kong-Rey (2013), all better than many QPFs made within the short range [55]. Recently, lagged runs every 6 h were also tested on three rainy TCs [58,59] and TY Morakot (2009), the most hazardous typhoon to hit Taiwan in five decades [60][61][62]. For two of them, the lead time of decent QPFs, with a similarity skill score (SSS, 0 ≤ SSS ≤ 1, also, see Section 2.2 for definition) of at least 0.75 against the observed rainfall pattern, can be significantly extended to more than 5.5 days prior to the starting time of rainfall accumulation (typically before landfall) [58,59].
For TYs Morakot (2009) and Soulik (2013), however, a better QPF could not be obtained beyond the short range due to the limitation by track errors [59], consistent with previous studies [61][62][63][64][65]. In any case, the evolution of heavy-rainfall probabilities with time for all typhoons can provide very useful information for hazard preparation at the earliest time possible, at a fraction of computational cost compared to the multi-member ensemble [58,59,62].
Given the above development of the high-resolution time-lagged ensemble in recent years and the similarity of TC rainfall in Taiwan and the Philippines, e.g., [66], the purpose of the present study is therefore to apply the same strategy to typhoons making landfall in the Philippines and to evaluate the performance of the QPFs. In order to make a comparison with typhoon QPFs in Taiwan, similar methods of verification will be used, including the categorical matrix. From the four seasons of 2015-2018, three typhoons are eventually chosen for study: Mangkhut (2018), Koppu (2015), and Melor (2015), as they were all very rainy (≥300 mm) but made landfall in different parts of the Philippine Archipelago (Figure 1a). Each event also yielded ≥200 mm of rainfall in the Manila region, and thus posed a high risk of flooding. In fact, the time-lagged ensemble was also applied to forecast the landfall intensity of Supertyphoon Haiyan (2013), and consistently produced peak surface winds reaching 73 m s⁻¹ and minimum central mean sea level pressure below 900 hPa at (or near) landfall from about 48 h prior to its occurrence [15]. However, that study focused on TC intensity instead of QPFs, since the main hazard from Haiyan was the storm surge.
Figure 1. (a) [...] (center positions given every 6 h, with dates marked at 0000 UTC). The major islands and the capital city of Manila are labeled. (b) The topography (m, color), major mountains, locations of rain gauges (red dots) around the Philippines, and the small domains used to compute the SSS. Rain-gauge sites mentioned in text are labeled. The color scales for the topography are the same in the two panels.
The remaining part of this paper is arranged as follows. In Section 2, the CReSS model, numerical experiments, and the methodology for the verification of QPFs are described. The results of the time-lagged CReSS ensemble for the three typhoons of Mangkhut (2018), Koppu (2015), and Melor (2015) are presented and discussed in Sections 3-5, respectively. Further comparison with QPF results in Taiwan and discussion are given in Section 6, and the conclusions are given in Section 7.

The CReSS Model and Experiments
The CReSS model (version 3.4.2) was used to perform all experiments in this study. Developed at Nagoya University, Japan [39,40], it is a cloud-resolving model with a compressible and non-hydrostatic equation set, a single domain of uniform grid size, and a terrain-following vertical coordinate based on height. The model treats all clouds explicitly using a bulk cold-rain scheme (Table 1) with six water species of vapor, cloud water, cloud ice, rain, snow, and graupel [40,67-71], without the use of cumulus parameterization. The sub-grid scale parameterized processes (Table 1) include the turbulence in the planetary boundary layer [72,73] and the radiation, momentum, and energy fluxes at the surface [74,75], with the aid of a substrate model [40]. Readers are referred to [39,40] and some earlier studies [15,41,44,55,62] for further details. As in [55,62], the horizontal grid size was set to 2.5 km, with a domain size of 2160 × 1740 km² and 50 vertical levels (Table 1, Figure 1a). This allows most TCs to enter the high-resolution domain at about 2.5-4 days before landfall, and thus gives the model a potential to produce high-quality QPFs at lead times beyond the short range. The Global Forecast System (GFS) gross analyses and forecasts at the National Centers for Environmental Prediction (NCEP), USA [76,77], were used as the initial and boundary conditions (IC/BCs) of our CReSS runs (Table 1). With a resolution of 0.5° × 0.5° at 26 levels, these data are freely available in real time and were routinely accessed. During the case periods of the three selected typhoons, CReSS experiments were run every 6 h out to a range of eight days (192 h), but some earlier runs with larger track errors were skipped if deemed unnecessary. Since only data available before each run were used, our results here represent what would have been achieved in real time.
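As an illustration of the time-lagged strategy, the following minimal Python sketch lists which runs, launched every 6 h with a 192 h range, cover a given accumulation window, and at what lead time. The function name, the argument names, and the example dates are hypothetical choices for illustration only and are not taken from the actual experiment scripts.

```python
from datetime import datetime, timedelta


def lagged_members(first_init, last_init, target_start, target_end,
                   step_h=6, max_range_h=192):
    """List the runs whose forecast range fully covers the target window.

    Returns tuples (init_time, lead_start_h, lead_end_h), where the lead
    times are hours from the initial time to the start and end of the
    target accumulation window. All names here are illustrative.
    """
    members = []
    t0 = first_init
    while t0 <= last_init:
        lead_start = (target_start - t0).total_seconds() / 3600.0
        lead_end = (target_end - t0).total_seconds() / 3600.0
        # Keep the run only if the window starts at or after t0 and ends
        # within the forecast range of the run.
        if lead_start >= 0 and lead_end <= max_range_h:
            members.append((t0, lead_start, lead_end))
        t0 += timedelta(hours=step_h)
    return members


# Hypothetical example: runs every 6 h over four days, one 24 h target window.
members = lagged_members(datetime(2018, 9, 11, 0), datetime(2018, 9, 14, 18),
                         datetime(2018, 9, 15, 0), datetime(2018, 9, 16, 0))
for init, lead_start, lead_end in members:
    print(init.strftime("%H%M UTC %d %b"), f"{lead_start:.0f}-{lead_end:.0f} h")
```

Each returned member is a forecast of the same target window made at a different lead time, which is what allows one deterministic configuration to provide ensemble information.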

Observational Data and Verification of Model QPFs
The performance of the model hindcasts was examined in the following aspects. The JTWC best-track data were used to verify the TC tracks. At selected times, the retrievals of rainfall intensity from the Tropical Rainfall Measuring Mission (TRMM) satellites at 3 h intervals [78], available from the Naval Research Laboratory (NRL), USA, were also employed to compare with the rainfall structure of model TCs when such observations were available. To verify model QPFs, three-hourly rainfall observations from a total of 56 rain gauges over the Philippines (Figure 1b), provided by the Philippine Atmospheric, Geophysical and Astronomical Services Administration (PAGASA), were also used. The data were summed over the selected accumulation periods of 24, 48, or 72 h for the verification.
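The accumulation step can be sketched as follows, assuming the three-hourly gauge data are stored as an array with time along the first axis and sites along the second. This layout and the function name are assumptions for illustration, and the handling of missing reports is not described in the text, so missing values (NaN) simply propagate into the sum here.

```python
import numpy as np


def accumulate(three_hourly, start_index, hours):
    """Sum 3-hourly rainfall (shape: times x sites) over a period of `hours`.

    `start_index` is the index of the first 3 h interval in the window.
    A NaN in any interval makes the total at that site NaN.
    """
    steps = hours // 3
    data = np.asarray(three_hourly, dtype=float)
    window = data[start_index:start_index + steps]
    if window.shape[0] < steps:
        raise ValueError("record too short for the requested accumulation period")
    return window.sum(axis=0)


# Hypothetical record: 72 h of data at two sites with constant 3 h amounts.
record = np.ones((24, 2)) * np.array([1.0, 2.0])
total_24h = accumulate(record, 0, 24)   # first 24 h
total_48h = accumulate(record, 8, 48)   # last 48 h
```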
In this study, the categorical statistics based on the 2 × 2 contingency table, e.g., [79][80][81], are adopted, so as to allow easy comparisons with previous studies in Taiwan. As shown in Table 2, the outcome of a rainfall forecast versus observation to reach a specified threshold (called an event) over a given period at any verification point can be one (and only one) of four possibilities: hit (H), miss (M), false alarm (FA), and correct negative (CN). Among a set of N total points (N = H + M + FA + CN), the correct forecasts are H and CN, while the incorrect ones are M and FA. From the table, several widely used skill scores such as the probability of detection (POD, of observed events), success ratio (SR, of forecast events), TS, and bias score (BS) can be computed [79][80][81], respectively, as
$$\mathrm{POD} = \frac{H}{H + M}, \tag{1}$$
$$\mathrm{SR} = \frac{H}{H + FA}, \tag{2}$$
$$\mathrm{TS} = \frac{H}{H + M + FA}, \tag{3}$$
$$\mathrm{BS} = \frac{H + FA}{H + M}. \tag{4}$$
Thus, from the equations, one can see that POD, SR, and TS are all bounded by 0 and 1, and the higher the better. Additionally, as the ratio of correct prediction of events to the union of all those either observed or predicted (or both), TS can be no higher than POD or SR (where 1 − SR is also known as the false alarm ratio, or FAR). On the other hand, the BS is the ratio of predicted to observed event points and thus reflects over-forecasting (if BS > 1) or under-forecasting (if BS < 1) by the model. Hence, its most ideal value is unity. By using the bilinear method, model QPFs were interpolated onto the rain-gauge locations (Figure 1b) in this study, where the above scores in Equations (1)-(4) were computed. Later, the scores will be presented using the performance diagram [82]. In this study, QPFs accumulated over both 24 h and longer periods (48 or 72 h) for entire events are verified, at threshold amounts of 0.05, 10, 50, 100, 200, 350, and 500 mm when applicable.
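For concreteness, the categorical scores can be sketched in Python as below, assuming the standard definitions POD = H/(H + M), SR = H/(H + FA), TS = H/(H + M + FA), and BS = (H + FA)/(H + M), and assuming that an event means rainfall at or above the threshold. The function name and the sample values are hypothetical, and a score is returned as NaN when its denominator is zero.

```python
import numpy as np


def categorical_scores(forecast, observed, threshold):
    """2 x 2 contingency-table scores at one rainfall threshold (mm)."""
    f = np.asarray(forecast, dtype=float) >= threshold   # forecast events
    o = np.asarray(observed, dtype=float) >= threshold   # observed events
    hits = int(np.sum(f & o))
    misses = int(np.sum(~f & o))
    false_alarms = int(np.sum(f & ~o))
    correct_negatives = int(np.sum(~f & ~o))

    def ratio(num, den):
        # Undefined when there are no events in the denominator.
        return num / den if den > 0 else float("nan")

    return {
        "H": hits, "M": misses, "FA": false_alarms, "CN": correct_negatives,
        "POD": ratio(hits, hits + misses),
        "SR": ratio(hits, hits + false_alarms),
        "TS": ratio(hits, hits + misses + false_alarms),
        "BS": ratio(hits + false_alarms, hits + misses),
    }


# Hypothetical 24 h totals (mm) at five gauges.
obs = [0.0, 12.0, 60.0, 120.0, 250.0]
fcst = [5.0, 8.0, 70.0, 90.0, 300.0]
scores_100 = categorical_scores(fcst, obs, 100.0)
```

In this example, two gauges observe at least 100 mm but only one is forecast to reach it, so the sketch gives H = 1, M = 1, FA = 0, and hence TS = 0.5 with BS = 0.5 (under-forecasting).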
Besides the categorical matrix, the SSS that measures the overall similarity between two patterns, in this case the observed and predicted rainfall patterns in the Philippines, is defined and used as in [83]. It uses the mean squared error (MSE) as in the Brier score [80] or fractions skill score [84], and is formulated as
$$\mathrm{SSS} = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N}\left(f_i - o_i\right)^2}{\frac{1}{N}\left(\sum_{i=1}^{N} f_i^2 + \sum_{i=1}^{N} o_i^2\right)}, \tag{5}$$
where $f_i$ and $o_i$ are the forecast and observed rainfall at the ith point among N, respectively. In essence, SSS is the skill measured against the worst MSE possible, i.e., the one when no rainfall in the forecast overlaps with the observation and the second term on the right-hand side (RHS) of Equation (5) equals 1. Thus, 0 ≤ SSS ≤ 1 (the higher the better) and a score of 1 means $f_i = o_i$ at all the points and the two patterns are exactly the same. As the QPFs of time-lagged forecasts are verified, the performance at the short range (≤72 h) and how many runs could produce good QPFs at longer lead times (beyond the short range) in each of the three typhoons are our main interests.
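A minimal Python sketch of the SSS is given below. It assumes the form SSS = 1 − MSE/MSE_worst, with MSE_worst = (1/N)(Σf² + Σo²), which is the MSE obtained when forecast and observed rainfall do not overlap at any point and is therefore consistent with the description above. The function name and the sample values are hypothetical.

```python
import numpy as np


def similarity_skill_score(forecast, observed):
    """SSS = 1 - MSE / (worst possible MSE), bounded by 0 and 1."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    mse = np.mean((f - o) ** 2)
    # Worst case: no overlap, so every f_i * o_i cross term vanishes.
    worst = np.mean(f ** 2) + np.mean(o ** 2)
    if worst == 0.0:
        return float("nan")  # no rain in either field: score undefined
    return float(1.0 - mse / worst)


# Hypothetical two-point patterns (mm).
sss_identical = similarity_skill_score([10.0, 0.0], [10.0, 0.0])
sss_disjoint = similarity_skill_score([10.0, 0.0], [0.0, 10.0])
sss_partial = similarity_skill_score([10.0, 10.0], [10.0, 0.0])
```

The three examples cover the two bounds (identical patterns and completely non-overlapping patterns) and one intermediate case.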

Track Forecast and Examples of Rainfall Structure
Using the method described in Section 2, a total of 19 hindcast experiments were carried out for the first case of TY Mangkhut (2018) at initial times of 0000 and 1200 UTC on 9-10 as well as every 6 h from 0000 UTC 11 to 1800 UTC 14 September (Figure 2a). While Mangkhut made landfall near 1800 UTC 14 September and moved across the northernmost part of Luzon at a pace of about 30 km h⁻¹ (also Figure 1a), most of the hindcasts prior to 12 September had a northward bias in its track and the TC centers did not make landfall in these runs. Thus, the track errors in these earlier runs were largely cross-track in nature, and the along-track errors (errors in moving speed) were relatively small (Figure 2a). The track errors gradually reduced with time (Figure 2b), and the one from the run made at 1800 UTC 11 September was only about 110 km near landfall, which is in fact very small with a lead time of 72 h. Starting from 12 September, all the track errors further reduced to within 80 km at landfall (Figure 2b), and inside 50 km in runs made at 0000 and 1800 UTC 13 and all those on 14 September. In fact, most runs from 13-14 September exhibited small southward bias in tracks during landfall. Thus, the track predictions of Mangkhut from the hindcasts were reasonable, and track errors were quite small within the short range (≤72 h).
In Figure 3, the rainfall structure of Mangkhut from one of the hindcasts, with t0 at 0600 UTC 13 September, is compared with TRMM rainrate retrievals at two selected times as examples. With a track error of about 60 km at landfall (Figure 2b), this particular experiment was among the ones that performed better in track prediction. With more details than the TRMM data, the CReSS model at ∆x = 2.5 km can be seen to capture the double eyewall structure and the size of the inner eyewall of Mangkhut near 0944 UTC 14 September before its landfall (Figure 3a,c). Later at 2134 UTC when the storm center was over northern Luzon near 18° N during landfall, the model also reproduced its rainfall distribution quite well, with most intense rainfall to the southeast of the TC center and just off the northeastern tip of Luzon (Figure 3b,d). Although Figure 3 only includes two instances from one hindcast that was selected somewhat arbitrarily, similar quality of realistic TC rainfall structure is confirmed for other runs for Mangkhut (not shown), as also for other cases near Taiwan using comparable model configurations [41,44,45,55].
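The text does not spell out how the distance between the predicted and best-track centers was calculated. One common choice, assumed here purely for illustration, is the great-circle distance from the haversine formula on a spherical Earth of radius 6371 km. The function name and the sample positions are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def track_error_km(lat_fcst, lon_fcst, lat_obs, lon_obs):
    """Great-circle distance (km) between forecast and observed TC centers."""
    phi1 = math.radians(lat_fcst)
    phi2 = math.radians(lat_obs)
    dphi = phi2 - phi1
    dlam = math.radians(lon_obs - lon_fcst)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical centers one degree of latitude apart on the same meridian.
error_km = track_error_km(19.0, 122.0, 18.0, 122.0)
```

A purely meridional offset like this one corresponds to a northward (cross-track, for a westward-moving storm) error of roughly 111 km per degree of latitude.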

Model QPFs for Typhoon Mangkhut (2018)
The daily rainfall distributions on 14 September and those targeted for this day but produced by the 0000 UTC runs at different ranges (from long to short as labeled, header colors as in Figure 2a) are shown in Figure 4a-f (top row), while similar plots for 15 September are shown in Figure 4g-l (second row). With a track through northern Luzon, the northern half of the island received much more rain from Mangkhut than the southern half and the rest of the Philippines on 14 September, with an observed peak daily amount of 250.2 mm at Baguio (Figures 2b and 4f). With reasonable tracks, most predictions within day 4 also captured such characteristics in rainfall, with more rain along the Cordillera Central Mountain (CCM) and the Sierra Madre Mountain (SMM, Figure 4b-e). However, at the longest range on day 6, the experiment starting at t0 = 0000 UTC 9 September had most of the rainfall occurring offshore and little over Luzon (Figure 4a), obviously due to the relatively large northward track bias of around 350 km. On 15 September when Mangkhut was departing from the northwestern corner of Luzon (Figure 2a), the observed rainfall was mainly concentrated along the CCM in northwestern Luzon, with a peak of 535.6 mm again at Baguio, and diminished in the northeastern part of the island (Figure 4l). This change in rainfall pattern was also well reproduced by all the runs shown within day 4 (Figure [...] (Figure 4o,p). At longer lead times, for example, on days 5-6 (t0 = 0000 UTC 10 September), not enough rain fell in Luzon due to the larger track errors too far north (Figure 4m). Plotted in Figure 4r to compare with the rain-gauge observations in Figure 4q, the 60 h total rainfall from the TRMM retrievals for the entire event of Mangkhut also indicates a maximum rainfall along the CCM near Baguio in northwestern Luzon, but the peak value is less than 400 mm and seemingly considerably underestimated.
Overall, the visual comparison in Figure 4 suggests a good performance by the 2.5 km CReSS in predicting the rainfall of Mangkhut at ranges within 3-4 days, when the track errors became smaller.

Categorical Skill Scores and Rainfall Similarity
Through the use of categorical measures as described in Section 2.2, the QPFs for TY Mangkhut (2018) by the model can also be verified in an objective and quantitative fashion. Firstly, the results of all the hindcast runs for the 24 h target periods of 14 and 15 September are presented in Figure 5a,b, respectively, verified at the 56 rain gauges across the Philippines for six rainfall thresholds from 0.05 to 350 mm (per 24 h) using the performance diagram [82]. For the rainfall on 14 September (Figure 5a), one can see that the TS values tend to decrease with increasing thresholds, as they are mostly about 0.75-0.85 at 0.05 mm (per 24 h), 0.45-0.7 at 10 mm, 0.25-0.4 at 50 mm, and 0.1-0.3 at 100 mm, respectively, as typically found in the literature, e.g., [41][42][43][44][45]55]. As the qualified points become fewer toward the higher thresholds, the scores also tend to become more dispersed with a larger spread in the diagram. At 200 mm, a few runs were able to yield fairly high TS values (TS = 0.5 for those with t0 = 1200 UTC 12, and 0000 and 1800 UTC 13 September; TS = 0.33 for the two with t0 = 1800 UTC 12 and 0600 UTC 13 September). However, the TSs are zero in all other runs at 200 mm with no hits. As H > 0 is required for POD, SR, and TS to be above zero, all three scores must drop to zero in any forecast when the threshold goes beyond either the observed or predicted peak amount. Therefore, 350 mm is not a meaningful threshold to examine in Figure 5a. For the BSs, all data points are between 0.2 and 3.6, while most of them are inside 0.5-2.5. However, the earlier runs tended to have BSs < 1 due to the northward track bias, whereas the later runs tended to have BSs > 1 with some over-prediction associated with slight southward track bias (more inland).
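On a performance diagram, each forecast is a point with SR and POD as coordinates, and TS and BS can be read from the same point. Assuming the standard definitions POD = H/(H + M), SR = H/(H + FA), TS = H/(H + M + FA), and BS = (H + FA)/(H + M), it follows that TS = 1/(1/POD + 1/SR − 1) and BS = POD/SR. A short Python sketch of this relation, with a hypothetical function name, is:

```python
def ts_and_bias(pod, sr):
    """Return (TS, BS) implied by a (SR, POD) point on a performance diagram.

    With no hits (POD = 0 or SR = 0), TS is zero and the bias cannot be
    recovered from POD and SR alone, so NaN is returned for BS.
    """
    if pod <= 0.0 or sr <= 0.0:
        return 0.0, float("nan")
    ts = 1.0 / (1.0 / pod + 1.0 / sr - 1.0)
    bs = pod / sr
    return ts, bs


# Half of the observed events detected, no false alarms.
ts_a, bs_a = ts_and_bias(0.5, 1.0)
# All observed events detected, but half of the forecast events are false alarms.
ts_b, bs_b = ts_and_bias(1.0, 0.5)
```

Both points have the same TS of 0.5, but they lie on opposite sides of the BS = 1 diagonal (under- and over-forecasting, respectively), which is why the diagram conveys more than TS alone.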
For 15 September (Figure 5b), most points have BS values between 1.25 and 2.5 and none is below 0.8, so the over-prediction became somewhat more apparent for the second day when the observed peak amount (535.6 mm) more than doubled from one day before. The TSs of QPFs targeted on 15 September from many runs are about 0.55-0.65 at the lowest threshold of 0.05 mm (per 24 h), 0.3-0.55 at 10 mm, 0.1-0.35 at 50 mm, 0.1-0.33 at 100 mm, and mostly zero at 200 mm ( Figure 5b). Thus, despite a much higher observed peak rainfall amount on 15 September, the overall QPFs for this day appear no better in TS values. However, several experiments did perform quite well in TS toward the high thresholds in Figure 5b, such as the ones at 0600 UTC 13 (TS = 0.75 at 100 mm), 1800 UTC 13 (TS = 0.5 at 200 mm), and 0600 UTC 14 September (TS = 1.0 at 200 mm). Hence, some individual runs produced decent QPFs at high thresholds for 15 September, although the overall TSs did not seem to improve.
Next, the categorical skill scores of the hindcasts are examined in Figure 5c for the two-day total rainfall from Mangkhut over 14-15 September [...] (Figure 5c). Due to a longer accumulation period and higher total rainfall amounts, these TS values are in general higher than those for individual days seen in Figure 5a,b (at the same threshold amounts of accumulation). However, still no hit was produced at 350 mm (per 48 h) by any of the runs, even though some experiments at shorter lead times were certainly able to produce rainfall totals reaching 600 mm near Baguio in Figure 4. Additionally, similar to the situation for 15 September, the data points for the 48 h QPFs again suggest some over-prediction, especially over the threshold range of 50-200 mm (per 48 h) with BS values from 1 to about 2-2.5 (Figure 5c). Nevertheless, at low thresholds of ≤10 mm, most TSs are quite high (above 0.6) and the BSs (roughly from 0.95 to 1.3) are also fairly close to unity. Note that only two runs were executed per day (at 0000 and 1200 UTC) from 9-10 September.
As described in Section 2.2, the SSS measures the overall similarity (from 0 to 1) between the observed and predicted rainfall patterns at the gauge sites over the Philippines (N = 56). In Figure 5d, the SSS computed for the patterns of 48 h rainfall accumulation (for the whole event) was about 0.35 in the first two runs and gradually increased to about 0.7 in the last 2-3 runs (red curve), despite some fluctuation among the members. For the smaller domain of Luzon (see Figure 1b), where most of the significant rainfall (say, ≥50 mm) was received (Figure 4q), the SSS values in Figure 5d are slightly higher (blue dashed), roughly by 0.02-0.05. Among all runs, the highest score was achieved by the one with t 0 at 1800 UTC 13 September, with SSS = 0.73 over the entire Philippines (big domain) and SSS = 0.76 for Luzon only (small domain).

Track Forecast and Examples of Rainfall Structure
The second case is Koppu, which struck the Philippines in the middle of October in 2015. This typhoon made landfall near 2300 UTC 17 October and penetrated the middle section of Luzon, before turning north over the ocean along its northwestern coast (Figure 1a). A total of 19 hindcast runs were made for Koppu, every 6 h from 0000 UTC 13 to 1200 UTC 17 October (Figure 6a). Compared to Mangkhut, Koppu moved much slower at less than 15 km h⁻¹ before landfall, and even only about 6 km h⁻¹ off the northwestern coast of Luzon on 19 October after the northward turn (Figures 1a and 6a). At longer lead times, the tracks of Koppu in the predictions from 13-14 October also gradually developed a northward bias during its approach, so that the model TCs made landfall at the northeastern part of Luzon instead (Figure 6a), with track errors of 150-270 km (Figure 6b), and then turned north afterwards. With time, the track errors gradually reduced and improved considerably after 0600 UTC 15 October to all become within 75 km near landfall. However, the northward turn still occurred too early in the model, over land instead of the nearby ocean (Figure 6a). Not until the runs at 0600 and 1200 UTC 17 October did the TC move offshore (with track errors within 60 km during landfall), but only briefly. Thus, although the track errors were not too large and also reduced with time, the slow motion of Koppu off northwestern Luzon on 19-20 October was not well captured by any of the hindcasts, which is perhaps not too surprising as the last run was at 1200 UTC 17 October.
Atmosphere 2022, 13, x FOR PEER REVIEW
Similar to Figure 3, the model-predicted rainfall structure of Koppu is compared with the TRMM retrievals in Figure 7, at three different times by the run initialized at 0000 UTC 16 October 2015. The track error in this particular hindcast remained quite small (less than about 65 km) up to landfall, but increased rapidly afterward as the model TC moved north over land (Figure 6b) as mentioned. Therefore, the track was not among the best. Nevertheless, Figure 7 shows that the model was able to reproduce the change in rainfall structure of Koppu, from a more symmetric structure at around 2300 UTC 16 October before landfall (Figure 7a,d) to a more asymmetric pattern on late 17 and early 18 October after landfall (Figure 7b,e), with most of the clouds on the western side of the storm and intense rainfall over Luzon near 15°-16° N. Even near 1100 UTC 19 October when Koppu's center moved offshore and a primary rainband developed and extended northward (Figure 7c), this highly asymmetric rainfall structure was captured quite well, with the rainband at almost the correct location (Figure 7f) despite the track error.

Model QPFs for Typhoon Koppu (2015)
Due to the slow motion and long impact duration of Koppu, three 24 h target periods are selected for this case to examine the QPFs in Figure 8. Starting and ending at 1200 UTC, the first one is over 16-17 October (top row, Figure 8a-f). During this first period, Koppu's center was still over the ocean east of Luzon and thus the rain gauges over land did not record too much rainfall, which peaked at 109.6 mm at Virac (Figure 8f). In the QPF for this target period by the first run (t0 = 0000 UTC 13 October), on day 4.5 (halfway between days 4 and 5), the model rainfall (Figure 8a) was maximized (along the eyewall) south of the track that had a northward bias. As the lead time decreased, the northward bias also reduced and the TC rainfall area gradually shifted southward (Figure 8b-e). At the short range, the model was producing rainfall along the eastern coast of northern Luzon (Figure 8c-e), where there is a lack of rain gauges (Figure 1b).

During the next 24 h target period from 1200 UTC 17 to 1200 UTC 18 October, Koppu continued to approach, made landfall, penetrated the middle section of Luzon, and then exited the land to move offshore (Figure 1a). Therefore, rainfall was the most along the middle section of Luzon, with a peak amount of 188.8 mm again at Baguio (second highest at Casiguran), and it decreased toward both the north and south (Figure 8l). From the longest (day 5) to the shortest range (day 1), the model rainfall area gradually shifted south as the northward track bias reduced (Figure 8g-k), with ample rainfall in central Luzon (mainly along the two mountain ranges) at lead times within day 4. However, within day 3, the model QPFs also appear to have too much rainfall in southern Luzon, in the area north of Manila (Figure 8i-k), most likely because of the reduced translation speed associated with the track errors during landfall as mentioned earlier.
During the third (and last) 24 h period from 1200 UTC 18 to 1200 UTC 19 October, Koppu was moving northward slowly offshore of northwestern Luzon, so that the intense and asymmetric rainband seen in Figure 7c produced heavy rainfall at Baguio, with a peak 24 h amount reaching 502.3 mm, and also at Dagupan (176 mm) facing the Lingayen Gulf (Figures 2b and 8r). This maximum of over 500 mm was much higher than those of the two preceding 24 h periods combined. At ranges within day 4, the model was able to produce comparable amounts of rainfall (reaching 600 mm) in the nearby region (Figure 8o-q), because the asymmetric, northward-extending rainband seen in Figure 7f was captured reasonably well. The precision in heavy-rainfall location also improved at shorter lead times. On day 5 (t0 = 1200 UTC 14 October), the QPF also appeared reasonably good in quality, but the rainfall amounts seemed over-predicted (Figure 8n). Starting at 1200 UTC 13 October, the QPF on day 6 did not extend its heavy rainfall area down to the Baguio area and was not as ideal (Figure 8m), since Koppu made landfall at the northeastern corner of Luzon in this run at such a long range (Figure 6a). In [85], a single forecast run initialized at 1200 UTC 17 October (same as the last t0 in our ensemble) using the Typhoon Weather Research and Forecasting (TWRF) model of the Central Weather Bureau (CWB) was examined for Koppu, but the heavy rainfall missed the Baguio region because the track was slightly too far south. In addition, the model QPFs were only examined through visual comparison with the observations, and no objective measure was used.
When the accumulation period is the combined 72 h from 1200 UTC 16 to 1200 UTC 19 October 2015, again the observed rainfall was the most in central Luzon, with a peak total of 695.3 mm at Baguio (Figure 8w) while the TRMM data indicated a maximum less than 600 mm in 132 h (Figure 8x). The predicted rainfall distributions for the 72 h period by the four runs initialized at 1200 UTC on each day from 13-16 October, thus at ranges from days 4-6 to days 1-3 (longer to shorter), are depicted in Figure 8s-v, respectively. One can see that, with the gradual reduction in track errors with time, the rainfall in the middle section of Luzon near Baguio as well as along the two mountain ranges of CCM and SMM became more concentrated (Figure 8s-v). Additionally, the rainfall extending northward from the Lingayen Gulf and caused by the primary rainband on 18-19 October became more distinct and clear with time. Overall, the comparison in Figure 8 indicates that the rainfall of Koppu was predicted reasonably well within the range of about 3-4 days, especially for rainfall toward the last day of the 72 h period, when the primary rainband could be captured at nearly the right location despite some biases in the TC track.

Categorical Skill Scores and Rainfall Similarity
The results of 24 h and 72 h QPFs for Koppu in categorical skill scores, verified against the rain gauges, are shown in Figure 9. For the first 24 h period starting from 1200 UTC 16 October, meaningful results in Figure 9a (with a possibility of TS > 0) are confined to 100 mm (per 24 h) and below, since the observed peak amount was only 109.6 mm ( Figure 8f). As seen, QPFs for this period have TS values about 0.3-0.65 at 0.05 mm and 0.15-0.5 at both 10 and 50 mm (Figure 9a). At the threshold of 100 mm, none of the hindcasts initialized before and at 1200 UTC 16 October produced a TS above zero. In terms of the BS, the data points at 0.05 mm are between about 0.7 and 1.3, and between about 0.4 and 1.8 at 10 mm (Figure 9a), and thus are quite reasonable. However, at 50 mm their values are larger at about 1-3.7 and suggested some over-prediction, most likely near the northeastern coast of Luzon (see first row of Figure 8). For the second 24 h period starting from 1200 UTC 17 October, the observed peak rainfall increased to 188.8 mm as the storm approached and made landfall, but still meaningful results are restricted to below the threshold of 200 mm. For this period (Figure 9b), the TSs are roughly over 0.5-0.7 at 0.05 mm, 0.4-0.6 at 10 mm, 0.2-0.33 at 50 mm, and 0.1-0.3 at 100 mm, respectively. While the TSs improved compared to those for the first 24 h period, the over-prediction seemed to be more serious as the BSs of all points are above 1 and up to about 5 in Figure 9b. In addition, the BSs tend to be larger toward the higher thresholds. This increased over-forecasting on day 2 when Koppu made landfall and the rainfall over Luzon increased is similar to the situation shown in Figure 5a,b for Mangkhut.
For the third and final 24 h period starting from 1200 UTC 18 October (Figure 9c), the observed peak rainfall amount was the most among the three days and reached 502.3 mm, so the results in categorical statistics can be examined at higher thresholds. At the lowest threshold of 0.05 mm (per 24 h), the TS values were about 0.45-0.7 and BSs were about 0.7 to 1.1, thus suggesting good performance in rainfall areas over the Philippines. At 10 and 50 mm, most of the TSs were lowered to about 0.3-0.6 and further to 0.2-0.4, while the BSs increased to about 0.8-1.6 and further to 1-2.5 (Figure 9c). Toward the higher thresholds, the TSs were 0.1-0.5 at 100 mm with BSs roughly between 1.3 and 4.5, and the highest TS also reached 0.5 at 200 mm (t 0 = 1200 UTC 15 and 1800 UTC 16 October), where the BSs were about 2 to 5. Thus, over-prediction also occurred toward the higher thresholds, but overall it appeared somewhat less serious compared to the second 24 h period. Finally, eight experiments, of which the first was initialized at 0600 UTC 14 and the last at 0000 UTC 17 October, registered a TS = POD = SR = BS = 1.0 at 350 mm (overlapping points). As eight out of 19 runs achieved this and the range was 102-126 h for the earliest member (beyond day 5), this result was quite good and very encouraging (Figure 9c).
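The forecast ranges quoted above follow from simple date arithmetic between the initial time t0 and the target period. The sketch below illustrates this for the Koppu ensemble. The function names are hypothetical, and the 6 h interval between runs is an assumption carried over from the description of the Melor runs; it is consistent with the 19 members between 0000 UTC 13 and 1200 UTC 17 October.

```python
from datetime import datetime, timedelta


def initial_times(first, last, step_h=6):
    """List the initial times t0 of a time-lagged ensemble, one run every step_h hours."""
    times = []
    t = first
    while t <= last:
        times.append(t)
        t += timedelta(hours=step_h)
    return times


def forecast_range_h(t0, target_start, target_len_h=24):
    """Return the start and end of the target period, in hours after t0."""
    start = (target_start - t0).total_seconds() / 3600.0
    return start, start + target_len_h


# Koppu (2015): first run at 0000 UTC 13 October, last run at 1200 UTC 17 October.
# The 6 h spacing is an assumption (see the lead-in).
runs = initial_times(datetime(2015, 10, 13, 0), datetime(2015, 10, 17, 12))

# Third 24 h target period (1200 UTC 18 to 1200 UTC 19 October),
# as seen from the run with t0 = 0600 UTC 14 October.
range_third_day = forecast_range_h(datetime(2015, 10, 14, 6), datetime(2015, 10, 18, 12))

# 72 h target period (from 1200 UTC 16 October), as seen from the earliest run.
range_72h = forecast_range_h(datetime(2015, 10, 13, 0), datetime(2015, 10, 16, 12), 72)

print(len(runs), range_third_day, range_72h)
```

Under these assumptions the sketch gives 19 members, a range of 102-126 h for the run at 0600 UTC 14 October, and 84-156 h for the earliest run targeting the 72 h period.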
For the three-day total rainfall during Koppu from 1200 UTC 16 to 1200 UTC 19 October 2015, the categorical statistics of QPFs from the time-lagged ensemble are depicted in Figure 9d. With an observed peak amount of 695.3 mm at Baguio (Figure 8w) [...] (Figure 9d). At the higher threshold of 350 mm, all runs also produced TSs of 0.2-0.35 except for three earlier ones with t0 before 0600 UTC 14 October. Even at 500 mm, eight out of the last nine members starting from 1200 UTC 14 October also yielded TSs of either 0.5 (three runs) or 1.0 (five runs), except the one at 1800 UTC 15 October. Again, the TS values tend to rise when the period for accumulation is lengthened, because the events (rainfall reaching specific thresholds) are then more likely to occur (in both the observation and prediction), which increases the chance of hits, and some timing errors in rainfall are also tolerated. Similar to the results of 24 h periods, the BS values of the 72 h QPFs indicated some over-forecasting since the majority of the points are above BS = 1 rather than below (Figure 9d). Those with BS < 1 were all produced by runs at and before 1200 UTC 14 October, in which larger northward track biases existed. Again, the BS values tended to increase with rainfall thresholds in Figure 9d, from about 1-1.2 at 0.05 mm (per 72 h) to roughly 3-5 at 350 mm and 2-4 at 500 mm. Correspondingly, the points with 0 < TS < 1 at the high thresholds of 350 and 500 mm (16 occasions) all have POD = 1 but SR < 1 (except for one). Overall, Figure 9d indicates an improved performance in QPFs for Koppu toward the high thresholds up to 500 mm (per 72 h). This is encouraging because even for the last run (t0 = 1200 UTC 16 October), most of the rainfall produced by the northward-extending rainband near Baguio and the Lingayen Gulf took place on day 3, i.e., 48-72 h into the hindcast experiment.
Therefore, the CReSS model at ∆x = 2.5 km is able to reproduce realistic rainfall for TY Koppu (2015) at lead times beyond 48 h, allowing for timely preparation by the authorities before the main rainfall period of the TC.
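The relation among the four scores discussed above can be illustrated with a minimal sketch based on the standard 2 × 2 contingency table (hits H, misses M, false alarms FA), assuming the usual definitions TS = H/(H + M + FA), BS = (H + FA)/(H + M), POD = H/(H + M), and SR = H/(H + FA), with an event defined as rainfall reaching the threshold. The function name and the five-site sample are hypothetical; only the 502.3 mm and 176 mm values are taken from the observations quoted in the text, and the forecast values are invented for illustration.

```python
def categorical_scores(obs, fcst, threshold):
    """Compute TS, BS, POD and SR at one rainfall threshold (mm) from paired site values."""
    hits = misses = false_alarms = 0
    for o, f in zip(obs, fcst):
        observed = o >= threshold   # event observed at this site
        predicted = f >= threshold  # event predicted at this site
        if observed and predicted:
            hits += 1
        elif observed:
            misses += 1
        elif predicted:
            false_alarms += 1

    def ratio(num, den):
        # Scores are undefined when the denominator is zero.
        return num / den if den > 0 else float("nan")

    return {
        "TS": ratio(hits, hits + misses + false_alarms),
        "BS": ratio(hits + false_alarms, hits + misses),
        "POD": ratio(hits, hits + misses),
        "SR": ratio(hits, hits + false_alarms),
    }


# Hypothetical five-site sample (mm per 24 h); forecasts are illustrative only.
obs = [502.3, 176.0, 40.0, 12.0, 0.0]
over_forecast = [600.0, 120.0, 380.0, 5.0, 0.0]  # one extra site reaches 350 mm
well_placed = [600.0, 120.0, 100.0, 5.0, 0.0]    # only the observed site reaches 350 mm

scores_over = categorical_scores(obs, over_forecast, 350.0)
scores_perfect = categorical_scores(obs, well_placed, 350.0)
print(scores_over, scores_perfect)
```

In the first forecast the single observed event is hit but one false alarm is added, so POD = 1 while SR = 0.5, TS = 0.5, and BS = 2. In the second, the only site reaching 350 mm in both data sets coincides, so all four scores equal 1. With few verification points reaching a high threshold, such all-or-nothing outcomes become more likely.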
For TY Koppu (2015), the results of SSS for the 72 h rainfall (from 1200 UTC 16 to 1200 UTC 19 October) are shown in Figure 10a. For the entire Philippine Archipelago, the SSS was less ideal between 0600 UTC 13 and 0000 UTC 14 October, but mostly at least around 0.7 afterwards (red curve), again with some run-to-run variations. It is noteworthy that the earliest run, with t0 at 0000 UTC 13 October, also registered an SSS of 0.66, comparable to some of the later runs executed from 14-16 October. This result is encouraging considering that the target period is in the range of 84-156 h. For the small domain of Luzon, the SSS values (blue curve) are mostly slightly better, by ~0.05 at most. Among all runs, the highest SSS is close to 0.8 by the last run with t0 at 1200 UTC 16 October (which could thus fully cover the 72 h target period), whereas the lowest one is about 0.4 (with t0 at 0600 UTC 13 October). Compared to those for Mangkhut (see Figure 5d), the SSS values for Koppu in Figure 10a appear better to some extent, in agreement with the better results in Figure 9.
Figure 10. SSS of the 72 h QPFs for (a) Koppu (2015) and (b) Melor (2015), as functions of initial time for the big and small domains (see Figure 1b).

Track Forecast and Examples of Rainfall Structure
The third and final typhoon is Melor (2015), for which a total of 28 hindcast runs were carried out every 6 h from 0000 UTC 9 to 1800 UTC 15 December. As shown in Figure 1a, TY Melor moved westward through the central part of the Philippines, across the northern end of Samar, the far end of southeastern Luzon, and south of Luzon on 14 December, then the northeastern shore of Mindoro during early 15 December, and finally moved into the ocean west of Manila. In the hindcasts made during the first three days of 9-11 December, the model TCs also all developed a northward bias that tended to be the most serious, about 300-350 km too far north, around 13 December (Figure 11a). While these TCs would turn south again to make landfall in the central Philippines, the timing was too late for some of them, so the track errors remained quite large (Figure 11b). From 12 December, on the other hand, the tracks started to improve considerably and all those executed at and after 0600 UTC 12 December had track errors within about 75 km at or near the first landfall at Samar, with a lead time up to about 45 h (Figure 11b). The moving speeds of TCs in these later runs were also roughly correct. With time, however, several of them still had some northward bias and the predicted TC moved toward the Manila area instead of Mindoro (such as those initialized at 0000 UTC 12 and 0600-1800 UTC 13 December), while a few others exhibited a slight southward bias near Mindoro (Figure 11). Thus, using the NCEP GFS gross analyses and forecasts as IC/BCs, the majority of the earlier hindcasts exhibited a northward bias for all three typhoons in our study. These runs were initialized at least about 2 days before landfall for Melor (Figure 11), 2.5 days for Koppu (Figure 6), and ≥3 days before landfall for Mangkhut (Figure 2). Similar characteristics using NCEP data are also found in some typhoons in Taiwan [58,59].

In Figure 12, the rainfall structure of Melor in the model run at t0 = 1800 UTC 12 December is compared with the TRMM retrievals at two different times. This particular run had small track errors (≤75 km) throughout the landfall period. Before landfall at Samar, at around 1900 UTC 13 December, the model is shown to capture the correct eye size of Melor and the rainfall in the northern and eastern quadrants of the storm (Figure 12a,b). Similarly, shortly after landfall when Melor's center was right between Luzon and Mindoro at 1000 UTC 15 December, the intense rainfall near northwestern Mindoro was also nicely captured by the model (Figure 12c,d).

Model QPFs for Typhoon Melor (2015)
As TY Melor (2015) moved across many islands in the central Philippines and also slowed down from 15-16 December, its impact period was relatively long. Therefore, three 24 h target periods covering 14-16 December were chosen as well for this event (rows 1-3 of Figure 13, respectively). On the first day of 14 December, significant rainfall occurred in northern Samar, the southeastern end of Luzon, and in the Manila area, with a peak amount of 169.6 mm (Figure 13f). The model predictions, selected to be all from 0000 UTC of earlier days, show better QPFs over the central Philippines only within the range of day 2 and not before (Figure 13a-e), but none of the runs captured the significant but localized rainfall near Manila. Due to the northward track bias, the rainfall in the central Philippines was missed by earlier runs at ranges beyond day 3 (Figure 13a,b). Similarly, the rainfall on day 3 (t 0 = 0000 UTC 12 December, Figure 13c) also just missed the northern part of Samar.

On the second day of 15 December, the rain over the eastern part of the central Philippines had subsided, while northern Mindoro and southwestern Luzon (near Manila) had more rain instead, with a peak of 209.2 mm at Calapan to the north of the High Rolling Mountains (Figures 2b and 13l). The handful of hindcast experiments selected (Figure 13g-k) showed fairly different rainfall distributions among them, but all captured the rainfall in Mindoro within the range of 3 days despite their differences (Figure 13i-k). On day 4 (t0 = 0000 UTC 12 December), the run was still producing rainfall over the central Philippines linked to a slower TC motion (Figure 13h). In the run on day 6 (t0 on 10 December), the storm was weakening and still east of Luzon without making landfall, thus the model rainfall was mainly along the northeastern and northern coast of Luzon (Figure 13g).
On the third day of 16 December, Melor was lingering offshore to the west of the Manila area, and most rainfall was observed along the eastern shore of Luzon with a peak daily value of 273.8 mm at Baler (Figures 1b and 13r). The hindcasts at ranges from day 7 to day 2 again show very different rainfall distributions as somewhat expected due to the track differences (Figure 13m-q). At the shortest range selected to show, the rainfall along the coast of eastern Luzon was captured on day 2 (Figure 13q), but to a lesser and lesser degree at longer ranges on days 3-4 (Figure 13o,p) where the differences in TC track also seemed apparent. The day-5 hindcast for 16 December (Figure 13n) is the same run to produce day 4 QPF for 15 December (Figure 13h), and the rain was still in the central Philippines due to slow TC motion. Likewise, the day 7 QPF for 16 December is an extension from day 6 QPF for 15 December (Figure 13g), and thus the rainfall decreased as the TC continued to diminish off the coast of eastern Luzon (Figure 13m). Therefore, while the QPFs for fixed target dates show different characteristics for Melor in Figure 13, they are largely dictated by the TC track in the model.
For the combined three-day period of 14-16 December, the observed rainfall from Melor was the most in eastern Luzon, the Manila area, northern Samar, and northern Mindoro, with a peak amount reaching 407 mm at Baler (Figure 13w). Compared to raingauge observations, the TRMM retrieval indicated more rainfall, reaching 500-600 mm in eastern Luzon and almost 300 mm in northern Mindoro, but only about 200 mm or less in Manila and northern Samar (Figure 13x). The predicted 72 h rainfall within the short range of days 1-3 (Figure 13v) was reasonable and close to gauge observations at Samar, Mindoro, and eastern Luzon, but not enough near Manila. On days 2-4, the hindcast from 0000 UTC 13 December (Figure 13u) made good 72 h QPFs, with much rainfall (~500 mm) over Manila and southern Luzon, and was perhaps the most ideal among the four runs selected for visual comparison. At the longer range of days 3-5, due to the slow TC motion in this run from 12 December as mentioned, the rainfall in the central Philippines had not extended westward into the Manila area and Mindoro (Figure 13t). At an even longer range, again, the 72 h rainfall only reached eastern Luzon associated with the erroneous track (Figure 13s). In summary, the QPFs were reasonable and captured most of the heavy rainfall regions along the track of Melor through the archipelagos in the central Philippines up to the range of days 2-4, but a rainy scenario in southern Luzon was delayed at longer lead times due to the track errors linked to slow TC motion in the model.

Categorical Skill Scores and Rainfall Similarity
The results of categorical scores of 24 h and 72 h QPFs for Melor are shown in Figure 14.
On the first day of 14 December, the observed peak rainfall was 169.6 mm, and many of the 24 h QPFs for this day exhibited TS values of about 0.4-0.7 at the threshold of 0.05 mm (per 24 h), 0.3-0.7 at 10 mm, and 0.05-0.4 at 50 mm (Figure 14a), respectively. At the highest threshold of 100 mm (per 24 h) where the TSs could be above zero, the values were over a wide range from about 0.1 all the way up to 0.8. However, all eight such QPFs (with TS > 0 at 100 mm) in Figure 14a were from model runs at and after 0600 UTC 12 December, i.e., those with reduced northward track bias (cf. Figure 11a) and inside the range of 66 h. For this day, the BS values are mostly between 0.4 and 2.0 and quite reasonable, although again some under-prediction tended to occur at low thresholds with slight over-prediction toward the higher thresholds.
The peak daily rainfall increased slightly to 209.2 mm (in northern Mindoro) on 15 December (cf. Figure 13l). The QPFs for this day had TS values of roughly 0.5-0.75 at 0.05 mm (per 24 h) but over wide ranges of about 0.2-0.8 at 10 mm and 0.1-0.65 at both 50 and 100 mm, respectively (Figure 14b). At the threshold of 200 mm, which is very close to the observed maximum, a few runs registered a TS above zero, including those initialized at 0600 (TS = 0.17) and 1800 UTC on 12 (TS = 0.33) and at 0000 UTC on 14 December (TS = 1). Many of the hindcasts also produced higher TS values for 15 December compared to those for the day before (Figure 14a,b). Overall, the BS values from the QPFs for 15 December indicated slight under-prediction, as there were more points with BS below 1 than above, except at the higher thresholds of 200 and 350 mm (Figure 14b).
For the third day of 16 December, when the observed peak amount increased again to 273.8 mm, the TS values were about 0.5-0.7 at 0.05 mm (per 24 h), 0.15-0.65 at 10 mm, 0.05-0.5 at 50 mm, and 0.1-0.5 at 100 mm, respectively (Figure 14c). At 200 mm, again, a few hindcasts were able to produce hits and TS values above 0, including those initialized at times from 1800 UTC 12 to 1800 UTC 13 December (TS = 0.28-0.5), and at 0000 (TS = 0.25) and 1200 UTC 15 December (TS = 0.2). Distributed quite evenly between about 0.25 and 3.5, the BS values suggested some under-prediction at 50 mm and some over-prediction at 100 mm, but little preference elsewhere (Figure 14c).
For the 3-day total rainfall over 14-16 December during Melor, the observed peak amount reached 407 mm at Baler and the categorical score results are shown in Figure 14d. When the accumulation period is lengthened to 72 h, the QPFs yielded TS values about 0.7-0.9 at the lowest threshold of 0.05 mm (per 72 h), 0.55-0.75 at 10 mm, 0.2-0.65 at 50 mm, and 0.1-0.6 at 100 mm, respectively. At 200 mm, all but three runs yielded TS > 0. The highest value was 0.65 by the run with t 0 at 1800 UTC 12 December, while 10 runs produced TSs of 0.2-0.44, and seven others (between 1800 UTC 9 and 0000 UTC 12 December) had TSs below 0.2 (Figure 14d). At 350 mm, which is not much below 407 mm, the three hindcasts initialized from 0000-1200 UTC 13 December also produced non-zero TSs of 0.09-0.33. In terms of BS, many data points indicated slight under-forecasting for Melor, more serious with BS around 0.2-0.6 at the thresholds of 100 and 200 mm. Over-forecasting only occurred at 350 mm, but at such a high threshold that the statistics were based on very few data points ( Figure 14d). Thus, over the accumulation period for the entire duration of the three typhoons, Melor was the only case for which the time-lagged ensemble produced mostly under-prediction in QPFs. For both Mangkhut ( Figure 5c) and Koppu (Figure 9d), over-prediction occurred, especially toward the higher thresholds.
The values of SSS from the lagged members for the total 3-day rainfall from Melor (14-16 December) are shown in Figure 10b. One can see that higher scores of ≥0.65 were achieved by the first five and last eight runs, while the middle eight runs executed mainly from 10-11 December had worse SSS values near and below 0.4 (Figure 10b), evidently due to the northward TC track biases in those runs (see Figure 11). Therefore, the first several runs also had good SSS values comparable to the runs at shorter ranges after 0000 UTC 12 December, and thus indicate the potential to produce decent QPFs at longer lead times. For the big domain, the best SSS (=0.85) was made by the run at 1800 UTC 12 December, consistent with the best TS of 0.65 at 200 mm shown in Figure 14d above, while a few other runs also had SSS near or above 0.8. Over the small domain, which for Melor is slightly larger and extends down to include Mindoro (see Figure 1b), the SSS values also tend to be higher (except for some runs with larger track errors), and the highest value is SSS = 0.91 and very impressive (Figure 10b). Overall, the SSS values of the three typhoons are in agreement with the verifications using a categorical matrix.
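As a rough illustration of how the SSS responds to displaced rainfall, the sketch below assumes that Equation (5), which is not reproduced in this section, takes the form SSS = 1 − Σ(F_i − O_i)² / (ΣF_i² + ΣO_i²), where F_i and O_i are the forecast and observed amounts at site i. This form is an assumption; it is consistent with the reference to a denominator in the second right-hand-side term, and it ranges from 0 (no overlap between forecast and observed rainfall) to 1 (perfect forecast). The function name and sample values are hypothetical.

```python
def similarity_skill_score(obs, fcst):
    """SSS under the assumed form 1 - sum((F - O)^2) / (sum(F^2) + sum(O^2))."""
    if len(obs) != len(fcst) or len(obs) == 0:
        raise ValueError("obs and fcst must be non-empty and of equal length")
    numerator = sum((f - o) ** 2 for o, f in zip(obs, fcst))
    denominator = sum(f * f for f in fcst) + sum(o * o for o in obs)
    if denominator == 0:
        return float("nan")  # no rain in either field; the score is undefined
    return 1.0 - numerator / denominator


# Hypothetical two-site samples (mm).
sss_perfect = similarity_skill_score([100.0, 50.0], [100.0, 50.0])  # identical fields
sss_missed = similarity_skill_score([100.0, 0.0], [0.0, 100.0])     # rain fully displaced
sss_close = similarity_skill_score([100.0, 50.0], [80.0, 50.0])     # small amplitude error
print(sss_perfect, sss_missed, sss_close)
```

In this form the cross term 2ΣF_iO_i is what lifts the score above zero, so a track error that moves the model rainfall away from the rainy gauges drives the SSS toward 0, in line with the low values found for the runs with large northward track biases.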

Discussion
In this section, the QPF performance for the three typhoons is further discussed. Since there is a lack of studies utilizing categorical statistics or other objective methods to verify model QPFs for typhoons in the Philippines in the open literature, in this section our results are compared with those for selected typhoons in Taiwan in earlier studies at the same thresholds and similar ranges. Such a comparison is made in Table 3, sorted by the observed peak rainfall amounts. While the typhoons in Taiwan are chosen somewhat arbitrarily and the best TSs are used, one can see that the TSs of the QPFs at the short range (within 72 h) exhibit a clear tendency of higher scores in larger rainfall events as demonstrated before [41][42][43][44][45], although this phenomenon is certainly not exclusive because the predictability varies from case to case. Owing to this dependency property, it would not be fair to compare the TSs for very rainy TCs in Taiwan, such as those ranked near the top [41,45,64], with those for the three typhoons in the Philippines. As such, when the TC events in Taiwan with a peak 24 h amount of about 500-600 mm or less are used for comparison, perhaps below Kong-Rey (2013), one can see that the TSs for the typhoon cases at the short range [41,45,55] in the two regions are roughly comparable (Table 3, top half). At forecast ranges beyond 72 h (up to 8 days), the situation is similar and the scores are comparable. As shown in Section 4.3, quite a number of runs were able to predict a rainfall amount of ≥350 mm at Baguio (and not at any other site) from Koppu over the third 24 h period selected (Figure 9c), and thus had a perfect TS of 1.0, even at ranges beyond 72 h (the earliest one with t 0 at 0600 UTC 14 October). 
One should also be aware that such high scores are more likely to occur in the Philippines due to the much fewer total verification points available (N = 56), such that very few points can reach the high thresholds, compared to Taiwan where the gauge sites operated by the CWB are around 450.
Table 3. Comparison of TS values of 24 h QPFs by the CReSS model between selected typhoons in Taiwan and the three TCs in the Philippines in this work, at six rainfall thresholds from 50 to 750 mm. Results are grouped into those within the short range (when the target period is inside 72 h from t0, top half) or beyond (>72 h, bottom half), and sorted by the observed peak rainfall amount (mm) from high to low. For each typhoon, the best TS values are listed from all available runs in the ensemble (if applicable). A "-" indicates threshold exceeding the peak amount, and the letter "F" in the source column represents a figure in this study.
For comparison, the best TSs for the three typhoons in the Philippines for the entire event using accumulation length of 48 or 72 h are shown in Table 4, where the runs with a t0 up to 48 h ahead of the starting time of the accumulation period are considered as at short range, such that the first 24 h of the target period is also within 72 h, to make a fair comparison with Table 3. As one can see, when the accumulation period is lengthened and the peak rainfall increased, the TSs also tend to improve and the non-zero values may reach a higher threshold, at 500 mm for Koppu, for example. For entire TC rainfall events, it is encouraging to see that the cloud-resolving models can produce fairly good TS values at high thresholds ≥200 mm prior to the event, even at lead times beyond the short range. In Table 5, the SSS values for selected typhoons in Taiwan available in earlier studies [59,60,83] are compared with those for the three TCs in the Philippines as well, since SSS can reflect the overall quality of the QPF quantitatively.
For some TCs in Taiwan such as Soulik and Soudelor [59], the SSS can be consistently high (say, ≥0.85) inside the short range when the track errors are small enough. Such high SSS values are likely because the rainfall in Taiwan can reach higher amounts and be more concentrated owing to the island's higher topography and smaller size, which gives a larger denominator in the second RHS term of Equation (5). However, when the track errors are too large in these selected and rainy cases, the SSS values can also be very low and near zero, as the abundant rainfall in reality is almost completely missed by the model (i.e., when the predicted TCs move too far from Taiwan). Therefore, although the SSS values for the three TCs in the Philippines might not reach values as high as those for some of the cases in Taiwan within the short range, a near-complete miss (with a very low SSS) also seems less likely to happen. Beyond the short range, there was a fair chance (about 33%) for the QPFs to be of decent quality (with an SSS reaching 0.65) for both Koppu and Melor, although the rainfall of Mangkhut was less predictable at such longer ranges, evidently linked to the northward track biases. This likelihood of 33% is in fact slightly higher than the probabilities for some of the rainier TCs in Taiwan when t0 is more than 48 h prior to the starting time of the target accumulation period (Table 5), and is therefore quite encouraging.

Table 5. Similar to Table 3, but comparing the SSS values of QPFs over accumulation periods for entire events of selected typhoons in Taiwan and the Philippines, for runs with a t0 within (≤) or beyond (>) 48 h prior to the starting time of accumulation (as used in Table 4). In each lead-time category, the total number of runs available, the range of SSS (computed over the entire Philippines), and the number of runs with an SSS ≥ 0.65 (percentage in parentheses) are listed.

[Table 5 column headers: Region/Typhoon; Obs. Peak Amount (mm); Length of Accum.]

Overall, Tables 3-5 indicate that the quality of the QPFs by the 2.5 km CReSS using the time-lagged approach for the three typhoons of Mangkhut, Koppu, and Melor is fairly good and comparable to that for TCs in Taiwan. Targeted at the most-rainy 24 h in each event, the highest TS reaches 1.0 at 200 mm for Mangkhut, 1.0 at 350 mm for Koppu (in many runs; observed peak = 502 mm), and 0.25 at 200 mm for Melor (observed peak = 274 mm). For the entire event over 48 or 72 h, TS = 1.0 can even be achieved at 500 mm for Koppu (in 33% of all runs, with TS = 0.33-0.5 in another 27% of runs; observed peak = 695 mm), while TS = 0.33 can be reached at 350 mm for Melor (observed peak = 407 mm). Beyond the short range, some earlier predictions with a longer lead time can also produce QPFs of good quality and thus provide a useful rainfall scenario for early preparation. Among such runs, 33% produced an SSS ≥ 0.65 for both Koppu and Melor, but none did so for Mangkhut owing to larger track biases. In terms of predictability, the rainfall of Koppu is the most predictable, followed by that of Melor, while that of Mangkhut appears to be the least predictable. Even for Mangkhut, nevertheless, all 11 later runs still produced non-zero TSs at 200 mm for the entire event (48 h), and one run with t0 at 1200 UTC 11 September did yield a TS of 0.33 at 250 mm (not plotted in Figure 5c), at the range of 60-108 h.

Conclusions
In this study, the 2.5 km CReSS is applied to three typhoons in the Philippines, Mangkhut (2018), Koppu (2015), and Melor (2015), using a time-lagged strategy to make the most of the available computational resources and produce rainfall forecasts at high resolution. In that order, the three typhoons made landfall in northern Luzon, central Luzon, and the middle part of the Philippine Archipelago through southern Luzon and Mindoro, respectively. The track predictions and QPFs for each are verified herein using rain-gauge data (56 stations) and categorical statistics at seven thresholds (0.05, 10, 50, 100, 200, 350, and 500 mm). The QPF results are compared with those for selected TCs in Taiwan, since such objective verifications for the Philippines are lacking in the literature. The similarity skill score (SSS), which measures the overall similarity between the predicted and observed rainfall patterns, is also used. The major findings can be summarized as follows.
Among the three TCs, the rainfall of Koppu (2015), which penetrated central Luzon, was the most predictable, as the observed rainband and heavy rainfall near Baguio could be captured by many of the lagged runs in spite of some track errors. For the most-rainy 24 h (1200 UTC 18 to 1200 UTC 19 October, 502.3 mm at Baguio), 8 out of 19 total runs produced a perfect TS of 1.0 at 350 mm, including two (out of five) within the short range (target period inside 72 h from t0) and six (out of 14) at longer lead times, among which the earliest run was at 0600 UTC 14 October (at t = 102-126 h). For the entire 72 h event (1200 UTC 16 to 1200 UTC 19 October, 695.3 mm at Baguio), eight of the last nine runs since 1200 UTC 14 October yielded a TS of 0.5 (three runs) or 1.0 (five runs) at 500 mm, and one run at longer lead times had TS = 0.33 at 500 mm. Using SSS ≥ 0.65 as an indication of good overall QPF quality, seven out of nine runs (78%) since 1200 UTC 14 October, and two out of six runs (33%) initialized before that time, met this criterion. Overall, the ability of the system to capture heavy rainfall at high thresholds for Koppu is rather impressive, although some tendency toward over-prediction existed.
The predictability of Melor (2015), which struck the central portion of the Philippine Archipelago, was in the middle among the three TCs. For the most-rainy day of 16 December (peak amount of 273.8 mm at Baler), TS values of 0.2-0.5 at 200 mm were produced in 7 of the 13 runs initialized between 1800 UTC 12 and 1800 UTC 15 December, after the reduction in some northward track biases that had led to larger rainfall errors (in location and timing) in some lagged members in the middle. Consequently, for the 72 h total TC rainfall from Melor (14-16 December, peak amount of 407 mm at Baler), three runs executed from 0000 to 1200 UTC 13 December produced TSs of 0.09-0.33 at 350 mm, and all eight runs from 0600 UTC 12 December (at shorter lead times) and the four earliest runs (before 0000 UTC 10 December, at longer lead times) yielded TSs of 0.17-0.65 at 200 mm, but only 0.08 at most in between. In terms of SSS, eight out of the last nine runs (89%) from 0000 UTC 12 December had an SSS ≥ 0.65, while four out of 12 (33%) at longer lead times did the same. The BS values indicated little preference for either over- or under-prediction for Melor.
The rainfall of Mangkhut (2018), which made landfall in the northern part of Luzon and produced the highest peak rainfall among the three TCs, seemingly had the lowest predictability. The earlier runs exhibited a northward track bias, which prevented good QPFs beyond the short range. For the most-rainy 24 h (15 September, peak rainfall of 535.6 mm at Baguio), none of these runs had a TS above 0.2 at 100 mm. However, as the track errors decreased with time, all seven runs within the short range had TSs of 0.29-0.75 at 100 mm, and two had TS = 0.5 or 1.0 at 200 mm. For the two-day total rainfall from Mangkhut (14-15 September, peak amount = 785.5 mm at Baguio), six out of nine runs from 0000 UTC 12 September produced a TS ≥ 0.2 at 200 mm, while three out of eight runs at longer lead times achieved the same. While three short-range runs and two longer-range ones also had TSs of 0.2-0.33 at 250 mm (not plotted), no run was able to produce hits at 350 mm (and thus TS = 0). Thus, the TS values for Mangkhut were slightly lower than those attainable for the other two TCs, although many runs inside the short range were able to produce rainfall amounts of about 600 mm near Baguio (but not at the site itself). For the SSS, only 33% of the runs from 0000 UTC 12 September produced an SSS ≥ 0.65, while none at the longer lead times could do the same. The overall lower SSS values are likely linked to the smaller area of significant rainfall in this case, especially in earlier runs with a TC too far north.
Overall, the comparison of the results from the categorical measures and the SSS with those for selected TCs in Taiwan indicates that the quality of the QPFs by the 2.5 km CReSS using the time-lagged approach for the three typhoons in the Philippines studied here was fairly good and comparable to that in Taiwan, especially for Koppu. Owing to length limitations, only the tracks and the QPFs against rain-gauge observations are verified in this study (Part I). In a follow-up study (Part II), the verification of TC intensity, heavy-rainfall probability, and QPFs using recent satellite products will be reported to complement the present work.