Effective Use of Ensemble Numerical Weather Predictions in Taiwan by Means of a SOM-Based Cluster Analysis Technique

Abstract: Typhoon rainfall is one of the most important water resources in Taiwan. However, heavy rainfall during typhoons often leads to serious disasters. Accurate typhoon rainfall forecasts are therefore always desired by water resources managers and for disaster warning systems. In this study, the quantitative rainfall forecasts from an ensemble numerical weather prediction system in Taiwan are used. A novel strategy, based on a self-organizing map (SOM)-based cluster analysis technique, is proposed to integrate these ensemble forecasts. By means of the SOM-based cluster analysis technique, ensemble forecasts that have similar features are clustered, which helps users to combine these ensemble forecasts effectively and thus provide better typhoon rainfall forecasts. To clearly demonstrate the advantage of the proposed strategy, an actual application is conducted for five typhoon events. The results indicate that the ensemble rainfall forecasts from numerical weather prediction models are well categorized by the SOM-based cluster analysis technique. Moreover, the integrated typhoon rainfall forecasts resulting from the proposed strategy are more accurate than those from the conventional method (i.e., the ensemble mean of all forecasts). In conclusion, the proposed strategy provides improved forecasts of typhoon rainfall. The improved quantitative rainfall forecasts are expected to be useful in supporting disaster warning systems as well as water resources management systems during typhoons.


Introduction
Taiwan is located on one of the main paths of Northwestern Pacific typhoons. From 1911 to 2016, a total of 363 typhoons affected Taiwan (as counted by the Taiwan Central Weather Bureau). That is, about three to four typhoons make landfall in Taiwan each year (mostly from June to October). During typhoons, heavy rainfall often causes various types of damage, such as floods, inundation, and landslides, and results in loss of life and property [1]. However, typhoon rainfall is also one of the most important water resources in Taiwan. On average, about 70% of the annual rainfall occurs between May and October [2,3], and most of this rainfall occurs during typhoons. Therefore, as a typhoon approaches Taiwan, the major goal is to take proper preventive measures, such as flood mitigation and early warnings. Once the typhoon leaves, the goal changes to storing sufficient water in reservoirs. Hence, accurate typhoon rainfall forecasts are always desired as essential information for water resources management and disaster warning systems in Taiwan. However, typhoon rainfall is difficult to forecast because of its high variability in space and time.
TTFRI-EPS, which is a collective effort among several academic institutes and government agencies, is an ensemble numerical weather prediction system in Taiwan [13]. TTFRI-EPS started in 2010. To date, more than 20 ensemble members have been established for weather forecasting. These ensemble members are designed by using different numerical weather prediction models with different model configurations. The Weather Research and Forecasting (WRF) Model [38], the fifth-generation Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model (MM5) [39], the Cloud Resolving Storm Simulator (CReSS) Model [40], and the Hurricane Weather Research and Forecasting (HWRF) Model [41] are used. The WRF, MM5, and HWRF use three nested domains with horizontal resolutions of 45 km (the outermost), 15 km (the middle), and 5 km (the inner). The outermost domain (i.e., the main domain) covers most of Asia and the western North Pacific Ocean, and the inner domain covers Taiwan as well as the neighboring ocean. The CReSS uses only one main domain with 5 km horizontal resolution, whose size is similar to that of the inner domain used in the other models. In this study, the outputs of the inner domain (i.e., gridded data with 5 km horizontal resolution) are used. In the vertical direction, 45, 43, 35, and 40 levels are used for the WRF, HWRF, MM5, and CReSS models, respectively. As to the temporal resolution, these numerical weather prediction models yield hourly weather predictions.
The initial condition perturbations are variations in the atmospheric first-guess states. Two strategies, the cold-start and the partial-cycle, are adopted. Cold-start means that the initial conditions are obtained directly from the National Centers for Environmental Prediction Global Forecast System (NCEP-GFS). Partial-cycle means that the NCEP-GFS analysis data from 12 h earlier are used first, and then two 6-h data assimilation cycles are performed to obtain the initial conditions at the analysis time. That is, NCEP-GFS analysis data from different times are used for forecast initialization. In addition, the three-dimensional variational data assimilation system with two statistical background error covariance matrices (CV3 and CV5) and the outer loop procedure are used to process these GFS analysis data. Different physical parameterization schemes, such as cumulus schemes and microphysics schemes, are also adopted for model perturbations. Regarding cumulus schemes, which represent sub-grid vertical fluxes and rainfall due to convective clouds, six schemes (the Grell-Devenyi [42], the Grell 3D [42], the Betts-Miller-Janjic [43], the Kain-Fritsch [44], the Grell [45], and the Simplified Arakawa and Schubert [46]) are adopted. As for microphysics schemes, which are used in the domain with horizontal resolution less than 5 km, four schemes (the Goddard [47], the WRF Single-Moment 5-class [48], the Cold rain [49], and the Ferrier [50]) are adopted. Lastly, three planetary boundary layer schemes (the Yonsei University [51], the Medium-Range Forecast nonlocal boundary layer [52], and the Mellor & Yamada [53]) are adopted. As to HWRF, the data provided by the NCEP GFS are used. The aforementioned model configurations (summarized in Table 1) were designed on the basis of preliminary experiments in 2010. Detailed information about TTFRI-EPS and its ensemble members is available in the literature [6,13,54,55].
At present, TTFRI-EPS operationally issues 24-, 48-, and 72-h typhoon track and rainfall forecasts four times per day (initialized at 00, 06, 12, and 18 Coordinated Universal Time (UTC)). An example of ensemble 24-h forecasts of typhoon track and rainfall issued at 18 UTC on 28 August during Typhoon Kong-Rey in 2013 is presented in Figure 1 (the model initial time is 12 UTC on 28 August). For the typhoon track, gray and black lines display the ensemble forecasts and the simple mean of all ensemble forecasts, respectively. As to rainfall, the star mark indicates the location of the maximum 24-h typhoon rainfall forecast of each ensemble member. The corresponding value is also provided. As mentioned earlier, the ensemble members of TTFRI-EPS are designed by using different numerical weather prediction models with perturbations of the initial conditions and model configurations. Thus, the differences among the resulting ensemble forecasts in Figure 1 are obvious. Despite this, these ensemble forecasts still provide useful information for hydrological modelling [13,20,55], such as the probable track of the typhoon and the main rainfall area. Hence, the ensemble forecasts from TTFRI-EPS are expected to be valuable references for typhoon rainfall forecasting.

The Artificial Neural Network (ANN)-Based Integration Strategy
As mentioned in the previous section, the ensemble forecasts of typhoon rainfall from TTFRI-EPS provide useful information. However, because of their high variability in space and time, they are not easy to use directly in hydrological modeling without analysis or post-processing. Hence, an ANN-based strategy is proposed herein to analyze and properly integrate these ensemble forecasts. The ANN is a data mining technique that is widely used as an information processing tool in various disciplines. Recently, ANNs have been applied to the integration of ensemble forecasts (e.g., [25,30,56]). Krasnopolsky and Lin [25] used ANNs to improve 24-h precipitation forecasts over the continental US. Their results indicated that ANNs significantly reduce the high bias at low precipitation levels and the low bias at high precipitation levels. Kumar et al. [56] used ANNs to integrate the daily medium range (days 1-5) precipitation forecasts during the monsoon season in India and indicated that their model generally has a higher skill than individual model forecasts and the simple ensemble mean. These studies inspired us to develop an ANN-based strategy for TTFRI-EPS. In this paper, the self-organizing map (SOM), which is a special class of ANNs and is powerful for data analysis and pattern recognition [57], is adopted. Studies have confirmed the potential of SOM in clustering and classification (e.g., [58][59][60]). Recently, SOM has also been used to process data from numerical meteorological models (e.g., [59,61]), with encouraging results. Hence, SOM is used herein to analyze the ensemble rainfall forecasts from TTFRI-EPS. Firstly, a SOM-based cluster analysis technique is presented. Then, based on the clustering results obtained by this technique, a novel strategy is proposed for efficiently combining the ensemble numerical weather predictions. The SOM-based cluster analysis technique and the proposed strategy are described as follows.

Self-Organizing Map-Based Cluster Analysis Technique
The self-organizing map proposed by Kohonen [57] is a special class of ANNs. A SOM maps high-dimensional input data onto a low-dimensional output space so that the clusters can be determined objectively by visual inspection. The architecture of a SOM network generally consists of one input layer and one output layer with numerous neurons (i.e., the Kohonen layer). Each neuron of the Kohonen layer has a synaptic weight $\mathbf{w}$ with the same dimension as the input data $\mathbf{x}$. SOM learning is unsupervised and adjusts the synaptic weights through the competitive, cooperative, and adaptive processes in sequence. Firstly, in the competitive process, all of the neurons compete among themselves to find the neuron $i$ whose synaptic weight $\mathbf{w}_i$ has the minimum Euclidean distance to the current input data $\mathbf{x}_k$. This neuron $i$ is the winning neuron of $\mathbf{x}_k$. Secondly, in the cooperative process, the influence of the winning neuron on its neighboring neurons is calculated by the topological neighborhood function $h_{j,i(\mathbf{x})}$:

$$h_{j,i(\mathbf{x})} = \exp\left(-\frac{d_{j,i}^{2}}{2\sigma^{2}}\right)$$

where $d_{j,i}$ is the distance between the winning neuron $i$ and its neighboring neuron $j$ in the output space, and $\sigma$ is the effective width, which is set to half of the SOM dimension herein. Thirdly, in the adaptive process, the synaptic weights of the SOM are adjusted according to the input $\mathbf{x}_k$ using the formula

$$\mathbf{w}_j(n+1) = \mathbf{w}_j(n) + \eta(n)\, h_{j,i(\mathbf{x})} \left(\mathbf{x}_k - \mathbf{w}_j(n)\right)$$

where $\eta(n)$ is the learning rate at the learning step $n$ and $\mathbf{w}_j(n+1)$ is the synaptic weight of neuron $j$ at the learning step $n+1$. The learning rate shrinks with the learning step as $\eta(n) = \eta(0)\exp(-n/1000)$, in which $\eta(0)$ is the initial learning rate and is set to 1 herein. These three processes are repeated until the synaptic weights no longer change. As shown in Figure 2, during the SOM learning, the synaptic weights become ordered and gradually describe the distribution of the input data [57]. This property helps users to reveal the grouping of the input data.
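The three learning processes can be illustrated with a minimal sketch for a one-dimensional SOM such as the 1 × 4 map used later. Several details are assumptions rather than the authors' settings: the neighborhood function is taken to be the usual Gaussian of the output-space distance, the weights are initialized from randomly chosen inputs, and a fixed number of learning steps replaces the "until unchanged" stopping rule. The function name `train_som` and its parameters are hypothetical.

```python
import math
import random


def train_som(data, n_neurons=4, n_steps=2000, eta0=1.0, tau=1000.0, seed=0):
    """Train a one-dimensional SOM (1 x n_neurons) on equal-length vectors."""
    rng = random.Random(seed)
    sigma = n_neurons / 2.0  # effective width: half of the SOM dimension
    # Assumption: weights start from randomly chosen input vectors.
    weights = [list(rng.choice(data)) for _ in range(n_neurons)]
    for n in range(n_steps):  # assumption: fixed step count, no convergence test
        x = rng.choice(data)
        # Competitive process: the neuron whose weight is closest to x wins.
        winner = min(range(n_neurons), key=lambda j: math.dist(weights[j], x))
        eta = eta0 * math.exp(-n / tau)  # learning rate shrinks with the step n
        for j in range(n_neurons):
            # Cooperative process: Gaussian neighborhood around the winner.
            h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
            # Adaptive process: move the weight of neuron j toward x.
            weights[j] = [w + eta * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights
```

Because each update is a convex combination of the old weight and the input, the weights always remain inside the range spanned by the input data.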
A SOM-based cluster analysis technique is applied herein. When the SOM learning is complete, all original input data are fed into the well-trained SOM. If a neuron responds to a specific input, that neuron is the winner and is called the "image" of that input. In other words, the neuron is "imaged" by the specific input. The location of a winning neuron in the output space shows the topological location of the corresponding input in the input space. If two inputs are similar in the input space, their images will be close together in the output space. Finally, by labelling all of the winning neurons in the output space, the distribution of all input data is revealed. Hence, based on the results provided by the SOM-based cluster analysis technique, it is easy to objectively group input data into clusters. For more details about the SOM-based cluster analysis technique, please refer to Lin and Wang [62] and Lin and Wu [63].
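The labelling step can be sketched as follows, assuming the trained synaptic weights are available as a list of vectors. The names `winning_neuron` and `cluster_by_image`, the member labels, and the one-dimensional toy weights are hypothetical and serve only to show how inputs sharing an image end up in the same cluster.

```python
import math


def winning_neuron(weights, x):
    """Return the index of the neuron whose weight is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def cluster_by_image(weights, labelled_inputs):
    """Group input labels by their image (winning neuron) on the SOM."""
    clusters = {j: [] for j in range(len(weights))}
    for label, x in labelled_inputs.items():
        clusters[winning_neuron(weights, x)].append(label)
    return clusters


# Toy example: four neurons and four hypothetical members.
weights = [[0.0], [10.0], [20.0], [30.0]]
inputs = {"M01": [1.0], "M02": [2.0], "M10": [29.0], "M12": [31.0]}
clusters = cluster_by_image(weights, inputs)
# clusters == {0: ["M01", "M02"], 1: [], 2: [], 3: ["M10", "M12"]}
```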

Strategy for Effective Combination of Ensemble Numerical Weather Predictions
In this subsection, on the basis of the SOM-based cluster analysis technique, a novel strategy is proposed to effectively integrate the ensemble forecasts of TTFRI-EPS. The proposed ANN-based integration strategy is illustrated in Figure 3. Two steps are involved: the Past and the Future steps. Firstly, in the Past step, the ensemble forecasts from TTFRI-EPS and the observation during the recent past (the past 6 h herein) are all analyzed by the SOM-based cluster analysis technique, which reveals the grouping of the ensemble forecasts and the observation. This makes it possible to detect the ensemble forecasts that are grouped into the same cluster as the observation, that is, the forecasts that have the same "image" as the observation. Such forecasts have similar features to the observation. In other words, they captured the actual weather evolution during the recent past well. Hence, the members that provided the forecasts with features similar to the observation are selected. These selected members are regarded as reliable and are used in the following time interval. Secondly, in the Future step, the forecasts for the following 24 h provided by these selected members are adopted. By calculating the ensemble mean of these selected forecasts, the forecasted rainfall for the future time interval (i.e., the following 24 h) is obtained. In summary, the proposed integration strategy is based on the assumption that if a member captures the actual weather evolution well in the past, it is expected to perform better in the future. The proposed ANN-based integration strategy is a physically based, empirical, real-time integration strategy. Hence, by means of the ANN-based strategy, it is expected that the ensemble forecasts from TTFRI-EPS will be well integrated to provide improved forecasts. Note that the 6-h and 24-h forecasted rainfall are obtained by accumulating the hourly outputs of TTFRI-EPS.
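The Past and Future steps can be sketched as below, assuming the Past step has already produced a cluster index for every member and for the observation. The function name `integrate_forecasts`, the member labels, and the rainfall values are hypothetical. The fallback to all members when no member shares the observation's cluster is also an assumption, since the text does not describe that case.

```python
def integrate_forecasts(member_cluster, obs_cluster, future_forecasts):
    """Average the future forecasts of members clustered with the observation.

    member_cluster: member label -> cluster index from the Past step
    obs_cluster: cluster index of the observation from the Past step
    future_forecasts: member label -> list of 24-h rainfall values per gauge
    """
    # Past step: keep the members that share the observation's image.
    selected = [m for m, c in member_cluster.items() if c == obs_cluster]
    if not selected:
        # Assumption: fall back to the conventional mean of all members.
        selected = list(future_forecasts)
    # Future step: ensemble mean of the selected members at every gauge.
    n_gauges = len(next(iter(future_forecasts.values())))
    mean = [
        sum(future_forecasts[m][g] for m in selected) / len(selected)
        for g in range(n_gauges)
    ]
    return selected, mean


# Hypothetical example with three members and two gauges (rainfall in mm).
member_cluster = {"M01": 0, "M02": 3, "M03": 3}
future = {"M01": [10.0, 20.0], "M02": [100.0, 200.0], "M03": [120.0, 220.0]}
selected, mean = integrate_forecasts(member_cluster, 3, future)
# selected == ["M02", "M03"], mean == [110.0, 210.0]
```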

Study Cases
TTFRI-EPS was launched in 2010 and reached maturity in 2012. Therefore, five typhoons, Saola (2012), Kong-Rey (2013), Fung-Wong (2014), Soudelor (2015), and Megi (2016), which made landfall and seriously affected Taiwan in the most recent five years, are used herein. When these typhoons made landfall, almost all of Taiwan suffered heavy rainfall. Detailed information about these five typhoons is provided in Table 2. The rainfall periods listed in this table are given in Coordinated Universal Time (UTC). Among these five typhoons, Soudelor, which is classified as Category 5 on the Saffir-Simpson Hurricane Scale, is the strongest. The largest maximum 24-h rainfall (1042 mm) was also observed during Soudelor. The observed 24-h rainfall and typhoon tracks are presented in Figures 4 and 5, respectively. In Figure 4, observed rainfall data from about 750 gauges are used. The patterns of observed 24-h rainfall of the first three typhoons differ from one another. The main rainfall areas of Saola, Kong-Rey, and Fung-Wong are located in northeastern, southwestern, and southeastern Taiwan, respectively. As to Soudelor and Megi, the patterns of observed 24-h rainfall are similar because the tracks are approximately similar (see Figure 5). These two typhoons both moved from the southeast to the northwest and passed through Taiwan. Their main rainfall areas are located in northeastern and southwestern Taiwan. This phenomenon is due to the phase-lock effect, which means a close relation between the typhoon position and rainfall [54].
The corresponding ensemble forecasted 24-h rainfall provided by TTFRI-EPS is also collected. For Saola, Kong-Rey, Fung-Wong, Soudelor, and Megi, the ensemble forecasts initialized at 18 UTC on 31 July 2012, 12 UTC on 28 August 2013, 12 UTC on 20 September 2014, 06 UTC on 7 August 2015, and 12 UTC on 26 September 2016 are used, respectively. Using initialization times 6 h earlier ensures that all forecasts are available and avoids the numerical model spin-up issue. Additionally, it is worth noting that the ensemble forecasts are gridded data at a spatial resolution of 5 km (i.e., the outputs of the inner domain). Thus, the forecast at a given gauge is obtained from the grid point nearest to that gauge. These ensemble forecasts of 24-h rainfall are then integrated by the proposed ANN-based strategy.
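The nearest-grid lookup used to obtain a forecast at a gauge can be sketched as follows. This is only an illustration under the assumption that gauge and grid locations are given in a projected coordinate system (e.g., kilometres), so that the Euclidean distance is meaningful. The function name `nearest_grid_value` and the toy grid are hypothetical.

```python
import math


def nearest_grid_value(gauge_xy, grid_points, grid_values):
    """Return the forecast at the grid point closest to the gauge."""
    idx = min(
        range(len(grid_points)),
        key=lambda i: math.dist(grid_points[i], gauge_xy),
    )
    return grid_values[idx]


# Toy 2 x 2 grid with 5-km spacing and hypothetical 24-h rainfall (mm).
grid_points = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
grid_values = [10.0, 20.0, 30.0, 40.0]
value = nearest_grid_value((4.0, 1.0), grid_points, grid_values)
# value == 20.0, taken from the grid point at (5.0, 0.0)
```

For the full 5-km grid and about 750 gauges, a spatial index would be faster, but the exhaustive search keeps the sketch short.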

Results of the SOM-Based Cluster Analysis Technique
In this section, the SOM-based cluster analysis technique in analyzing the ensemble forecasts from TTFRI-EPS is examined. The ensemble forecasted 24-h rainfall of Typhoon Saola is taken as an example and a SOM with dimension 1 × 4 is adopted herein. Figure 6 shows the clustering results yielded by the SOM-based cluster analysis technique. The corresponding forecasted 24-h rainfall of each ensemble member is also provided in this figure. In Figure 6, the information about the members in each cluster is revealed. A total of 5, 8, 4, and 5 members are involved in Clusters I, II, III, and IV, respectively. Members that were grouped into the same cluster, such as M10 and M12 in Cluster IV, have similar rainfall forecasts with each other. On the contrary, members grouped into different clusters, such as M10 (Cluster IV) and M01 (Cluster I), have obviously different forecasts. It is also obvious that members with the smallest rainfall are grouped in Cluster I, which is far away from Cluster IV containing members with the largest rainfall. That is, the location of members in Figure 6 demonstrates the relative topological relationship of rainfall patterns. For clusters with larger topological distance, more different rainfall patterns will be observed. For example, the difference between M10 (Cluster IV) and M04 (Cluster III) is smaller than that between M10 (Cluster IV) and M02 (Cluster I).
Besides, as shown in Figure 6, four clusters of TTFRI-EPS ensemble members are obtained due to the use of the SOM with dimension 1 × 4 herein. Higher dimensions are acceptable but not common. The dimension of SOM influences the clustering results. In general, a SOM with larger dimensions shows more details of the topological relationships of input data, but it is more difficult to determine the proper number of clusters than smaller ones. Therefore, in order to obtain a satisfactory clustering result, the dimension of SOM is chosen depending on the requirement of users. In this study, the clustering results will then be directly applied in the proposed ANN-based integration strategy. In view of the requirement (i.e., quick application) and the number of ensemble members in TTFRI-EPS, a SOM with smaller dimension is suggested herein. Hence, the SOM with dimension 1 × 4 is finally adopted in this study. Additionally, further study on investigating the model configurations of members in each cluster is required in future research for gaining more knowledge about TTFRI-EPS.

Results and Discussion
In this section, the potential of the ensemble mean corresponding to each cluster obtained by the SOM-based cluster analysis technique is assessed. The performance of the forecasts provided by the proposed strategy (i.e., the ensemble mean corresponding to the forecasts in the appropriate cluster) is also evaluated. In order to reach just conclusions, the conventional strategy (i.e., the ensemble mean of all forecasts) is also used herein for comparison with the proposed strategy.

Potential of the Ensemble Mean of Each Cluster
In this subsection, the performance of the ensemble mean corresponding to each cluster obtained by the SOM-based cluster analysis technique is assessed. The ensemble mean of all members (i.e., the conventional strategy) is also used herein as the benchmark. Figure 7 shows the ensemble mean of each cluster, as well as the ensemble mean of all members. The observation is also provided for comparison. For Typhoon Kong-Rey, the ensemble means of the clusters differ from each other. For some typhoons, such as Fung-Wong and Megi, the ensemble means of the clusters are alike. This indicates that the ensemble forecasts can be diverse (Kong-Rey) or similar (Fung-Wong and Megi), depending on the typhoon. Additionally, it appears that the ensemble mean of a certain cluster is more similar to the observation than that of all of the members. That is, the ensemble mean of the forecasts in a certain cluster performs better than that of all forecasts.
To quantitatively evaluate the performance of the ensemble means shown in Figure 7, four measures that are commonly used in hydrology are employed herein. Firstly, the coefficient of correlation (CC) is used to assess the correlation between the observed and forecasted rainfall. A higher CC value means better correlation. The CC is written as:

$$\mathrm{CC} = \frac{\sum_{n=1}^{N} \left(R_n - \bar{R}\right)\left(\hat{R}_n - \bar{\hat{R}}\right)}{\sqrt{\sum_{n=1}^{N} \left(R_n - \bar{R}\right)^{2} \sum_{n=1}^{N} \left(\hat{R}_n - \bar{\hat{R}}\right)^{2}}}$$

where $R_n$ and $\hat{R}_n$ are the observed and forecasted 24-h rainfall at a certain gauge $n$, respectively, $\bar{R}$ and $\bar{\hat{R}}$ are the averages of the observed and forecasted 24-h rainfall, respectively, and $N$ is the total number of gauges. Secondly, the root mean square error (RMSE) is used to measure the error between the observed and forecasted 24-h rainfall and is written as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(\hat{R}_n - R_n\right)^{2}}$$

Thirdly, the absolute percentage error of volume (AEV) is used to evaluate the error between the total volume of observed and forecasted 24-h rainfall and is defined as:

$$\mathrm{AEV} = \mathrm{abs}\left(\frac{\sum_{n=1}^{N} \hat{R}_n - \sum_{n=1}^{N} R_n}{\sum_{n=1}^{N} R_n}\right) \times 100\%$$

where abs( ) means the absolute value. Fourthly, the absolute percentage error of peak (AEP) is used to evaluate the error between the maximum observed and forecasted 24-h rainfall. The AEP is defined as:

$$\mathrm{AEP} = \mathrm{abs}\left(\frac{\max(\hat{R}) - \max(R)}{\max(R)}\right) \times 100\%$$

where max(R) means the peak value of 24-h rainfall. With these four measures, a fair conclusion is expected to be reached. Moreover, it is worth noting that these measures are calculated based on the rainfall data of all gauges. The observed rainfall data are collected from 750 gauges in Taiwan. The forecasted rainfall at a given gauge is obtained from the 5-km resolution gridded data (using the grid point nearest to that gauge). That is, these measures present the forecasting performance for point forecasts at each gauge.
Performance measures of the ensemble mean corresponding to each cluster (black dash) and to all of the members (red dot) are presented in Figure 8. As shown in Figure 8a, for most of the five typhoons, the ensemble mean of a certain cluster has a higher CC value with the observation than the ensemble mean of all of the members does. That is, the forecasting performance of the ensemble mean of a certain cluster is better than that of all the members. Similar results are also observed in Figure 8b-d for the other three measures. Hence, the results in Figure 8 confirm the better potential of using the ensemble mean of a certain cluster instead of the ensemble mean of all members. It is concluded that if the ensemble mean of an appropriate cluster is adopted, an improved forecasting performance will be obtained. That is, improved forecasts are obtained by using the SOM-based cluster analysis technique to combine ensemble forecasts, rather than simply averaging all ensemble forecasts together. In the following subsection, how to determine the appropriate cluster in advance is discussed.
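The four measures can be computed with a short sketch such as the one below. It assumes the usual definitions of CC, RMSE, and the absolute percentage errors of volume and peak, with observed and forecasted 24-h rainfall given as equal-length lists ordered by gauge. The function names and the three-gauge example values are hypothetical.

```python
import math


def cc(obs, fc):
    """Coefficient of correlation between observed and forecasted rainfall."""
    mo, mf = sum(obs) / len(obs), sum(fc) / len(fc)
    cov = sum((o - mo) * (f - mf) for o, f in zip(obs, fc))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_f = sum((f - mf) ** 2 for f in fc)
    return cov / math.sqrt(var_o * var_f)


def rmse(obs, fc):
    """Root mean square error over all gauges."""
    return math.sqrt(sum((f - o) ** 2 for o, f in zip(obs, fc)) / len(obs))


def aev(obs, fc):
    """Absolute percentage error of total rainfall volume."""
    return abs(sum(fc) - sum(obs)) / sum(obs) * 100.0


def aep(obs, fc):
    """Absolute percentage error of the peak rainfall."""
    return abs(max(fc) - max(obs)) / max(obs) * 100.0


# Hypothetical 24-h rainfall (mm) at three gauges.
obs = [100.0, 200.0, 300.0]
fc = [110.0, 190.0, 330.0]
scores = {"CC": cc(obs, fc), "RMSE": rmse(obs, fc),
          "AEV": aev(obs, fc), "AEP": aep(obs, fc)}
```

In this example the forecast overestimates the total volume by 5% and the peak by 10%, while the CC remains close to 1 because the spatial pattern is preserved.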

Evaluation of the Performance of the Proposed ANN-Based Integration Strategy
In this subsection, the performance of the proposed ANN-based integration strategy is evaluated, and the determination of the appropriate cluster is also discussed. As shown in Figure 3, two steps, the Past and the Future steps, are involved in the proposed ANN-based integration strategy. In the Past step, the ensemble forecasts from TTFRI-EPS and the observation during the past 6 h are all analyzed by the SOM-based cluster analysis technique. Therefore, the cluster that involves the observation is obtained and is adopted as the appropriate cluster hereafter. The ensemble forecasts grouped into this appropriate cluster are also discovered. Grouping in the same cluster means these ensemble forecasts have similar rainfall patterns to the observation. That is, these forecasts well captured the actual weather evolution. Therefore, the ensemble members who provided these ensemble forecasts are generally regarded as reliable and are adopted in the following step. Then, in the Future step, the forecasts for the following 24 h provided by these selected members are used. The ensemble mean of this forecasted 24-h rainfall is calculated afterwards. Finally, the obtained ensemble mean from the proposed strategy is seen as the integrated typhoon rainfall forecasts for the following 24 h.
Performance measures of the rainfall forecasts from the proposed strategy (blue cross), as well as those from the conventional strategy (red dot), are presented in Figure 9. As shown in Figure 9a, the proposed strategy always yields higher CC values than the conventional strategy. As regards the other three measures in Figure 9b-d, the proposed strategy yields lower RMSE, AEV, and AEP values than the conventional strategy. That is, the forecasting performance of the proposed strategy (i.e., the ensemble mean of the members in the appropriate cluster) is better than that of the conventional strategy (i.e., the ensemble mean of all members). Hence, the results in Figure 9 confirm that improved forecasting performance is obtained through the proposed ANN-based integration strategy. It is concluded that by means of the proposed ANN-based integration strategy, the ensemble numerical weather predictions from TTFRI-EPS are effectively combined to yield better typhoon rainfall forecasts. Moreover, the improvement in the four performance measures due to the use of the proposed strategy instead of the conventional strategy is presented in Table 3.
For AEV, the mean values over the five typhoons for the conventional and the proposed strategies are 19.85 and 19.05, respectively. The error thus decreases by about 4.0% (i.e., (19.85 − 19.05)/19.85) due to the use of the proposed strategy, so the percentage of improvement in AEV is 4.0%. In the same way, the percentage of improvement in AEP is 4.2%. As for CC and RMSE, all rainfall data of the five typhoons are strung together to calculate the values. The resulting percentages of improvement in CC and RMSE due to the use of the proposed strategy are 3.5% and 4.3%, respectively. Hence, it is confirmed that the proposed strategy indeed provides improved 24-h rainfall forecasts, which are useful to support disaster warning systems and water resources management systems during typhoons. In future work, a post-processing procedure to integrate the members in the appropriate cluster, such as a non-equal weighting scheme, will be investigated to further improve the forecasting performance.

Table 3. Improvement due to the use of the proposed strategy instead of the conventional one.
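The arithmetic behind these percentages can be checked with a short sketch. The function names `pooled_cc_rmse` and `percent_improvement` are hypothetical. The improvement is assumed to be expressed relative to the conventional value, which reproduces the 4.0% reported for AEV; applying the same convention to CC (where higher is better) is an assumption.

```python
import numpy as np


def pooled_cc_rmse(obs_by_event, fcst_by_event):
    """CC and RMSE computed after stringing the data of all events together."""
    obs = np.concatenate([np.ravel(o) for o in obs_by_event]).astype(float)
    fcst = np.concatenate([np.ravel(f) for f in fcst_by_event]).astype(float)
    cc = float(np.corrcoef(obs, fcst)[0, 1])
    rmse = float(np.sqrt(np.mean((fcst - obs) ** 2)))
    return cc, rmse


def percent_improvement(conventional, proposed, higher_is_better=False):
    """Improvement of the proposed strategy, relative to the conventional value.

    For error measures (RMSE, AEV, AEP) a decrease counts as improvement;
    for CC, set higher_is_better=True so that an increase counts instead.
    """
    if higher_is_better:
        change = proposed - conventional
    else:
        change = conventional - proposed
    return 100.0 * change / abs(conventional)


# AEV means reported in the text: 19.85 (conventional) and 19.05 (proposed).
print(round(percent_improvement(19.85, 19.05), 1))  # 4.0
```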

In order to highlight the improvement in forecasting performance due to the use of the proposed strategy, detailed comparisons between the proposed and the conventional strategies are made for different levels of rainfall. The extreme rainfall data (the highest 10%, 20%, 30%, 40%, and 50%) are used herein to evaluate the potential of the proposed strategy for disaster warning. The comparison results are presented in Table 4. The proposed strategy clearly yields higher CC and lower RMSE values than the conventional strategy. The improvements in CC and RMSE become more significant as more extreme rainfall data are considered. Hence, based on the results in Table 4, it is again confirmed that the proposed strategy indeed provides improved 24-h rainfall forecasts, especially for extreme rainfall. Future study using more events to examine the proposed methodology will be required to reach sounder conclusions. Results based on more events would also be useful for further studies, such as the probabilistic quality of each cluster, or the variations among events in the model configurations of the members in the appropriate cluster.
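The extreme-rainfall comparison can be illustrated with the following sketch. It assumes that "the highest p%" refers to the samples with the largest observed rainfall and that CC and RMSE are then computed on that subset only; the function name `metrics_for_top_fraction` and the synthetic data are hypothetical.

```python
import numpy as np


def metrics_for_top_fraction(obs, fcst, fraction):
    """CC and RMSE restricted to the highest `fraction` of observed rainfall."""
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    # At least two samples are kept so that CC is defined.
    k = max(2, int(round(fraction * obs.size)))
    top = np.argsort(obs)[-k:]  # indices of the k largest observations
    o, f = obs[top], fcst[top]
    cc = float(np.corrcoef(o, f)[0, 1])
    rmse = float(np.sqrt(np.mean((f - o) ** 2)))
    return cc, rmse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.gamma(2.0, 10.0, size=500)              # synthetic observations
    fcst = obs + rng.normal(0.0, 5.0, size=obs.size)  # synthetic forecasts
    for fraction in (0.1, 0.2, 0.3, 0.4, 0.5):
        cc, rmse = metrics_for_top_fraction(obs, fcst, fraction)
        print(f"top {int(fraction * 100)}%: CC={cc:.3f}, RMSE={rmse:.2f}")
```

Running the function once for each strategy's forecasts gives the pairs of values that a table such as Table 4 compares. Note that CC is undefined when the selected observations or forecasts are constant.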

Summary and Conclusions
Accurate typhoon rainfall forecasts are always desired for water resources managers and disaster warning systems. In this study, the rainfall forecasts from an ensemble numerical weather prediction system in Taiwan (i.e., TTFRI-EPS) are used. To integrate these ensemble forecasts, an ANN-based integration strategy is proposed. Firstly, a SOM-based cluster analysis technique is applied to analyze the ensemble forecasts and the observation during the past 6 h. The ensemble forecasts grouped into the same cluster as the observation (i.e., the appropriate cluster) are identified, which means that these ensemble forecasts have rainfall patterns similar to the observation. Then, the forecasts for the following 24 h provided by the ensemble members in the appropriate cluster are used. The ensemble mean of this forecasted 24-h rainfall is finally adopted as the integrated typhoon rainfall forecast for the following 24 h. That is, the novelty of this study is the use of the SOM-based cluster analysis technique to post-process the TTFRI-EPS ensemble forecasts. The rainfall forecasts are obtained by using the SOM-based cluster analysis technique to combine these ensemble forecasts, rather than by simply averaging all ensemble forecasts together.
To clearly demonstrate the advantage of the proposed strategy, actual application is conducted during five typhoon events. Firstly, the results indicate that by means of the SOM-based cluster analysis technique, the ensemble 24-h rainfall forecasts from TTFRI-EPS are well categorized. The clustering result helps users quickly detect the features of all ensemble forecasts. Besides, the clustering results indicate that the members in the appropriate cluster vary from event to event. That somewhat explains why ensemble prediction systems, rather than a single deterministic prediction, are increasingly used to capture the uncertainty of weather predictions. However, this evidence is not yet solid because only five events are analyzed. More events are required to reach a sound conclusion. Secondly, it is confirmed that the ensemble mean of a certain cluster has better forecasting potential than the ensemble mean of all members. Moreover, the integrated 24-h typhoon rainfall forecasts resulting from the proposed strategy are more accurate than those from the conventional one (i.e., the ensemble mean of all members), especially for extreme rainfall. In conclusion, the proposed strategy effectively integrates the ensemble forecasts and indeed provides improved forecasts of 24-h typhoon rainfall. The improved rainfall forecasts are expected to be useful to support disaster warning systems and water resources management systems during typhoons.