Regional Solar Irradiance Forecast for Kanto Region by Support Vector Regression Using Forecast of Meso-Ensemble Prediction System

From the perspective of stable operation of the power transmission system, the transmission system operator (TSO) needs to procure reserve adjustment power on the previous day based on solar power forecasts derived from global horizontal irradiance (GHI). Because the reserve adjustment power is determined from major outliers in past forecasts, reducing the maximum forecast error, in addition to improving the average forecast accuracy, is extremely important for grid operation. Researchers have proposed various methods combining numerical weather prediction (NWP) and machine learning techniques for one-day-ahead solar power forecasting, but the accuracy of NWP has been a bottleneck. In recent years, ensemble prediction system (EPS) forecasts based on probabilistic approaches have been developed to improve the accuracy of NWP, and in Japan, EPS forecasts in the mesoscale domain, called the mesoscale ensemble prediction system (MEPS), have been distributed by the Japan Meteorological Agency (JMA). Using EPS forecasts as input to a machine learning model is expected to reduce the maximum forecast error as well as improve the accuracy, since the predictor can utilize various weather scenarios as information. The purpose of this study is to examine the effect of EPS on GHI prediction and the structure of a machine learning model that can use EPS effectively. We constructed support vector regression (SVR)-based predictors with multiple network configurations using MEPS as input and evaluated the forecast error of the Kanto region GHI for each model. The comparison of the prediction results showed that a machine learning model using MEPS can improve the average accuracy while reducing the maximum prediction error, and it yielded knowledge on how to provide EPS information to the predictor effectively. In addition, machine learning was found to be useful in correcting the systematic error of MEPS.


Introduction
In the basic structure of the electricity market where the power generation sector and the transmission sector are separated, the transmission system operator (TSO) bears responsibility for adjusting, in real time, the deviation of supply and demand from the plan of the balancing group (BG). Therefore, the TSO is bound to compensate in real time for the imbalance caused by the error of the day-ahead forecast of the photovoltaic (PV) power supply. Hence, the cost of securing a power source that is not used in the actual supply and demand is always incurred as a countermeasure against the case range ensemble forecasts (SREF) in 2001 [21]; the United Kingdom Met Office (UKMO) started the Met Office Global and Regional Ensemble Prediction System (MOGREPS) in 2005 [22]; the Meteorological Service of Canada (MSC) started the Regional Ensemble Prediction System (REPS) in 2011 [23]; and Deutscher Wetterdienst (DWD) started the ensemble prediction system based on the Consortium for Small-scale Modeling for the German region (COSMO-DE-EPS) in 2012 [24]. For regional-scale forecasts in Japan, JMA started the operational forecast of the Meso-Ensemble Prediction System (MEPS), a regional EPS based on the MSM. MEPS generates 21 member forecasts by the singular vector (SV) method and provides forecasts up to 39 hours ahead. MEPS is adept at representing a variety of meteorological scenarios. Hence, by combining MEPS data with machine learning techniques, a more accurate and versatile day-ahead solar irradiance forecast system for Japan can be expected. In other words, when the machine learning model refers to MEPS data, the precision of one-day-ahead solar power forecasts can presumably be improved by correcting the bias in MEPS while suppressing the large forecast errors that concern the TSO and BGs.
Several works have applied EPS data to machine learning models for short-term prediction of renewable resources or weather variables, using EPS data from the European Centre for Medium-Range Weather Forecasts (ECMWF). C. Junk et al. applied ensemble model output statistics (EMOS) and analog-based EMOS to ECMWF's EPS data for wind power forecasting [25], S. Sperati et al. applied a neural network (NN) to ECMWF's EPS data for solar power forecasting [26], and S. Rasp and S. Lerch used an NN with ECMWF's EPS data for estimating air temperature at 2 m [27]. L. Massidda and M. Marrocu applied QR to each ECMWF EPS member and combined Integrated Forecasting System (IFS) data with their QR outputs [28]. However, to the best of our knowledge, there is little research on GHI forecasting based on machine learning approaches using EPS data apart from the works in References [26,28].
Note that previous research on machine learning-based solar power forecasting has used only ECMWF's EPS forecasts, and few studies have focused on the maximum prediction error despite its practical importance for power transmission systems. In addition, the effect of differences in the network configuration of machine learning models that process the multiple EPS members has not been sufficiently investigated. It is necessary to study configurations that enable machine learning models to utilize EPS forecasts effectively.
In order to improve the average accuracy and reduce the maximum prediction error by the machine learning model using EPS forecasts, and to verify the effective way of constructing the machine learning model using EPS predictions, this study constructed several network configurations of SVR-based predictors for regional GHI prediction in the Kanto region using MEPS forecasts as input, and verified the accuracy of each predictor by the ground observation data at meteorological stations.
By comparing the regional GHI predictions of each predictor, we discuss how to construct a machine learning model using EPS effectively. In addition, by comparing the prediction results with those of MSM and MEPS, we confirm that the machine learning model is effective in correcting the systematic error of GHI prediction by MEPS. Note that the MEPS forecast used in this paper incorporates a new JMA model for the calculation of physical processes, and its bias error is addressed in the discussion. Furthermore, in addition to improving the average accuracy of the predictor, we show that the maximum prediction error, which is of practical importance, can be reduced by the machine learning model using EPS.
The paper is organized as follows. Section 2 describes the data. Section 3 explains the fundamentals of our study: support vector regression and the evaluation methods. Section 4 presents our prediction models, and their precision is discussed in Section 5. Section 6 gives the conclusions.

Meso-Scale Ensemble Prediction System
Forecasts calculated from accurately estimated initial and boundary values are called deterministic forecasts, and their accuracy has been raised by improving observation systems and data assimilation technology, improving NWP models, and introducing high-performance computers. In Japan, the JMA distributes MSM data as such a deterministic forecast.
The JMA works to improve the MSM for the prevention of disasters such as intense rainstorms. However, because the chaotic behavior of the atmosphere can amplify errors in the initial variables and boundary conditions, the intrinsic predictability of deterministic forecasts is limited. Even though the practical predictability can be extended by improving NWP models, it is necessary to grasp their reliability objectively through a probabilistic approach.
Ensemble prediction makes it possible to obtain the stochastic characteristics of weather forecasts from multiple predictions by members created by perturbing the numerical conditions. Utilizing the information in the ensemble members' behavior enables efficient risk management. In addition, their expected value can improve the precision of the forecast.
For the examination of a regional EPS, JMA started the development of the mesoscale SV in 2005. After a number of comparative tests, MEPS was developed for implementation from 2012. Along with the operation of the 10th Generation Numerical Analysis Prediction System (NAPS10), the operational forecast of MEPS started in 2019 [29]. MEPS generates 21 ensemble members by perturbing the initial and lateral boundary conditions of the control member, which is equal to MSM-GPV; the perturbations are given by the meso SV and the global SV while ensuring their consistency. MEPS forecasts are run at 00, 06, 12, and 18 UTC and extend up to 39 h ahead. The spatial resolution of MEPS is 5 km. GHI forecasts from MEPS are averaged over the previous hour, and the ground measurements used for comparison are converted to previous-hour averages to match the MEPS GHI. Note that the MEPS forecasts used in our work adopt the asuca model, which improves the dynamical processes of the conventional JMA Non-Hydrostatic Model (JMA-NHM) (see Reference [30] for details).
GHI forecasts provided from MEPS can be useful for solar power forecasts; however, MEPS is coordinated for windstorm disaster prevention, and the control-run of the MEPS has systematic errors due to inconsistencies in cloud cover calculations between cloud microphysical processes and radiative and boundary layer processes.
Machine learning techniques are expected to be useful for correcting such biases caused by sub-grid scale physics. By combining machine learning techniques with MEPS, it should be possible to build a one-day-ahead solar power forecasting model that can respond to various situations while correcting for systematic errors in MEPS.

Input Data
In our work, we aim to verify the improvement over the global horizontal irradiance (GHI) forecast of MSM or MEPS through the prediction of the regional GHI averaged over five points in the Kanto region (Maebashi, Tsukuba, Utsunomiya, Tokyo, Choshi) at 1-h intervals from 6 June 2018 to 6 October 2018.
The explanatory variables of our prediction models consist of weather forecasts and the theoretical total solar irradiation intensity: MSM-GPV forecast data (surface temperature and relative humidity, and high/middle/low level cloud covers), the MEPS GHI forecasts obtained from the 21 ensemble members, and the calculated solar insolation at the top of the atmosphere on a horizontal surface [31]. For the one-day-ahead forecast, we adopt the forecast data delivered at JST 15:00 on the day before the prediction target date. We write the MSM dataset excluding the GHI forecast, and the GHI from MEPS, at each site as $\mathrm{MSM}_s(h)$ and $\mathrm{GHI}_s(h)$, and denote them as follows:

$$\mathrm{MSM}_s(h) = \{T_s(h),\ RH_s(h),\ HC_s(h),\ MC_s(h),\ LC_s(h)\},$$

$$\mathrm{GHI}_s(h) = \{GHI_s^{0}(h),\ GHI_s^{1}(h),\ \ldots,\ GHI_s^{N_m}(h)\},$$

where $s = 1, 2, \ldots, N_s$ and $m = 0, 1, \ldots, N_m$ are the indices of the sites and of the MEPS members, $h$ is the hour of the forecast date, $T_s(h)$ and $RH_s(h)$ are the forecast values of the surface temperature and relative humidity, and $HC_s(h)$, $MC_s(h)$, and $LC_s(h)$ are the forecasts of the high/middle/low level cloud covers. In particular, the MSM GHI forecast corresponds to the control member of MEPS, which is tagged with the index $m = 0$. Each site's extra-atmospheric solar radiation on a horizontal surface is denoted by $E_s(h)$.
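As an illustration of how the extra-atmospheric solar radiation on a horizontal surface, E_s(h), could be evaluated, the following Python sketch uses a common textbook approximation. The solar constant value, Cooper's declination formula, the function name, and the use of local solar time are assumptions made here; the calculation actually used in this study follows Reference [31], and an hourly average would additionally need integration over the previous hour.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2; a commonly used value, assumed for this sketch


def extraterrestrial_horizontal(day_of_year, solar_hour, latitude_deg):
    """Instantaneous top-of-atmosphere irradiance on a horizontal surface (W/m^2)."""
    # Eccentricity correction for the varying Sun-Earth distance.
    e0 = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    # Cooper's approximation of the solar declination.
    decl = math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)
    lat = math.radians(latitude_deg)
    # Hour angle: 15 degrees per hour measured from local solar noon.
    omega = math.radians(15.0 * (solar_hour - 12.0))
    cos_zenith = (math.sin(lat) * math.sin(decl)
                  + math.cos(lat) * math.cos(decl) * math.cos(omega))
    # The irradiance is zero while the sun is below the horizon.
    return SOLAR_CONSTANT * e0 * max(0.0, cos_zenith)
```

For example, `extraterrestrial_horizontal(172, 12.0, 35.7)` gives the solar-noon value near the summer solstice at roughly the latitude of Tokyo.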
For the training and evaluation of the predictor, we use the ground-based solar radiation data observed by the JMA station at the target site in Figure 1 as the objective variable.

Support Vector Regression
SVR is a machine learning technique that applies support vector networks [32] to the regression problem. K. R. Müller et al. and H. Drucker et al. showed that SVR achieves excellent performance in time series forecasting [33,34]. When the time series data are given by $(x_i, y_i)$, $i = 1, 2, \ldots, n$, where $x_i$ is the explanatory variable and $y_i$ the objective variable, the basic idea of SVR is to estimate, by optimization, the regression function

$$f(x) = \sum_{i=1}^{n} \alpha_i K(x_i, x) + \beta,$$

where $K(x_p, x_q)$ is the kernel function, $\alpha_i$ is the weight parameter, and $\beta$ is the bias. SVR first projects the data into the feature space given by the non-linear map associated with the kernel function, and then estimates the line that minimizes the penalty of the loss function in the feature space by optimizing the dual problem. The standard SVR formulation adopts the RBF kernel and the $\varepsilon$-insensitive loss function:

$$K(x_p, x_q) = \exp\left(-\frac{\lVert x_p - x_q \rVert^2}{2\sigma^2}\right),$$

$$L_\varepsilon(y, f(x)) = \max\left(0,\ |y - f(x)| - \varepsilon\right),$$

where $\varepsilon$ corresponds to the size of the insensitive tube, and its optimal value depends on the noise level in the data. $\sigma^2$ is the variance of the Gaussian function used for data fitting. To construct a support vector machine that tunes the size of the $\varepsilon$-insensitive tube automatically, Schölkopf et al. developed ν-SVR [35]. With ν-SVR, the parameter $\varepsilon$ is adjusted automatically without assuming the noise level. The implementation method and kernel functions of SVR are detailed in Reference [36].
In our study, we applied the ν-SVR method to the machine learners of the system that performs the regional solar irradiance forecast. The LibSVM wrapper for R (the "e1071" package) is used for the implementation [37,38].
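To make the ingredients of SVR described in this section concrete, the following plain-Python sketch evaluates the RBF kernel, the ε-insensitive loss, and the regression function for given weights. It is only an illustration: the function names are hypothetical, the weights are assumed to be already known, and the training step (solving the dual problem, which this study delegates to LibSVM through the R "e1071" package) is not shown.

```python
import math


def rbf_kernel(xp, xq, sigma2):
    """Gaussian (RBF) kernel with variance sigma2."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xp, xq))
    return math.exp(-sq_dist / (2.0 * sigma2))


def eps_insensitive_loss(y, y_hat, eps):
    """Errors inside the tube of half-width eps incur no penalty."""
    return max(0.0, abs(y - y_hat) - eps)


def svr_predict(x, support_vectors, alphas, beta, sigma2):
    """Kernel expansion: sum_i alpha_i * K(x_i, x) + beta."""
    return sum(a * rbf_kernel(sv, x, sigma2)
               for a, sv in zip(alphas, support_vectors)) + beta
```

Only training points with a non-zero weight (the support vectors) contribute to the sum, which is why the prediction cost depends on their number rather than on the size of the training set.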

Index of Error
There are several ways to evaluate prediction results. Mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination ($R^2$) are frequently used to compare the precision of multiple forecasts:

$$\mathrm{MBE} = \frac{1}{n}\sum_{j=1}^{n} (\hat{y}_j - y_j), \qquad \mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n} |\hat{y}_j - y_j|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (\hat{y}_j - y_j)^2}, \qquad R^2 = 1 - \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2},$$

where $n$ is the number of samples, $y_j$ is the real value, $\hat{y}_j$ is the prediction value, and $\bar{y}$ is the mean of the real values. MAE evaluates the prediction error by the Manhattan distance, while RMSE evaluates it by the Euclidean distance. When the prediction is a constant, the median minimizes MAE and the mean minimizes RMSE. $R^2$ gives a performance index relative to the case where the mean is given as the prediction value. The characteristics of these indices are detailed in Reference [39].
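The four indices can be computed in a few lines. The sketch below is a minimal plain-Python illustration, assuming the error is defined as prediction minus observation (so that a positive MBE means overestimation); the function name and the dictionary output are choices made here, not part of the original implementation.

```python
import math


def error_indices(y_true, y_pred):
    """Return MBE, MAE, RMSE and R2 for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]  # prediction minus observation
    mbe = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 compares the model with a constant prediction equal to the observed mean.
    r2 = 1.0 - ss_res / ss_tot
    return {"MBE": mbe, "MAE": mae, "RMSE": rmse, "R2": r2}
```

Note that positive and negative errors cancel in MBE, so a forecast can have zero bias and still a large MAE or RMSE.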

Cross-Validation
In addition to measuring the performance score of the prediction result, the machine learning models must be trained and tested adequately for the evaluation. Ideally, the training dataset and the test dataset should be strictly separated when verifying the time-series forecast performance of machine learning models. However, if a sufficiently long dataset is not available, k-fold cross-validation (CV) is adopted for the verification: first, the dataset is split into k subgroups; second, the machine learner is trained on the data excluding one subgroup; third, the held-out subgroup is used to test the model's prediction; this process is repeated until the test results of all subgroups are acquired. In particular, if the number of folds k equals the number of samples n, the procedure is called leave-one-out CV.
In our work, 4-fold CV is adopted for the evaluation of the prediction results, and to eliminate the influence of the time trend, the data are randomly shuffled by date before the 4-fold CV. In addition, we evaluate the errors obtained from CV collectively: k-fold CV generally computes the score for each group and takes the mean of the scores, but we pool the prediction results of all samples without separating the groups. The configuration of our evaluation process is shown in Figure 2.
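The evaluation procedure can be sketched as follows. This is a minimal illustration under stated assumptions: whole days are shuffled and assigned to folds (our reading of the date-wise shuffle), the function names and the `fit` callback are hypothetical, and any learner that returns a prediction function could be plugged in.

```python
import random


def datewise_kfold(dates, k=4, seed=0):
    """Yield (train_dates, test_dates) after randomly shuffling whole days."""
    unique_dates = sorted(set(dates))
    rng = random.Random(seed)
    rng.shuffle(unique_dates)
    # Deal the shuffled days into k folds of nearly equal size.
    folds = [unique_dates[i::k] for i in range(k)]
    for i in range(k):
        test = set(folds[i])
        train = set(unique_dates) - test
        yield train, test


def pooled_cv_errors(samples, fit, k=4, seed=0):
    """samples: list of (date, x, y); fit(train_samples) returns a predict(x) callable.

    Errors of all folds are pooled into one list instead of averaging fold scores.
    """
    pooled = []
    all_dates = [d for d, _, _ in samples]
    for train_dates, test_dates in datewise_kfold(all_dates, k, seed):
        model = fit([s for s in samples if s[0] in train_dates])
        pooled.extend(model(x) - y for d, x, y in samples if d in test_dates)
    return pooled
```

The pooled error list can then be passed once to the error indices, so that every sample carries the same weight regardless of the size of the fold it fell into.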

Prediction Model
The accuracy of a regional solar power forecast based on machine learning models may depend on the strategy for obtaining the prediction value, that is, on the configuration of the prediction model. In Reference [40], J. Fonseca, Jr., et al. show that the strategy of obtaining the regional PV power forecast from local SVR forecasts (Strategy1) performs better than the strategy in which SVR directly produces the regional PV power forecast (Strategy2). For this reason, it is important to check the prediction results of the model using MEPS in multiple configurations.
MEPS has 21 forecast members, which are calculated individually. Hence, the predictor can be designed in various ways: one machine learner can handle all 21 forecasts collectively for the prediction, or multiple weak learners can each make an estimate from one MEPS member's forecast, with their outputs integrated at an ensemble node (Integrator: INT) for the final prediction value. In addition, there are two strategies for integrating the regional solar radiation over the target sites: either the machine learner directly trains on and predicts only the averaged GHI value, or each site's GHI is predicted individually by machine learners and the predictions are then averaged.
When weak learners are adopted to generate an ensemble prediction of the GHI for each site, disregarding the combination, (N m + 1) × N s predictions are obtained, and they have two axes: the index of the MEPS members m and the index of the sites s. To project them onto the s axis, there are two patterns for the location of the ensemble nodes. One arranges an ensemble node at each site, while the other arranges a single ensemble node at the end; that is, the integrators output either local GHI estimates or the regional GHI estimate. In addition, there are options for how the predictions are joined at the ensemble node.
In this study, we consider prediction models based on SVR machine learners, composed as shown in Figures 3-6 and Table 1. Each SVR model (with the exception of the integrator) uses the weather forecast data, the extra-atmospheric solar radiation on a horizontal surface, and their values lagged by one hour. In addition, standardization is applied as pre-processing for the training. Each case takes a different approach to predicting the regional GHI. Case1 obtains the regional GHI directly without splitting the data, while Case2 accumulates the predicted GHI of each location. Cases3-4 use the EPS forecast of each location to construct a weak learner for each member, and the integration method of the weak learners differs between the cases. Case3 passes the results of the weak learners to the integrators and accumulates the local GHI forecasts of each location to forecast the regional GHI. In contrast, Case4 calculates a regional GHI forecast for each ensemble member from the results of the weak learners and finally integrates them in the integrator. The mathematical representation of each case and the labels within the cases are described below.
In Case1 shown in Figure 3, only one machine learner is used for the prediction; all sites' data are input into it collectively, and the regional average GHI is predicted directly.
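In Case1a, the predictor uses MSM for the GHI forecast input; in Case1b, the predictor uses the data of all 21 MEPS members collectively. Their predictions are given by:

$$\text{Case1a:}\quad \hat{y}(h) = \mathrm{SVR}\left(GHI_1^{0}(h), \ldots, GHI_{N_s}^{0}(h),\ u_1(h), \ldots, u_{N_s}(h)\right),$$

$$\text{Case1b:}\quad \hat{y}(h) = \mathrm{SVR}\left(\mathrm{GHI}_1(h), \ldots, \mathrm{GHI}_{N_s}(h),\ u_1(h), \ldots, u_{N_s}(h)\right),$$

where $\hat{y}(h)$ is the regional GHI prediction and $u_s(h)$ collects the remaining explanatory variables of site $s$.
In Case2 shown in Figure 4, multiple machine learners predict the GHI of each site, and their outputs are averaged to get the regional average GHI forecast. In Case2a, the predictors use only the control member data (m = 0). In Case2b, the GHI forecasts of all 21 MEPS members at each site are put together into the machine learners, which predict each site's local GHI, and their outputs are averaged to obtain the regional GHI prediction:

$$\text{Case2a:}\quad \hat{y}(h) = \frac{1}{N_s}\sum_{s=1}^{N_s} \hat{y}_s^{0}(h), \qquad \text{Case2b:}\quad \hat{y}(h) = \frac{1}{N_s}\sum_{s=1}^{N_s} \hat{y}_s(h),$$

where $\hat{y}_s^{m}(h)$ is the local GHI prediction based on the m-th member's GHI forecast data from MEPS, and $\hat{y}_s(h)$ is the local GHI prediction based on all members together.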
In Case3 shown in Figure 5, the prediction model consists of weak learners and integrators in each subsystem; the weak learners individually try to improve the GHI prediction from each MEPS member's GHI forecast; the integrators output each site's GHI prediction, and the regional average GHI is obtained from them:

$$\hat{y}(h) = \frac{1}{N_s}\sum_{s=1}^{N_s} \mathrm{INT}_s\left(\hat{y}_s^{0}(h), \hat{y}_s^{1}(h), \ldots, \hat{y}_s^{N_m}(h)\right),$$

where $\mathrm{INT}_s(\cdot)$ is the function corresponding to the integrator of each site. The integrators predict the local GHI for each site. When SVR models are used in the ensemble nodes, they use only the weak learners' outputs.
In Case4 shown in Figure 6, the model consists of weak learners and an integrator as in Case3; the weak learners predict each site's GHI from each MEPS member's data, and their outputs are averaged over the sites to estimate the regional average GHI for each member; the integrator finally produces the regional average GHI prediction:

$$\hat{y}(h) = \mathrm{INT}\left(\frac{1}{N_s}\sum_{s=1}^{N_s} \hat{y}_s^{0}(h),\ \ldots,\ \frac{1}{N_s}\sum_{s=1}^{N_s} \hat{y}_s^{N_m}(h)\right),$$

where $\mathrm{INT}(\cdot)$ is the integrator placed in the last layer of the prediction model. It uses each MEPS member's regional GHI prediction values as the explanatory variables.
In Case3, we adopt the mean operation and SVR as the integration process. In Case4, however, we use only SVR as the integrator, because the result of the mean operation would be equal to that of the mean operation in Case3.
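The reason why a mean integrator need not be tested separately in Case4 is that averaging over members and averaging over sites commute. The following short Python sketch illustrates this with made-up numbers; the array layout `pred[m][s]` and the function names are assumptions introduced only for this illustration.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def case3_mean(pred):
    """pred[m][s]: weak-learner output for member m at site s.

    Integrate the members at each site first, then average the sites.
    """
    n_members, n_sites = len(pred), len(pred[0])
    local = [mean(pred[m][s] for m in range(n_members)) for s in range(n_sites)]
    return mean(local)


def case4_mean(pred):
    """Average the sites for each member first, then integrate the members."""
    regional_per_member = [mean(row) for row in pred]
    return mean(regional_per_member)


# Two members, three sites (illustrative values only).
example = [[100.0, 200.0, 300.0],
           [110.0, 190.0, 310.0]]
```

Both `case3_mean(example)` and `case4_mean(example)` return the same value up to rounding. With a non-linear integrator such as SVR, the two orders are no longer equivalent, which is why both configurations are examined.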
For comparison with the persistence model, which is often treated as the benchmark, we give the prediction value denoted by:

$$\hat{y}(h) = y(h - 24),$$

where $y(h)$ is the observed regional GHI at hour $h$. The persistence model simply outputs the GHI observed one day earlier as the forecast.

Figure 3. Schematic of the access of the MEPS's GHI forecasts in Case1. Case1a uses only the MSM's GHI (m = 0), and Case1b uses all MEPS members (m = 0, 1, . . . , 20). They directly predict the regional GHI with one SVR model.

Figure 6. Schematic of the access of the MEPS's GHI forecasts in Case4. This prediction model places the integrator ($\mathrm{INT}(\cdot)$) at the last layer. The outputs of the weak learners ($\mathrm{SVR}_s^m(\cdot, u_s)$) are gathered into the mean function for each MEPS ensemble member to obtain the regional GHI ensemble prediction, and then these forecasts enter the ensemble node.

Results and Discussion
Evaluation scores for Case1-4, the persistence model, the MSM forecast (MSM-GPV), and the MEPS ensemble mean are given in Table 2 and Figure 7. In comparison with the persistence, MSM, and MEPS forecasts, all cases improve the accuracy of the regional GHI forecast through the application of the machine learning model. In particular, Case2b gives the best performance, and the predictions that include MEPS perform better than Case1a, MSM-GPV, and the MEPS ensemble mean. The results of Case1b and Case2b show that the forecast model using MEPS forecast data scores better than the other model within the same approach (Case1a, Case2a). In addition, comparing Case1 with Case2 confirms that the strategy of averaging multiple local forecasts gives better performance. Among the prediction models using MEPS forecasts, the approach of Case3 is the best after Case2b. Case3 is also based on the accumulation of regional GHI forecasts. The regional GHI forecast that adopts MEPS ensemble members thus shows a trend similar to that of the regional PV forecast in Reference [40]. The systematic error of the NWP is not uniform among points, and it is thought that the systematic error caused by the topography of each point can be corrected by machine learning models. Therefore, it is inferred that dividing the model by subregion is effective in a forecasting model using EPS forecasts.
However, in contrast to the improvement in precision obtained by integrating local GHI forecasts, Case3-4, which refine the MEPS ensemble members through weak learners, perform worse than Case2b, whereas the MEPS ensemble members improve the forecast in Case1-2 and over MSM without weak learners. The scores of Case3-4 are almost equal to the score of Case2a. Each weak learner in Case3-4 uses one MEPS member's GHI forecast individually, and its configuration is the same as that of the predictor in Case2a. Constructing multiple SVR models simply by changing the choice of MEPS member, that is, a multi-model ensemble approach, does not contribute to improving the accuracy. The ensemble mean approach assumes that the ensemble members contain appropriate perturbations. When the MEPS GHI forecast is applied to the SVR directly, the weak learners' outputs lose their effectiveness because the spread of the probability distribution is lost. To refine the precision of the models in Case3-4, the diversity of the ensemble members needs to be maintained through post-processing such as EMOS and variance deficit (VD), as in Reference [26]. Hence, when no post-processing is used, it is better to input the EPS forecasts together into one model to construct an effective predictor.
MSM, which is the base model of MEPS, tends to underestimate the regional GHI. In previous research, Ohtake et al. (2015) analyzed the seasonal variations in the GHI forecast errors of the MSM-GPV from 2008 to 2012, and MSM had the tendency to underestimate GHI from −29.5 to 50.6 in summer (from June to November), regardless of the initialization times [41]. In 2008, JMA also reported that MSM underestimated GHI in summer during 2004 to 2007 [42]. Over the Kanto region only, Ohtake et al. (2013) investigated the relationship between relatively large GHI forecast errors and the frequency of the cloud types observed at JMA stations, and they conjectured that cirrus appeared in the underestimation cases [43]. Referring to the Köppen-Geiger climate classification [44], the Kanto region of Japan is classified as a humid subtropical climate (Cfa). In East Asia, the monsoon tends to carry humid air from low latitudes to high latitudes during the summer, resulting in high humidity in the Kanto region. In relation to upper clouds (e.g., cirrus) produced by humid air, the MSM does not consider ice supersaturation in the calculation of cloud cover diagnostics in radiative processes. This suggests that overestimation of upper clouds is one of the causes of the underestimation in summer. MEPS ensemble members are generated from the control run, which corresponds to the MSM. Hence, it can be inferred that the ensemble members of MEPS similarly tend to underestimate the GHI, although the dynamical process model of the control run has been changed from JMA-NHM to asuca. Because the machine learning model includes relative humidity and temperature as explanatory variables, the effect of the overestimation of upper cloud cover is considered to be corrected by learning in Case1-4 regardless of the differences in networks.
From the above, the following can be inferred about the average accuracy of each predictor in this study.

• If the predictors have the same network structure, the average accuracy of the predictors can be improved by adding EPS predictions to the explanatory variables.
• An approach that divides the predictor by sub-region and combines local GHI is effective for predicting regional GHI.
• When post-processing is not used for weak learners, it is more effective to use a single predictor to handle EPS forecasts rather than to construct a multistage predictor for each ensemble member.
• Regardless of the network structure, each predictor compensates for the systematic errors of the EPS.
In addition to the average accuracy of the forecast, confirming the maximum forecast error and the error distribution is particularly important from the practical standpoint of day-ahead solar power forecasting. To confirm the distribution of the forecast errors ($\hat{y}_k - y_k$), the quantiles of the error are given in Table 3, and their box plots are shown in Figures 8 and 9. Focusing on the maximum error (100% quantile), the persistence model, Case1a, Case2a, and Case3-4 give a larger maximum error than the MSM; that is, these prediction models hardly decrease the amount of regulating power that the TSO procures on the day before operation. In contrast, MEPS without machine learning, Case1b, and Case2b show a smaller maximum error (100% quantile) over the whole period, and they suppress the overestimation of the regional GHI during JST 10:00-15:00. This means that it is possible to reduce the overestimation of PV power generation based on the GHI forecast during the sunny hours when the power demand is high in summer. Notably, compared with the predictors using only MSM, the maximum error decreases significantly with the use of MEPS: Case1b reduces the 100% quantile error by 41.6% relative to Case1a, and Case2b reduces it by 36.8% relative to Case2a. Case1b and Case2b handle the MEPS ensemble member forecasts collectively; thus, the machine learning models presumably perform well in various situations by referring to the multiple behaviors of the MEPS ensemble members. From the above, by collectively introducing EPS forecasts into the explanatory variables of the machine learning model, the prediction model can reduce the maximum forecast error, and the bias in the NWP models can also be corrected.
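The quantities discussed here, error quantiles and the relative reduction of the maximum error, can be computed as in the following sketch. The linear-interpolation quantile rule and the function names are assumptions made for illustration; the exact quantile definition behind Table 3 is not stated in the text.

```python
def quantile(values, q):
    """Quantile for 0 <= q <= 1 using linear interpolation between order statistics."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is smaller than `baseline`."""
    return 100.0 * (baseline - improved) / baseline
```

Applied to the pooled errors of two predictors, `relative_reduction(quantile(errors_a, 1.0), quantile(errors_b, 1.0))` gives the kind of percentage reported above for the 100% quantile, where `errors_a` and `errors_b` are hypothetical lists of forecast errors.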

Conclusions
In this study, we applied several configurations of the machine learning model to the MEPS data, which is a regional EPS data of Japan, and evaluated the prediction errors in order to verify the improvement effect of the GHI forecast and the effective configuration method of the predictor using EPS. The main contributions of our paper are the following:

• We investigated how to construct a regional GHI predictor using regional EPS and how to give an effective network configuration.
• It was confirmed that systematic errors in MEPS forecasts can be improved by machine learning.
• By applying machine learning methods to the prediction of MEPS, we showed that it is possible to construct a predictor that reduces the maximum prediction error while correcting for systematic errors in MEPS.
Multiple configurations were considered for designing the predictors, and machine learning-based predictors utilizing the forecasts of the MEPS ensemble members improve the MBE, MAE, RMSE, and $R^2$ in almost all cases when compared with the predictor using only MSM. Furthermore, the maximum error of the machine learning model decreased when the MEPS forecasts were input into the predictor collectively; in other words, the overestimation by machine learning-based predictors can be suppressed by utilizing the MEPS data. However, the duration of our available MEPS data is limited, and the results may be affected by seasonal deviation (e.g., Ohtake et al., 2013, 2015). For a more accurate verification, the effect needs to be confirmed over a longer term: in spring, autumn, and winter, and over the whole year. In addition, while the comparison of the prediction results has provided knowledge on how to construct an effective network for machine learning models using EPS, future work should compare predictors with multi-stage configurations in which appropriate post-processing is applied to the weak learners' predictions. Moreover, as a first analysis, the machine learners used only the GHI from the MEPS forecasts. Hence, it is desirable to utilize various parameters of the MEPS forecasts in the machine learning models. To investigate further, we will extend the period of the available data and utilize multiple parameters of the MEPS ensemble members' forecasts.