A Power-Forecasting Method for Geographically Distributed PV Power Systems using Their Previous Datasets

: The number of photovoltaic (PV) power systems being installed worldwide has been increasing. This has resulted in maintenance of an adequate balance between demand and supply becoming a great concern for power system operators. Forecasting PV power outputs is a promising countermeasure that has been garnering signiﬁcant interest. Conventional methods for achieving this often use learning methods, such as neural networks and support vector regression. In contrast, this paper proposes a short-term power-forecasting method for geographically distributed PV systems that uses only their previous output power data. In the proposed method, ﬁrst, the ratio of the power generation output to the maximum power output value for each observation instance in a designated period for each PV system at a certain date and time is obtained. Then, the future geographical distribution of the ratio is predicted from the temporal change (motion) of the preceding distribution. Finally, the predicted ratio is reconverted into the power output to perform short-term power forecasting. The results of total PV output power prediction in the Kanto area of Japan indicate that the proposed method has an average mean absolute percentage error of 4.23% and root mean square error of 0.69 kW, which veriﬁes its e ﬃ cacy.


Introduction
In recent times, renewable energy has been attracting increasing attention owing to growing concern regarding global warming, preservation of the environment, and depletion of fossil fuels. In particular, the number of photovoltaic (PV) installations has increased remarkably in recent years owing to the low cost of PV modules, lack of carbon dioxide emission, and ease of installation of PV panels. According to the global status report [1], in 2017, the total solar power capacity worldwide was 402 GW, which exceeds the total power generation capacity of nuclear power stations (392 GW). PV solar energy is considered one of the main renewable energy power generation resources. However, the output of a PV system is unstable because it varies with weather conditions. Thus, if a large number of PV systems is connected to a power grid, it is difficult to maintain the demand and supply balance in the overall electric power system. In addition, the net electricity demand increases rapidly from daytime to evening, when the electricity demand peaks, because of the large number of PV installations; this phenomenon is called the duck curve problem [2]. This can be solved by appropriately operating conventional power plants to cope with any sudden change in the PV output power if the PV output power can be predicted with high accuracy. Accurate prediction can aid residential energy management.
The various methods used to forecast PV output can be broadly classified into two approaches: The indirect approach and the direct approach. The former approach involves estimating the solar irradiance, whereas the latter involves using historical solar irradiance and temperature data provided by meteorological agencies. Ohtake et al. [3] forecasted the global horizontal irradiance (GHI) using mesoscale model (MSM) data provided by the Japan Meteorological Agency (JMA). In other studies, artificial neural networks (ANNs) have been used. Mellit and Pavan [4] realized one day ahead prediction using a multilayer perceptron (MLP) to solve problems in which the necessary data cannot be obtained. Wang et al. [5] used historical data along with the statistical indicators of irradiance and ambient temperature to reduce difficulties and complexities by developing an input vector model. Marquez et al. [6] used a gamma test and a genetic algorithm to identify potentially useful input data before using an ANN. Thorey et al. [7] forecasted the solar radiation using THORPEX Interactive Grand Global Ensemble (TIGGE) and satellite observations from HelioClim-3. TIGGE was established to accelerate the study on high-impact weather forecasts, providing a dataset of ensemble forecasts from many meteorological centers [8]. HelioClim-3 is a solar radiation database obtained from satellites [9]. Thorey et al. [7] used both datasets to develop an aggregated forecast method.
Yamazaki et al. [10] estimated solar irradiance based on just-in-time modeling, which is a local modeling technique that makes nonlinear estimation easier. Jamaly et al. [11] and Yang et al. [12] proposed spatiotemporal solar irradiance forecasting methods that employ the Kriging method. Specifically, Jamaly et al. [11] made it possible to assume steady or varying cloud moving speed over time by using the anisotropic spatiotemporal Kriging method. On the other hand, Yang et al. [12] proposed two methods to estimate the threshold distance of the spatiotemporal irradiance and applied it to two forecasting models-the spacetime Kriging model and the vector autoregressive model. However, these studies cited above did not derive equations, neither did they provide parameters or models to convert the PV output power. The output power of a PV system is influenced by the tilt angle, azimuthal angle, surrounding environment, geographical factors, meteorological factors, and deterioration of each PV array. These parameters need to be considered to forecast the PV output power accurately. Liaoliao et al. [13] combined a number of previous models with a proposed model comprising cell and inverter models to forecast PV output from solar irradiance. Specifically, they first estimated PV array irradiance from GHI using combinations of decomposition and transposition models. Then, they applied the estimated irradiance to their proposed PV output forecasting model.
Learning methods, such as neural network, support vector regression (SVR), and support vector machine (SVM), have also been incorporated in several direct methods of PV output power forecasting. Such methods are called statistical methods by the International Energy Agency Photovoltaic Power Systems (IEA PVPS) [14]. Rubby et al. [15] utilized an SVR model in which they inputted the values for parameters such as the month, hour, humidity, temperature, pressure, wind speed, and diffused horizontal irradiance. Fonseca Jr et al. [16] used the grid point value mesoscale model (GPV-MSM), which forecasts temperature and humidity, provided by the JMA, as input to the SVR model. Pierro et al. [17] proposed a data driven-upscaling method in which a neural network is used to estimate the regional PV power output. The inputs to the neural network comprise clustered regional PV plant data and irradiance data derived from satellite, numerical weather prediction data, and clustering data. Liu et al. [18] used a backpropagation neural network (BPNN) that calculates the output layer error and the local error, which is the difference between the requested output and the actual output, and then weights each neuron to decrease the local error. Liu et al. [19] subsequently compared the BPNN model with the SVM model and selected the latter to forecast the PV output owing to its higher accuracy. Then, they constructed a sample set of "cumulative particle matter concentration-efficiency reduction" to consider the effect of dust deposition on PV panels under fog and haze weather conditions. Yang et al. [20] presented a hybrid method combining self-organizing map (SOM), learning vector quantization (LVQ), SVR, and fuzzy inference. They first classified historical PV output data using SOM and LVQ, then used the SVR to learn the pattern of historical data, and finally used fuzzy inference to forecast the PV output based on the forecast provided by the Taiwan Central Weather Bureau. Killinger et al. [21] proposed a spatiotemporal PV power generation forecasting method based on measured datasets provided by 45 PV systems. To improve the forecasting accuracy, they used parameters related to the module orientation for each system individually. Inman et al. [22] and Antonanzas et al. [23] reviewed the systems used to forecast PV output. Specifically, they summarized the forecasting methods, algorithms and learning methods used for forecasting, metrics, time horizon, forecast variable, and data period. Furthermore, they reported the most suitable methods and data for particular time horizons; the SVM, ANN, and GHI were found to be the most used methods for forecasting the PV output. Yang et al. [24] also reviewed history and trends of solar irradiance and PV power forecasting by using text mining.
We present a different type of forecasting method. Kito et al. [25] focused on the motion vector of clouds and calculated their mobility vector using a cross-correlation function from still images obtained from a satellite. To forecast the solar irradiance, they applied the vector to the irradiance forecasting model. Lorenz et al. [26] proposed a method for forecasting surface solar irradiance based on cloud index images and motion vector. To predict future cloud images, first, they calculate the motion vector of the cloud using two consecutive time images, while assuming that the cloud motion is spatially uniform in the two consecutive times. Then, they use the motion vector technique to predict the future cloud image. Finally, using the Heliosat method, future solar irradiance is forecasted. However, assuming that a cloud is spatially uniform at consecutive times is error-prone because of the change of structure of clouds owing to geographical factors, cloud disappearance and generation, and changes in wind velocity.
Chow et al. [27] proposed a forecasting method that uses variational optical flow-which can capture nonuniform motion-to overcome the above problems. In their proposed method, first, the feature points of the cloud are captured from two consecutive images based on an optical flow tracker. Then, the trajectories of the feature points are calculated. Finally, the future cloud image is output by the trajectories. Bernecker et al. [28] proposed a method that similarly forecast solar irradiance after forecasting cloud motion based on sky images. Their proposed method first calculates the velocity of the cloud, in pixels per minutes, and the direction of movement from two consecutive images. Then, the structure of the clouds is captured via cloud segmentation, and a preliminary forecast is obtained from the two sets of results. However, because the preliminary forecast is sparse, a Kalman filter, which can combine sparse predictions to obtain continuous predictions, is used to forecast successive cloud motion. Finally, solar irradiance is forecasted using the cloud forecast result. Chapter 3.1 in the IEA PVPS Report [14] reviews the cloud motion vector approach.
In order to directly forecast the PV output, our proposed method focuses on the geographical distribution of the normalized values (NVs) of the PV output power. In this work, we optimized the mesh size and interpolation of the blank meshes used by the method introduced in a previous report [29], thereby reducing the forecast errors. The forecasting method involves normalizing the values by the maximum output powers of the respective PV systems, which results in the differences in the rated power and tilt and azimuthal angles of the respective PV systems being cancelled. Second, the geographical distribution of the NVs is modified to align the geographical distribution of the square meshes; the average of the NVs of the PV systems in each mesh is used as a representative value of the mesh. Third, motion estimation is used to predict the movement of the geographical distribution of the representative values. Finally, the future representative values are reconverted to the future output power of the respective PV systems.

Normalization
This paper proposes a method to forecast the future output power of PV systems by estimating the motion of the geographical distribution based on the variation of their previous values. However, it is not appropriate to compare the raw values of the output power of respective PV systems, because each PV system differs in terms of the rated power and tilt and azimuthal angles. Various methods have employed normalization to deal with this issue [18]. The method proposed in this paper also normalizes the PV outputs; however, this normalization method is slightly different from the other methods. The NV can be determined using (1).
where P(t) is the actual output power at a specific time t, and P 2w (t) is the highest output power among the 14 data points at a specific time t in the immediate two weeks (14 days). Lonij et al. [30] employed a similar estimation technique to other normalization methods, using 80% of clear sky performance measurements taken at the same time of day on the previous 15 days in order to normalize the PV output. Figure 1 shows a conceptual graph of P 2w (t). The 14 light-green curves are the output power profiles on the immediate 14 days from 16 October to 30 October, and the red and black curves indicate P(t) and P 2w (t) at each time on the predicted day (31 October), respectively. A two-week term was employed for P 2w (t) to consider seasonality, such as changes in the solar altitude and surrounding environment, and to obtain sunny weather at least once for each time in a period of two weeks. This paper proposes a method to forecast the future output power of PV systems by estimating the motion of the geographical distribution based on the variation of their previous values. However, it is not appropriate to compare the raw values of the output power of respective PV systems, because each PV system differs in terms of the rated power and tilt and azimuthal angles. Various methods have employed normalization to deal with this issue [18]. The method proposed in this paper also normalizes the PV outputs; however, this normalization method is slightly different from the other methods. The NV can be determined using (1).
where P(t) is the actual output power at a specific time t, and is the highest output power among the 14 data points at a specific time t in the immediate two weeks (14 days). Lonij et al. [30] employed a similar estimation technique to other normalization methods, using 80% of clear sky performance measurements taken at the same time of day on the previous 15 days in order to normalize the PV output. Figure 1

Time [h]
PV output on the predicted day(31, October) Kondoh and Kameda [31] previously applied this normalization method to trouble detection by comparing neighboring PV systems. They found that the method can cancel the influence of the rated power and tilt and azimuthal angles on output power by direct solar irradiance, but it cannot cancel the influence on output power by indirect solar irradiance.

Motion Estimation
In this method, the geographical distribution of past NVs is assumed to move with time based on the movement of the clouds, and the future PV output is predicted by estimating the motion vectors of the NV distribution. To estimate the motion vectors of the NV distribution, this method uses motion estimation [32], which is a technique developed for image processing. The forecasting process based on motion estimation is as follows.
It is necessary to fix the randomly scattered PV systems to align the pixels in order to apply motion estimation. Figure 2 shows the locations of the monitored PV systems, indicated by respective circles, in the Kanto region of Japan; the tints of the circles indicate the NVs. These distributions are modified to the aligned distributions with meshes, which are distinguished in terms of the latitude and longitude. Figure 3 shows an example of the meshes at 12:00 on 26 September. The tints of the circles in Figure 3 indicate the representative values obtained as averages of the NVs in the respective meshes.  Assuming a uniform linear motion in a very short period, Kameda et al. [32] predicted the linear motion of the NVs. For an NV expressed as , , the NVs at times t−∆t and t can be expressed as (2): where is the location of the NV, , is the motion vector of the NV at location and time , and , is the velocity at time ∆ . In addition, , = , , , , where , and , are the latitude and longitude components, respectively, in which , is the velocity of the latitude direction and , is the velocity of the longitude direction. The data term and the regularization (spatial smoothing) term of motion distribution are respectively defined in (3) and (4): , , = |∇ , | |∇ , | , where ∇ = , , and ∂ is the partial differential operator. The regularization term represents the spatial change of the motion distribution function , . Using the weighted sum of these terms, we can define the energy function as follows: Here, is the weight of the regularization term. In this study, was set to 1.000 based on the results of preliminary experiments conducted by us. As this is a problem involving the minimization   Assuming a uniform linear motion in a very short period, Kameda et al. [32] predicted the linear motion of the NVs. For an NV expressed as , , the NVs at times t−∆t and t can be expressed as (2): where is the location of the NV, , is the motion vector of the NV at location and time , and , is the velocity at time ∆ . In addition, , = , , , , where , and , are the latitude and longitude components, respectively, in which , is the velocity of the latitude direction and , is the velocity of the longitude direction. The data term and the regularization (spatial smoothing) term of motion distribution are respectively defined in (3) and (4): , , = |∇ , | |∇ , | , where ∇ = , , and ∂ is the partial differential operator. The regularization term represents the spatial change of the motion distribution function , . Using the weighted sum of these terms, we can define the energy function as follows: Here, is the weight of the regularization term. In this study, was set to 1.000 based on the results of preliminary experiments conducted by us. As this is a problem involving the minimization Assuming a uniform linear motion in a very short period, Kameda et al. [32] predicted the linear motion of the NVs. For an NV expressed as f (x, t), the NVs at times t−∆t and t can be expressed as (2): where x is the location of the NV, u(x, t) is the motion vector of the NV at location x and time t, and u(x, t) is the velocity at time ∆t. In addition, u(x, t) = (u(x, t), v(x, t)) T , where u(x, t) and v(x, t) are the latitude and longitude components, respectively, in which u(x, t) is the velocity of the latitude direction and v(x, t) is the velocity of the longitude direction. The data term E D and the regularization (spatial smoothing) term E s of motion distribution are respectively defined in (3) and (4): where ∇ = ∂ x , ∂ y T , and ∂ is the partial differential operator. The regularization term E s represents the spatial change of the motion distribution function u(x, t). Using the weighted sum of these terms, we can define the energy function as follows: Here, λ is the weight of the regularization term. In this study, λ was set to 1.000 based on the results of preliminary experiments conducted by us. As this is a problem involving the minimization of J(u, t), the solution is expressed in (6) and (7), which are Euler-Lagrange equations: where ∇ T ∇ = ∂ x 2 + ∂ y 2 , which is a Laplacian function. The details of the solutions to these equations and the corresponding algorithm are given in [32]. Figure 4 presents a conceptual illustration of motion estimation. Using the motion estimation technique, the motion vector AB is first calculated using the NVs at locations A and B. The vector AB is then translated to a vector BC to predict the NV at location C. The NV at location C at t+∆t is equal to that at the location of A at t−∆t and B at t. For example, the future NV at location C at 12:00 is equal to the NV at location B at 11:30 and location A at 11:00. The future NV is then reconverted to future PV output power using (1).
where = , which is a Laplacian function. The details of the solutions to these equations and the corresponding algorithm are given in [32]. Figure 4 presents a conceptual illustration of motion estimation. Using the motion estimation technique, the motion vector AB is first calculated using the NVs at locations A and B. The vector AB is then translated to a vector BC to predict the NV at location C. The NV at location C at t+∆ is equal to that at the location of A at t−∆ and B at t. For example, the future NV at location C at 12:00 is equal to the NV at location B at 11:30 and location A at 11:00. The future NV is then reconverted to future PV output power using (1).

Dataset
The dataset used in this study contained actual PV output power with 30 min observation intervals (∆t), from August 2013 to July 2014, for approximately 5000 PV systems monitored in Japan. The proposed method used the actual output power data at time t and 30 min prior (t−∆t) to estimate the output power at 30 min ahead (t+∆t). This study focused on the Kanto region in Japan, which lies in latitude range 35.4-36.5° and longitude range 139.0-140.8°. The number of monitored PV systems was approximately 1200, and their total rated power was approximately 12 MW. When the size of the meshes was reduced to determine the optimal size, some meshes did not contain any PV systems, as shown in Figure 5a, wherein the red circles indicate the missing NVs. The motion estimation technique cannot be used if there are many missing values, such as in Figure 5a. To overcome this

Dataset
The dataset used in this study contained actual PV output power with 30 min observation intervals (∆t), from August 2013 to July 2014, for approximately 5000 PV systems monitored in Japan. The proposed method used the actual output power data at time t and 30 min prior (t−∆t) to estimate the output power at 30 min ahead (t+∆t). This study focused on the Kanto region in Japan, which lies in latitude range 35.4-36.5 • and longitude range 139.0-140.8 • . The number of monitored PV systems was approximately 1200, and their total rated power was approximately 12 MW. When the size of the meshes was reduced to determine the optimal size, some meshes did not contain any PV systems, as shown in Figure 5a, wherein the red circles indicate the missing NVs. The motion estimation technique cannot be used if there are many missing values, such as in Figure 5a. To overcome this problem, the scattered missing values were interpolated using the grid data function in MATLAB R2018a. Various interpolation methods exists; however, griddata was used for simple interpolation in this study. Figure 5b shows the interpolation results. Some values remain missing on the right side; however, the region corresponding to the missing values is largely on the ocean and along the edges. Therefore, the values of these regions do not affect the motion estimation.

Assesment Method
This method forecasts the output power 30 min ahead. To verify the accuracy, the mean absolute percentage error (MAPE) ((8)) and root mean square error (RMSE) ((9)) were used: where N is the number of PV systems, and P real i , P f orecast i , and P rate i are the actual PV output, forecasted PV output, and rated power of the i th PV system, respectively. Equations (10) and (11) were also used to evaluate the optimal mesh size: where M is the number of meshes, NV rep j is the actual representative value, which is the average of the output power of all the PV systems in the j th mesh, NV f orecast j is the forecasted NV in the j th mesh, and NV real i is the actual NV of the i th PV system. The two parameters were set to evaluate the rounding error in each mesh. The difference between MAPE rep and MAPE NV represents the error when some PV systems in a mesh are grouped, and a representative value is used to forecast using the motion estimation. However, the representative values have rounding errors because the representative value is only the average of the NVs, which differ for different PV systems. Optimizing the mesh size helps to decrease the variance in the NVs in one mesh and the rounding error. Equation (10) was used to calculate the error between the actual representative NVs and the forecasted NVs. Equation (11) was used to calculate the error between the actual NVs of the respective PV systems and the forecasted NVs. When the difference between these errors is small, the rounding errors are also small.

Optimization of Mesh Size
First, the result for 26 September 2013 is focused on in this section to optimize the size of the mesh. There was a typhoon at sea toward the east of the Kanto region on this day. Therefore, the weather changed drastically, and the PV output change was also drastic. As shown in Figure 2, the NVs for the Kanto region differed significantly from place to place on this day. As the NVs significantly differ even within a mesh, the optimum mesh size for this day is considered to be the optimum mesh sizes for other days as well. Figure 6a-e shows the MAPE rep and MAPE NV profiles with every 30 min forecast for several mesh sizes. Comparing Figure 6a-d, we find that, the lower the mesh width is, the closer are the two curves. These results show that the rounding errors can be decreased by reducing the width of the mesh up to 0.02 • (1.8 km) in both latitude and longitude. Comparing Figure 6d,e, it can be seen that there is little difference between them. To optimize the mesh size, the MAPE of the non-NVs (MAPE rep in Equation (10)) and the output powers of the respective PV systems (MAPE in Equation (8)) are considered next. Figure 7 shows the MAPE between the actual and forecasted output powers of the respective PV systems. The smaller the mesh is, the lower is the MAPE up to 0.02 • (1.8 km). However, the MAPE curve of 0.02 • (1.8 km) is lower than that of 0.01 • (0.9 km), particularly around noon, as shown in Figure 7. Furthermore, the daily average of the MAPE of 0.02 • (1.8 km) is 5.39% and that of 0.01 • (0.9 km) is 5.53%. Therefore, a mesh size of 0.02 • (1.8 km) is the most suitable for this method. Figure 8a shows the result of the output power forecast for a sunny day on 19 September 2013 in terms of the total output power of all the PV systems forecasted. The MAPEs calculated using (8) are lower than 2.5% on this day, averaging at 1.97%. These values indicate that the MAPEs are low when the day is sunny; this is expected because the output power of PV systems is known to follow a sinusoidal curve on a sunny day, in general. Figure 8b shows the result of the output power forecast for 8 October 2013. On this day, the weather was sunny in the morning and cloudy in the afternoon. The daily average of the MAPE was 5.86% on this day. Figure 8c shows the result for 23 September 2013; the weather was cloudy in the morning and rainy in the afternoon. The total output power after 11:00 was much lower than that on 8th October 2013. The daily average of the MAPE on the day was 5.40%. These results show that the proposed method can accurately forecast the total output power on cloudy days as well. Figure 8d shows the result for a rainy day on 6 June 2014, with less generation output. The daily average of the MAPE on the day was 2.51%. Even on a rainy day, the accuracy of the forecast is high; however, this is natural because the total output power on a rainy day is low compared with the total rated power in the entire Kanto region, and the prediction of such a low output power exhibits a low error in Equation (8). Table 1 lists the RMSE and average of daily MAPE for five days in Figure 8. The value in the bottom row represents the average of MAPE and RMSE for five days. These low values show the effectiveness of this study.   respective PV systems (MAPE in Equation (8)) are considered next. Figure 7 shows the MAPE between the actual and forecasted output powers of the respective PV systems. The smaller the mesh is, the lower is the MAPE up to 0.02° (1.8 km). However, the MAPE curve of 0.02° (1.8 km) is lower than that of 0.01° (0.9 km), particularly around noon, as shown in Figure 7. Furthermore, the daily average of the MAPE of 0.02° (1.8 km) is 5.39% and that of 0.01° (0.9 km) is 5.53%. Therefore, a mesh size of 0.02° (1.8 km) is the most suitable for this method. Figure 8a shows the result of the output power forecast for a sunny day on 19 September 2013 in terms of the total output power of all the PV systems forecasted. The MAPEs calculated using (8) are lower than 2.5% on this day, averaging at 1.97%. These values indicate that the MAPEs are low when the day is sunny; this is expected because the output power of PV systems is known to follow a sinusoidal curve on a sunny day, in general. Figure 8b shows the result of the output power forecast for 8 October 2013. On this day, the weather was sunny in the morning and cloudy in the afternoon. The daily average of the MAPE was 5.86% on this day. Figure 8c shows the result for 23 September 2013; the weather was cloudy in the morning and rainy in the afternoon. The total output power after 11:00 was much lower than that on 8th October 2013. The daily average of the MAPE on the day was 5.40%. These results show that the proposed method can accurately forecast the total output power on cloudy days as well. Figure 8d shows the result for a rainy day on 6 June 2014, with less generation output. The daily average of the MAPE on the day was 2.51%. Even on a rainy day, the accuracy of the forecast is high; however, this is natural because the total output power on a rainy day is low compared with the total rated power in the entire Kanto region, and the prediction of such a low output power exhibits a low error in Equation (8). Table 1 lists the RMSE and average of daily MAPE for five days in Figure 8. The value in the bottom row represents the average of MAPE and RMSE for five days. These low values show the effectiveness of this study.

Result and Comparison with Other Methods
To verify the accuracy of the proposed method, the RMSEs obtained using the forecast methods proposed by Zang et al. [33], Seyedmahmoudian et al. [34], and Zhou et al. [35] were compared to that obtained using the proposed method. Table 2 lists the RMSE values of the four methods. The RMSE average of the proposed method is 0.69 kW, which is much lower than those of the other four methods. However, the proposed method forecasts the output power 30 min ahead, whereas the methods by Zang et al. [33] and Seyedmahmoudian et al. [34] forecast one hour ahead. The shorter the forecast time horizon is, the better is the performance. However, despite the difference in the time horizons being only 30 min, the difference in the RMSEs is significant. This shows that the proposed method with the geographical distribution can be used to effectively forecast the output power.  To verify the accuracy of the proposed method, the RMSEs obtained using the forecast methods proposed by Zang et al. [33], Seyedmahmoudian et al. [34], and Zhou et al. [35] were compared to that obtained using the proposed method. Table 2 lists the RMSE values of the four methods. The RMSE average of the proposed method is 0.69 kW, which is much lower than those of the other four methods. However, the proposed method forecasts the output power 30 min ahead, whereas the methods by Zang et al. [33] and Seyedmahmoudian et al. [34] forecast one hour ahead. The shorter the forecast time horizon is, the better is the performance. However, despite the difference in the time horizons being only 30 min, the difference in the RMSEs is significant. This shows that the proposed method with the geographical distribution can be used to effectively forecast the output power.

Reasons for Error
Two reasons can be deduced for the errors observed in the proposed method. The first is the normalization of the output power. The normalization was done to compare the output powers of the PV systems, which have different tilt and azimuthal angles and rated power. However, the differences exhibited by the respective PV systems cannot be completely cancelled out by normalization. This is the reason for the error when normalization is performed.
The other reason is the motion estimation. Figure 8e shows the result for 26 September 2013, wherein the MAPE recorded the highest value of 9.77% at 13:00. Let us focus on this time to understand the reason for the error. Figure 9a-c show the distribution of the NVs from 12:00 to 13:00. Figure 10 shows the image of the motion vectors estimated for the forecast at 13:00. Figure 9 shows a wide area with high NVs moving from west to east in the western part of the Kanto region and from northwest to southeast in the northern part (indicated using a red square frame in Figure 10) of the Kanto region. By comparing the movements shown in Figure 10, we find that the directions of the movement of the actual NVs are largely consistent with the directions of the motion vectors. However, the directions of the motion vectors in the northern part of the Kanto region are estimated to be toward northeast and not southeast. This results in another error because of the difference between the directions of the actual and estimated movements of the NVs.

Conclusions
This paper proposed a direct method for forecasting the output power of geographically distributed PV systems using their previous dataset. Learning methods, such as SVM and NN, have often been used for forecasting PV output. The proposed method is different from these approaches, as it estimates the movement of the geographical distributions of the NVs of the output power. To forecast the output power, the previous output power is first normalized. Then, the geographical distributions of the NVs of the respective PV systems are modified to aligned meshes, and a motion estimation technique is applied to forecast the future movements of the NVs. Finally, the future NVs are reconverted to the future output power of the respective PV systems. The averages of the MAPE and RMSE for the proposed method were, respectively, 4.23% and 0.69 kW. The errors occur in the processes of normalization and averaging in meshes and motion estimation, but the low MAPE and RMSE verify the high accuracy of this method. Thus, we believe the proposed method can be used to effectively forecast the output power of geographically distributed PV systems.
The following further studies should be conducted in the future: 1. Correlation between the available amount of data of PV systems and prediction accuracy: All the data we could obtain were used to forecast with the best prediction accuracy in this study. However, we hypothesize that the prediction accuracy depends on the available data. Therefore, it is important to investigate their relationship and to determine the minimum amount of data with which forecasting can be done with a tolerable accuracy. 2. The proposed method only forecasts the output power at 30 min ahead, which is the same period as the sampling interval of the data used for the forecast processes. Forecasting other time horizons and evaluating the corresponding error should be performed. 3. In the motion estimation, only the two historical points of PV outputs were used while assuming uniform linear motion, as shown in Figure 4. However, there is a possibility that using more history points could improve prediction accuracy, for example, by capturing spiral motion.

Conclusions
This paper proposed a direct method for forecasting the output power of geographically distributed PV systems using their previous dataset. Learning methods, such as SVM and NN, have often been used for forecasting PV output. The proposed method is different from these approaches, as it estimates the movement of the geographical distributions of the NVs of the output power. To forecast the output power, the previous output power is first normalized. Then, the geographical distributions of the NVs of the respective PV systems are modified to aligned meshes, and a motion estimation technique is applied to forecast the future movements of the NVs. Finally, the future NVs are reconverted to the future output power of the respective PV systems. The averages of the MAPE and RMSE for the proposed method were, respectively, 4.23% and 0.69 kW. The errors occur in the processes of normalization and averaging in meshes and motion estimation, but the low MAPE and RMSE verify the high accuracy of this method. Thus, we believe the proposed method can be used to effectively forecast the output power of geographically distributed PV systems.
The following further studies should be conducted in the future: 1.
Correlation between the available amount of data of PV systems and prediction accuracy: All the data we could obtain were used to forecast with the best prediction accuracy in this study. However, we hypothesize that the prediction accuracy depends on the available data. Therefore, it is important to investigate their relationship and to determine the minimum amount of data with which forecasting can be done with a tolerable accuracy.

2.
The proposed method only forecasts the output power at 30 min ahead, which is the same period as the sampling interval of the data used for the forecast processes. Forecasting other time horizons and evaluating the corresponding error should be performed.

3.
In the motion estimation, only the two historical points of PV outputs were used while assuming uniform linear motion, as shown in Figure 4. However, there is a possibility that using more history points could improve prediction accuracy, for example, by capturing spiral motion.