k-Nearest Neighbor Neural Network Models for Very Short-Term Global Solar Irradiance Forecasting Based on Meteorological Data

This paper proposes a novel methodology for very short-term forecasting of hourly global solar irradiance (GSI). The methodology is based on meteorological data and is intended in particular for optimizing the operation of power systems that generate electricity from photovoltaic (PV) energy. It combines k-nearest neighbor (k-NN) modelling with an artificial neural network (ANN) model. The k-NN-ANN method is designed to forecast GSI 60 min ahead, based on meteorological data, for a target PV station that is surrounded by eight adjacent PV stations. The novelty of the method is that it takes the meteorological data into account. A set of GSI measurement samples from a PV station in Taiwan was used as test data. The method implements k-NN as a preprocessing technique prior to the ANN. The statistical error indicators of the k-NN-ANN model are a mean absolute bias error (MABE) of 42 W/m² and a root-mean-square error (RMSE) of 242 W/m². The model forecasts are compared with measured data, and the simulation results indicate that the k-NN-ANN-based model presented in this research can calculate hourly GSI with satisfactory accuracy.


Introduction
Nowadays, forecasting global solar irradiance (GSI) is an essential task, particularly given the increased use of photovoltaic (PV) solar energy as a power source. GSI can be forecast over different horizons: long-term, medium-term, and short-term. Since solar power is an intermittent energy source, forecasting is paramount for regulating electricity loads in power networks. It also serves to optimize power delivery and unit commitment and, by extension, helps minimize the operating costs of power systems [1]. With a forecast, plant operation control systems can be improved so as to balance power generation and load. Moreover, load distribution, electric energy storage, and energy supply can be optimized and made more reliable.
The performance of PV systems is heavily influenced by meteorological conditions such as temperature, global irradiation, humidity, wind speed and wind direction [2]. The relation is clear: the electrical energy generated by a PV system depends on the amount of GSI received by the PV panels. The solar irradiance absorbed by each PV panel varies with geographic location, time, and the absorption capacity of the panel. Previous studies have presented a variety of mathematical models for GSI forecasting in relation to meteorological variables. GSI forecasting with k-nearest neighbor (k-NN) statistical methods has been described in [3,4]: Hocaoğlu [3] and Pedro and Coimbra [4] presented modelling of solar irradiation with stochastic methods using a k-NN artificial neural network (ANN) at a PV station. Solar irradiation prediction is an important problem in geosciences, with direct applications in renewable energy, and data in the form of time series can also be analyzed using a regression model, as described in [5]. Salcedo-Sanz et al. [5] worked on the prediction of daily global irradiation using a temporal Gaussian process and explained the suitability of Gaussian process regression (GPR) for estimating solar irradiation compared with other machine learning regression algorithms. GSI forecasting is not limited to stochastic modelling: other studies [6,7] analyzed forecasting using exponential smoothing combined with decomposition methods and a least absolute shrinkage and selection operator model. Yang et al. [6,7] studied the forecasting of global horizontal irradiance by exponential smoothing using decompositions and, in another study, developed a least absolute shrinkage and selection operator model for very short-term irradiance forecasting. Combining forecasting models is important for obtaining better results, and GSI forecasting by spatiotemporal pattern recognition, ANN methods, parametric models and decomposition models has been described in [8-10]. Spatiotemporal pattern recognition and nonlinear principal component analysis (PCA) for global horizontal irradiance forecasting was proposed by Licciardi et al. [8]. Amrouche and Le Pivert [9] presented an ANN-based approach to daily local forecasting of global solar radiation, describing a novel methodology for local forecasting of daily global horizontal irradiance (GHI) that combines spatial modelling with an ANN algorithm. Wong et al. [10] presented solar radiation models based on parametric and decomposition models for predicting the average daily and hourly global, beam and diffuse radiation.
Obtaining GSI forecasts for a PV station is also the subject of several studies. In the references above, GSI forecasting was carried out without regard to the location of the PV station, whereas in [11,12] the forecasting was strongly influenced by the Mediterranean location studied. To achieve multi-horizon irradiation forecasting for Mediterranean locations, time series models were proposed by Paoli et al. [11]. Lorenz et al. [12] presented irradiance forecasting for the power prediction of grid-connected PV systems. Other studies similarly address grid-connected systems at different locations, as explained in [13,14]. Wang et al. [13] studied a short-term solar irradiance forecasting model based on an ANN using statistical feature parameters. Another study on GSI forecasting using an ANN was performed by Mellit and Pavan [14], who applied the prediction to a grid-connected PV plant located at Trieste, Italy. Statistical methods that detect the motion of cloud structures are widely used for forecasting surface irradiance; in [15], Hammer et al. proposed short-term forecasting of solar radiation using statistical methods to determine cloud motion vector fields. Xiao and Chaovalitwongse [16] presented optimization models for decomposed nearest neighbor feature selection. Validation of short- and medium-term operational solar radiation forecasts in the US was studied by Perez et al. [17]. Many models developed by other researchers forecast GSI with stochastic learning methods and with analytical time series models, as in [18,19]: forecasting of global and direct solar irradiance using stochastic learning methods, ground experiments and the National Weather Service (NWS) database was proposed by Marquez et al. [18], and Martin et al. [19] presented GSI forecasting methods based on time series analysis that were used to predict half-daily values of solar irradiance for the next three days. GSI forecasting models based on functional fuzzy approaches and spatiotemporal models have also been used, as described in [20,21]. Boata and Gravila [20] presented a functional fuzzy approach for forecasting daily global solar irradiation, and very short-term forecasting of global horizontal irradiance using a spatiotemporal autoregressive model was proposed by Dambreville et al. [21].
Energies 2017, 10, 186
In addition to the approaches in the previous references, which develop and combine forecasting models for short-term forecasts, [22-24] describe several further methods for forecasting GSI, namely support vector machines and exponential smoothing state space models. A review of solar irradiance forecasting methods and a proposition for small-scale insular grids was provided by Diagne et al. [22]. Short-term solar power prediction using a support vector machine was proposed by Zeng et al. [23]. Dong et al. [24] presented a short-term solar irradiance forecasting method using an exponential smoothing state space model. A comparative empirical study based on a short-term wind speed forecasting model was presented by Ren et al. [25]. Mellit et al. [26] and Farhad et al. [27] presented ANN models for the prediction of solar radiation and a new bloggers classification approach with a hybrid k-NN and ANN model. Probabilistic solar power forecasting approaches based on a k-NN kernel, and the selection of input parameters to model direct solar irradiance using an ANN, were proposed by Zhang and Wang [28] and López et al. [29]. The aforementioned studies elaborated on GSI forecasting at one target PV station, but have yet to forecast GSI at PV stations surrounded by other PV stations.
This paper presents part of a study in progress that seeks to estimate and predict global solar irradiation one hour (60 min) ahead at PV stations for energy production. This part focuses on how to predict hourly GSI for the target PV station from a local database and meteorological data. In this study, a new hybrid methodology that combines k-NN modelling with an ANN modelling algorithm has been developed. The k-NN-ANN method forecasts GSI at the target PV station by calculating the k-NNs based on the Euclidean distance and then training and testing the ANN on the resulting data.
The remainder of the paper is organized as follows: Section 2 describes the modelling and data from a PV station, Section 3 describes the methodology used, i.e., the k-NN and neural network models, while Section 4 presents modelling study cases for very short-term forecasting and their measured errors. Finally, Section 5 presents some concluding remarks.

Modelling and Data Description
The GSI measurements were performed continuously every 5 min for four hours. The dataset thus contains four hours of data (from 5:



k-Nearest Neighbor and Artificial Neural Network Model for Forecasting Global Solar Irradiance Based on Meteorological Data
This section explains the basic idea of the methodology constructed for GSI prediction, namely the k-NN-ANN model. In this study, the subject is the central station, which is surrounded by several other PV stations. The first purpose of this study is to improve forecasting results by combining the k-NN method with an ANN model; the process is then used to predict the GSI of a PV station one hour (60 min) ahead based on meteorological data.
Simulation of the k-NN neural networks can be programmed within a few minutes of recording the first measurements. For GSI forecasting, the k-NN-ANN method employs past meteorological data (GSI, temperature, humidity, wind speed and wind direction). The forecast horizon is four hours in 5 min increments. The first step in developing the k-NN-ANN method is to build the database of features that will be compared with the current conditions in order to forecast GSI a short time ahead.

k-Nearest-Neighbors
The k-NN method is one of the simplest machine learning algorithms. It is a non-parametric method used for classification and regression, and its output depends on which of the two it is used for. In k-NN classification, the output is a class membership: an object is classified by a majority vote of its neighbors and is assigned to the class most common among its k nearest neighbors. If k = 1, the object is simply assigned to the class of its single nearest neighbor. In k-NN regression, the output is the property value for the object, computed as the average of the values of its k nearest neighbors. The k-NN model classifies objects based on the learning data located closest to them, and the method is considered the simplest among comparable methods [2]. The main idea is that the k-NN algorithm uses a training set for data modelling; the prediction for a new point is then the average of the values of its k-NNs. The variables employed for modelling very short-term forecasting have been described in the previous section.
The k-NN prediction is computed from the features assembled in the matrices in a two-step process. In the first step, the pre-defined distance between the variables in the new dataset (the optimization or testing sets and the training sets) and the features in the previous dataset is calculated. For a given set of features $S = \{p_1, ..., p_n\}$ in the new dataset with lengths $N_1, ..., N_n$, the distances to the previous data are calculated. In the second step, the k-NNs are chosen as the k smallest distances from the training set [26]. The distance D is sorted in ascending order, and the first k elements $D_S$ ($D_{S,1} \le D_{S,2} \le ... \le D_{S,k}$) and their associated k time stamps $\{\tau_1, ..., \tau_k\}$ are extracted [2]. The k-NNs are found based on the Euclidean distance, Equation (1), where d is the number of forecast instances in the optimization set. The distance between two scenarios is calculated using a distance function $d(x, y)$, where x and y are the scenario matrices composed of N features, $x = \{x_1, ..., x_N\}$ and $y = \{y_1, ..., y_N\}$; N is the length of the data; $w_j$ is the weight (kernel function) of the dependent-variable members of the k-NN; and j is the order of the k-NN based on distance from the current condition, with the nearest having the lowest order (j = 1, ..., K).
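The two-step process above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the plain, unweighted Euclidean distance $d(x, y) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$ and a simple average of the neighbor values, and the function name `knn_regress` and the toy numbers are hypothetical.

```python
import numpy as np

def knn_regress(history_x, history_y, query, k):
    """Average the targets of the k historical rows closest to `query`."""
    history_x = np.asarray(history_x, dtype=float)
    history_y = np.asarray(history_y, dtype=float)
    query = np.asarray(query, dtype=float)
    # Step 1: Euclidean distance from the query to every historical feature vector
    dist = np.sqrt(((history_x - query) ** 2).sum(axis=1))
    # Step 2: indices of the k smallest distances, in ascending order
    nearest = np.argsort(dist, kind="stable")[:k]
    return history_y[nearest].mean(), nearest, dist[nearest]

# Toy data (made up for illustration): each row is [GSI, temperature]
hist_x = [[100.0, 20.0], [110.0, 21.0], [300.0, 25.0], [105.0, 20.5]]
hist_y = [120.0, 130.0, 320.0, 125.0]  # value observed one step later
estimate, idx, d = knn_regress(hist_x, hist_y, [104.0, 20.0], k=2)
print(estimate, idx, d)
```

In practice the features would usually be scaled before the distance is computed, since variables with large numeric ranges otherwise dominate the Euclidean distance.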

k-Nearest Neighbor Modelling
In this research, several parameters need to be specified to obtain the k-NN forecast with the algorithm described above. The k-NN modelling procedure for regression can be divided into five calculation stages:
(1) Scenario matrix stage. Form the d-dimensional feature vectors C or nxy from the historical data x: $c = c_1, c_2, ..., c_p$ and $nxy = nxy_1, nxy_2, ..., nxy_p$, or x: $C = nxy = [x_t, x_{t-1}, ..., x_{t-d+1}]$. Their corresponding successors are denoted $x_h$. The two points c and nxy are given in an n-dimensional vector space as $c(c_1, c_2, ..., c_n)$ and $nxy(nxy_1, nxy_2, ..., nxy_n)$.
(2) Distance vector stage. Form the n-dimensional distance vector $ds^{i}_{j}(c, nxy) = D_i$ for each testing vector C or nxy by calculating the Euclidean distance between it and the remaining vectors. Here, $D_i$ is the value of the dependent variable in the historical dataset and $D_j$ is the value of the dependent variable of the nearest neighbor based on distance; $ds^{i}_{j}(c, nxy)$ is the value of the dependent variable GSI; $nxy^{i}_{pj}$ is the position coordinate of the PV station (magnitude of the nearest neighbors); $c^{i}_{p}$ is the d-dimensional feature vector; the index j denotes the current condition; i and q denote the historical dataset; k = K is the number of elements in the nearest neighbors; x and y are scenarios composed of k features; and p is the order of the k-NN based on distance from the current condition (j), with the nearest having the lowest order (p = 1, ..., k).
(3) Neighbor selection stage. Sort $ds^{i}_{j}(c, nxy)$ in ascending order and select the first K entries as the nearest neighbors $D_k$, $k \in \{1, ..., K\}$.
(4) Choice of k and kernel stage. Select the best value of k used in the k-NN model: a high value of k reduces the effect of noise on the classification but makes the boundaries between classes increasingly blurred. Then form a kernel function, where $K(j) = k_i$ is the k-NN value (kernel function), i is the historical dataset, j is the order of the k-NN based on distance from the current condition (i), with the nearest having the lowest order (j = 1, ..., K), and K is the length of the data sets [25].
(5) Final estimation stage. Calculate the final estimate using Equation (5), where $x^{h}_{j}$ is the magnitude of nearest neighbor j; j is the order of the nearest neighbors based on their distance from the current condition (h), with the nearest having the lowest order (j = 1, ..., k); sumd is the kernel function; k is the length of the data sets; and $K_j$ is the kernel function.
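Stages (3) to (5) can be illustrated with a kernel-weighted average of the selected neighbors. The sketch below is written under a loud assumption: the kernel used here is an inverse-distance weight, chosen only for illustration because the kernel equation itself is not available in this text. The function name `kernel_weighted_estimate` and the parameter `eps` are likewise hypothetical.

```python
import numpy as np

def kernel_weighted_estimate(neighbor_values, neighbor_dist, eps=1e-9):
    """Weighted mean of neighbor values using an ASSUMED inverse-distance kernel."""
    values = np.asarray(neighbor_values, dtype=float)
    dist = np.asarray(neighbor_dist, dtype=float)
    kernel = 1.0 / (dist + eps)   # assumed kernel, not taken from the paper
    sumd = kernel.sum()           # normalizing sum of the kernel weights
    return float((kernel * values).sum() / sumd)

# Two neighbors at distances 1 and 3: the closer one gets three times the weight
est = kernel_weighted_estimate([120.0, 130.0], [1.0, 3.0])
print(est)
```

With equal distances the estimate reduces to the plain average of the neighbor values, which matches the unweighted k-NN regression described earlier.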

Artificial Neural Network
ANN is a mathematical method inspired by the structure and information processing of biological neural networks. ANNs are intelligent systems that have the capacity to learn, memorize and create relationships among data [30]. An ANN model combines pattern recognition, deductive reasoning and numerical computation to simulate learning in the human brain. An ANN consists of many interconnected units, namely neurons, and its main task is to process information using a connectionist approach to computation. ANN models have been used to predict solar radiation data [24]. The methodology is a promising alternative to traditional approaches for forecasting GSI, especially in cases where radiation measurements are not readily available. ANN models fundamentally comprise multiple connected neurons, or nodes. Neural networks are considered non-parametric techniques, which are commonly used for estimation and classification [25]. A neuron has five basic components, i.e., input, weight-bias, threshold, summing junction and output, as illustrated in Figure 3. Neurons are arranged in three layers: input, hidden and output.
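The neuron components and the three-layer arrangement described above can be sketched as a forward pass. This is only an illustration under assumptions: the tanh activation, the linear output layer, the layer sizes and all numeric values are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def neuron(inputs, weights, bias, activation=np.tanh):
    """Single neuron: summing junction (weighted inputs plus bias), then activation."""
    return activation(np.dot(weights, inputs) + bias)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> hidden layer (tanh) -> output layer (linear), for one sample."""
    hidden = np.tanh(w_hidden @ x + b_hidden)
    return w_out @ hidden + b_out

x = np.array([0.5, -0.2, 0.1])   # three made-up input features
w_hidden = np.zeros((4, 3))      # 4 hidden neurons, 3 inputs each
b_hidden = np.zeros(4)
w_out = np.ones((1, 4))          # 1 output neuron
b_out = np.array([0.25])
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)
```

With all hidden weights set to zero, the hidden activations vanish and the output equals the output bias, which makes the example easy to check by hand.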


k-Nearest Neighbor and Artificial Neural Network Modelling
The forecasting problem in this research, which is part of a study in progress, is to forecast GSI at a PV station for energy production one hour (60 min) ahead from meteorological data using the hybrid k-NN-ANN model. The procedure of the k-NN-ANN model for GSI forecasting based on meteorological data has the following steps:
Step 1: Calculate the distance of each data parameter and the corresponding weight distance (through Equation (13)), where:
$ds^{GI}_{j}(c, nxy)$ is the Euclidean distance for GSI;
$ds^{Ta}_{j}(c, nxy)$ is the Euclidean distance for temperature;
$ds^{Ho}_{j}(c, nxy)$ is the Euclidean distance for humidity;
$ds^{Ws}_{j}(c, nxy)$ is the Euclidean distance for wind speed;
$ds^{Wd}_{j}(c, nxy)$ is the Euclidean distance for wind direction;
$GI_x$, $Ta_x$, $Ho_x$, $Ws_x$ and $Wd_x$ are the weight distances for GSI, temperature, humidity, wind speed and wind direction, respectively;
GI is the global irradiance, Ta is the temperature and Ho is the humidity;
i and q denote the historical dataset and j the current condition;
k = K is the number of elements in the nearest neighbors;
x and y are scenarios composed of k features; and p is the order of the k-NN based on distance from the current condition (j), with the nearest having the lowest order (p = 1, ..., k).
Step 2: Calculate the number of nearest neighbors, where k is the nearest neighbor distance, i is the historical dataset and n is the total number of features (i = 1, 2, ..., n).
Step 3: Calculate the final estimate, where $x^{h}_{j}$ is the magnitude of k-NN j; j is the order of the nearest neighbors based on distance (h), with the nearest having the lowest order (j = 1, ..., k); sumd is the kernel function; k is the length of the data sets; and $K_j$ is the kernel function.
Step 4: Train and test the data with the ANN method as follows:
a. The final GSI estimates from the k-NN method are divided into training, validation and test sets for the ANN model.
b. The model architecture and training parameters are selected (create and configure the neural network, initialize the layer weights and biases).
c. The GSI forecast model is run using the training set.
d. The forecast model is validated using the validation set.
e. Steps b-d are repeated using different model architectures and training parameters.
f. The optimum model is chosen and trained again using data from the training set and validation set.
g. The final forecasting model is assessed using the test set.
The forecast model designed for k-NN is described in Figure 5a. If the validation test for forecasting the GSI is successful and yields a better result, the forecast model can perform its designed function; otherwise one or more changes should be made in the preceding process. The procedures are shown in Figure 5b. The GSI forecasting process using the hybrid k-NN-ANN algorithm can also be divided into two processes: (1) k-NN modelling to determine the d-dimensional features and n-dimensional distances used as input data for the ANN process; and (2) ANN modelling for GSI forecasting. The procedures are displayed in Figure 5c. In this research, the k-NN and ANN structures were programmed in MATLAB (R2013a), which provides some ANN tools.
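Sub-steps a to g of Step 4 can be sketched as a small model-selection loop. The code below is a hedged illustration in Python rather than the MATLAB tooling used in the paper: the one-hidden-layer tanh network, the batch gradient descent update, the 70/15/15 chronological split, the candidate hidden-layer sizes and the synthetic data are all assumptions made for the example, and every function name is hypothetical.

```python
import numpy as np

def split(x, y, train=0.7, val=0.15):
    """Step a: chronological split into training, validation and test sets."""
    n = len(x)
    a, b = int(n * train), int(n * (train + val))
    return (x[:a], y[:a]), (x[a:b], y[a:b]), (x[b:], y[b:])

def train_mlp(x, y, hidden, epochs=500, lr=0.1, seed=0):
    """Steps b-c: one-hidden-layer tanh network fitted by batch gradient descent."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0, 0.5, (x.shape[1], hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)
        err = (h @ w2 + b2) - y[:, None]        # prediction error per sample
        g_h = (err @ w2.T) * (1 - h ** 2)       # error backpropagated to hidden layer
        w2 -= lr * h.T @ err / len(x); b2 -= lr * err.mean(axis=0)
        w1 -= lr * x.T @ g_h / len(x); b1 -= lr * g_h.mean(axis=0)
    return lambda q: (np.tanh(q @ w1 + b1) @ w2 + b2).ravel()

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def select_model(x, y, hidden_sizes=(2, 4, 8)):
    (xt, yt), (xv, yv), (xs, ys) = split(x, y)
    # Steps d-e: score each candidate architecture on the validation set
    scores = {h: rmse(train_mlp(xt, yt, h)(xv), yv) for h in hidden_sizes}
    best = min(scores, key=scores.get)
    # Step f: retrain the chosen architecture on training + validation data
    model = train_mlp(np.vstack([xt, xv]), np.concatenate([yt, yv]), best)
    # Step g: final assessment on the held-out test set
    return best, rmse(model(xs), ys)

# Synthetic, already-normalized data standing in for the k-NN estimates
rng = np.random.default_rng(1)
x = rng.random((60, 2))
y = 0.6 * x[:, 0] + 0.3 * x[:, 1]
best, test_rmse = select_model(x, y)
print(best, test_rmse)
```

Keeping the test set out of both training and architecture selection is what makes the final error in step g an honest estimate of forecast accuracy.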


Normalization
The normalization of input data is very important for obtaining good results with the ANN method [9]. In this research, the data are normalized for the training process of the 60 min ahead GSI forecast, as defined in [31], using Equation (18). Let $GSI^{n}_{dnorm}$ and $GSI^{n}_{d}$ denote the normalized target feature at frame index n and the d-GSI output, respectively, and let $GSI_{max}$ and $GSI_{min}$ denote the maximum and minimum values of the GSI, respectively.
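A common way to normalize with these quantities is min-max scaling, sketched below. This is an assumption for illustration: the exact form of Equation (18) is not available in this text, so the standard mapping $(GSI - GSI_{min}) / (GSI_{max} - GSI_{min})$ onto the interval [0, 1] is used here, and the function names are hypothetical.

```python
import numpy as np

def normalize(gsi, gsi_min=None, gsi_max=None):
    """Min-max scaling to [0, 1]; an assumed form, not necessarily Equation (18)."""
    gsi = np.asarray(gsi, dtype=float)
    lo = gsi.min() if gsi_min is None else gsi_min
    hi = gsi.max() if gsi_max is None else gsi_max
    return (gsi - lo) / (hi - lo)

def denormalize(gsi_norm, gsi_min, gsi_max):
    """Invert the scaling to recover values in W/m^2."""
    return np.asarray(gsi_norm, dtype=float) * (gsi_max - gsi_min) + gsi_min

raw = [0.0, 250.0, 500.0, 1000.0]   # made-up GSI samples in W/m^2
scaled = normalize(raw)
restored = denormalize(scaled, 0.0, 1000.0)
print(scaled, restored)
```

The minimum and maximum should be taken from the training data and reused unchanged for validation and test data, so that all sets share one scale.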

Results and Discussion
This section discusses the results of the k-NN-ANN model used to forecast future GSI with a one-hour-ahead forecasting procedure. The procedure is described in Figure 5. We tested our model using the previously described databases of meteorological data, i.e., wind direction, wind speed, GI, temperature, and humidity, in the 1 h (60 min) ahead process.

Data
Using the k-NN-ANN model, a valid GSI forecast result is expected. The d-dimensional features and n-dimensional distances based on the Euclidean distance (k-NN model) for every hour and all PV stations are shown in Figure 6. The simulation results of the k-NN method are based on meteorological data consisting of global irradiance, temperature, humidity, wind speed and wind direction. All of them are used as pre-processing data for the ANN method and can be described by the polynomial Equation (19):
$f(x, y) = p_{00} + p_{10}x + p_{01}y + p_{20}x^2 + p_{11}xy + p_{02}y^2$ (19)
where $f(x, y)$ is the GSI value of the k-NN method and x, y are the position coordinates of the PV station.
For example, Figure 6a can be produced by polynomial Equation (20):
$f(x, y) = 10.74 + 0.05105x + 0.01365y - 0.0014x^2 + 0.0015xy + 0.002923y^2$ (20)
Figure 6b can be produced by polynomial Equation (21). These polynomial equations are the simulation results at 5:20 a.m. (Figure 6a) and 8:00 a.m. (Figure 6b) using the k-NN method for Station S. The same kind of polynomial equation can also be applied to the other PV stations.
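As a small worked example, the second-order surface of Equation (19) can be evaluated in Python with the coefficients of Equation (20) as defaults. The function name `gsi_surface` and the sample coordinates are hypothetical; the units and ranges of x and y are not stated here, so the inputs are placeholders.

```python
def gsi_surface(x, y, p00=10.74, p10=0.05105, p01=0.01365,
                p20=-0.0014, p11=0.0015, p02=0.002923):
    """Evaluate f(x, y) = p00 + p10*x + p01*y + p20*x^2 + p11*x*y + p02*y^2."""
    return p00 + p10 * x + p01 * y + p20 * x ** 2 + p11 * x * y + p02 * y ** 2

origin_value = gsi_surface(0.0, 0.0)   # only the constant term remains
unit_value = gsi_surface(1.0, 1.0)     # sum of all six coefficients
print(origin_value, unit_value)
```

Passing a different set of coefficients evaluates the same polynomial form for another time of day or another station.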
To validate the proposed method, GSI data of 60 min ahead of a PV station has been calculated using the process described in Section 3. The results are then compared with the actual data of the GSI at target PV station as described in Section 4. Table 2 shows the optimal k-NN parameters for the GSI forecast with meteorology data every hours from 5:20 a.m. to 8:00 a.m. on 8 June 2012.
Energies 2017, 10, 186 12 of 17 In which the polynomial equation above is the result of the simulation on 5:20 a.m. for Figure 6a and 8:00 a.m. hours for Figure 6b using the k-NN method for Station S. Moreover that polynomial equation also can be implemented for the other PV stations.
To validate the proposed method, GSI data of 60 min ahead of a PV station has been calculated using the process described in Section 3. The results are then compared with the actual data of the GSI at target PV station as described in Section 4. Table 2 shows the optimal k-NN parameters for the GSI forecast with meteorology data every hours from 5:20 a.m. to 8:00 a.m. on 8 June 2012.The k-NN-ANN modelling approach is unique since the neural network algorithm is firstly developed using the k-NN model based on meteorology data.Results from the k-NN model are then divided it into two sets of data: the training and the validation set.After the simulation test, the ANN model based on the k-NN results will be ready to use in 60 min ahead GSI forecasting.As explained previously, the optimization parameters for GSI forecasting were obtained from a dataset based on meteorology data.The calculated values of GSI were then compared with measured values (GSI) of For the comparison, k-NN-ANN model performed only 500 iterations for each learning period.Figure 7 shows the all the data for the three subsets: (a) training dataset; (b) validation dataset; and (c) test dataset.In this figure the plots show for GSI versus the time.For the ANN algorithm is initially constructed for the training based on the k-NN model data, and after this, its training is periodically as the database expands over time.
Energies 2017, 10, 186 13 of 17 The k-NN-ANN modelling approach is unique since the neural network algorithm is firstly developed using the k-NN model based on meteorology data.Results from the k-NN model are then divided it into two sets of data: the training and the validation set.After the simulation test, the ANN model based on the k-NN results will be ready to use in 60 min ahead GSI forecasting.As explained previously, the optimization parameters for GSI forecasting were obtained from a dataset based on meteorology data.The calculated values of GSI were then compared with measured values ( ) For the comparison, k-NN-ANN model performed only 500 iterations for each learning period.Figure 7 shows the all the data for the three subsets: The k-NN model database provides pretraining data, and then the database divided it into two sets of data; the training data set and the validations data set, after the validation test, the k-NN-ANN based model will be ready to use for forecasting the GSI, and the the accuracy of the model can be determined using the testing set data that has been gained from the training data process.For the present forecast application program, the ANN model has to start learning based on the training data set, and subsequently construct and sharpen the knowledge while ensuring the continuity of its task.For the process of testing the accuracy of the forecasting model that has been gained from the training process using backpropagation method.The amount of testing data used was 90% of the total.Note that for this validation, the k-NN-ANNs were trained only while data processing is done on the training data patterns with a target error of 0.001, learning rate of 0.1 and a maximum of 50 epochs.
Figure 8 compares the actual GSI data with the results of the k-NN-ANN method, where: (a) shows the agreement between actual and forecast data for very short term GSI forecasting at target Station S; and (b) shows the normalized GSI curve using the k-NN-ANN method.
To evaluate the performance of the models, two statistical error measures were used in the experiment, namely the MABE and the RMSE. To evaluate the accuracy of each method in forecasting the GSI values, the MABE and RMSE between the results of the k-NN-ANN model and the actual ground measurements were calculated. These statistical error indicators are calculated according to Equations (22) and (23), and the resulting errors are shown in Table 3. Figure 10 illustrates a very short-term (60 min ahead) GSI forecast using k-NN-ANN and its comparison with actual data. It is evident that the k-NN-ANN model is in good agreement with the measured data at the target station. MABE is calculated according to Equation (21) and RMSE according to Equation (22), where e(t) is the forecast data and k(t) is the measured (observed) data, G_{f,i} is the forecasted GSI value, G_{m,i} is the measured GSI value (i = 1, 2, ..., N), N is the number of GSI data points, and i is the index.
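For reference, the conventional definitions of the two indicators, written with the symbols defined above, are given below. These are the standard textbook forms; they are assumed, not confirmed, to match the numbered equations of the paper.

```latex
\mathrm{MABE} = \frac{1}{N}\sum_{i=1}^{N}\left| G_{f,i} - G_{m,i} \right|
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left( G_{f,i} - G_{m,i} \right)^{2}}
```

Both quantities carry the units of GSI (W/m²). Because RMSE squares each deviation before averaging, it is never smaller than MABE and is more sensitive to a few large forecast errors.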

Conclusions
A new methodology for very short term (60 min ahead) GSI forecasting at a target PV station has been introduced, using a combination of k-NN modelling and an ANN. The model estimates and predicts the GSI profiles of PV stations in the very short term (60 min ahead) based on hourly meteorology data from eight surrounding PV stations. The following conclusions can be drawn from this research:

• A different formulation for very short term GSI forecasting using k-NN-ANN modelling based on meteorology data is proposed. The proposed model, which attempts to fit the patterns of a polynomial equation, produces better forecasts. The meteorological variables are of great importance and affect the resulting GSI forecasting output.

• The new model proposed in this study is a combination of k-NN modelling and an ANN model. The model is employed to forecast GSI data for a very short term period (60 min ahead) based on meteorology data. This research concerns how to predict GSI data at a target PV station which is surrounded by eight other PV stations. The study also considers the availability of a local measured database. It clearly shows that GSI forecasting using a different k-NN-ANN model for every hour based on meteorological data gives a better result, which means that GSI forecasting largely depends on the meteorological variables, which consist of GI, wind speed, wind direction, humidity and temperature.

• This paper utilises k-NN-ANN modelling to determine the d-dimensional features and n-dimensional distances. The results demonstrate that the k-NN-ANN model closely matches the actual data and performs better than the k-NN model alone.
The novelty of this article is the prediction of GSI at a PV station located at the center of eight other adjacent PV stations. The proposed model is able to learn the characteristics of the meteorological data for the past four hours and use them as model input. In this paper, the proposed k-NN-ANN model gives a better approximation than the k-NN model. The very short term forecast evaluations of the GSI using the k-NN-ANN model were performed for only four hours, and the results show that the k-NN-ANN method is better than the k-NN method. The error statistical indicators of the k-NN model are 44 W/m² for the MABE and 251 W/m² for the RMSE. On the other hand, the error statistical indicators for the proposed model (k-NN-ANN model) are 42 W/m² (MABE) and 242 W/m² (RMSE). We noted that the highest RMSE was 191 W/m² during the thirty-two hour [...]

[...] (5:20 a.m. to 8:00 a.m.) collected on 8 June 2012 at nine PV station locations: Station A (0°, 8.3 km), Station B (36°, 10.5 km), Station C (93°, 10.8 km), Station D (140°, 10.5 km), Station E (180°, 10 km), Station F (250°, 4.3 km), Station G (280°, 5.4 km), Station H (310°, 9.1 km) and Station S (0°, 0 km), located in Taipei, Taiwan. For this study, the very short-term forecasts of GSI were provided only by PV Station S, which was located in the center. The nearest neighbouring locations surrounding Station S were Station A, Station B, Station C, Station D, Station E, Station F, Station G and Station H (Figure 1).
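The station layout listed above can be turned into planar coordinates with a few lines of code. This is a geometric sketch only: it assumes each angle is a bearing measured clockwise from north at Station S, which the text does not state, and the helper names `to_cartesian` and `station_distance` are illustrative. It describes the physical layout of Figure 1, not the feature-space distance used by the k-NN model.

```python
import math
from typing import Dict, Tuple

# (angle in degrees, distance in km) relative to Station S, as listed in the text.
STATIONS_POLAR: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 8.3), "B": (36.0, 10.5), "C": (93.0, 10.8), "D": (140.0, 10.5),
    "E": (180.0, 10.0), "F": (250.0, 4.3), "G": (280.0, 5.4), "H": (310.0, 9.1),
    "S": (0.0, 0.0),
}


def to_cartesian(angle_deg: float, distance_km: float) -> Tuple[float, float]:
    """Return (east, north) offsets in km.

    Assumption: the angle is a bearing measured clockwise from north.
    """
    theta = math.radians(angle_deg)
    return distance_km * math.sin(theta), distance_km * math.cos(theta)


def station_distance(a: str, b: str) -> float:
    """Straight-line distance in km between two stations."""
    xa, ya = to_cartesian(*STATIONS_POLAR[a])
    xb, yb = to_cartesian(*STATIONS_POLAR[b])
    return math.hypot(xa - xb, ya - yb)


# Neighbours of Station S ordered from closest to farthest.
nearest_to_s = sorted(
    (name for name in STATIONS_POLAR if name != "S"),
    key=lambda name: station_distance("S", name),
)
```

Under the stated bearing assumption, Stations A and E lie on opposite sides of Station S, so their separation is simply the sum of their distances from S.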

Figure 1. Modelling of Station S, which is surrounded by eight other photovoltaic (PV) stations. ST: Station.

Figure 2 illustrates the GSI values as measured at Station S. These monitored data have been used to evaluate the k-NN-ANN algorithm as the proposed model for GSI forecasting.

Figure 3. Basic structure architecture of a simple artificial neuron model [27].

Artificial Neural Network Modelling
The ANN algorithm model can be divided into four steps: (1) the design and input pattern step, which includes the choice of the ANN model, the number of its layers, the number of neurons in each layer, its inputs and outputs, and the choice of training set and validation set samples, which use the k-NN design; (2) the training step, in which the training and testing sets for ANN forecasting based on meteorology data are presented to the ANN and the layer weights are adjusted until a predetermined condition is satisfied; (3) the simulation test step, in which the resulting ANN forecasting model is tested using measurement data from the nine PV stations that were not used during the training step; and finally (4) the performance evaluation stage. Layered perceptron ANN models with different topology designs were considered in order to obtain the best mapping between the ANN inputs and outputs. The proposed model in this research is used to predict the GSI values one hour ahead, forecasting based [...]

Figure 4. The proposed forecasting model using a k-nearest neighbor and artificial neural network (k-NN-ANN) model.

Step 5: Perform predictions for future GSI data at Station S with k-NN-ANN.

[...] dimensional distance for input data at the ANN process; (2) the ANN modelling for GSI forecasting.

Figure 6. (a) d-Distance of the PV stations based on the Euclidean distance (k-NN model) at 5:20 a.m. on 8 June 2012; and (b) d-Distance of the PV stations based on the Euclidean distance (k-NN model) at 8:00 a.m. on 8 June 2012.

GSI of each station: Station S, Station A, Station B, Station C, Station D, Station E, Station F, Station G, Station H. The mean absolute bias error (MABE) and root-mean-square error (RMSE) were used as error statistical indicators. The estimate from the k-NN-ANN model was then compared with the k-NN model and with the mean average of the meteorological data prediction.
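A comparison of two forecasts by these indicators can be sketched as follows. The functions use the conventional definitions of MABE and RMSE, and the sample values are made up for illustration; they are not the paper's data.

```python
import math
from typing import Sequence


def mabe(forecast: Sequence[float], measured: Sequence[float]) -> float:
    """Mean absolute bias error between forecast and measured values."""
    if len(forecast) != len(measured) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - m) for f, m in zip(forecast, measured)) / len(forecast)


def rmse(forecast: Sequence[float], measured: Sequence[float]) -> float:
    """Root-mean-square error between forecast and measured values."""
    if len(forecast) != len(measured) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((f - m) ** 2 for f, m in zip(forecast, measured)) / len(forecast)
    return math.sqrt(mean_sq)


# Made-up GSI values in W/m^2; forecast_a and forecast_b stand for two
# hypothetical models evaluated against the same measurements.
measured = [100.0, 200.0, 300.0, 400.0]
forecast_a = [110.0, 190.0, 310.0, 390.0]
forecast_b = [130.0, 170.0, 330.0, 370.0]

scores = {
    "model_a": (mabe(forecast_a, measured), rmse(forecast_a, measured)),
    "model_b": (mabe(forecast_b, measured), rmse(forecast_b, measured)),
}
```

The model with the lower pair of values is the better one by both indicators; in this toy case that is `model_a`.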

Figure 7. (a) GSI for the training set; (b) GSI for the validation set; and (c) GSI for the testing set.

Figure 8. Comparison of the two methods for the GSI data: (a) very short term GSI forecasting using the k-NN-ANN model versus actual data for Station S; and (b) the normalized GSI curve using the k-NN-ANN method.

Figure 9 illustrates the comparison of GSI forecasting in a four hour window (5:20 a.m.-8:00 a.m.) on 8 June 2012, based on the k-NN-ANN model, actual data, and the k-NN method. The result shows that the very short term forecasting simulation using the k-NN-ANN method during the four hour window gives better results than the k-NN method.

Figure 9. Very short term GSI forecasting in a 5-h window based on the k-NN-ANN model, actual data, and the k-NN method.

Figure 10. Very short term (60 min ahead) GSI forecast using k-NN-ANN versus actual data.

Figure 11. (a) MABE coefficients between actual data and GSI forecasts using the k-NN-ANN model on the test data set; and (b) RMSE coefficients between actual data and GSI forecasts using the k-NN-ANN model on the test data set.

[...] based on a short term wind speed forecasting model has been presented by Ren et al. [25]. Mellit et al. [26] and Farhad et al. [27] have presented ANN models for the prediction of solar radiation and a new bloggers classification approach with a hybrid k-NN and ANN model. Probabilistic solar power forecasting approaches based on a k-NN kernel, and the selection of input parameters to model direct solar irradiance using an ANN, have been proposed by Zhang and Wang [28] and López et al. [29]. The aforementioned studies elaborated on GSI forecasting at one target PV station, but have yet to forecast GSI at PV stations surrounded by other PV stations. This paper presents part of an ongoing study seeking to estimate and predict global solar irradiation one hour (60 min) ahead at PV stations for energy production. This part focuses on how to predict hourly GSI for the target PV station given the availability of a local database and based on meteorology data. In this study, a new hybrid methodology that combines k-NN modelling and an ANN modelling algorithm has been developed. The k-NN-ANN method forecasts GSI at the target PV station by calculating the k-NNs based on the Euclidean distance and then performing training and testing on the data.

Table 1. Data position and coordinates of the PV stations.

The design forecast is divided into two stages: (a) The first stage calculates the d-dimensional features and n-dimensional distances based on the Euclidean distance (k-NN model) every hour for all of the PV stations. The data shown in Table 1 are the order parameters of each PV station, i.e., angle, distance and position coordinates. The table summarizes all possible combinations of variables to be considered as input for the k-NN method. (b) The second stage uses the ANN, on the assumption that the input data are a combination of the results obtained from the k-NN model. The research model proposed in this study seeks to estimate and predict the production 60 min ahead of a PV station located at the center and surrounded by eight other PV stations.
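Stage (a) can be sketched as a plain Euclidean k-NN search over per-station feature vectors. This is an illustrative sketch rather than the authors' implementation: the two-component feature vectors, the unweighted averaging in `knn_estimate`, and all numeric values are assumptions. In the two-stage design, an output of this kind would serve as an input to the ANN of stage (b).

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two d-dimensional feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def k_nearest(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k candidates closest to the target as (name, distance) pairs."""
    ranked = sorted(
        ((name, euclidean(target, feats)) for name, feats in candidates.items()),
        key=lambda item: item[1],
    )
    return ranked[:k]


def knn_estimate(neighbors: List[Tuple[str, float]], values: Dict[str, float]) -> float:
    """Unweighted mean of the neighbours' values (a simplifying assumption)."""
    return sum(values[name] for name, _ in neighbors) / len(neighbors)


# Hypothetical normalized meteorological features for one hour.
target_features = [0.5, 0.5]
station_features = {
    "A": [0.5, 0.7], "B": [0.9, 0.9], "C": [0.4, 0.5], "D": [0.0, 0.0],
}
# Hypothetical GSI readings in W/m^2 at the same hour.
station_gsi = {"A": 200.0, "B": 400.0, "C": 100.0, "D": 50.0}

neighbors = k_nearest(target_features, station_features, k=2)
estimate = knn_estimate(neighbors, station_gsi)
```

Features should be normalized to a common scale before the search, as assumed in the toy data; otherwise the variable with the largest numeric range dominates the Euclidean distance.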

Table 2. Optimal k-NN parameters for the GSI forecast with meteorology data.

Table 3. Error statistical indicators of the GSI forecasting models. MABE: mean absolute bias error; RMSE: root-mean-square error.
