Our proposed method uses an “ensemble” of different machine learning algorithms to generate the probabilistic forecasts. In [25], Bell and Koren presented the first-prize-winning method in the $1 million Netflix Prize challenge and observed that an ensemble approach using different predictors offered the best results. The reason behind this phenomenon is that each machine learning algorithm performs well only with specific types of data. For example, SVM and ANN perform better with multi-dimensional data and continuous features. On the other hand, decision tree and rule-based learners perform well with categorical data. Likewise, SVM and ANN models perform at their best when dealing with large sample sizes, whereas naive Bayes models require only a small sample size [26]. Thus, in order to take advantage of the strengths of various algorithms in various situations, ensemble methods are employed.

In this work, we follow an ensemble regression strategy for forecasting. First, we group the data based on zones and hours of the day. Then, we generate the point forecasts using seven individual machine learning-based regression models. Finally, we combine those point forecasts into the probabilistic forecasts using three different ensemble methods.

#### 4.1. Grouping of Data

As mentioned above in Section 3, the data consist of those from 3 different solar farms (zones), and the power generated at each zone differs in magnitude. To avoid large fluctuations in the output values, the data are grouped based on zones. The solar power generated varies throughout the day, going to zero during the night. Hence, within each zone, the data are further grouped by each hour of the day. This gives us 24 different sets of data in each of the 3 zones (i.e., 24 × 3 = 72 datasets in total).

Figure 1 shows the values for the month of April 2012 in Zone 1. We can see that the values oscillate between 0 and 1 consistently throughout the month.

The dataset contains 12 input variables. Let X be a $(72t\times 13)$ matrix, where the 13th column is the output variable, i.e., solar power generated. Matrix X contains the data from the three different solar farms, Zones 1, 2 and 3, and t is the number of days in the training dataset.

Initially, the data are grouped based on the zone. Let ${X}_{z}$ denote the data from each zone, where $z\in \{1,2,3\}$. ${X}_{z}$ is a $(24t\times 13)$ matrix. ${X}_{z}$ in turn contains the data for each of the 24 hours in a day. Grouping ${X}_{z}$ based on each hour gives us the matrix ${X}_{zh}$, where z represents the zone and h represents the hour. ${X}_{zh}$ is a $(t\times 13)$ matrix.

At the end of the grouping process, we have 24 different datasets in each of the three zones, which results in 72 datasets in total. Hence, when we say, for instance, that the decision tree regressor is used to generate the point forecast, it means that 72 decision tree sub-models are built from 72 different datasets, and among them, a particular sub-model corresponding to the test instance at hand is selected to perform the point forecasting. For example, if we are to point forecast the solar power generated in Zone 1 at Hour 5 for a particular day, among the 72 different decision tree sub-models, we select the one built only from the historical data recorded at Hour 5 in Zone 1 in the training dataset.

For each sub-model dedicated to zone z and hour h, the training phase takes as input a matrix ${X}_{zh}$ of size $(t\times 13)$, where t is the number of days in the training dataset, and outputs a regression model. The testing (forecasting) phase takes as input a matrix ${F}_{zh}$ of size $(d\times 13)$ with its 13th column withheld, and outputs a matrix P of size $(d\times 1)$, where d is the number of days for which the forecasting is to be made. The training and testing processes are repeated 72 times to cover all of the combinations of zones and hours.
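The grouping and per-(zone, hour) training described above can be sketched as follows. This is a minimal sketch, not the paper's exact pipeline: the column positions of the zone and hour identifiers (`zone_col`, `hour_col`) and the choice of scikit-learn's `DecisionTreeRegressor` as the example sub-model are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def group_by_zone_hour(X, zone_col=0, hour_col=1):
    """Split the full data matrix into one sub-matrix per (zone, hour) pair.

    X is the (72t x 13) matrix described in the text; here we assume the
    zone id (1-3) and hour of day (0-23) are stored in two of its columns.
    Returns a dict mapping (zone, hour) -> the (t x 13) sub-matrix X_zh.
    """
    return {
        (z, h): X[(X[:, zone_col] == z) & (X[:, hour_col] == h)]
        for z in (1, 2, 3)
        for h in range(24)
    }

def train_submodels(groups):
    """Fit one regressor per (zone, hour) dataset: 72 sub-models in total.

    Columns 0-11 hold the input variables; column 12 is the generated power.
    """
    models = {}
    for key, X_zh in groups.items():
        model = DecisionTreeRegressor(random_state=0)
        model.fit(X_zh[:, :12], X_zh[:, 12])
        models[key] = model
    return models
```

To point forecast Zone 1 at Hour 5, one would then select `models[(1, 5)]`, i.e., the sub-model trained only on that zone-hour slice of the training data.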

#### 4.3. Generating Probabilistic Forecasts

We propose three different ensemble methods to generate the probabilistic forecasts using the point forecasts from the 7 machine learning models mentioned above.

#### 4.3.1. Method I: Linear Method

The linear method generates the 99 percentiles, where the first percentile is the lowest among the point forecasts and the 99th percentile is the highest. The i-th percentile of a distribution is a number such that approximately i percent of the values in the distribution are less than or equal to that number. For example, if we say that 12 is the 80th percentile of a distribution, then approximately 80% of the numbers in that distribution are less than or equal to 12.

Let ${x}_{1},{x}_{2},\dots ,{x}_{n}$ be a set of values, where n represents the total number of observations, which are point forecasts in our case. Here, $n=7$ because we use 7 individual machine learning models to generate 7 distinct point forecasts. The linear interpolation method to calculate the percentiles, adapted from [34], is as follows. First, we sort the data such that ${x}_{1}$ is the smallest value and ${x}_{n}$ is the largest. Then, we calculate the relative index of the i-th percentile, denoted as ${r}_{i}$, for $i=1,\dots ,99$ using Equation (1).

If ${r}_{i}$ is an integer, then ${x}_{{r}_{i}}$ will be the i-th percentile value. If ${r}_{i}$ is not an integer, then we separate it into its integer part k and fractional part f. Then, ${p}_{i}$, the interpolated i-th percentile value, is calculated using Equation (2). We regard ${x}_{0}={x}_{1}$ and ${x}_{n+1}={x}_{n}$, respectively.
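Since Equations (1) and (2) are not reproduced here, the sketch below assumes the common rank formula ${r}_{i} = i\,(n+1)/100$ for Equation (1) and the standard interpolation ${p}_{i} = {x}_{k} + f\,({x}_{k+1} - {x}_{k})$ for Equation (2), together with the boundary convention ${x}_{0}={x}_{1}$ and ${x}_{n+1}={x}_{n}$ stated above.

```python
def linear_percentiles(forecasts):
    """Interpolated 1st-99th percentiles of n point forecasts.

    Assumes Equation (1) is r_i = i * (n + 1) / 100 and Equation (2) is
    p_i = x_k + f * (x_{k+1} - x_k), with x_0 = x_1 and x_{n+1} = x_n.
    """
    xs = sorted(forecasts)
    n = len(xs)
    # padded[j] plays the role of x_j for j = 0, ..., n + 1 (1-indexed text).
    padded = [xs[0]] + xs + [xs[-1]]
    percentiles = []
    for i in range(1, 100):
        r = i * (n + 1) / 100          # relative index r_i (assumed Eq. (1))
        k = int(r)                     # integer part
        f = r - k                      # fractional part (0 when r is integer)
        percentiles.append(padded[k] + f * (padded[k + 1] - padded[k]))
    return percentiles
```

With 7 point forecasts, for instance, the 50th percentile has ${r}_{50} = 50 \times 8 / 100 = 4$, so ${p}_{50} = {x}_{4}$, the median of the sorted forecasts under these assumed formulas.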

#### 4.3.2. Method II: Normal Distribution Method

Let the n point forecasts generated using n regression models be represented as ${x}_{1},{x}_{2},\dots ,{x}_{n}$, with mean μ and standard deviation σ (note: $n=7$ in our case). For $i=1,\dots ,99$, finding the i-th percentile value ${p}_{i}$ is the same as finding ${p}_{i}$ such that $P(X<{p}_{i})=i/100$. For that, we find the corresponding Z value, denoted as ${z}_{i}$, using the Z table or standard normal table [35] by looking for the table entry that is closest to $i/100$. Once we have the values of μ, σ and ${z}_{i}$, that of ${p}_{i}$ can be calculated using Equation (3).
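A minimal sketch of Method II, assuming Equation (3) is the standard transformation ${p}_{i} = \mu + {z}_{i}\,\sigma$; instead of a printed Z table [35], we obtain ${z}_{i}$ directly from the inverse standard normal CDF provided by Python's `statistics.NormalDist`.

```python
from statistics import NormalDist, mean, stdev

def normal_percentiles(forecasts):
    """1st-99th percentiles from a normal fit to the n point forecasts.

    Assumes Equation (3) is p_i = mu + z_i * sigma; z_i is taken from the
    inverse standard normal CDF rather than a Z-table lookup.
    """
    mu = mean(forecasts)
    sigma = stdev(forecasts)           # sample standard deviation
    z = NormalDist()                   # standard normal: mu = 0, sigma = 1
    return [mu + z.inv_cdf(i / 100) * sigma for i in range(1, 100)]
```

Note that the 50th percentile recovers the mean of the point forecasts, since ${z}_{50} = 0$.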

#### 4.3.3. Method III: Normal Distribution Method with Additional Features

This method is similar to Method II, but now, we add two additional sets of regression models along with the original model set. In the first additional model set, we use an additional feature “month” of the year along with the existing 12 features. In the second additional model set, only the most recent 30 days of data (instead of the whole of the available training data) are considered to carry out the forecasts. All 7 individual machine learning regression models are deployed for both additional model sets. This results in $n=21$ regression models in total (7 for the original model set + 7 for the first additional model set + 7 for the second additional model set). Having more data points ($n=21$) helps smoothen the percentile curve when compared to those curves in Methods I and II, where fewer data points are available ($n=7$).