Novel Mode Adaptive Artificial Neural Network for Dynamic Learning: Application in Renewable Energy Sources Power Generation Prediction

Abstract: A reasonable dataset, which is an essential factor in renewable energy forecasting model development, is sometimes not directly available. Waiting for a substantial amount of training data creates a delay for a model to participate in the electricity market. Also, inappropriate selection of the dataset size may lead to inaccurate modeling. Besides, in a multivariate environment, the impact of different variables on the output is often neglected or not adequately addressed. Therefore, in this work, a novel Mode Adaptive Artificial Neural Network (MAANN) algorithm has been proposed using Spearman's rank-order correlation, an Artificial Neural Network (ANN), and population-based algorithms for the dynamic learning of renewable energy sources power generation forecasting models. The proposed algorithm has been trained and compared with three population-based algorithms: Advanced Particle Swarm Optimization (APSO), the Jaya Algorithm, and the Fine-Tuning Metaheuristic Algorithm (FTMA). Also, the gradient descent algorithm is considered as a base case for comparison with the population-based algorithms. The proposed algorithm has been applied in predicting the power output of a Solar Photovoltaic (PV) and a Wind Turbine Energy System (WTES). Using the proposed methodology with FTMA, the error was reduced by 71.261% and 80.514% compared to the conventional fixed-sized dataset gradient descent-based training approach for Solar PV and WTES, respectively.


Introduction
The application of Machine Learning (ML) has broken the barrier of correctly predicting different physical systems and can be found in all sorts of industries. In today's world, renewable energy-based sources are an integral part of almost any power system. Accurate prediction of renewable energy sources' output is essential for successful electricity market participation. These sources are highly weather-dependent; therefore, predicting the plant output is a challenging task. Research on the application of ML for renewable energy plant output prediction has increased in recent years. The increase in the number of publications indirectly indicates the necessity of such research, developments in the ML field, and improved computational capacities. In Sections 1.1 and 1.2, a brief discussion of the reviewed literature is conducted, where the necessity and scope for more research in this field can be found. The actual and abbreviated forms of the technical terms found in the reviewed literature and used in this paper are listed in Table 1.

Research Gaps and Motivation for the Proposed Methodology
The reviewed works have demonstrated the supremacy of ML/DL, and almost all the papers have given importance to the size of the dataset, forecasting duration, forecasting methodology, model performance, etc. However, two important factors were not covered: dynamic learning (except References [14,17] for solar PV and Reference [34] for WTES) and the determination of the dominant input variables impacting the output, also known as correlation analysis (except References [2,6,16,20,22] for solar PV and Reference [32] for WTES). Moreover, none of the works applied the two together. The necessity of dynamic learning and correlation analysis is discussed in Sections 1.3.1 and 1.3.2. The discussed works are summarized in Table 2, featuring the different categories, the proposed and compared methodologies, correlation analysis, and dynamic learning.

Necessity of Dynamic/Online Learning
Dynamic learning, which is commonly known as online learning, overcomes the problem of model training with a big fixed-sized dataset. The use of gradient descent backpropagation for neural network-based model training requires data division into multiple batches, as single batch training using all the samples in the training set may not guarantee an optimal model. On the other hand, dynamic learning trains the system with each entry of the data from the environment. An optimized model from the previous data entry can be a near-optimal solution point for the current data entry, making learning/training faster. The necessity of dynamic learning for renewable energy sources can be understood from Figure 1 and the related explanation.
If the model is trained using a "small" dataset, although the model error can be well below the acceptable error limit (black straight line), as new data enters the system, due to a lack of training on similar data, the error of the model may surpass the acceptable limit. Thus, a loss in profit may be incurred in the future due to suboptimal performance. On the other hand, if a "big" dataset is chosen, the participant loses profit during the (Big − Critical) dataset duration by not participating in the market, as the model already reached its optimal state after the critical point (in terms of error performance).
A dynamic learning approach can be adopted to participate in the market whenever the model error is below the acceptable error limit. If the error grows beyond the limit, the model can stop participating, retrain, and then re-participate. The conceptual diagram of the dynamic learning procedure can be understood from Figure 2.
As per the conceptual diagram, the forecasting model goes through the training/optimization cycle to reduce the error between the actual output of the physical system and the predicted output of the forecasting model. The conditions for training, the training algorithms, etc., can be included inside the 'Forecasting model optimization' block. This block considers all previous data entries for optimization, as considering only the current state would produce a model optimal for the current input data alone, which would fail to perform efficiently in the presence of different data, or data similar to previous/historical entries.
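The participate/retrain cycle described above can be sketched as follows; all names (`update_model`, `error_of`, `E_LIMIT`) are illustrative placeholders rather than the paper's implementation:

```python
# Illustrative sketch of the dynamic learning cycle: on every new data
# entry the model is retrained on all data seen so far (warm-started from
# the previous optimum), then allowed to participate only while its error
# stays below the acceptable limit. All names are assumptions.

E_LIMIT = 0.05  # acceptable error limit (assumed value)

def dynamic_learning_step(model, history, new_sample, update_model, error_of):
    """Process one new data entry and decide on market participation."""
    history.append(new_sample)        # keep every previous data entry
    update_model(model, history)      # retrain from the last optimum
    err = error_of(model, history)
    participate = err < E_LIMIT       # participate only below the limit
    return participate, err
```

In this sketch, warm-starting comes for free: `model` retains the weights found on the previous entry, so each retraining starts from a near-optimal point, which is what makes the dynamic scheme faster than retraining from scratch.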

Necessity of Correlation Coefficient (CC) Analysis among the Input and Output Variables
Considering too many variables to develop a forecasting model increases the complexity. Also, not all variables have a dominant impact on the output. CC calculation can be a useful tool for determining the dominant input variables. Application of Pearson correlation coefficient [38] analysis or sensitivity analysis on a fixed dataset was found in the literature as a CC analysis tool. Pearson correlation coefficient analysis works well when a linear relationship between the considered variables (input and output) exists. Also, due to its dependency on the data sample size, CC analysis on a fixed-sized dataset may not represent the actual or stable relationship between the variables (input and output). To overcome the problem of dynamic selection of dominant variables, applications of tensor-based [39], autoregressive [40], and embedding-based methods [41] are found in a few recent studies. These applications are useful for multi-plant forecasting and change detection from multivariate smart grid time-series data, considering online/dynamic learning with a smaller feature space. However, the use of an Autoencoder [41] with a neural network-based forecasting model increases the complexity, as two different models (the Autoencoder for feature extraction and the neural network for forecasting) need to be trained in such a case. Also, an Autoencoder does not guarantee the orthogonality of the extracted features; therefore, collinearity issues may exist in the reduced feature space [41]. The use of Tucker decomposition in the tensor-based method has limitations with rank estimation, and the stability of the proposed method needs to be confirmed with different rank values [39]. Although these methods are beneficial, there always remains some scope to work with different algorithms that are yet to be implemented and can be used effectively.

Contribution of This Research
Therefore, considering the discussed problems, in this work, a dynamic learning algorithm has been proposed to train a variable-structured ANN, which changes according to the selected dominant input variables. Each combination of the input variables is defined as a "Mode." Thus, the proposed algorithm has been named the "Mode Adaptive Artificial Neural Network (MAANN)." This adaptation of the Modes is made through dynamic CC analysis using Spearman's rank-order correlation [42]. Spearman's rank-order correlation performs robustly for both linearly and nonlinearly related variables. Dynamic use of this algorithm helps the operator choose the dominant input variables and reveals the stability of the relationship between the variables. A stable relationship can be used to find the optimal dataset size or the training termination criteria.
Also, in contrast to the conventional gradient descent and Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) training techniques, in this work, a dynamic learning methodology has been proposed using recent metaheuristic algorithms. Metaheuristic algorithms overcome the premature convergence and local minimum trapping problems of the gradient descent algorithm. Also, metaheuristic algorithms iteratively improve the result using a population set. Additionally, in this work, to avoid premature convergence, a modification of the population-based algorithms has been introduced, which is described in Section 2.
Therefore, the contributions of this work can be summarized as follows:
i. Determination of dominant input variables and data stability analysis using the dynamic application of Spearman's rank-order correlation.
ii. Application of recent population-based algorithms for better accuracy of the forecasting models compared to the conventional approach (fixed-sized dataset training with gradient descent and ARIMAX algorithms).
iii. Algorithm validation by application to two different types of renewable sources at different locations, with different numbers of input variables and dataset sizes.
The paper is organized as follows: in Section 2, related theories for the proposed algorithm are discussed. The selection of the dominant input variables and dataset stability determination using Spearman's rank-order correlation are also discussed in Section 2. In Section 3, the flow chart of the proposed algorithm is presented in detail. Section 4 consists of a description of the sites and the dataset, experimental results, discussion, and analysis. Finally, in Section 5, the conclusion and the scope for future work are presented. Before the major sections, the symbols frequently used in this article are listed in Table 3. It should be mentioned that the solution particle x changes with the change in the optimization iteration cycle. Therefore, the changed values of the solution positions are denoted using super- and subscripts, which can be found in the descriptions of the optimization algorithms in Section 2.2.

Related Theories for the Proposed Methodology
ANN, metaheuristic algorithms, and Spearman's rank-order correlation are the heart of the proposed dynamic learning algorithm. Therefore, in this section, an introduction to these concepts is given.

Artificial Neural Network (ANN)
Among the different ANN structures found in the literature, the feedforward neural network [1,3,14] has been used in this work. The internal weights and biases vary according to the choice of input variables by correlation analysis. Figure 3 shows the ANN model in a vector-matrix notational structure.

where y_j is the j-th sample input of the input variable y. The hyperbolic tangent function has been used as the hidden layer activation function in this work [43]. In the output layer, a linear activation function is used. Once the neural network's predicted output is calculated using Equation (1), the error between the actual and predicted output is calculated. In this work, to calculate the error during the learning process, the Mean Squared Error (MSE) is considered:

MSE = (1/l) ∑_{j=1}^{l} (Ŷ_j − Y_j)²

where Ŷ_j and Y_j are the predicted and actual outputs of the j-th sample, respectively, and l is the number of training samples. The objective is to find the optimal values of the weights {IW, OW} and biases {IB, OB} such that the error is reduced.
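Under the structure just described (one tanh hidden layer, linear output layer, MSE loss), the forward pass and error can be sketched as follows; the weight/bias names follow the text's {IW, OW, IB, OB} notation, while the layer sizes are arbitrary:

```python
import numpy as np

# Minimal sketch of the feedforward ANN used here: one hidden layer with
# tanh activation and a linear output layer.

def forward(X, IW, IB, OW, OB):
    """X: (l, n) input samples; returns (l, 1) predictions Y_hat."""
    H = np.tanh(X @ IW + IB)   # hidden layer, hyperbolic tangent activation
    return H @ OW + OB         # output layer, linear activation

def mse(Y_hat, Y):
    """Mean Squared Error over the l training samples."""
    return np.mean((Y_hat - Y) ** 2)
```

The optimizer's job (Section 2.2) is then to search over the flattened {IW, IB, OW, OB} vector so that `mse(forward(X, ...), Y)` is minimized.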


Optimization Algorithms
As discussed earlier, in this work, metaheuristic algorithms are used for the optimization (commonly known as training of the ANN) of the MAANN. Metaheuristic algorithms are population-based algorithms, and many of them are characterized by exploration and exploitation features, which improve optimization performance. The definitions of the exploration and exploitation properties can be found in Reference [44]. A general-purpose universal optimization strategy is impossible according to the No Free Lunch theorem [45]; hence, there is always some scope to improve the result using new optimization techniques. Thus, three recent algorithms with few hyperparameters were chosen in this work: the Jaya Algorithm (2016, hyperparameters: 0), Advanced Particle Swarm Optimization (APSO) (2018, hyperparameters: 3), and the Fine-Tuning Metaheuristic Algorithm (FTMA) (2019, hyperparameters: 2).

Description of Jaya Algorithm
The Jaya algorithm was proposed by Rao in 2016 [46]. It is a gradient-free algorithm and contains no hyperparameters. The essence of the Jaya algorithm is Equation (3):

x_{k,i,g+1} = x_{k,i,g} + r_1 (x_{k,best,g} − |x_{k,i,g}|) − r_2 (x_{k,worst,g} − |x_{k,i,g}|)

where x_{k,i,g} is the i-th candidate's value of the k-th variable at the g-th iteration, r_1 and r_2 are random numbers in [0, 1], x_{k,best,g} and x_{k,worst,g} are the best and worst values of the k-th variable at the g-th iteration, and x_{k,i,g+1} is the updated position/value of the candidate solution.
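The Jaya position update can be sketched for the whole population at once; `r1` and `r2` are uniform random numbers in [0, 1], drawn independently per candidate and per variable:

```python
import numpy as np

# Sketch of the Jaya position update: each candidate moves toward the best
# solution and away from the worst one, with no algorithm hyperparameters.

def jaya_update(X, best, worst, rng):
    """X: (pop, k) positions; best, worst: (k,) best/worst variable values."""
    r1 = rng.random(X.shape)   # uniform random in [0, 1]
    r2 = rng.random(X.shape)
    return X + r1 * (best - np.abs(X)) - r2 * (worst - np.abs(X))
```

After each update, candidates are typically kept only if their objective value improves, and the best/worst vectors are refreshed before the next iteration.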

Description of APSO
PSO [47] is a population-based stochastic optimization technique developed by Eberhart and Kennedy in 1995, whereas APSO was proposed in 2018 by Khan et al. [48]. The velocity and position updating equations of APSO are shown in Equations (4) and (5), respectively. The inertia, w, in Equation (4) can be controlled according to Equation (6), under which it decreases with the generation count up to Max_gen. The original velocity equation of PSO is modified by adding a third term on the right-hand side of Equation (4). The additional term is used to minimize the particles' positions iteratively by increasing the velocity to reach the optimal solution faster [48]. In Equations (4) and (5), x_i^g, v_i^g and x_i^{g+1}, v_i^{g+1} are the i-th particle's positions and velocities in the g-th and (g+1)-th generations. A description of the remaining variables can be found in Table 3.
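Since Equations (4)-(6) are not reproduced here, the following sketch uses the standard PSO velocity and position updates with a linearly decreasing inertia consistent with the Max_gen term of Equation (6); it omits APSO's additional third velocity term, and the coefficient values (w_max, w_min, c1, c2) are assumptions:

```python
import numpy as np

# Hedged sketch of a PSO-style step with linearly decreasing inertia.
# This is NOT Khan et al.'s full APSO (the extra third velocity term of
# Equation (4) is omitted); coefficients are illustrative defaults.

def inertia(g, max_gen, w_max=0.9, w_min=0.4):
    """Decrease inertia linearly from w_max to w_min over max_gen steps."""
    return w_max - (w_max - w_min) * g / max_gen

def pso_step(x, v, pbest, gbest, g, max_gen, rng, c1=2.0, c2=2.0):
    """One velocity/position update for all particles (rows of x, v)."""
    w = inertia(g, max_gen)
    v_new = (w * v
             + c1 * rng.random(x.shape) * (pbest - x)   # cognitive pull
             + c2 * rng.random(x.shape) * (gbest - x))  # social pull
    return x + v_new, v_new
```

The decreasing inertia favors exploration early (large w) and exploitation late (small w), which is the usual motivation for Equation (6)-style schedules.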

Description of FTMA
FTMA was proposed by Allawi et al. in 2019 [49] to solve global optimization problems. The fundamental equations describing the algorithm's stages, that is, exploration, exploitation, and randomization, are shown in Equations (7)-(9), where x_exploration, x_exploitation, and x_randomization are the solutions obtained from the exploration, exploitation, and randomization stages, respectively, and x_i^g, x_j^g, and x_best^g are the i-th, j-th, and best particles' solutions from the g-th generation.
x_upper and x_lower are the upper and lower bounds of the search space. Based on the performance achieved by the solution obtained from the exploration stage, along with the conditional random variables p and r [49], the exploitation and randomization stages will be performed. FTMA is faster than other algorithms because, if the solution is improved in one stage, the remaining stages can be skipped. Also, convergence towards the optimal solution is faster due to the application of three different position updating equations in each iteration. The flowcharts/pseudocodes of the population-based algorithms can be found in the corresponding literature [46][47][48][49]. As the population-based algorithms try to converge towards a smaller boundary or global solution with more and more training, early solution convergence may occur. This phenomenon of early convergence and a solution to this problem are discussed below.
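The stage structure described above (exploration, then exploitation, then randomization within [x_lower, x_upper], with later stages skipped once the solution improves) can be sketched as follows; the per-stage update rules are illustrative placeholders, not Allawi et al.'s Equations (7)-(9):

```python
import numpy as np

# Control-flow sketch of FTMA as described in the text: three stages per
# particle, with later stages skipped once the solution improves. The
# per-stage update rules are placeholders, NOT Equations (7)-(9).

def ftma_step(x, x_best, cost, bounds, rng):
    lo, hi = bounds
    # Stage 1: exploration (placeholder move toward the best solution)
    cand = x + rng.random(x.shape) * (x_best - x)
    if cost(cand) < cost(x):
        return cand                    # improved: skip the later stages
    # Stage 2: exploitation (placeholder local move around the best)
    cand = x_best + 0.1 * rng.standard_normal(x.shape)
    if cost(cand) < cost(x):
        return cand
    # Stage 3: randomization within [x_lower, x_upper]
    return lo + rng.random(x.shape) * (hi - lo)
```

The early-exit structure is what makes FTMA cheap per iteration: a particle that improves at the first stage never pays for the other two.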

Problem of Early Convergence and Solution
If a model train/retrains for each entry or new arrival of data, the optimal solution vectors of a model may trap into one another after a certain cycle. The trapping reduces the effective solution population size, which can be understood from Figure 4. Figure 4a shows the initial condition or any random state of the solutions within the preferred solution boundary (green line). The orange circle can be considered as the global solution boundary as more solutions tend to reach that ball. Once all the solutions have reached the ball after a certain iteration, to minimize the distance between the best solution, there will be a possibility that one or more solutions trap into another solution, and that will effectively reduce the number of solutions ( Figure 4b).

For example, if a system has a solution population of size 10, one solution trapping into another reduces the effective population size to 9. Initially, trapping may suggest that the population has reached the global solution, but this may not always be true, because metaheuristic algorithm performance is affected by the choice of hyperparameters. The hyperparameters control the speed at which the population moves from one position to another, so for the considered choice of hyperparameters, the solution vector may have missed an optimal location within the search space. Therefore, a modification in the application of the metaheuristics has been proposed in this work: whenever local trapping of one or more solutions occurs, a random solution is created against those solutions within the search space, which in turn also creates more opportunity to explore the search space.
The modified portion of the algorithm is given in Algorithm 1.

Algorithm 1. Re-randomization of trapped solutions.
1: for g = 1 : generation
2:   for p = 1 : population
3:     if J(p, g) == J(q, g), where p ≠ q
4:       J(p, g) = x_min + rand(x_max − x_min)
5:     end if
6:   end for
7: end for
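Algorithm 1 can be sketched as follows for one generation's population matrix; the duplicate test and re-randomization follow lines 3-4 above:

```python
import numpy as np

# Sketch of Algorithm 1: whenever one candidate solution becomes identical
# to another (trapping), the duplicate is re-randomized inside
# [x_min, x_max], keeping the effective population size constant.

def reinit_duplicates(pop, x_min, x_max, rng):
    """pop: (P, k) solution matrix; returns pop with duplicates replaced."""
    for p in range(1, len(pop)):
        for q in range(p):                         # compare against earlier rows
            if np.array_equal(pop[p], pop[q]):     # trapped into solution q
                pop[p] = x_min + rng.random(pop.shape[1]) * (x_max - x_min)
                break
    return pop
```

Re-randomizing rather than deleting the duplicate preserves the population size assumed by the optimizer while re-injecting exploration.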

Spearman's Rank-Order Correlation Analysis
Spearman's rank-order correlation between variables a and b can be mathematically expressed as:

ρ_{ab} = 1 − (6 ∑_{j=1}^{l} d_j²) / (l(l² − 1))

The variables a and b can be placed in two different columns of a table according to their sequence. The elements of each variable a and b in the corresponding columns can then be ranked, meaning that the highest and lowest values of a variable or column will be of the lowest and highest rank, respectively. The distance (d) of the variables a and b for any data sample j can be simply calculated by d_j = rank(a_j) − rank(b_j). The numerical calculation procedure can also be obtained from Reference [50]. Spearman's correlation coefficient varies between −1 (perfect negative correlation) and +1 (perfect positive correlation). The operator can set a CC threshold to choose a dominant variable: if the absolute CC value exceeds the threshold, the corresponding variable is chosen; otherwise, it is discarded from the optimization process. In this work, upon the arrival of each new data sample, the CC value is updated. The procedure of dynamic analysis of the CC to determine the dominant input variables can be understood from Figure 5.
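Assuming no tied values, the formula can be implemented directly; ranks are computed here with a double argsort (0-based ranks give the same distances d_j as 1-based ones, since the offsets cancel):

```python
import numpy as np

# Direct implementation of the Spearman rank-order formula (no ties):
# rho = 1 - 6 * sum(d_j^2) / (l * (l^2 - 1)).

def spearman(a, b):
    a, b = np.asarray(a), np.asarray(b)
    l = len(a)
    rank_a = np.argsort(np.argsort(a))   # 0-based ranks; offsets cancel in d
    rank_b = np.argsort(np.argsort(b))
    d = rank_a - rank_b                  # per-sample rank distance
    return 1.0 - 6.0 * np.sum(d ** 2) / (l * (l ** 2 - 1))
```

With tied values an average-rank correction would be needed; library routines such as `scipy.stats.spearmanr` handle that case.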

Figure 5 shows the change in CC between two variables with the change in data size.
Analyzing the four quadrants of Figure 5a, it can be understood that, with the increase in data size (top left to bottom right), the CC also changes (from 0.82 to 0.27). The change in CC values as per the change in data size or increase in number of days can be graphically plotted as shown in Figure 5b, where a threshold parameter has been shown using a red straight line. With respect to the threshold parameter, with time, as the dataset's size grows, a variable may fluctuate around the state of dominance and non-dominance. By considering the state of dominance and non-dominance, the structure of the ANN can be adapted.
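The dynamic CC analysis of Figure 5 can be sketched as follows: after every new sample, the coefficient is recomputed over all data collected so far and compared against an operator threshold (the value used here is an assumption):

```python
import numpy as np

# Sketch of the dynamic CC analysis of Figure 5: after each new sample the
# Spearman CC is recomputed over all data so far, and the input counts as
# dominant only while |CC| >= THRESHOLD. THRESHOLD is an assumed setting.

THRESHOLD = 0.5

def _rank(v):
    return np.argsort(np.argsort(v))

def spearman_cc(a, b):
    l = len(a)
    d = _rank(np.asarray(a)) - _rank(np.asarray(b))
    return 1.0 - 6.0 * np.sum(d ** 2) / (l * (l ** 2 - 1))

def dominance_trace(xs, ys):
    """Return the dominant/non-dominant state after each new sample."""
    states = []
    for t in range(2, len(xs) + 1):      # need at least two samples
        cc = spearman_cc(xs[:t], ys[:t])
        states.append(abs(cc) >= THRESHOLD)
    return states
```

A long run of identical states in the trace corresponds to the "stable relationship" the text uses to fix the dataset size or the training termination criterion.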

Proposed Algorithm for Dynamic Learning
In this section, the proposed algorithm is discussed, which is developed using the theories discussed in the previous section. The proposed algorithm is shown using the flowchart in Figure 6. The flowchart consists of 14 blocks, and in this section, the functions of each block are discussed briefly.
Figure 6. Flow chart of the dynamic learning algorithm for renewable energy sources power generation prediction.

Blocks 2-6 (Data Collection and Sequential Entry, Feature Scaling and Correlation Analysis, Mode Consecutiveness Check)
In block 2, data from the electric grid (numerical weather data and synchronized plant power generation data) are taken and entered sequentially for dynamic learning purposes. Block 3 performs feature scaling to normalize the variables, which are measured on different scales. Among many normalization methods, such as the standard score, standardized moment, Min-Max feature scaling, etc. [51], Min-Max feature scaling has been selected. Mathematically, it can be expressed as:

x' = (x − x_min) / (x_max − x_min)

The maximum and minimum values of the input (x_max, x_min) and output (Y_max, Y_min) parameters need to be updated at each sample entry, as the system knowledge is not known a priori. In block 4, to choose a Mode from all possible Modes, CC analysis among the variables must be conducted. The consecutiveness of a Mode is checked in block 5. By consecutiveness analysis, if a Mode remains stable/unchanged for
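Block 3's running Min-Max scaling can be sketched per variable as follows; since the system's extremes are not known a priori, the bounds are updated on every new sample before scaling, as stated above:

```python
# Sketch of block 3's Min-Max feature scaling with running bounds. The
# bounds (running_min, running_max) are updated on every sample entry
# because the true system extremes are unknown in advance.

def update_and_scale(x, running_min, running_max):
    """Update running bounds with sample x, then scale x into [0, 1]."""
    running_min = min(running_min, x)
    running_max = max(running_max, x)
    span = running_max - running_min
    scaled = 0.0 if span == 0 else (x - running_min) / span
    return scaled, running_min, running_max
```

Note that when a new sample extends the bounds, previously scaled values become stale; in a dynamic setting the historical data would be rescaled with the updated bounds before retraining.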

Block 1 (System Initialization)
In this block, the designer must decide the maximum possible number of input and output variables, the ANN network configuration (number of hidden layers, number of neurons in each layer, weights, biases, activation function, etc.), metaheuristic optimization hyperparameters, and total number of input combinations.
The total number of input combinations (Modes) can be mathematically expressed as:

Total number of Modes = ∑_{y=1}^{n} nCy = nC1 + nC2 + … + nCn = 2^n − 1

where n is the number of input variables; the number of input variables in any 'Mode' may vary from one to n. A tabular description of the Modes is presented in Table 4.
variables-the number of input variables in any 'Mode' may vary ion of the Mode is presented in Table 4.
binations under different Modes for an arbitrary system. eights, biases, activation function, etc.), metaheuristic optimization hyperparameters, and mber of input combinations. e total number of input combinations (Mode) can be mathematically expressed as:

Inputs
is the number of input variables-the number of input variables in any 'Mode' may vary ne to . A tabular description of the Mode is presented in Table 4. layer, weights, biases, activation function, etc.), metaheuristic optimization hyperparameters, and total number of input combinations.
The total number of input combinations (Mode) can be mathematically expressed as: where is the number of input variables-the number of input variables in any 'Mode' may vary from one to . A tabular description of the Mode is presented in Table 4.  Table 4 can be interpreted as such that if Mode 1 is chosen, input 1 is solely responsible for the system's output change. Similarly, if Mode M is chosen, all the inputs are responsible for changes in the system output variable. layer, weights, biases, activation function, etc.), metaheuristic optimization hyperparameters, and total number of input combinations.
The total number of input combinations (Mode) can be mathematically expressed as: where is the number of input variables-the number of input variables in any 'Mode' may vary from one to . A tabular description of the Mode is presented in Table 4.  Table 4 can be interpreted as such that if Mode 1 is chosen, input 1 is solely responsible for the system's output change. Similarly, if Mode M is chosen, all the inputs are responsible for changes in the system output variable. layer, weights, biases, activation function, etc.), metaheuristic optimization hyperparameters, and total number of input combinations.
The total number of input combinations (Mode) can be mathematically expressed as: where is the number of input variables-the number of input variables in any 'Mode' may vary from one to . A tabular description of the Mode is presented in Table 4.  Table 4 can be interpreted as such that if Mode 1 is chosen, input 1 is solely responsible for the system's output change. Similarly, if Mode M is chosen, all the inputs are responsible for changes in the system output variable. Table 4 can be interpreted as such that if Mode 1 is chosen, input 1 is solely responsible for the system's output change. Similarly, if Mode M is chosen, all the inputs are responsible for changes in the system output variable.
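As an illustration, the Mode enumeration above can be sketched in a few lines. This is a Python sketch (the paper's implementation is in MATLAB), and the function and variable names are hypothetical:

```python
from itertools import combinations

def enumerate_modes(variables):
    """Enumerate all non-empty input combinations (Modes) of the given variables."""
    modes = []
    for y in range(1, len(variables) + 1):
        modes.extend(combinations(variables, y))
    return modes

# Hypothetical input names for the Solar PV case (n = 3)
inputs = ["inclined_irradiance", "surface_temp", "surrounding_temp"]
modes = enumerate_modes(inputs)
# Total number of Modes = 2^n - 1 = 7 for n = 3 input variables
print(len(modes))
```

For the Solar PV case with n = 3 inputs this yields 2^3 − 1 = 7 Modes, and for the WTES case with n = 5 inputs, 2^5 − 1 = 31 Modes, matching the Mode numbering used later in the paper.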

Blocks 2-6 (Data Collection and Sequential Entry, Feature Scaling and Correlation Analysis, Mode Consecutiveness Check)
In block 2, data from the electric grid (numerical weather data and synchronized plant power generation data) are taken and sequentially entered for dynamic learning purposes. Block 3 performs feature scaling to normalize the variables that are measured on different scales. Among many normalization methods, such as standard score, standardized moment, Min-Max feature scaling, etc. [51], Min-Max feature scaling has been selected. Mathematically, it can be expressed as:

X' = (X − X_min) / (X_max − X_min)

The maximum and minimum values of the input (X_max, X_min) and output (Y_max, Y_min) parameters need to be updated at each sample entry, as system knowledge is not available a priori. In block 4, to choose a Mode from all possible Modes, CC analysis among the variables must be conducted. The consecutiveness of a Mode is checked in block 5. By consecutiveness analysis, if a Mode remains stable/unchanged for a predefined number of sample entries (z), the relation between an input and output variable can be considered to have reached the steady-state condition. At this condition, the operator can stop the learning procedure (block 6) and conduct a comparative analysis of the model's performance. In the flow chart, blocks 5, 6, 13, and 14 are represented with dotted boxes. This means these blocks are optional, as one can continue to train the model for an infinitely long sequence of data entry.
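Because the extremes are not known a priori, the Min-Max bounds must be updated with every new sample entry. A minimal Python sketch of such a running scaler follows (the class name and interface are hypothetical; the paper's implementation is in MATLAB):

```python
class RunningMinMaxScaler:
    """Min-Max feature scaling with bounds updated at each new sample,
    since the system's extremes are not known a priori."""

    def __init__(self):
        self.x_min = None
        self.x_max = None

    def update_and_scale(self, x):
        # Update the running bounds with the new sample
        if self.x_min is None:
            self.x_min, self.x_max = x, x
        else:
            self.x_min = min(self.x_min, x)
            self.x_max = max(self.x_max, x)
        # Avoid division by zero until two distinct values have been seen
        if self.x_max == self.x_min:
            return 0.0
        return (x - self.x_min) / (self.x_max - self.x_min)
```

A separate scaler instance would be maintained per input and output variable, so that each sequentially entered sample is normalized against the extremes observed so far.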

Blocks 7-9 (Mode Occurrence Check, Solution Search Space Generation, and Evaluation of Model Performance)
If a Mode occurs for the first time, the algorithm will move to block 8 and generate a random solution according to the random distribution equation [52]. To generate a randomly distributed population programmatically, the following equation can be used:

x→ = x_min + rand_(no. of solutions × 1) · (x_max − x_min)

where x→ is the initial uniformly random generated solution vector, rand_(no. of solutions × 1) is a column vector drawn from the uniform distribution, and x_max and x_min are the boundary values of the initial search space.
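The uniform population generation of block 8 can be sketched as follows (a Python illustration with hypothetical names; the original work uses MATLAB's built-in random number generation):

```python
import random

def generate_population(n_solutions, dim, x_min, x_max):
    """Generate a uniformly random initial population within [x_min, x_max]
    for a Mode's first occurrence (block 8)."""
    return [[x_min + random.random() * (x_max - x_min) for _ in range(dim)]
            for _ in range(n_solutions)]

# e.g. 10 candidate solutions of dimension 4 in the search space (-5, 5)
pop = generate_population(n_solutions=10, dim=4, x_min=-5.0, x_max=5.0)
```

Each row is one candidate solution vector; the bounds (−5, 5) correspond to the initial search space setting reported in the experimental setup.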
If the Mode has occurred m times (m | 1 < m < z), the algorithm will move to block 9. Then, the model performance under the current data entry will be evaluated using the optimal solution from the Mode's previous occurrence, which increases the probability of faster convergence. Block 10 evaluates the error using MSE, and if the MSE is found to be more than a predefined value (eph), the model will go into the optimization process in block 11 (tuning of the MAANN weights and biases). If the error is less than eph, the optimization will be skipped to avoid model overfitting and maintain the model's generalization. The Mode-specific optimal solution prevents the system from generating random solutions periodically; therefore, learning will be faster. The Mode-specific optimal results will be stored in Mode-specific ANN weight and bias matrices and vectors in block 12, ready for reuse when new data arrives.
After stopping the learning procedure, the performance of different models (optimization methods) is evaluated in block 13, and the best model is selected (block 14). For the model performance evaluation, along with the MSE, the following indexes are used. The standard deviation of the error is

SDE = sqrt( (1/l) Σ_{j=1}^{l} (Error_j − Mean Error)² )    (20)

BIC [53] measures the distance between the actual data and the model; the smaller the distance, the better the model. The second equation in (16) is the equivalent of the original equation for neural network-based applications. For the population-based algorithms, complete or near convergence of all the particles to a solution can be defined as the optimal solution. At the optimal condition, the solution boundary will be significantly small, which also yields a smaller standard deviation. Hence, the standard deviation is measured as an index of convergence to the optimal solution. The R-value is used to demonstrate the fitting of the model. Finally, the MAE and SDE are used to measure the error of the model; the smaller the MAE and SDE, the better the model.
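The error indexes above (MSE, MAE, and the SDE of Equation (20)) can be computed as in the following Python sketch (the function name is hypothetical; the paper's evaluation is done in MATLAB):

```python
def performance_indexes(actual, predicted):
    """Compute MSE, MAE, and the standard deviation of the error (SDE, Eq. (20))."""
    errors = [a - p for a, p in zip(actual, predicted)]
    l = len(errors)
    mse = sum(e * e for e in errors) / l
    mae = sum(abs(e) for e in errors) / l
    mean_error = sum(errors) / l
    sde = (sum((e - mean_error) ** 2 for e in errors) / l) ** 0.5
    return mse, mae, sde
```

A perfect model gives (0, 0, 0); a constant bias gives nonzero MSE/MAE but zero SDE, which is why SDE is reported alongside the other indexes.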

Experimental Validations
In this work, the proposed algorithm's effectiveness has been demonstrated by applying it to two different renewable energy sources at two different locations. The wind dataset has been collected from the NREL website [54] for 2012, near New Kirk. The original dataset consists of power, wind direction, wind speed, air temperature, surface air pressure, and air density. The latter five (wind direction, wind speed, air temperature, surface air pressure, and air density) were considered as the input variables, and power was considered as the variable to be forecasted.
Similarly, the dataset was collected from the Republic of Korea's public data website [55] for the Yeongam F1 Stadium for the Solar PV system. The data is obtained for three years and ten months from January 2015 to October 2018. Inclined irradiance, surface temperature, and surrounding temperature from the dataset were considered as the input variables, while plant output power was considered as the predicting variable. The details of the dataset are given in Table 5.

Initialization of Experimental Setup
To initiate the experiments, the chosen ANN structure and the metaheuristic algorithm hyperparameter settings are provided in Tables 6 and 7. These experiments have been conducted on a PC with an Intel® Core™ i7-6700 CPU @ 3.40 GHz in MATLAB 2018a.
The structure (number of weights, biases, hidden layers) and the choice of activation function (hyperbolic tangent) have been taken to be the same as in the gradient descent-based algorithm, which was developed and applied through the MATLAB built-in toolbox. Keeping the same structure makes the systems comparable on the same ground.

According to Figure 6, the initial model's error should be more than a tolerance value (eph) to start the optimization. Error tolerance is a designer parameter, meaning that the designer can choose a relatively small value, below which the optimization cycle will not be initiated.
To choose the error limit/tolerance (eph), the minimum error achieved by applying a gradient descent technique on the ANN for the given dataset has been chosen. To do that, the ANN has been trained exactly 100 times for each dataset, and the minimum error from those simulations is set as the error threshold for the metaheuristic optimization-based method. In Figure 7, the histogram of the error from the gradient descent-based training of the dataset is shown. From the figure, the minimum error for Solar PV and WTES was found to be 0.0682 and 6.2096 respectively, which are used as the error limit/tolerance (eph) for the metaheuristic-based model training.

Dynamic Change in CC and Mode Analysis
The dynamic plotting of the actual and absolute values of the CC of the power output to each input variable is presented in Figures 8 and 9. The CC threshold is also presented in the figures to indicate the change in the dynamics of the CC over time. For the Solar PV system, the actual and absolute CC are both positive; however, for WTES, some quantities take both negative and positive values. Therefore, to maintain uniformity across the models, the absolute value has been considered.
Generally, a CC between ±0.75 and ±1 is considered very strong, ±0.50 to ±0.75 moderate, ±0.25 to ±0.50 weak, and below ±0.25 there is no association [56]. Therefore, in this work, the absolute threshold for the CC analysis has been set at 0.25. Hence, any variable with a CC below 0.25 is discarded as a dominant input variable. Figure 8 shows that the absolute CC of the inclined irradiance and surface temperature always remains above the threshold line. In contrast, the absolute CC of the surrounding temperature varies around the threshold line and becomes a non-dominant variable at the end.
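The dominant-variable selection described above can be sketched as follows. This Python illustration computes Spearman's rank-order CC (ignoring rank ties for brevity) and keeps only inputs whose absolute CC meets the 0.25 threshold; all function names are hypothetical:

```python
def spearman_cc(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks.
    Ties are not averaged here, for brevity of the sketch."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank + 1)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def dominant_inputs(inputs, output, threshold=0.25):
    """Keep only inputs whose absolute CC with the output meets the threshold."""
    return [name for name, series in inputs.items()
            if abs(spearman_cc(series, output)) >= threshold]
```

The surviving input names determine the current Mode; as new samples arrive, the CC values (and hence the Mode) change dynamically, as shown in Figures 8-10.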

For WTES, wind speed and air pressure were mostly above the threshold value, which makes them the dominant variables at the end of model training, whereas the other variables oscillate around the threshold, making them temporary choice variables whose dominance vanishes as the data size grows. According to the actual CC analysis, pressure has a negative CC value, which means that power generation decreases with an increase in pressure.
The data source countries' (South Korea/USA) weather can be classified according to 4 seasons (three months/season). Thus, the consecutiveness of a Mode for three months is considered as the stable dataset size and the training stopping criterion. The training started from January's data, which is the second/third month of winter in the considered countries; thus, considering three months for the stopping criterion overlaps two different seasons. The stopping criterion validates the stability of the dataset according to the use of CC. Analyzing the dynamic CC along with the threshold line, dynamic changes in the Modes can be obtained from Figure 10. For the solar PV system, the occurred Modes are 5 (inclined irradiance, surrounding temperature), 7 (inclined irradiance, surface temperature, surrounding temperature), and 4 (inclined irradiance, surface temperature). Among them, the stable Mode was 4. For WTES, the occurred Modes are 31 (direction, speed, temperature, pressure, density), 24 (speed, pressure, density), 11 (speed, pressure), and 30 (speed, temperature, pressure, density). Among them, 11 is the stable one. The complete Mode table is given in Appendix A, where Table A1 is for Solar PV and Table A2 is for WTES. Greyed boxes represent the Modes that appeared during the training process.
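The stopping criterion above (a Mode remaining unchanged for z consecutive sample entries, e.g. three months' worth of data) can be sketched as a small check (Python illustration with hypothetical names):

```python
def mode_is_stable(mode_history, z):
    """Return True if the most recent Mode has remained unchanged for the
    last z sample entries (the training-stopping criterion of block 5)."""
    if len(mode_history) < z:
        return False
    return len(set(mode_history[-z:])) == 1
```

For the Solar PV case, for example, the Mode history ends in a long run of Mode 4, so the check eventually fires and training stops around the 290th day.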

Data Entry (Episode)-Wise Optimization Algorithms Performance Comparison
Each entry or arrival of new data and the corresponding cycle of model optimization has been defined as an episode. In Figure 11, the episode-wise error reduction performance of the three algorithms is shown simultaneously. Figure 11 shows that the models optimized with the different algorithms reach acceptable error performance towards the end of the training period. Due to its three-level optimization process (randomization, exploration, exploitation), FTMA performs considerably better than the other two metaheuristic algorithms in both cases, whereas the Jaya algorithm, being relatively simple with no hyperparameters, could not reach better solutions than the other two algorithms. It should be mentioned that, at times, the error also increased due to the appearance/reappearance of a new/previous Mode, because at the first appearance of a new Mode, optimization starts with a random solution vector, which may not be optimal. Also, the reappearance of a Mode after a long time (meaning it has not been trained with the recent data entries) may produce a larger error. However, as the training dataset size increases, with multiple training loops, the error reduces gradually.
The error reduction performance of each optimization algorithm can be understood more clearly from the episode-wise winner analysis graph of Figure 12. Upon the arrival of each new data entry (or set of entries), the optimization method with the best error reduction performance can be selected as the episode winner after performing the training. For the episode-wise winner analysis, each algorithm was assigned a weighted value: APSO, Jaya, and FTMA were assigned 1, 2, and 3, respectively. For both cases (Solar PV and WTES), FTMA won the maximum number of episodes. The numbers are shown in Tables 8 and 9.

Train/Test Status of the Algorithm
The train/test status indication figure proves the effectiveness of the proposed algorithm and FTMA. As the best error performance of the gradient descent algorithm has been taken as the error limit, if the error reduction by the metaheuristic algorithms is found to be better, the MAANN will avoid optimization. This indicates that the trained models show better performance than the conventional approach (gradient descent). Figures 11 and 12 show that FTMA outperforms the other two algorithms; hence, only the train/test status of the FTMA algorithm is shown in Figure 13. Train/test status analysis is essential for understanding how the model can participate in the electricity market and earn more profit. The data specification table shows that the Solar PV data were collected for 3 years and 10 months, and the WTES data were collected for one year. However, using the data stability condition, the Solar PV and WTES training were stopped around the 290th and 177th days respectively, which is far less than the actual dataset size.
Also, from Figure 13, it can be found that, due to efficient training using the FTMA algorithm, the training cycle can be avoided in many instances (shown as the 0 state in the figure). This means that, as the error is reduced, the model can successfully participate in the electricity market and earn profit. However, participation should be stopped once the model error exceeds the error threshold, and the model should be reoptimized. Following that, if the error for the next entries goes below the threshold, the model will again participate in the market. This phenomenon can be explained by Figure 13b. For example, the model error of the WTES system around the 96th-146th day is less than the threshold; therefore, the model can participate in the market and earn a profit, but after the 146th day, the error again exceeds the threshold. Hence, the model should be reoptimized and stop participating in the electricity market. Again, around the 157th day, the error is reduced to an acceptable limit. Consequently, the energy producer can participate in the market again.


Time Analysis of the FTMA
As metaheuristic algorithms are population-based methods, they may take a long time to reach an acceptable solution. Thus, the training time of the metaheuristic algorithms is always a concern for designers. FTMA has the better error reduction performance and is the winner of the maximum number of episodes; therefore, the time required by the algorithm with FTMA optimization is shown in Figure 14. It is understandable that the training time increases with the data size. Thus, the MATLAB parallel computing toolbox has been used to reduce the optimization time. The use of the MATLAB parallel computing toolbox keeps the maximum training time at 91 seconds for the Solar PV and 168.3123 seconds for the WTES, whereas the mean training times were 26.13 and 48.84 seconds, respectively. The training time peaks occur due to the detection and removal of duplicate solutions to keep the effective number of solutions uniform. It is found that the model's average training time is much smaller than the data sample interval (1 hour for Solar PV and 5 minutes for WTES). Hence, the algorithm can be easily applied to the system.

Solution Convergence Analysis of Different Optimization Algorithms
The convergence of the solution vectors' population can be seen from the box-plot analysis of Figure 15. Compared to the APSO and Jaya algorithms, the population of the solution vector of FTMA has reached the optimal state for the Solar PV system. As the population of FTMA is located very close together, the standard deviation for FTMA (0.0075) is much smaller than for APSO (13.0717) and the Jaya algorithm (3.3917). For WTES, Jaya and FTMA are found to have a better convergence state compared to APSO. Although FTMA achieved the minimum error, to avoid duplicate solutions, one of the solutions (red cross) appeared a little away from the solution cluster. Thus, the standard deviation of FTMA (65.0949) is larger than that of the Jaya algorithm (12.9087).

Comparison of Training and Test Dataset
In this subsection, the training and testing set results have been shown in Figures 16 and 17, respectively. Blue lines are used as the original data, and the brown lines are used as the output of different trained models. The errors between the actual outputs and the trained model outputs are calculated using MSE, RMSE, and MAE. Corresponding values for training and testing datasets are shown in Tables 8 and 9. Also, using the original and trained model data, R-value is calculated using Equation (18) and presented in Tables 8 and 9. For both cases, FTMA and APSO perform relatively better than Jaya and gradient descent-based algorithms. In addition to the gradient descent-based method, the algorithms were compared with ARIMAX, a non-neural network-based algorithm. ARIMAX was implemented with the econometric toolbox in MATLAB.
As in this work a MATLAB-based toolbox is used for the base case (gradient descent) training, and by default the toolbox partitions 15% of the data as a testing dataset, whenever the algorithm termination criteria are satisfied based on the data stability, the algorithms' performances are validated on the testing dataset, i.e., an additional 15% of data points beyond the end of the training data. For example, the Solar PV system training stopped at the 290th day; hence, the testing horizon is considered as 290 + 290 × 0.15 ≈ 334 days, and for WTES, 177 + 177 × 0.15 ≈ 204 days. Therefore, Figure 16 shows the performance of the trained model on the training dataset (0-290th day for Solar PV and 0-177th day for WTES), and Figure 17 shows the performance of the same model on the testing dataset (290th-334th day for Solar PV and 177th-204th day for WTES). It should be mentioned that the testing data horizon is a designer parameter that can be varied, and the performance of the algorithm can be analyzed accordingly. In practice, the training and testing cycle should occur according to the methodology discussed in Section 4.4.

Tabular Comparison
The performance and robustness of the algorithm and the optimization methods are verified using multiple performance indexes (MSE, RMSE, SDE, MAE, BIC, standard deviation, R-value, and the number of episode wins) and shown in Tables 8 and 9. The grey boxes show the best performance achieved among the algorithms. RMSE is simply the square root of MSE: the unit of MSE is the squared unit of the output quantity (MW²), whereas RMSE has the same unit as the output (MW), which makes it more easily comprehensible. In terms of MSE, RMSE, MAE, SDE, R-value, and the number of episode wins, FTMA performed better than the other algorithms in both cases (Solar PV and WTES). As mentioned in Section 3, the smaller the BIC, the better the model. However, a careful look at Table 8 shows that the Solar PV system's BIC is of negative order (−1 × 10⁴). As the sign convention is negative, the bigger the positive value inside the table, the better the model. Hence, FTMA was found to be better in terms of BIC for the Solar PV system. However, for the BIC of the WTES system, ARIMAX performed better than the other algorithms.
Also, the standard deviation, as an index for the convergence of the optimization algorithms, shows that FTMA performed better in the case of the Solar PV system, whereas for the WTES, the Jaya algorithm performed better. As the gradient descent-based algorithm and ARIMAX are not population-based methodologies, their standard deviation and episode-wise winner blocks are left blank. By numerical calculation, it can also be found that the error using FTMA is reduced by (0.0682 − 0.0196)/0.0682 × 100 = 71.261% and (6.2096 − 1.21)/6.2096 × 100 = 80.514% compared to the gradient descent algorithm in the cases of Solar PV and WTES, respectively. Therefore, it can be said that the proposed algorithm with a suitable optimization algorithm, FTMA in this case study, can perform better than the conventional gradient descent or ARIMAX algorithms. The major advantage is that, as the proposed algorithm is a dynamic approach, the model can start participating in the electricity market from the beginning, depending on the error performance. It is also found that training accuracy increases and learning time decreases (train/test status) due to dynamic learning.
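The error-reduction arithmetic above can be reproduced directly; the baseline and FTMA error values are those quoted in the text:

```python
def reduction_pct(baseline: float, value: float) -> float:
    # Percentage reduction of an error value relative to a baseline error.
    return (baseline - value) / baseline * 100

solar = reduction_pct(0.0682, 0.0196)   # Solar PV: gradient descent vs. FTMA
wtes = reduction_pct(6.2096, 1.21)      # WTES: gradient descent vs. FTMA
print(f"{solar:.3f}%  {wtes:.3f}%")     # 71.261%  80.514%
```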

Conclusions and Future Work
In this work, a Mode Adaptive Artificial Neural Network has been proposed for the dynamic learning of renewable energy sources power generation prediction. Dynamic learning using Spearman's rank-order correlation provides a logical way to choose the correct dominant inputs over time and was also found to be a suitable tool for data stability analysis. As the model dynamically learns the system over a long period, separate modeling strategies for different forecasting horizons (short, medium, long) can be avoided. FTMA was found to be the best among the optimization algorithms due to its three-level algorithmic architecture, and it provided better accuracy in every cycle.
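The role of Spearman's rank-order correlation in selecting dominant inputs can be illustrated with a minimal sketch; the input names and sample values are hypothetical, and ties in the data are ignored for simplicity:

```python
def spearman(x, y):
    # Spearman's rank-order correlation (no-ties form):
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference.
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical samples: a perfectly monotonic input-output pair scores 1.0,
# so this input would rank as dominant.
irradiance = [200, 450, 600, 800, 950]
power = [0.5, 1.1, 1.6, 2.2, 2.6]
print(spearman(irradiance, power))  # 1.0
```

An input whose correlation magnitude stays high over successive data windows would be kept as a dominant input for the current mode.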
The use of multiple performance indexes shows the robustness of the proposed algorithm using FTMA over the other algorithms. Also, using FTMA, the error was reduced by 71.261% and 80.514% compared to the fixed-sized dataset model trained using the gradient descent algorithm. This also demonstrates the benefit of dynamic/online learning over the conventional fixed-sized dataset learning approach.
One of the major concerns of the proposed algorithm is the choice of the optimization algorithm and the respective hyperparameters. An increase in the population size and number of generations may yield better results, but at the cost of time. In the future, unlike this work, the verification of the proposed model should be conducted for multi-plant forecasting model development. In this work, testing was performed with instantaneous values of the input variables, which in the future can easily be extended to delayed input variables by performing auto- and cross-correlation analysis.
In the future, this algorithm can be applied to an actual system installed in any geographical location; the model will be trained and will participate in the electricity market without prior knowledge or training. In this work, the considered input variables were all continuous; in the future, systems with discrete variables can be considered by incorporating discrete-to-continuous CC analysis tools. Also, in contrast to the current approach, a unified feature reduction and forecasting approach can be taken in the future, which will reduce the model complexity. The proposed algorithm can also be applied to any general case of real-time model learning, such as the share market. The algorithm can also be applied to load and EV modeling, as they depend on many external factors (variables). In brief, it can be concluded that the proposed algorithm can break the barrier of fixed-sized data training and help the operator gain more profit through early participation in the market. Also, as a simple ANN architecture has been adopted in this problem, it can be applied in any developing/under-developed country where high-performance computers for DL technologies are expensive.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A

Table A1. Mode selection table for the Solar PV system.
(Columns: Mode Number; Inputs: Inclined Irradiance, Surface Temperature, Surrounding Temperature.)

Table A2. Mode selection table for the WTES.
(Columns: Mode Number; Inputs: Direction, Speed, Temperature, Pressure, Density.)