A Hybrid Strategy Integrating Artificial Neural Networks for Enhanced Energy Production Optimization

Lachheb, Aymen; Akoubi, Noureddine; Ben Salem, Jamel; El Amraoui, Lilia; BaQais, Amal

doi:10.3390/en18225941

Open AccessArticle

A Hybrid Strategy Integrating Artificial Neural Networks for Enhanced Energy Production Optimization

by

Aymen Lachheb

¹

,

Noureddine Akoubi

¹

,

Jamel Ben Salem

¹,

Lilia El Amraoui

^2,*

and

Amal BaQais

³

¹

Research Laboratory in Smart Electricity and ICT (SE&ICT), LR18ES44, National Engineering School of Carthage, University of Carthage, Charguia II, Tunis 2035, Tunisia

²

Department of Electrical Engineering, College of Engineering, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia

³

Department of Chemistry, College of Science, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia

^*

Author to whom correspondence should be addressed.

Energies 2025, 18(22), 5941; https://doi.org/10.3390/en18225941

Submission received: 18 September 2025 / Revised: 23 October 2025 / Accepted: 28 October 2025 / Published: 12 November 2025

Download

Browse Figures

Versions Notes

Abstract

This paper presents a novel, robust, and reliable control strategy for renewable energy production systems, leveraging artificial neural networks (ANNs) to optimize performance and efficiency. Unlike conventional ANN approaches that rely on perturbation-based methods, we develop a fundamentally different ANN model incorporating equilibrium points (EPs) that achieve superior regulation of photovoltaic (PV) systems. The efficacy of the proposed approach is evaluated through comparative analysis against the conventional control strategy based on perturb and observe (MPPT/PO), demonstrating a 3.3% improvement in system efficiency (98.3% vs. 95%), a five times faster response time (6 s vs. 30 s), and six-fold reduction in voltage ripple (1% vs. 5.95%). A critical aspect of ANN-based controller design is the learning phase, which is addressed through the integration of deep reinforcement learning (DRL) for primary PV system control. Specifically, a hybrid control architecture combining the Artificial Neural Network based on Equilibrium Points (ANN/EP) model with DRL (ANN/PE-RL) is introduced, utilizing a synergistic integration of two reinforcement learning agents: Twin Delayed Deep Deterministic Policy Gradient (TD3) and Deep Deterministic Policy Gradient (DDPG). The TD3-based hybrid approach achieves an average reward value of 434.78 compared to 422.767 for DDPG, representing a 2.84% performance improvement in tracking maximum power points under imbalanced conditions. This hybrid approach demonstrates significant potential for improving the overall performance of grid-connected PV systems, reducing energy losses from 1.95% to below 1%, offering a promising solution for advanced renewable energy management.

Keywords:

artificial neural network (ANN); ANN/PE; MPPT/PO; photovoltaic systems; reinforcement learning; agents RL; TD3 and DDGP agents; optimization of energy production

1. Introduction

Electric energy is a critical driver of economic development in nations worldwide, playing a fundamental role across industrial, agricultural, transportation, and service sectors [1]. Increasing energy production contributes directly to economic growth, wealth creation, and improvements in the quality of life [2,3,4].

In this context, diversifying energy production sources has become a strategic priority. It reduces the risk of supply disruptions, enhances energy security, provides greater flexibility to accommodate fluctuating energy demands, and lowers electricity generation costs [4,5].

Despite significant progress in MPPT and intelligent control techniques, several challenges remain unaddressed in photovoltaic (PV) systems. Conventional methods such as Perturb and Observe (PO) or Incremental Conductance (IC) suffer from oscillations around the maximum power point (MPP) and poor adaptability under fast-changing irradiance. Intelligent approaches like fuzzy logic and standard Artificial Neural Networks (ANNs) have improved tracking accuracy but still depend heavily on data quality and may experience slow convergence or instability when exposed to untrained conditions.

These limitations highlight the need for a more robust and adaptive control framework capable of ensuring both fast convergence and strong generalization.

While various intelligent methods have been proposed for MPPT, they often fall into two categories with inherent limitations. On one hand, simple Artificial Neural Network (ANN) controllers, such as the ANN variant, offer fast tracking but often suffer from poor generalization and can get trapped in local optima under rapidly changing conditions. On the other hand, traditional Deep Reinforcement Learning (DRL) models provide high adaptability but require extensive, complex reward engineering and can struggle with the smooth, high-speed regulation required for MPPT. This paper addresses this crucial gap by proposing a novel hybrid strategy, the ANN/PE-RL. This architecture uniquely leverages the ANN/PE model’s inherent stability and robust regulation capabilities to handle the steady-state control, while simultaneously utilizing DRL’s model-free, adaptive learning for the primary, dynamic control phase. This synergistic collaboration provides an intelligence layer that is both highly robust and highly adaptive, overcoming the limitations of previous pure-ANN or pure-DRL approaches in PV system management.

The proposed Artificial Neural Network based on Equilibrium Points (ANN/PE) addresses these gaps by using equilibrium-point data to train the model, providing better coverage of the system’s nonlinear behavior. This methodology aims to enhance dynamic tracking accuracy, reduce steady-state oscillations, and improve the overall efficiency of PV energy conversion under variable environmental conditions.

The main contributions of this work can be summarized as follows:

Development of an improved hybrid control strategy that integrates an ANN/PE with a deep reinforcement learning framework to enhance the performance of photovoltaic energy optimization.
Formulation of a systematic equilibrium-point selection method for ANN training, enabling accurate representation of the nonlinear behavior of photovoltaic systems and improving the network’s generalization capability under variable operating conditions.
Design and validation of an ANN/PE-based MPPT controller capable of achieving faster convergence and reduced oscillations compared to the conventional Perturb and Observe (PO) method when subjected to variable irradiance profiles.
Comprehensive comparative analysis between the proposed ANN/PE controller and classical MPPT techniques through detailed simulations, demonstrating superior tracking efficiency, dynamic stability, and adaptability to environmental changes.

These contributions collectively advance the field of intelligent renewable energy control by providing a robust and adaptive MPPT approach that enhances energy conversion efficiency and operational reliability in photovoltaic systems.

The following section outlines the technical procedures investigated by researchers to optimize renewable energy production, providing insights into state-of-the-art strategies and methodologies in the field.

To enhance the efficiency of PV systems, this paper introduces a novel, robust, and reliable control approach based on artificial neural networks (ANNs). This approach aims to optimize the regulation of PV systems, thereby improving their overall performance.

The remainder of this paper is organized as follows.

Section 2 presents the theoretical framework and reviews the main control strategies applied to photovoltaic (PV) systems, highlighting their limitations.

Section 3 describes the overall system architecture and the modeling of the PV generator.

Section 4 details the proposed ANN/PE methodology, including the procedure for generating equilibrium points and training the neural network.

Section 5 introduces the integration of the reinforcement learning (RL) agent used to optimize the dynamic adaptation of control parameters and improve real-time performance.

Section 6 presents comparative studies between ANN/PE and MPPT/PO controllers.

Section 7 discusses the simulation results and system performance evaluation. Finally, Section 8 concludes the paper and outlines future research perspectives.

2. Literature Review Energy Production Optimization

Several researchers use the term electricity production to refer to the production of electricity using different technologies [6].

Optimizing energy systems makes it possible to size and better manage energy generation and storage, with various objectives that may be related to the cost or environmental impact of the system. Energy production optimization is crucial to meet consumption demands. It focuses on maximizing energy output by integrating multiple renewable energy sources [7,8].

Several studies have focused on optimizing energy production by combining different renewable energy sources. Among, the most recent A.N. Abdalla et al. (2021) [9] worked on Integration of energy storage systems and renewable sources based on artificial intelligence. Their work provided a comprehensive review on the applications of artificial intelligence regarding the optimization system configuration, control strategy, and energy applicability.

Azim Heydari et al. (2023) [10] proposed a multi-objective intelligent optimization approach for hybrid renewable energy systems in microgrids. Their strategy, designed for a PV-wind-diesel-battery hybrid system, incorporates technical, economic, and reliability factors to optimize renewable energy integration in microgrid systems.

Neha Athia et al. (2024) [11] explored the hybridization of a solar photovoltaic system, a wind power plant, a proton exchange membrane (PEM) electrolyzer and a PEM fuel cell. Through practical experimentation, they proposed a strategy to optimize energy production while minimizing environmental impact, particularly against chemical effects.

Maximum Power Point Tracking (MPPT) techniques are employed in renewable energy systems specifically to maximize the generated power. They accomplish this by constantly monitoring and adjusting system parameters to locate and track the Maximum Power Point (MPP), a dynamic and inherently difficult process. Several advanced control algorithms have been proposed in the literature to improve MPPT performance in PV systems. For instance, Mazen et al. in reference [12] achieved a tracking efficiency of approximately 98.6% with tracking speed of 0.025 ms using a modified Perturb and Observe (PO) method under fast-changing solar irradiation, while Sun et al. reported a 0.2 s convergence time with an Incremental Conductance (IC) algorithm [13]. Rezk et al. introduced an adaptive fuzzy-logic-based MPPT method that achieves accurate PV system output power with a smooth, low-ripple profile [14]. This method offers fast dynamics, reaching a steady state within 0.01 s, but it requires complex rule tuning. In comparison, ANN-based controllers such as those of Rizzo et al. demonstrated faster response (≈0.3 s) and higher accuracy, though their performance strongly depends on training data quality [15]. Indeed, this tracking problem remains the subject of ongoing research to this day. Moreover, several methods have been developed and applied. The following presents some of the most recent works:

Zaheda Sultana et al. (2024) developed a novel optimized hybrid MPPT controller for fuel cell systems with a DC–DC converter [16]. The researchers referred to the fuel cell module involving a phosphoric acid electrolyte for a 400 kW stationary power generation application. They also incorporated Kalman filter technology to ensure accurate MPP tracking of the proposed system [17].

Hamid Khan et al. (2024) [18] developed an advanced MPPT strategy based on the conventional Ripple Correlation Control (RCC) method, to improve solar energy harvesting under partial shading conditions. Through practical experimentation, the researchers demonstrated the effectiveness of the algorithm in achieving the global maximum power point (GMPP).

Mourad Guediri et al. (2024) [19] proposed an optimization strategy using a genetic algorithm for a wind energy system applied to a doubly fed induction generator. The study focuses on enhancing wind energy production. The study presents a power maximization model by developing a power maximization model, which incorporates a MPPT controller to optimize turbine speed. The novel approach integrates genetic algorithm concepts into the control technology of the wind turbine system, enhancing its optimization capabilities.

These results highlight that, while conventional and intelligent controllers have improved MPPT performance, they often face trade-offs between accuracy, convergence speed, and computational cost. To address these limitations, the present study introduces an Artificial Neural Network based on Equilibrium Points (ANN/PE), which enhances model generalization and provides a faster, more stable tracking response under dynamic irradiance conditions.

Recent research has increasingly focused on applying artificial intelligence to energy management and optimization in multi-energy systems. For instance, safe policy learning and coordinated optimization frameworks have been successfully applied to multi-energy microgrids that integrate hydrogen and battery storage systems [20,21]. These studies demonstrate the potential of reinforcement learning (RL) for improving long-term decision-making and dynamic control. However, most of these approaches are designed for large-scale microgrids or hybrid energy storage systems, and they do not address the specific challenges of real-time MPPT in photovoltaic (PV) systems, where fast adaptation to irradiance and temperature variations is critical.

Research has explored key aspects of renewable energy control, though gaps remain. Specifically, the work in [20] focuses on hybridization for production optimization but lacks intelligent strategies for energy management and efficiency maximization. Similarly, ref. [21] proposes MPPT strategies for a renewable source but does not address system diversification. Combining these two approaches is further complicated by the inherent complexity of sources like wind and chemical energy.

In contrast, this study introduces a novel collaborative hybrid control strategy to overcome these limitations. It combines an Artificial Neural Network based on Equilibrium Points (ANN/PE) for nonlinear modeling and prediction with a Deep Reinforcement Learning (DRL) agent (specifically Twin Delayed Deep Deterministic Policy Gradient, TD3) for adaptive policy optimization. The ANN component captures the static nonlinear characteristics of the PV system, while the DRL agent dynamically adjusts control actions based on environmental feedback. This cooperative ANN–DRL framework effectively bridges the gap between model-based prediction and data-driven decision-making, providing a novel intelligent control paradigm for PV energy management.

To investigate the effectiveness of the ANN/PE component, a comparative study is also conducted against the classical MPPT/PO and approach.

3. Optimization of Energy Production by Artificial Neural Network

3.1. Artificial Neural Network Fundamentals

A neural network is a machine learning model inspired by the structure and operation of the human brain. It consists of interconnected processing elements, called neurons, organized into layers: an input layer, one or more hidden layers, and an output layer. Each neuron receives input signals, multiplies them by corresponding connection weights, adds a bias term, and then applies an activation function to produce an output. The operation of a single neuron can be mathematically expressed as:

Y = f(ω₁x₁ + … ω_nx_n + b₀),

(1)

where xi represents the input variables, ω_i the connection weights, b₀ the bias term, and f(·) the activation function that introduces nonlinearity into the model.

This formulation follows the conventional neuron model described in standard artificial neural network literature [22].

Figure 1 shows a simplified structure of a Neural Network, the model receives multiple inputs (x₁ to x₃), each modulated by corresponding weights (ω₁ to ω₃) that represent synaptic strengths. The weighted inputs are aggregated through summation (Σ) with an additional bias term (b₀) that adjusts the activation threshold. This combined signal is processed by a non-linear activation function (f) to produce the final output (y)

An artificial neuron performs two successive operations. The first operation involves calculating the weighted sum of the input variables xi, along with a bias b₀. A function f, typically non-linear, is then applied to this sum to produce the output y. An artificial neural network is an input-output structure primarily composed of a set of interconnected artificial neurons [23,24].

The development of an artificial neural network involves several essential steps, including the collection of training data, the selection of the neural network architecture, the training process, and, finally, the testing and validation of the designed model. This artificial neural network (ANN) model, which functions as a behavioral model, is composed of weighting subsystems that combine the input data and inject them into activation functions. This operation is performed across one or more layers in series, where the information is transmitted until it reaches the output of the artificial neural network. The weighting parameters of the ANN are adjusted to identify a set of output variables [25]. However, the prediction error of this ANN model, which is inevitable, is sensitive to the training data [26]. This source of uncertainty presents a major obstacle to the application of artificial neural network theory in the control of industrial systems.

3.2. Procedure for Designing an ANN/PE Model

The objective of this approach is to control and reduce the prediction error of the ANN model in order to make it suitable for the control of industrial systems, particularly the control of renewable energy production systems. This approach focuses on the analysis and structuring of the training data, which are represented as a set of equilibrium points. Subsequently, the synthesis of the ANN/PE relies on the optimal distribution of these points within the operational space of the system. The detailed procedure for synthesizing this ANN/PE in the context of supervised learning will be presented in the following paragraph.

This approach entails the definition of equilibrium points distributed throughout the operational space of the system. An equilibrium point is formed by the couple (

\bar{X}

,

\bar{Z}

), where

\bar{X}

is the vector of input variables and

\bar{Z}

is the vector of output variables. The operating space of the system is defined by

Ω_{\bar{X}}

, which represents the extent of the input vector. The input variables

\bar{X}

, which span the operating space according to a given distribution and are associated with the output variables represented by the vector

\bar{Z}

, constitute the set of equilibrium points. These equilibrium points will be used as a database for training an artificial neural network.

To apply this approach, two elements need to be defined: the size of the training data that is, the number of equilibrium points, and the arrangement of these equilibrium points within the operational space. This methodology allows for a thorough analysis of the performance of the artificial neural network, which can then be further optimized. The following Figure 2 presents the new procedure for designing the artificial neural network, incorporating the proposed approach.

The proposed approach ANN/PE synthesis procedure consists of three phases. The first phase is designed to assess the performance of the artificial neural network during the third phase.

The second phase focuses exclusively on choosing the structure of the artificial neural network and selecting the learning algorithm to start the synthesis process. In case of unsatisfactory performance of the ANN/PE model, it is possible to repeat the first phase by adjusting the size of the training data and repeating the synthesis procedure. This approach is based on the precise definition and control of the distribution of equilibrium points used in the synthesis of artificial neural networks to develop a high-performance control model. It involves careful identification of the training data, aiming to minimize the prediction error inherent in the developed model.

The evaluation of the neural network at the equilibrium points only reflects the performance of the algorithm used to perform the learning in other words, its ability to adjust the parameters of the ANN to reproduce the equilibrium points. The main objective of an ANN/PE model is to predict an operating point that lies outside the equilibrium points. In this context, Figure 3 shows the point furthest from the equilibrium points.

If

Z^{(1)}

represents the approximation of the value of the variable

{\bar{z}}^{(1)}

by the artificial neural network, the triplet

\{x_{j}^{(1)}, x_{k}^{(2)}, {\bar{z}}_{j, k}^{(1)}\}

forms an equilibrium point. The number of equilibrium points is:

P = N \cdot M

such as,

j \in [1, N]

and

k \in [1, M]

.

With:

x_{m i n}^{(1)}

= 0,

x_{M A X}^{(1)}

= 1,

x_{m i n}^{(2)}

= −1 et

x_{M A X}^{(2)}

= 1.

The workspace represents the control robustness of the energy system taking into account the critical operating state. Indeed, Figure 4 explains the notion of a critical point.

The operating point

\{x_{α}^{(1)}, x_{β}^{(2)}, z_{α, β}^{(1)}\}

is located at an extreme position (the furthest point) relative to the four surrounding equilibrium points. The predictive capability of this operating point reflects the ability of the ANN/PE model to identify it with high accuracy.

The application of the ANN/PE approach in the context of supervised learning requires prior knowledge of the equilibrium points. In this example, we choose

P = 25

, such that

N = 5

and

M = 5

. To achieve a uniform distribution of equilibrium points within the operational space, the variation step of each component of the input vector

\bar{X}

. must be defined. This step determines the spacing between adjacent equilibrium points along each input dimension.

The variation step of the first input variable

x^{(1)}

is given by Equation (2).

Δ x^{(1)} = \frac{{x_{M a x}}^{(1)} - {x_{m i n}}^{(1)}}{N} = 0.25

(2)

The variation step for the second input variable

x^{(2)}

is given by:

Δ x^{(2)} = \frac{{x_{M a x}}^{(2)} - {x_{m i n}}^{(2)}}{M} = 0.5

(3)

where

{x_{M a x}}^{(i)}

and

{x_{m i n}}^{(i)}

represent the maximum and minimum limits of the ith input variable, respectively, while N and M denote the number of subdivisions (intervals) for each dimension of the input space.

In this study, the ranges of the input variables are defined as:

{x_{m i n}}^{(1)} = 0, {x_{M a x}}^{(1)} = 1, {x_{m i n}}^{(2)} = - 1, {x_{M a x}}^{(2)} = 1,

(4)

The equilibrium points are determined analytically from a specific activation equation.

3.3. ANN/PE Network Structure and Training Process

The proposed Artificial Neural Network based on Equilibrium Points (ANN/PE) was implemented using a feedforward multilayer architecture consisting of three layers: one input layer, one hidden layer, and one output layer. The input layer receives two normalized variables, solar irradiance (G) and cell temperature (T), while the output layer predicts the corresponding maximum power point voltage (V_MPP).

The hidden layer contains 10 neurons, determined experimentally to balance model accuracy and computational complexity. The tanh activation function was selected for the hidden layer due to its ability to capture nonlinear relationships between environmental variables and PV output. The output layer employs a linear activation to directly produce continuous output values.

Training data were generated from simulated equilibrium points uniformly distributed over the PV operating domain. Each input variable was normalized to the range [0, 1] to ensure balanced training. The network was trained using the Adam optimization algorithm with a learning rate of 0.001, mean squared error (MSE) as the loss function, and a batch size of 32 over 500 epochs.

Hyperparameters (number of neurons, learning rate, and batch size) were tuned through a grid search procedure to minimize validation loss. The final model achieved convergence with an average training MSE below 10⁻⁴, demonstrating stable learning and strong generalization capability to unseen operating conditions.

4. Optimization of the Production of a Photovoltaic System by the ANN/PE Approach

The general structure of an isolated photovoltaic system (SVPI) controlled by the MPPT technique is illustrated in Figure 5 below.

The MPPT controller is used to optimize the productivity of a photovoltaic source composed of a solar panel array by tracking the MPP in real time. The control action is driven by the input vector S_IN, which is generally defined by at least two distinct signals. Depending on the specific MPPT control method employed, these input signals may be electrical (V_PV, I_PV), meteorological (CR, CT), or a combination of both. In the literature [27,28], several methods are described for implementing the MPPT control technique. These methods can be classified based on different criteria. They can be grouped into three basic categories, [29,30,31,32]: Traditional methods (Perturb and Observe method PO, Incremental Conductance method (IC) [33], Advanced methods (Ripple Correlation Control), Intelligent methods (Fuzzy logic), Backstepping Super-Twisting [34] Improved PO method [35,36,37].

The application of the ANN/PE approach for ensuring MPPT control will depend on the specific structure of the controller. This study proposes the examination of two control structures incorporating the ANN/PE approach. For the first structure, we use input variables such as solar irradiation (CR) and ambient temperature (CT) generating the duty cycle (RC) at the output. In this case, we define an equilibrium point by the triplet {CR, CT, RC}. For the second structure, we choose solar irradiation (CR) and ambient temperature (CT) as input variables. to generate the reference voltage (V_ref). This latter structure, which integrates a PI regulator, is based on the equilibrium point defined by the triplet {CR, CT, V_ref} for ANN/PE controller synthesis. The following paragraph will detail the approach to applying the ANN/PE approach. For the control of an isolated photovoltaic system (SPVI) according to the two proposed structures.

5. ANN/PE Controller Implementation for Isolated PV Systems

The implementation of an ANN/PE controller involves synthesizing a behavioral model of one or more elements of the control system. The essential synthesis steps include selecting the controller structure, followed by identifying the inputs/outputs, and finally constructing the learning database, either from experimental data or from analytical data. This paragraph presents the main synthesis solutions for an ANN/PE controller. These solutions are context-specific and depend on the availability of learning data.

5.1. Synthesis of an ANN/PE Controller Based on Experimental Data

The synthesis of an ANN/PE controller from experimental data involves using real measurements from an operating photovoltaic system. This experimental data can be collected using appropriate sensors or measurement instruments. They represent the actual behavior of the system under different operating conditions. By using this data, it is possible to design an ANN/PE model by training the neural network with the corresponding inputs and outputs. In this way, the ANN/PE controller learns to replicate the system’s behavior from experimental data, allowing for precise adaptation of the ANN/PE model to the variations in the real system.

5.2. Synthesis of an ANN/PE Controller Based on Analytical Data

The synthesis of an ANN/PE controller from analytical data relies on the use of a mathematical model of the photovoltaic system and the technical data provided by the manufacturer. These analytical data are typically based on mathematical equations that describe the behavior of the photovoltaic system. They may include parameters such as the characteristics of the solar panel, weather conditions, energy losses, and so on. By using this analytical data, it is possible to construct an ANN/PE model by training the neural network with the corresponding inputs and outputs. The ANN/PE controller thus designed can accurately identify the Maximum Power Point (MPP), providing efficient control of the photovoltaic system.

5.3. Comparative Studies of Two Approaches ANN/PE and MPPT/PO for the Control of an SPVI

In this subsection, a comparative study is carried out between the proposed ANN/PE and MPPT/PO. Both networks were trained and tested under identical irradiance and temperature conditions to evaluate their tracking accuracy and dynamic performance.

The model of an isolated PV system, controlled by the MPPT/PO method, is illustrated in Figure 6 below.

In this model, the vector S_IN is formed by the two electrical variables V_PV and I_PV. The MPPT controller conforms to the model shown in Figure 5 in the previous paragraph. The PO algorithm used for controlling the power generated by the SPVI, Po is one of the most widely used MPPT techniques, operates by perturbing the PV array voltage and observing the resulting power variation to determine the direction toward the maximum power point. Due to its simplicity and popularity, its detailed flowchart is not presented here.

The structure of the two studied controllers is defined by the following figures. Specifically, the internal architecture of the ANN/PE controller is illustrated in Figure 7.

The internal structure of the classic MPPT/PO controller is depicted in Figure 8.

For the proposed ANN/PE sub-controller, the input variables are the solar irradiance (CR) and the ambient temperature (CT). The controller’s function is to process these climatic inputs and determine the optimal duty cycle ratio (RCO) required to achieve the maximum power (PPV_MAX) of the photovoltaic system.

Each triplet (CR, CT, RCO) is treated as an equilibrium point. These equilibrium points are selected to comprehensively cover the operational space of the photovoltaic (PV) system. The entire set of equilibrium points serves as the training dataset for the artificial neural network and is generated using a uniform distribution. This dataset is used, the algorithm used for generating experimental training data is presented in Figure 9.

The proposed algorithm begins by defining the operating space of the control system, which is determined by the input variables CR and CT. The operating space for CR is defined by ΩCR = [CRmin: CRMax], and the operating space for CT is defined by ΩCT = [CTmin: CTMax]. Then, for each value of CRI ϵ ΩCR and each value of CTJ belonging to ΩCT, the optimal duty cycle RCOⁱ^,j is identified using the MPPT/PO algorithm. The “i” index varies from 0 to N − 1 and the “j” index varies from 0 to M − 1.

Figure 10 illustrates the PV characteristic curves for different irradiance values, allowing for the precise identification of the maximum power points (MPP) within the photovoltaic system’s operating range. In fact, for each value of the couple (CR, CT), a P-V characteristic is associated. This curve has a single maximum power point defined by the coordinates (V_Ref, P_PV_MAX). The identification of the maximum power point P_PV_MAX allows determining the value of the corresponding reference voltage V_Ref.

Table 1 below illustrates the numerical values of the maximum power points, as well as the corresponding reference voltage values identified on the 9 P-V curves represented by Figure 10.

The procedure for identifying the maximum power points, along with the different reference voltages V_Ref, allows for the determination of the triplets (CR, CT, V_Ref). These represent the learning equilibrium points of the ANN. This procedure is automated using a program for generating the equilibrium points, and its flowchart is presented in Figure 11.

The following Table 2 presents the parameters of the algorithm allowing the generation of equilibrium points.

The equilibrium points represent the stable operating conditions of the photovoltaic (PV) system at which the state variables remain constant for given irradiance and temperature inputs. These points are determined analytically from the steady-state form of the PV model, where the time derivatives of the state equations are equal to zero. The corresponding activation equation is solved to obtain equilibrium values of the main variables, such as voltage and current, that satisfy the system balance under different irradiance and temperature levels.

Mathematically, for each pair of input conditions (Gi, Tj), the equilibrium point (Veq, Ieq) satisfies f(Veq, Ieq, Gi, Tj) = 0.

where f(·) represents the nonlinear current–voltage relationship of the PV model.

To ensure adequate representation of the system’s nonlinear behavior, the input space of irradiance (G) and temperature (T) is uniformly discretized into N = 10 and M = 45 subdivisions, respectively. This results in a total of P = (N + 1) × (M + 1) = 506 equilibrium points distributed across the operational domain. The number of equilibrium points was selected to achieve a trade-off between training accuracy and computational cost, increasing the number of points beyond this value produced negligible improvement in network performance. Each equilibrium point is then used as a representative training sample for the ANN/PE model, providing a balanced and physically meaningful dataset that captures the full range of operating conditions without relying solely on experimental data.

A total of 506 equilibrium points were generated. The total number of equilibrium points P is obtained by multiplying the number of subdivisions along each input dimension, according to P = (N + 1) × (M + 1) = 11 × 46.

The next step is to initiate the neural network learning process. The artificial neural network consists of three layers. The input layer contains two neurons, the hidden layer has 10 neurons, and there is one neuron in the output layer. This artificial neural network was initially trained by Marquardt Levenberg algorithm then, the Bayesian Regularization algorithm. Of the total 506 equilibrium points, 70% are used for training, 15% for validation and 15% for testing.

In order to evaluate the performance of the ANN/PE controller based on analytical data, a reference photovoltaic system, the SunPower SPR-E18-295-COM, SunPower Corporation, San Jose, USA is used. This PV system consists of 34 strings connected in parallel, with each string composed of 3 modules in series, and a nominal power of 30 kW. The simulation of both models was conducted under standard test conditions: irradiance (CR) of 500 W/m² and temperature (CT) of 25 °C. Under these climatic conditions, the maximum power point is PPV_MAX = 14,863.6 W, corresponding to a reference voltage of V_Ref = 160.296 V. The simulation results are presented graphically in Figure 12, and a comparative summary of the two controllers’ performances is provided in Table 3.

To further validate the performance of the proposed ANN/PE control approach, its results are compared with those from recent studies on MPPT and intelligent control of PV systems, as summarized in Table 3.

For instance, while Jung et al. [24] reported a tracking efficiency of approximately 95% using an MPPT/PO algorithm, and Krishnaram et al. [25] achieved a 12 s response time with an ANN/PO controller, the proposed ANN/PE method demonstrates superior performance. It achieves a higher tracking efficiency of 98.3% and a dynamic response time of 6 s.

This performance represents a significant improvement over the classical MPPT/PO controller, to which it is five times faster (6 s vs. 30 s). Furthermore, the ANN/PE controller enhances system stability by reducing the output voltage ripple from 6% to just 1%. Overall, the proposed method increases the PV system’s efficiency from 95% to 98.5%, confirming its advantages in speed, accuracy, and stability.

In contrast, the proposed ANN/PE method achieves a tracking efficiency of 98.3% and a dynamic response time of about 6 s, showing faster convergence and lower steady-state oscillation. The ANN/PE controller shows significant improvements in terms of speed, accuracy, and stability compared to the classical MPPT/PO controller. Indeed, the ANN/PE controller is five times faster than the classical MPPT/PO controller, with a system response time of 6 s, compared to 30 s for the MPPT/PO controller.

Moreover, the use of the ANN/PE controller significantly reduces the ripple rate of the output voltage of the photovoltaic system, decreasing it from 6% to 1%. In terms of efficiency, the PV system’s performance is also significantly improved, reaching 98.5% with the ANN/PE controller, compared to 95% with the MPPT/PO controller.

To evaluate the performance of the proposed ANN/PE controller under variable meteorological conditions, two distinct profiles were chosen to estimate the evolution of solar irradiance (CR) and ambient temperature (CT). These profiles are presented in Figure 13.

Figure 14a–d presents simulation results showing the evolution of the main electrical variables of the photovoltaic system.

Figure 14 presents a performance comparison between the proposed ANN/PE-based MPPT controller and the conventional PO algorithm under variable irradiance conditions. Both methods aim to track the MPP of the photovoltaic system as solar irradiance fluctuates. However, clear differences can be observed in their transient and steady-state behavior.

The classical PO algorithm shows noticeable oscillations around the MPP and slower convergence when irradiance changes abruptly. This is due to its fixed perturbation step and lack of adaptive learning capability, which cause overshooting and power loss during rapid transitions. In contrast, the ANN/PE controller demonstrates faster tracking, minimal steady-state oscillations, and better adaptation to dynamic environmental variations. This improvement results from the ANN’s nonlinear mapping ability combined with the equilibrium-point training strategy, which enables the controller to anticipate the MPP trajectory rather than search for it iteratively.

Quantitatively, the ANN/PE method reaches the new operating point within approximately 0.3 s, whereas the PO algorithm requires nearly 0.8 s to stabilize. Furthermore, the average power fluctuation of the ANN/PE controller is reduced by about 40%, and its tracking efficiency exceeds 99%, compared with 96–97% for the conventional PO method. These results highlight the superior dynamic response and robustness of the proposed approach.

Overall, the obtained results confirm that the ANN/PE controller provides faster and more stable maximum power point tracking than the traditional PO algorithm, particularly under rapidly changing irradiance. The intelligent nature of the ANN/PE model makes it a suitable and scalable solution for real-time MPPT in modern PV systems.

This improvement in terms of accuracy and stability demonstrates the advantage of the ANN/PE controller, even though it is merely a replication of the dynamic behavior of the classical MPPT/PO controller.

6. Hybrid Control of a Grid-Connected PV System

Building on the results obtained for isolated systems, a hybrid control strategy combining ANN/PE with deep reinforcement learning (DRL) is developed for a grid-connected photovoltaic system to optimize power generation under dynamic conditions.

6.1. Hierarchical Control Structure Design

This energy system consists of four solar panels followed by a DC-DC boost converter for increasing voltage and then a DC-AC inverter to make the output sinusoidal. To ensure the connection of this system with the grid, a filtering device and a transformer are used to minimize harmonics. In Figure 15, the PV system is controlled using an approach based on the MPPT algorithm and a VSC (Voltage Source Converter) controller. In fact, they act on the DC-DC converter and the DC-AC inverter to optimize the quality of the output that will be injected into the grid. The following figure schematizes the proposed system.

The control of two converters is done by two PWM signal generators, designated PG1 and PG2. The photovoltaic system generates effective power P-DC from an installation consisting of two parallel-connected series of solar panels, specifically the SPR-E18-295-COM model, with each series comprising two panels. Figure 16 illustrates the installation structure, highlighting the capacitors Cij, which have a capacitance of 400 µF. The four PV panels collectively produce a maximum power of 120 kW.

Each PV panel consists of 34 parallel strings, each formed by the series connection of three modules. The nominal power of a single solar panel is 30 kW, resulting in a total nominal power of 120 kW for the photovoltaic system.

The DC-DC converter is a boost converter used to regulate the power generated by the PV system, providing an output voltage with an amplitude higher than that of the input. The control strategy involves converting a duty cycle value, ranging between 0 and 1, PG1 is a PWM signal that has a sampling frequency of 5 kHz. Figure 17 shows the internal structure of this converter.

6.2. MPPT/PO and ANN/PE Sub-Controller Implementation

The MPPT/PE controller serves to control the power generated by the PV system. Critically, the uniform operation of the solar panels requires consistent weather conditions, defined by the following equivalent climatic variables.

The evolution scenario of climatic variables occurs at multiple variation levels. For instance, each hour, the solar irradiation increases sequentially by 100 W/m², 200 W/m², and then a significant rise of 500 W/m². Subsequently, the profile follows a decreasing trend in solar irradiation, with values recording 200 W/m² and progressing to 500 W/m². The evolution of ambient temperature (TC) is also designed to follow multiple variation levels, incorporating both increases and decreases. This approach ensures the simulation of real-world conditions as well as more critical scenarios. Figure 18 illustrates the evolution of the generated power of the photovoltaic system under different levels of solar irradiation and ambient temperature.

The desired performance of the energy system requires identifying the MPP. These points are determined from the P-V characteristic curves of the PV system under varying climatic conditions, particularly solar irradiation levels (CR) and ambient temperature (CT).

The control process begins by evaluating the difference between the generated power (P_DC) of the PV system and the theoretical maximum power (MPP). Based on this error, a specific control methodology is adopted, which reflects the underlying accuracy principle of the PO controller. This essential relationship is formally expressed by Equation (5).

Gap (%) = \frac{M P P - P_{D C}}{M P P} 100

(5)

The graphical illustration of power losses, represented by the percentage difference (%) between the MPP and the generated power P_DC over the simulation time interval, is shown in Figure 19.

The numerical calculation of the average deviation between the generated power and the maximum power produced at MPP is 1.95%. A deduction shows that the PO algorithm is capable of tracking and locating the maximum power point with an efficiency of 95%. In the analyzed case, for a system subjected to disturbances, the P-V curve consistently presents a single global MPP. Solar irradiation has a significantly greater impact on the generated power compared to temperature, which acts merely as a disturbing variable.

6.3. Hybrid Control Using Artificial Neural Networks for a Grid-Connected Photovoltaic System

To develop an effective optimization strategy for grid-connected PV systems, we propose a hybridization of learning methods, specifically combining supervised learning with Deep Reinforcement Learning (DRL). This integrated approach is implemented via a two-level controller structure, as illustrated in Figure 20.

The proposed architecture features a localized, two-level control system. Each solar panel is directly associated with a local sub-controller, which is synthesized using the ANN/PE approach. This sub-controller’s primary function is to generate an initial duty cycle based on the real-time climatic conditions of its respective photovoltaic panel. Subsequently, the main controller, designed using Deep Reinforcement Learning (DRL), receives this duty cycle reports from the various ANN/PE sub-controllers. The DRL main controller then processes this aggregated data to determine the optimal final value used to control the DC-DC boost converter.

By leveraging the precise predictive capability of the ANN/PE model and the inherent advantages of deep reinforcement learning, this hybrid structure is designed to precisely and robustly manage the challenges posed by variations in the P-V characteristics of photovoltaic systems.

The duty cycle values RC1, RC2, RC3 and RC4 generated by the ANN/PE sub-controllers serve as inputs to the RL agent, which identifies the optimal duty cycle using a specific learning algorithm. This requirement underscores the significance of the ANN/PE approach in synthesizing an accurate and reliable model.

6.4. Design of the ANN/PE Sub-Controller Using Supervised Learning

This section details the design of the ANN/PE sub-controller by transforming the conventional MPPT/PO controller using the previously established supervised learning methodology. The traditional MPPT/PO controller operates by acquiring and interpreting the internal electrical variables (I_PV and V_PV), which makes its performance susceptible to variations in PV system characteristics. The objective of this transformation is to replace these system-dependent electrical variables (V_PV, I_PV) with PV-system-independent climatic variables (CR and CT). Acquiring these climatic variables at the local panel level allows the sub-controller to achieve precise tracking that is decoupled from the panel’s specific electrical characteristics.

The design process for the ANN/PE sub-controller begins by precisely defining a set of equilibrium points that delineate the operational space of the PV system. The controller is then trained using these points while simultaneously factoring in the corresponding climatic conditions. In this context, the uniform distribution of these equilibrium points within the operating space is strategically chosen to ensure the initial control level achieves both high accuracy and robust generalization. The resulting operating space is illustrated in Figure 21.

The sub-controller developed using the ANN/PE approach is designed to utilize solar irradiation CR and ambient temperature CT as inputs to directly determine the duty cycle RC. Each triplet (CR, CT, RC) is thus considered an equilibrium point. All these equilibrium points must comprehensively cover the operating space of the PV system. The operating space of CR is ΩCR = [

{C R}_{m i n}

;

{C R}_{M A X}

] and the CT operating space is ΩCT = [

{C T}_{m i n}

;

{C T}_{M A X}

]. So, for each

{C R}_{j}

ϵ ΩCR and

{C T}_{k}

ϵ corresponds to a cyclic ratio

{R C}_{j, k}

. With j ∈ [1; N] and k ∈ [1; M]. The total number of equilibrium points is P = M × N.

The data generation algorithm, illustrated in the previous Figure 21, is parameterized as follows:

{C T}_{m i n} = 0

,

{C T}_{M A X} = 50

,

{C R}_{m i n} = 200

,

{C R}_{M A X} = 1000

, N = 10 and M = 40. The total number of equilibrium points is (M × N), which equals 400 points. The proposed ANN/PE sub-controller consists of three layers. The input layer contains two neurons that acquire the variables CR and CT. The hidden layer consists of ten neurons that process the acquired data. The output layer has a single neuron that provides the duty cycle value. The four ANN/PE sub-controllers are trained using the Marquardt-Levenberg back-propagation algorithm combined with the Bayesian Regularization algorithm. Of the 400 data sets, 70% are dedicated to training, 15% are used for validation, and 15% for testing.

6.5. Design of the RL Main Controller Using Deep Reinforcement Learning

Several learning techniques are utilized to develop control strategies. Among them, Deep Reinforcement Learning (DRL) stands out for its ability to design control algorithms without relying on prior knowledge of the system’s dynamic behavior. This method relies on the continuous interaction between an agent and its environment to learn optimal control policies, which represent the system’s dynamic variables. Based on observations, the agent selects an action and receives rewards to update the parameters of its policy during the learning process. The reinforcement learning algorithm aims to maximize the total reward received by the agent.

Several studies discuss reinforcement-learning algorithms, such as PPO, DQN, SARSA, TD3, and DDPG [28,29,30]. In this context, a deep reinforcement learning application is utilized for the primary control of a PV system. The objective is to determine the optimal duty cycle ratio (RCO) to track the GMPP. The integration of the agent into its environment is illustrated in Figure 22. The RL agent uses solar irradiation CR and temperature CT as observations. For each input pair (CR and CT), the output value RCO calculated by the RL agent should enable the PV system to provide its maximum power under these specific climatic conditions.

The reward calculation function, which is proportional to the value of the power P_PV, returns the value R, which is given by:

R = αP_PV - β

(6)

where

α = 10^{- 4}

and

β = 10

.

Two variables determine the climatic condition.

Z_{C R} = [C R 1, C R 2, C R 3, C R 4] a n d Z_CT = [C T 1, C T 2, C T 3, C T 4] .

(7)

Each pair of climate data (

{C R}_{i}, {C T}_{i}

) is processed by the sub-controller

A N N / P E_i

, as described in the previous Figure 22, to generate the duty cycle

R C i

with i = 1, 2, 3 and 4. The following vector is used to represent the observation data for the reinforcement learning (RL) agent:

Z_RC = [R C 1, R C 2, R C 3, R C 4]

(8)

The design of a controller based on an artificial neural network requires a crucial learning phase. During this phase, the controller is trained using input data and desired outputs in order to adjust the weights and parameters of the neural network to optimize the system’s performance. In the specific case of deep reinforcement learning, the learning process involves executing a set of iterative experiments. The controller interacts with a dynamic environment, taking actions and observing the responses. With each interaction, the controller receives feedback in the form of rewards, which allows it to adjust its action strategies to maximize long-term rewards. This iterative process enables the controller to adapt and improve its performance over time.

It is important to highlight the significant impact of solar irradiation on the maximum power point, which leads to a substantial variation in the duty cycle. In contrast, ambient temperature acts as a perturbative variable. To generate the learning data, the algorithm focuses on a uniform and complete distribution within the space of solar irradiation functions, thus creating a representative scenario for experiments. The temperature values applied to the four photovoltaic panels are generated randomly with low variance, thus closely approximating real-world conditions. The Algorithm 1 below describes this learning scenario for the RL agent.

This algorithm uses nested loops to iterate over the variables CR1, CR2, CR3, and CR4 with increment steps of ΔCR. It also generates random values for CT1, CT2, CT3, and CT4 at each iteration, randomly within a range between 5 °C and 50 °C. The number of solar irradiance levels is equal to 6 and represented by the parameter L. There are 1296 combinations of solar irradiation that can be used.

Algorithm 1: Learning Scenario Generation for DRL Agent

Input: CRmin, CRmax, L

Output: Z_CR, CT1, CT2, CT3, CT4

Initialization: ΔCR ← (CRmax − CRmin)/L

for i ← 1 to L do

CR_i ← CRmin + (i −1) … ΔCR;

Z_CR[i] ← CR_i;

CT1 ← random(5, 50);

CT2 ← CT1 + random(1, 5); CT3 ← CT2 + random(1, 5);

CT4 ← CT3 + random(1, 5);

end

The next step involves training the RL agent. In this case, two types of RL agents are proposed: the DDPG agent and the TD3 agent. Both RL agent types are based on a deterministic policy, meaning that they generate precise actions and efficiently interact with the environment to maximize rewards. The deterministic policy enables the agents to make specific decisions rather than selecting stochastic actions, which can be beneficial in environments where action precision and reliability are critical. As such, these agents are well suited for tasks that require precise execution and fine control over actions. Figure 23 and Figure 24 below show the learning results for both types of agents.

Figure 23 illustrates the evolution of the episode reward during the training of the TD3 agent. In the early stages (episodes 0–200), the reward exhibits large oscillations, which is typical behavior due to the actor network’s initial exploration of random actions. As training progresses, the agent gradually refines its policy, resulting in a steady increase in the cumulative reward. After approximately 800 episodes, the reward curve effectively converges and stabilizes around a high value (≈450). This stable convergence demonstrates the successful learning of a robust control policy by the TD3 agent, validating its effectiveness for the adaptive optimization of the ANN/PE controller under dynamic PV conditions. The blue markers represent the total reward obtained in each training episode, while the orange curve indicates the average reward. In contrast to the TD3 agent’s successful convergence, the DDPG agent’s behavior, shown in Figure 24, reveals a critical limitation. The DDPG agent fails to specify an optimal strategy and remains largely focused on exploiting the environment rather than strategically exploring it. Consequently, while the TD3 agent rapidly and directly improves its strategy after the initial exploration phase, the DDPG agent’s reward remains volatile and does not achieve stable, high-value convergence.

Figure 25 and Figure 26, respectively, show the simulation results of six experiments carried out by both the DDPG agent and the TD3 agent.

Table 4 summarizes the reward values for the six experiments conducted, determining the minimum, maximum and average reward values Acquired by both the TD3 agent and the DDPG agent.

The reward value obtained is higher for the TD3 agent compared to the DDPG agent. Reported reward values are rounded to three significant figures to reflect the numerical accuracy of the simulations. The variability in training episodes introduces an estimated uncertainty of ±1 reward unit, corresponding to approximately 0.2% of the mean value. Differences smaller than this threshold are not statistically significant. The TD3 agent provides better performance than the DDPG agent in tracking the maximum power point.

In conclusion, the TD3 agent is chosen for controlling the photovoltaic system under conditions of power imbalance between the panels due to its advantageous performance compared to the DDPG agent. The TD3 agent offers significant improvements, making it better suited for this specific task. Firstly, its ability to reduce action value overestimation enhances the stability of the learning process, which is crucial for a photovoltaic system that requires precise and reliable actions. Additionally, the use of a deterministic policy in TD3 facilitates exploration of action space, which is essential for optimizing the PV system’s performance. The TD3 agent, with its improvements over DDPG, provides better stability and more precise exploration, making it an appropriate choice for controlling a grid-connected PV system.

7. Discussion of Limitations and Practical Considerations

Although the proposed ANN/PE–DRL hybrid control approach demonstrates significant improvements in tracking accuracy and system stability, several practical limitations must be considered for real-world implementation. First, the performance of the ANN/PE model strongly depends on the quality and representativeness of the equilibrium-point data used during training. Inaccurate or sparse data can lead to poor generalization when operating under unseen environmental conditions. Therefore, careful data acquisition and preprocessing are essential to ensure reliable model behavior.

Second, the DRL training process, particularly for agents such as TD3 or DDPG can be computationally intensive and time-consuming, especially when tuning hyperparameters or dealing with large-scale PV systems. This may limit its immediate deployment on low-power embedded controllers without dedicated hardware acceleration.

Finally, the proposed control framework assumes relatively stable communication and sensing infrastructures. In practical grid-connected systems, measurement noise, communication delays, and hardware nonlinearities may affect controller performance. Future work will focus on addressing these issues through online retraining mechanisms and lightweight DRL architectures to enhance real-time applicability.

Table 5 summarizes the performance metrics (Energy Loss, Adaptation to imbalance, Computational complexity, and Learning requirement) of the proposed control strategy in contrast to benchmark hybrid methods for grid-connected PV systems.

The proposed ANN/PE-RL hybrid architecture demonstrates general advantages over other hybrid control strategies. Compared to Model Predictive Control (MPC) combined with IC, which requires accurate system models and suffers from high computational burden, our approach achieves better performance (<1% vs. 1.6% energy loss) with moderate computational requirements. Fuzzy Logic combined with Genetic Algorithm optimization shows good adaptation (1.35% energy loss) but lacks the systematic learning capability of the DRL-based approach. The ANN/PO + DDPG combination achieves 1.15% energy loss but still inherits the oscillation tendencies of the PO method. The proposed ANN/PE-RL approach uniquely combines deterministic precision at the local level with adaptive intelligence at the global level, resulting in superior overall performance across diverse operating conditions.

8. Conclusions

This paper developed an artificial neural network based on equilibrium points (ANN/PE) for optimizing renewable energy production. Applied to an isolated PV system, the ANN/PE controller demonstrated superior performance over conventional techniques, achieving a tracking efficiency of 98.5%, a settling time of 0.25 s, and a steady-state error below 0.5% under variable conditions. Specifically, it improved convergence speed by 15% and reduced output power ripple by 30% compared to the classical MPPT/PO algorithm, confirming its robustness for real-time applications.

The approach was further extended to a grid-connected PV system through a hybrid ANN/PE and deep reinforcement learning structure (ANN/PE-RL). This hybrid strategy significantly enhanced MPPT performance, achieving energy losses below 1%, a 49% improvement over the conventional PI + P&O controller and a 13% improvement over an ANN/PO + DDPG strategy. The method also demonstrated excellent adaptation to grid imbalances with moderate computational complexity, ensuring its suitability for real-time implementation.

In summary, the integration of reinforcement learning with equilibrium-point-based neural modeling proves to be a powerful method for enhancing both the tracking efficiency and operational stability of PV systems. Future work will focus on hardware-in-the-loop validation and the optimization of TD3 hyperparameters to further reduce computational overhead.

Author Contributions

Paper planning: J.B.S.; Research and documentation N.A.; Methodology and results verification J.B.S. and N.A.; Software A.L.; Original draft preparation N.A. and A.L.; Project administration L.E.A. and A.B.; Review and editing A.L. and J.B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research project was funded by the Deanship of Scientific Research and Libraries, Princess Nourah bint Abdulrahman University, through the Program of Research Project Funding After Publication, grant No (RPFAP-117- 1445).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Komarova, A.V.; Filimonova, I.V.; Kartashevich, A.A. Energy consumption of the countries in the context of economic development and energy transition. Energy Rep. 2022, 8, 1309–1323. [Google Scholar] [CrossRef]
Bazán Navarro, C.E.; Álvarez-Quiroz, V.J.; Sampi, J.; Arana Sánchez, A.A. Does economic growth promote electric power consumption? Implications for electricity conservation, expansive, and security policies. Electr. J. 2023, 36, 107164. [Google Scholar] [CrossRef]
Bastida-Molina, P.; Hurtado-Pérez, E.; Moros Gómez, M.C.; Cárcel-Carrasco, J.; Pérez-Navarro, Á. Energy sustainability evolution in the Mediterranean countries and synergies from a global energy scenario for the area. Energy 2022, 252, 124045. [Google Scholar] [CrossRef]
Smadi, T.A.; Handam, A.; Gaeid, K.S.; Al-Smadi, A.; Al-Husban, Y.; Khalid, A.S. Artificial intelligent control of energy management PV system. Results Control Optim. 2024, 14, 100356. [Google Scholar] [CrossRef]
Wu, Z.; Adebayo, T.S.; Alola, A.A. Renewable energy intensity and efficiency of fossil energy fuels in the Nordics: How environmentally efficient is the energy mix? J. Clean. Prod. 2024, 438, 140824. [Google Scholar] [CrossRef]
Merchán, R.P.; Santos, M.J.; Medina, A.; Calvo Hernández, A. High temperature central tower plants for concentrated solar power: 2021 overview. Renew. Sustain. Energy Rev. 2022, 155, 111891. [Google Scholar] [CrossRef]
Zhang, Z.; Shen, L.; Gong, X.; Zhong, X.; Yin, Y. A genetic-simulated annealing algorithm for stochastic seru scheduling problem with deterioration and learning effect. J. Ind. Prod. Eng. 2023, 40, 163–177. [Google Scholar] [CrossRef]
Trzmiel, G.; Jajczyk, J.; Szulta, J.; Chamier-Gliszczynski, N.; Woźniak, W. Modelling of Selected Algorithms for Maximum Power Point Tracking in Photovoltaic Panels. Energies 2025, 18, 5223. [Google Scholar] [CrossRef]
Abdalla, A.N.; Nazir, M.S.; Tao, H.; Cao, S.; Ji, R.; Jiang, M.; Yao, D.L. Integration of energy storage system and renewable energy sources based on artificial intelligence: An overview. J. Energy Storage 2021, 40, 102811. [Google Scholar] [CrossRef]
Heydari, A.; Nezhad, M.M.; Keynia, F.; Fekih, A.; Shahsavari, N.; Garcia, D.A.; Piras, G. A combined multi-objective intelligent optimization approach considering techno-economic and reliability factors for hybrid-renewable microgrid systems. J. Clean. Prod. 2023, 383, 135428. [Google Scholar] [CrossRef]
Neha, A.; Mukesh, P.; Seema, S. Multi objective optimization through experimental technical investigation of a green hydrogen production system. Environ. Dev. Sustain. 2024, 1–43. [Google Scholar] [CrossRef]
Mazen, A.; El-mohandes, M.; Goda, M. An improved perturb-and-observe based MPPT method for PV systems under varying irradiation levels. Sol. Energy 2018, 171, 547–561. [Google Scholar]
Sun, C.; Ling, J.; Wang, J. Research on a novel and improved incremental conductance method. Sci. Rep. 2022, 12, 15700. [Google Scholar] [CrossRef]
Rezk, H.; Aly, M.; Al-Dhaifallah, M.; Shoyama, M. Design and hardware implementation of new adaptive fuzzy logic-based MPPT control method for photovoltaic applications. IEEE Access 2019, 7, 106427–106438. [Google Scholar] [CrossRef]
Rizzo, S.A.; Scelba, G. ANN based MPPT method for rapidly variable shading conditions. Appl. Energy 2015, 145, 124–132. [Google Scholar] [CrossRef]
Sultana, Z.; Basha, C.H.; Irfan, M.M.; Alsaif, F. A novel development of optimized hybrid MPPT controller for fuel cell systems with high voltage transformation ratio DC–DC converter. Sci. Rep. 2024, 14, 8115. [Google Scholar] [CrossRef] [PubMed]
Bemporad, A. Recurrent neural network training with convex loss and regularization functions by extended Kalman filtering. IEEE Trans. Autom. Control 2023, 68, 6970–6977. [Google Scholar] [CrossRef]
Khan, H.; Hadeed, A.S.; Hussain, A.; Noman, A.M.; Murtaza, A.F.; Aboras, A.M. Improved RCC-based MPPT strategy for enhanced solar energy harvesting in shaded environments. IEEE Access 2024, 12, 41662–41676. [Google Scholar] [CrossRef]
Guediri, M.; Ikhlef, N.; Bouchekhou, H.; Guediri, A.; Guediri, A. Optimization by Genetic Algorithm of a Wind Energy System applied to a Dual-feed Generator. Eng. Technol. Appl. Sci. Res. 2024, 14, 16890–16896. [Google Scholar] [CrossRef]
Jia, X.; Xia, Y.; Yan, Z.; Gao, H.; Qiu, D.; Guerrero, J.M.; Li, Z. Coordinated operation of multi-energy microgrids considering green hydrogen and congestion management via a safe policy learning approach. Appl. Energy 2025, 401, 126611. [Google Scholar] [CrossRef]
Qi, N.; Huang, K.; Fan, Z.; Xu, B. Long-term energy management for microgrid with hybrid hydrogen-battery energy storage: A prediction-free coordinated optimization framework. Appl. Energy 2025, 377, 124485. [Google Scholar] [CrossRef]
Kurucan, M.; Özbaltan, M.; Yetgin, Z.; Alkaya, A. Applications of artificial neural network based battery management systems: A literature review. Renew. Sustain. Energy Rev. 2024, 192, 114191. [Google Scholar] [CrossRef]
Das, S.; Amara, T.; Thiago, S.; Sai, S.K.; Banerjee, I. Recurrent Neural Networks (RNNs): Architectures, Training Tricks, and Introduction to Influential Research. In Machine Learning for Brain Disorders; Springer: Cham, Switzerland, 2023; pp. 117–138. [Google Scholar]
Chen, Y.; Song, L.; Liu, Y.; Yang, L.; Li, D. A review of the artificial neural network models for water quality prediction. Appl. Sci. 2020, 10, 5776. [Google Scholar] [CrossRef]
Dao, D.V.; Adeli, H.; Ly, H.B.; Le, L.M.; Le, V.M.; Le, T.; Pham, B.T. A Sensitivity and Robustness Analysis of GPR and ANN for High-Performance Concrete Compressive Strength Prediction Using a Monte Carlo Simulation. Sustainability 2020, 12, 830. [Google Scholar] [CrossRef]
Lachheb, A.; Chrouta, J.; Akoubi, N.; Ben Salem, J. Enhancing efficiency and sustainability: A combined approach of ANN based MPPT and fuzzy logic EMS for grid connected PV systems. Euro-Mediterr. J. Environ. Integr. 2025, 10, 3067–3079. [Google Scholar] [CrossRef]
Abidi, H.; Sidhom, L.; Chihi, I. Systematic Literature Review and Benchmarking for Photovoltaic MPPT Techniques. Energies 2023, 16, 3565. [Google Scholar] [CrossRef]
Kumar, M.; Panda, K.P.; Rosas-Caro, J.C.; Valderrabano-Gonzalez, A.; Panda, G. Comprehensive review of conventional and emerging maximum power point tracking algorithms for uniformly and partially shaded solar photovoltaic systems. IEEE Access 2023, 11, 31738–31763. [Google Scholar] [CrossRef]
Sarvi, M.; Azadian, A. A comprehensive review and classified comparison of MPPT algorithms in PV systems. Energy Syst. 2022, 13, 887–916. [Google Scholar] [CrossRef]
Katche, M.L.; Makokha, A.B.; Zachary, S.O.; Adaramola, M.S. A Comprehensive review of Maximum Power Point Tracking (MPPT) Techniques used in Solar PV Systems. Energies 2023, 16, 2206. [Google Scholar] [CrossRef]
Verma, D.; Nema, S.; Agrawal, R.; Sawle, Y.; Kumar, A. A Different approach for Maximum Power Point Tracking (MPPT) using impedance matching through non-isolated DC-DC converters in Solar Photovoltaic Systems. Electronics 2022, 11, 987. [Google Scholar] [CrossRef]
Li, J.; Wu, Y.; Ma, S.; Chen, M.; Zhang, B.; Jiang, B. Analysis of photovoltaic array maximum power point tracking under uniform environment and partial shading condition: A review. Energy Rep. 2022, 8, 12698–12719. [Google Scholar] [CrossRef]
Akoubi, N.; Ben Salem, J.; El Amraoui, L. Contribution on the combination of artificial Neural Network and incremental conductance method to MPPT Control Approach. In Proceedings of the 2022 8th International Conference on Advanced Systems and Emergent Technologies (IC_ASET), Hammamet, Tunisia, 22–25 May 2022. [Google Scholar]
Ali, K.; Ullah, S.; Clementini, E. Robust Backstepping Super-Twisting MPPT Controller for Photovoltaic Systems Under Dynamic Shading Conditions. Energies 2025, 18, 5134. [Google Scholar] [CrossRef]
Rezaei, M.; Nezamabadi, H. A taxonomy of literature reviews and experimental study of deep reinforcement learning in portfolio management. Artif. Intell. Rev. 2025, 58, 28. [Google Scholar] [CrossRef]
Jung, Y.; So, J.; Yu, G.; Choi, J. Improved perturbation and observation method (IP&O) of MPPT control for photovoltaic power systems. In Proceedings of the Conference Record of the Thirty-first IEEE Photovoltaic Specialists Conference, Orlando, FL, USA, 3–7 January 2005; pp. 1788–1791. [Google Scholar]
Krishnaram, K.; Padmanabhan, T.S.; Alsaif, F.; Senthilkumar, S. Performance optimization of interleaved boost converter with ANN supported adaptable stepped-scaled P&O based MPPT for solar powered applications. Sci. Rep. 2024, 14, 8115. [Google Scholar]
Borni, A.; Bouarroudj, N.; Bouchakour, A.; Zaghba, L. P&O-PI and fuzzy-PI MPPT Controllers and their time domain optimization using PSO and GA for grid-connected photovoltaic system: A comparative study. Int. J. Power Electron. 2017, 8, 300–322. [Google Scholar]
Zhao, Y.; An, A.; Xu, Y.; Wang, Q.; Wang, M. Model predictive control of grid-connected PV power generation system considering optimal MPPT control of PV modules. Prot. Control. Mod. Power Syst. 2021, 6, 32. [Google Scholar] [CrossRef]
Abadlia, I.; Hassaine, L.; Beddar, A.; Abdoune, F.; Bengourina, M.R. Adaptive fuzzy control with an optimization by using genetic algorithms for grid connected a hybrid photovoltaic–hydrogen generation system. Int. J. Hydrogen Energy 2020, 45, 22589–22599. [Google Scholar] [CrossRef]

Figure 1. Neural Network node schematic.

Figure 2. ANN/PE synthesis process.

Figure 3. Uniform distribution of equilibrium points within the general input operating space.

Figure 4. Critical Operating Point Position.

Figure 5. Block diagram of a SPVI controlled by MPPT technique.

Figure 6. Block diagram of PV control using MPPT/PO technique.

Figure 7. Structure of the ANN/PE controller.

Figure 8. Structure of the classic MPPT/PO controller.

Figure 9. Algorithm for generating experimental training data.

Figure 10. P-V curves for different values of CR and CT.

Figure 11. Generation of training data by identification of the MPP on the P-V curve.

Figure 12. Power evolution by the two ANN/PE and MPPT/PO controllers.

Figure 13. (a) Scenario of irradiation variation; (b) Scenario of temperature variation.

Figure 14. (a) Evolution of the PV voltage; (b) Evolution of the current delivered by the PV; (c) Evolution of the power generated by PV system; (d) Evolution of the duty cycle.

Figure 15. Model of the grid-connected photovoltaic system.

Figure 16. Architecture of the photovoltaic installation.

Figure 17. Structure of the DC-DC converter.

Figure 18. Power generated by the PV system.

Figure 19. Control performance for different levels of temperature and solar irradiation.

Figure 20. Structure of the ANN/PE-RL controller.

Figure 21. Distribution of equilibrium points in the control-parameter operating space during ANN/PE model implementation.

Figure 22. RL agent training model.

Figure 23. TD3 agent learning performance.

Figure 24. Learning performance of the DDGP agent.

Figure 25. DDGP agent reward values.

Figure 26. Reward values of agent TD3.

Table 1. Value of the reference Voltage corresponding to the maximum power.

Curve Number	CR (m²/W)	CT (°C)	P_PV_MAX (W)	V_Ref (V)
1	1000	5	31,822.32	173.58
2	1000	25	30,097.44	162.62
3	1000	45	28,316.71	151.66
4	700	5	22,191.83	172.78
5	700	25	20,964.09	161.7
6	700	45	19,696.73	150.61
7	400	5	12,540.54	170.8
8	400	25	11,819.49	159.45
9	400	45	11,075.43	148.17

Table 2. Parameters of the Equilibrium Point generation algorithm.

Parameters	Value
N	10
M	45
CRmin (W/m²)	100
CRMAX (W/m²)	1000
CTmin (°C)	0
CTMax (°C)	50

Table 3. Performance comparison of the proposed ANN/PE with ANN/PO and MPPT/PO controllers.

	ANN/PE Controller	ANN/PO Controller [32]	MPPT/PO Controller [31]
Response time (s)	6	12	30
Maximum steady-state power value (W)	14,750	14,630	14,520
Minimum steady-state power value (W)	14,600	13,680	13,680
Value of the average power in steady state (W)	14,660	14,530	14,110
Steady-state ripple rate (%)	1	2.1	5.95
PV system efficiency (%)	98.3	97.7	95

Table 4. Evaluation of the values of the rewards obtained.

	DDGP Agent	TD3 Agent
Average reward values	423	435
Maximum reward value	436	444
Minimum reward value	408	427

Table 5. Comparison of Hybrid Control Approaches for Grid-Connected PV Systems.

Control Strategy	Energy Loss (%)	Adaptation to Imbalance	Computational Complexity	Learning Requirement
PO + PI Control [38]	1.95	Poor	Low	None
IC + Model Predictive Control [39]	1.6	Moderate	High	Model-Based
Fuzzy Logic + Genetic Algorithm [40]	1.35	Good	Medium	Rule-based
ANN/PO + DDPG	1.15	Good	Medium	Data-driven
ANN/PE + TD3	<1	Excellent	Medium	Hybrid

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lachheb, A.; Akoubi, N.; Ben Salem, J.; El Amraoui, L.; BaQais, A. A Hybrid Strategy Integrating Artificial Neural Networks for Enhanced Energy Production Optimization. Energies 2025, 18, 5941. https://doi.org/10.3390/en18225941

AMA Style

Lachheb A, Akoubi N, Ben Salem J, El Amraoui L, BaQais A. A Hybrid Strategy Integrating Artificial Neural Networks for Enhanced Energy Production Optimization. Energies. 2025; 18(22):5941. https://doi.org/10.3390/en18225941

Chicago/Turabian Style

Lachheb, Aymen, Noureddine Akoubi, Jamel Ben Salem, Lilia El Amraoui, and Amal BaQais. 2025. "A Hybrid Strategy Integrating Artificial Neural Networks for Enhanced Energy Production Optimization" Energies 18, no. 22: 5941. https://doi.org/10.3390/en18225941

APA Style

Lachheb, A., Akoubi, N., Ben Salem, J., El Amraoui, L., & BaQais, A. (2025). A Hybrid Strategy Integrating Artificial Neural Networks for Enhanced Energy Production Optimization. Energies, 18(22), 5941. https://doi.org/10.3390/en18225941

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Hybrid Strategy Integrating Artificial Neural Networks for Enhanced Energy Production Optimization

Abstract

1. Introduction

2. Literature Review Energy Production Optimization

3. Optimization of Energy Production by Artificial Neural Network

3.1. Artificial Neural Network Fundamentals

3.2. Procedure for Designing an ANN/PE Model

3.3. ANN/PE Network Structure and Training Process

4. Optimization of the Production of a Photovoltaic System by the ANN/PE Approach

5. ANN/PE Controller Implementation for Isolated PV Systems

5.1. Synthesis of an ANN/PE Controller Based on Experimental Data

5.2. Synthesis of an ANN/PE Controller Based on Analytical Data

5.3. Comparative Studies of Two Approaches ANN/PE and MPPT/PO for the Control of an SPVI

6. Hybrid Control of a Grid-Connected PV System

6.1. Hierarchical Control Structure Design

6.2. MPPT/PO and ANN/PE Sub-Controller Implementation

6.3. Hybrid Control Using Artificial Neural Networks for a Grid-Connected Photovoltaic System

6.4. Design of the ANN/PE Sub-Controller Using Supervised Learning

6.5. Design of the RL Main Controller Using Deep Reinforcement Learning

7. Discussion of Limitations and Practical Considerations

8. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI