1. Introduction
Given the increasing global emphasis on renewable energy, photovoltaic (PV) energy plays a central role in the transition towards a sustainable energy paradigm. The global climate crisis, combined with the reduced availability of fossil resources, has accelerated the adoption of renewable energy solutions, positioning solar technology as a key pillar in decarbonization strategies worldwide. In this context, PV systems have emerged as a viable and scalable alternative, not only for their ability to generate clean energy but also for their flexibility in applications ranging from small-scale residential installations to large utility-scale power plants [1].
The overall efficiency of PV systems is directly linked to their ability to continuously extract the maximum available power from the solar panels. However, this task is inherently complex due to the dynamic nature of environmental conditions such as irradiance, temperature, and especially the occurrence of partial shading. These non-linearities introduce multiple local maxima in the power–voltage (P–V) characteristic curve, making the tracking of the global maximum power point (GMPP) a significant challenge. As a result, MPPT algorithms have become an essential component in PV system design, aimed at dynamically adjusting the operating point to ensure optimal energy harvesting under all conditions [2].
Over the past decades, MPPT has been the subject of extensive research, with traditional methods such as Perturb and Observe (P&O) and Incremental Conductance (InC) being widely adopted due to their simplicity and low computational cost. Nevertheless, these algorithms are known to suffer from several limitations, particularly under rapidly changing irradiance or partial shading conditions, where they tend to converge to local maxima or induce persistent oscillations around the operating point [3,4].
To overcome these limitations, the integration of intelligent control techniques has gained significant attention. Among these, Artificial Neural Networks (ANNs) and Deep Q-Networks (DQNs), a type of Deep Reinforcement Learning (DRL), have shown remarkable performance in handling the complex, non-linear, and dynamic behavior of PV systems. These AI-based approaches offer faster convergence, improved tracking precision, and greater stability under challenging conditions such as partial shading. By learning from environmental variations and adapting their control strategies in real time, ANN and DQN algorithms significantly enhance the system’s ability to identify and follow the true GMPP, even in highly variable scenarios.
The work by J and SY [5] presents an MPPT strategy using Artificial Neural Networks (ANNs), comparing it against the conventional P&O method. The study revealed that the ANN-based approach provides more accurate tracking and faster response times under dynamic irradiance conditions, outperforming P&O in terms of efficiency and stability.
In a more recent study, Giraldo et al. [6] applied Deep Reinforcement Learning (DRL), specifically a Deep Q-Network (DQN), for MPPT in real PV systems. The work includes both simulation and experimental validation, showing that, while DQN outperformed P&O in simulations, especially under partial shading conditions (PSCs), the real-world results were mixed. Notably, in PSC scenarios, P&O often became trapped at a local MPP, whereas DQN was able to extract up to 63.5% more energy, demonstrating the advantages of autonomous learning strategies. An early contribution to the use of reinforcement learning in MPPT is presented in the study by Kofinas et al. [7]. In their work, a tabular Q-learning algorithm was developed to track the MPP, showing strong convergence stability and reduced computational effort when compared to traditional metaheuristic techniques. Their results demonstrated that the method consistently outperformed the conventional P&O algorithm across multiple scenarios involving variations in irradiance and temperature. This line of research has since been extended by authors such as CSE [8] and Bavarinos et al. [9], who investigated alternative tabular reinforcement learning strategies for MPPT. These studies introduced comparative analyses between Q-learning and SARSA agents, as well as hybrid schemes incorporating fuzzy logic and sliding mode control, further validating the effectiveness of RL-based methods under dynamic environmental conditions.
Similarly, Phan et al. [10] explored DRL methods, including DQN and Deep Deterministic Policy Gradient (DDPG), for MPPT in MATLAB/Simulink simulations. Their work focused on the challenges of partial shading and showed that DQN significantly improved tracking efficiency compared to P&O. Under uniform conditions, DQN achieved a 5.83% efficiency gain, while DDPG yielded a 3.21% improvement, supporting the applicability of DRL in optimizing PV performance.
Remoaldo and Jesus [11] presented a comparative analysis between traditional P&O and a fuzzy logic-enhanced version (FLP&O). Using a system with five series-connected PV panels and a boost converter, the study demonstrated that FLP&O offers quicker convergence to the MPP under rapid irradiance changes, confirming the benefits of fuzzy logic in improving MPPT response time and accuracy.
Furthermore, the comprehensive review by Katche et al. [12] highlighted the limitations of conventional MPPT algorithms like P&O under partial shading. The authors advocated for hybrid and intelligent optimization techniques, which offer improved performance at the expense of higher computational complexity and cost, suggesting that the trade-off is justified in dynamic environments.
In addition to conventional approaches, the scientific community has increasingly explored the application of soft computing and evolutionary algorithms for MPPT control in photovoltaic systems. As highlighted by Rezk et al. [13], advanced control techniques such as fuzzy logic control (FLC) [14] and adaptive neuro-fuzzy inference systems (ANFISs) [15,16] have shown strong capability in managing the non-linear and time-varying behavior of PV systems. In parallel, a wide range of bio-inspired optimization algorithms have emerged as promising alternatives for global optimization. Among these, techniques such as genetic algorithms (GAs) [17], cuckoo search (CS) [18], ant colony optimization (ACO) [19], the bee colony algorithm (BCA) [20], bat-inspired optimization (BAT) [21], and the memetic salp swarm algorithm [22] have demonstrated considerable effectiveness. These methods are particularly well-suited to address the challenges posed by partial shading conditions as they enable good exploration of the search space and enhance the system’s ability to reliably detect the GMPP. According to Jiang Jiang et al. [23], both soft computing and evolutionary techniques exhibit enhanced adaptability and are capable of delivering efficient and reliable tracking performance even under highly dynamic and non-linear operating conditions.
Additionally, recent works have explored data-driven and neural network-based methods that, while applied in different domains, offer valuable insights for adaptive control strategies in PV systems. For example, Lu et al. [24] proposed a transfer learning framework to enhance adaptability in dynamic environments, while Aizenberg and Tovt [25] introduced a multilayer neural network approach for intelligent frequency domain filtering. These contributions reinforce the relevance of learning-based techniques in handling non-linear time-varying conditions.
Lastly, Sharma et al. [26] compared traditional and metaheuristic MPPT techniques. Their findings reinforce that, although conventional algorithms such as P&O are easier to implement, they struggle in PSC scenarios.
Although previous studies have provided valuable insights, the majority of works in the literature still present notable limitations that prevent a comprehensive evaluation of MPPT strategies. Many focus on comparing only two or three algorithms, often within limited and idealized scenarios. Furthermore, performance assessments are frequently restricted to a few basic metrics, such as efficiency or steady-state accuracy, overlooking crucial dynamic aspects like convergence time or tracking stability under complex PSCs. In particular, convergence time is an essential indicator of system responsiveness that remains underexplored and rarely quantified rigorously. Additionally, detailed scenario characterization, especially for PSC cases, is often lacking or simplified.
To address these gaps, this work presents a comprehensive comparative analysis of MPPT algorithms, including conventional methods (P&O and InC), control logic (FLC), hybrid (GA-based), and intelligent approaches (ANNs and DQNs). All the methods are evaluated under both uniform and partial shading conditions using a modular MATLAB/Simulink framework. Multiple performance metrics—MAE, IAE, MSE, ISE, efficiency, and convergence time—are considered. Convergence time is calculated using formal signal analysis with graphical validation. Test cases are clearly defined with specific irradiance and temperature settings, ensuring transparency. This study offers deeper insight into the strengths and limitations of MPPT strategies, especially AI-based ones, to support their practical application in PV systems.
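As an illustration of the convergence-time criterion used here, one common formalization is the first instant after which the tracked power remains within a tolerance band around its steady-state value. The following minimal Python sketch applies a 2% band and estimates the steady state from the tail of the signal; both choices are illustrative assumptions rather than the exact criterion defined in Section 6:

```python
import numpy as np

def convergence_time(t, p, band=0.02, tail=0.1):
    """Return the first time after which p(t) stays within +/-band
    of its steady-state value (mean over the last `tail` fraction)."""
    t, p = np.asarray(t, float), np.asarray(p, float)
    p_ss = p[int(len(p) * (1.0 - tail)):].mean()   # steady-state estimate
    outside = np.abs(p - p_ss) > band * abs(p_ss)  # out-of-band samples
    if not outside.any():
        return t[0]                                # inside band from start
    last_out = np.where(outside)[0][-1]
    return t[last_out + 1] if last_out + 1 < len(t) else None

# Example: first-order settling toward 100 W, ~0.2 s for a 2% band.
t = np.linspace(0.0, 1.0, 1000)
p = 100.0 * (1.0 - np.exp(-t / 0.05))
print(convergence_time(t, p))
```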
The article is organized as follows. Section 2 describes the photovoltaic system modeling and simulation setup. Section 3 outlines the buck converter design. Section 4 defines the test scenarios, including both uniform and partial shading conditions. Section 5 presents the implemented MPPT algorithms, covering conventional, hybrid, and AI-based methods. Section 6 introduces the evaluation metrics, followed by the results and comparative analysis in Section 7. Section 8 discusses the key findings, and Section 9 concludes the work with suggestions for future developments.
5. MPPT Algorithms
Table 6 presents the classification of the MPPT algorithms implemented in this study. Conventional techniques such as Perturb and Observe (P&O) and Incremental Conductance (InC) were included as baseline methods due to their simplicity and widespread use in PV applications.
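For reference, the core hill-climbing logic behind P&O can be expressed in a few lines. The sketch below is a generic duty-cycle-based Python formulation; the step size and initialization are placeholders, not the exact settings listed in Table 13:

```python
def perturb_and_observe(p, state, step=0.005):
    """One duty-cycle-based P&O iteration (hill climbing on power).
    state = (previous power, duty cycle, perturbation direction)."""
    p_prev, d, direction = state
    if p < p_prev:                # power dropped: reverse perturbation
        direction = -direction
    d = min(max(d + direction * step, 0.0), 1.0)  # clamp to [0, 1]
    return d, (p, d, direction)

# Usage per control period, starting from e.g. state = (0.0, 0.5, +1):
#   d, state = perturb_and_observe(v_pv * i_pv, state)
```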
Additionally, a control logic-based approach using the fuzzy logic controller (FLC) was implemented for its ability to handle system non-linearities through rule-based decision-making. An optimization-based algorithm, the genetic algorithm (GA), was also tested for its global search capability in identifying the maximum power point, particularly under partial shading conditions.
Hybrid algorithms were explored to combine the advantages of different methods, including InC combined with P&O (InC+P&O), GA combined with InC (GA+InC), and GA combined with P&O (GA+P&O), aiming to enhance convergence speed and tracking accuracy in dynamic environments.
However, particular emphasis in this work was placed on artificial intelligence (AI)-based algorithms due to their superior adaptability and learning capabilities. Specifically, the Artificial Neural Network (ANN) was employed for its powerful pattern recognition and non-linear mapping ability, enabling accurate prediction of the optimal operating voltage under varying irradiance and temperature conditions. Furthermore, the Deep Q-Network (DQN) algorithm, which integrates reinforcement learning with deep neural networks, was implemented to achieve autonomous learning and optimal decision-making without requiring an explicit PV system model, demonstrating good performance even under complex partial shading scenarios.
Overall, the integration of AI-based algorithms, namely ANN and DQN, formed a key focus of this study, aiming to improve MPPT accuracy, adaptability, and robustness beyond traditional approaches.
5.1. Artificial Neural Network (ANN)
The Artificial Neural Network (ANN) approach leverages its ability to model non-linear systems and recognize patterns to predict the maximum power point (MPP) under varying environmental conditions. The ANN used in this study was trained with irradiance and temperature as input features, while the output was the predicted maximum power point voltage ($V_{\mathrm{MPP}}$). This prediction served as the reference for a PI controller, which adjusted the duty cycle of the buck converter to drive the PV panel towards the MPP.
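Conceptually, the ANN serves as a reference generator while the PI controller closes the voltage loop. A minimal Python sketch of that chain follows, where ann_predict stands in for the trained network and the gains kp and ki are illustrative placeholders rather than the tuned values used in the Simulink model:

```python
class PIController:
    """Discrete PI controller producing the converter duty cycle."""
    def __init__(self, kp=0.05, ki=2.0, dt=1e-3, d_min=0.0, d_max=1.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.d_min, self.d_max = d_min, d_max
        self.integral = 0.0

    def step(self, v_ref, v_pv):
        error = v_ref - v_pv                  # voltage tracking error
        self.integral += error * self.dt
        d = self.kp * error + self.ki * self.integral
        return min(max(d, self.d_min), self.d_max)

def mppt_step(pi, ann_predict, irradiance, temperature, v_pv):
    """ANN supplies the V_MPP reference; PI regulates the panel to it."""
    v_ref = ann_predict(irradiance, temperature)
    return pi.step(v_ref, v_pv)
```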
The main characteristics of the implemented ANN are summarized in Table 7.
5.1.1. Dataset
To train the Artificial Neural Network (ANN), a comprehensive dataset was generated under realistic photovoltaic (PV) operating conditions, including dynamic variations in irradiance and temperature over a period of 1000 s. The simulation was carried out using a discrete step size of $T_s = 1\,\mathrm{ms}$, resulting in a total of one million samples per variable:

$$N = \frac{1000\,\mathrm{s}}{T_s} = 10^6.$$

The time vector was defined as

$$t_k = k\,T_s, \qquad k = 0, 1, \ldots, N-1.$$
To ensure representative behavior under Standard Test Conditions (STCs), three time windows (at 200 s, 600 s, and 850 s) were defined with fixed irradiance of 1000 W/m² and temperature of 25 °C. Outside these intervals, both irradiance and temperature followed sinusoidal functions with added Gaussian noise to emulate natural environmental fluctuations:

$$G(t) = G_0 + A_G \sin\!\left(\frac{2\pi t}{T_G}\right) + \eta_G(t), \qquad T(t) = T_0 + A_T \sin\!\left(\frac{2\pi t}{T_T}\right) + \eta_T(t),$$

where $G_0$, $A_G$, $T_G = 300\,\mathrm{s}$, and $\eta_G(t)$ are the irradiance offset, amplitude, period, and Gaussian noise term, and $T_0$, $A_T$, $T_T = 600\,\mathrm{s}$, and $\eta_T(t)$ are the corresponding temperature parameters.
These profiles simulate realistic irradiance oscillations every 5 min and slower temperature drifts every 10 min.
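A minimal Python sketch of this profile generation is shown below; the 1 ms step, 1000 s horizon, 300 s and 600 s periods, and STC windows follow the description above, while the offsets, amplitudes, noise levels, and window widths are illustrative assumptions:

```python
import numpy as np

Ts, N = 1e-3, 1_000_000                 # 1 ms step, 10^6 samples (1000 s)
t = np.arange(N) * Ts
rng = np.random.default_rng(0)

# Sinusoidal profiles (300 s and 600 s periods) plus Gaussian noise;
# offsets, amplitudes, and noise levels are illustrative placeholders.
G = 800.0 + 150.0 * np.sin(2 * np.pi * t / 300.0) + rng.normal(0.0, 10.0, N)
T = 30.0 + 10.0 * np.sin(2 * np.pi * t / 600.0) + rng.normal(0.0, 0.5, N)

# Force STC (1000 W/m^2, 25 degC) inside windows around 200, 600, 850 s
# (the window width is an assumption; the text only gives the centers).
for center in (200.0, 600.0, 850.0):
    window = (t >= center - 10.0) & (t <= center + 10.0)
    G[window], T[window] = 1000.0, 25.0
```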
The PV system was simulated (Figure 7) using a resistive load calculated to match the maximum power point (MPP) at STC using the manufacturer specifications:

$$R_{\mathrm{load}} = \frac{V_{\mathrm{MPP}}}{I_{\mathrm{MPP}}}.$$
The temporal evolution of the temperature and irradiance profiles is illustrated in Figure 8. Table 8 summarizes the main statistics of the temperature and irradiance profiles.
This large and diverse dataset enabled the ANN to learn the non-linear relationship between environmental inputs and the optimal MPP voltage, ensuring good generalization across varying PV conditions.
5.1.2. Training Results of ANN
Figure 9 shows the training results of the ANN. The performance plot (Figure 9a) highlights a best validation performance of 0.94349 at epoch 266, indicating effective convergence. The regression plot (Figure 9b) reveals a strong correlation between predicted and actual values on the test dataset, confirming the model’s ability to generalize. A dataset containing one million samples was used for each variable (irradiance, temperature, $V_{\mathrm{MPP}}$, and $P_{\mathrm{MPP}}$), ensuring a broad representation of different operating conditions and enhancing the generalization capability of the ANN.
The trained network is subsequently integrated as a Simulink block within the MPPT control system, as illustrated in Figure 10.
To evaluate the generalization performance of the network, 10 random samples were selected from the testing dataset (10% of the full dataset). Table 9 compares the real and predicted values for $V_{\mathrm{MPP}}$ and $P_{\mathrm{MPP}}$, along with the absolute and relative errors.
It is worth noting that the performance saturation can be primarily attributed to the size of the dataset used for training. With approximately one million samples covering variables such as irradiance, temperature, voltage, and power, the dataset is sufficiently large to ensure that the network effectively learns the mapping between input and output variables. As a result, increasing or decreasing the number of neurons in the hidden layers did not lead to significant improvements in prediction error since the network already achieves good generalization.
5.1.3. ANN Training Under PSC
To enhance the model’s performance and adaptability under PSC, the neural network was retrained using updated datasets that included PSC scenarios. The network architecture remained identical, with two hidden layers comprising 20 and 10 neurons, respectively, while the input layer was expanded to six neurons to accommodate additional input features. Notably, the training algorithm and hyperparameters were kept unchanged to ensure evaluation consistency with the baseline model.
The retrained network was integrated as a Simulink block within the MPPT control system, as illustrated in Figure 11.
5.2. Deep Q-Network (DQN)
Figure 12 presents a simplified block diagram of the Deep Q-Network (DQN) algorithm used for MPPT control in PV systems. It illustrates the key components involved in the agent–environment interaction, including state observation, action selection, Q-value estimation, and policy updating.
The agent receives the system state, comprising the PV voltage, current, duty cycle, and its variation, and evaluates possible actions using a Q-network (a deep neural network). The agent selects the next action using an $\epsilon$-greedy policy, balancing exploration (random actions) and exploitation (choosing the action with the highest Q-value). This policy is formally defined as follows [10,31]:

$$a_t = \begin{cases} \text{random action from } \mathcal{A}, & \text{with probability } \epsilon, \\ \arg\max_{a \in \mathcal{A}} Q(s_t, a; \theta), & \text{with probability } 1 - \epsilon, \end{cases}$$

where $\epsilon \in [0, 1]$ is the exploration rate and $\mathcal{A}$ is the set of available actions.
This approach allows the agent to continuously explore new strategies while also exploiting the best-known actions, facilitating learning in complex and high-dimensional environments. The selected action is applied to the PV system by modifying the duty cycle of the buck converter, which changes the operating point of the panel. The environment then moves to a new state and returns a reward that reflects the change in power output resulting from the agent’s action.
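In code form, the $\epsilon$-greedy selection reduces to a few lines; the following Python sketch is a generic formulation, not the Toolbox implementation used in this work:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon; otherwise the action
    with the highest estimated Q-value."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit

rng = np.random.default_rng(42)
action = epsilon_greedy(np.array([0.1, 0.7, 0.3]), epsilon=0.1, rng=rng)
```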
In DQN, the traditional Q-table is replaced by a deep neural network (Q-network) to approximate the action-value function $Q(s, a; \theta)$, where $\theta$ are the network weights. Two networks are used: the predict Q-network (with weights $\theta$) and the target Q-network (with weights $\theta^-$) [10]. The loss function minimized during training is the mean squared error (MSE) between the predicted and target Q-values, described in Equation (13) [10,32,33]:

$$L(\theta) = \mathbb{E}\left[\left(y_t - Q(s_t, a_t; \theta)\right)^2\right]. \tag{13}$$

The target Q-value is computed as

$$y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-), \tag{14}$$

and the predicted Q-value from the online network is

$$Q_t = Q(s_t, a_t; \theta). \tag{15}$$
The critic network evaluates the Q-values and calculates the loss, which is used to update the weights of the predict network through backpropagation. Over time, this allows the agent to improve its decision-making, leading to convergence to an optimal policy that maximizes long-term power extraction from the PV system.
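To make the interplay of Equations (13)–(15) concrete, the Python sketch below performs one batch update with a linear Q-approximator standing in for the deep network; it is an illustrative reduction, not the trained architecture of this study:

```python
import numpy as np

def q_values(theta, s):
    """Linear stand-in for the Q-network: one weight row per action,
    so Q(s, a; theta) = theta[a] . s."""
    return theta @ s

def dqn_update(theta, theta_target, batch, gamma=0.99, lr=1e-3):
    """One MSE gradient step on the predict network (Eqs. 13-15);
    the constant factor 2 of the MSE gradient is folded into lr."""
    grad = np.zeros_like(theta)
    for s, a, r, s_next, done in batch:
        y = r if done else r + gamma * np.max(q_values(theta_target, s_next))
        td_error = q_values(theta, s)[a] - y      # predicted minus target
        grad[a] += td_error * s
    theta -= lr * grad / len(batch)
    # In full DQN, theta_target is periodically synchronized with theta.
    return theta
```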
Being a model-free method, the DQN agent adapts autonomously to varying conditions, including non-uniform irradiance and PSC, demonstrating strong performance in identifying the GMPP without requiring prior knowledge of the system dynamics.
Additionally, Figure 13 shows the detailed Simulink implementation of the DQN-based MPPT control strategy developed in this study. This model integrates the trained agent within the PV system environment to perform real-time duty cycle adjustments based on learned policies.
5.2.1. Agent Architecture and Reward Function
The Deep Q-Network (DQN) agent used in this work was designed to operate based on three core elements: the state space, the action space, and a multi-objective reward function. These are defined as follows:
The agent observes the system state at each timestep through the following vector:

$$s_t = \left[V_{\mathrm{PV}},\ I_{\mathrm{PV}},\ D,\ \Delta D\right],$$

where $V_{\mathrm{PV}}$ and $I_{\mathrm{PV}}$ are the PV panel voltage and current, $D$ is the duty cycle of the DC/DC converter, and $\Delta D$ is the perturbation applied to $D$.
The agent can select from a discrete set of actions that modify the duty cycle:

$$\mathcal{A} = \left\{-\Delta D_{\mathrm{large}},\ -\Delta D_{\mathrm{small}},\ 0,\ +\Delta D_{\mathrm{small}},\ +\Delta D_{\mathrm{large}}\right\}.$$

These values allow the agent to fine-tune the converter operation in small or large steps, or maintain the current setting.
The total reward at each timestep is composed of seven distinct components:

$$r_t = \sum_{i=1}^{7} r_{i,t},$$

where each component $r_{i,t}$ targets a specific control objective, such as rewarding power gains and convergence towards the GMPP or penalizing oscillations and duty-cycle saturation.
This multi-component reward function was carefully designed to guide the agent towards fast convergence, stable operation, and global maximum power tracking while penalizing unnecessary oscillations or control saturation.
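The seven components themselves are specific to this work; purely to illustrate the shaping pattern described above, a reduced three-term reward in Python might combine a power-improvement term with oscillation and saturation penalties (all weights hypothetical):

```python
def shaped_reward(p, p_prev, delta_d, d, w=(1.0, 0.5, 0.2)):
    """Illustrative three-term reward, not the paper's seven components."""
    r_power = w[0] * (p - p_prev)                     # reward power gains
    r_osc = -w[1] * abs(delta_d)                      # penalize oscillations
    r_sat = -w[2] if (d <= 0.0 or d >= 1.0) else 0.0  # penalize saturation
    return r_power + r_osc + r_sat
```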
5.2.2. Simulation Setup
The training of the Deep Q-Network (DQN) agent was conducted in the MATLAB/Simulink environment using the Reinforcement Learning Toolbox [34]. A stochastic training strategy was adopted, where the irradiance and temperature applied to each of the three PV panel segments were randomly varied at the beginning of each episode. This approach ensured generalization of the learned policy across a wide range of environmental conditions, including STC and PSC.
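A Python sketch of this per-episode randomization is given below; the sampling ranges and the share of uniform episodes are assumptions for illustration, not the values used in the Toolbox setup:

```python
import numpy as np

def randomize_episode(rng, n_segments=3):
    """Draw per-segment irradiance and temperature at episode start,
    covering both uniform and partial-shading conditions.
    The sampling ranges are assumptions, not the paper's values."""
    G = rng.uniform(100, 1000, n_segments)   # W/m^2, per panel segment
    T = rng.uniform(15, 45, n_segments)      # degC, per panel segment
    if rng.random() < 0.5:                   # half the episodes: uniform
        G[:] = G[0]
        T[:] = T[0]
    return G, T

rng = np.random.default_rng(7)
G, T = randomize_episode(rng)
```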
Table 10 summarizes the main parameters used to configure the agent and the training process.
5.2.3. Training Results of DQN
The training process of the DQN agent is depicted in
Figure 14, where the light-blue curve represents the episode reward, the dark-blue line indicates the moving average of the reward, and the yellow curve corresponds to the Episode Q0 metric. The Episode Q0 provides an estimation of the expected future reward based on the current policy and is a useful indicator of learning progress.
It is evident that the training converged around episode 100 as the average reward stabilized and episode rewards consistently exceeded the defined performance threshold. The final episode reward and Q0 values confirmed the agent’s ability to maximize the power extracted from the PV system.
A summary of the training performance is presented in Table 11, with the hardware specified in Table 12.
5.3. Benchmark Algorithms
To ensure a fair and consistent comparison across all MPPT algorithms evaluated in this study, a common benchmarking strategy was adopted.
Table 13 summarizes the key implementation parameters defined for each method, including initial duty cycle, perturbation step size, input variables, and control structure. This standardized setup ensures that observed performance differences are driven by algorithmic behavior rather than configuration disparities.
Additionally, for the hybrid algorithms GA+P&O and GA+InC, a small exponential moving average (EMA) filter was applied to the input measurements of panel voltage and current ($V_{\mathrm{PV}}$ and $I_{\mathrm{PV}}$) in order to mitigate the effect of high-frequency measurement noise. The filtered signal is computed using Equation (16) [35,36]:

$$\tilde{x}_k = \alpha\, x_k + (1 - \alpha)\, \tilde{x}_{k-1}, \tag{16}$$

where $x$ represents the measured signals $V_{\mathrm{PV}}$ and $I_{\mathrm{PV}}$, and $\alpha \in (0, 1]$ is the smoothing coefficient. This filtering strategy enhances the stability of the GA-based controllers by providing cleaner input data during operation.
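Equation (16) translates directly into code. A minimal Python sketch, with the smoothing coefficient left as a parameter (the default of 0.1 is arbitrary, not the value used in this study):

```python
def ema_filter(x_new, x_filt_prev, alpha=0.1):
    """Exponential moving average of Eq. (16):
    x_filt = alpha * x_new + (1 - alpha) * x_filt_prev."""
    return alpha * x_new + (1.0 - alpha) * x_filt_prev

# Applied each control period to the raw measurements, e.g.:
# v_filt = ema_filter(v_pv_meas, v_filt)
# i_filt = ema_filter(i_pv_meas, i_filt)
```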
8. Discussion
The comparative analysis of the MPPT algorithms, summarized in Table 22, highlights distinct strengths and limitations across the different approaches.
Conventional methods such as P&O, InC, and InC+P&O offer simplicity and fast convergence but consistently underperform in dynamic conditions, particularly under partial shading (PSC), where they tend to become trapped in local maxima. Their average efficiencies remained below 85% in PSC scenarios, confirming their limited adaptability.
Fuzzy logic control (FLC) improved upon the traditional methods by providing better stability and slightly higher accuracy. While it showed moderate resilience under variable conditions, its tracking capability in PSC scenarios was still outperformed by more advanced techniques.
AI-based methods, namely ANN and DQN, demonstrated superior overall performance across all the scenarios. These algorithms achieved high average efficiencies (above 91%), fast convergence, and excellent results, even under complex PSC conditions. Their ability to learn and generalize from environmental variations enabled them to reliably track the GMPP.
Hybrid methods combining traditional techniques with genetic algorithms, such as GA+InC and GA+P&O, also yielded notable improvements in accuracy and adaptability. However, these gains came at the cost of increased convergence time due to the computational overhead of evolutionary optimization.
Overall, the results reinforce the effectiveness of AI-based and hybrid strategies for MPPT, particularly in challenging and rapidly changing operating environments.
9. Conclusions
This work demonstrates the effectiveness of intelligent control strategies, particularly Artificial Neural Networks (ANNs) and Deep Q-Networks (DQNs), in enhancing the performance of photovoltaic systems through MPPT optimization. The implemented MATLAB/Simulink models, which simulated both uniform and partial shading scenarios, provide a robust framework for comparative analysis. The results show that AI-based methods not only achieved superior tracking efficiency and faster convergence times but also exhibited greater resilience to dynamic changes and adaptability to non-linear environmental conditions. Among all the tested algorithms, the DQN agent consistently outperformed the others in PSC scenarios, confirming its ability to identify the GMPP more effectively. These findings underscore the potential of integrating learning-based controllers into PV systems to improve energy harvesting in real-world applications.
A natural progression of this research involves the integration of a bidirectional DC/DC converter, specifically a buck–boost topology, which would enable both the charging and discharging of an energy storage system. This architecture would facilitate the transition from a passive PV system to a hybrid energy management solution capable of supplying loads autonomously during periods of low solar generation.
Additionally, future work could focus on the experimental validation of the proposed MPPT algorithms under real-world operating conditions. This would involve developing a physical prototype that integrates photovoltaic panels, bidirectional converters, embedded controllers (e.g., microcontrollers), and appropriate sensors for current, voltage, irradiance, and temperature measurement. Implementing the control logic directly in embedded hardware would allow for the assessment of real-time performance, computational constraints, and resilience to disturbances, such as measurement noise or sudden environmental changes.
Another promising research direction includes the development of hybrid renewable energy systems, combining solar and wind energy sources under a unified control framework. The MPPT algorithms could be extended or adapted to coordinate multiple energy sources with complementary profiles, enhancing system reliability and reducing energy intermittency.