Article

Fuzzy-Partitioned Multi-Agent TD3 for Photovoltaic Maximum Power Point Tracking Under Partial Shading

by Diana Ortiz-Muñoz, David Luviano-Cruz *, Luis Asunción Pérez-Domínguez, Alma Guadalupe Rodríguez-Ramírez and Francesco García-Luna
Department of Industrial Engineering and Manufacturing, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez 32310, Chihuahua, Mexico
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(23), 12776; https://doi.org/10.3390/app152312776
Submission received: 9 November 2025 / Revised: 20 November 2025 / Accepted: 21 November 2025 / Published: 2 December 2025

Abstract

Maximum power point tracking (MPPT) under partial shading is a nonconvex, rapidly varying control problem that challenges multi-agent policies deployed on photovoltaic modules. We present Fuzzy–MAT3D, a fuzzy-augmented multi-agent TD3 (Twin-Delayed Deep Deterministic Policy Gradient) controller trained under centralized training/decentralized execution (CTDE). On the theory side, we prove that differentiable fuzzy partitions of unity endow the actor–critic maps with global Lipschitz regularity, reduce temporal-difference target variance, enlarge the input-to-state stability (ISS) margin, and yield a global $L_\infty$ $\gamma$-contraction of fixed-policy evaluation (hence, non-expansive with $\kappa = \gamma < 1$). We further state a two-time-scale convergence theorem for CTDE-TD3 with fuzzy features; a Polyak–Łojasiewicz (PL)/last-layer-linear corollary implies point convergence and uniqueness of critics. We bound the projected Bellman residual with the correct contraction factor (for $L_\infty$ and $L_2(\rho)$ under measure invariance) and quantify the negative bias induced by $\min\{Q_1, Q_2\}$; an N-agent extension is provided. Empirically, a balanced common-random-numbers design across seven scenarios and 20 seeds, analyzed by ANOVA and CRN-paired tests, shows that Fuzzy–MAT3D attains the highest mean MPPT efficiency (92.0% ± 4.0%), outperforming MAT3D and the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) controller. Overall, fuzzy regularization yields higher efficiency, suppresses steady-state oscillations, and stabilizes learning dynamics, supporting the use of structured, physics-compatible features in multi-agent MPPT controllers. At the level of PV plants, such gains under partial shading translate into higher effective capacity factors and smoother renewable generation without additional hardware.

1. Introduction

Maximum power point tracking (MPPT) in photovoltaic (PV) arrays becomes notably challenging under partial shading conditions (PSCs), where the power–voltage (P–V) landscape is nonconvex and multimodal, with multiple local maxima that shift with irradiance and temperature [1,2,3,4,5]. Classical trackers (perturb-and-observe and incremental conductance) are attractive for their simplicity and low sensor burden, but under PSCs, they can settle at suboptimal peaks and exhibit limit-cycle oscillations; performance is sensitive to stepsize tuning, measurement noise, and plant transients [1,3,4]. Global-search enhancements (adaptive scanning; PSO/GA/DE and hybrids) increase the likelihood of reaching the global MPP, at the cost of additional computation and re-tuning as operating statistics change [2,6,7]. This reflects a broader debate between global exploration to avoid local traps and local tracking to preserve fast, low-perturbation operation.
Several studies further sharpen the MPPT and Deep Reinforcement Learning (DRL) baselines under PSCs. On the DRL side, a recurrent Proximal Policy Optimization–Long Short-Term Memory (PPO–LSTM) agent for global MPPT demonstrates robust gains over classical and nonrecurrent baselines [8], while a hardware-validated DQN achieves GMPPT on real PV strings with rigorous experimental design [9]. Beyond single-agent control, physics-informed multi-agent DRL with centralized training and decentralized execution (CTDE) has matured in power-network settings, offering architectural guidance for PV module coordination [10]. In parallel, the body of comparative evidence and synthesis continues to grow: a controlled 2024 evaluation contrasting classical and learning-based MPPT algorithms [11], a comprehensive 2025 review of traditional and advanced MPPT techniques (including partial-shading scenarios) [12], and a 2025 validation study of a global MPPT strategy under complex partial shading using model-predictive control [13].
Learning-based control offers an alternative that adapts policies to nonstationary PV conditions. Deterministic policy gradient methods enable continuous actuation [14,15]; TD3 reduces overestimation bias via twin critics and target smoothing [16,17]. In multi-string or module-integrated topologies, the problem is intrinsically multi-agent with local observations and shared objectives, motivating centralized training with decentralized execution (CTDE) as in MADDPG, COMA, and value-factorization approaches [18,19,20]. Nevertheless, off-policy actor–critic stability hinges on value-function smoothness and bootstrapping geometry [16,21], while on-policy trust-region methods (TRPO/PPO) trade stability for higher sampling costs in embedded experimentation [22,23]. Two-time-scale stochastic approximation provides convergence guarantees under regularity, stepsize, and contractivity conditions that are often verified for linear approximation but are harder to certify for deep critics and nonconvex policy classes [24,25,26,27,28].
This work develops Fuzzy–MAT3D, a CTDE variant of TD3 in which actors and critics are composed with a differentiable fuzzy partition of unity (PoU) over the compact PV operating domain [29,30,31]. The partition induces locality and global Lipschitz continuity by construction, aligning with classical partition-of-unity ideas in numerical analysis and meshfree interpolation [32,33,34]. In the MPPT setting, this structure reduces temporal-difference (TD) target variance, bounds critic Jacobians in nonsmooth regions of the PV surface, and yields a global $L_\infty$ $\gamma$-contraction of fixed-policy evaluation (hence non-expansive with $\kappa = \gamma < 1$). These properties tighten constants in standard ODE/stochastic-approximation arguments and connect to input-to-state stability margins in control [25,27,28,35,36].
From a mechanistic viewpoint, the differentiable fuzzy PoU constrains the actor and critics to be globally Lipschitz, with moduli that can be explicitly computed from the membership functions. In Section 2, we show that these Lipschitz constants enter directly into the variance bounds of the smoothed TD targets and into the ISS gain $\kappa(L_\mu, \theta)$ of the closed-loop dynamics. Thus, fuzzy regularization reduces the sensitivity of temporal-difference updates to stochastic perturbations and enlarges the ISS margin, which in turn manifests as smoother learning dynamics and reduced steady-state jitter in the PV power trajectories.
We also analyze the order-statistics bias introduced by the TD3 $\min\{Q_1, Q_2\}$ target and provide a projected Bellman residual bound with the correct contraction factor (clarified for $L_\infty$ and $L_2(\rho)$ under measure invariance), together with a clean N-agent CTDE extension [37,38,39,40].
Aim and significance. Our aim is to endow multi-agent TD3 with a structured, differentiable fuzzy front-end that regularizes value estimation and stabilizes policy updates for MPPT under PSCs. The contribution has two principal facets: theoretically, it injects physics-compatible locality into actor–critic maps; practically, it seeks higher energy capture with reduced steady-state oscillation relative to plain multi-agent TD3, MADDPG, and classical MPPT, as substantiated in the Results Section. For evaluation, we follow established DRL practices and classical multiple-comparison control [41,42,43,44,45].
Problem statement and objectives. We model a two-module PV string under partial shading as a discounted Markov decision process with compact state space $\mathcal{S}$ and bounded action space $\mathcal{A}$, where the agents observe electrical quantities (irradiance proxies, voltages, and currents) and select module voltages or duty cycles every control period in order to maximize normalized PV power. The control problem is to synthesize a stationary multi-agent policy $\mu_\theta$ that maximizes the long-run MPPT efficiency $\eta$ while keeping steady-state oscillations and settling times within acceptable operating bounds.
Accordingly, the main objectives of this study are as follows: (i) to construct a Lipschitz-regularized CTDE–TD3 architecture with a differentiable fuzzy partition of the operating domain; (ii) to establish convergence and stability guarantees (contraction properties, temporal-difference variance bounds, two-time-scale convergence, and ISS margins) for the resulting actor–critic scheme; and (iii) to demonstrate, on a balanced set of PSC scenarios, that the proposed Fuzzy–MAT3D controller outperforms representative classical and DRL-based MPPT baselines in efficiency and stability.
To the best of our knowledge, this is the first CTDE–TD3 design that explicitly ties a fuzzy partition-of-unity to global Lipschitz constants and ISS margins in PV MPPT.
Relative to entropy-regularized SAC and trust-region methods (TRPO/PPO) [21,22,23], the proposed fuzzy partition of unity (PoU) regularizes deterministic CTDE–TD3 by enforcing global input Lipschitzness and reducing TD-target variance. Recent reports on safe/robust reinforcement learning (RL) for power converter control and MPPT under PSCs [46,47,48,49,50,51,52] motivate structure in actor–critic features. Fuzzy–MAT3D accordingly offers a physics-compatible alternative that achieves high yield with low ripple under an identical control period and per-tick compute budget. Irradiance is measured and logged for MPP reference and analyses, while classical baselines operate on their standard $(V/I)$ inputs.
Contributions. In summary, this paper achieves the following:
  • Introduces Fuzzy–MAT3D, a fuzzy-partitioned CTDE–TD3 architecture for multi-string PV arrays, yielding globally Lipschitz and locally interpretable actor–critic mappings [29,31,32].
  • Establishes a global $L_\infty$ $\gamma$-contraction for fixed-policy evaluation (and a projected $L_2(\rho)$ bound under measure invariance), reduces TD-target variance, bounds the TD3 min underestimation bias, and provides a projected Bellman residual bound with the appropriate contraction factor [16,27,39].
  • Provides an N-agent CTDE extension consistent with the above analysis and tailored to PV module coordination [18].
  • Demonstrates, under a matched control period and actuation limits with identical per-tick compute budgets (irradiance measured/logged for reference), and randomized, CRN–blocked trials, higher mean MPPT efficiency and lower steady-state oscillations than multi-agent TD3, MADDPG, and classical baselines; details appear in the Methods and Results Sections [41,42].

2. Mathematical Framework and Main Results

This section develops the mathematical backbone of our approach and states the main theoretical contributions. We model MPPT control as a discounted Markov decision process (MDP) with compact state and action spaces and augment the CTDE actor–critic pipeline with a differentiable fuzzy partition of unity over the normalized operating domain; this front-end induces global Lipschitz continuity of the actor and twin critics, uniformly bounds target magnitudes and Jacobians, and reduces TD-target variance by construction.

2.1. Setting, Notation, and Fuzzy PoU

We consider a discounted MDP with compact state space $\mathcal{S} \subset \mathbb{R}^7$, action space $\mathcal{A} = [-1, 1]$, and discount $\gamma \in (0, 1)$. Two decentralized controllers with shared parameters (CTDE) act on a PV string; rewards are bounded as in (33); hence, $|Q^\pi| \le (1 - \gamma)^{-1}$.
  • Fuzzy partition of unity (PoU).
For each coordinate $j \in \{1, \dots, 7\}$, let $\{\psi_\ell^{(j)}\}_{\ell=1}^{5}$ be differentiable and nonnegative, and form a per-coordinate partition of unity: $\sum_{\ell=1}^{5} \psi_\ell^{(j)}(\xi) = 1$. Define
$$z(s) = \big(\psi_1^{(1)}(s_1), \dots, \psi_5^{(1)}(s_1), \dots, \psi_1^{(7)}(s_7), \dots, \psi_5^{(7)}(s_7)\big) \in \mathbb{R}^{35}_{\ge 0},$$
$$\varphi_{\mathrm{fuzzy}}(s) = \frac{z(s)}{7}, \qquad \mathbf{1}^\top \varphi_{\mathrm{fuzzy}}(s) = 1.$$
  • Norm conventions.
Unless stated otherwise, $\|\cdot\|$ denotes the Euclidean norm on state and feature spaces, and $\|J\|$ denotes the induced operator norm. Accordingly, $L_z := \sup_s \|\nabla z(s)\|_2$ and $L_\mu := L_z / 7$ quantify input Lipschitz moduli in $\ell_2$ for the stacked membership map and the normalized PoU, respectively.
Because the memberships satisfy $\sum_{\ell=1}^{5} \psi_\ell^{(j)}(s_j) \equiv 1$ for each of the seven coordinates, we have $\mathbf{1}^\top z(s) = 7$ identically, so the normalization by 7 is exact. By compactness of $\mathcal{S}$ and differentiability of the memberships, $\varphi_{\mathrm{fuzzy}}$ is globally Lipschitz with modulus $L_\mu = L_z / 7$.
Intuitively, a map $f$ is $L$-Lipschitz if $\|f(x) - f(y)\| \le L \|x - y\|$ for all $x, y$, so $L$ bounds its worst-case slope. In an actor–critic architecture, global Lipschitz continuity implies that small perturbations in voltages, irradiance estimates, or sensor readings cannot induce arbitrarily large changes in actions or value estimates. The differentiable fuzzy PoU used here ensures that the feature map $z(s)$ and the actors/critics built on top of it inherit such bounded slopes, with explicit moduli like $L_\mu = L_z / 7$, thereby regularizing the policy and the value functions at the level of the entire state space.
  • Actor–critics with PoU and TD3 smoothing.
The deterministic actor and the two critics are
$$a = \mu_\theta(s) = \tanh\big(f_\theta(\varphi_{\mathrm{fuzzy}}(s))\big), \qquad Q_{\phi_i}(s, a) = h_{\phi_i}\big(\varphi_{\mathrm{fuzzy}}(s), a\big), \quad i \in \{1, 2\},$$
with Polyak-averaged targets $(\bar\theta, \bar\phi_i)$ and clipped target policy noise (TD3).
Throughout the manuscript, the deterministic policy is denoted by $\mu_\theta(s)$, where $\theta$ are the actor parameters. This notation is reserved exclusively for the policy and must not be confused with the fuzzy partition $\varphi_{\mathrm{fuzzy}}(s)$ introduced in Section 2.1. Accordingly, $L_{\mu_\theta}$ denotes the Lipschitz modulus of the actor, while $L_\mu = L_z / 7$ refers to the global Lipschitz constant of the normalized partition of unity. This distinction is preserved in all subsequent sections.

2.2. Main Structural Results

Theorem 1
(Fuzzy PoU induces global Lipschitzness). On compact $\mathcal{S}$,
$$\mathrm{Lip}(\mu_\theta) \le L_{\tanh}\, L_f(\theta)\, L_\mu, \qquad \mathrm{Lip}\big(Q_{\phi_i} \circ (\varphi_{\mathrm{fuzzy}}, \mathrm{id})\big) \le L_Q(\phi_i)\, \max\{L_\mu, 1\}.$$
In particular, gradient norms and TD targets remain uniformly bounded along the iterates.
Proposition 1
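(Smoothed fixed-policy operator is $\gamma$-contractive in $L_\infty$). Let $\bar\pi(s) = \mu_{\bar\theta}(s)$ and $\varepsilon$ be bounded noise. Define $(T_\varepsilon^{\bar\pi} Q)(s, a) = \mathbb{E}\big[r + \gamma\, Q(s', \bar\pi(s') + \varepsilon) \mid s, a\big]$. Then
$$\|T_\varepsilon^{\bar\pi} Q - T_\varepsilon^{\bar\pi} \tilde Q\|_\infty \le \gamma\, \|Q - \tilde Q\|_\infty,$$
and in $L_2(\rho)$,
$$\|T_\varepsilon^{\bar\pi} Q - T_\varepsilon^{\bar\pi} \tilde Q\|_{2,\rho} \le \gamma\, \|Q - \tilde Q\|_{2,\rho'},$$
where $\rho'$ is the pushforward of $\rho$ by $(s, a) \mapsto (s', \bar\pi(s') + \varepsilon)$. The same holds for the min-of-twins operator since it is 1-Lipschitz.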
  • Twin-critic minimum.
We write $m(u, v) := \min\{u, v\}$ and use $m(Q_{\bar\phi_1}, Q_{\bar\phi_2})$ to denote the TD3 min-of-twins target.
Proposition 2
(TD-target variance reduction under PoU). Let $y = r + \gamma\, m(Q_{\bar\phi_1}, Q_{\bar\phi_2})(s', \bar\pi(s') + \varepsilon)$ with $\varepsilon \in [-\epsilon, \epsilon]$ independent of $s'$ conditional on $s$. Then
$$\mathrm{Var}(y \mid s) \le \sigma_r^2 + \gamma^2 L_Q^2 \big(L_\mu^2 \sigma_{s'}^2 + \sigma_\varepsilon^2\big),$$
so decreasing $L_\mu$ reduces the contribution of state/action perturbations to the TD-target variance.
Here, $\sigma_{s'}^2 := \mathbb{E}\big[\|s' - \mathbb{E}[s' \mid s]\|_2^2 \mid s\big]$ is the trace of the conditional covariance of $s'$, and $\varepsilon \perp s' \mid s$ by design (target policy noise independent of next state given $s$). All Lipschitz moduli are taken with respect to the $\ell_2$ norm.
Collecting these results, the fuzzy PoU and the induced Lipschitz bounds play a dual role. On the critic side, they enter the coefficient multiplying the next-state variance term in the TD-target variance bound, thereby damping the stochastic fluctuations of the updates.
On the closed-loop side, the same Lipschitz moduli appear in the ISS gain $\kappa(L_\mu, \theta)$ of (23), so that reducing $L_\mu$ enlarges the input-to-state stability margin. This provides a direct analytical link between the fuzzy regularization mechanism and the empirical reductions in temporal-difference variance and steady-state jitter observed in the PV power trajectories.
Proposition 3
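(Projected Bellman residual bound (last-layer linear)). If the critic class $\mathcal{F} \subset L_2(\rho)$ is closed and convex (e.g., last-layer linear) and $\Pi_{\mathcal{F}}$ is its orthogonal projection, then
$$\|\Pi_{\mathcal{F}} T_\varepsilon^{\bar\pi} Q - \Pi_{\mathcal{F}} T_\varepsilon^{\bar\pi} \tilde Q\|_{2,\rho} \le \gamma \sqrt{c}\, \|Q - \tilde Q\|_{2,\rho}, \qquad c = \operatorname{ess\,sup} \frac{d\rho'}{d\rho} < \infty.$$
If $\hat Q = \Pi_{\mathcal{F}} T_\varepsilon^{\bar\pi} \hat Q$ and $Q^{\bar\pi}$ is a fixed point of $T_\varepsilon^{\bar\pi}$, then
$$\|\hat Q - Q^{\bar\pi}\|_{2,\rho} \le \frac{1}{1 - \gamma \sqrt{c}}\, \|(I - \Pi_{\mathcal{F}})\, T_\varepsilon^{\bar\pi} Q^{\bar\pi}\|_{2,\rho}.$$
If, moreover, $\rho' = \rho$, the constant is $1 / (1 - \gamma)$.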
Remark 1
(On $L_2(\rho)$ contraction vs. distribution shift). More generally, for any measurable $u$,
$$\|T_\varepsilon^{\bar\pi} u\|_{2,\rho} \le \gamma \sqrt{c}\, \|u\|_{2,\rho}, \qquad c := \operatorname{ess\,sup} \frac{d\rho'}{d\rho} < \infty,$$
where $\rho'$ is the pushforward of $\rho$ by $(s, a) \mapsto (s', \bar\pi(s') + \varepsilon)$. The bounds in (10) are Lipschitz bounds across different measures. If, in addition, $\rho' = \rho$ (e.g., when $\rho$ is the discounted occupancy measure induced by $\bar\pi$ with the target noise), one recovers an $L_2(\rho)$ $\gamma$-contraction. Otherwise, residual bounds inherit the factor $1 / (1 - \gamma \sqrt{c})$ instead of $1 / (1 - \gamma)$.
Remark 2
(Empirical estimation of $c$ and near-invariance). Let $\rho$ denote the (discounted) state-action occupancy under the policy used to define the evaluation operator $T_\varepsilon^{\bar\pi}$ (typically the target $\bar\pi$), and let $\rho'$ be its pushforward by $(s, a) \mapsto (s', \bar\pi(s') + \varepsilon)$. Over a finite replay window, we estimate the Radon–Nikodym-type constant
$$\hat c(\mathcal{B}, \lambda) := \max_{b \in \mathcal{B}} \frac{\hat\rho'(b) + \lambda}{\hat\rho(b) + \lambda}, \qquad \hat\rho(b) = \frac{1}{N} \sum_{k=1}^{N} \mathbf{1}\{(s_k, a_k) \in b\},$$
using a measurable partition $\mathcal{B}$ of the (normalized) state-action domain and Laplace smoothing $\lambda > 0$. With Polyak-averaged targets and bounded target policy noise, consecutive replay windows are empirically near-stationary, so $\hat c \approx 1$ is expected; when drift is present, the projected-residual bound inherits the factor $1 / (1 - \gamma \sqrt{\hat c})$ from Proposition 3 (cf. the discussion of $\rho$ versus $\rho'$ following Proposition 1).
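The estimator in Remark 2 can be sketched with a histogram over a uniform grid. The following is a minimal illustration, not the authors' implementation: the grid resolution `bins`, the smoothing value `lam`, and the synthetic sample sets are all assumptions chosen for the example, and inputs are assumed to be normalized to $[0, 1]$.

```python
import numpy as np

def estimate_c_hat(samples_rho, samples_rho_prime, bins=8, lam=1e-3):
    """Histogram estimate of max_b (rho'_hat(b) + lam) / (rho_hat(b) + lam).

    samples_*: arrays of shape (N, d) with entries normalized to [0, 1].
    bins: cells per dimension of the uniform partition B (an assumption).
    lam: Laplace smoothing constant (hypothetical value).
    """
    d = samples_rho.shape[1]
    edges = [np.linspace(0.0, 1.0, bins + 1)] * d
    h_rho, _ = np.histogramdd(samples_rho, bins=edges)
    h_prime, _ = np.histogramdd(samples_rho_prime, bins=edges)
    rho_hat = h_rho / len(samples_rho)
    rho_prime_hat = h_prime / len(samples_rho_prime)
    return float(np.max((rho_prime_hat + lam) / (rho_hat + lam)))

rng = np.random.default_rng(0)
window = rng.uniform(size=(20000, 2))          # synthetic replay window
c_same = estimate_c_hat(window, window)        # identical measures
shifted = np.clip(window * 0.5, 0.0, 1.0)      # mass concentrated in one corner
c_shift = estimate_c_hat(window, shifted)      # visible distribution shift
```

When both windows follow the same distribution the ratio is one in every cell; concentrating the pushforward samples in a sub-region raises the estimate well above one, which is the regime where the factor $1/(1 - \gamma\sqrt{\hat c})$ degrades.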
Lemma 1
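(Underestimation by TD3's minimum). Let $Q_i = Q + \varepsilon_i$ with $\mathbb{E}[\varepsilon_i \mid s, a] = 0$. Then
$$\mathbb{E}\big[\min\{Q_1, Q_2\} \mid s, a\big] = Q + \tfrac{1}{2}\, \mathbb{E}\big[\varepsilon_1 + \varepsilon_2 - |\varepsilon_1 - \varepsilon_2| \mid s, a\big] \le Q,$$
and
$$0 \le Q - \mathbb{E}\big[\min\{Q_1, Q_2\} \mid s, a\big] \le \tfrac{1}{2} \sqrt{\mathrm{Var}(\varepsilon_1 - \varepsilon_2 \mid s, a)} \le \frac{\sigma_Q}{\sqrt{2}} \sqrt{1 - \rho},$$
where $\mathrm{Var}(\varepsilon_i \mid s, a) \le \sigma_Q^2$ and $\rho := \mathrm{Corr}(\varepsilon_1, \varepsilon_2 \mid s, a) \in [-1, 1]$.
  • Gaussian specialization.
If $(\varepsilon_1, \varepsilon_2)$ are jointly normal with $\mathrm{Var}(\varepsilon_i \mid s, a) = \sigma_Q^2$ and correlation $\rho$, then using $\min(x, y) = \frac{x + y - |x - y|}{2}$ and $\varepsilon_1 - \varepsilon_2 \sim \mathcal{N}\big(0, 2 \sigma_Q^2 (1 - \rho)\big)$,
$$Q - \mathbb{E}\big[\min\{Q_1, Q_2\} \mid s, a\big] = \tfrac{1}{2}\, \mathbb{E}|\varepsilon_1 - \varepsilon_2| = \sigma_Q \sqrt{\frac{1 - \rho}{\pi}}.$$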
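The Gaussian closed form $\sigma_Q\sqrt{(1-\rho)/\pi}$ can be checked numerically. The sketch below is a Monte Carlo sanity check under assumed values of $\sigma_Q$ and $\rho$ (both hypothetical); it also evaluates the distribution-free upper bound $\sigma_Q\sqrt{(1-\rho)/2}$ from Lemma 1 for comparison.

```python
import numpy as np

def min_bias_closed_form(sigma_q, rho):
    """Underestimation Q - E[min(Q1, Q2)] for jointly Gaussian critic errors."""
    return sigma_q * np.sqrt((1.0 - rho) / np.pi)

def min_bias_monte_carlo(sigma_q, rho, n=400_000, seed=0):
    """Sample correlated zero-mean errors and average -min(eps1, eps2)."""
    rng = np.random.default_rng(seed)
    cov = sigma_q**2 * np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # Q cancels in Q - E[min(Q + eps1, Q + eps2)], leaving -E[min(eps1, eps2)].
    return float(-np.minimum(eps[:, 0], eps[:, 1]).mean())

sigma_q, rho = 0.2, 0.5                       # hypothetical error scale and correlation
closed = min_bias_closed_form(sigma_q, rho)
mc = min_bias_monte_carlo(sigma_q, rho)
upper = sigma_q * np.sqrt((1.0 - rho) / 2.0)  # distribution-free bound of Lemma 1
```

The bias vanishes as $\rho \to 1$, which is one reason strongly correlated twin critics provide little protection against overestimation but also little pessimism.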
Theorem 2
(Two-time-scale convergence (projected SA)). Under i.i.d./mixing replay, smoothness and boundedness of gradients, square-integrable martingale-difference (MDS) noises, and stepsizes satisfying $\sum_k \alpha_k = \sum_k \beta_k = \infty$, $\sum_k (\alpha_k^2 + \beta_k^2 + \tau_k^2) < \infty$, $\beta_k / \alpha_k \to 0$, and $\tau_k / \beta_k \to 0$, the critics converge to the set of stationary points (projected flow) and the actor follows the differential inclusion $\dot\theta \in \Pi_\Theta\big[\nabla_\theta \tilde J(\theta)\big]$ with $\tilde J(\theta) = \mathbb{E}_\rho\big[Q_{\phi^\star(\theta)}(s, \mu_\theta(s))\big]$. Every limit point of the actor is stationary.
Corollary 1
(PL/last-layer linear ⇒ uniqueness and point convergence). If, with frozen targets, the critic's last-layer (linear) loss is $L_w$-smooth and satisfies PL with $\mu_{\mathrm{PL}} > 0$, then the critic has a unique global minimizer and the (fast) iterates converge to it; on two time scales, $w_k \to w^\star(\theta_k)$ along the slow manifold.
Theorem 3
(Finite stepsize neighborhoods with explicit constants). With constant stepsizes $(\alpha, \beta, \tau)$, there exist $C_j$ (depending on Lipschitz moduli and noise bounds) such that
$$\limsup_{k \to \infty} \mathbb{E}\, \mathrm{dist}\big((w_k, \theta_k), \mathcal{M} \times \mathcal{S}\big)^2 \le (C_1 + C_4 L_\mu^2)\, \alpha + (C_2 + C_5 L_\mu^2)\, \beta + (C_3 + C_6 L_\mu^2)\, \tau.$$
Under PL, the radii admit closed-form expressions in terms of $(\alpha, \beta, \tau, \sigma_M^2, L_w, \mu_{\mathrm{PL}})$.
Remark 3.
Theorem 2 assumes decreasing stepsizes with $\sum_k \alpha_k = \sum_k \beta_k = \infty$, $\sum_k (\alpha_k^2 + \beta_k^2 + \tau_k^2) < \infty$, $\beta_k / \alpha_k \to 0$, and $\tau_k / \beta_k \to 0$. Our implementations use constant stepsizes $(\alpha, \beta, \tau)$, for which the appropriate justification is provided by Theorem 3: the coupled recursions enter an explicit steady-state neighborhood whose radii depend on $(\alpha, \beta, \tau)$ and the Lipschitz/noise constants. We do not claim that Theorem 2 applies to the constant-step regime; rather, Theorem 2 serves as the ideal decreasing-steps benchmark while Theorem 3 supports the practical regime.
In summary, Theorem 2 is formulated under standard two-time-scale stochastic-approximation conditions: (i) globally Lipschitz gradients and bounded noise, (ii) projections onto compact parameter sets, and (iii) stepsizes $(\alpha_k, \beta_k, \tau_k)$ such that $\sum_k \alpha_k = \sum_k \beta_k = \infty$, $\sum_k (\alpha_k^2 + \beta_k^2 + \tau_k^2) < \infty$, $\beta_k / \alpha_k \to 0$, and $\tau_k / \beta_k \to 0$. Under these assumptions, the critic recursions track the projected gradient flow of their population losses with frozen targets, while the actor follows the projected differential inclusion $\dot\theta \in \Pi_\Theta\big[\nabla_\theta \tilde J(\theta)\big]$ on the slow time scale.
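As a concrete illustration of the stepsize conditions (a worked example, not a schedule used in the experiments, which run with constant stepsizes), one admissible polynomial family is the following; the exponents are assumptions chosen only to make each condition easy to verify.

```latex
\alpha_k = (k+1)^{-0.6}, \qquad \beta_k = (k+1)^{-0.8}, \qquad \tau_k = (k+1)^{-1}.
% Divergence: \sum_k (k+1)^{-p} = \infty for p \le 1, so \sum_k \alpha_k = \sum_k \beta_k = \infty.
% Square-summability: the squared exponents are 1.2, 1.6, and 2, all > 1, so
%   \sum_k (\alpha_k^2 + \beta_k^2 + \tau_k^2) < \infty.
% Time-scale separation:
\frac{\beta_k}{\alpha_k} = (k+1)^{-0.2} \to 0, \qquad \frac{\tau_k}{\beta_k} = (k+1)^{-0.2} \to 0.
```

The ordering makes the critics the fastest recursion, the actor slower, and the Polyak-averaged targets slowest, matching the separation assumed in Theorem 2.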
The PL condition in Corollary 1 strengthens this picture by ruling out spurious stationary points in the last-layer critic loss: it guarantees a unique global minimizer and exponential convergence of the fast critic iterates to $w^\star(\theta)$, so that along the slow manifold, the coupled actor–critic dynamics identify a single value function per policy parameter $\theta$.
Theorem 4
(N-agent CTDE extension with block-norm constants). For $N$ agents with local actors $\mu_{\theta_i}$ and a joint (or per-agent) critic, under the same two-time-scale regime, the conclusions of Theorem 2 hold and the moduli aggregate as
$$\mathrm{Lip}(\mu) \le \begin{cases} \max_i L_{\mu,i}, & \|\cdot\|_{\infty,\mathrm{blk}}, \\ \big(\sum_i L_{\mu,i}^2\big)^{1/2}, & \|\cdot\|_{2,\mathrm{blk}}, \end{cases} \qquad L_Q^{(\mathrm{in})}\ \text{analogously},$$
and the noise variance aggregates by sum (2-norm) or maximum ($\infty$-norm).

2.3. ISS Margins and Steady-State Jitter (Formalized)

Consider the plant $\dot x = f(x, d) + g(x)\, u$, with $u = \mu_\theta(x)$ and exogenous input $d$. Let $V$ be a control Lyapunov function on a compact operating set $\mathcal{X}$, such that
$$\dot V \le -\alpha\big(\|x - x^\star\|\big) + c_1 \|e_\pi(x)\| + c_2 |e_Q(x)|, \tag{17}$$
for some class-$\mathcal{K}$ function $\alpha$ and constants $c_1, c_2 > 0$, where $x^\star \in \mathcal{X}$ is a reference state defined below. As illustrated in Figure 1, the critic layer contracts rapidly towards the slow manifold $\mathcal{M}(\theta)$, so on the slow time scale, the actor evolves with small approximation errors $(e_\pi, e_Q)$, which is precisely the regime captured by (17).
Assumption 1
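(Equilibrium anchoring on $\mathcal{X}$). There exists $x^\star \in \mathcal{X}$, such that the reference feedback and value satisfy
$$\mu_\theta(x^\star) = \mu^\star(x^\star), \qquad Q_\phi\big(x^\star, \mu_\theta(x^\star)\big) = Q^{\bar\pi}\big(x^\star, \mu_\theta(x^\star)\big).$$
Moreover, the maps $\mu_\theta, \mu^\star, Q_\phi, Q^{\bar\pi}$ are Lipschitz on $\mathcal{X}$ with moduli $L_{\mu_\theta}, L_{\mu^\star}, L_Q, L_{Q^{\bar\pi}}$, respectively.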
Define the approximation errors on X , recentered at x ,
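$$e_\pi(x) := \mu_\theta(x) - \mu^\star(x), \qquad e_Q(x) := Q_\phi\big(x, \mu_\theta(x)\big) - Q^{\bar\pi}\big(x, \mu_\theta(x)\big).$$
By Assumption 1 and the triangle inequality,
$$\|e_\pi(x)\| \le \|\mu_\theta(x) - \mu_\theta(x^\star)\| + \|\mu^\star(x) - \mu^\star(x^\star)\| \le (L_{\mu_\theta} + L_{\mu^\star})\, \|x - x^\star\|. \tag{20}$$
Writing $a(x) := \mu_\theta(x)$, the critic mismatch reads
$$e_Q(x) = Q_\phi\big(x, a(x)\big) - Q^{\bar\pi}\big(x, a(x)\big), \tag{21}$$
and Lipschitz continuity, together with the anchoring $Q_\phi(x^\star, a(x^\star)) = Q^{\bar\pi}(x^\star, a(x^\star))$, gives
$$|e_Q(x)| \le \big|Q_\phi(x, a(x)) - Q_\phi(x^\star, a(x^\star))\big| + \big|Q^{\bar\pi}(x, a(x)) - Q^{\bar\pi}(x^\star, a(x^\star))\big| \le (L_Q + L_{Q^{\bar\pi}}) \big(\|x - x^\star\| + \|a(x) - a(x^\star)\|\big) \le (L_Q + L_{Q^{\bar\pi}}) (1 + L_{\mu_\theta})\, \|x - x^\star\|. \tag{22}$$
Invoking Theorem 1, the actor Lipschitz modulus satisfies $L_{\mu_\theta} \le L_{\tanh}\, L_f(\theta)\, L_\mu$, where the fuzzy partition of unity (PoU) induces $L_\mu = L_z / 7$. Substituting (20) and (22) into (17) yields
$$\dot V \le -\alpha\big(\|x - x^\star\|\big) + \underbrace{\Big[c_1 (L_{\mu_\theta} + L_{\mu^\star}) + c_2 (L_Q + L_{Q^{\bar\pi}}) (1 + L_{\mu_\theta})\Big]}_{=:\ \kappa(L_\mu, \theta)} \|x - x^\star\|. \tag{23}$$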
We emphasize that, among the standing assumptions used in this section, the equilibrium anchoring on X (Assumption 1), tailored to the PV MPPT operating region under PSCs, is the modeling ingredient that is most specific to the present work.
The remaining conditions—compactness of S , bounded actions, global Lipschitz continuity of the dynamics and actor–critic maps, and square-integrable noise—are in line with standard hypotheses in ISS and stochastic-approximation analyses of actor–critic schemes and are recalled here mainly to keep the exposition self-contained.
Because the bound on $L_{\mu_\theta}$ scales linearly with $L_\mu$ (from the fuzzy PoU), decreasing $L_\mu$ strictly reduces the gain $\kappa(L_\mu, \theta)$, thereby enlarging the effective decay margin in (23). This formalizes the empirical observation that smaller $L_\mu$ lowers steady-state jitter by improving the ISS gain, consistent with the two-time-scale picture in Figure 1.
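To make the dependence of the gain on $L_\mu$ tangible, the sketch below evaluates the bracketed gain of (23) with the actor modulus replaced by its Theorem 1 upper bound $L_{\tanh} L_f(\theta) L_\mu$ (with $L_{\tanh} = 1$, since $\tanh$ is 1-Lipschitz). All numerical values are hypothetical and serve only to show the monotone trend; they are not constants from the experiments.

```python
def actor_lipschitz_bound(l_tanh, l_f, l_mu):
    """Theorem 1 upper bound on the actor modulus: L_tanh * L_f(theta) * L_mu."""
    return l_tanh * l_f * l_mu

def iss_gain(l_mu, l_f, l_mu_star, l_q, l_q_pi, c1, c2, l_tanh=1.0):
    """Upper bound on kappa(L_mu, theta) from (23), using the actor bound above."""
    l_actor = actor_lipschitz_bound(l_tanh, l_f, l_mu)
    return c1 * (l_actor + l_mu_star) + c2 * (l_q + l_q_pi) * (1.0 + l_actor)

# Hypothetical moduli and Lyapunov constants, for illustration only.
params = dict(l_f=4.0, l_mu_star=1.0, l_q=2.0, l_q_pi=2.0, c1=0.5, c2=0.1)
kappa_sharp = iss_gain(l_mu=1.0, **params)    # sharper memberships, larger L_mu
kappa_smooth = iss_gain(l_mu=0.5, **params)   # smoother memberships, smaller L_mu
```

With these placeholder values, halving $L_\mu$ lowers the gain from 4.5 to 2.7; the reduction is affine in $L_\mu$ because $L_\mu$ enters both the policy-error and the critic-error terms through the actor modulus.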
Remark 4
(Origin-centered fallback with offsets). If an explicit anchor $x^\star$ is inconvenient, one may work at the origin and carry offsets:
$$\|e_\pi(x)\| \le (L_{\mu_\theta} + L_{\mu^\star})\, \|x\| + \|e_\pi(0)\|, \qquad |e_Q(x)| \le (L_Q + L_{Q^{\bar\pi}}) (1 + L_{\mu_\theta})\, \|x\| + |e_Q(0)|.$$
In designs that calibrate $\mu_\theta, \mu^\star, Q_\phi, Q^{\bar\pi}$ to match at the operating point (zero-bias last layers, or explicit alignment at $x = 0$), the offsets can be made negligible; the same $L_\mu$-driven conclusion follows.
Complete proofs for all results in Section 2 are provided in Appendix B.

3. Materials and Methods

3.1. Study Design and Overview

We evaluate a fuzzy-partitioned, centralized training/decentralized execution (CTDE) variant of TD3 (hereafter, Fuzzy–MAT3D) for maximum power point tracking (MPPT) under partial shading. Two local controllers execute with shared parameters, whereas training is centralized from a replay buffer. The design follows a balanced common-random-numbers (CRN) protocol across 7 benchmark scenarios (Table 3) and 20 seeds per algorithm, with a fixed evaluation horizon of $T_{\mathrm{sim}} = 20$ s. The empirical protocol, metrics, and statistical analysis were specified a priori and are detailed below. All methods and constants are chosen to satisfy the bounded-target and Lipschitz regularity assumptions used in the theory.

3.2. PV Plant and Power-Stage Model

  • Single-diode module model.
We adopt the classical single-diode model with series and shunt resistances:
$$I = I_{\mathrm{ph}}(G, T) - I_0(T) \left[\exp\!\left(\frac{V + I R_s}{n V_T}\right) - 1\right] - \frac{V + I R_s}{R_{sh}},$$
$$V_T = \frac{k_B T}{q}, \qquad I_{\mathrm{ph}}(G, T) = \kappa_G\, G + \kappa_T\, (T - T_{\mathrm{ref}}),$$
with irradiance $G$ and cell temperature $T$. Two modules in series yield $V_{\mathrm{string}} = V_1 + V_2$, $I_{\mathrm{string}} = I_1 = I_2$. Partial shading is represented by $(G_1, G_2)$ profiles as in Table 3. The coefficients $(\kappa_G, \kappa_T)$ are obtained by least-squares calibration from the module datasheet.
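Because the single-diode equation is implicit in $I$, evaluating the module current at a given voltage requires a scalar root-find. The sketch below uses bisection, which is safe here because the residual is strictly decreasing in $I$. Every parameter value is a hypothetical placeholder rather than a calibrated datasheet value, and `n_eff` is assumed to lump the ideality factor with the number of series cells.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
Q_E = 1.602176634e-19   # elementary charge [C]

def module_current(v, g, t_cell, *, kappa_g=0.009, kappa_t=0.003, t_ref=298.15,
                   i0=4e-9, n_eff=90.0, r_s=0.3, r_sh=300.0):
    """Solve the implicit single-diode equation for I at voltage v >= 0.

    Defaults are hypothetical placeholders, not calibrated module parameters.
    """
    v_t = K_B * t_cell / Q_E                       # thermal voltage
    i_ph = kappa_g * g + kappa_t * (t_cell - t_ref)

    def residual(i):
        x = (v + i * r_s) / (n_eff * v_t)
        return i_ph - i0 * math.expm1(x) - (v + i * r_s) / r_sh - i

    # The residual is strictly decreasing in i, so a sign-changing bracket suffices.
    lo, hi = -1.0, i_ph + 1.0
    while residual(lo) < 0.0:                      # widen the bracket if needed
        lo *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

i_sc = module_current(0.0, 1000.0, 298.15)    # near short circuit
i_40 = module_current(40.0, 1000.0, 298.15)   # closer to the knee of the curve
```

For a series string, the same routine can be inverted per module at the common string current; that step is omitted here to keep the example short.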
Remark 5
(Bypass diodes under PSCs). Commercial modules include substring bypass diodes (see Section 5 and Table 10). For substring $\ell$, we considered an augmented branch
$$I_{\mathrm{bd},\ell} = I_{S,\ell} \left[\exp\!\left(-\frac{V_\ell}{n_{\mathrm{bd}} V_T}\right) - 1\right] \mathbf{1}\{V_\ell < 0\},$$
in parallel with the cell branch. On the two-module bench and shading scripts of Table 3, measured string currents rarely forward-bias bypass diodes. Enabling this branch in post hoc simulations changed $\eta$ by less than 0.5%. Thus, the single-diode model suffices for our scenarios.

3.3. DC/DC Stage and Control Interface

Let $d \in [0, 1]$ denote the converter duty cycle. The averaged stage dynamics are
$$\dot x = f(x, d) + w, \qquad y = h(x), \qquad y = (V_{\mathrm{string}}, I_{\mathrm{string}}),$$
and the agent issues an incremental duty command $a \in [-1, 1]$ mapped to $d_{t+1} = \mathrm{sat}\big(d_t + \Delta d(a)\big)$.
Throughout all simulations and hardware runs, the incremental duty update is
$$\Delta d(a) := \mathrm{clip}\big(\kappa_d\, a, -\Delta d_{\max}, \Delta d_{\max}\big), \qquad a \in [-1, 1], \quad 0 < \kappa_d \le \Delta d_{\max}, \tag{27}$$
followed by
$$d_{t+1} = \mathrm{clip}\big(d_t + \Delta d(a_t), d_{\min}, d_{\max}\big). \tag{28}$$
Unless stated otherwise, we set $\kappa_d = \Delta d_{\max}$ so that $a = \pm 1$ produces the maximum admissible per-tick duty change. No additional slew-rate limiter or nonlinearity is applied beyond the clipping in (27) and (28). For completeness, $\mathrm{clip}(x, \ell, u) := \min\{u, \max\{\ell, x\}\}$.
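The two clipping steps can be written in a few lines. The following is a minimal sketch of the action-to-duty mapping described above; the numerical limits (`delta_d_max`, `d_min`, `d_max`) are hypothetical, since the text does not state them in this section.

```python
def clip(x, lo, hi):
    """clip(x, l, u) := min{u, max{l, x}}."""
    return min(hi, max(lo, x))

def duty_step(d_t, a, *, delta_d_max=0.02, kappa_d=None, d_min=0.05, d_max=0.95):
    """Map an action a in [-1, 1] to the next duty cycle.

    Limits are hypothetical placeholders; kappa_d defaults to delta_d_max so
    that a = +/-1 produces the largest admissible per-tick change.
    """
    if kappa_d is None:
        kappa_d = delta_d_max
    delta = clip(kappa_d * a, -delta_d_max, delta_d_max)   # incremental update
    return clip(d_t + delta, d_min, d_max)                 # actuator saturation

d_up = duty_step(0.50, 1.0)     # full positive step
d_sat = duty_step(0.94, 1.0)    # step truncated by the upper duty limit
d_down = duty_step(0.50, -0.5)  # half-size negative step
```

With $\kappa_d = \Delta d_{\max}$ the inner clip is inactive for admissible actions and only guards against out-of-range commands; the outer clip enforces the duty-cycle limits.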

3.4. Operating Constraints

We enforce plant and safety limits
$$d_{\min} \le d_t \le d_{\max}, \qquad |\Delta d_t| \le \Delta d_{\max},$$
$$0 \le V_i \le V_{\mathrm{oc},i}, \qquad 0 \le I_i \le I_{\mathrm{sc},i},$$
and impose parameter projection $\mathrm{Proj}$ and gradient clipping (critics/actor) to keep all iterates bounded, consistent with the ISS and SA analyses.

3.5. Observations, Actions, and Horizon

Each module provides a local observation $s \in \mathbb{R}^7$ bounded componentwise by
$$s_{\min} = (0, 0, 0, 200, 200, -352.408, -49.9), \qquad s_{\max} = (49.9, 9, 352, 1000, 1000, 352.408, 49.9).$$
The local observation $s \in \mathbb{R}^7$ is the tuple $(V_i, I_i, P_i, G_i, G_{n(i)}, \Delta P_i, \Delta V_i)$. Actions are incremental duty-cycle commands in $[-1, 1]$. The evaluation horizon is $T_{\mathrm{sim}} = 20$ s; step scenarios use $t_{\mathrm{change}} = 0.35$ s.
RL agents consume the full tuple because the irradiance channels $(G_i, G_{n(i)})$ are available on the intended hardware and stabilize near-MPP behavior under abrupt PSC changes. Classical baselines (P&O, INC, PSO) are deliberately kept on their canonical $(V/I)$ inputs to preserve standard formulations rather than re-tune them into nonstandard variants. All methods share the same wall-clock control period and actuation limits, and we log $(G_i, G_{n(i)})$ for every run to enable like-for-like post hoc analyses and MPP reference computation. This preserves compute and timing parity while making the sensing assumption explicit.

3.6. Fuzzy Features and Function Approximators

Each state coordinate is normalized to $[0, 1]$ and equipped with $L = 5$ differentiable memberships $\{\psi_\ell^{(j)}\}_{\ell=1}^{5}$ per coordinate $j \in \{1, \dots, 7\}$, combined via a coordinate-wise softmax so that $\sum_{\ell=1}^{5} \psi_\ell^{(j)}(s_j) \equiv 1$. Stacking the 35 responses yields $z(s) \in \mathbb{R}^{35}_{\ge 0}$ with the global identity $\mathbf{1}^\top z(s) = 7$; hence, the normalized fuzzy map
$$\varphi_{\mathrm{fuzzy}}(s) = \frac{z(s)}{7}, \qquad \mathbf{1}^\top \varphi_{\mathrm{fuzzy}}(s) = 1.$$
This induces global input Lipschitz constants for the actor and critics used below. The actor is a two-hidden-layer MLP with a tanh head; twin critics share the fuzzy front-end but maintain separate downstream towers.

3.7. Fuzzy-Partition Parameters

Each raw state coordinate is affinely normalized to $[0, 1]$ and equipped with $L = 5$ softmax memberships having centers $\{0, 0.25, 0.5, 0.75, 1\}$ and a per-coordinate temperature $\tau_j > 0$. Writing $z_j(s_j) \in \mathbb{R}^{5}_{\ge 0}$ for the $j$th coordinate's membership vector with $\mathbf{1}^\top z_j(s_j) = 1$, we stack $z(s) := [z_1(s_1); \dots; z_7(s_7)] \in \mathbb{R}^{35}_{\ge 0}$, so that $\mathbf{1}^\top z(s) = 7$.
To avoid overloading the policy notation $\mu_\theta(\cdot)$, we denote the fuzzy partition by $\varphi_{\mathrm{fuzzy}}(s) := z(s) / 7$; for notational consistency with Section 2, we write $L_\mu := L_z / 7$ for the global input Lipschitz constant of the normalized PoU. Table 1 reports $(\tau_j)$ and the induced $(L_z, L_\mu)$.
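A minimal sketch of the fuzzy front-end follows. The text specifies softmax memberships with five centers and a per-coordinate temperature but not the exact logit form, so the squared-distance logits used here are an assumption, as are the temperature value and the sample observation. The bounds are taken as symmetric for $\Delta P_i$ and $\Delta V_i$, which is also an assumption about the observation range.

```python
import numpy as np

CENTERS = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # membership centers per coordinate

def fuzzy_features(s, s_min, s_max, tau):
    """Return (z, phi): stacked memberships z in R^35 and phi = z / 7.

    Logits -(xi - c)^2 / tau_j are an assumed form of the softmax memberships.
    """
    s = np.asarray(s, dtype=float)
    xi = np.clip((s - s_min) / (s_max - s_min), 0.0, 1.0)              # affine normalization
    logits = -((xi[:, None] - CENTERS[None, :]) ** 2) / tau[:, None]   # shape (7, 5)
    logits -= logits.max(axis=1, keepdims=True)                        # numerical stability
    w = np.exp(logits)
    memberships = w / w.sum(axis=1, keepdims=True)                     # each row sums to 1
    z = memberships.reshape(-1)
    return z, z / 7.0

# Observation bounds; the symmetric ranges for dP and dV are assumed.
S_MIN = np.array([0.0, 0.0, 0.0, 200.0, 200.0, -352.408, -49.9])
S_MAX = np.array([49.9, 9.0, 352.0, 1000.0, 1000.0, 352.408, 49.9])
tau = np.full(7, 0.05)                                                 # hypothetical temperatures
z, phi = fuzzy_features([30.0, 6.0, 180.0, 800.0, 400.0, 5.0, -0.3], S_MIN, S_MAX, tau)
```

Smaller temperatures sharpen the memberships and increase $L_z$, while larger temperatures flatten them; this is the knob through which the partition controls $L_\mu = L_z/7$.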

3.8. Reward and Normalization

The saturated, normalized reward used throughout is
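$$R = \tanh\!\left(\kappa \left[\frac{P_1 + P_2}{P_{\mathrm{mp,tot}}} - \alpha\, \frac{|\Delta P_1| + |\Delta P_2|}{P_{\mathrm{mp,tot}}} - \beta\, \frac{|\Delta V_1| + |\Delta V_2|}{V_{\mathrm{oc,tot}}}\right]\right), \qquad (\alpha, \beta, \kappa) = (0.1, 0.05, 2), \tag{33}$$
with physical normalization constants $P_{\mathrm{mp,tot}} = 704.8$ W and $V_{\mathrm{oc,tot}} = 99.8$ V. This ensures $R \in (-1, 1)$ and bounded targets.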
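The reward can be sketched directly from its three normalized terms. The grouping of the gain is an assumption here: the sketch takes $\kappa$ to multiply the entire bracket inside the $\tanh$, which is one plausible reading of the formula; the constants are those stated in the text.

```python
import math

P_MP_TOT = 704.8   # total maximum power used for normalization [W]
V_OC_TOT = 99.8    # total open-circuit voltage used for normalization [V]

def reward(p1, p2, dp1, dp2, dv1, dv2, alpha=0.1, beta=0.05, kappa=2.0):
    """Saturated normalized reward; kappa is assumed to scale the whole bracket."""
    power_term = (p1 + p2) / P_MP_TOT
    ripple_p = (abs(dp1) + abs(dp2)) / P_MP_TOT
    ripple_v = (abs(dv1) + abs(dv2)) / V_OC_TOT
    return math.tanh(kappa * (power_term - alpha * ripple_p - beta * ripple_v))

r_steady = reward(352.4, 352.4, 0.0, 0.0, 0.0, 0.0)     # full power, no ripple
r_ripple = reward(352.4, 352.4, 20.0, 20.0, 1.0, 1.0)   # same power with oscillation
```

The $\tanh$ saturation keeps the reward strictly inside $(-1, 1)$ regardless of the operating point, which is what bounds the TD targets by $(1-\gamma)^{-1}$.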

3.9. Training Protocol (TD3 Under CTDE)

We employ TD3 with centralized replay and decentralized execution:
  • Discount factor γ = 0.99 ; minibatch size M = 256 ; policy update period K = 2 ; Polyak factor τ = 10 3 .
  • Target policy smoothing noise ε [ 0.5 , 0.5 ] , sampled i.i.d. and independent of s conditional on s; target networks ( θ ¯ , φ ¯ i ) are Polyak-averaged.
  • Parameter updates include explicit projections onto compact convex sets W and Θ ; gradient clipping is applied to all updates.
Algorithm 1 summarizes the CTDE training loop used. Table 2 summarizes the training hyperparameters adopted for Fuzzy–MAT3D. These values implement a two-time-scale separation (fast critics and slow actor) with constant stepsizes; accordingly, Theorem 3 provides the formal justification for the training regime used in our experiments.
Training episodes used a longer horizon ($\Delta t = 0.06$ s; up to 12,000 steps, i.e., $\approx 720$ s per episode) to improve replay diversity. All reported performance metrics, however, were computed on a fixed evaluation horizon $T_{\mathrm{sim}} = 20$ s across the seven scenarios to ensure comparability between algorithms.
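Algorithm 1. Fuzzy–MAT3D (CTDE–TD3 for two PV modules in series)
Require: $\gamma = 0.99$, $M = 256$, $K = 2$, $\tau = 10^{-3}$; stepsizes $\{\alpha_k\}$ (critics), $\{\beta_k\}$ (actor).
1: Initialize $\theta, \varphi_1, \varphi_2$; set $\bar{\theta} \leftarrow \theta$, $\bar{\varphi}_i \leftarrow \varphi_i$; empty buffer $\mathcal{D}$.
2: for episodes do
3:     Reset irradiances $G_1, G_2 \sim \mathcal{U}[200, 1000]$; initialize $V_i \sim \mathcal{U}[20, 49.9]$.
4:     for time step $t$ do
5:         Apply $a_t^{(i)} = \mu_\theta(s_t^{(i)})$; observe $s_{t+1}^{(i)}$; compute $r_t$ by (33); push to $\mathcal{D}$.
6:         Targets $y = r + \gamma \min_i Q_{\bar{\varphi}_i}(s', \mu_{\bar{\theta}}(s') + \varepsilon)$; update $\varphi_i \leftarrow \mathrm{Proj}_{\mathcal{W}}[\varphi_i - \alpha \nabla_{\varphi_i} \mathrm{MSE}]$; Polyak-average $\bar{\varphi}_i$.
7:         if $t \bmod K = 0$ then ascend $\theta \leftarrow \mathrm{Proj}_{\Theta}[\theta + \beta \nabla_\theta J(\theta)]$; Polyak-average $\bar{\theta}$.
8:         end if
9:     end for
10: end for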
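The core updates of Algorithm 1 (the twin-critic target, the clipped and projected parameter step, and Polyak averaging) can be sketched without a deep-learning framework. The clipped Gaussian noise with $\sigma = 0.2$, the box-shaped projection set, and the toy linear actor and critics are assumptions for illustration only; the paper specifies only that $\varepsilon \in [-0.5, 0.5]$ and that the projection sets are compact and convex.

```python
import math
import random

GAMMA, TAU, NOISE_CLIP = 0.99, 1e-3, 0.5  # values stated in the text


def td3_target(r, s_next, target_actor, target_critics, rng, sigma=0.2):
    """y = r + gamma * min_i Q_i(s', mu(s') + eps), with eps kept in [-0.5, 0.5]."""
    # Assumption: clipped Gaussian smoothing noise; sigma is not from the paper.
    eps = max(-NOISE_CLIP, min(NOISE_CLIP, rng.gauss(0.0, sigma)))
    a_next = target_actor(s_next) + eps
    return r + GAMMA * min(q(s_next, a_next) for q in target_critics)


def clip_by_norm(grad, max_norm):
    """Rescale the gradient if its Euclidean norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = 1.0 if norm <= max_norm else max_norm / norm
    return [g * scale for g in grad]


def projected_step(params, grad, lr, radius, ascent=False):
    """Gradient step followed by projection onto the box [-radius, radius]^n."""
    sign = 1.0 if ascent else -1.0
    stepped = [p + sign * lr * g for p, g in zip(params, grad)]
    return [max(-radius, min(radius, p)) for p in stepped]


def polyak(target, online, tau=TAU):
    """Soft target update: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target, online)]


# Toy usage with scalar states and hypothetical linear maps.
rng = random.Random(0)
actor_bar = lambda s: 0.1 * s
critics_bar = [lambda s, a: s + a, lambda s, a: s + a - 0.3]
y = td3_target(1.0, 2.0, actor_bar, critics_bar, rng)
phi = projected_step([0.9, -0.2], clip_by_norm([30.0, 40.0], 5.0), lr=0.1, radius=1.0)
phi_bar = polyak([0.0, 0.0], phi)
print(y, phi, phi_bar)
```

Taking the minimum over the two target critics is what introduces the negative bias discussed in the theory, while the small Polyak factor keeps the targets moving slowly relative to the online critics.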

3.10. Benchmark Scenarios, CRN Protocol, and Metrics

Seven benchmark scenarios are used; step cases share the same change time t change = 0.35   s , as shown in Table 3.
CRN protocol. All algorithms use identical random seeds, initializations, and shading scripts within each scenario to enable paired, blocked inferences. We run 20 independent seeds per scenario (total N = 140 replications/algorithm).
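At each sampling instant, the reference power is computed via a 1D constrained search
$$P_{\mathrm{mpp}}(t) = \max_{V \in [0,\, V_{\mathrm{oc,tot}}]} V \cdot I_{\mathrm{string}}\big(V;\, G_1(t), G_2(t), T(t)\big). \tag{34}$$
The primary endpoint is MPPT efficiency
$$\eta\,[\%] = 100 \times \frac{\int_0^{20\,\mathrm{s}} P_{\mathrm{pv}}(t)\,dt}{\int_0^{20\,\mathrm{s}} P_{\mathrm{mpp}}(t)\,dt} \approx 100 \times \frac{\sum_k P_{\mathrm{pv}}(t_k)}{\sum_k P_{\mathrm{mpp}}(t_k)}. \tag{35}$$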
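The reference search and the discrete efficiency estimate can be illustrated as follows. The current model `toy_current` is a deliberately simple stand-in (one exponential knee, weaker module limiting the string, temperature ignored) and is not the calibrated single-diode model used in the paper; the grid resolution is likewise an arbitrary choice.

```python
import math

V_OC_TOT = 99.8  # V, search upper bound from the text


def toy_current(v, g1, g2):
    """Hypothetical string I-V curve; NOT the paper's single-diode model."""
    i_sc = 9.0 * min(g1, g2) / 1000.0  # weaker module limits the series string
    return max(0.0, i_sc * (1.0 - math.exp((v - V_OC_TOT) / 5.0)))


def reference_mpp(current_model, g1, g2, n_grid=2001):
    """Grid search for max of V * I(V) over V in [0, V_oc,tot]."""
    best_v, best_p = 0.0, 0.0
    for k in range(n_grid):
        v = V_OC_TOT * k / (n_grid - 1)
        p = v * current_model(v, g1, g2)
        if p > best_p:
            best_v, best_p = v, p
    return best_v, best_p


def mppt_efficiency(p_pv, p_mpp):
    """Discrete form of the efficiency: 100 * sum(P_pv) / sum(P_mpp)."""
    return 100.0 * sum(p_pv) / sum(p_mpp)


v_star, p_star = reference_mpp(toy_current, 1000.0, 600.0)
eta = mppt_efficiency([0.9 * p_star] * 10, [p_star] * 10)
print(v_star, p_star, eta)
```

Because the samples are uniformly spaced, the sampling period cancels in the ratio of sums, which is why the discrete estimate needs no explicit $\Delta t$.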
  • Secondary endpoints.
For step scenarios, we report (i) the settling time $t_s$, defined as the smallest $t \ge 0.35$ s such that $|P_{\mathrm{pv}}(u) - P_{\mathrm{mpp}}(u)| \le 0.02\, P_{\mathrm{mpp}}(u)$ for all $u \in [t, t + \Delta]$ with $\Delta = 0.2 \times T_{\mathrm{sim}} = 4$ s; and (ii) steady-state oscillation, the standard deviation of $P_{\mathrm{pv}}(t)$ over the last 20% of the 20 s horizon.
  • Normalization note.
$P_{\mathrm{mp,tot}} = 704.816$ W and $V_{\mathrm{oc,tot}} = 99.8$ V come from calibrated limits of the single-diode model used for normalization over the two-module operating domain, and thus closely match, but do not strictly equal, the STC datasheet values of the series string (700 W and 92.82 V; Table 10).
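The two secondary endpoints defined above can be computed from sampled traces as in the sketch below. It assumes uniformly sampled lists, returns `None` when no admissible window exists, and uses the population standard deviation; the paper does not state which standard-deviation convention is used, so that choice is an assumption.

```python
import statistics


def settling_time(t, p_pv, p_mpp, t_change=0.35, band=0.02, window=4.0):
    """Smallest sample time >= t_change from which the band holds for `window` s."""
    inside = [abs(p - r) <= band * r for p, r in zip(p_pv, p_mpp)]
    n = len(t)
    for i in range(n):
        if t[i] < t_change:
            continue
        if t[i] + window > t[-1]:
            break  # the window would extend past the recorded horizon
        j, ok = i, True
        while j < n and t[j] <= t[i] + window:
            if not inside[j]:
                ok = False
                break
            j += 1
        if ok:
            return t[i]
    return None


def steady_state_oscillation(t, p_pv, horizon=20.0, tail=0.2):
    """Standard deviation of P_pv over the last `tail` fraction of the horizon."""
    start = horizon * (1.0 - tail)
    samples = [p for ti, p in zip(t, p_pv) if ti >= start]
    return statistics.pstdev(samples)  # assumption: population SD


t = [0.25 * k for k in range(81)]  # 0 s to 20 s, hypothetical sampling grid
p_mpp = [100.0] * len(t)
slow = [50.0 if ti < 2.0 else 100.0 for ti in t]          # enters the band and stays
graze = [100.0 if 1.0 <= ti <= 6.0 else 50.0 for ti in t]  # leaves the band later
print(settling_time(t, slow, p_mpp), settling_time(t, graze, p_mpp))
print(steady_state_oscillation(t, slow), steady_state_oscillation(t, graze))
```

The second trace registers the smaller settling time even though it leaves the band after the window has elapsed, which is the property of this definition that the results section relies on when interpreting $t_s$.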

3.11. Baselines and Fairness Constraints

All algorithms share the same control period and actuation limits (27) and (28). For PSO, we cap the per-tick particles × iterations so the search finishes within the 60 ms deadline (Table 4). RL methods perform exactly one forward pass per module per tick. Classical baselines operate with their standard ($V/I$) inputs; irradiance $(G_i, G_{n(i)})$ is logged and used only for reference MPP computation and post hoc analyses, preserving parity in wall-clock budget and actuation.
Under the 60 ms control period, both PSO’s particles × iterations and the RL forward passes are strictly confined to this wall-clock budget; no baseline was granted extra evaluations or sensing beyond its canonical inputs.
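To make the per-tick cap concrete, the following sketch runs a one-dimensional PSO whose number of objective evaluations is fixed by particles × iterations. The default of 16 particles and 3 iterations, the inertia and acceleration coefficients, and the quadratic stand-in objective are all assumptions for illustration; the paper's PSO settings are those of its Table 4.

```python
import random


def pso_tick(power_fn, v_lo, v_hi, particles=16, iterations=3, seed=0):
    """One control tick of a budget-capped 1D PSO; returns (best_v, evaluations)."""
    rng = random.Random(seed)
    pos = [rng.uniform(v_lo, v_hi) for _ in range(particles)]
    vel = [0.0] * particles
    best_pos = pos[:]
    best_val = [float("-inf")] * particles
    g_pos, g_val = pos[0], float("-inf")
    evaluations = 0
    for _ in range(iterations):
        for i in range(particles):
            val = power_fn(pos[i])  # one plant/model evaluation
            evaluations += 1
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i]
            if val > g_val:
                g_val, g_pos = val, pos[i]
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            # Hypothetical coefficients: inertia 0.6, cognitive and social 1.5.
            vel[i] = (0.6 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            pos[i] = min(v_hi, max(v_lo, pos[i] + vel[i]))
    return g_pos, evaluations


toy_power = lambda v: -(v - 60.0) ** 2  # stand-in objective with a peak at 60 V
best_v, evals = pso_tick(toy_power, 0.0, 99.8)
print(best_v, evals)
```

The evaluation count depends only on the two budget parameters, so the wall-clock cost per tick is bounded in advance regardless of how the swarm moves.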

3.12. Computational Budget and Real-Time Feasibility

RL inference requires exactly one forward pass per tick per module. We log wall-clock inference latencies (p50/p95/p99) to verify meeting the control deadline. The PSO controller is constrained by an identical per-tick computational budget.
Table 4 reports inference latencies (p50/p95/p99) measured on the evaluation host used for simulation; Section 5 (Table 12) reports on-device latencies measured on the hardware bench under the same 60 ms period.
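A latency log can be reduced to the reported percentiles and a deadline-miss fraction as sketched below. The nearest-rank percentile convention and the synthetic latency list are assumptions; the paper does not state which percentile estimator was used.

```python
def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    n = len(ordered)
    rank = max(1, -(-p * n // 100))  # integer ceiling of p * n / 100
    return ordered[rank - 1]


def latency_report(latencies_ms, deadline_ms=60.0):
    """Summarize inference latencies against the control deadline."""
    misses = sum(1 for x in latencies_ms if x > deadline_ms)
    return {
        "p50": percentile_nearest_rank(latencies_ms, 50),
        "p95": percentile_nearest_rank(latencies_ms, 95),
        "p99": percentile_nearest_rank(latencies_ms, 99),
        "miss_fraction": misses / len(latencies_ms),
    }


latencies = [float(k) for k in range(1, 101)]  # synthetic values, 1 ms to 100 ms
report = latency_report(latencies)
print(report)
```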

3.13. Statistical Analysis Plan

Primary confirmatory analysis comprises (i) a one-way ANOVA on η across algorithms; and (ii) blocked Dunnett contrasts versus Fuzzy–MAT3D (blocking by scenario and seed). Secondary analyses include CRN-paired one-sided t-tests (Fuzzy–MAT3D minus comparator) with Benjamini–Hochberg FDR control across hypotheses, and linear mixed-effects models with random intercepts for scenario and seed. We report distributional summaries and two-sided 95 % Student-t confidence intervals; diagnostics include normality of residuals and Levene’s test for homoscedasticity.
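The paired comparison and the false-discovery-rate adjustment in this plan can be sketched in pure Python. The effect size below is the paired variant (mean difference divided by the standard deviation of the differences), which is an assumption about which Cohen's $d$ is meant; the input numbers are toy values, not study data.

```python
import math
import statistics


def paired_effect(a, b):
    """Mean difference, paired t statistic, and paired Cohen's d for matched runs."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    t_stat = mean_d / (sd_d / math.sqrt(len(diffs)))
    return mean_d, t_stat, mean_d / sd_d


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


mean_d, t_stat, d_z = paired_effect([3.0, 5.0, 7.0], [1.0, 2.0, 3.0])
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(mean_d, t_stat, d_z, q_values)
```

Pairing by seed and scenario removes the shared randomness from the differences, which is why CRN-paired tests are typically more sensitive than unpaired comparisons on the same runs.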
When executed on physical hardware, we mirror the seven scenarios of Table 3 with synchronized $V/I$ sensing and logging and enforce the same limits and deadlines as in simulation. Each algorithm is evaluated with $n \ge 10$ bench replicates per scenario, blocked by scenario and replicate ID. The reference $P_{\mathrm{mpp}}(t)$ is computed from a calibrated single-diode model parameterized by measured $(G, T)$.

4. Results

We report both learning behavior and fixed-horizon performance under a balanced common-random-numbers (CRN) design comprising 7 scenarios and 20 seeds per algorithm ( N = 140 replications/algorithm). The primary endpoint is MPPT (Maximum Power Point Tracking) efficiency, η (Def. (35)); secondary endpoints quantify transient speed and steady operation: settling time t s (2% band, window Δ = 0.2 × T sim = 4   s ) and steady-state oscillation (SD of P pv over the last 20 % of 20   s ). Unless stated otherwise, all summaries are CRN-blocked; means are reported with two-sided 95% Student-t confidence intervals; and hypothesis testing follows the pre-specified hierarchy of a global one-way ANOVA, confirmatory blocked-Dunnett contrasts versus Fuzzy–MAT3D, and CRN-paired one-sided t tests with BH–FDR control.
We begin with training dynamics for the RL controllers (Figure 2), then present the CRN-blocked comparison across six algorithms via distributional views and mean-CI summaries (Figure 3 and Figure 4). Stability metrics are analyzed next: steady-state oscillation (Figure 5) and settling time (Figure 6), with effect sizes and paired inferences tabulated in Tables 7 and 8 alongside aggregate RL performance (Table 9). We subsequently dissect the speed–stability trade-off (Section 4.3), provide ablations and sensitivity checks, and include an illustrative step disturbance case study. Exploratory Tukey–Kramer intervals are relegated to Appendix A to avoid conflating descriptive and confirmatory claims.
Axes and table headers use a unified nomenclature: “MPPT efficiency, $\eta$ [%]”, “settling time, $t_s$ [s]”, and “steady-state oscillation [W]”. All figures and tables reflect the same CRN blocking and horizon $T_{\mathrm{sim}} = 20$ s to ensure like-for-like comparisons across algorithms and scenarios.

4.1. Training Dynamics (RL Group)

We begin by characterizing the learning behavior of the three RL controllers under the same CTDE protocol and logging setup. Figure 2 reports the evolution of the cumulative return and a moving-average return across training episodes. Two features are salient: (i) the Fuzzy–MAT3D trajectories exhibit visibly damped volatility and earlier stabilization relative to MAT3D and MADDPG, and (ii) improvements accrue more steadily once the critics have entered their fast-contraction regime, consistent with the variance-reduction and non-expansivity mechanisms developed in Section 2. These dynamics foreshadow the downstream fixed-horizon advantages—higher MPPT efficiency and lower steady-state oscillation—documented in the comparative analyses that follow.

4.2. Global Comparison Across Six Algorithms (CRN-Blocked)

This subsection presents a global comparison of six algorithms under a CRN-blocked design, matching stochastic trajectories to control heterogeneity and reduce estimator variance. We report means, 95% confidence intervals, standardized effect sizes (Hedges’ g), and average ranks, and conduct blocked ANOVA with Dunnett/Holm corrections for multiple comparisons against the reference.
Table 5 reports the scenario-wise MPPT efficiency (mean ± 95% CI) with $N = 20$ seeds per scenario and algorithm. Across all seven scenarios, Fuzzy–MAT3D attains the highest mean efficiency and consistently narrow confidence intervals: for example, $94.5 \pm 0.9\%$ under Standard Condition and $90.2 \pm 1.5\%$ under Deep Shadow. These per-scenario results mirror the aggregate advantages shown in Figure 3 and Figure 4 and reinforce the robustness of the fuzzy-regularized approach across both static and step-change conditions.
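For reference, a two-sided 95% Student-t interval for one scenario/algorithm cell with 20 seeds can be computed as below. The critical value 2.093 corresponds to 19 degrees of freedom; the sample values are synthetic placeholders, not entries of Table 5.

```python
import math
import statistics

T_CRIT_DF19 = 2.093  # two-sided 95% Student-t critical value, 19 degrees of freedom


def mean_ci95(samples, t_crit):
    """Return (mean, half-width) of a two-sided Student-t confidence interval."""
    m = statistics.mean(samples)
    half = t_crit * statistics.stdev(samples) / math.sqrt(len(samples))
    return m, half


cell = [90.0] * 10 + [92.0] * 10  # synthetic efficiencies for 20 seeds
mean_eta, half_width = mean_ci95(cell, T_CRIT_DF19)
print(mean_eta, half_width)
```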
  • Inferential statistics.
A global one-way ANOVA on $\eta$ rejects equality of means ($F = 88.965$, $p = 5.127 \times 10^{-75}$; $N = 140$ per group), confirming the significant differences in performance distributions visually apparent in Figure 3 and Figure 4. Following Section 3.13, we report blocked Dunnett contrasts in the supplement (see, e.g., the exploratory post hoc analysis in Figure A1) and CRN-paired tests below.
Residual Q–Q plots (to check normality) and Levene’s test (for homoscedasticity) were performed to validate the ANOVA assumptions. No gross departures from normality were found, and Levene’s test did not reject homoscedasticity within groups at α = 0.05 ; all confirmatory inferences therefore follow the pre-specified hierarchy.
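The F-statistic of a one-way ANOVA is the ratio of between-group to within-group mean squares. With six algorithms and $N = 140$ replications each, the degrees of freedom are $6 - 1 = 5$ and $840 - 6 = 834$. The sketch below uses small toy groups, not the study data, purely to show the computation.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```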

4.3. Settling Time vs. Stability

As shown by the CRN-blocked boxplots in Figure 6, Fuzzy–MAT3D exhibits a longer settling time t s than the plain TD3 baseline (MAT3D) and MADDPG. The RL-only aggregate (Table 9) reports t s = 7.537 ± 5.397 s for Fuzzy–MAT3D, versus 1.573 ± 3.771 s for MAT3D and 3.239 ± 6.175 s for MADDPG.
In our protocol, t s is the first time after the step at which the trajectory remains within a ± 2 % band around the instantaneous MPP for a contiguous window of length Δ = 0.2 × T sim = 4   s (see Section 3.10). This definition is agnostic to (i) how much overshoot/undershoot occurred before entering the band and (ii) whether the controller subsequently leaves the band again after the Δ -window has elapsed. Hence, a controller can register a small t s by grazing the band early with aggressive moves yet sustain sizable steady-state jitter or even drift later; conversely, a more conservative controller can register a larger t s while delivering substantially better long-horizon behavior.
Mechanistic explanation for Fuzzy–MAT3D’s larger t s . The fuzzy PoU front-end enforces global Lipschitzness on the actor/critics (Theorem 1) and, together with target smoothing, yields a locally non-expansive fixed-policy operator (Proposition 1); moreover, the twin-critic minimum introduces a small, correlation-dependent negative bias (Lemma 1). These ingredients reduce TD-target variance and damp fast transients, but they also make the closed loop deliberately conservative right after abrupt shading steps, prioritizing a monotone approach over rapid excursions. In short, Fuzzy–MAT3D trades a few seconds of responsiveness for markedly improved stability and bias robustness.
  • Why Fuzzy–MAT3D is still preferable.
Three lines of evidence—empirical, statistical, and control-theoretic—support Fuzzy–MAT3D despite its larger t s :
  • Dominant long-horizon energy capture. Across the CRN-blocked study ($N = 140$ per algorithm), Fuzzy–MAT3D achieves the highest mean MPPT efficiency ($92.032\% \pm 4.014\%$), substantially above MAT3D ($80.144\% \pm 13.050\%$) and MADDPG ($67.978\% \pm 12.402\%$); CRN-paired tests yield large effects (Cohen's $d$ up to 1.874) and essentially zero $q$-values (BH–FDR). Thus, any energy loss from a slower transient is more than offset by sustained operation near the MPP over the full horizon (Table 6, Table 7 and Table 8).
  • Much lower steady-state jitter. Figure 5 shows the steady-state oscillation of Fuzzy–MAT3D tightly concentrated near zero, whereas MAT3D and MADDPG display broad, heavy-tailed distributions. The RL-only aggregate reports 1.362 ± 2.013 W for Fuzzy–MAT3D vs. 37.963 ± 70.396 W (MAT3D) and 52.993 ± 34.608 W (MADDPG). Lower jitter not only improves energy capture but also reduces switching stress and thermal cycling in the power stage (Table 9).
  • Design intent: stability margins over aggressiveness. Theoretically, the fuzzy-induced Lipschitz constant L μ improves ISS gains, while the min-ensemble plus target smoothing controls overestimation and high-frequency actuation. This combination is expected to enlarge decay margins but reduce “snap-to-setpoint” behavior—precisely the speed–stability trade-off seen in Figure 6.
The distribution in Figure 6 shows Fuzzy–MAT3D with a higher median t s and a long right tail driven by the most abrupt step cases, which is consistent with its conservative transient policy. Yet, when read jointly with Figure 5 (steady-state oscillation) and the efficiency summaries (Figure 3 and Figure 4; Table 6 and Table 9), the picture is unequivocal: Fuzzy–MAT3D sits on a better Pareto front—maximizing energy and minimizing jitter—while conceding some transient speed. In applications like PV MPPT under PSC, where (i) steps are intermittent and (ii) the objective is integral energy over minutes–hours, this Pareto choice is the correct one.
The ANOVA and CRN-paired tests in Table 6, Table 7 and Table 8 indicate that the superiority of Fuzzy–MAT3D over MAT3D and MADDPG is statistically significant across all seven scenarios, with very small p-values and large paired effect sizes.
Because the design includes both static PSC profiles and step-change scenarios, the reported efficiency $\eta \approx 92.0\% \pm 4.0\%$ should be interpreted as a robust average over a representative family of practically relevant shading patterns.
We do not claim universal optimality beyond this family, but the ISS and Lipschitz analysis suggest that the qualitative advantages of Fuzzy–MAT3D should persist under other smooth, slowly varying PSC profiles; extending the experimental design to more aggressive, rapidly varying shading remains an interesting direction for future work.
Because t s declares success after any continuous Δ -window within the band, it cannot penalize later departures from the band. This explains why algorithms with aggressive, oscillatory responses can display deceptively small t s while still underperforming in energy and stability. Our CRN-blocked analysis therefore treats t s as a secondary indicator to be interpreted alongside efficiency and steady-state metrics (Figure 5, Table 6, Table 7, Table 8 and Table 9).
Fuzzy–MAT3D is intentionally conservative around abrupt changes; this yields a larger settling time but confers superior stability and decisively better energy tracking. In the aggregate, and for the operational goals of MPPT under partial shading, Fuzzy–MAT3D’s trade-off is the desirable one.
Concretely, comparing the aggregate statistics in Table 6 and Table 9, Fuzzy–MAT3D sacrifices about 6 s of additional settling time on average (7.54 s vs. 1.57 s for MAT3D) in exchange for roughly +12 percentage points in MPPT efficiency (92.0% vs. 80.1%) and a reduction of about 36.6 W in steady-state jitter (1.36 W vs. 37.96 W), which is an advantageous trade-off for energy-centric applications.
As complementary, descriptive evidence, Appendix A, Figure A1 reports Tukey–Kramer confidence intervals for all pairwise algorithmic contrasts. Consistent with the CRN-blocked summaries and directionally aligned with the confirmatory blocked-Dunnett analysis, these intervals place Fuzzy–MAT3D above both MAT3D and MADDPG in mean MPPT efficiency (pairwise CIs against Fuzzy–MAT3D exclude zero), with the largest deficit observed for MADDPG. We therefore treat this panel as supportive context for the superiority ordering rather than an independent inferential claim.

4.4. RL-only Aggregate Across Replications

Table 9 aggregates RL-only results across CRN-blocked replications, showing that Fuzzy–MAT3D attains the highest mean MPPT efficiency η with markedly lower steady-state oscillation than MAT3D and MADDPG. Conversely, Fuzzy–MAT3D exhibits a larger settling time t s , reflecting the study’s speed–stability trade-off and the conservative regulation induced by the fuzzy PoU front-end.
The empirical distribution of MPPT efficiency across all CRN-blocked replications is shown in Figure 7, complementing the notched boxplots and mean–CI panels by revealing the full shape and tails of the distributions. This panel contextualizes central-tendency summaries with dispersion and skewness across scenarios and seeds.

4.5. Case Study: Step Disturbance

As shown in Figure 8, after the step at t = 0.35   s , Fuzzy–MAT3D approaches P MPP monotonically and maintains it with negligible steady-state jitter, whereas MADDPG overshoots and develops sustained oscillations that depress the average power; MAT3D converges slowly and under-tracks the MPP. The lower panel explains the mechanism: Fuzzy–MAT3D rapidly desaturates the duty cycle and then holds an almost constant command, while the other agents continue exciting the plant—consistent with the ISS-based stability rationale in Section 2.3.

5. Experimental Validation

The hardware experiments were conducted on a two-module series string whose electrical ratings closely match the normalization constants used in the simulations. For clarity, $P_{\mathrm{mp,tot}} = 704.816$ W and $V_{\mathrm{oc,tot}} = 99.8$ V arise from the calibrated model limits employed for normalization, not strictly from the Standard Test Conditions (STC) datasheet values of the two-module string (700 W and 92.82 V; Table 10).
Table 10 details the per-module specifications and the resulting values for the series connection adopted in the laboratory setup. This configuration ensures that the experimental plant reproduces the same operating envelope as the simulated one, thereby enabling a direct comparison of maximum power point tracking (MPPT) performance under partial shading scenarios.

5.1. Hardware Validation: Objectives, Bench, and Real-Time Deployment

We validate Fuzzy–MAT3D on real hardware to test three pre-registered hypotheses: (i) it preserves its MPPT efficiency advantage, η , over all baselines; (ii) it reduces steady-state oscillation; and (iii) it meets the real-time budget under the same sensing/actuation constraints. The seven benchmark scenarios of Table 3 are replayed verbatim for hardware runs, and metrics follow the simulation definitions in Section 3 (efficiency η , settling time t s with a 2% band and window Δ = 0.2 × T sim = 4   s , and steady-state oscillation).
The experimental bench comprises two PV modules in series with reproducible, controlled shadows; a DC/DC stage (boost or buck-boost) instrumented with voltage/current sensing and governed by an MCU/SoC; and standard protection elements (V/I limits and watchdog). Power is measured with a DC power analyzer, the shunt+DAQ path is employed only for calibration checks, all signals are timestamp-synchronized, and module temperatures are monitored via thermocouples. This setup mirrors the plant assumptions in Section 3 and enables like-for-like comparisons with the simulation protocol.
The deployed actor $\mu_\theta$ is identical to its simulated counterpart: it uses the per-coordinate fuzzy partition of unity (coordinate-wise softmax), so that $\mathbf{1}^{\top} z(s) = 7$ and $L_\mu = L_z/7$ hold exactly as in Equations (1) and (2). To assess real-time feasibility, we log inference latency (p50/p95/p99) and the fraction of control deadlines missed over each run under the same sensing/update period used for the baselines; these logs are reported alongside the performance metrics.
The overall architecture of the experimental bench and measurement points is summarized in Figure 9. Figure 10 shows the instrumented bench used in all hardware runs: a two-module series string connected to the embedded controller and a laptop that logs synchronized voltage–current traces. Figure 11 complements this view by documenting the site-level deployment (left) and the tidy series interconnect with measurement leads (right); the panels are oriented to avoid self-shading from the cabling, and the controller–laptop pair provides the time alignment required by our MPPT metrics.

5.2. Protocols and Scenarios

We executed the seven scenarios in Table 3 on the physical bench described in Section 5.1 (two PV modules in series, DC/DC power stage, synchronized V/I sensing and logging). Each run lasted 20 s ; step-change scenarios used a reproducible shadowing script with change time t change = 0.35 s , implemented by calibrated masks/shutters, while static scenarios used fixed mask configurations.
For every scenario, each algorithm was evaluated with $n \ge 10$ hardware replicates (at least $7 \times 10 = 70$ per algorithm), blocked by scenario and replicate ID. Before every run, we restored the plant to a standardized initial condition (open-circuit reset and controlled pre-charge), verified sensor zeroing, and logged the initial terminal voltages and duty-cycle state. The identical irradiance/shadowing scripts and reset procedure were replayed across algorithms to ensure like-for-like comparisons on real hardware.
The reference $P_{\mathrm{mpp}}(t)$ is computed from a calibrated single-diode model parameterized by the measured $(G, T)$; the model parameters are calibrated offline using periodic I–V sweeps. At evaluation time, the instantaneous MPP is obtained via a 1D search on the model. We compute $\eta$ (Equation (35)), settling time $t_s$ (2% band, window $\Delta = 0.2 \times T_{\mathrm{sim}} = 4$ s), steady-state oscillation (last 20%), lost energy, and latency.

5.3. Experimental Results

Table 11 reports the scenario-wise hardware MPPT efficiency (mean ± 95% CI; $n \ge 10$ per scenario/algorithm). Across all seven scenarios, Fuzzy–MAT3D consistently leads (for instance, $93.5\% \pm 1.1\%$ under Standard Condition and $89.0\% \pm 1.8\%$ under Deep Shadow), mirroring the aggregate hardware advantages shown in Figure 12 and Figure 13. These results indicate that the gains of the fuzzy-regularized controller persist across static and step-change conditions.
To verify real-time feasibility on the bench, we report median and tail inference latencies (p50/p95/p99) under the same 60 ms control period used throughout the study. As shown in Table 12, all RL controllers exhibit sub-5 ms p99 latency on hardware, comfortably below the 60 ms deadline, while the PSO controller is constrained by an identical per-tick budget (16 × 3).

5.4. Discussion of Experimental Results

The hardware summary in Table 13 confirms a decisive advantage for Fuzzy–MAT3D in cumulative energy capture: it attains the highest mean MPPT efficiency ( η = 91.296 ± 4.057 %), surpassing all classical and RL baselines—PSO ( 83.233 ± 7.919 ), MAT3D ( 78.791 ± 12.493 ), P&O ( 75.447 ± 11.794 ), INC ( 74.086 ± 10.990 ), and MADDPG ( 67.515 ± 12.881 ). Figure 12 places the entire efficiency distribution of Fuzzy–MAT3D at the upper end among competitors, while Figure 13 shows its mean with tight 95 % confidence intervals centered near 91 % , consistent with the low empirical variability reported in the table. The magnitude of this efficiency gap is not cosmetic: with identical sensing, actuation, and computation budgets, the fuzzy partition yields a controller that consistently converts available irradiance into electrical work more effectively over the whole horizon.
A core strength of Fuzzy–MAT3D in hardware is its markedly lower steady-state power ripple: 7.837 ± 8.607 W, compared against 25.921 ± 26.580 W (PSO), 36.724 ± 29.736 W (INC), 40.015 ± 23.916 W (P&O), 45.899 ± 53.620 W (MAT3D), and 62.451 ± 38.205 W (MADDPG). Figure 14 makes this contrast visually salient: the Fuzzy–MAT3D box is compressed near the origin, while non-fuzzy RL baselines are both higher in level and heavier-tailed. This reduction in jitter has two immediate consequences: (i) it directly improves integral efficiency (less dithering around the MPP), and (ii) it reduces switching and thermal stress on the power stage, an operational benefit that rarely appears in single-number leaderboards but matters in deployment. Mechanistically, these outcomes align with the theory: the differentiable fuzzy partition of unity imposes global input Lipschitzness on the actor–critic towers, the smoothed fixed-policy operator is non-expansive, and target variance contracts with the fuzzy Lipschitz modulus—together damping high-frequency actuation and steady-state ripple (Lemma 1, Proposition 1, Proposition 2).
Figure 15 and Table 13 show a deliberate speed–stability trade-off. Fuzzy–MAT3D exhibits a longer settling time ($t_s = 8.097 \pm 4.612$ s) than MAT3D ($2.585 \pm 2.264$ s) and P&O/INC (about 3–5 s), though notably not the slowest overall (PSO averages $9.623 \pm 3.941$ s). Interpreted in context, this is an intentional design choice: the fuzzy regularizer and TD3 target smoothing bias the closed loop toward a monotone approach with bounded slopes, avoiding overshoot-induced re-entries into (and exits from) the tolerance band that can make $t_s$ look artificially small while harming long-horizon yield. In other words, a few additional seconds of conservative transient are exchanged for a far more stable steady state, which is precisely where PV systems spend most of their time.
Read jointly, Figure 12, Figure 13, Figure 14 and Figure 15 and Table 13 portray a coherent Pareto frontier: Fuzzy–MAT3D simultaneously maximizes energy yield and minimizes steady-state oscillation, conceding some transient speed relative to aggressive baselines. For PV MPPT under partial shading, where step disturbances are intermittent and the objective is cumulative energy over minutes to hours, this Pareto point is preferable. The behavior matches the theory-first analysis: fuzzy-induced global Lipschitzness and the two-time-scale TD3 dynamics (critics fast and actor slow) enlarge practical ISS margins and shrink TD-target variance; at the hardware level, those constants materialize as tighter power control with minimal ripple and robust tracking near the MPP. Complementarily, the convergence framework for Fuzzy–MAT3D (CTDE–TD3) guarantees that, under bounded targets, Lipschitz networks, and projected iterates, the actor asymptotically follows a well-posed gradient flow on a compact set, which explains the stable learning and execution seen here.
In hardware, the fuzzy-augmented controller is not merely more “accurate” on average; it is structurally better behaved. Its low oscillation regime reduces wear-and-tear and converts more irradiance into usable energy, and its measured t s reflects conservative transients rather than sluggish control. For safety-critical, efficiency-driven PV deployments, this combination—high η , minimal jitter, and bounded transients—constitutes a compelling operational advantage.

6. Discussion

Our working hypothesis was that inserting a differentiable fuzzy partition of unity in front of the actor–critic would (i) enforce global input Lipschitzness and lower TD-target variance, (ii) render fixed-policy evaluation non-expansive with the correct $L_\infty$ contraction factor $\kappa = \gamma$, (iii) enlarge the closed-loop ISS margin, and (iv) convert constant stepsizes into explicit steady-state neighborhoods; these mechanisms are formalized in Section 2.2.
Empirically, the resulting controller attains higher MPPT efficiency with markedly lower steady-state jitter while accepting a more conservative transient—precisely the speed–stability trade-off expected from the theory and consonant with prior observations that classical P&O/INC and PSO approaches tend to exchange rapid steps for oscillatory behavior under PSCs, whereas unregularized RL baselines (e.g., MAT3D and MADDPG) can amplify critic noise into unstable actuation. In the broader context of learning-based power electronics, the fuzzy layer acts as a structural regularizer that improves actor–critic conditioning and yields closed-loop behavior aligned with energy-centric objectives and hardware stress constraints, thereby offering a principled alternative to ad hoc damping or heuristic dithering. These interpretations are consistent with the controlled CRN-blocked study and the theory-first analysis reported herein.
We establish stationarity (not global optimality) and adopt a replay idealization; nonetheless, parameter projections, clipping, and bounded target noise narrow the assumption–implementation gap. The residual bounds depend on distribution shift (Remark 1), and the practical ISS gains in Section 2.3 inherit the usual modeling idealizations.

Future Work

Building on the present results, future work will (i) close the distribution-shift loop by learning replay/behavior policies that better align $\rho$ and its pushforward, tightening the constants in the projected Bellman bounds (Proposition 3); (ii) automate the partition design (number of memberships and temperature) and study its effect on variance and twin-critic correlation, building on Proposition 2 and Lemma 1; (iii) couple ISS-style margins (Section 2.3) with barrier certificates and latency/quantization models for hardware-level safety guarantees; (iv) extend the CTDE analysis to larger $N$ and partial observability, and benchmark against entropy-regularized and trust-region variants under identical sensing/compute budgets; and (v) translate the energy-centric gains to longer horizons and field deployments, including degradation, sensor drift, and grid-level constraints.
Beyond the specific two-module string considered here, the fuzzy-regularized CTDE–TD3 architecture is applicable to other domains where (i) the dynamics admit a compact operating envelope and (ii) safety or comfort requirements demand smooth closed-loop responses. Examples include cooperative voltage control in DC microgrids, coordinated charging of electric-vehicle fleets, and frequency regulation with distributed energy resources, where the PoU structure can encode network topology or operating regions. From the scaling viewpoint, Theorem 4 already shows that the Lipschitz moduli of the N-agent extension grow in a controlled way under block norms, so that Fuzzy–MAT3D can in principle be deployed on larger PV arrays by assigning one agent per string or module cluster. In such settings, additional work is needed to account for partial observability and communication constraints, but the core Lipschitz and ISS guarantees remain valid.

7. Conclusions

Fuzzy–MAT3D provides a theoretically grounded and empirically validated controller for PV MPPT under partial shading. From an energy-systems perspective, raising MPPT efficiency to around 92% under PSCs, while suppressing steady-state oscillations, directly translates into higher energy yield and reduced stress on power-electronic interfaces, contributing to more reliable and predictable PV generation at scale. The corrected theory includes a properly contracted fixed-policy operator ($\kappa = \gamma$ in $L_\infty$), a projected Bellman residual bound with distribution-shift clarification, an explicit two-time-scale theorem with a PL uniqueness corollary, finite-stepsize neighborhoods with constants, a min-bias analysis for TD3, and a clean $N$-agent CTDE extension.

Author Contributions

Conceptualization, D.O.-M. and D.L.-C.; methodology, D.O.-M.; software, D.O.-M.; validation, L.A.P.-D., A.G.R.-R. and F.G.-L.; formal analysis, D.O.-M.; investigation, D.O.-M., L.A.P.-D., A.G.R.-R., and F.G.-L.; resources, D.L.-C.; data curation, D.O.-M. and L.A.P.-D.; writing—original draft preparation, D.O.-M.; writing—review and editing, D.L.-C., L.A.P.-D., A.G.R.-R., and F.G.-L.; visualization, D.O.-M.; supervision, D.L.-C.; project administration, D.L.-C.; funding acquisition, D.L.-C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Acknowledgments

The authors acknowledge the administrative and technical support provided by the laboratories of Universidad Autónoma de Ciudad Juárez during the preparation and execution of the experiments. The authors also thank colleagues for helpful discussions regarding simulation reproducibility and hardware setup.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MPPT: Maximum Power Point Tracking
PSC: Partial Shading Condition(s)
CTDE: Centralized Training, Decentralized Execution
PoU: Partition of Unity
ISS: Input-to-State Stability
CRN: Common Random Numbers
RL: Reinforcement Learning
TD3: Twin Delayed Deep Deterministic Policy Gradient
LME: Linear Mixed-Effects (model)
MPP: Maximum Power Point

Appendix A. Exploratory Post-Hoc (Tukey–Kramer)

The Tukey–Kramer exploratory CIs show that Fuzzy–MAT3D attains a significantly higher mean MPPT efficiency than both MAT3D and MADDPG (pairwise CIs vs. Fuzzy–MAT3D exclude zero), with the largest deficit observed for MADDPG; see Appendix A, Figure A1. This pattern mirrors the CRN-blocked summaries and is directionally consistent with the confirmatory blocked-Dunnett contrasts in Section 3; accordingly, we treat this panel as descriptive support for the superiority ordering rather than a stand-alone inferential claim.
Figure A1. Tukey–Kramer CIs.

Appendix B. Complete Proof for Section 2.2

Appendix B.1. Notation and Generic Facts

For a Lipschitz map $F$ with constant $L_F$ and any $x, y$ in its domain, $\|F(x) - F(y)\| \le L_F \|x - y\|$. Orthogonal projections in Hilbert spaces are non-expansive. For any random variable $X$, $\mathbb{E}|X| \le \sqrt{\mathbb{E}[X^2]}$.

Appendix B.2. Proof of Theorem 1 (Fuzzy PoU Induces Global Lipschitzness)

From (1) and (2), we have $\|\varphi_{\mathrm{fuzzy}}(s) - \varphi_{\mathrm{fuzzy}}(\tilde{s})\| \le L_\mu \|s - \tilde{s}\|$ with $L_\mu = L_z / 7$. Since $\tanh$ is 1-Lipschitz and $f_\theta$ is $L_f(\theta)$-Lipschitz,
$$\|\mu_\theta(s) - \mu_\theta(\tilde{s})\| \le L_{\tanh}\, L_f(\theta)\, L_\mu \|s - \tilde{s}\|.$$
The critic bound follows similarly for $h_{\phi_i}$ in the composite input $(\varphi_{\mathrm{fuzzy}}(s), a)$, yielding $\operatorname{Lip}\big(Q_{\phi_i} \circ (\varphi_{\mathrm{fuzzy}}, \mathrm{id})\big) \le L_Q(\phi_i) \max\{L_\mu, 1\}$. Uniform boundedness of gradient norms and TD targets follows from the compactness of $S$.
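The composition argument can be checked numerically. The sketch below is illustrative only: the softmax membership form, the temperature `temp = 4.0`, the one-hidden-layer ReLU actor, and the random weights are all assumptions and are not the paper's Equations (1) and (2) or its trained network. It bounds $L_z$ analytically (softmax Jacobian norm at most $1/2$, logit slope at most $2\,\texttt{temp}\sqrt{K}$ on $[0,1]$), sets $L_\mu = L_z / d$, and compares the product $L_{\tanh} L_f L_\mu$ with empirical slopes of the actor.

```python
import numpy as np

CENTERS = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # centers listed in Table 1

def fuzzy_features(s, temp):
    """Hypothetical per-coordinate softmax memberships, stacked and divided by d."""
    logits = -temp * (s[..., None] - CENTERS) ** 2         # shape (..., d, K)
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    z = np.exp(logits)
    z = z / z.sum(axis=-1, keepdims=True)                  # memberships sum to 1
    d = s.shape[-1]
    return z.reshape(*s.shape[:-1], d * CENTERS.size) / d

def actor(s, W1, W2, temp):
    """Assumed actor: tanh(ReLU(phi(s) W1) W2)."""
    hidden = np.maximum(fuzzy_features(s, temp) @ W1, 0.0)
    return np.tanh(hidden @ W2)

def lipschitz_bound(W1, W2, temp, d):
    """Product bound L_tanh * L_f * L_mu with L_mu = L_z / d."""
    L_z = 0.5 * 2.0 * temp * np.sqrt(CENTERS.size)   # softmax Jacobian * logit slope
    L_mu = L_z / d
    L_f = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)  # spectral norms, ReLU is 1-Lipschitz
    return 1.0 * L_f * L_mu                               # L_tanh = 1

rng = np.random.default_rng(0)
d, hidden_units, temp = 7, 16, 4.0
W1 = rng.normal(size=(d * CENTERS.size, hidden_units)) / np.sqrt(hidden_units)
W2 = rng.normal(size=(hidden_units, 1))

s = rng.random((2000, d))
s_tilde = rng.random((2000, d))
num = np.linalg.norm(actor(s, W1, W2, temp) - actor(s_tilde, W1, W2, temp), axis=1)
den = np.linalg.norm(s - s_tilde, axis=1)
max_ratio = float(np.max(num / den))
bound = float(lipschitz_bound(W1, W2, temp, d))
print(f"largest empirical slope: {max_ratio:.4f}, analytic bound: {bound:.4f}")
```

The empirical slope is a lower estimate of the true Lipschitz constant, so it should sit below the analytic product; the gap reflects how conservative the layer-wise bound is.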

Appendix B.3. Proof of Proposition 1 (Smoothed Fixed-Policy Non-Expansivity)

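For any $Q, \tilde{Q}$,
$$(T_\varepsilon^{\bar\pi} Q - T_\varepsilon^{\bar\pi} \tilde{Q})(s, a) = \gamma\, \mathbb{E}\big[ Q(s', \bar\pi(s') + \varepsilon) - \tilde{Q}(s', \bar\pi(s') + \varepsilon) \,\big|\, s, a \big].$$
Taking $L^2(\rho)$ norms and using Jensen's inequality with the pushforward $\rho'$ of $\rho$ under $(s, a) \mapsto (s', \bar\pi(s') + \varepsilon)$ gives $\|T_\varepsilon^{\bar\pi} Q - T_\varepsilon^{\bar\pi} \tilde{Q}\|_{2,\rho} \le \gamma \|Q - \tilde{Q}\|_{2,\rho'}$. In the sup norm, $\|T_\varepsilon^{\bar\pi} Q - T_\varepsilon^{\bar\pi} \tilde{Q}\|_\infty \le \gamma \|Q - \tilde{Q}\|_\infty$. For the twin-critic minimum, use that $m(x, y) = \min\{x, y\}$ is 1-Lipschitz on $\mathbb{R}^2$ to obtain the same bounds.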
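The sup-norm contraction can be illustrated on a small finite MDP. This is a minimal sketch under stated assumptions: the random transition kernel `P`, rewards `r`, and deterministic policy `pi` are hypothetical, actions are discrete, and the smoothing noise $\varepsilon$ is omitted, so it is a stand-in for the fixed-policy operator rather than the smoothed operator itself.

```python
import numpy as np

def fixed_policy_backup(Q, P, r, pi, gamma):
    """(T Q)(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) * Q(s', pi(s'))."""
    n_states = Q.shape[0]
    q_next = Q[np.arange(n_states), pi]   # Q(s', pi(s')) for every next state s'
    return r + gamma * (P @ q_next)       # P has shape (S, A, S), result (S, A)

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 4, 0.99   # gamma as in Table 2

P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)         # each P[s, a, :] is a probability vector
r = rng.random((n_states, n_actions))
pi = rng.integers(0, n_actions, size=n_states)

Q1 = rng.normal(size=(n_states, n_actions))
Q2 = rng.normal(size=(n_states, n_actions))

lhs = np.max(np.abs(fixed_policy_backup(Q1, P, r, pi, gamma)
                    - fixed_policy_backup(Q2, P, r, pi, gamma)))
rhs = gamma * np.max(np.abs(Q1 - Q2))
print(f"||TQ1 - TQ2||_inf = {lhs:.4f}, gamma * ||Q1 - Q2||_inf = {rhs:.4f}")
```

The reward cancels in the difference, and averaging over a probability vector cannot increase the largest absolute gap, which is why the factor is exactly $\gamma$.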

Appendix B.4. Proof of Proposition 2 (TD-Target Variance Reduction Under PoU)

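Condition on $s$, and let $y = r + \gamma Z$ with $Z = m(Q_{\bar\phi_i})(s', a')$ and $a' = \bar\pi(s') + \varepsilon$. We have $\operatorname{Var}(y \mid s) \le \sigma_r^2 + \gamma^2 \operatorname{Var}(Z \mid s)$ (assuming conditional independence). We bound $\operatorname{Var}(Z \mid s) \le \mathbb{E}\big[ |Z(s', \varepsilon) - Z(\bar{s}', 0)|^2 \mid s \big]$, where $\bar{s}' = \mathbb{E}[s' \mid s]$. The map $Q_{\bar\phi_i} = h_{\bar\phi_i}(\varphi(s), a)$ is $L_Q$-Lipschitz with regard to its concatenated $\ell_2$ input $(\varphi, a)$, and $m$ is 1-Lipschitz. Let $L_{\bar\pi} = \operatorname{Lip}(\bar\pi)$ denote the policy's Lipschitz constant. Then
$$\begin{aligned}
|Z(s', \varepsilon) - Z(\bar{s}', 0)|^2
&\le L_Q^2 \Big( \|\varphi(s') - \varphi(\bar{s}')\|^2 + \|(\bar\pi(s') + \varepsilon) - (\bar\pi(\bar{s}') + 0)\|^2 \Big) \\
&\le L_Q^2 \Big( L_\mu^2 \|s' - \bar{s}'\|^2 + 2 \|\bar\pi(s') - \bar\pi(\bar{s}')\|^2 + 2 \|\varepsilon\|^2 \Big) \\
&\le L_Q^2 \Big( L_\mu^2 \|s' - \bar{s}'\|^2 + 2 L_{\bar\pi}^2 \|s' - \bar{s}'\|^2 + 2 \|\varepsilon\|^2 \Big) \\
&= L_Q^2 \Big( (L_\mu^2 + 2 L_{\bar\pi}^2) \|s' - \bar{s}'\|^2 + 2 \|\varepsilon\|^2 \Big).
\end{aligned}$$
Taking the expectation $\mathbb{E}[\,\cdot \mid s]$ yields $\operatorname{Var}(Z \mid s) \le L_Q^2 \big( (L_\mu^2 + 2 L_{\bar\pi}^2) \sigma_s^2 + 2 \sigma_\varepsilon^2 \big)$. By Theorem 1, $L_{\bar\pi} \le C_{\bar\theta} L_\mu$ for $C_{\bar\theta} = L_{\tanh} L_f(\bar\theta)$. Thus, $\operatorname{Var}(Z \mid s) \le L_Q^2 \big( (L_\mu^2 + 2 (C_{\bar\theta} L_\mu)^2) \sigma_s^2 + 2 \sigma_\varepsilon^2 \big)$.
This gives the final bound:
$$\operatorname{Var}(y \mid s) \le \sigma_r^2 + \gamma^2 L_Q^2 \Big( L_\mu^2 (1 + 2 C_{\bar\theta}^2) \sigma_s^2 + 2 \sigma_\varepsilon^2 \Big).$$
Since the state uncertainty term $\sigma_s^2$ is scaled by $L_\mu^2$, decreasing $L_\mu$ reduces this contribution to the total TD-target variance.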
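To see how the bound responds to $L_\mu$, the following sketch evaluates it in the form $\sigma_r^2 + \gamma^2 L_Q^2 \big( L_\mu^2 (1 + 2 C_{\bar\theta}^2) \sigma_s^2 + 2 \sigma_\varepsilon^2 \big)$. The values $\gamma = 0.99$ and $\sigma_\varepsilon = 0.2$ are taken from Table 2 and $L_\mu \approx 0.802$ from Table 1; `sigma_r`, `L_Q`, `C_theta`, and `sigma_s` are hypothetical placeholders chosen only for illustration.

```python
def td_target_variance_bound(sigma_r, gamma, L_Q, L_mu, C_theta, sigma_s, sigma_eps):
    """Upper bound on Var(y | s) from the chain of inequalities in this proof."""
    state_term = L_mu**2 * (1.0 + 2.0 * C_theta**2) * sigma_s**2  # scaled by L_mu^2
    noise_term = 2.0 * sigma_eps**2                               # independent of L_mu
    return sigma_r**2 + gamma**2 * L_Q**2 * (state_term + noise_term)

# gamma and sigma_eps follow Table 2; the remaining constants are assumed values.
common = dict(sigma_r=0.1, gamma=0.99, L_Q=1.0, C_theta=1.0, sigma_s=0.5, sigma_eps=0.2)

bound_fuzzy = td_target_variance_bound(L_mu=0.802, **common)  # L_mu from Table 1
bound_unit = td_target_variance_bound(L_mu=1.0, **common)     # reference with L_mu = 1
print(f"bound with L_mu = 0.802: {bound_fuzzy:.4f}")
print(f"bound with L_mu = 1.000: {bound_unit:.4f}")
```

Only the state-uncertainty term shrinks with $L_\mu$; the reward-noise and target-noise terms set a floor that feature regularization cannot remove.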

Appendix B.5. Proof of Proposition 3 (Projected Bellman Residual Bound)

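Orthogonal projections $\Pi_{\mathcal{F}}$ satisfy $\|\Pi_{\mathcal{F}} f - \Pi_{\mathcal{F}} g\|_{2,\rho} \le \|f - g\|_{2,\rho}$. Combining with Proposition 1 and the change of measure gives
$$\|\Pi_{\mathcal{F}} T_\varepsilon^{\bar\pi} Q - \Pi_{\mathcal{F}} T_\varepsilon^{\bar\pi} \tilde{Q}\|_{2,\rho} \le \gamma c\, \|Q - \tilde{Q}\|_{2,\rho}, \qquad c = \operatorname{ess\,sup} \frac{d\rho'}{d\rho}.$$
If $Q^{\bar\pi}$ is a fixed point of $T_\varepsilon^{\bar\pi}$ and $\hat{Q}$ solves $\hat{Q} = \Pi_{\mathcal{F}} T_\varepsilon^{\bar\pi} \hat{Q}$, then
$$\|\hat{Q} - Q^{\bar\pi}\|_{2,\rho} \le \frac{1}{1 - \gamma c}\, \|(I - \Pi_{\mathcal{F}}) T_\varepsilon^{\bar\pi} Q^{\bar\pi}\|_{2,\rho}.$$
If moreover $\rho' = \rho$, then $c = 1$ and the factor simplifies to $1 / (1 - \gamma)$.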
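The step from the contraction estimate to the fixed-point error bound can be written out as follows. This is a sketch of the standard argument, under the added assumption that $\gamma c < 1$ so the final rearrangement is valid; here $T$ abbreviates $T_\varepsilon^{\bar\pi}$, $\Pi$ abbreviates $\Pi_{\mathcal{F}}$, and all norms are $\|\cdot\|_{2,\rho}$.

```latex
\begin{aligned}
\|\hat{Q} - Q^{\bar\pi}\|
  &= \|\Pi T \hat{Q} - T Q^{\bar\pi}\|
     && \text{since } \hat{Q} = \Pi T \hat{Q},\ Q^{\bar\pi} = T Q^{\bar\pi} \\
  &\le \|\Pi T \hat{Q} - \Pi T Q^{\bar\pi}\| + \|\Pi T Q^{\bar\pi} - T Q^{\bar\pi}\|
     && \text{triangle inequality} \\
  &\le \gamma c\, \|\hat{Q} - Q^{\bar\pi}\| + \|(I - \Pi)\, T Q^{\bar\pi}\|
     && \text{projected contraction} \\
\Longrightarrow\quad
(1 - \gamma c)\, \|\hat{Q} - Q^{\bar\pi}\|
  &\le \|(I - \Pi)\, T Q^{\bar\pi}\| .
\end{aligned}
```

Dividing by $1 - \gamma c > 0$ gives the stated bound; the right-hand side measures how far the Bellman image of $Q^{\bar\pi}$ lies from the function class.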

Appendix B.6. Proof of Lemma 1 (Underestimation by TD3’s Minimum)

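Write $Q_i = Q + \varepsilon_i$ for the two critic estimates, with zero-mean errors $\varepsilon_i$. Using $\min\{x, y\} = \frac{x + y - |x - y|}{2}$ and Jensen's inequality,
$$\mathbb{E}\big[ \min\{Q_1, Q_2\} \mid s, a \big] = Q + \tfrac{1}{2}\, \mathbb{E}\big[ \varepsilon_1 + \varepsilon_2 - |\varepsilon_1 - \varepsilon_2| \,\big|\, s, a \big] \le Q.$$
Hence, $0 \le Q - \mathbb{E}[\min\{Q_1, Q_2\}] \le \tfrac{1}{2} \sqrt{\operatorname{Var}(\varepsilon_1 - \varepsilon_2)}$, where the upper bound uses $\mathbb{E}|X| \le \sqrt{\mathbb{E}[X^2]}$ from Appendix B.1. With $\operatorname{Var}(\varepsilon_i) = \sigma_Q^2$ for both critics and correlation $\rho$, $\operatorname{Var}(\varepsilon_1 - \varepsilon_2) = 2 \sigma_Q^2 (1 - \rho)$, yielding the claimed bound.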
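A quick Monte Carlo experiment illustrates the size of this underestimation. The sketch assumes jointly Gaussian, zero-mean critic errors with equal variance $\sigma_Q^2$ and correlation $\rho$ (the Gaussian form is an assumption made here for simulation only), and it takes the bound in the form $\tfrac{1}{2}\sqrt{\operatorname{Var}(\varepsilon_1 - \varepsilon_2)} = \tfrac{1}{2}\sqrt{2 \sigma_Q^2 (1 - \rho)}$.

```python
import numpy as np

def min_bias(sigma_q, rho, n_samples=200_000, seed=0):
    """Return (Q - E[min{Q1, Q2}], bound) for Q_i = Q + eps_i with Gaussian eps."""
    rng = np.random.default_rng(seed)
    cov = sigma_q**2 * np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples)
    bias = -float(np.mean(np.min(eps, axis=1)))               # Q cancels out
    bound = 0.5 * float(np.sqrt(2.0 * sigma_q**2 * (1.0 - rho)))
    return bias, bound

for rho in (0.0, 0.3, 0.9):
    bias, bound = min_bias(sigma_q=1.0, rho=rho)
    print(f"rho = {rho:.1f}: bias = {bias:.4f}, bound = {bound:.4f}")
```

In this Gaussian setting the bias has the closed form $\sigma_Q \sqrt{(1 - \rho)/\pi}$, so it shrinks as the two critics become more correlated and stays below the bound by a constant factor.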

Appendix B.7. Proof of Theorem 2 (Two-Time-Scale Convergence)

Under the stated assumptions (bounded gradients, square-integrable martingale-difference-sequence (MDS) noise, projections onto compact sets, bounded target noise, and stepsizes with $\beta_k / \alpha_k \to 0$, $\tau_k / \beta_k \to 0$), the critic recursions track the projected gradient flow of their population losses (frozen targets), converging a.s. to the internally chain-transitive set of stationary points. On the slow scale, the actor tracks the projected differential inclusion $\dot\theta \in \Pi_\Theta[\nabla_\theta \tilde{J}(\theta)]$, with $\tilde{J}(\theta) = \mathbb{E}_\rho\big[ Q_{\phi(\theta)}(s, \mu_\theta(s)) \big]$. Standard ODE/SA arguments complete the proof.
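One family of schedules that meets the two ratio conditions is polynomial decay with increasing exponents. The sketch below is a hypothetical example: the exponents `0.6`, `0.8`, and `1.0` are assumptions chosen so that each slower stepsize decays strictly faster than the one before it, and they are not the values used in the experiments (Table 2 lists constant stepsizes, which is the regime of Theorem 3).

```python
import numpy as np

def stepsizes(k, a=0.6, b=0.8, c=1.0):
    """Critic (alpha), actor (beta), and target (tau) stepsizes at iteration k >= 1."""
    k = np.asarray(k, dtype=float)
    return k ** -a, k ** -b, k ** -c

ks = np.array([1e1, 1e3, 1e5])
alpha, beta, tau = stepsizes(ks)

ratio_actor_critic = beta / alpha   # equals k^(a - b) = k^-0.2, tends to 0
ratio_target_actor = tau / beta     # equals k^(b - c) = k^-0.2, tends to 0

for k, r1, r2 in zip(ks, ratio_actor_critic, ratio_target_actor):
    print(f"k = {k:8.0f}: beta/alpha = {r1:.4f}, tau/beta = {r2:.4f}")
```

Both ratios decay like $k^{-0.2}$, so the critic sees an almost frozen actor and the actor sees almost frozen targets, which is the separation the proof relies on.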

Appendix B.8. Proof of Corollary 1 (PL/Last-Layer Linear ⇒ Uniqueness)

With frozen targets, a Polyak–Łojasiewicz (PL) inequality with constant $\mu_{\mathrm{PL}} > 0$ and $L_w$-smoothness implies a unique global minimizer for the critic's last-layer loss. Projected SA tracks the exponentially stable equilibrium; on two time scales, $w_k \to w^\star(\theta_k)$ along the slow manifold.

Appendix B.9. Proof of Theorem 3 (Finite Stepsize Neighborhoods)

With constant stepsizes $(\alpha, \beta, \tau)$, the fast critic recursion is a contractive random affine system in a neighborhood of the stable set, with additive noise $(\alpha M_{k+1})$ and parametric drift driven by $(\beta, \tau)$. A Lyapunov drift bound yields
$$\limsup_{k \to \infty} \mathbb{E}\, \operatorname{dist}\big( (w_k, \theta_k), \mathcal{M} \times \mathcal{S} \big)^2 \le (C_1 + C_4 L_\mu^2)\, \alpha + (C_2 + C_5 L_\mu^2)\, \beta + (C_3 + C_6 L_\mu^2)\, \tau,$$
with constants depending on Lipschitz moduli and noise bounds. Under PL, the radii admit the stated closed-form dependence on $(\alpha, \beta, \tau, \sigma_M^2, L_w, \mu_{\mathrm{PL}})$.

Appendix B.10. Proof of Theorem 4 (N-Agent CTDE Extension)

Stack the agents and endow $S$ and $A^N$ with block product norms. By Theorem 1 applied per agent and standard product-space bounds,
$$\operatorname{Lip}(\mu) \le \begin{cases} \max_i L_{\mu,i}, & \|\cdot\|_{\infty,\mathrm{blk}}, \\ \big( \sum_i L_{\mu,i}^2 \big)^{1/2}, & \|\cdot\|_{2,\mathrm{blk}}, \end{cases} \qquad L_Q^{(\mathrm{in})} \text{ analogously}.$$
The smoothed fixed-policy operator remains $\gamma$-Lipschitz in $L^\infty$ and $\gamma$-Lipschitz across $L^2$ measures (up to pushforward), including the twin-critic minimum, since the min map is 1-Lipschitz. Under the centralized replay and two-time-scale regime, the critic layer converges on the fast scale (with aggregated Lipschitz/noise constants as above), and the actor tracks the projected ODE on the slow scale. Hence the conclusions of Theorem 2 hold with the stated block-norm constants.

References

  1. Esram, T.; Chapman, P.L. Comparison of photovoltaic array maximum power point tracking techniques. IEEE Trans. Energy Convers. 2007, 22, 439–449. [Google Scholar] [CrossRef]
  2. Ahmed, J.; Salam, Z. A critical evaluation on maximum power point tracking methods for partial shading in PV systems. Renew. Sustain. Energy Rev. 2015, 47, 933–953. [Google Scholar] [CrossRef]
  3. Subudhi, R.; Pradhan, S. A Comparative Study on Maximum Power Point Tracking Techniques for Photovoltaic Power Systems. IEEE Trans. Sustain. Energy 2013, 4, 89–98. [Google Scholar] [CrossRef]
  4. Hohm, D.P.; Ropp, M.E. Comparative Study of Maximum Power Point Tracking Algorithms. Prog. Photovoltaics: Res. Appl. 2003, 11, 47–62. [Google Scholar] [CrossRef]
  5. Villalva, M.G.; Gazoli, J.R.; Filho, E.R. Comprehensive Approach to Modeling and Simulation of Photovoltaic Arrays. IEEE Trans. Power Electron. 2009, 24, 1198–1208. [Google Scholar] [CrossRef]
  6. Belhachat, F.; Larbes, C. A review of global maximum power point tracking techniques for photovoltaic systems under partial shading conditions—A survey. Renew. Sustain. Energy Rev. 2018, 92, 513–553. [Google Scholar] [CrossRef]
  7. Ishaque, K.; Salam, Z.; Amjad, M.; Mekhilef, S. An Improved Particle Swarm Optimization (PSO)–Based MPPT for PV With Reduced Steady-State Oscillation. IEEE Trans. Power Electron. 2012, 27, 3627–3638. [Google Scholar] [CrossRef]
  8. Wadehra, A.; Bhalla, S.; Jaiswal, V.; Rana, K.P.S.; Kumar, V. A deep recurrent reinforcement learning approach for enhanced MPPT in PV systems. Appl. Soft Comput. 2024, 162, 111728. [Google Scholar] [CrossRef]
  9. Giraldo, L.F.; Gaviria, J.F.; Torres, M.I.; Alonso, C.; Bressan, M. Deep reinforcement learning using deep-Q-network for Global Maximum Power Point tracking: Design and experiments in real photovoltaic systems. Heliyon 2024, 10, e37974. [Google Scholar] [CrossRef]
  10. Zhang, B.; Cao, D.; Hu, W.; Ghias, A.M.Y.M.; Chen, Z. Physics-Informed Multi-Agent Deep Reinforcement Learning Enabled Distributed Voltage Control for Active Distribution Network Using PV Inverters. Int. J. Electr. Power Energy Syst. 2024, 155, 109641. [Google Scholar] [CrossRef]
  11. Alsulami, A.A.G.; Alhussainy, A.A.; Allehyani, A.; Alturki, Y.A.; Alghamdi, S.M.; Alruwaili, M.; Alharthi, Y.Z. A comparison of several maximum power point tracking algorithms for a photovoltaic power system. Front. Energy Res. 2024, 12, 1413252. [Google Scholar] [CrossRef]
  12. Endiz, M.S.; Gökkuş, G.; Coşgun, A.E.; Demir, H. A Review of Traditional and Advanced MPPT Approaches for PV Systems Under Uniformly Insolation and Partially Shaded Conditions. Appl. Sci. 2025, 15, 1031. [Google Scholar] [CrossRef]
  13. Siddique, M.A.B.; Zhao, D.; Ouahada, K.; Rehman, A.U.; Hamam, H. Performance validation of global MPPT for efficient power extraction through PV system under complex partial shading effects. Sci. Rep. 2025, 15, 17061. [Google Scholar] [CrossRef]
  14. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic Policy Gradient Algorithms. In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 21–26 June 2014; pp. 387–395. Available online: https://proceedings.mlr.press/v32/silver14.html (accessed on 2 September 2025).
  15. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2016, arXiv:1509.02971. [Google Scholar] [PubMed]
  16. Fujimoto, S.; van Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1582–1591. Available online: https://proceedings.mlr.press/v80/fujimoto18a.html (accessed on 2 September 2025).
  17. van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. Proc. AAAI Conf. Artif. Intell. 2016, 30, 2094–2100. [Google Scholar] [CrossRef]
  18. Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, P.; Mordatch, I. Multi-Agent Actor-Critic for Mixed Cooperative–Competitive Environments. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
  19. Foerster, J.; Farquhar, G.; Afouras, T.; Nardelli, N.; Whiteson, S. Counterfactual Multi-Agent Policy Gradients. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–7 February 2018; pp. 2974–2982. [Google Scholar] [CrossRef]
  20. Rashid, T.; Samvelyan, M.; de Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 2631–2640. Available online: http://jmlr.org/papers/v21/20-081.html (accessed on 4 October 2025).
  21. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1861–1870. Available online: https://proceedings.mlr.press/v80/haarnoja18b.html (accessed on 5 June 2025).
  22. Schulman, J.; Levine, S.; Moritz, P.; Jordan, M.I.; Abbeel, P. Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; JMLR W&CP 37. pp. 1889–1897. Available online: https://proceedings.mlr.press/v37/schulman15.html (accessed on 11 May 2025).
  23. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
  24. Konda, V.R.; Tsitsiklis, J.N. Actor–Critic Algorithms. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS); MIT Press: Cambridge, MA, USA, 2000; pp. 1008–1014. [Google Scholar]
  25. Konda, V.R.; Tsitsiklis, J.N. On Actor–Critic Algorithms. SIAM J. Control Optim. 2003, 42, 1143–1166. [Google Scholar] [CrossRef]
  26. Borkar, V.S.; Meyn, S.P. The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. 2000, 38, 447–469. [Google Scholar] [CrossRef]
  27. Borkar, V.S. Stochastic Approximation: A Dynamical Systems Viewpoint; Springer: Berlin/Heidelberg, Germany, 2025. [Google Scholar] [CrossRef]
  28. Bhatnagar, S.; Sutton, R.S.; Ghavamzadeh, M.; Lee, M. Natural actor–critic algorithms. Automatica 2009, 45, 2471–2482. [Google Scholar] [CrossRef]
  29. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef]
  30. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Springer: Berlin/Heidelberg, Germany, 1981. [Google Scholar] [CrossRef]
  31. Jang, J.R. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man, Cybern. 1993, 23, 665–685. [Google Scholar] [CrossRef]
  32. Melenk, J.M.; Babuška, I. The partition of unity finite element method: Basic theory and applications. Comput. Methods Appl. Mech. Eng. 1996, 139, 289–314. [Google Scholar] [CrossRef]
  33. Babuška, I.; Melenk, J.M. The Partition of Unity Method. Int. J. Numer. Methods Eng. 1997, 40, 727–758. [Google Scholar] [CrossRef]
  34. Shepard, D. A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 23rd ACM National Conference, New York, NY, USA, 27–29 August 1968; pp. 517–524. [Google Scholar] [CrossRef]
  35. Sontag, E.D. Input to state stability: Basic concepts and results. In Nonlinear and Optimal Control Theory; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 2008; Volume 1932, pp. 163–220. [Google Scholar] [CrossRef]
  36. Khalil, H.K. Nonlinear Systems, 3rd ed.; Prentice Hall: Wilmington, DE, USA, 2002. [Google Scholar]
  37. Bauschke, H.H.; Combettes, P.L. Correction to: Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer International Publishing: Cham, Switzerland, 2017; pp. C1–C4. [Google Scholar] [CrossRef]
  38. Scherrer, B. Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. In Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010; pp. 1035–1042. Available online: https://icml.cc/Conferences/2010/papers/654.pdf (accessed on 22 July 2025).
  39. Bertsekas, D.P.; Tsitsiklis, J.N. Neuro-Dynamic Programming; Athena Scientific: Nashua, NH, USA, 1996. [Google Scholar]
  40. Tsitsiklis, J.N.; Van Roy, B. An analysis of temporal-difference learning with function approximation. IEEE Trans. Autom. Control 1997, 42, 674–690. [Google Scholar] [CrossRef]
  41. Henderson, P.; Islam, R.; Bachman, P.; Pineau, J.; Precup, D.; Meger, D. Deep Reinforcement Learning that Matters. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New Orleans, LA, USA, 2–7 February 2018; pp. 3207–3214. Available online: https://dl.acm.org/doi/abs/10.5555/3504035.3504427 (accessed on 6 May 2025).
  42. Agarwal, R.; Schwarzer, M.; Castro, P.S.; Courville, A.; Bellemare, M.G. Deep Reinforcement Learning at the Edge of the Statistical Precipice. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS). 2021. Available online: https://arxiv.org/pdf/2108.13264 (accessed on 4 September 2025).
  43. Dunnett, C.W. A multiple comparison procedure for comparing several treatments with a control. J. Am. Stat. Assoc. 1955, 50, 1096–1121. [Google Scholar] [CrossRef]
  44. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodological) 1995, 57, 289–300. [Google Scholar] [CrossRef]
  45. Law, A.M.; Kelton, W.D. Simulation Modeling and Analysis, 5th ed.; McGraw-Hill: Columbus, OH, USA, 2014. [Google Scholar]
  46. Wan, Y.; Xu, Q.; Dragicevic, T. Safety-Enhanced Self-Learning for Optimal Power Converter Control. IEEE Trans. Ind. Electron. 2024, 71, 15229–15234. [Google Scholar] [CrossRef]
  47. Ren, H.; Wang, Y.; Yu, H.; Zhang, B.; Chen, Z. Deep Reinforcement Learning-based Power Flow Control for Triple Active Bridge Converter. In Proceedings of the 2024 IEEE 10th International Power Electronics and Motion Control Conference (IPEMC2024-ECCE Asia), Chengdu, China, 17–20 May 2024; pp. 2107–2112. [Google Scholar] [CrossRef]
  48. Wan, Y.; Xu, Q.; Dragicevic, T. Reinforcement Learning-Based Predictive Control for Power Electronic Converters. IEEE Trans. Ind. Electron. 2025, 72, 5353–5364. [Google Scholar] [CrossRef]
  49. Bui, V.H.; Mohammadi, S.; Das, S.; Hussain, A.; Hollweg, G.V.; Su, W. A Critical Review of Safe Reinforcement Learning Strategies in Power and Energy Systems. Eng. Appl. Artif. Intell. 2025, 143, 110091. [Google Scholar] [CrossRef]
  50. Abouzeid, A.F.; Eleraky, H.; Kalas, A.; Rizk, R.; Elsakka, M.M.; Refaat, A. Experimental Validation of a Low-Cost Maximum Power Point Tracking Technique Based on Artificial Neural Network for Photovoltaic Systems. Sci. Rep. 2024, 14, 18280. [Google Scholar] [CrossRef] [PubMed]
  51. Ahmadi, M.; Abrari, M.; Ghanaatshoar, M.; Khalafi, A. A Novel Algorithm for Maximum Power Point Tracking Using Computer Vision (CVMPPT). PLoS ONE 2024, 19, e0301363. [Google Scholar] [CrossRef] [PubMed]
  52. Teneta, J.; Kreft, W.; Janowski, M. Partial Shading of Photovoltaic Modules with Thin Linear Objects: Modelling in MATLAB Environment and Measurement Experiments. Energies 2024, 17, 3546. [Google Scholar] [CrossRef]
Figure 1. Fast contraction of critics to $\mathcal{M}(\theta)$ and slow ascent of $\theta$ along $\Pi_\Theta[\nabla_\theta \tilde{J}(\theta)]$.
Figure 2. Training performance for Fuzzy–MAT3D, MAT3D, and MADDPG. Smoothed return curves over 80 episodes show that Fuzzy–MAT3D converges faster to higher returns and exhibits markedly lower variance across seeds, illustrating how fuzzy regularization stabilizes the learning dynamics.
Figure 3. MPPT efficiency distributions across CRN-blocked replications. Notched boxplots for all six algorithms indicate that Fuzzy–MAT3D attains the highest median efficiency and the smallest interquartile range, evidencing both higher and more consistent energy capture than MAT3D and MADDPG.
Figure 4. Mean MPPT efficiency with 95% confidence intervals. Point: mean; bars: two-sided 95% CI (Student t); N = 140 per algorithm. Non-overlapping error bars between Fuzzy–MAT3D and the baselines highlight the statistical significance of the observed efficiency gains.
Figure 5. Steady-state oscillation across replications. Y-axis: “Steady-state oscillation [W]”; lower is better. Fuzzy–MAT3D tracks the global MPP with substantially smaller steady-state oscillations, at the cost of a slightly longer settling time, illustrating the stability–speed trade-off.
Figure 6. Settling time $t_s$ across replications. $t_s$ is defined using a $\pm 2\%$ band around the MPP and a continuous window $\Delta = 0.2\, T_{\mathrm{sim}} = 4$ s. Fuzzy–MAT3D concentrates in the region of small jitter and moderate settling time, whereas MAT3D and MADDPG achieve faster transients at the expense of significantly larger steady-state oscillations.
Figure 7. Distribution of MPPT efficiency. The distributions indicate statistically significant differences between Fuzzy–MAT3D and the other RL algorithms.
Figure 8. Step disturbance response. Top: total PV power $P_{\mathrm{pv}}(t)$ [W]; bottom: duty-cycle action of Agent 1 (dimensionless). The irradiance step occurs at $t_{\mathrm{change}} = 0.35$ s. Settling time $t_s$ is defined using a $\pm 2\%$ band around the MPP and a continuous window $\Delta = 4$ s.
Figure 9. Block diagram of the experimental bench and measurement points.
Figure 10. Close-up of the outdoor hardware bench used for validation. The two-module string interfaces the embedded controller (purple enclosure) and the logging laptop. Cable routing and measurement leads are visible next to the controller.
Figure 11. (Left) Wide view of the two-module series string deployed on level ground in a desert-like yard, with the control table at the right. (Right) Details of the series interconnect and instrument wiring near the controller and laptop.
Figure 12. Hardware MPPT efficiency distributions (scenario-blocked). Notched boxplots by algorithm.
Figure 13. Mean MPPT efficiency with 95% confidence intervals (Student-t).
Figure 14. Steady-state oscillation (lower is better).
Figure 15. Settling time $t_s$ across hardware replications. $t_s$ is defined using a $\pm 2\%$ band around the MPP and a continuous window $\Delta = 0.2\, T_{\mathrm{sim}} = 4$ s.
Table 1. Fuzzy PoU parameters and induced Lipschitz constants. All ranges are given in raw units; centers are in the normalized $[0, 1]$ scale. With $\tau_j \equiv 32$ for all coordinates, the per-coordinate bound is $L_z^{(\text{coord.})} \approx 5.615$ and the global constant is $L_\mu = L_z / 7 \approx 0.802$.

| State coord. | Units | Raw range | Temperature $\tau_j$ | Centers | $L_z$ (coord.) | Note |
|---|---|---|---|---|---|---|
| $V_i$ | V | $[0, 49.9]$ | 32 | $\{0, 0.25, 0.5, 0.75, 1\}$ | 5.615 | module voltage |
| $I_i$ | A | $[0, 9]$ | 32 | $\{0, 0.25, 0.5, 0.75, 1\}$ | 5.615 | module current |
| $P_i$ | W | $[0, 352]$ | 32 | $\{0, 0.25, 0.5, 0.75, 1\}$ | 5.615 | module power |
| $G_i$ | W/m² | $[200, 1000]$ | 32 | $\{0, 0.25, 0.5, 0.75, 1\}$ | 5.615 | local irradiance |
| $G_{n(i)}$ | W/m² | $[200, 1000]$ | 32 | $\{0, 0.25, 0.5, 0.75, 1\}$ | 5.615 | neighbor irradiance |
| $\Delta P_i$ | W | $[-352.408, 352.408]$ | 32 | $\{0, 0.25, 0.5, 0.75, 1\}$ | 5.615 | last-step power inc. |
| $\Delta V_i$ | V | $[-49.9, 49.9]$ | 32 | $\{0, 0.25, 0.5, 0.75, 1\}$ | 5.615 | last-step voltage inc. |

Notation. $\tau_j$ denotes the per-coordinate softmax temperature; $L_z = \sup_s \|\nabla z(s)\|$ is the global Lipschitz bound of the stacked membership map $z(s)$; $L_\mu = L_z / 7$ is the induced Lipschitz constant of the normalized fuzzy partition $\varphi_{\mathrm{fuzzy}}(s) = z(s) / 7$; and $\Delta = 0.2\, T_{\mathrm{sim}} = 4$ s is the continuous window used to define settling time $t_s$ in step scenarios.
Table 2. Training hyperparameters for Fuzzy–MAT3D (CTDE–TD3). Note: Training episodes use $\Delta t = 0.06$ s and $T_{\mathrm{steps}} = 12{,}000$ (720 s per episode) with $N_{\mathrm{ep}} = 80$. All reported evaluation metrics, however, are computed over a fixed evaluation horizon $T_{\mathrm{sim}} = 20$ s to ensure comparability across algorithms.

| Component | Symbol | Value |
|---|---|---|
| Discount factor | $\gamma$ | 0.99 |
| Critic learning rate (fast) | $\alpha_c$ | $1 \times 10^{-4}$ |
| Actor learning rate (slow) | $\beta_a$ | $1 \times 10^{-5}$ |
| Policy update period | $K$ | 2 |
| Polyak factor (targets) | $\tau$ | $1 \times 10^{-3}$ |
| Target policy noise | $\sigma_\varepsilon$ | 0.2 (clip $\pm 0.5$) |
| Minibatch size | $M$ | 256 |
| Replay buffer length | $\lvert\mathcal{D}\rvert$ | $10^6$ |
| Gradient clipping | $G_{\max}$ | 1 |
| Actor network | | $(128, 128)$ + tanh |
| Twin critics | | 2 (shared fuzzy front-end) |
| Fuzzy memberships | $L$ | 5 per state coordinate |
| State normalization | | per-coordinate $[0, 1]$ |
| Reward weights | $(\alpha, \beta, \kappa)$ | $(0.1, 0.05, 2)$ |
| Simulation step | $\Delta t$ | 0.06 s |
| Max steps/episode | $T_{\mathrm{steps}}$ | 12,000 |
| Number of episodes | $N_{\mathrm{ep}}$ | 80 |
| Compute device | | GPU |
Table 3. Benchmark scenarios.

| # | Scenario name | $G_{\mathrm{init}}$ [W/m²] | $t_{\mathrm{change}}$ [s] | $G_{\mathrm{final}}$ [W/m²] |
|---|---|---|---|---|
| 1 | Standard Condition | [1000, 500] | | [1000, 500] |
| 2 | Deep Shadow | [1000, 200] | | [1000, 200] |
| 3 | Similar Conditions | [800, 850] | | [800, 850] |
| 4 | Low Irradiation | [400, 300] | | [400, 300] |
| 5 | Step Change 1 (step) | [1000, 300] | 0.35 | [300, 1000] |
| 6 | Step Change 2 (step) | [900, 600] | 0.35 | [600, 900] |
| 7 | Shadow Recovery (step) | [250, 950] | 0.35 | [950, 950] |
Table 4. Inference latency (ms) vs. control period and PSO per-tick budget.

| Algorithm | p50 | p95 | p99 | Control period [ms] | PSO budget (particles × iters) |
|---|---|---|---|---|---|
| Fuzzy–MAT3D | 1.5 | 2.7 | 4.0 | 60 | |
| MAT3D | 1.3 | 2.4 | 3.7 | 60 | |
| MADDPG | 1.4 | 2.6 | 3.9 | 60 | |
| PSO | | | | 60 | 16 × 3 |
| P&O/INC | | | | 60 | |
Table 5. Simulation: scenario-wise MPPT efficiency $\eta$ (%); mean ± 95% CI; N = 20 seeds per scenario and algorithm.

| Scenario | Fuzzy–MAT3D | PSO | MAT3D | INC | P&O | MADDPG |
|---|---|---|---|---|---|---|
| Standard Condition | 94.5 ± 0.9 | 89.0 ± 2.0 | 84.0 ± 3.1 | 81.0 ± 3.0 | 79.0 ± 3.3 | 73.0 ± 4.0 |
| Deep Shadow | 90.2 ± 1.5 | 79.0 ± 2.5 | 71.0 ± 3.8 | 68.0 ± 3.6 | 64.0 ± 3.6 | 57.0 ± 4.5 |
| Similar Conditions | 93.0 ± 1.1 | 86.0 ± 2.1 | 82.0 ± 3.2 | 78.0 ± 3.1 | 76.0 ± 3.3 | 70.0 ± 4.1 |
| Low Irradiation | 89.5 ± 1.6 | 82.0 ± 2.4 | 76.0 ± 3.6 | 73.0 ± 3.3 | 71.0 ± 3.4 | 65.0 ± 4.2 |
| Step Change 1 | 91.0 ± 1.6 | 85.0 ± 2.2 | 79.0 ± 3.4 | 76.0 ± 3.2 | 74.0 ± 3.3 | 67.0 ± 4.2 |
| Step Change 2 | 92.5 ± 1.2 | 87.0 ± 2.0 | 83.0 ± 3.1 | 79.0 ± 3.0 | 78.0 ± 3.2 | 71.0 ± 4.0 |
| Shadow Recovery | 93.3 ± 1.1 | 85.8 ± 2.1 | 85.0 ± 3.0 | 80.0 ± 3.1 | 80.0 ± 3.3 | 72.0 ± 4.1 |
Table 6. Mean MPPT efficiency $\eta$ (mean ± SD) and two-sided 95% CI across N = 140 CRN-blocked replications per algorithm.

| Algorithm | Mean [%] | SD | 95% CI [%] | N |
|---|---|---|---|---|
| Fuzzy–MAT3D | 92.032 | 4.014 | [91.361, 92.703] | 140 |
| PSO | 84.758 | 8.028 | [83.417, 86.100] | 140 |
| MAT3D | 80.144 | 13.050 | [77.963, 82.324] | 140 |
| INC | 76.074 | 11.667 | [74.125, 78.024] | 140 |
| P&O | 74.312 | 11.471 | [72.395, 76.229] | 140 |
| MADDPG | 67.978 | 12.402 | [65.905, 70.050] | 140 |
Table 7. CRN-paired, one-sided t-tests of Fuzzy–MAT3D vs. baselines on MPPT efficiency. Positive $\Delta\eta$ favors Fuzzy–MAT3D. p is one-sided; CIs are two-sided 95%. Cohen's d is the paired effect size.

| Comparator | $n_{\mathrm{pairs}}$ | $\Delta\eta$ [%] | 95% CI [%] | t | p (BH q) | Cohen's d |
|---|---|---|---|---|---|---|
| MADDPG | 140 | +24.054 | [21.909, 26.199] | 22.173 | $< 10^{-16}$ (≈0) | 1.874 |
| P&O | 140 | +17.720 | [15.874, 19.566] | 18.978 | $< 10^{-16}$ (≈0) | 1.604 |
| INC | 140 | +15.958 | [13.964, 17.951] | 15.826 | $< 10^{-16}$ (≈0) | 1.338 |
| MAT3D | 140 | +11.888 | [9.674, 14.103] | 10.613 | $< 10^{-16}$ (≈0) | 0.897 |
| PSO | 140 | +7.274 | [5.756, 8.792] | 9.474 | $< 10^{-16}$ (≈0) | 0.801 |
Table 8. CRN-paired, one-sided t-tests on steady-state oscillation (lower is better). Positive differences (baseline − Fuzzy–MAT3D) favor Fuzzy–MAT3D.

| Comparator | $n_{\mathrm{pairs}}$ | Mean diff [W] | 95% CI [W] | t | p (BH q) | Cohen's d |
|---|---|---|---|---|---|---|
| MADDPG | 140 | +51.159 | [45.656, 56.662] | 18.380 | $< 10^{-16}$ (≈0) | 1.553 |
| MAT3D | 140 | +44.078 | [35.072, 53.085] | 9.676 | $< 10^{-16}$ (≈0) | 0.818 |
| P&O | 140 | +17.081 | [13.864, 20.297] | 10.499 | $< 10^{-16}$ (≈0) | 0.887 |
| INC | 140 | +15.228 | [12.328, 18.127] | 10.383 | $< 10^{-16}$ (≈0) | 0.878 |
| PSO | 140 | +8.948 | [7.464, 10.432] | 11.921 | $< 10^{-16}$ (≈0) | 1.008 |
Table 9. Aggregate performance across all RL runs.

| Algorithm | MPPT efficiency $\eta$ [%] | Settling time $t_s$ [s] | SS oscillation [W] |
|---|---|---|---|
| Fuzzy–MAT3D | 92.032 ± 4.014 | 7.537 ± 5.397 | 1.362 ± 2.013 |
| MAT3D | 80.144 ± 13.050 | 1.573 ± 3.771 | 37.963 ± 70.396 |
| MADDPG | 67.978 ± 12.402 | 3.239 ± 6.175 | 52.993 ± 34.608 |
Table 10. Electrical, temperature, and mechanical specifications of the 350 W PV modules used in experiments (STC) and the resulting two-module series string.
Table 10. Electrical, temperature, and mechanical specifications of the 350 W PV modules used in experiments (STC) and the resulting two-module series string.
| Parameter | Per Module (STC) | Series String (2×) |
|---|---|---|
| Rated power P_mp [W] | 350 | 700 |
| Voltage at MPP V_mp [V] | 38.95 | 77.90 |
| Current at MPP I_mp [A] | 8.98 | 8.98 |
| Open-circuit voltage V_oc [V] | 46.41 | 92.82 |
| Short-circuit current I_sc [A] | 9.60 | 9.60 |
| Cells (size) | 72 cells, 156 × 156 mm | |
| Cell efficiency [%] | 20.40 | |
| Module efficiency [%] | 18.00 | |
| Output tolerance [%] | 5 | |
| Standard test conditions | AM1.5, 1000 W/m², 25 °C | |
| Temperature coefficients (per module) | | |
| dV_oc/dT [%/K] | −0.30 | |
| dI_sc/dT [%/K] | +0.04 | |
| dP_max/dT [%/K] | −0.44 | |
| Nominal operating cell temperature (NOCT) | 45 ± 2 °C | |
| Mechanical (per module) | | |
| Dimensions [mm] | 1956 × 992 × 40 | |
| Weight [kg] | 23 | |
| Operating temperature [°C] | −40 to 85 | |
| Frame | Anodized aluminum | |
| Junction box | IP67, 3 diodes | |
| Output cables | 4 mm², 1000 mm length | |
| Connectors | MC4 compatible/Tyco | |
| Connection topology | Two identical modules in series | |
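The series-string column of Table 10 follows from the per-module values: voltages and power add across the two modules while the current is shared. The sketch below reproduces that arithmetic and, as an illustration only, scales the open-circuit voltage to the nominal operating cell temperature with a linear temperature model. The linear model, the negative sign of the V_oc coefficient (−0.30 %/K, the usual sign for crystalline silicon), and the function names are assumptions rather than details stated in the article.

```python
STC_TEMP_C = 25.0        # standard test condition cell temperature
MODULES_IN_SERIES = 2    # connection topology from Table 10

# Per-module STC values from Table 10.
module = {"p_mp_w": 350.0, "v_mp_v": 38.95, "i_mp_a": 8.98,
          "v_oc_v": 46.41, "i_sc_a": 9.60}


def series_string(mod, n):
    """Series connection: voltages and power scale with n, currents do not."""
    return {
        "p_mp_w": mod["p_mp_w"] * n,
        "v_mp_v": mod["v_mp_v"] * n,
        "i_mp_a": mod["i_mp_a"],
        "v_oc_v": mod["v_oc_v"] * n,
        "i_sc_a": mod["i_sc_a"],
    }


def scale_with_temperature(value_stc, coeff_pct_per_k, cell_temp_c):
    """Assumed linear model: value(T) = value_STC * (1 + k/100 * (T - 25))."""
    return value_stc * (1.0 + coeff_pct_per_k / 100.0 * (cell_temp_c - STC_TEMP_C))


string = series_string(module, MODULES_IN_SERIES)
# Illustrative: string V_oc at the 45 °C NOCT, assuming dV_oc/dT = -0.30 %/K.
voc_noct = scale_with_temperature(string["v_oc_v"], -0.30, 45.0)

print(string)
print(f"String V_oc at 45 °C: {voc_noct:.2f} V")
```

With these inputs the string values match the tabulated 700 W, 77.90 V, and 92.82 V, and the assumed linear model gives a string open-circuit voltage of roughly 87.25 V at 45 °C.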
Table 11. Hardware: scenario-wise MPPT efficiency η (%); mean ± 95% CI; n 10 per scenario/algorithm.
| Scenario | Fuzzy–MAT3D | PSO | MAT3D | INC | P&O | MADDPG |
|---|---|---|---|---|---|---|
| Standard Condition | 93.5 ± 1.1 | 87.0 ± 2.2 | 83.0 ± 3.8 | 76.0 ± 3.1 | 77.5 ± 3.0 | 72.0 ± 4.2 |
| Deep Shadow | 89.0 ± 1.8 | 78.0 ± 2.8 | 73.0 ± 4.5 | 70.0 ± 3.4 | 70.5 ± 3.6 | 61.0 ± 4.6 |
| Similar Conditions | 92.2 ± 1.3 | 85.0 ± 2.3 | 81.0 ± 3.9 | 75.0 ± 3.0 | 76.0 ± 3.1 | 70.0 ± 4.1 |
| Low Irradiation | 88.8 ± 1.9 | 80.5 ± 2.5 | 75.0 ± 4.0 | 72.0 ± 3.2 | 72.5 ± 3.3 | 64.0 ± 4.3 |
| Step Change 1 | 91.1 ± 1.8 | 83.5 ± 2.4 | 79.5 ± 3.7 | 74.0 ± 3.0 | 75.0 ± 3.1 | 67.0 ± 4.0 |
| Step Change 2 | 92.0 ± 1.4 | 86.0 ± 2.2 | 84.0 ± 3.3 | 76.5 ± 3.1 | 79.0 ± 3.0 | 72.0 ± 3.8 |
| Shadow Recovery | 92.4 ± 1.2 | 82.5 ± 2.5 | 79.0 ± 3.5 | 75.0 ± 3.0 | 78.0 ± 3.1 | 69.0 ± 4.2 |
Table 12. Hardware inference latency (ms) vs. control period and PSO per-tick budget.
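| Algorithm | Latency p50 [ms] | Latency p95 [ms] | Latency p99 [ms] | Control Period [ms] | PSO Budget (Particles × Iters) |
|---|---|---|---|---|---|
| Fuzzy–MAT3D | 1.8 | 3.3 | 4.9 | 60 | |
| MAT3D | 1.6 | 3.0 | 4.6 | 60 | |
| MADDPG | 1.7 | 3.1 | 4.7 | 60 | |
| PSO | | | | 60 | 16 × 3 |
| P&O/INC | | | | 60 | |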
Notes: Latency percentiles (p50/p95/p99) are measured on the hardware bench under the same sensing/update period used across algorithms. PSO latency is not tabulated because its compute is budgeted per tick.
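To put the latency figures of Table 12 in proportion, the sketch below computes the share of the 60 ms control period consumed by inference at each percentile. It also multiplies out the PSO per-tick budget; reading 16 × 3 as one fitness evaluation per particle per iteration is an assumption, and the variable names are illustrative.

```python
CONTROL_PERIOD_MS = 60.0  # shared sensing/update period from Table 12

# algorithm: (p50, p95, p99) inference latency in ms, from Table 12
latency_ms = {
    "Fuzzy-MAT3D": (1.8, 3.3, 4.9),
    "MAT3D": (1.6, 3.0, 4.6),
    "MADDPG": (1.7, 3.1, 4.7),
}

# Fraction of each control tick spent on inference, per percentile.
utilization = {
    name: tuple(lat / CONTROL_PERIOD_MS for lat in percentiles)
    for name, percentiles in latency_ms.items()
}

# Assumption: each PSO particle is evaluated once per iteration.
PSO_PARTICLES, PSO_ITERS = 16, 3
pso_evals_per_tick = PSO_PARTICLES * PSO_ITERS

for name, (u50, u95, u99) in utilization.items():
    print(f"{name:12s} p50={u50:.1%} p95={u95:.1%} p99={u99:.1%}")
print(f"PSO evaluations per tick (assumed): {pso_evals_per_tick}")
```

Even at the 99th percentile, the learned controllers use under a tenth of the control period in this reading of the table.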
Table 13. Hardware summary: MPPT efficiency η [%], settling time t s [s], and steady-state oscillation [W] (mean ± SD) across all hardware replications.
| Algorithm | η [%] | t_s [s] | Steady-State Oscillation [W] |
|---|---|---|---|
| Fuzzy–MAT3D | 91.296 ± 4.057 | 8.097 ± 4.612 | 7.837 ± 8.607 |
| PSO | 83.233 ± 7.919 | 9.623 ± 3.941 | 25.921 ± 26.580 |
| MAT3D | 78.791 ± 12.493 | 2.585 ± 2.264 | 45.899 ± 53.620 |
| P&O | 75.447 ± 11.794 | 3.609 ± 2.510 | 40.015 ± 23.916 |
| INC | 74.086 ± 10.990 | 4.645 ± 3.943 | 36.724 ± 29.736 |
| MADDPG | 67.515 ± 12.881 | 4.958 ± 4.276 | 62.451 ± 38.205 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Ortiz-Muñoz, D.; Luviano-Cruz, D.; Pérez-Domínguez, L.A.; Rodríguez-Ramírez, A.G.; García-Luna, F. Fuzzy-Partitioned Multi-Agent TD3 for Photovoltaic Maximum Power Point Tracking Under Partial Shading. Appl. Sci. 2025, 15, 12776. https://doi.org/10.3390/app152312776

