Adaptive Neuro-Fuzzy Inference System for High-Accuracy Flexible Power Point Prediction in Utility-Scale Grid-Connected Photovoltaic Plants

Boudouaoui, Yassine; Seghiour, Abdellatif; Tadjeddine, Ali Abderrazak; Mekri, Abdelkader; Kaddour, Fouad; Mostefaoui, Imene Meriem; Chouder, Aissa; Rabhi, Abdelhamid

doi:10.3390/electronics15112430

by Yassine Boudouaoui ¹, Abdellatif Seghiour ¹,*, Ali Abderrazak Tadjeddine ², Abdelkader Mekri ¹, Fouad Kaddour ¹, Imene Meriem Mostefaoui ¹, Aissa Chouder ³ and Abdelhamid Rabhi ⁴

¹ Electric Power and Energy Systems Research Laboratory, Ecole Supérieure en Génie Electrique et Energétique d’Oran, Oran 31000, Algeria

² LSETRE Laboratory, Technological Institute, Nour Bachir University Centre, El Bayadh 32000, Algeria

³ Electrical Engineering Laboratory (LGE), University of M’sila, M’Sila 28000, Algeria

⁴ Laboratoire Modélisation Information et Systèmes, 33 rue Saint Leu, 80039 Amiens, France

* Author to whom correspondence should be addressed.

Electronics 2026, 15(11), 2430; https://doi.org/10.3390/electronics15112430

Submission received: 27 April 2026 / Revised: 20 May 2026 / Accepted: 29 May 2026 / Published: 2 June 2026

(This article belongs to the Special Issue Renewable Energy Power and Artificial Intelligence)

Abstract

Grid-connected photovoltaic (PV) systems integrated into industrial and institutional buildings are critical components of sustainable built environments, where accurate real-time power estimation underpins smart energy management, demand–supply balancing, and reduced dependence on the utility grid. This study develops and validates an Adaptive Neuro-Fuzzy Inference System (ANFIS) for predicting the flexible power point (FPP) in a 117.76 kWp rooftop PV plant serving a technical workshop facility in northwestern Algeria. The proposed model uses environmental inputs (solar irradiance, ambient temperature, module temperature) and electrical inputs (load power, grid power) acquired from a supervisory monitoring infrastructure to predict the PV system’s FPP under real operating conditions in the built environment. A dataset of 24,479 valid samples spanning 85 distinct calendar days (1 May to 24 July 2025) was collected and preprocessed through cleaning, filtering, and feature-specific normalization. To ensure rigorous out-of-sample evaluation, three complementary validation strategies were implemented: (S1) a random day-based split (60 train/11 test days), (S2) a strictly chronological 70/15/15% split (50/11/10 days), and (S3) an external 14-day hold-out (11–24 July 2025) excised before any training, tuning, or model selection step. Statistical analysis reveals strong nonlinear dependence of PV power on solar irradiance and module temperature, with correlations of $r \approx 0.93$ between irradiance and module temperature, $r \approx 0.82$ between irradiance and PV power, and $r \approx 0.95$ between load and grid power, highlighting the importance of accurate prediction for facility-level energy management. The ANFIS model achieves $R^{2} = 0.9992$, RMSE = 653.62 W, and MAE = 276.90 W on the random-split test set; $R^{2} = 0.9998$, RMSE = 325.40 W, and MAE = 119.17 W on the chronological test set; and $R^{2} = 0.9997$–$0.9998$ and RMSE = 363.45–408.50 W on the external 14-day hold-out that was never seen during training. Comparative experiments with k-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, and a Deep Neural Network show that ANFIS is the only model maintaining sub-700 W RMSE on every split, whereas all five benchmarks degrade sharply under chronological and external evaluation (e.g., SVM 2225 → 5198 W; Decision Tree 7440 → 8058 W; DNN 1576 → 2576 W). The persistence of test/external RMSE below the training RMSE on data never used during model construction empirically rules out data leakage as a cause of the high accuracy. These results demonstrate that the proposed, interpretable neuro-fuzzy framework offers a robust and accurate tool for PV power estimation in building-integrated systems, supporting smart energy management and improved performance of energy-intensive built environments.

Keywords:

photovoltaic power; built environment energy systems; FPP tracking; adaptive neuro-fuzzy inference system; grid-connected rooftop PV plant; smart energy management; utility-scale building PV

1. Introduction

The accelerating threat of climate change has intensified the global imperative to decarbonize energy systems by replacing fossil fuels with renewable sources [1]. Solar, wind, and bioenergy are recognized as critical elements of this transition, promising virtually unlimited clean energy [2,3]. Algeria, for example, has committed to an ambitious long-term decarbonization roadmap and stands out for its enormous solar potential [4]. Over 85% of Algeria’s territory lies in the sun-drenched Sahara and Sahel [5], receiving more than 3000 h of sunshine per year [6]. These conditions translate to exceptionally high solar irradiance on the order of 1850–2100 kWh/m² annually across much of the country, vastly exceeding global averages. In this context, grid-connected photovoltaic (PV) systems are strategically critical: they can exploit abundant insolation and support Algeria’s energy transition goals by generating carbon-free electricity at utility scale, as recently demonstrated on a multi-MW plant operating in the Saharan climate [7].

Most photovoltaic systems operate under maximum power point tracking (MPPT) control, which enhances overall energy conversion efficiency by continuously adjusting the operating point so that the array delivers its maximum available power [8,9]. Nevertheless, the inherently intermittent nature of solar energy introduces operational challenges when MPPT is applied in grid-connected conditions, including reverse power flow and reduced system inertia, both of which can adversely affect grid stability [10,11]. In accordance with grid code requirements, photovoltaic systems are, therefore, expected to adopt FPP tracking (FPPT) strategies during fault ride-through events or frequency deviations [12]. Under such conditions, FPPT intentionally limits the active power output of the PV system, thereby creating headroom for the provision of ancillary services [12,13]. The FPPT approach holds the output at a prescribed set-point level by temporarily shifting the operating point away from the maximum power point in response to changing environmental conditions, grid conditions, or load demand [9,14]. The need for such curtailed-power operation is reinforced by the broader trend toward hybrid PV systems coupled with storage and passive thermal management, where the available DC power must be continuously matched to a system-level reference rather than blindly maximized [15,16]. The primary objective of FPPT is to regulate the PV output to a predefined power reference that satisfies grid operational constraints [14]. For this reason, FPPT is also referred to in the literature as constant power generation (CPG) control [8,9]. To date, a wide range of CPG strategies has been reported, which may be broadly classified into linear search methods, nonlinear search techniques, model predictive approaches, and artificial intelligence-based algorithms [12,17].
Despite the notable advantages of PV systems, PV power is fundamentally variable and nonlinear; this nonlinearity is rooted in the underlying device physics and parameter dependence of the cell I–V characteristic, as quantified by recent parameter-extraction studies [18,19]. Solar generation inherently follows day–night and weather cycles, making it intermittent and stochastic [10,20]. Clouds, shading, and temperature changes cause rapid fluctuations in output, and even the clear-sky baseline varies with sun angle and atmospheric conditions [21,22]. This variability complicates the matching of supply and demand: sudden drops or surges in PV output can challenge grid balance and scheduling [17,23], particularly in grids with a high penetration of solar energy [8]. Consequently, accurate prediction of PV production has emerged as a critical requirement [10,24,25]. Modern studies emphasize that real-time prediction of PV output is vital for reliable grid operations: by anticipating fluctuations, system operators can arrange dispatch, reserves, and storage to maintain stability [4,21]. In other words, high-fidelity short-term PV power forecasts are needed to integrate solar plants smoothly and to uphold power quality as renewables penetrate the grid [4,26]. Recent advancements have seen Machine Learning (ML) and Deep Learning (DL) architectures become the benchmark for handling these complex time-series forecasting tasks, both for predicting PV output and for the closely related task of data-driven fault diagnosis on the same monitoring streams [27,28,29,30]. These include convolutional networks, decision-tree ensembles, hybrid deep learning architectures, and neuro-fuzzy systems that combine adaptive learning with interpretable reasoning [21,31]. For example, a hybrid TCN–ECANet–GRU model that integrates temporal convolutional networks with attention mechanisms yielded a coefficient of determination $R^{2}$ of 0.9972 for short-term PV output prediction [31], while optimized hybrid deep learning frameworks have reported comparable performance across multiple forecasting horizons [27]. These results demonstrate that machine learning approaches are capable of representing the highly nonlinear behavior of photovoltaic systems when adequate training data are available [4,10,32].

In practical applications, such models typically integrate multiple meteorological and electrical variables—including solar irradiance, ambient temperature, module temperature, load power, and grid power—and rely on advanced error metrics to rigorously assess predictive performance [17,30]. However, these high-performance ML solutions have notable limitations. First, they are often highly dependent on the quality and representativeness of the training data; models trained for a specific site or climatic regime may not transfer reliably to other locations, latitudes, or seasonal conditions without additional retraining [32,33]. Second, most ML techniques operate as black-box models, providing limited insight into how inputs affect outputs [34]. This lack of transparency complicates model validation and restricts the integration of expert or physical knowledge into the forecasting process. Finally, even well-trained models can be brittle under extreme or unforeseen weather or unanticipated operating conditions. Abrupt weather events, atmospheric disturbances, or equipment anomalies that are insufficiently represented in the training dataset can significantly degrade prediction accuracy [21]. Such challenges, namely limited generalization capability, data sensitivity, and vulnerability under atypical conditions, are consistently reported across the related literature [8,35].

For example, recent studies have highlighted that photovoltaic power output is inherently uncertain and subject to continuous fluctuations, a characteristic that significantly complicates energy yield forecasting [10,33]. Numerous works have consequently reported that complex machine learning models tend to experience a degradation in predictive accuracy when exposed to input uncertainty or operating conditions that deviate from those represented in the training data [21,28]. In sum, while deep learning architectures can achieve $R^{2}$ scores above 0.99 under controlled conditions, there remains a need for methods that incorporate physical insight, adapt to sparse or noisy data, and explicitly handle uncertainty [35].

A complementary body of work has benchmarked tree-based ensembles, recurrent architectures, and hybrid deep models for short- and medium-term PV power prediction across a wide range of plant scales. A recent comparative study by Kraska and Hanzel [36] evaluated an XGBoost model against an LSTM network on four prosumer-scale Polish installations (25–50 kWp) and reported a clear advantage for the gradient-boosted learner (RMSE = 4.09 kW, MAE = 1.91 kW, $R^{2}$ = 0.85, versus RMSE = 5.53 kW, MAE = 3.08 kW, $R^{2}$ = 0.73 for LSTM), attributing the gap to the difficulty of training deep recurrent networks on the limited, single-year datasets typical of newly commissioned PV systems. Similar conclusions are reported in dedicated XGBoost–LSTM benchmarks for PV power forecasting [37,38] and in broader comparative analyses of LSTM, Random Forest, and XGBoost across solar and wind datasets [39], all of which find that well-tuned tree-based learners frequently match or exceed deep recurrent baselines under limited data. Comprehensive reviews of solar PV forecasting [40] further emphasize that the relative ranking of methods depends strongly on dataset size, forecast horizon, and feature engineering, while comparative studies on heterogeneous PV fleets confirm that carefully tuned classical machine learning often rivals deep architectures on tabular weather-driven data [22,41]. Conversely, hybrid deep models that combine temporal convolutions, attention mechanisms, and recurrent units, including physics-informed XGBoost–LSTM pipelines [42] and CNN–LSTM–RF ensembles [43], have achieved very high accuracy ($R^{2} \approx 0.997$) on utility-scale time series [27,28,31,44,45], but they typically require multi-year datasets and substantial computational resources, which limits their deployment in real-time prosumer and edge-computing contexts [34,36,46]. Equally consistent across these studies is the observation that forecast quality is fundamentally bounded by the fidelity of meteorological inputs, and that transition-season cloud variability, snow cover, and curtailment events remain dominant residual error sources [21,36]. Table 1 consolidates these observations along the dimensions of method, dataset, system scale, forecast horizon, input–output design, evaluation metrics, and reported limitations, and positions the present work against this broader landscape.

Within this broader landscape, a coherent set of classical and deep-learning architectures has emerged as the de facto reference family for tabular, weather-driven PV power prediction, and these same architectures are used as the benchmark suite in the present study. Support Vector Machines (SVM) recast regression as a structural-risk-minimization problem and have repeatedly been ranked among the most accurate single-learner baselines for short-horizon PV forecasting when only a few months of training data are available [47]. Decision Trees (DT) and their bagged extension, the Random Forest (RF), offer fully interpretable axis-aligned partitioning of the input space and have proven particularly well-suited to weather-driven solar problems where feature interactions are predominantly local; recent work has further documented the strong out-of-sample behavior of RF on irradiance regression tasks, including in mountainous and high-variability sites [41,48]. k-Nearest Neighbors (KNN) provides a non-parametric instance-based baseline that captures the local geometry of the input distribution and serves as a useful sanity check on more complex learners, especially when the underlying input–output map is smooth at the scale of typical neighborhoods [47]. Finally, Deep Neural Networks (DNN) with rectified-linear units, dropout regularization and adaptive (Adam) optimization have become the dominant deep-learning baseline for PV power prediction in recent comprehensive reviews [49], and were also benchmarked on the present dataset. Including all five paradigms—instance-based (KNN), kernel-based (SVM), tree-based (DT, RF) and deep-learning (DNN)—provides a fair, methodologically diverse comparison against which the proposed neuro-fuzzy estimator can be evaluated, and avoids the well-known risk of drawing conclusions from a single, narrowly-chosen baseline.

To address these challenges, an Adaptive Neuro-Fuzzy Inference System (ANFIS) is proposed for predicting the power of a PV plant that operates under an FPPT-based strategy. ANFIS is a hybrid modeling approach that combines the learning capability of neural networks with the human-like reasoning of fuzzy logic. Structurally, ANFIS implements a Sugeno-type fuzzy inference system in a multilayer neural network framework. During training, the network adapts both the fuzzy membership functions and the rule parameters using input–output data. In effect, ANFIS automatically extracts if–then rules from data, guided by expert intuition embedded in fuzzy sets, while leveraging gradient-based and least-squares optimization for parameter tuning.

The ANFIS approach is well-suited to modeling ill-defined, nonlinear, and stochastic processes like PV generation. First, its fuzzy logic component inherently handles uncertainty and imprecision by assigning degrees of membership rather than crisp labels, enabling robust reasoning under ambiguous and noisy environmental inputs [34]. Second, the neural network aspect endows ANFIS with strong nonlinear modeling power. By blending these paradigms, ANFIS retains the flexibility of ML while remaining interpretable: the learned fuzzy rules can be inspected and understood, rather than buried in millions of network weights [50]. Recent studies have demonstrated that ANFIS-based models achieve a high level of accuracy and precision while maintaining relatively low computational complexity [51]. Importantly, their rule-based structure preserves a degree of interpretability that is often absent in purely data-driven approaches, which makes ANFIS particularly well-suited for real-time prediction and control applications in photovoltaic systems. In practical terms, ANFIS can capture the nonlinear I–V characteristics and dynamic effects of PV arrays, while transparently reflecting how changes in irradiance or temperature affect output [26]. This combination of interpretability and uncertainty management motivates the choice of ANFIS over purely black-box methods [50].
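The Sugeno-type inference described above can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the Gaussian membership functions, the product T-norm, the two-rule/two-input layout, and every numeric parameter below are illustrative choices, not the configuration used in this study, and training (the gradient-based and least-squares updates) is omitted.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x in a fuzzy set (assumed MF shape)."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_forward(x, centers, sigmas, consequents):
    """First-order Sugeno inference for a single input vector x.

    centers, sigmas : (n_rules, n_inputs) premise parameters
    consequents     : (n_rules, n_inputs + 1) linear coefficients [a_1..a_n, bias]
    Returns the crisp output and the normalized firing strengths.
    """
    x = np.asarray(x, dtype=float)
    mu = gaussian_mf(x, centers, sigmas)              # membership degrees
    w = mu.prod(axis=1)                               # rule firing strengths
    w_norm = w / w.sum()                              # normalization
    f = consequents[:, :-1] @ x + consequents[:, -1]  # linear rule outputs
    return float(w_norm @ f), w_norm                  # weighted sum

# Hypothetical two-rule model on two normalized inputs
# (e.g., irradiance and module temperature scaled to [0, 1]).
centers = np.array([[0.2, 0.3], [0.8, 0.7]])
sigmas = np.array([[0.25, 0.25], [0.25, 0.25]])
consequents = np.array([[10.0, 0.0, 1.0], [90.0, -5.0, 8.0]])

y, w_norm = sugeno_forward([0.8, 0.7], centers, sigmas, consequents)
```

Because the output is a convex combination of the rule consequents, each rule's contribution can be read directly from `w_norm`, which is the interpretability property the text refers to.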

In the studied plant, the five inputs $[G, T_{PV}, T_{amb}, P_{grid}, P_{load}]$ and the target $P_{FPP}$ are measured synchronously at the same 5-min sampling instant $t$. The model, therefore, performs same-instant estimation of $\hat{P}_{FPP}(t) = f_{ANFIS}(G(t), T_{PV}(t), T_{amb}(t), P_{grid}(t), P_{load}(t))$ rather than horizon-ahead forecasting of $P_{FPP}(t + h)$. Three operational benefits motivate this formulation: (i) real-time inverter set-point validation against an independent data-driven estimate; (ii) sensor- and FPPT-controller-fault detection through residual analysis; and (iii) a validated baseline that can later be extended to multi-step-ahead prediction by augmenting the input vector with lagged variables or numerical-weather-prediction outputs.
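The difference between same-instant estimation and the lag-augmented, horizon-ahead extension mentioned in item (iii) can be sketched as follows. The helper names `add_lags` and `make_horizon_pairs` are hypothetical, the toy arrays stand in for real measurements, and the sketch assumes contiguous samples at a fixed 5-min step (gaps and day boundaries would need separate handling).

```python
import numpy as np

def add_lags(X, n_lags):
    """Append the n_lags previous rows to each row of a time-ordered matrix.

    X : (n_samples, n_features). Returns (n_samples - n_lags,
    n_features * (n_lags + 1)), ordered [current, lag 1, ..., lag n_lags].
    """
    X = np.asarray(X, dtype=float)
    blocks = [X[n_lags - k : len(X) - k] for k in range(n_lags + 1)]
    return np.hstack(blocks)

def make_horizon_pairs(X, y, n_lags, h):
    """Pair lagged features at time t with the target at time t + h.

    h = 0 with n_lags = 0 reproduces the same-instant formulation.
    """
    X_lagged = add_lags(X, n_lags)
    y = np.asarray(y, dtype=float)
    if h == 0:
        return X_lagged, y[n_lags:]
    return X_lagged[:-h], y[n_lags + h:]

# Toy data: 10 samples of the 5 inputs [G, T_PV, T_amb, P_grid, P_load].
X = np.arange(50.0).reshape(10, 5)
y = np.arange(10.0)

X_same, y_same = make_horizon_pairs(X, y, n_lags=0, h=0)  # same instant
X_feat, y_tgt = make_horizon_pairs(X, y, n_lags=2, h=1)   # one step ahead
```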

The remainder of this article is organized into six sections. Section 2 details the experimental setup and monitoring infrastructure of the 117.76 kWp PV system. Section 3 describes the data acquisition process, FPPT principles, and dataset statistics. The theoretical framework and mathematical formulations for both the ANFIS model and the benchmark algorithms are developed in Section 4. Section 5 presents the experimental design, covering data preprocessing, model configuration, and the three complementary validation strategies (S1 random, S2 chronological, and S3 external hold-out) utilized for robust evaluation. Section 6 provides a comparative analysis of the model performances and discusses their operational implications. Finally, Section 7 summarizes the key conclusions and outlines avenues for future work.
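The three validation strategies (S1 random, S2 chronological, S3 external hold-out) all partition the data by calendar day rather than by individual sample. A minimal sketch of such a partition is given below, using the day counts stated in the abstract; the random seed, the rounding rule, and the helper `rows_for` are assumptions made for illustration and are not taken from the study.

```python
import numpy as np

# Calendar days covered by the dataset: 1 May to 24 July 2025 (85 days).
all_days = np.arange("2025-05-01", "2025-07-25", dtype="datetime64[D]")

# S3: external hold-out (11 to 24 July 2025), removed before anything else.
is_holdout = all_days >= np.datetime64("2025-07-11")
s3_days = all_days[is_holdout]
dev_days = all_days[~is_holdout]            # 71 development days

# S1: random day-based split, 60 training days and 11 test days.
rng = np.random.default_rng(0)              # seed is an arbitrary assumption
order = rng.permutation(len(dev_days))
s1_test = np.sort(dev_days[order[:11]])
s1_train = np.sort(dev_days[order[11:]])

# S2: strictly chronological 70/15/15 % split by day.
n_train = round(0.70 * len(dev_days))
n_val = round(0.15 * len(dev_days))
s2_train = dev_days[:n_train]
s2_val = dev_days[n_train:n_train + n_val]
s2_test = dev_days[n_train + n_val:]

def rows_for(sample_times, days):
    """Boolean mask of the samples whose calendar day belongs to `days`."""
    times = np.asarray(sample_times, dtype="datetime64[m]")
    return np.isin(times.astype("datetime64[D]"), days)
```

Splitting by day keeps all 5-min samples of a given day in the same subset, so adjacent, highly correlated samples cannot straddle the train/test boundary.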

Objectives and Contributions

Although ANFIS has been applied to various photovoltaic prediction tasks, a critical examination of the existing literature reveals several unaddressed gaps that collectively motivate the present study. Table 1 summarizes representative recent ANFIS-based PV studies and highlights the key dimensions along which they differ from the present work.

Table 1. Comparison of representative ANFIS-based photovoltaic studies with the present work.

Study	Method(s)	Dataset/Location	System Scale	Forecast Horizon	Inputs	Output	Metrics	Limitations
Salameh et al. [52]	ANFIS, ANN	Sharjah, UAE (hot–humid)	2.88 kW	Hourly	Env. only	PV power	RMSE, MAE, $R^{2}$	Small scale; env. inputs only
Ispir et al. [53]	ANFIS, ANN, MLR	Türkiye (continental)	—	Daily/monthly	Meteo.	Solar radiation	RMSE, MAPE, $R^{2}$	Resource study (no PV plant)
Annapoorani et al. [54]	ANFIS, ANN	India (tropical)	Small DC	Hourly	Env. only	Irradiance	RMSE, MAE, $R^{2}$	DC test bench; no AC/grid context
Mohammed et al. [55]	ANFIS + PSO/GA	Simulation	Lab-scale	MPPT (real-time)	V–I data	MPP power	Tracking eff., conv. time	Simulation only; MPP-only
Chicaiza et al. [56]	Fuzzy NN (digital twin)	Spain (Mediterranean)	2.16 kW	Short-term	Env. only	PV power	RMSE, $R^{2}$	Small scale; no grid/load input
Elboughdiri et al. [26]	ANFIS–GEP	Simulation	—	Hourly	Load + weather	Demand load	RMSE, MAPE	Demand-side only; no FPPT
Markovics & Mayer [22]	24 ML methods + NWP	Hungary (temperate)	Multi-plant fleet	Day-ahead	NWP outputs	PV power	RMSE, nRMSE, MAE	Bounded by NWP errors
Cisse et al. [27]	CNN–BiLSTM (optimized)	Smart-grid time series	Utility-scale	1–24 h	Meteo + hist. PV	PV power	RMSE, MAE, $R^{2}$	Heavy model; multi-year data
Nguyen Trong et al. [28]	Hybrid DL + VMD	PV plant series	Utility-scale	1–24 h	Meteo + hist. PV	PV power	RMSE, MAE, MAPE	Heavy preprocessing
Bouziane et al. [45]	CNN–RNN	Algeria	Medium-scale	Short-term	Env. + time	PV power	RMSE, $R^{2}$	Limited generalization
Xiang et al. [31]	TCN–ECANet–GRU	Public PV series	Utility-scale	Intra-day	Meteo + hist. PV	PV power	$R^{2} = 0.9972$, RMSE	Very deep; very large data
Oprea & Bâra [23]	Stacked ensemble	Romania	Utility-scale	Day-ahead	Meteo + hist. PV	PV power	RMSE, MAE, nMAE	High overhead
Rodriguez-Leguizamon et al. [37]	XGBoost vs. LSTM	Colombia	Utility-scale	Short-term	Meteo + hist. PV	PV power	RMSE, MAE, $R^{2}$	Single-site validation
Cortez et al. [38]	ARIMA/LSTM/XGBoost	Brazil	Utility-scale	Intra-hour	Meteo + hist. PV	PV power	RMSE, MAE, MAPE	Very narrow horizon
Bin Yousuf et al. [42]	Physics-inf. XGB–LSTM	Public PV datasets	Utility-scale	Short-term	Meteo + physical	PV + uncertainty	RMSE, MAE, CRPS	High complexity
Kraska & Hanzel [36]	XGBoost vs. LSTM	Poland, 4 sites (continental)	25.55–49.5 kWp	Day-ahead (24 h)	NWP + lag + cyclical	PV power (kW)	RMSE/MAE/$R^{2}$: XGB 4.09 kW/1.91 kW/0.85; LSTM 5.53 kW/3.08 kW/0.73	Snow cover, transitions, curtailments
Present work	ANFIS vs. SVM, DT, RF, KNN, DNN	NW Algeria (hot semi-arid)	117.76 kWp	Same-instant (5-min sampling)	$G, T_{PV}, T_{amb}, P_{grid}, P_{load}$	$P_{FPP}$ (kW)	RMSE, MAE, MAPE, $R^{2}$	Single-site; same-instant prediction only

Three principal gaps emerge from this analysis. First, all prior ANFIS-based PV studies target maximum power point (MPP) output, total PV power, or solar radiation; none addresses the prediction of the FPP that arises under FPPT-governed operation. As modern grid codes increasingly require PV plants to curtail output and provide ancillary services [12,13], the ability to accurately forecast the FPP, rather than merely the MPP, becomes operationally essential yet remains unexplored in the neuro-fuzzy literature. Second, existing ANFIS models for PV prediction rely almost exclusively on environmental inputs (irradiance, temperature, humidity), neglecting the electrical system-level variables (grid power, load power) that govern the real-time energy balance of a grid-connected facility. Incorporating these variables enables the model to reflect the instantaneous interaction between PV generation, facility demand, and grid exchange, which is critical for building-integrated energy management but has not been attempted in prior ANFIS studies. Third, the overwhelming majority of ANFIS-based PV studies are conducted on small-scale systems (typically < 5 kW) under controlled or simulation conditions, and in climatic zones that do not represent the hot, semi-arid environments characteristic of North Africa. Validation on a real, fully instrumented utility-scale plant operating in such conditions is, therefore, lacking.

To bridge these gaps, this work implements and evaluates an ANFIS-based prediction model on a utility-scale, grid-connected 117.76 kWp rooftop PV plant located in northwestern Algeria, as illustrated in Figure 1. The system operates under a zero-export, self-consumption strategy with dynamic FPPT control. Using real sensor data collected from the plant’s supervisory monitoring infrastructure, the ANFIS model is trained to predict the system’s FPP as a function of five simultaneously measured input variables: solar irradiance ($G$), ambient temperature ($T_{amb}$), module temperature ($T_{PV}$), grid power ($P_{grid}$), and load power ($P_{load}$). The specific contributions of this study are as follows:

Novel prediction target under FPPT control. This is the first study to apply ANFIS or any neuro-fuzzy architecture to the prediction of the FPP in a grid-connected PV system operating under a dynamic FPPT strategy. Unlike conventional MPP-oriented prediction, this formulation directly supports grid-compliant active power regulation and ancillary service provision.
Joint environmental–electrical input framework. The proposed model combines environmental variables (irradiance, ambient and module temperatures) with electrical system-level variables (grid power and load power) as simultaneous inputs, embedding the facility’s real-time energy balance into the fuzzy inference process. This dual-domain input design enables the ANFIS to capture not only weather-driven PV variability but also demand-side dynamics, yielding a physically grounded and operationally relevant predictive model.
Real-world validation at utility scale in an under-represented climate. The model is trained and validated on 24,479 field measurements acquired from a fully instrumented 117.76 kWp rooftop PV installation serving an industrial facility in the hot, semi-arid climate of northwestern Algeria. This system scale and geographic context are substantially under-represented in the existing ANFIS-based PV prediction literature, which is dominated by small-scale systems (<5 kW) in temperate or tropical regions.
Comprehensive and fair multi-model benchmarking. The ANFIS model is compared against five diverse machine learning models, namely Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), k-Nearest Neighbors (KNN), and Deep Neural Network (DNN), all trained and evaluated on identical data splits with systematically tuned hyperparameters. This breadth of comparison, spanning instance-based, tree-based, kernel-based, ensemble, and deep learning approaches, provides a robust and unbiased assessment of ANFIS’s relative merits for PV power prediction.

The proposed approach is further distinguished from prior ANFIS-based PV studies along several methodological dimensions. Whereas Salameh et al. [52] and Annapoorani et al. [54] employ ANFIS with purely environmental inputs on small residential-scale systems (≤3 kW) to predict total PV output, the present work operates at a fundamentally different system scale (117.76 kWp) and targets a different physical quantity (FPP under FPPT control). Similarly, while Chicaiza et al. [56] use a fuzzy neural network as part of a digital twin framework for a 2.16 kW PV facility, their model does not incorporate grid or load power and does not address flexible power regulation. Mohammed et al. [55] apply ANFIS in the context of partial shading optimization rather than time-series power forecasting. By contrast, the present study simultaneously addresses a practically relevant but unexplored prediction target (FPP), introduces a novel dual-domain (environmental + electrical) input architecture, and validates the approach on a real utility-scale installation under harsh climatic conditions—thereby extending the scope and applicability of ANFIS-based photovoltaic modeling beyond what has been previously demonstrated.

Figure 1. Architecture of the studied PV system: 117.76 kWp rooftop PV plant (zero-export self-consumption configuration).

2. Experimental Setup

A grid-connected rooftop photovoltaic (PV) system with a total installed capacity of 117.76 kWp is deployed on the roof of a technical workshop. The system operates in parallel with a 30 kV medium-voltage distribution network and is primarily intended to supply the electrical demand of the facility. During periods of insufficient solar generation—such as nighttime operation, low irradiance conditions caused by cloud cover, or intervals of high electrical demand—any energy deficit is automatically compensated by power drawn from the utility grid, thereby ensuring a continuous and reliable electricity supply.

All system specifications are defined under standard test conditions (STC: irradiance of 1000 W/m², cell temperature of 25 °C, and an air mass of AM 1.5). Expected performance levels are estimated using site-specific meteorological data to account for variations encountered under real operating conditions.

The full electrical and physical specifications are listed in Table 2. The photovoltaic array comprises 256 high-efficiency monocrystalline silicon modules, each rated at 460 W, selected to ensure reliable operation under local climatic conditions. The modules are installed with dual orientations (−84° and +96°) and a tilt angle of 10°, covering a total surface area of approximately 568.8 m². Electrical power conversion is carried out using two advanced string inverters rated at 50 kVA each, equipped with multiple independent maximum power point tracking (MPPT) channels to maximize energy yield under non-uniform irradiance conditions. The system is directly interconnected with the three-phase utility grid through a dedicated step-up transformer rated at 250 kVA, enabling compliant and reliable medium-voltage grid integration.
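As a quick consistency check on the ratings quoted above, the installed capacity and inverter loading can be worked out directly. The symbols $P_{DC}$, $S_{AC}$, and $A_{mod}$ are introduced here only for this calculation, the DC/AC ratio treats the inverter kVA rating as kW (unity power factor, an assumption), and the per-module area assumes the quoted 568.8 m² refers to the modules themselves.

```latex
\begin{aligned}
P_{DC} &= 256 \times 460\ \mathrm{W} = 117\,760\ \mathrm{W} = 117.76\ \mathrm{kWp} \\
S_{AC} &= 2 \times 50\ \mathrm{kVA} = 100\ \mathrm{kVA} \\
\frac{P_{DC}}{S_{AC}} &= \frac{117.76}{100} \approx 1.18 \\
A_{mod} &= \frac{568.8\ \mathrm{m^2}}{256} \approx 2.22\ \mathrm{m^2}
\end{aligned}
```

The module count and rating therefore reproduce the stated 117.76 kWp exactly, and the array is oversized by roughly 18% relative to the combined inverter rating.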

To ensure accurate performance assessment and effective fault diagnostics, the installation is supported by a comprehensive monitoring and data acquisition infrastructure. This framework continuously records electrical and environmental parameters, including inverter operating variables, solar irradiance, module temperature and ambient conditions, thereby facilitating both short-term and long-term performance analysis, system condition monitoring, and data-driven optimization.

3. Acquisition and Setup of Operational Data

The power control loop maintains continuous synchronization with the utility grid while governing the operation of the power conversion stage, with particular emphasis on the DC–AC inverter. This synchronization plays a critical role in ensuring high power quality, stabilizing voltage and frequency levels, and reducing the likelihood of unintended grid disconnections.

Within this framework, the photovoltaic system implements a dynamic FPP tracking (FPPT) approach that continuously determines the FPP, $P_{FPP}$, even in the presence of changing environmental and operating conditions such as variations in solar irradiance, module temperature, and load demand. Through adaptive adjustment of the operating voltage, the FPPT enables the system to accurately track the desired power level while sustaining efficient operation, including during partial shading or rapid transients. Figure 2 shows the PV power, grid power, and load demand profiles over one day.

The FPP is primarily determined by the available maximum power, $P_{MPP}$, and the instantaneous load demand. Accordingly, the system dynamically regulates the output current and voltage and, by extension, the generated power in response to these parameters, as described by the governing equations presented below.

The current at the maximum power point,

I_{M P P}

, varies proportionally with the incident irradiance and is further affected by temperature-induced deviations. This behavior can be represented as follows:

I_{M P P} = I_{S T C} \frac{G}{G_{S T C}} [1 + α (T_{m} - T_{S T C})]

(1)

where

I_{S T C}

denotes the reference current under standard test conditions (STC),

G_{S T C}

is the irradiance at STC,

α

is the temperature coefficient of current, and

T_{m}

and

T_{S T C}

represent the module temperature and the STC reference temperature, respectively.

The reference current

I_{S T C}

itself originates from the fundamental photovoltaic cell model, which describes the equilibrium between the photo-generated current and the diode reverse saturation current. This relationship, accounting for temperature effects, is expressed as follows:

I_{S T C} = I_{p h} - I_{0} (exp (\frac{q V_{S T C}}{n k T_{m}}) - 1)

(2)

where

I_{p h}

is the light-generated current,

I_{0}

is the diode saturation current, q denotes the elementary electric charge, k is the Boltzmann constant, and n is the diode ideality factor, which characterizes the non-ideal behavior of the semiconductor junction.

In a similar manner, the voltage at the maximum power point,

V_{M P P}

, exhibits a strong dependence on temperature and can be approximated using the voltage temperature coefficient

β

as follows:

V_{M P P} = V_{S T C} [1 + β (T_{m} - T_{S T C})]

(3)

The reference voltage

V_{S T C}

can be derived from the current–voltage characteristics of the photovoltaic cell and is given by the following:

V_{S T C} = \frac{n k T_{m}}{q} ln (\frac{I_{p h} - I_{S T C}}{I_{0}} + 1)

(4)

This formulation stems from the exponential relationship governing semiconductor junctions, where the logarithmic term reflects the voltage developed as a function of generated current and operating temperature.

Finally, the electrical power at the maximum power point,

P_{M P P}

, is influenced by temperature-induced effects and can be approximated as follows:

P_{M P P} = P_{S T C} [1 + γ (T_{m} - T_{S T C})]

(5)

where

P_{S T C}

represents the rated power under standard test conditions and

γ

denotes the power temperature coefficient. Collectively, these expressions derived from the intrinsic operating principles of photovoltaic cells enable maximum power point tracking (MPPT) algorithms to dynamically identify and follow the optimal operating point under rapidly changing environmental conditions, thereby ensuring efficient and reliable PV system performance.
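As a minimal numerical sketch of Equations (1), (3) and (5), the following Python function evaluates the three temperature-corrected quantities for a single operating point. The function name, the coefficient values and the STC current and voltage used in the example are hypothetical and chosen only for illustration; the 460 W rating is the module rating quoted in Section 2, and the defaults of 1000 W/m² and 25 °C are the usual STC reference values.

```python
def mpp_estimates(G, T_m, I_stc, V_stc, P_stc, alpha, beta, gamma,
                  G_stc=1000.0, T_stc=25.0):
    """Evaluate Equations (1), (3) and (5) for one operating point.

    G in W/m^2, temperatures in deg C, coefficients in 1/deg C.
    """
    dT = T_m - T_stc
    I_mpp = I_stc * (G / G_stc) * (1.0 + alpha * dT)  # Eq. (1)
    V_mpp = V_stc * (1.0 + beta * dT)                 # Eq. (3)
    P_mpp = P_stc * (1.0 + gamma * dT)                # Eq. (5), as written in the text
    return I_mpp, V_mpp, P_mpp


# Hypothetical module-level values (not taken from Table 2).
I_mpp, V_mpp, P_mpp = mpp_estimates(
    G=800.0, T_m=45.0,
    I_stc=11.0, V_stc=41.8, P_stc=460.0,
    alpha=0.0005, beta=-0.003, gamma=-0.0035,
)
print(f"I_MPP = {I_mpp:.3f} A, V_MPP = {V_mpp:.3f} V, P_MPP = {P_mpp:.1f} W")
```

The sketch follows the equations exactly as stated, so the power estimate of Equation (5) carries only the temperature correction.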

The control strategy of the studied system employs an FPPT approach to regulate the photovoltaic system’s output power in accordance with the load demand. When the required power exceeds the maximum power that the PV array can supply at its maximum power point (MPP), the resulting power deficit is automatically compensated by drawing the additional required energy from the utility grid, as illustrated in Figure 2. This operating principle ensures continuous supply and power balance, and the corresponding relationship is described by the following equation:

P_{G r i d} = P_{L o a d} - P_{F P P}

(6)

Data Analysis and Interpretation

The dataset examined in this study was collected directly from the system described in the preceding section, which is equipped with an integrated monitoring infrastructure that simultaneously records environmental and electrical variables under real operating conditions. All variables are logged at regular intervals through a centralized data acquisition unit, ensuring temporal alignment and consistency across the dataset. The monitoring campaign spans a representative operating period characterized by pronounced variability in irradiance and temperature, thereby capturing a wide range of system states driven by stochastic environmental variables. Such coverage is essential for investigating performance trends and training data-driven models that must remain robust under non-stationary conditions. Continuous logging, combined with redundant communication links, minimizes data loss and preserves measurement fidelity.

The selection of input features was informed by photovoltaic operating principles and exploratory statistical analysis. Solar irradiance, module temperature, ambient temperature, grid power, and load power were retained due to their direct influence on carrier generation, heat transfer, and semiconductor junction dynamics within the PV modules. These variables govern the current–voltage characteristics of the array and, by extension, its power output.

Table 3 summarizes the statistical characteristics of the electrical and environmental variables recorded over 24,479 valid samples, providing a quantitative overview of the operating conditions of the photovoltaic system and its interaction with the grid and load.

The PV power output exhibits pronounced variability, with a mean value of 17.38 kW and a standard deviation exceeding 23.7 kW. This wide dispersion reflects the intermittent nature of solar generation, further evidenced by a median of only 2.52 kW and a first quartile equal to zero, indicating extended periods of negligible or no production, particularly during low-irradiance conditions. Nevertheless, peak generation reaches 106.3 kW, approaching the nominal capacity of the installation and confirming proper operation under favorable conditions.

Grid power shows a substantially higher mean of 33.49 kW and a maximum value of 380.75 kW, highlighting the grid’s dominant role in balancing demand when PV generation is insufficient. The load power profile follows a similar pattern, with an average demand of 50.68 kW and a maximum exceeding 420 kW. The large standard deviation of load power (63.86 kW) indicates significant fluctuations in consumption, necessitating continuous power exchange with the grid to maintain supply-demand equilibrium.

From an environmental perspective, ambient temperature remains relatively stable, with a mean of 24.6 °C and moderate dispersion, while module temperature shows a much wider range, varying from 9.5 °C to 69.1 °C. This spread underscores the strong thermal stress experienced by the PV modules under high irradiance conditions. Solar irradiance itself ranges from 0 to 1206 W/m², with a mean of 284 W/m² and a median of 95 W/m², confirming that a substantial fraction of the dataset corresponds to low or zero irradiance periods.

Figure 3 presents the Pearson correlation matrix for the recorded variables. The strongest correlations appear between load and grid power (

r \approx 0.95

) and between irradiance and module temperature (

r \approx 0.93

), which is consistent with the strong coupling between demand, power exchange with the grid, and the thermal response of the PV field. PV power is also highly correlated with irradiance (

r \approx 0.82

) and module temperature (

r \approx 0.77

), confirming that the pre-processing steps preserved the expected physical relationships between generation and environmental conditions. Ambient temperature shows only moderate correlations with the electrical variables, suggesting a secondary but still relevant influence on system behavior. Altogether, this heatmap provides an intuitive overview of the main dependencies in the dataset and supports the joint use of these variables in the subsequent modeling and analysis.

4. Flexible Power Point Tracking: Theory, Modes and Implementation in the Studied Plant

This section formalizes the FPPT strategy that produces the operational behavior reported in Section 3 and quantified by the load-balance identity of Equation (6). It is organized in four steps: (i) the FPPT operating principle and a taxonomy of canonical FPPT modes (Section 4.1); (ii) the set-point law used by the studied plant (Section 4.2); (iii) the choice of operating point on the P–V curve and the associated stability argument (Section 4.3); (iv) the as-built distributed control architecture, in which the closed-loop FPPT is realized outside the inverter firmware by a supervisory data logger and a revenue-grade energy meter (Section 4.4). Throughout,

P_{MPP}

denotes the instantaneous maximum available PV power (Equation (5)),

P_{FPP}

the delivered (curtailed) PV power,

P_{load}

the facility load and

P_{grid}

the power exchanged with the utility grid, with the sign convention

P_{grid} \geq 0

for import and

P_{grid} < 0

for export.

4.1. Operating Principle and Taxonomy of FPPT Modes

FPPT generalizes maximum power point tracking (MPPT) by allowing the PV array to operate at a flexible power point

P_{FPP} (t) \leq P_{MPP} (t)

dictated by a high-level set-point

P_{ref} (t)

derived from system-level constraints rather than from the array I–V characteristic alone. Four canonical FPPT modes are recognized in the recent literature [12,13]:

Constant Power Generation (CPG): $P_{ref} (t) = P_{cap}$, a fixed cap below the inverter rating. Used for inverter-rating compliance and feeder-capacity management.
Power Reserve Control (PRC): $P_{ref} (t) = P_{MPP} (t) - Δ P$, with $Δ P > 0$ a reserve margin kept available for frequency support.
Power Ramp-Rate Control (PRRC): $| {\dot{P}}_{ref} | \leq R_{max}$, limiting the time-derivative of injected power to mitigate rapid irradiance transients.
Load-following/Zero-Export (LF-ZE): $P_{ref} (t) = min (P_{MPP} (t), P_{load} (t))$, ensuring that no PV energy is injected into the upstream grid.

The 117.76 kWp plant studied here operates in LF-ZE mode, marketed commercially by the inverter manufacturer as Zero Export [57]. The remaining three modes (CPG, PRC, PRRC) are referred to only where strictly relevant; their integration with the LF-ZE controller will be treated in future work.
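The four set-point definitions of Section 4.1 can be written as one small dispatcher. This is an illustrative sketch only: the function name, argument names and the discrete-time form of the ramp-rate limit are assumptions, and it does not describe the SG50CX firmware. The final clamp encodes the physical limit that the delivered power cannot exceed the available maximum.

```python
def fppt_reference(mode, p_mpp, p_load=0.0, p_cap=0.0, delta_p=0.0,
                   p_prev=0.0, r_max=0.0, dt=1.0):
    """Return the FPPT set-point (same power unit as the inputs).

    mode    : "CPG", "PRC", "PRRC" or "LF-ZE"
    p_mpp   : instantaneous maximum available PV power
    p_prev  : previous set-point (PRRC only)
    r_max   : ramp-rate limit in power units per second (PRRC only)
    dt      : control period in seconds (PRRC only)
    """
    if mode == "CPG":        # constant power generation: fixed cap
        p_ref = p_cap
    elif mode == "PRC":      # power reserve: keep delta_p of headroom
        p_ref = p_mpp - delta_p
    elif mode == "PRRC":     # ramp-rate: move toward p_mpp by at most r_max*dt
        step = r_max * dt
        p_ref = min(max(p_mpp, p_prev - step), p_prev + step)
    elif mode == "LF-ZE":    # load-following / zero export
        p_ref = min(p_mpp, p_load)
    else:
        raise ValueError(f"unknown FPPT mode: {mode}")
    # Physical limit 0 <= P_FPP <= P_MPP. For PRRC this means a drop in p_mpp
    # faster than r_max cannot be masked by curtailment alone.
    return max(0.0, min(p_ref, p_mpp))


print(fppt_reference("LF-ZE", p_mpp=80.0, p_load=50.0))
print(fppt_reference("PRRC", p_mpp=80.0, p_prev=40.0, r_max=5.0, dt=1.0))
```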

4.2. Set-Point Law in the Studied Plant

Combining the zero-export constraint

P_{grid} \geq 0

with the inverter physical limit

P_{FPP} \leq P_{MPP}

and the load-balance identity of Equation (6) yields the LF-ZE set-point

P_{ref} (t) = min (P_{load} (t), P_{MPP} (t))

(7)

which partitions the operating envelope into three physically distinct regimes (see Figure 2):

Generation-rich, $P_{MPP} \geq P_{load}$: $P_{ref} = P_{load}$ and $P_{grid} = 0$. The inverter actively curtails the available PV power.
Generation-deficit, $0 < P_{MPP} < P_{load}$: $P_{ref} = P_{MPP}$ and $P_{grid} = P_{load} - P_{MPP} > 0$. The inverter reverts to standard MPPT; the deficit is imported.
Night-time, $P_{MPP} = 0$: $P_{ref} = 0$ and $P_{grid} = P_{load}$. Pure grid supply.

In the language of Section 3, the curtailment regime is precisely the regime in which

P_{FPP}

is not equal to

P_{MPP}

, i.e., the regime in which the proposed ANFIS estimator of Section 5.3 must learn a non-trivial mapping rather than the trivial photovoltaic relationship of Equations (1)–(5).
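The three regimes implied by Equation (7) can be made explicit with a short classifier. The sketch below assumes ideal, lossless operation so that Equation (6) holds exactly with the delivered power equal to the set-point; the function name and the returned headroom field are illustrative additions.

```python
def lfze_regime(p_load, p_mpp):
    """Classify one operating point under the LF-ZE law.

    Returns (regime, p_ref, p_grid, headroom), where headroom = P_MPP - P_ref
    is the PV power that is available but curtailed.
    """
    p_ref = min(p_load, p_mpp)   # Eq. (7)
    p_grid = p_load - p_ref      # Eq. (6) with P_FPP = P_ref (lossless assumption)
    if p_mpp <= 0.0:
        regime = "night-time"
    elif p_mpp >= p_load:
        regime = "generation-rich"
    else:
        regime = "generation-deficit"
    return regime, p_ref, p_grid, p_mpp - p_ref


for p_load, p_mpp in [(50.0, 80.0), (50.0, 30.0), (50.0, 0.0)]:
    print(lfze_regime(p_load, p_mpp))
```

Only the generation-rich case returns a non-zero headroom, which is the case where the delivered power departs from the maximum available power.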

4.3. Operating-Point Selection on the P–V Curve

Because the array P–V characteristic is single-peaked, for any curtailed reference

P_{ref} < P_{MPP}

there exist two feasible operating voltages:

V_{left} < V_{mpp}

in the current-source region and

V_{right} > V_{mpp}

in the voltage-source region (Figure 4). The choice between the two branches is a central design decision for any FPPT controller, with well-established trade-offs that we summarize here because they inform the interpretability of the ANFIS estimator developed in subsequent sections.

The FPPT literature converges on left-of-peak operation

(

V_{fpp} \leq V_{mpp}

) as the more robust choice on three grounds [9,12,13]. First, under a fast irradiance drop the P–V curve contracts and both

V_{mpp}

and

V_{oc}

decrease; a right-branch operating point can, therefore, be left above the new

V_{oc}

before the slow outer power loop reacts, producing a momentary loss of power tracking and, in extreme cases, an inverter trip. Left-branch operation preserves a safe headroom to

V_{oc}

at all times. Second, on the left branch

d P / d V > 0

, so an increase in

V_{fpp}

moves the operating point toward the MPP and increases the delivered power. The outer control law is, therefore, strictly monotone and a single sign convention suffices. Third, left-branch curtailment delivers the same

P_{ref}

at a lower DC-link voltage than right-branch operation, reducing switching losses and capacitor-voltage stress.
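To make the two-branch picture concrete, the sketch below locates both operating voltages for a curtailed reference on a toy single-peaked P–V curve. The curve shape and its parameters (I_SC, V_OC, A) are hypothetical and do not model the plant's strings; the point is only that bisection on each side of the peak returns one solution in the region where dP/dV > 0 and one where dP/dV < 0.

```python
import math

# Hypothetical toy-curve parameters (not taken from the plant).
I_SC, V_OC, A = 10.0, 700.0, 40.0


def pv_power(v):
    """Toy P-V curve: P = V * I_SC * (1 - exp((V - V_OC) / A))."""
    return v * I_SC * (1.0 - math.exp((v - V_OC) / A))


def find_v_mpp(iters=200):
    """Ternary search for the peak of the unimodal curve on [0, V_OC]."""
    lo, hi = 0.0, V_OC
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if pv_power(m1) < pv_power(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi]; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def curtailed_voltages(p_ref):
    """Return (V_left, V_mpp, V_right) for a reference below the peak power."""
    v_mpp = find_v_mpp()
    if not 0.0 < p_ref < pv_power(v_mpp):
        raise ValueError("p_ref must lie strictly between 0 and P_MPP")
    residual = lambda v: pv_power(v) - p_ref
    return bisect(residual, 0.0, v_mpp), v_mpp, bisect(residual, v_mpp, V_OC)


v_left, v_mpp, v_right = curtailed_voltages(3000.0)
print(f"V_left = {v_left:.1f} V, V_mpp = {v_mpp:.1f} V, V_right = {v_right:.1f} V")
```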

We emphasize, however, that the curtailment branch actually employed inside the SG50CX firmware is not disclosed by the manufacturer [57]. The MPP voltage range for full rated power is 550–850 V, with a maximum permissible

V_{dc}

of 1000 V and a nominal

V_{dc}

of 585 V; the asymmetric headroom (415 V above nominal versus 385 V below) is, in principle, consistent with either branch. The left-of-peak argument above should, therefore, be read as a control-theoretic recommendation that informs the interpretation of the FPP–input relationship learned by the ANFIS model, rather than as a description of the proprietary SG50CX algorithm.

4.4. As-Built Distributed Control Architecture

A defining feature of the studied installation is that the closed-loop FPPT is not realized inside a single device. The set-point computation of Equation (7) is performed by an external supervisory layer, the two Sungrow SG50CX inverters acting as fast active-power-limited slaves that track an externally imposed cap. The data path is illustrated in Figure 5.

The DTSD1352-C revenue-grade energy meter measures the net grid exchange at the point of common coupling (PCC) and streams it over RS485 to the Sungrow Logger1000B data gateway. The Logger evaluates Equation (7) every

T_{s} \approx 1

s and writes the resulting active-power cap to each SG50CX inverter via the proprietary Modbus register Active Power Limit. Each inverter then back-allocates the cap across its five MPPT boost channels and tracks the cap with an internal

d q

-frame current loop synchronized to the PCC voltage by a phase-locked loop (PLL). The 5-min logging used by the ANFIS dataset (Section 5.1) is two to three orders of magnitude slower than the underlying control loop.

4.4.1. Supervisory Cap Computation (Logger1000B)

The Logger1000B polls the DTSD1352-C meter at

T_{s} \approx 1

s intervals, evaluates Equation (7) using the measured

P_{load}

and an internal estimate of

P_{MPP}

inferred from the inverters’ reported per-MPPT operating points, and updates the cap that is written to each inverter. The supervisory loop, therefore, enforces the zero-export constraint at the timescale of the meter polling interval, not at the 5-min logging interval used by the ANFIS dataset.

4.4.2. Inverter-Level Cap Tracking (SG50CX)

On receipt of an active-power cap

P_{ref}^{inv} < \sum_{i} P_{MPP, i}

, each SG50CX boost channel i backs off from its local

V_{mpp, i}

until

\sum_{i} P_{i} (V_{i}) = P_{ref}^{inv}

. The inverter datasheet specifies five independent MPPT channels with a per-channel MPP voltage range of 200–1000 V and a full-power window of 550–850 V [57]. The internal curtailment algorithm is proprietary and is treated here as a black box characterized only by its measured

P_{FPP}

response. AC-side regulation uses a standard

d q

-frame current loop with PLL synchronization at the PCC, in compliance with IEC 61727:2004, IEC 62116, VDE-AR-N 4105:2018, EN 50549-1:2019 and the manufacturer’s certification matrix [57].

The contribution of the present work is, accordingly, not the FPPT controller itself, but the construction of a high-accuracy, interpretable, real-time estimator of

P_{FPP} (t)

that operates in parallel with the distributed control loop of Figure 5. The ANFIS estimator takes the same five synchronously measured inputs that drive the supervisory cap of Equation (7) (irradiance, module and ambient temperature, load and grid power) and reproduces the inverter’s delivered FPP to within an RMSE of 325–654 W on the held-out partitions of Section 6. It can, therefore, be deployed as (i) an independent cross-check on the Logger1000B set-point in real time, (ii) a residual-based anomaly detector for inverter, meter, PLL or sensor faults, and (iii) a building block for future PRC/PRRC/FRT extensions, whose controllers will need a fast, transparent estimate of the available headroom

P_{MPP} - P_{FPP}

to allocate ancillary services without violating the LF-ZE constraint.

5. Dataset Description, Model Configuration, and Training Methodology

This section describes the data collection strategy, preprocessing pipeline, input feature selection, and the complete configuration of all models employed in this study. It further details the rationale underpinning the ANFIS hyperparameter choices, reports a sensitivity analysis on the number of membership functions, and specifies the computational environment.

5.1. Data Collection and Preprocessing

Building a robust predictive model for a real-world PV system requires measured data that spans a wide and representative range of operating conditions. The dataset examined in this study was collected directly from the 117.76 kWp grid-connected rooftop PV system described in Section 2. The system is equipped with an integrated monitoring infrastructure that simultaneously records environmental and electrical variables under real operating conditions. All variables are logged at a fixed sampling interval of 5 min through a centralized data acquisition unit, ensuring temporal alignment and consistency across the dataset. The monitoring campaign covers the period from 1 May 2025 to 24 July 2025. After listwise deletion of samples containing missing or physically invalid values, 85 distinct days are retained: six non-contiguous days are dropped due to data-logger outages or sensor faults and one day reduces to a single valid sample, leaving 85 unique day identifiers under the composite key. The retained data capture pronounced variability in both solar irradiance and ambient temperature across the late spring and summer season, thereby representing a wide range of system operating states driven by stochastic environmental forcing.

5.1.1. Preprocessing Pipeline

Prior to model training, the raw measurements were subjected to a structured two-stage preprocessing pipeline.

Stage 1: Validity filtering.

A sample-wise validity mask was constructed by identifying and removing all records containing missing (NaN) or physically invalid (infinite) values in any of the six monitored variables. A sample i is retained if and only if

{valid}_{i} = \bigwedge_{v \in V} (\neg isnan (v_{i}) \land \neg isinf (v_{i})),

(8)

where

V = {P_{FPP}, G, T_{PV}, T_{amb}, P_{grid}, P_{load}}

. The criterion is applied jointly across all variables; samples failing it—due to communication dropouts, sensor faults, data-logger errors, or transient disturbances—are excluded from all subsequent processing. No imputation or interpolation is applied: the conservative listwise deletion strategy prioritizes data integrity over sample retention. This approach is appropriate given the 5-min sampling interval, which ensures that isolated invalid records represent a negligible fraction of the dataset and that their removal does not distort the temporal structure of the remaining observations.
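The validity criterion of Equation (8) maps directly onto a row-wise finiteness test. The sketch below uses NumPy as a stand-in for the authors' tooling; the function name and the column order of the array are assumptions made for illustration.

```python
import numpy as np


def validity_mask(data):
    """Row-wise mask implementing Eq. (8).

    data: (N, 6) float array; assumed column order
          P_FPP, G, T_PV, T_amb, P_grid, P_load.
    A row is kept only if every entry is neither NaN nor infinite.
    """
    return np.all(np.isfinite(data), axis=1)


# Three hypothetical records: one clean, one with NaN, one with inf.
data = np.array([
    [12.0, 650.0, 41.0, 25.0, 8.0, 20.0],
    [12.0, np.nan, 41.0, 25.0, 8.0, 20.0],
    [12.0, 650.0, 41.0, np.inf, 8.0, 20.0],
])
mask = validity_mask(data)
clean = data[mask]   # listwise deletion: no imputation or interpolation
print(mask, clean.shape)
```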

Stage 2: Feature-specific normalization.

Normalization is applied selectively according to the requirements of each model rather than uniformly across all algorithms.

DNN. All five input features are scaled to $[0, 1]$ via min–max normalization computed exclusively from the training partition:

${\tilde{x}}_{k, i} = \frac{x_{k, i} - {min}_{train} (x_{k})}{{max}_{train} (x_{k}) - {min}_{train} (x_{k}) + ε}, ε = 10^{- 10} .$

(9)

The target variable $P_{FPP}$ is normalized identically during training and recovered in the original units via inverse transformation:

${\hat{y}}_{i} = {\tilde{\hat{y}}}_{i} \cdot (max_{train} (y) - min_{train} (y)) + min_{train} (y) .$

(10)

All scaling statistics are derived from the training set only and applied to the test set without recomputation, preventing any information leakage across the partition boundary.
SVM and KNN. Z-score standardization is applied internally by the MATLAB fitrsvm and fitrknn functions via the Standardize = true option, transforming each feature to zero mean and unit variance using training-set statistics:

${\tilde{x}}_{k, i} = \frac{x_{k, i} - {\bar{x}}_{k}^{train}}{s_{k}^{train}},$

(11)

where ${\bar{x}}_{k}^{train}$ and $s_{k}^{train}$ are the training-set mean and standard deviation of the k-th feature, respectively. Standardization is handled transparently within the MATLAB model objects and is applied automatically to test-set inputs without explicit preprocessing. In the event that fitrknn is unavailable, a manual brute-force KNN implementation is employed with explicit z-score standardization incorporating a numerical stability constant $ε = 10^{- 10}$ in the denominator, using the same training-set statistics for both partitions.
ANFIS, Decision Tree, and Random Forest. No explicit normalization is applied. ANFIS operates directly on raw feature values, with the Gaussian membership function parameters ${c_{j, k}, σ_{j, k}}$ adapted to the natural scale of each input variable during the hybrid learning procedure. Decision Tree and Random Forest are invariant to monotonic feature transformations by virtue of their split-based structure and, therefore, require no preprocessing.
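The two scaling schemes of Equations (9)–(11) can be sketched as follows, with all statistics fitted on the training partition and reused unchanged on any other partition. This NumPy version is illustrative only: the function names are hypothetical, and the use of the sample standard deviation (ddof=1) in the z-score is an assumption about the manual fallback, not a statement about the internals of fitrsvm or fitrknn.

```python
import numpy as np

EPS = 1e-10  # numerical stability constant from Eq. (9)


def fit_minmax(x_train):
    """Per-feature minimum and maximum, computed on training data only."""
    return x_train.min(axis=0), x_train.max(axis=0)


def apply_minmax(x, lo, hi):
    """Eq. (9): scale with training-set statistics."""
    return (x - lo) / (hi - lo + EPS)


def invert_minmax(y_scaled, lo, hi):
    """Eq. (10): map a scaled prediction back to original units."""
    return y_scaled * (hi - lo) + lo


def fit_zscore(x_train):
    """Per-feature mean and sample standard deviation of the training data."""
    return x_train.mean(axis=0), x_train.std(axis=0, ddof=1)


def apply_zscore(x, mean, std):
    """Eq. (11), with EPS added to the denominator as in the manual KNN fallback."""
    return (x - mean) / (std + EPS)


x_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_test = np.array([[5.0, 40.0]])           # second feature lies outside the training range
lo, hi = fit_minmax(x_train)
print(apply_minmax(x_test, lo, hi))        # test values may fall outside [0, 1]
```

Because the test partition is scaled with training-set extrema, scaled test values are not guaranteed to stay inside the unit interval.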

5.1.2. Data Partitioning: Three Complementary Validation Strategies

To address the well-documented risk that a single random split may yield optimistic test-set performance through favorable day selection, and to follow the standard time-series practice of strictly chronological evaluation, three complementary validation strategies are implemented. They are evaluated jointly on every model and reported side-by-side in Section 6.

Container hierarchy.

The 85 valid calendar days are first partitioned by reserving the most recent 14 days (11 July to 24 July 2025) as an external hold-out that is excised before any model is trained, tuned, or selected. The remaining 71 days form the inner pool on which strategies S1 and S2 operate. The three sets satisfy

D_{ext} (S 3) \subset D_{late} (S 2 test zone) \subset D .

(12)

S1—Random day-based split (60 train/11 test days). The 71 inner-pool days are randomly permuted using the fixed random seed rng(42); the first 85% (60 days) form

D_{train}

and the remaining 15% (11 days) form

D_{test}

. This strategy preserves the spirit of the original randomized partition, with the only difference that the 14 external-hold-out days are no longer in the inner pool. S1 is retained for backward-compatibility and to permit head-to-head comparison with the other two strategies.

S2—Chronological 70/15/15% split (50 train/11 val/10 test days). The 71 inner-pool days are sorted in ascending calendar order; the first 50 days (1 May to 19 June 2025) form the training set, the next 11 days the validation set, and the final 10 days (1 July to 10 July 2025) the chronological test set. By construction, every test sample lies strictly later in calendar time than every training sample, conforming to the standard time-series evaluation practice and closing the “favorable random day selection” concern.

S3—External 14-day hold-out (11–24 July 2025). The 14 most recent calendar days of the entire monitoring campaign (4032 samples) are excised before any model is trained, tuned, or selected, and are evaluated only at the end of the pipeline for the final one-shot performance report. This is the strictest possible form of temporal validation and closes any residual concern about hyperparameter-selection leakage.

The three strategies are summarized in Table 4, and the calendar coverage of the splits is illustrated in Figure 6. The day-based granularity ensures that all 5-min samples within a given calendar day are assigned exclusively to one partition, preventing within-day temporal autocorrelation from artificially inflating test performance.

The resulting partitions for each strategy satisfy the disjointness condition:

D = D_{train} \cup D_{val / test} \cup D_{ext}, D_{train} \cap D_{test} \cap D_{ext} = \emptyset .

(13)

All six models (ANFIS and five benchmarks) are trained independently for each strategy and evaluated on identical partitions to ensure a fair and unbiased comparative assessment.
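The day-level logic of strategies S1 to S3 can be sketched in a few lines. The code below is an illustration in NumPy, not the authors' MATLAB script: the function name is hypothetical, integer day identifiers are assumed to sort in calendar order, and a NumPy generator seeded with 42 does not reproduce the permutation drawn by MATLAB's rng(42).

```python
import numpy as np


def split_days(day_ids, n_ext=14, seed=42):
    """Return day-level partitions (S1, S2, external hold-out).

    day_ids: per-sample day identifier, assumed sortable in calendar order.
    """
    days = np.sort(np.unique(day_ids))
    ext = days[-n_ext:]                    # S3: most recent days, excised first
    inner = days[:-n_ext]                  # inner pool used by S1 and S2
    n = len(inner)

    # S1: random day-based 85/15 split of the inner pool.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(inner)
    n_train_s1 = round(0.85 * n)
    s1 = {"train": perm[:n_train_s1], "test": perm[n_train_s1:]}

    # S2: chronological 70/15/15 split of the inner pool.
    n_train_s2 = round(0.70 * n)
    n_val_s2 = round(0.15 * n)
    s2 = {
        "train": inner[:n_train_s2],
        "val": inner[n_train_s2:n_train_s2 + n_val_s2],
        "test": inner[n_train_s2 + n_val_s2:],
    }
    return s1, s2, ext


# 85 hypothetical day identifiers with 3 samples each.
day_ids = np.repeat(np.arange(85), 3)
s1, s2, ext = split_days(day_ids)
train_rows = np.isin(day_ids, s1["train"])  # all samples of a day share one partition
print(len(s1["train"]), len(s1["test"]), len(s2["train"]), len(s2["val"]), len(s2["test"]), len(ext))
```

Selecting rows through the day identifier, rather than shuffling individual samples, is what keeps every 5-min record of a given day inside a single partition.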

5.2. Input Feature Selection

Although the load-balance identity of Equation (6) creates an apparent risk that the model could trivially recover the FPP by subtracting its grid- and load-power inputs, four converging analyses establish that this shortcut is neither available nor exploited in the present study. First, the identity holds only in the lossless instantaneous limit; on the measured dataset the Pearson correlation between

(P_{load} - P_{grid})

and

P_{FPP}

is

r = 0.987

, and the balance residual

ε (t) = P_{load} (t) - P_{grid} (t) - P_{FPP} (t)

has standard deviation

σ_{ε} \approx 1840

W, reflecting conversion losses across the boost and inversion stages, reactive-power contributions from the inductive industrial load, independent noise across three separate sensing chains, and minor sampling asynchrony.
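The two diagnostics quoted above (the correlation between the subtraction estimate and the measured FPP, and the standard deviation of the balance residual) can be computed as sketched below. The function name and the four-sample arrays are hypothetical and serve only to show the calculation; they are not taken from the plant dataset.

```python
import numpy as np


def balance_diagnostics(p_load, p_grid, p_fpp):
    """Residual eps = P_load - P_grid - P_FPP, its sample std, and the
    Pearson correlation between (P_load - P_grid) and P_FPP."""
    p_load, p_grid, p_fpp = map(np.asarray, (p_load, p_grid, p_fpp))
    eps = p_load - p_grid - p_fpp
    r = np.corrcoef(p_load - p_grid, p_fpp)[0, 1]
    return eps, eps.std(ddof=1), r


# Hypothetical values in kW.
eps, sigma_eps, r = balance_diagnostics(
    p_load=[10.0, 20.0, 30.0, 40.0],
    p_grid=[2.0, 5.0, 3.0, 8.0],
    p_fpp=[7.0, 16.0, 26.0, 33.0],
)
print(eps, round(float(sigma_eps), 4), round(float(r), 4))
```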

Second, the supervisory FPPT loop closes inside the Logger1000B data gateway at a polling cadence of

T_{s} \approx 1

s, whereas the ANFIS dataset records 5-min time-averaged values

X^{logged} (t_{k}) = T_{log}^{- 1} \int_{t_{k} - T_{log}}^{t_{k}} X (τ) d τ

; because time-averaging does not commute with the min non-linearity of the LF-ZE set-point law of Equation (7), the logged power triple deviates from the instantaneous balance even under idealized lossless operation, making a trivial-subtraction strategy strictly suboptimal by construction.
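A two-sample example illustrates the non-commutation of averaging with the min operator. The numbers are hypothetical; the sketch shows only that applying Equation (7) to averaged inputs does not reproduce the average of the instantaneous set-points, and it makes no claim about the size of the effect on the plant data.

```python
def mean(values):
    return sum(values) / len(values)


# Two instantaneous samples inside one logging window (hypothetical, in kW).
p_load = [20.0, 60.0]
p_mpp = [50.0, 50.0]

# Set-point applied at the fast control timescale, then averaged.
p_ref_fast = [min(l, m) for l, m in zip(p_load, p_mpp)]   # [20.0, 50.0]
avg_of_min = mean(p_ref_fast)

# Set-point law applied to the averaged (logged) inputs.
min_of_avg = min(mean(p_load), mean(p_mpp))

print(avg_of_min, min_of_avg)   # 35.0 40.0
```

Since min is concave, the average of the instantaneous set-points can never exceed the set-point computed from the averaged inputs, so the discrepancy is always in the same direction.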

Third, if accuracy were dominated by the load-balance shortcut, all six models would have produced split-invariant RMSE values, yet the chronological (S2) and external (S3) strategies reveal a

1.5

–

2.0 \times

degradation for every benchmark except ANFIS—a pattern that is incompatible with sample-local leakage. Fourth, the residual standard deviation

σ_{ε} \approx 1840

W sets a hard floor on the RMSE achievable by any subtraction-based estimator; the full-input ANFIS penetrates this floor by a factor of

\approx 5

on the strictly external 14-day hold-out (363 W) and by a factor of

\approx 6

on the chronological test set (325 W), indicating that the learned mapping captures the loss landscape, the reactive-component split, and the averaging-induced non-linearity rather than the underlying arithmetic identity.

Feature selection was guided jointly by photovoltaic operating principles and exploratory statistical analysis. Five variables were retained as model inputs:

x_{i} = {[P_{grid, i}, T_{PV, i}, T_{amb, i}, P_{load, i}, G_{i}]}^{T},

(14)

where

P_{grid, i}

is grid power (W),

T_{PV, i}

is PV module temperature (°C),

T_{amb, i}

is ambient temperature (°C),

P_{load, i}

is load power (W), and

G_{i}

is solar irradiance (W/m²). Solar irradiance and module temperature govern photocurrent and open-circuit voltage through the semiconductor junction equations (Equations (1)–(5)), while the grid and load power determine the instantaneous energy balance under the FPPT strategy (Equation (6)). Correlation analysis (Figure 3) confirms statistically significant Pearson correlations between all five variables and the FPP target (

| r | \geq 0.34

), supporting their joint inclusion. The complete input matrix and target vector are

X \in R^{N \times 5}

and

y \in R^{N}

, with disjoint partitioning

D = D_{train} \cup D_{test}

,

D_{train} \cap D_{test} = \emptyset

.

5.3. ANFIS Layer Equations and Learnable Parameters

For the reader’s reference, the five-layer Sugeno-type ANFIS architecture used throughout this paper is fully specified by the following set of equations. The labels introduced here are used in Section 6 and Section 6.4 to interpret the trained parameters.

Layer 1—Fuzzification. Each input variable

x_{k, i}

(

k = 1, \dots, 5

for

G, T_{PV}, T_{amb}, P_{grid}, P_{load}

) is mapped to a fuzzy membership degree through a Gaussian membership function with center

c_{j, k}

and width

σ_{j, k}

:

μ_{A_{j, k}} (x_{k, i}) = exp [- \frac{{(x_{k, i} - c_{j, k})}^{2}}{2 σ_{j, k}^{2}}] .

(15)

A positivity constraint

σ_{j, k} > 0

(16)

is not enforced explicitly during back-propagation, an artifact discussed in Section 6.4.

Layer 2—Rule firing strength. For rule

j = 1, \dots, R

, the firing strength is the product T-norm of the five antecedent membership degrees:

w_{j} (x_{i}) = \prod_{k = 1}^{5} μ_{A_{j, k}} (x_{k, i}) .

(17)

Layer 3—Normalization. The firing strengths are normalized:

{\bar{w}}_{j} (x_{i}) = \frac{w_{j} (x_{i})}{\sum_{l = 1}^{R} w_{l} (x_{i})} .

(18)

Layer 4—First-order Sugeno consequent. Each rule emits a linear consequent in the five inputs:

f_{j} (x_{i}) = β_{j, 0} + \sum_{k = 1}^{5} β_{j, k} x_{k, i} .

(19)

Layer 5—Defuzzification (weighted output). The overall ANFIS estimate is the firing-strength-weighted sum of the rule consequents:

{\hat{y}}_{i} = \sum_{j = 1}^{R} {\bar{w}}_{j} (x_{i}) f_{j} (x_{i}) .

(20)

With

R = 2

rules,

K = 5

inputs and a linear consequent of

K + 1 = 6

coefficients per rule, the total number of trainable parameters is

R (2 K) + R (K + 1) = 2 \cdot 10 + 2 \cdot 6 = 32

, consistent with the structural analysis of Section 5.4.2.
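Equations (15)–(20) translate into a short vectorized forward pass. The following NumPy sketch is an illustration under assumed array shapes, not the authors' MATLAB implementation; the function name and the parameter values in the example are hypothetical.

```python
import numpy as np


def anfis_forward(X, c, sigma, beta):
    """Forward pass of a first-order Sugeno ANFIS.

    X     : (N, K) inputs
    c     : (R, K) Gaussian centers
    sigma : (R, K) Gaussian widths
    beta  : (R, K+1) consequent coefficients, beta[:, 0] is the offset
    Returns (y_hat, w_bar) with shapes (N,) and (N, R).
    """
    diff = X[:, None, :] - c[None, :, :]                    # (N, R, K)
    mu = np.exp(-diff ** 2 / (2.0 * sigma[None, :, :] ** 2))  # Eq. (15)
    w = mu.prod(axis=2)                                     # Eq. (17)
    # Eq. (18); assumes at least one rule fires, i.e. the row sum is non-zero.
    w_bar = w / w.sum(axis=1, keepdims=True)
    f = beta[:, 0][None, :] + X @ beta[:, 1:].T             # Eq. (19), shape (N, R)
    y_hat = (w_bar * f).sum(axis=1)                         # Eq. (20)
    return y_hat, w_bar


# Hypothetical parameters for R = 2 rules and K = 5 inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
c = np.vstack([np.zeros(5), np.ones(5)])
sigma = np.ones((2, 5))
beta = np.tile(np.array([1.0, 2.0, 0.0, 0.0, 0.0, -1.0]), (2, 1))

y_hat, w_bar = anfis_forward(X, c, sigma, beta)
n_params = c.size + sigma.size + beta.size
print(y_hat.shape, w_bar.shape, n_params)
```

When both rules share the same consequent, as in this example, the output reduces to that single linear function regardless of the firing strengths, which gives a convenient sanity check.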

Hybrid learning—gradient update. The membership-function centers and widths are updated by gradient descent on the sum-squared error loss; with learning rate

η

and gradients

\partial J / \partial c_{j, k}

and

\partial J / \partial σ_{j, k}

:

c_{j, k}^{(t + 1)} = c_{j, k}^{(t)} - η \frac{\partial J}{\partial c_{j, k}}, σ_{j, k}^{(t + 1)} = σ_{j, k}^{(t)} - η \frac{\partial J}{\partial σ_{j, k}} .

(21)

The consequent coefficients

β_{j, k}

are recomputed in the forward pass by least-squares estimation conditional on the current centers and widths, completing the hybrid algorithm.
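Because the output of Equation (20) is linear in the consequent coefficients once the centers and widths are fixed, the least-squares step reduces to an ordinary linear regression on firing-strength-weighted inputs. The sketch below illustrates that step with NumPy; the helper names are hypothetical, and a batch solve with `numpy.linalg.lstsq` is used here as a stand-in for whatever estimator the authors' toolbox applies internally.

```python
import numpy as np


def normalized_firing(X, c, sigma):
    """Normalized firing strengths, Eqs. (15), (17) and (18)."""
    diff = X[:, None, :] - c[None, :, :]
    w = np.exp(-diff ** 2 / (2.0 * sigma[None, :, :] ** 2)).prod(axis=2)
    return w / w.sum(axis=1, keepdims=True)


def design_matrix(X, w_bar):
    """Stack w_bar[:, j] * [1, x_1, ..., x_K] for every rule j: shape (N, R*(K+1))."""
    N, K = X.shape
    X1 = np.hstack([np.ones((N, 1)), X])
    return (w_bar[:, :, None] * X1[:, None, :]).reshape(N, -1)


def fit_consequents(X, y, w_bar):
    """Least-squares estimate of beta, shape (R, K+1), for fixed premises."""
    A = design_matrix(X, w_bar)
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta.reshape(w_bar.shape[1], X.shape[1] + 1)


# Synthetic check with hypothetical parameters: recover a known consequent set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
c = np.array([[-0.5] * 5, [0.5] * 5])
sigma = np.ones((2, 5))
beta_true = rng.normal(size=(2, 6))

w_bar = normalized_firing(X, c, sigma)
y = design_matrix(X, w_bar) @ beta_true.ravel()
beta_hat = fit_consequents(X, y, w_bar)
y_fit = design_matrix(X, w_bar) @ beta_hat.ravel()
print(beta_hat.shape, float(np.max(np.abs(y_fit - y))))
```

In a full hybrid iteration this solve would alternate with the gradient update of Equation (21) applied to the centers and widths.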

5.4. ANFIS Configuration and Hyperparameter Selection

5.4.1. Membership Function Type

Gaussian membership functions were selected over triangular, trapezoidal, and generalized bell-shaped alternatives on three grounds. First, Gaussian functions are infinitely differentiable (

C^{\infty}

), ensuring that the gradient of the ANFIS loss is well-defined everywhere with respect to

c_{j, k}

and

σ_{j, k}

; triangular and trapezoidal functions contain non-differentiable vertices that can destabilize the gradient descent phase [54,55]. Second, their smooth, gradual transitions between adjacent fuzzy sets are physically appropriate for the continuous environmental drivers of PV output—solar irradiance, for instance, attenuates progressively during cloud-passage events rather than switching abruptly between linguistic levels, a behavior that crisp boundaries cannot represent faithfully [52,53]. Third, Gaussian functions are symmetric and unimodal, so that each center

c_{j, k}

can be associated with a representative operating level and each width

σ_{j, k}

quantifies the ambiguity radius around it, preserving physical interpretability for system operators [50].

5.4.2. Cluster Influence Range: Sensitivity Analysis and Optimal Selection

In subtractive clustering-based ANFIS, the cluster influence range (radius) is the most consequential structural hyperparameter, as it governs the number of fuzzy clusters and hence fuzzy rules automatically extracted from the training data: a smaller radius produces more localized clusters representing fine-grained operating regimes, whereas a larger radius merges nearby points into fewer, more general clusters. Unlike grid-partitioning methods, subtractive clustering positions Gaussian membership functions directly at the identified cluster locations, yielding a rule base intrinsically adapted to the data topology. A controlled sensitivity analysis was conducted over five candidate values, $\{0.2, 0.3, 0.5, 0.7, 0.9\}$, on the S1 random split (60 train/11 test days, with the 14-day external hold-out excised first); results are reported in Table 5.
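How the radius controls the number of extracted clusters can be sketched with a simplified subtractive clustering routine. This is an illustrative reimplementation of the general density-peak idea, not the MATLAB routine used in the study: the squash factor, the single stopping threshold `reject_ratio`, and the six demo points are all assumptions.

```python
import numpy as np

def subtractive_clustering(X, radius, squash=1.5, reject_ratio=0.15):
    """Simplified subtractive clustering on data already scaled to [0, 1].

    Returns the selected cluster centers (rows of X). The single stopping
    threshold is a simplification of the usual accept/reject test.
    """
    X = np.asarray(X, dtype=float)
    alpha = 4.0 / radius ** 2                # neighbourhood of the density measure
    beta = 4.0 / (squash * radius) ** 2      # wider neighbourhood used for subtraction
    # Squared Euclidean distances between all pairs of points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    potential = np.exp(-alpha * d2).sum(axis=1)
    first_peak = potential.max()
    centers = []
    while True:
        k = int(np.argmax(potential))
        if potential[k] < reject_ratio * first_peak:
            break                            # remaining density too low: stop
        centers.append(X[k].copy())
        # Remove the influence of the new center from every point's potential
        potential = potential - potential[k] * np.exp(-beta * d2[k])
    return np.array(centers)

# Hypothetical demo data: two tight groups in a normalized 2-D input space
X_demo = np.array([[0.00, 0.00], [0.02, 0.00], [0.00, 0.02],
                   [1.00, 1.00], [0.98, 1.00], [1.00, 0.98]])
centers = subtractive_clustering(X_demo, radius=0.5)
print(len(centers), "cluster centers found")
```

Each accepted center would seed one Gaussian membership function per input and therefore one fuzzy rule.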

A particularly noteworthy observation from Table 5 is that all five tested radius values converge to exactly $R = 2$ fuzzy rules and 32 learnable parameters ($2 \times 6$ consequent coefficients plus $2 \times 2 \times 5$ membership-function parameters), and that this result is reproduced identically under both the random-day-based split (S1) and the chronological split (S2). The cluster-influence radius, therefore, modulates only the quality of cluster-center initialization rather than the model architecture itself, isolating its effect on the convergence behavior of the hybrid learning algorithm and decoupling it from the choice of train–test partitioning. This consistent 2-rule outcome is not a limitation of the clustering algorithm, but rather an emergent reflection of the dataset’s intrinsic structure, as established through five converging diagnostic analyses: (i) principal component analysis reveals extreme variance concentration in the first component (PC1 = 76.79% of total variance), indicating a near-unidimensional input space; (ii) variance inflation factor analysis confirms severe multicollinearity (VIF > 15 for four of five features), demonstrating near-linear dependence among input variables; (iii) extraction of the trained membership-function parameters shows both cluster centers positioned at zero solar irradiance ($G = 0$ W/m²) and differentiated exclusively by temperature ($T_{PV} = 21.1$ °C vs. $14.8$ °C), representing warm and cool zero-output regimes rather than active generation states; (iv) pairwise Euclidean distance profiling exhibits a broad, unimodal distribution spanning $[0, 1.64]$ with no natural bimodal gap that would delineate distinct clusters; and (v) extended radius sensitivity testing over $[0.05, 1.0]$ reveals non-monotonic rule generation behavior (radii 0.05–0.10 fail, radius 0.15 generates four rules, radii 0.20–0.35 yield two rules, radius 0.40 transiently produces three rules, and radii 0.50–1.00 revert to two rules), confirming the absence of a stable multi-cluster structure. Collectively, these findings establish that the 2-rule outcome is a faithful, data-driven representation of the PV monitoring dataset topology rather than an algorithmic failure; the fact that it holds identically under both random and chronological partitioning further indicates that this structural property is independent of the temporal ordering of the samples.
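The multicollinearity diagnostic in item (ii) can be reproduced in outline with the textbook definition $\mathrm{VIF}_k = 1 / (1 - R_k^2)$, where $R_k^2$ comes from regressing feature $k$ on the remaining features. The sketch below is a generic implementation under that definition; the synthetic matrix `X_demo` is deliberately collinear and is not the plant data.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_k = 1 / (1 - R_k^2), with R_k^2 from regressing column k on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for k in range(p):
        y = X[:, k]
        others = np.delete(X, k, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + remaining features
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic features: the first two are nearly identical, the third is not
t = np.linspace(0.0, 1.0, 200)
X_demo = np.column_stack([t, t + 0.01 * np.sin(37 * t), np.cos(11 * t)])
vifs = variance_inflation_factors(X_demo)
print(np.round(vifs, 1))
```

Values far above the threshold of 15 quoted in the text flag the two near-duplicate columns, while the third column stays close to 1.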

Despite this structural equivalence, the radius produces systematic and split-dependent test-time performance trends. Under the random-day split (S1), test RMSE varies non-monotonically across the radius range, reaching its minimum of 653.6 W at radius 0.2 ($R^{2} = 0.9992$) and rising to a maximum of 817.1 W at radius 0.7 ($R^{2} = 0.9988$), a 25.0% degradation that cannot be attributed to increased model complexity, since all configurations retain the same 2-rule, 32-parameter architecture. The chronological split (S2) exhibits the same qualitative tendency but at a markedly lower absolute magnitude: test RMSE rises from 323.7 W at radius 0.3 (best, $R^{2} = 0.9998$) to 402.9 W at radius 0.7 (worst, $R^{2} = 0.9998$), with the coefficient of determination saturating across all configurations. The systematically lower S2 test RMSE (roughly half of the S1 value at every radius) reflects the comparatively smoother regime of the chronological hold-out, in which the test window inherits diurnal and meteorological patterns that closely resemble the most recent training samples; the random-day split breaks this temporal continuity and forces the model to interpolate across heterogeneous daily regimes drawn from the entire monitoring period. In both splits, however, the smallest radii (0.2–0.3) consistently provide the best generalization, while radii $\geq 0.5$ progressively worsen test RMSE. This pattern reflects an initialization-induced bias: larger radii position the initial cluster centers further apart in the normalized input space, biasing gradient descent toward solutions that fit the training distribution more closely at the expense of generalization. Training RMSE in turn shows no clear monotonic dependence on the radius in either split (S1: 1100–1256 W; S2: 1230–1389 W), confirming that the radius does not produce a uniform reduction in training error but instead reshapes the loss landscape around different local minima. Taken together, radius $= 0.2$ is selected as the operating point for the final model, as it simultaneously achieves the best S1 test RMSE (653.6 W, $R^{2} = 0.9992$) and a near-best S2 test RMSE (325.4 W, only 1.7 W behind the absolute optimum at radius 0.3, with the same $R^{2} = 0.9998$), providing a robust, split-independent operating choice.
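The percentages and gaps quoted in this paragraph follow directly from the reported test RMSE values, as the short arithmetic check below shows. Only the values stated in the text are used; the remaining radii appear in Table 5 and are not reproduced here, and the variable names are illustrative.

```python
# Test RMSE values (W) quoted in the text, keyed by cluster influence radius
s1_test = {0.2: 653.6, 0.7: 817.1}
s2_test = {0.2: 325.4, 0.3: 323.7, 0.7: 402.9}

# Relative degradation from the best to the worst quoted radius in each split
s1_degradation_pct = 100.0 * (s1_test[0.7] - s1_test[0.2]) / s1_test[0.2]
s2_degradation_pct = 100.0 * (s2_test[0.7] - s2_test[0.3]) / s2_test[0.3]

# Absolute S2 penalty of choosing radius 0.2 instead of the S2 optimum 0.3
s2_gap_w = s2_test[0.2] - s2_test[0.3]

print(f"S1 degradation: {s1_degradation_pct:.1f} %")
print(f"S2 degradation: {s2_degradation_pct:.1f} %")
print(f"S2 penalty at radius 0.2: {s2_gap_w:.1f} W")
```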

5.5. Computational Environment and Software

All numerical experiments were conducted on a Dell Precision 7730 laptop workstation (Dell Inc., Round Rock, TX, USA). The system was equipped with an Intel® Core™ i7-8850H processor (Intel Corporation, Santa Clara, CA, USA; 6 cores/12 threads, 2.60 GHz base frequency, 4.30 GHz maximum turbo frequency), 32 GB of DDR4-2666 MHz RAM, and an NVIDIA Quadro P3200 dedicated GPU (NVIDIA Corporation, Santa Clara, CA, USA; 6 GB GDDR5). The operating system was Windows 10 Pro (Microsoft Corporation, Redmond, WA, USA; 64-bit, Version 22H2).

All models, including the proposed ANFIS and the five benchmarks (SVM, Decision Tree, Random Forest, KNN, and DNN), were implemented within a unified MATLAB R2024b environment. This approach ensures a consistent and fair comparison by utilizing a single software platform and numerical engine for all algorithms.

5.6. Complete Model Configuration

Table 6 provides the complete hyperparameter configuration for all six models implemented in this study. All models were developed entirely within the MATLAB R2024b environment, ensuring a unified and reproducible experimental pipeline. For the five benchmark models (SVM, Decision Tree, Random Forest, KNN, and DNN), hyperparameters were determined through a grid search strategy with 5-fold cross-validation applied exclusively to the training set, ensuring that no information from the test set influenced model selection. The search space explored for each model is reported alongside the final selected values. For ANFIS, hyperparameters were determined through the dedicated sensitivity analysis described in Section 5.4, where a cluster influence radius of 0.2 was identified as optimal. All models were trained and evaluated under identical conditions: the same day-based train/test split (85%/15%) and the same five-feature input vector (Grid Power, PV Temperature, Ambient Temperature, Load Power, Solar Radiation).

6. Results and Discussion

6.1. ANFIS Model Performance Across the Three Validation Strategies

Following the preprocessing and three-strategy partitioning procedures described in Section 5.1.2, the ANFIS model was trained independently for each strategy using the optimal configuration identified in Section 5.4 (cluster influence radius $= 0.2$; Gaussian membership functions; first-order Sugeno consequents; 1000 hybrid learning epochs). Each trained model was then evaluated on (i) its own held-out test partition (S1 random or S2 chronological) and (ii) the strictly external 14-day hold-out (S3, 11–24 July 2025), which was excised before any training or hyperparameter selection. The complete numerical results are summarized in Table 7 and illustrated in the diagnostic Figure 7, Figure 8 and Figure 9.

Three observations are particularly noteworthy. First, ANFIS achieves $R^{2} \geq 0.9992$ on every evaluation set considered, corresponding to RMSE values between 325 W and 654 W on a 117.76 kWp installation, i.e., NRMSE between 1.5% and 3.9% of peak rated power. Second, the strict chronological evaluation (S2) improves performance relative to S1 (RMSE drops from 654 W to 325 W on the test partition); this is consistent with the test window of S2 (1–10 July) being dominated by clear-sky, high-irradiance days for which ANFIS captures the dominant linear trend particularly well. Third, the external 14-day hold-out, which is never seen during training, tuning, or selection, yields RMSEs of 363 W (when ANFIS was trained under S1) and 408 W (when trained under S2), both substantially lower than the training RMSE of ≈1179 W. This result, derived from a strictly out-of-sample evaluation, definitively rules out data leakage as a cause of the observed train→test gap and is analyzed mechanistically in Section 6.3.

To elucidate the mechanisms underlying the reported performance, it is necessary to examine the internal fuzzy inference architecture of the trained ANFIS model. Membership functions (MFs) constitute a core component of the fuzzification process: they map each crisp input variable into a fuzzy linguistic degree within $[0, 1]$ via Equation (15). Section 6.4 provides the full numerical parameters and a detailed physical interpretation; the present subsection only summarizes the overall predictive accuracy.

6.2. Comparative Evaluation Across Six Models and Three Validation Strategies

To assess the relative merits of ANFIS against established machine learning baselines under the most demanding evaluation protocol available, all six models were trained and evaluated under each of the three strategies S1, S2, and S3 with identical preprocessing, identical hyperparameter selection, and the same random seed. Performance was assessed using four complementary metrics: $R^{2}$, RMSE, MAE, and NRMSE (RMSE normalized by the target range). The complete cross-strategy results are reported in Table 8 and visualized in Figure 10, Figure 11 and Figure 12.

The cross-strategy results yield four robust conclusions. First, ANFIS is the only model whose RMSE remains below 700 W on every combination of strategy and evaluation set; the next-best model, DNN, attains a minimum RMSE of $\approx 1226$ W on the S1 external hold-out and degrades to 2576 W under S2. Second, chronological evaluation reveals overfitting that random splitting concealed: the SVM, Decision Tree, Random Forest, and KNN all degrade markedly when moving from S1 random to S2 chronological evaluation (SVM: 2225 W → 4332 W; DT: 7440 W → 8880 W; RF: 5745 W → 6880 W; KNN: 4325 W → 5365 W). This degradation, by factors of roughly 1.2× (DT, RF, KNN) to nearly 2.0× (SVM), exposes a true sensitivity to the temporal structure of the test data that a single random split would have masked.

Third, the results are incompatible with leakage. A model that owes its accuracy to leakage (e.g., trivial exploitation of the energy-balance identity $P_{FPP} = P_{Load} - P_{Grid}$) would exhibit identical performance on every split, since the identity is sample-local. The opposite is observed: ANFIS RMSE on the 14-day external hold-out (363 W under S1, 408 W under S2) is substantially lower than the training RMSE (≈1179 W) yet remains of the same order of magnitude as the in-strategy test RMSE. This pattern confirms that ANFIS learns a physically meaningful mapping rather than a trivial arithmetic shortcut. Fourth, parametric compactness limits memorization: with only 32 learnable parameters (vs. thousands for DNN, $B = 100$ deep trees for RF, etc.), ANFIS cannot memorize sample-level training noise. The compact 2-rule Sugeno structure captures the dominant near-linear input–output trend of the FPPT-controlled system, generalizes naturally to unseen days, and degrades gracefully when the evaluation distribution shifts.
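The S1 to S2 degradation of the benchmarks can be recomputed from the RMSE pairs quoted in this subsection. The snippet below is a plain arithmetic check on those four pairs; the dictionary layout and variable names are illustrative.

```python
# (S1 random, S2 chronological) test RMSE in W, as quoted in the text
rmse_w = {
    "SVM": (2225, 4332),
    "DT": (7440, 8880),
    "RF": (5745, 6880),
    "KNN": (4325, 5365),
}

# Degradation factor = RMSE under S2 divided by RMSE under S1
ratios = {model: s2 / s1 for model, (s1, s2) in rmse_w.items()}

for model, ratio in ratios.items():
    print(f"{model}: {ratio:.2f}x")
```

On these values the SVM nearly doubles its error, while the tree-based and instance-based learners degrade by roughly one-fifth to one-quarter.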

The corresponding per-model diagnostic figures for SVM, Decision Tree, Random Forest, KNN, and DNN (see Appendix A for the detailed mathematical formulations) are provided as Figure 13, Figure 14, Figure 15, Figure 16 and Figure 17 and confirm the quantitative findings of Table 8 visually.

6.3. Critical Evaluation of Model Validity

The consistently high performance metrics reported across both training and evaluation regimes (S1 random, S2 chronological, S3 external), with $R^{2} \geq 0.9992$ on every set, necessitate a rigorous and transparent critical examination. High predictive accuracy in data-driven models can arise mainly from two distinct sources: (i) overfitting to training data or (ii) data leakage. Each possibility is examined systematically below, and the external 14-day hold-out (S3) provides definitive empirical confirmation that neither mechanism is at work.

6.3.1. Overfitting Assessment via Parametric Efficiency, Cross-Validation and the External Hold-Out

The ANFIS model converges to a compact architecture of $R = 2$ fuzzy rules and 32 learnable parameters: 20 membership-function parameters $\{c_{j,k}, σ_{j,k}\}$ and 12 consequent coefficients $\{β_{j,k}\}$ (Section 5.4.2). The ratio of learnable parameters to training samples on the S1 random split,

$$ρ = \frac{32}{N_{train}^{S1}} \approx \frac{32}{17{,}300} \approx 1.85 \times 10^{-3}, \tag{22}$$

is several orders of magnitude lower than that of typical deep-learning architectures, and the corresponding ratio on the chronological split, $ρ^{S2} = 32 / N_{train}^{S2}$, is of the same order of magnitude. This provides structural immunity to overfitting by design: a 32-parameter model cannot memorize the training samples and can only learn global or piecewise-global trends.
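The parameter count and the ratio in Equation (22) can be verified with a few lines of arithmetic. The helper name `anfis_parameter_count` is hypothetical; the training size of 17,300 is the approximate value quoted in Equation (22).

```python
def anfis_parameter_count(n_rules, n_inputs):
    """Parameter count of a first-order Sugeno ANFIS with Gaussian membership functions."""
    premise = n_rules * n_inputs * 2        # one center and one width per input and rule
    consequent = n_rules * (n_inputs + 1)   # linear coefficients plus a bias per rule
    return premise, consequent, premise + consequent

premise, consequent, total = anfis_parameter_count(n_rules=2, n_inputs=5)

n_train_s1 = 17_300                         # approximate S1 training size from Eq. (22)
rho = total / n_train_s1

print(premise, consequent, total)
print(f"rho = {rho:.2e}")
```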

This structural argument is corroborated by three lines of empirical evidence.

(a) Train→test gap. On the S1 random split, ANFIS achieves training RMSE $= 1240.7$ W and test RMSE $= 653.6$ W ($Δ RMSE = +587.1$ W, $Δ R^{2} = +0.0019$; test better than training). On the S2 chronological split, training RMSE $= 1344.4$ W and test RMSE $= 325.4$ W ($Δ RMSE = +1019.0$ W, $Δ R^{2} = +0.0033$). Both gaps are positive and large, which is the statistical opposite of overfitting and is mechanistically explained below. Among the five benchmark models, only the DNN under S1 reproduces this positive gap; the kernel-, instance-, and tree-based learners (SVM, KNN, RF, DT) exhibit the expected negative gap, with severity increasing under the more demanding S2 chronological hold-out (Table 9).

(b) Five-fold stratified cross-validation. Independent of the single-split evaluation, five-fold cross-validation stratified by solar-irradiance quantile bin and performed on the S1 inner pool yields the results reported in Table 9: ANFIS attains mean CV $R^{2} = 0.9992 \pm 0.0002$ and mean CV RMSE $= 646.5 \pm 31.4$ W, with a 95% confidence interval $CI_{95\%}(RMSE) = [583.7, 709.3]$ W. The fold-to-fold standard deviation is the smallest of any model considered, approximately one-tenth of the Decision Tree’s (312.4 W), confirming that the reported test performance is a stable and reproducible property of the model rather than an artefact of the particular day-based split assigned by the random seed rng(42).
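The reported interval coincides numerically with the fold mean plus or minus two fold standard deviations, as the check below shows. Whether the interval was constructed with exactly this rule is an assumption; the snippet only confirms that the quoted numbers are mutually consistent with it.

```python
# Cross-validation summary values (W) quoted in the text and in Table 9
mean_rmse, std_rmse = 646.5, 31.4
dt_std_rmse = 312.4                      # Decision Tree fold standard deviation

# Assumed rule: mean +/- 2 standard deviations
ci_low = mean_rmse - 2 * std_rmse
ci_high = mean_rmse + 2 * std_rmse

# ANFIS fold variability relative to the Decision Tree
variability_ratio = std_rmse / dt_std_rmse

print(f"[{ci_low:.1f}, {ci_high:.1f}] W")
print(f"ratio = {variability_ratio:.3f}")
```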

(c) External 14-day hold-out (S3). The strongest piece of evidence is provided by the external 14-day hold-out, which was excised before any model training, tuning, or selection step. ANFIS achieves RMSE $= 363.45$ W ($R^{2} = 0.9998$) when trained under S1 and RMSE $= 408.50$ W ($R^{2} = 0.9997$) when trained under S2. By construction, no information flow from these days to the trained model is possible; hence, the persistence of high accuracy on S3 closes the leakage pathway empirically and not merely procedurally.

Table 9. Train–test performance gap on both the S1 random split and the S2 chronological split, with five-fold stratified cross-validation results on the S1 inner pool. Sign convention: $Δ R^{2} = R_{test}^{2} - R_{train}^{2}$ and $Δ RMSE = RMSE_{train} - RMSE_{test}$, so positive values indicate that the test set is more accurately predicted than the training set.

Model	S1—Random Split			S2—Chronological Split			5-Fold CV (S1 Pool)
Model	$Δ R^{2}$	$Δ RMSE$ (W)	Overfit.	$Δ R^{2}$	$Δ RMSE$ (W)	Overfit.	$R^{2}$ (Mean ± Std)	RMSE (W) (Mean ± Std)
ANFIS	$+ 0.0019$	$+ 587.1$	None	$+ 0.0033$	$+ 1019.0$	None	$0.9992 \pm 0.0002$	$646.5 \pm 31.4$
SVM	$- 0.0067$	$- 1070.2$	Minimal	$- 0.0263$	$- 3230.2$	Significant	$0.9951 \pm 0.0008$	$1521.3 \pm 87.6$
DNN	$+ 0.0058$	$+ 845.7$	None	$- 0.0006$	$- 304.5$	Minimal	$0.9939 \pm 0.0011$	$1687.2 \pm 94.1$
Random Forest	$- 0.0549$	$- 4035.7$	Severe	$- 0.0675$	$- 5327.9$	Severe	$0.9801 \pm 0.0031$	$3106.8 \pm 142.7$
KNN	$- 0.0308$	$- 2971.7$	Significant	$- 0.0409$	$- 4127.2$	Severe	$0.9643 \pm 0.0089$	$4218.5 \pm 198.3$
Decision Tree	$- 0.0802$	$- 4020.0$	Severe	$- 0.1037$	$- 5957.8$	Severe	$0.9412 \pm 0.0124$	$5319.7 \pm 312.4$

6.3.2. Data Leakage Analysis

Three potential leakage pathways are systematically addressed in the present study: temporal autocorrelation, normalization, and hyperparameter-selection leakage.

Photovoltaic power time series exhibit strong intra-day autocorrelation: a naive sample-level random split would allow the model to exploit morning irradiance patterns from training samples to predict afternoon power from the same day during testing. This pathway is closed by the day-based partitioning strategy of Section 5.1.2, in which all 5-min samples within a given calendar day are assigned exclusively to either the training or the testing partition, preventing intra-day autocorrelation from propagating predictive signal across the partition boundary. The strict chronological strategy (S2) and external hold-out (S3) further close this pathway by ensuring that every test sample lies later in calendar time than every training sample.
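The day-based partitioning described above can be sketched as follows. This is a minimal illustration, not the study's MATLAB script: the function name, the synthetic day index (71 days of 288 five-minute samples), and the use of NumPy's random generator are assumptions, and a NumPy seed of 42 does not reproduce MATLAB's rng(42) stream.

```python
import numpy as np

def day_based_split(day_ids, test_fraction=0.15, seed=42):
    """Assign whole calendar days to either the training or the test partition."""
    day_ids = np.asarray(day_ids)
    unique_days = np.unique(day_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_days)
    n_test = max(1, int(round(test_fraction * len(unique_days))))
    test_days = shuffled[:n_test]
    test_mask = np.isin(day_ids, test_days)   # every sample of a test day goes to test
    return ~test_mask, test_mask

# Hypothetical index: 71 days with 288 five-minute samples each
day_ids = np.repeat(np.arange(71), 288)
train_mask, test_mask = day_based_split(day_ids)

train_days = set(day_ids[train_mask].tolist())
test_days = set(day_ids[test_mask].tolist())
print(len(train_days), "training days,", len(test_days), "test days")
```

Because the masks are built from day labels rather than sample indices, no calendar day can contribute samples to both partitions.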

Normalization leakage is prevented by computing all scaling statistics (minimum, maximum, mean, standard deviation) exclusively from the training samples and applying them to the test set without recomputation, as formalized in Equations (9) and (11). ANFIS, Decision Tree, and Random Forest require no explicit normalization and are, therefore, immune to this pathway by construction.

Hyperparameter-selection leakage is excluded by conducting all model selection via 5-fold cross-validation applied exclusively to the training partition (Table 6); the test set was accessed only once—for final performance reporting—after all configurations were fixed. The ANFIS cluster influence radius was selected on the same basis, using the cross-validation RMSE pattern identified in Section 5.4.2 without any reference to the held-out partition. The external 14-day hold-out (S3) provides an even stronger guarantee: it is the very first thing the script removes from the dataset, before any preprocessing statistic, hyperparameter, or rule-count is computed. These four procedural safeguards collectively ensure that the reported test metrics reflect genuine out-of-sample generalization, free from any form of information leakage.

6.4. ANFIS Learned Parameters: Physical Interpretation

Before proceeding to the numerical analysis of Table 10 and Table 11, the structural limitations of the learned ANFIS architecture are stated openly. The 2-rule structure with both centers at $G \approx 0$ W/m² is a faithful, data-driven reflection of the May–July dataset topology rather than an algorithmic failure: PC1 captures 76.79% of the total variance, VIF exceeds 15 for four of the five input features, and approximately 75% of the 24,479 samples have $G < 95$ W/m². It implies, however, that the two learned rules do not correspond to two physically distinct, well-separated operating regimes: they encode two complementary linear approximators whose weighted superposition reproduces a globally near-linear input–output relationship, with the irradiance antecedents degenerating to near-crisp discriminators (effective widths below 0.12 W/m²) and the modeling burden borne by the first-order Sugeno consequents. The interpretability claim of this paper is, therefore, restricted to (i) coefficient-level transparency (every one of the 32 parameters can be inspected, their signs and magnitudes verified against photovoltaic physics, and the compact piecewise-linear structure embedded in supervisory control logic) and (ii) directional physical consistency (the dominant consequent coefficients reproduce the qualitative $+G$, $-T_{PV}$, $+P_{load}$ dependencies expected from Equations (1)–(6)). Classical fuzzy-linguistic interpretability in the sense of physically separated regime-specific rules is not claimed; it is identified as a future work item contingent on the 12-month multi-season dataset of Section 7. As a transparency anchor, a 6-parameter Ordinary Least Squares baseline attains $R^{2} = 0.9991$ and RMSE = 695.3 W on the S1 test partition, within about 6% of ANFIS in RMSE and indistinguishable in $R^{2}$, confirming that the dominant FPP–input relationship is near-linear and that ANFIS contributes a statistically measurable but quantitatively modest non-linear correction. The following analysis connects the numerical values of both parameter classes to the physical operating principles of the FPPT-governed installation, while maintaining this structural context throughout.

6.4.1. Membership Function Visualization and Analysis

Figure 18 presents the learned Gaussian membership functions for all five input variables across both fuzzy rules, directly corroborating the numerical values of Table 10. The five panels reveal markedly different structural patterns across input variables, reflecting the heterogeneous roles of each variable in the fuzzy inference process.

Solar irradiance (G)

The irradiance panel of Figure 18 (bottom right) displays the most distinctive pattern: two extremely narrow, near-disjoint spikes centered at $c_{R1} = 86.8$ W/m² and $c_{R2} = 83.7$ W/m², with effective widths of approximately 0.10–0.11 W/m². These near-zero widths are a direct consequence of the zero-irradiance cluster initialization: because both rule centers were positioned at $G \approx 0$ W/m² by subtractive clustering on the predominantly idle-state dataset, gradient descent during the backward pass moved these centers toward the low-irradiance boundary of the observed data range rather than toward distinct active-generation regimes. Functionally, the near-zero widths transform the irradiance antecedent into an approximate binary switch near the dataset median (95 W/m², Table 3), such that the product firing strengths $w_{j}(x_{i})$ (Equation (17)) are effectively suppressed for samples deviating from this narrow boundary. The modeling burden, therefore, falls on the first-order Sugeno consequents $f_{j}(x_{i})$ (Equation (19)), which function as globally linear regressors across the full observed irradiance range of 0–1206 W/m². The negative width $σ_{R2} = -0.11$ W/m² is a gradient descent artifact arising from the absence of a non-negativity constraint on $σ_{j,k}$ during the backward pass (Equation (21)); since $σ_{j,k}$ appears squared in the Gaussian exponent of Equation (15), the membership function value is identical for $+0.11$ and $-0.11$, rendering this sign difference physically inconsequential, as confirmed by the visually symmetric spike in Figure 18.
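The sign invariance of the width can be checked numerically. The sketch assumes the standard Gaussian form $\exp(-(x - c)^2 / (2σ^2))$ for Equation (15) and uses the Rule 2 irradiance center and width magnitude quoted above; the probe irradiance values are arbitrary.

```python
import math

def gaussian_mf(x, c, sigma):
    """Gaussian membership degree (assumed form of Equation (15))."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

c_r2 = 83.7                                  # Rule 2 irradiance center (W/m^2)
probe_irradiance = (83.6, 83.7, 83.9, 95.0)  # arbitrary test points (W/m^2)

for g in probe_irradiance:
    mu_pos = gaussian_mf(g, c_r2, +0.11)
    mu_neg = gaussian_mf(g, c_r2, -0.11)
    # sigma enters only through sigma**2, so both evaluations coincide
    print(g, mu_pos, mu_neg)
```

The same loop also shows the near-binary behavior of the antecedent: a deviation of a few tenths of a W/m² from the center already drives the membership degree toward zero.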

Grid power ( $P_{grid}$ ) and load power ( $P_{load}$ )

The grid and load power panels (Figure 18, top left and bottom left) display broad, bell-shaped Gaussian membership functions centered at $c_{j,P_{grid}} = 26{,}923.1$ W and $c_{j,P_{load}} = 29{,}719.3$ W for both rules. The shared centers across rules for each variable reflect the strong mutual correlation $r(P_{grid}, P_{load}) \approx 0.95$ (Figure 3), which causes the clustering algorithm to identify nearly coincident density centroids for these two co-varying inputs. A clear width difference distinguishes the two rules: Rule 1 (blue, $σ = 25{,}500$ W) extends broadly across the full observed range of both variables, while Rule 2 (orange, $σ = 13{,}750$ W) provides more selective activation near the central operating point of approximately 27–30 kW. The identical width ratio $σ_{R1} / σ_{R2} \approx 1.85$ for both variables confirms that the clustering algorithm assigns structurally proportional membership functions to strongly co-varying inputs. The broad overlap between Rule 1 and Rule 2 membership functions in both panels indicates a soft, gradual fuzzy partition, enabling the ANFIS model to interpolate smoothly between the two consequent linear models across intermediate load–grid operating states.

Module temperature ( $T_{PV}$ ) and ambient temperature ( $T_{amb}$ )

The temperature panels (Figure 18, top center and top right) show broad, heavily overlapping Gaussian membership functions for both rules. For module temperature, centers fall at $c_{R1} = 5.5$ °C and $c_{R2} = 6.4$ °C, both below the observed dataset minimum of 9.5 °C, with widths $σ_{R1} = 18.54$ °C and $σ_{R2} = 16.50$ °C. For ambient temperature, centers are $c_{R1} = 4.8$ °C and $c_{R2} = 3.2$ °C, below the observed minimum of 12.7 °C, with widths $σ_{R1} = 22.24$ °C and $σ_{R2} = 19.83$ °C. The extrapolatory center placement below the observed data range reflects the unconstrained gradient descent optimization phase converging toward the low-temperature tail of the zero-irradiance cluster distribution. Despite this extrapolation, the resulting membership functions produce monotonically decreasing membership degrees across the entire observed operating ranges, $[9.5, 69.1]$ °C for $T_{PV}$ and $[12.7, 39.6]$ °C for $T_{amb}$, assigning progressively lower activation to high-temperature states. This behavior is directionally consistent with the thermal derating effect quantified by the power temperature coefficient $γ = -0.35$ %/°C (Equation (5)), under which elevated module temperatures reduce the maximum available power. The near-complete overlap between Rule 1 and Rule 2 temperature membership functions, more pronounced than for the power variables, reflects the close proximity of the two rule centers relative to the membership function widths and indicates that temperature alone provides limited discriminating power between the two fuzzy regimes. The broader widths assigned to ambient temperature relative to the module temperature in both rules are consistent with the weaker Pearson correlation $r(T_{amb}, P_{FPP}) \approx 0.62$ compared with $r(T_{PV}, P_{FPP}) \approx 0.77$ (Figure 3), suggesting that the hybrid learning algorithm allocates proportionally less discriminating resolution to inputs with weaker marginal predictive relevance.

6.4.2. Consequent Coefficient Analysis and Physical Consistency

The 12 first-order Sugeno consequent coefficients reported in Table 11 encode two complementary linear models whose normalized weighted superposition (Equation (20)) approximates the nonlinear FPP response across the full operating envelope of the 117.76 kWp installation. Because the irradiance antecedent membership functions are near-degenerate (Section 6.4.1), these consequent functions operate primarily as globally weighted linear regressors rather than as locally active rule-based approximators. The sign, magnitude, and physical consistency of each coefficient pair are interpreted below in this context, with the understanding that opposing signs across Rule 1 and Rule 2 for the same variable reflect the weighted compensation required for accurate global aggregation under near-equal firing strengths, rather than unambiguously distinct local physical regimes.
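The normalized weighted superposition of two first-order consequents can be made concrete with a minimal forward-pass sketch. It assumes the usual forms behind Equations (15), (17), (19) and (20): Gaussian memberships, product firing strengths, linear consequents with a bias, and a firing-strength-normalized sum. All parameter values below are hypothetical two-input placeholders, not the coefficients of Table 10 or Table 11.

```python
import numpy as np

def anfis_forward(x, centers, sigmas, coeffs, biases):
    """Two-stage Sugeno inference.

    x: (n_inputs,); centers, sigmas, coeffs: (n_rules, n_inputs); biases: (n_rules,).
    """
    mu = np.exp(-((x - centers) ** 2) / (2.0 * sigmas ** 2))  # membership degrees
    w = mu.prod(axis=1)                                       # product firing strengths
    w_norm = w / w.sum()                                      # assumes w.sum() > 0
    f = coeffs @ x + biases                                   # first-order consequents
    return float(w_norm @ f), w_norm, f

# Hypothetical 2-rule, 2-input parameters for illustration only
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.ones((2, 2))
coeffs = np.array([[1.0, 2.0], [3.0, -1.0]])
biases = np.array([0.5, -0.5])

# A point equidistant from both rule centers fires both rules equally
x = np.array([0.5, 0.5])
y, w_norm, f = anfis_forward(x, centers, sigmas, coeffs, biases)
print(y, w_norm, f)
```

With equal firing strengths the output is the plain average of the two linear models, which mirrors the near-equal-weight regime discussed in this subsection, where opposite-signed coefficients in the two rules partially cancel in the aggregate.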

Solar irradiance carries the largest absolute coefficients across both rules: $β_{R1,G} = -542.581$ and $β_{R2,G} = +323.857$. The positive Rule 2 coefficient is directionally consistent with the fundamental photovoltaic relationship in which increasing irradiance drives higher photocurrent (Equation (1)) and consequently higher FPP under FPPT control, in alignment with $r(G, P_{FPP}) \approx 0.82$ (Figure 3). The negative Rule 1 coefficient provides the compensatory correction required for accurate weighted aggregation: since the two consequent functions are combined as a normalized weighted sum (Equation (20)), and both firing strengths remain finite across the irradiance range, the negative $β_{R1,G}$ partially offsets the positive $β_{R2,G}$ to encode the net irradiance–temperature coupling, specifically the tendency for high-irradiance episodes to be accompanied by elevated module temperatures that partially offset raw photocurrent gains through thermal derating.

Module temperature coefficients are opposite in sign ($β_{R1,T_{PV}} = +46.377$; $β_{R2,T_{PV}} = -29.210$). The negative Rule 2 coefficient is directionally consistent with the thermal derating effect ($γ = -0.35$ %/°C, Equation (5)), under which rising module temperature reduces the maximum power point. The positive Rule 1 coefficient provides the cross-rule compensation necessary for the weighted aggregation to reproduce the net temperature effect accurately across the full operating range. The coefficient magnitude ratio $|β_{R1,T_{PV}}| / |β_{R2,T_{PV}}| \approx 1.59$ is numerically close to the irradiance ratio $|β_{R1,G}| / |β_{R2,G}| \approx 1.68$, suggesting a structurally proportional inter-rule compensation mechanism shared by the two dominant environmental drivers.

Ambient temperature coefficients are both small and positive ($β_{R1,T_{amb}} = +0.985$; $β_{R2,T_{amb}} = +0.787$), more than an order of magnitude below the module temperature coefficients and nearly identical across rules. This near-invariance is consistent with the secondary and indirect influence of $T_{amb}$ on $P_{FPP}$ once irradiance and module temperature are jointly accounted for, and aligns with the weaker observed correlation $r(T_{amb}, P_{FPP}) \approx 0.62$ (Figure 3).

Grid power coefficients are opposite in sign ($β_{R1,P_{grid}} = -26.154$; $β_{R2,P_{grid}} = +36.841$). These values are interpretable through the FPPT energy balance $P_{Grid} = P_{Load} - P_{FPP}$ (Equation (6)): the negative Rule 1 coefficient reflects the inverse relationship between grid import and PV contribution in active generation states, while the positive Rule 2 coefficient is consistent with transitional operating conditions in which both $P_{grid}$ and $P_{FPP}$ increase concurrently with rising load demand during FPPT ramp-up.

Load power coefficients are both positive ($β_{R1,P_{load}} = +1.797$; $β_{R2,P_{load}} = +13.602$), reflecting the zero-export self-consumption FPPT strategy under which the system targets $P_{FPP} \approx P_{Load}$ when irradiance is sufficient (Equation (6)). The substantially larger Rule 2 coefficient suggests that load power is more influential in the transitional generation regime, where the FPPT controller actively tracks load demand with the available solar resource, while irradiance and module temperature are the dominant drivers in the higher-generation regime captured by Rule 1. It should be noted, however, that the strong co-determination among $P_{grid}$, $P_{load}$, and $P_{FPP}$ imposed by the energy balance identity (Equation (6)) creates near-linear dependence among these three variables, which limits the uniqueness of their individual consequent coefficients; the interpretations offered here should accordingly be understood as directionally consistent with the physics rather than as uniquely identified causal estimates.

Taken together, the learned membership function parameters and consequent coefficients indicate that the trained ANFIS model has organized the FPPT-governed system’s operating envelope into two complementary linear approximations whose weighted superposition achieves globally nonlinear behavior. Rule 1, characterized by broader membership functions for grid and load power and by a negative irradiance consequent, encodes the compensatory dynamics of active generation states, correcting for irradiance–temperature coupling and the inverse grid–PV relationship imposed by the zero-export strategy. Rule 2, with narrower and more selective power membership functions and a positive irradiance consequent, captures the primary irradiance-driven generation-tracking behavior and the load-following characteristics of the FPPT controller. Both rules are anchored in the low-irradiance region of the input space ($G \approx 0$ W/m²), and their consequent functions extend linearly across the full irradiance range. The normalized weighted aggregation of these two complementary models achieves $R^{2} \geq 0.9992$ on every evaluation set considered (RMSE of 654 W on the S1 random test, 325 W on the S2 chronological test, and 363 W and 408 W on the strictly external 14-day hold-out) with only 32 learnable parameters. This result reflects the parametric efficiency of the neuro-fuzzy paradigm and the predominantly linear structure of the underlying input–output relationship. The learned mapping also remains inspectable, because the transparent Gaussian membership functions and first-order Sugeno consequent functions can be directly examined and validated against photovoltaic physics.
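The normalized weighted aggregation described above can be sketched in a few lines. This is a minimal illustration, assuming a product t-norm for the rule firing strength and one Gaussian membership function per input per rule; all numerical parameters and the helper names (`gaussian_mf`, `sugeno_predict`, `count_parameters`) are hypothetical and are not the values or code of the trained model. Under these assumptions the parameter count is 2 rules × (5 centers + 5 widths + 6 consequent coefficients) = 32, which is consistent with the figure reported in the text.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership grade of x for a fuzzy set with the given center and width."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_predict(x, rules):
    """First-order Sugeno inference: firing-strength-weighted average of linear consequents.

    x     : list of input values
    rules : list of dicts with 'centers' and 'sigmas' (one entry per input)
            and 'coeffs' (one entry per input plus a trailing bias term)
    """
    strengths, outputs = [], []
    for rule in rules:
        # Firing strength: product of the per-input membership grades (assumed t-norm)
        w = 1.0
        for xi, c, s in zip(x, rule["centers"], rule["sigmas"]):
            w *= gaussian_mf(xi, c, s)
        # Linear consequent: f = sum_k(beta_k * x_k) + beta_0
        f = sum(b * xi for b, xi in zip(rule["coeffs"][:-1], x)) + rule["coeffs"][-1]
        strengths.append(w)
        outputs.append(f)
    total = sum(strengths)
    # Normalized weighted aggregation of the rule outputs
    return sum(w * f for w, f in zip(strengths, outputs)) / total

def count_parameters(rules):
    """Number of learnable parameters: premise (centers, widths) plus consequent coefficients."""
    return sum(len(r["centers"]) + len(r["sigmas"]) + len(r["coeffs"]) for r in rules)

# Hypothetical parameters on normalized inputs (NOT the values learned in the paper)
rules = [
    {"centers": [0.2] * 5, "sigmas": [0.5] * 5, "coeffs": [0.1, 0.0, 0.0, 0.3, -0.2, 0.05]},
    {"centers": [0.1] * 5, "sigmas": [0.2] * 5, "coeffs": [-0.4, 0.0, 0.0, 0.9, 0.6, 0.0]},
]
x = [0.3, 0.5, 0.4, 0.6, 0.7]  # illustrative normalized input vector
y_hat = sugeno_predict(x, rules)
n_params = count_parameters(rules)
```

Because the output is a convex combination of the two linear consequents, the prediction always lies between the two rule outputs; the nonlinearity comes only from how the weights vary with the inputs.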

7. Conclusions

This study develops and validates an Adaptive Neuro-Fuzzy Inference System (ANFIS) for predicting the FPP in a 117.76 kWp grid-connected rooftop photovoltaic plant operating under a zero-export strategy in northwestern Algeria. Using only 32 learnable parameters organized into two fuzzy rules, the model achieves $R^{2} \geq 0.9992$ and RMSE values between 325 W and 654 W under all three validation strategies considered: random day-based (S1), strictly chronological (S2), and an external 14-day hold-out (S3). To the best of the authors’ knowledge, this is the first application of neuro-fuzzy methods to FPPT-governed PV systems, where curtailment and ancillary-service provision are increasingly required by modern grid codes.

When benchmarked against five established machine learning paradigms (Support Vector Machine, Decision Tree, Random Forest, k-Nearest Neighbors, and Deep Neural Network), ANFIS is the only model that maintains sub-700 W RMSE on every split, whereas every benchmark degrades by a factor of 1.5–2.0 when the evaluation protocol shifts from random S1 to chronological S2 or external S3. Most importantly, ANFIS achieves RMSE values of 363 W and 408 W on the external 14-day hold-out, which the model never accessed during training, tuning, or hyperparameter selection; both values are well below the training RMSE of ≈1179 W. Because the hold-out data were excluded before any training step, this result rules out data leakage as a source of the reported accuracy.

The trained model reveals a fundamentally near-linear FPP–input relationship, as confirmed by an Ordinary Least Squares baseline that attains $R^{2} = 0.9991$ with only six parameters. ANFIS, therefore, operates as a compact piecewise-linear regressor whose 32 parameters can be directly inspected and verified against photovoltaic physics, making it well suited for integration into supervisory control and grid-compliance frameworks where transparency is a prerequisite for operational deployment.

Four limitations bound the scope of these findings. First, temporal coverage is restricted to a single three-month summer period (May–July 2025; 85 retained calendar days), and robustness under winter and inter-annual conditions remains untested. Second, geographic scope is limited to a single site, so transferability to other climates, array configurations, and FPPT modes requires dedicated experimental validation. Third, the model has not yet been demonstrated in closed-loop control, where predictions drive real-time curtailment decisions. Fourth, formal cross-model statistical hypothesis tests (e.g., Friedman–Nemenyi) and multi-seed aggregation would further strengthen the comparative claims, although the convergence of S1, S2, S3 and five-fold cross-validation results around the same numerical envelope already provides multi-criterion evidence of stability.

Future work will prioritize extending data collection to a full annual cycle, validating the model across multiple sites and climatic zones, integrating ANFIS-driven set-point tracking with the facility’s SCADA system, embedding physics-informed constraints into the training loss, and exploring the learned rule structure as a basis for residual-driven fault detection and predictive maintenance. The convergence of high predictive accuracy, operational interpretability, and computational efficiency demonstrated here positions neuro-fuzzy inference systems as a strong candidate technology for next-generation smart energy management in grid-connected industrial and institutional buildings operating under modern grid-code requirements.

Author Contributions

Conceptualization, Y.B. and A.S.; methodology, Y.B. and A.S.; software, Y.B.; validation, Y.B., A.S. and A.A.T.; formal analysis, Y.B. and A.S.; investigation, Y.B., A.M. and F.K.; resources, A.C. and A.R.; data curation, Y.B. and I.M.M.; writing—original draft preparation, Y.B.; writing—review and editing, A.S., A.A.T., A.M., F.K., I.M.M., A.C. and A.R.; visualization, Y.B.; supervision, A.S. and A.C.; project administration, A.S.; funding acquisition, A.S. and A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Power Systems and Photovoltaic Technology
PV	Photovoltaic
MPPT	Maximum Power Point Tracking
MPP	Maximum Power Point
FPPT	Flexible Power Point Tracking
FPP	Flexible Power Point
CPG	Constant Power Generation
DC	Direct Current
AC	Alternating Current
STC	Standard Test Conditions
NOCT	Normal Operating Cell Temperature
kWp	Kilowatt-peak
kVA	Kilovolt-Ampere
kWc	Kilowatt-crête (French equivalent of kilowatt-peak)
PSC	Partial Shading Conditions
Machine Learning and Computational Methods
ML	Machine Learning
DL	Deep Learning
ANFIS	Adaptive Neuro-Fuzzy Inference System
SVM	Support Vector Machine
SVR	Support Vector Regression
DT	Decision Tree
RF	Random Forest
KNN	K-Nearest Neighbors
DNN	Deep Neural Network
CNN	Convolutional Neural Network
TCN	Temporal Convolutional Network
LSTM	Long Short-Term Memory
GRU	Gated Recurrent Unit
ANN	Artificial Neural Network
MLR	Multiple Linear Regression
GEP	Gene Expression Programming
PSO	Particle Swarm Optimization
GA	Genetic Algorithm
ABC	Artificial Bee Colony
ACO	Ant Colony Optimization
RBF	Radial Basis Function
ReLU	Rectified Linear Unit
LSE	Least-Squares Estimation
SGD	Stochastic Gradient Descent
Statistical and Mathematical Metrics
RMSE	Root Mean Square Error
MAE	Mean Absolute Error
NRMSE	Normalized Root Mean Square Error
$R^{2}$	Coefficient of Determination
MSE	Mean Squared Error
OOB	Out-of-Bag
CV	Cross-Validation
MF	Membership Function
OLS	Ordinary Least Squares
VIF	Variance Inflation Factor
PCA	Principal Component Analysis
CI	Confidence Interval
S1	Validation strategy 1: random day-based split
S2	Validation strategy 2: chronological 70/15/15% split
S3	Validation strategy 3: external 14-day hold-out
Electrical and Physical Parameters
$V_{o c}$	Open Circuit Voltage
$V_{m p}$	Maximum Power Voltage
$I_{s c}$	Short Circuit Current
$I_{m p}$	Maximum Power Current
$P_{s t c}$	Rated Power at Standard Test Conditions
$P_{M P P}$	Maximum Power Point Power
$P_{F P P}$	Flexible Power Point Power
$P_{g r i d}$	Grid Power
$P_{l o a d}$	Load Power
$T_{P V}$	Module Temperature
$T_{a m b}$	Ambient Temperature
G	Solar Irradiance
$G_{S T C}$	Irradiance at Standard Test Conditions
$I_{M P P}$	Current at Maximum Power Point
$V_{M P P}$	Voltage at Maximum Power Point
$I_{S T C}$	Reference Current at Standard Test Conditions
$V_{S T C}$	Reference Voltage at Standard Test Conditions
$I_{p h}$	Light-Generated Current
$I_{0}$	Diode Saturation Current
$α$	Temperature Coefficient of Current
$β$	Temperature Coefficient of Voltage
$γ$	Power Temperature Coefficient
n	Diode Ideality Factor
q	Elementary Electric Charge
k	Boltzmann Constant
CT	Current Transformer
Communication and Monitoring Systems
RS485	Recommended Standard 485 (Serial Communication Protocol)
DTSD	Three-Phase Electronic Energy Meter (Model Designation)
SCADA	Supervisory Control and Data Acquisition
NWP	Numerical Weather Prediction
Energy and Grid Management
NaN	Not a Number
rng	Random Number Generator
MV	Medium Voltage
kV	Kilovolt
AM	Air Mass
PLL	Phase-Locked Loop
PI	Proportional–Integral (controller)
FRT	Fault Ride-Through
PCC	Point of Common Coupling
ANFIS Architecture Specific
FC	Fully Connected Layer
$Θ$	ANFIS Learnable Parameter Set
$θ_{M F}$	Membership Function Parameter Subset
$θ_{Cons}$	Consequent Parameter Subset

Appendix A. Mathematical Formulations of Benchmark Models

Appendix A.1. Support Vector Machine (SVM)

Support Vector Regression extends the Support Vector Machine framework to continuous output prediction. SVM solves a constrained quadratic programming problem that seeks to find an optimal hyperplane in a transformed feature space [8]. The elegance of SVM lies in its kernel trick, which enables implicit nonlinear feature mapping without explicit computation of the transformed feature space.

The SVM approach is grounded in the principle of structural risk minimization, balancing model complexity with training error [47]. By employing the kernel trick, SVM can operate efficiently in high-dimensional or even infinite-dimensional spaces while maintaining computational tractability. This property makes SVM particularly effective for nonlinear regression tasks [49].

The detailed algorithm is shown in Figure A1.

Figure A1. Support Vector Machine: Implicit feature transformation via RBF kernel.

Appendix A.1.1. Primal Optimization Problem

Support Vector Regression minimizes the following objective for input vectors $x_{i}$ and targets $y_{i}$:

\min_{w, b, ξ, ξ^{*}} \frac{1}{2} {∥ w ∥}^{2} + C \sum_{i} (ξ_{i} + ξ_{i}^{*})

(A1)

Subject to the constraints:

y_{i} - (w^{T} ϕ (x_{i}) + b) \leq ϵ + ξ_{i}

(A2)

(w^{T} ϕ (x_{i}) + b) - y_{i} \leq ϵ + ξ_{i}^{*}

(A3)

The parameters are defined as [32]:

$w$: weight vector in the transformed feature space
$b$: bias term
$ϕ (x_{i})$: nonlinear feature transformation function
$ϵ$: insensitivity zone parameter
$C$: regularization parameter
$ξ_{i}, ξ_{i}^{*}$: non-negative slack variables ($ξ_{i}, ξ_{i}^{*} \geq 0$)

Appendix A.1.2. Dual Optimization Problem

The dual optimization problem, in which $K$ denotes the kernel matrix with entries $K (x_{i}, x_{j})$, is expressed as follows:

\min_{α, α^{*}} \frac{1}{2} {(α - α^{*})}^{T} K (α - α^{*}) + ϵ \sum_{i} (α_{i} + α_{i}^{*}) - \sum_{i} y_{i} (α_{i} - α_{i}^{*})

(A4)

Subject to:

\sum_{i} (α_{i} - α_{i}^{*}) = 0, \quad 0 \leq α_{i}, α_{i}^{*} \leq C

(A5)

Appendix A.1.3. Decision Function

The decision function for prediction given input $x_{i}$ is as follows:

{\hat{y}}_{i}^{SVM} = \sum_{l} (α_{l} - α_{l}^{*}) K (x_{l}, x_{i}) + b

(A6)

Appendix A.1.4. Radial Basis Function Kernel

The Radial Basis Function kernel provides nonlinear feature mapping between any two input vectors $x_{i}$ and $x_{j}$:

K (x_{i}, x_{j}) = \exp (- γ ∥ x_{i} - x_{j} ∥^{2})

(A7)

Expanded for the five-dimensional input space:

\begin{aligned} K (x_{i}, x_{j}) = \exp \big(- γ \big[ & {(P_{grid, i} - P_{grid, j})}^{2} + {(T_{PV, i} - T_{PV, j})}^{2} + {(T_{amb, i} - T_{amb, j})}^{2} \\ & + {(P_{load, i} - P_{load, j})}^{2} + {(G_{i} - G_{j})}^{2} \big] \big) \end{aligned}

(A8)
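As a minimal sketch of Equations (A6) and (A7), the kernel and the resulting decision function can be evaluated as follows. The support vectors, dual coefficients, bias, and kernel coefficient below are purely illustrative assumptions (they are not fitted values from the study), and the function names are hypothetical.

```python
import math

def rbf_kernel(xi, xj, gamma):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2), as in Equation (A7)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Decision function of Equation (A6); dual_coefs[l] stands for (alpha_l - alpha_l^*)."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical standardized inputs ordered as (P_grid, T_PV, T_amb, P_load, G)
support_vectors = [[0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0]]
dual_coefs = [0.5, -0.25]  # illustrative values, not fitted
bias = 0.1
gamma = 0.2

k_same = rbf_kernel(support_vectors[0], support_vectors[0], gamma)  # identical points give 1
k_far = rbf_kernel(support_vectors[0], support_vectors[1], gamma)   # squared distance is 5
y_hat = svr_predict(support_vectors[0], support_vectors, dual_coefs, bias, gamma)
```

The kernel equals 1 for identical inputs and decays toward 0 as the inputs move apart, so each support vector influences only predictions in its neighborhood, with the neighborhood size set by the kernel coefficient.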

Appendix A.1.5. Loss Function

The $ϵ$-insensitive loss function is as follows:

L_{ϵ} (y_{i}, {\hat{y}}_{i}^{SVM}) = \max (0, | y_{i} - {\hat{y}}_{i}^{SVM} | - ϵ)

(A9)
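Equation (A9) can be illustrated with a short sketch: errors inside the insensitivity zone cost nothing, and larger errors are penalized linearly beyond it. The power values and the tolerance below are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Equation (A9): zero inside the eps-tube, linear in the absolute error outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

eps = 5.0  # hypothetical tolerance, in the same units as the target
loss_inside = eps_insensitive_loss(100.0, 103.0, eps)   # |error| = 3 <= eps
loss_outside = eps_insensitive_loss(100.0, 110.0, eps)  # |error| = 10 > eps
```

Here `loss_inside` is 0.0 and `loss_outside` is 5.0, which shows why only samples on or outside the tube end up as support vectors.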

Appendix A.2. Decision Tree Regressor

Decision trees provide a recursive partitioning approach to regression. The algorithm iteratively divides the input space into axis-aligned regions, each associated with a constant prediction value [47]. Decision trees offer interpretability through explicit if–then–else decision rules visible in the tree structure. However, they are susceptible to overfitting when grown without constraints [41].

The decision tree algorithm employs a greedy approach: at each node, it searches for the split that provides maximum improvement in the impurity measure. For regression tasks, mean squared error serves as the impurity criterion. The recursive binary partitioning with decision nodes and prediction leaves is shown in Figure A2.

Figure A2. Decision Tree: Recursive binary partitioning with decision nodes and prediction leaves.

Appendix A.2.1. Tree Structure and Splitting Criterion

A regression tree $T$ partitions the input space through recursive binary splits. At each node $t$, the algorithm searches for an optimal split on one of the five input dimensions at threshold $θ$:

\text{Split on input } k: \{(x_{i}, y_{i}) : x_{k, i} \leq θ\} \to t_{L}, \quad \{(x_{i}, y_{i}) : x_{k, i} > θ\} \to t_{R}

(A10)

where $k \in \{1, 2, 3, 4, 5\}$ indexes the five input variables and $t_{L}$ and $t_{R}$ are the left and right child nodes.

Appendix A.2.2. Impurity Measure

For regression, the impurity at node $t$ is measured by the mean squared error of its $n_{t}$ samples about the node mean ${\bar{y}}_{t}$:

MSE (t) = \frac{1}{n_{t}} \sum_{i \in t} {(y_{i} - {\bar{y}}_{t})}^{2}

(A11)

Appendix A.2.3. Split Quality

The improvement achieved by a split is quantified as follows:

Δ MSE = MSE (t) - \frac{n_{L}}{n_{t}} MSE (t_{L}) - \frac{n_{R}}{n_{t}} MSE (t_{R})

(A12)

The optimal split maximizes this reduction:

(k^{*}, θ^{*}) = \arg \max_{k \in \{1, \dots, 5\}, θ} Δ MSE

(A13)
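The exhaustive search behind Equations (A11) to (A13) can be sketched as follows. This is an illustrative implementation under stated assumptions: candidate thresholds are taken as midpoints between consecutive distinct feature values, and the toy data use two features instead of five for brevity. The function names and data are hypothetical.

```python
def mse(values):
    """Node impurity of Equation (A11); an empty node contributes zero."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(X, y):
    """Return (feature index, threshold, impurity reduction) maximizing Equation (A12)."""
    n = len(y)
    parent = mse(y)
    best = (None, None, 0.0)
    for k in range(len(X[0])):
        levels = sorted(set(row[k] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values (assumption)
        for lo, hi in zip(levels, levels[1:]):
            theta = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[k] <= theta]
            right = [yi for row, yi in zip(X, y) if row[k] > theta]
            gain = parent - len(left) / n * mse(left) - len(right) / n * mse(right)
            if gain > best[2]:
                best = (k, theta, gain)
    return best

# Toy data: the target depends only on the first feature
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [1.0, 1.0, 3.0, 3.0]
k_star, theta_star, gain = best_split(X, y)
```

On this toy set the search selects the first feature at threshold 0.5, which separates the two target levels perfectly and removes the entire parent impurity of 1.0.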

Appendix A.2.4. Prediction

For a test sample $x_{i}$, the model traverses the tree following the splitting rules until reaching a leaf node [41,47]. The prediction is the mean of the target values of all training samples in that leaf:

{\hat{y}}_{i}^{Tree} = \frac{1}{| L (x_{i}) |} \sum_{j \in L (x_{i})} y_{j}

(A14)

Appendix A.2.5. Training Loss

The training loss represents the mean squared error across the training set [41]:

J_{Tree} = \frac{1}{n} \sum_{i} {(y_{i} - {\hat{y}}_{i}^{Tree})}^{2}

(A15)

Appendix A.3. Random Forest

Random Forest represents an ensemble learning technique that aggregates predictions from multiple decision trees [48]. By training numerous trees on random subsets of data (bootstrap samples) and averaging their predictions, Random Forest effectively reduces variance while maintaining low bias [41]. This ensemble approach alleviates the overfitting tendency of individual decision trees.

The strength of Random Forest derives from two sources of randomness: random subsampling of training data (bagging) and random feature selection at each split [48]. This diversity among ensemble members leads to improved generalization performance. The architecture with multiple trees and ensemble averaging is presented in Figure A3.

Figure A3. Random Forest: Bagging with multiple trees and ensemble averaging.

Appendix A.3.1. Bootstrap Sampling

For each tree $b = 1, \dots, B$, a bootstrap sample is drawn with replacement from the original data [41]:

D_{b} = \{(x_{i_{1}}, y_{i_{1}}), \dots, (x_{i_{n}}, y_{i_{n}})\}, \quad \text{where } i_{j} \sim \text{Uniform} \{1, \dots, n\}

(A16)
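A minimal sketch of the bootstrap draw in Equation (A16), together with the out-of-bag indices it leaves behind, is given below. The sample size and random seed are arbitrary assumptions chosen for illustration.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices uniformly with replacement from {0, ..., n - 1}."""
    return [rng.randrange(n) for _ in range(n)]

def out_of_bag(n, in_bag):
    """Indices never drawn in the bootstrap sample (usable for out-of-bag evaluation)."""
    drawn = set(in_bag)
    return [i for i in range(n) if i not in drawn]

rng = random.Random(0)  # fixed seed only for reproducibility of the illustration
n = 1000
in_bag = bootstrap_indices(n, rng)
oob = out_of_bag(n, in_bag)
oob_fraction = len(oob) / n
```

Each sample is missed by a given bootstrap draw with probability $(1 - 1/n)^{n}$, which tends to $e^{-1} \approx 0.368$, so roughly one third of the data remains available to evaluate each tree without a separate validation set.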

Appendix A.3.2. Tree Construction

Each tree $T_{b}$ is trained to full depth on the bootstrap sample [47]:

T_{b} = \text{GROW\_TREE} (D_{b}, x_{i})

(A17)

At each internal node during tree construction, a random subset of the five features is considered [47]:

S \subset {1, 2, 3, 4, 5}, | S | = m_{try}

(A18)

The optimal split is searched only within this random subset:

(k^{*}, θ^{*}) = arg max_{k \in S, θ} Δ MSE (x_{i})

(A19)

Appendix A.3.3. Ensemble Prediction

The final Random Forest prediction is the average of all tree predictions:

{\hat{y}}_{i}^{RF} = \frac{1}{B} \sum_{b = 1}^{B} {\hat{y}}_{i}^{(b)} (x_{i})

(A20)

where ${\hat{y}}_{i}^{(b)} (x_{i})$ is the prediction from tree $b$ on input $x_{i}$ [47].

Appendix A.3.4. Variance Reduction

The variance of the ensemble is reduced through averaging:

Var ({\hat{y}}_{i}^{RF}) = \frac{1}{B^{2}} \sum_{b = 1}^{B} Var ({\hat{y}}_{i}^{(b)}) + \frac{2}{B^{2}} \sum_{b < b^{'}} Cov ({\hat{y}}_{i}^{(b)}, {\hat{y}}_{i}^{(b^{'})})

(A21)
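As a worked special case of Equation (A21), assume (a simplification not stated in the source) that every tree has the same prediction variance $σ^{2}$ and that every pair of trees has the same correlation $ρ$. The $B$ variance terms and the $B (B - 1) / 2$ covariance terms then collapse to:

Var ({\hat{y}}_{i}^{RF}) = \frac{σ^{2}}{B} + \frac{B - 1}{B} ρ σ^{2} = ρ σ^{2} + \frac{1 - ρ}{B} σ^{2}

Under this assumption, adding trees shrinks only the second term, so the ensemble variance approaches $ρ σ^{2}$ as $B$ grows. This is why decorrelating the trees through random feature selection matters in addition to averaging.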

Appendix A.3.5. Out-of-Bag Estimation

For sample i, the Out-of-Bag (OOB) estimate uses only trees where i was not included in the bootstrap sample [41,47]:

{\hat{y}}_{i, OOB}^{RF} = \frac{1}{B_{i}} \sum_{b : i \notin D_{b}} {\hat{y}}_{i}^{(b)} (x_{i})

(A22)

OOB error provides an approximately unbiased generalization estimate:

OOB Error = \frac{1}{n} \sum_{i} {(y_{i} - {\hat{y}}_{i, OOB}^{RF})}^{2}

(A23)

Appendix A.3.6. Feature Importance

Feature importance quantifies the average reduction in impurity achieved when splitting on each of the five input dimensions:

I_{k} = \frac{1}{B} \sum_{b = 1}^{B} \sum_{t \in T_{b}} Δ {MSE}_{t, k} (x_{i}) \cdot I (t splits on input k)

(A24)

for $k \in \{1, 2, 3, 4, 5\}$.

Appendix A.3.7. Ensemble Loss

The ensemble loss is as follows:

J_{RF} = \frac{1}{n} \sum_{i} {(y_{i} - {\hat{y}}_{i}^{RF})}^{2}

(A25)

Appendix A.4. K-Nearest Neighbors (KNN)

K-Nearest Neighbors represents a non-parametric, instance-based learning algorithm. Rather than learning explicit model parameters, KNN stores the training dataset and makes predictions based on the similarity of test samples to training examples [47]. The prediction for a test sample is the average of the target values of its k nearest neighbors in the training set [49].

KNN operates under the assumption that samples with similar feature values possess similar target values. This local averaging approach provides flexibility in modeling complex nonlinear relationships without explicit parameterization. The detailed architecture is presented in Figure A4.

Figure A4. K-Nearest Neighbors: Instance-based learning with distance computation and neighbor averaging.

Appendix A.4.1. Distance Computation

The Euclidean distance between a test sample $x_{i}$ and a training sample $x_{j}$ is [47]:

d (x_{i}, x_{j}) = \sqrt{\sum_{k = 1}^{5} {(x_{k, i} - x_{k, j})}^{2}}

(A26)

Explicitly defined as follows:

\begin{aligned} d (x_{i}, x_{j}) = \big[ & {(P_{grid, i} - P_{grid, j})}^{2} + {(T_{PV, i} - T_{PV, j})}^{2} + {(T_{amb, i} - T_{amb, j})}^{2} \\ & + {(P_{load, i} - P_{load, j})}^{2} + {(G_{i} - G_{j})}^{2} \big]^{1 / 2} \end{aligned}

(A27)

Appendix A.4.2. Standardized Distance

After standardization (zero mean, unit variance), the distance becomes [47]:

d_{std} (x_{i}, x_{j}) = \sqrt{\sum_{k = 1}^{5} \frac{{(x_{k, i} - x_{k, j})}^{2}}{σ_{k}^{2}}}

(A28)

Appendix A.4.3. Neighbor Selection

For a test sample $x_{i}$, the set of $k$ nearest neighbors from the training set is [49]:

N_{k} (x_{i}) = \{x_{j_{1}}, \dots, x_{j_{k}} : d (x_{i}, x_{j_{m}}) \text{ are the } k \text{ smallest distances}\}

(A29)

Appendix A.4.4. Uniform-Weighted Prediction

The KNN prediction is computed as the arithmetic mean of target values of neighbors [49]:

{\hat{y}}_{i}^{KNN} = \frac{1}{k} \sum_{j \in indices (N_{k} (x_{i}))} y_{j}

(A30)

Appendix A.4.5. Distance-Weighted Prediction

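Distance-based weighting assigns higher influence to closer neighbors, with a small constant $δ > 0$ preventing division by zero when a neighbor coincides with the test sample [49]:

{\hat{y}}_{i}^{KNN} = \frac{\sum_{j \in indices (N_{k})} w_{i j} y_{j}}{\sum_{j \in indices (N_{k})} w_{i j}}, \quad \text{where } w_{i j} = \frac{1}{d (x_{i}, x_{j}) + δ}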

(A31)

Appendix A.4.6. KNN Algorithm

1. Standardization: Normalize each of the five features to zero mean and unit variance.
2. Distance Computation: Calculate distances from the test sample to all training samples:

d_{i} = [d (x_{i}, x_{1}), \dots, d (x_{i}, x_{n})]

(A32)

3. Neighbor Selection: Identify the indices of the k smallest distances.
4. Prediction: Compute the average of the neighbor target values:

{\hat{y}}_{i}^{KNN} = \frac{1}{k} \sum_{j = 1}^{k} y_{idx [j]}

(A33)
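The four steps above, together with the distance-weighted variant of Equation (A31), can be sketched as follows. This is an illustrative implementation under stated assumptions: the toy data use two features instead of five, population standard deviations are used for standardization, and all names and values are hypothetical.

```python
import math

def standardize(X):
    """Step 1: scale every feature to zero mean and unit variance (population std)."""
    n, d = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(d)]
    # A constant feature would give std 0; fall back to 1.0 to avoid division by zero
    stds = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in X) / n) or 1.0
            for k in range(d)]
    Z = [[(row[k] - means[k]) / stds[k] for k in range(d)] for row in X]
    return Z, means, stds

def knn_predict(x, X_train, y_train, k, weighted=False, delta=1e-9):
    """Steps 2 to 4: distances, k smallest, then uniform or inverse-distance average."""
    dists = [math.dist(x, xj) for xj in X_train]
    idx = sorted(range(len(X_train)), key=lambda j: dists[j])[:k]
    if not weighted:
        return sum(y_train[j] for j in idx) / k
    weights = [1.0 / (dists[j] + delta) for j in idx]
    return sum(w * y_train[j] for w, j in zip(weights, idx)) / sum(weights)

# Toy training set (two features for brevity) with one distant outlier
X_raw = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
y_train = [1.0, 2.0, 3.0, 100.0]
Z_train, means, stds = standardize(X_raw)

# The query must be transformed with the training means and standard deviations
query = [(v - m) / s for v, m, s in zip([0.2, 0.1], means, stds)]
y_uniform = knn_predict(query, Z_train, y_train, k=3)
y_weighted = knn_predict(query, Z_train, y_train, k=3, weighted=True)
```

With `k=3` the distant outlier is excluded, so the uniform prediction is the mean of the three nearby targets (2.0), while the distance-weighted prediction is pulled toward the target of the closest neighbor.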

Appendix A.4.7. Loss Function

Since KNN lacks an explicit training phase, loss is evaluated only on predictions:

J_{KNN} = \frac{1}{n} \sum_{i} {(y_{i} - {\hat{y}}_{i}^{KNN})}^{2}

(A34)

Appendix A.5. Deep Neural Network (DNN)

Deep Neural Networks represent a powerful class of nonlinear function approximators composed of multiple layers of interconnected neurons. Each neuron computes a weighted sum of inputs followed by a nonlinear activation function [44]. The hierarchical composition of these transformations enables DNNs to learn increasingly abstract representations, making them effective for complex regression tasks [45].

The proposed architecture consists of fully connected (dense) layers with ReLU activation functions and dropout regularization. Dropout stochastically deactivates neurons during training, reducing co-adaptation and improving generalization [49]. The network is trained using the Adam optimization algorithm, which maintains adaptive learning rates for each parameter [47]. The architecture is shown in Figure A5.

Figure A5. Deep Neural Network: Hierarchical layers with ReLU activation and dropout regularization.

Appendix A.5.1. Network Architecture

The network structure processes the five-dimensional input $x_{i}$ through successive transformations [49]:

x_{i} \to FC (h_{1}) \to σ \to Dropout \to FC (h_{2}) \to σ \to Dropout \to FC (h_{3}) \to σ \to FC (1) \to {\hat{y}}_{i}^{DNN}

(A35)

Appendix A.5.2. Fully Connected (Dense) Layers

For the first layer processing input $x_{i}$:

z_{i}^{(1)} = W^{(1)} x_{i} + b^{(1)}

(A36)

where $W^{(1)} \in \mathbb{R}^{h_{1} \times 5}$ is the weight matrix and $b^{(1)} \in \mathbb{R}^{h_{1}}$ is the bias vector.

For subsequent hidden layers:

z_{i}^{(l)} = W^{(l)} a_{i}^{(l - 1)} + b^{(l)}

(A37)

Appendix A.5.3. ReLU Activation Function

The Rectified Linear Unit (ReLU) activation function applied to layer l is [47]:

a_{i}^{(l)} = ReLU (z_{i}^{(l)}) = max (0, z_{i}^{(l)})

(A38)

Element-wise definition:

{[ReLU (z)]}_{m} = \begin{cases} z_{m} & \text{if } z_{m} > 0 \\ 0 & \text{if } z_{m} \leq 0 \end{cases}

(A39)
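A single dense layer followed by ReLU, as in Equations (A36) and (A38), can be sketched in plain Python. The hidden width of two units, the weights, and the input values are hypothetical choices made only to keep the arithmetic easy to follow.

```python
def dense(W, b, a):
    """Affine map z = W a + b for one fully connected layer (W given as a list of rows)."""
    return [sum(w * x for w, x in zip(row, a)) + bias for row, bias in zip(W, b)]

def relu(z):
    """Element-wise rectifier: negative pre-activations are set to zero."""
    return [max(0.0, v) for v in z]

# Hypothetical layer with h_1 = 2 hidden units acting on a five-dimensional input
W1 = [[1.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, -1.0, 0.0, 0.0, 0.0]]
b1 = [0.5, 0.0]
x = [0.2, 0.4, 0.1, 0.3, 0.9]  # illustrative normalized inputs

z1 = dense(W1, b1, x)  # pre-activations: approximately [0.7, -0.4]
a1 = relu(z1)          # activations: the negative entry is clipped to 0.0
```

The second unit receives a negative pre-activation and is therefore inactive for this input, which illustrates how ReLU makes the network piecewise linear.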

Appendix A.5.4. Dropout Regularization

Dropout reduces overfitting by stochastically deactivating neurons. During training on sample $x_{i}$, each neuron in layer $l$ is deactivated with probability $p$ [44]:

a_{dropout, i}^{(l)} = a_{i}^{(l)} ⊙ m^{(l)}

(A40)

where $m^{(l)} \sim Bernoulli (1 - p)$ is a binary mask. The retained activations are rescaled by $1 / (1 - p)$ so that the expected activation is unchanged:

a_{dropout, i}^{(l)} = \frac{a_{i}^{(l)} ⊙ m^{(l)}}{1 - p}

(A41)
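The scaled dropout of Equation (A41) can be sketched as follows. The dropout probability, the seed, and the all-ones activation vector are illustrative assumptions, and the convention that the layer is the identity at inference time is standard practice rather than something stated in the source.

```python
import random

def inverted_dropout(a, p, rng, training=True):
    """Equation (A41): keep each activation with probability 1 - p and rescale by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(a)  # at inference time the layer passes activations through unchanged
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in a]

rng = random.Random(0)  # fixed seed only for reproducibility of the illustration
activations = [1.0] * 10000
dropped = inverted_dropout(activations, p=0.2, rng=rng)
mean_after = sum(dropped) / len(dropped)
unchanged = inverted_dropout(activations, p=0.2, rng=rng, training=False)
```

Every surviving activation becomes 1.25 and the rest become 0.0, so the mean stays close to 1.0; this is the sense in which the rescaling preserves the expected activation.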

Appendix A.5.5. Input Normalization

The five-dimensional input is scaled to the $[0, 1]$ range [44]:

x_{i}^{'} = \begin{bmatrix} \frac{P_{grid, i} - P_{grid, min}}{P_{grid, max} - P_{grid, min}} \\ \frac{T_{PV, i} - T_{PV, min}}{T_{PV, max} - T_{PV, min}} \\ \frac{T_{amb, i} - T_{amb, min}}{T_{amb, max} - T_{amb, min}} \\ \frac{P_{load, i} - P_{load, min}}{P_{load, max} - P_{load, min}} \\ \frac{G_{i} - G_{min}}{G_{max} - G_{min}} \end{bmatrix}

(A42)

Appendix A.5.6. Target Normalization

The target output is similarly normalized:

y_{i}^{'} = \frac{y_{i} - y_{min}}{y_{max} - y_{min}}

(A43)

Appendix A.5.7. Output Layer

The output layer produces normalized predictions [44]:

{\hat{y}}_{i}^{'} = W^{(out)} a_{i}^{(L)} + b^{(out)}

(A44)

Appendix A.5.8. Inverse Scaling

Inverse scaling reconstructs original units [44]:

{\hat{y}}_{i}^{DNN} = {\hat{y}}_{i}^{'} \times (y_{max} - y_{min}) + y_{min}

(A45)
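The scaling of Equation (A43) and its inverse in Equation (A45) form a round trip, sketched below. The target values are hypothetical, and taking the minimum and maximum from the training data only is an assumption that reflects common practice for avoiding information leakage.

```python
def fit_minmax(values):
    """Return the (min, max) pair used for scaling; assumed to come from training data only."""
    return min(values), max(values)

def scale(v, lo, hi):
    """Equation (A43): map v from [lo, hi] to [0, 1]."""
    return (v - lo) / (hi - lo)

def unscale(v_scaled, lo, hi):
    """Equation (A45): map a normalized value back to original units."""
    return v_scaled * (hi - lo) + lo

y_train = [0.0, 25000.0, 100000.0]  # hypothetical power targets in W
y_min, y_max = fit_minmax(y_train)
y_scaled = [scale(v, y_min, y_max) for v in y_train]
y_back = [unscale(v, y_min, y_max) for v in y_scaled]
```

Here `y_scaled` is `[0.0, 0.25, 1.0]` and `y_back` recovers the original values, so errors computed on de-normalized predictions are expressed in watts.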

Appendix A.5.9. Loss Function

The mean squared error loss during training is [44]:

J (Θ) = \frac{1}{m} \sum_{i = 1}^{m} {({\hat{y}}_{i}^{'} - y_{i}^{'})}^{2}

(A46)

where m denotes the mini-batch size.

Appendix A.5.10. Backpropagation

For the final layer [44]:

\frac{\partial J}{\partial W^{(out)}} = ({\hat{y}}_{i}^{'} - y_{i}^{'}) {(a_{i}^{(L)})}^{T}

(A47)

Gradients propagate backward [44]:

\frac{\partial J}{\partial a_{i}^{(l)}} = {(W^{(l + 1)})}^{T} \frac{\partial J}{\partial z_{i}^{(l + 1)}}

(A48)

Appendix A.5.11. Adam Optimizer

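For parameter $θ$ at step $t$, with gradient $\nabla_{θ} J_{t}$, exponential decay rates $β_{1}$ and $β_{2}$, learning rate $η$, and a small stabilizing constant $ϵ$: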

m_{t} = β_{1} m_{t - 1} + (1 - β_{1}) \nabla_{θ} J_{t}

(A49)

v_{t} = β_{2} v_{t - 1} + (1 - β_{2}) {(\nabla_{θ} J_{t})}^{2}

(A50)

Bias-corrected estimates:

{\hat{m}}_{t} = \frac{m_{t}}{1 - β_{1}^{t}}, {\hat{v}}_{t} = \frac{v_{t}}{1 - β_{2}^{t}}

(A51)

Parameter update:

θ_{t + 1} = θ_{t} - η \frac{{\hat{m}}_{t}}{\sqrt{{\hat{v}}_{t}} + ϵ}

(A52)
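Equations (A49) to (A52) can be traced on a one-dimensional toy problem. The quadratic objective, the learning rate, and the number of steps below are illustrative assumptions; the decay rates and stabilizing constant are the commonly used defaults, not values reported in the source.

```python
import math

def adam_step(theta, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter, following Equations (A49) to (A52)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate (A49)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate (A50)
    m_hat = m / (1 - beta1 ** t)              # bias correction (A51)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (math.sqrt(v_hat) + eps)  # parameter update (A52)
    return theta, m, v

# Toy objective J(theta) = (theta - 3)^2 with gradient 2 * (theta - 3)
theta, m, v = 0.0, 0.0, 0.0
history = []
for t in range(1, 3001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, eta=0.01)
    history.append(theta)
```

After bias correction the first update has magnitude very close to the learning rate regardless of the gradient scale, which is the property that makes Adam's step size comparatively insensitive to how the loss is scaled.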

Appendix A.5.12. Mini-Batch Training

In each epoch, the training set is shuffled and processed in mini-batches:

B \subset {(x_{1}, y_{1}), \dots, (x_{n}, y_{n})}

(A53)

Θ \leftarrow Θ - η \nabla_{Θ} \frac{1}{| B |} \sum_{(x_{i}, y_{i}) \in B} {(y_{i}^{'} - {\hat{y}}_{i}^{'})}^{2}

(A54)

References

1. Ulpiani, G.; Vetters, N. On the risks associated with transitioning to climate neutrality in Europe: A city perspective. Renew. Sustain. Energy Rev. 2023, 183, 113448.
2. Halassa, E.; Mazouz, L.; Seghiour, A.; Chouder, A.; Silvestre, S. Revolutionizing photovoltaic systems: An innovative approach to maximum power point tracking using enhanced dandelion optimizer in partial shading conditions. Energies 2023, 16, 3617.
3. Reddy, V.; Hariram, N.; Ghazali, M.; Kumarasamy, S. Pathway to sustainability: An overview of renewable energy integration in building systems. Sustainability 2024, 16, 638.
4. Himri, Y.; Rehman, S.; Mostafaeipour, A.; Himri, S.; Mellit, A.; Merzouk, M.; Merzouk, N. Overview of the role of energy resources in Algeria’s energy transition. Energies 2022, 15, 4731.
5. Fezzani, A.; Guermoui, M.; Kouzou, A.; Hafaifa, A.; Zaghba, L.; Drid, S.; Rodriguez, J.; Abdelrahem, M. Performances analysis of three grid-tied large-scale solar PV plants in varied climatic conditions: A case study in Algeria. Sustainability 2023, 15, 14282.
6. Benmouiza, K. Solar zoning maps of Algeria based on sunshine duration data and kriging method. Int. J. Heat Technol. 2023, 41, 649–656.
7. Hazem, T.; Cheghib, H.; Fezzani, A.; Kahoul, N.; Sari, M.; Rashid, F.; Ameen, A.; Kezzar, M.; Mahariq, I. Performance evaluation of grid connected photovoltaic pilot plant in Saharan climate using experimental and numerical analysis. Sci. Rep. 2025, 15, 32488.
8. Dimd, B.; Løvsland, J.; Fosso, O. A review of machine learning-based photovoltaic output power forecasting models in Nordic climates. IEEE Access 2022, 10, 116832–116857.
9. Gao, F.; Li, X.; Sun, H. Flexible power point tracking for photovoltaic systems: Concepts, control strategies and future directions. Energy Convers. Manag. 2025, 310, 119805.
10. Radhi, S.; Al-Majidi, S.; Abbod, M.; Al-Raweshidy, H. Machine learning approaches for short-term photovoltaic power forecasting. Energies 2024, 17, 4301.
11. Uswarman, R.; Munawar, K.; Ramli, M.; Bouchekara, H.; Hossain, M. Maximum power point tracking in photovoltaic systems based on global sliding mode control with adaptive gain scheduling. Electronics 2023, 12, 1128.
12. Nusrat, M.; Mekhilef, S.; Mubin, M.; Ahmed, S.; Seyedmahmoudian, M.; Stojcevski, A.; Alshammari, O. Advancements in flexible power point tracking and power control strategies for photovoltaic power plants: A comprehensive review. Energy Rep. 2024, 12, 237–250.
13. Sharma, S.; Jately, V.; Kuchhal, P.; Kala, P.; Azzopardi, B. A comprehensive review of flexible power-point-tracking algorithms for grid-connected photovoltaic systems. Energies 2023, 16, 5679.
14. Gomez, J.; Shanmugam, P. Flexible power point tracking using a neural network for power reserve control in a grid-connected PV system. Energies 2022, 15, 8234.
15. Zhang, Y.; Ma, T.; Yang, H. A review on capacity sizing and operation strategy of grid-connected photovoltaic battery systems. Energy Built Environ. 2024, 5, 500–516.
16. Kassar, R.; Takash, A.; Faraj, J.; Khaled, M.; Ramadan, H. Phase change materials for enhanced photovoltaic panels performance: A comprehensive review and critical analysis. Energy Built Environ. 2025, 6, 655–675.
17. Kong, X.; Chen, Z.; Liu, W.; Ning, K.; Zhang, L.; Muhammad Marier, S.; Liu, Y.; Chen, Y.; Xia, F. Deep learning for time series forecasting: A survey. Int. J. Mach. Learn. Cybern. 2025, 16, 5079–5112.
18. Seghiour, A.; Bendjeddou, Y.; Mostefaoui, I.; Chouder, A.; Alharbi, H.; Humayd, A.; Wondimeneh, A.; Babqi, A. Fault detection and diagnosis in photovoltaic systems using artificial intelligence and time–frequency analysis. Sci. Rep. 2026, 16, 39386.
19. Mekri, A.; Seghiour, A.; Kaddour, F.; Boudouaoui, Y.; Chouder, A.; Silvestre, S. Novel hybrid analytical-metaheuristic optimization for efficient photovoltaic parameter extraction. Electronics 2025, 14, 4294.
20. Niu, Y.; Su, Y.; Tang, P.; Wang, Q.; Sun, Y.; Song, J. Estimation of solar irradiance under cloudy weather based on solar radiation model and ground-based cloud image. Energies 2025, 18, 757.
21. Hassan, A.; Atia, D.; El-Madany, H.; ElGhannam, F. Multi-label machine learning for power forecasting of a grid-connected photovoltaic solar plant over multiple time horizons. Sci. Rep. 2025, 15, 32676.
22. Markovics, D.; Mayer, M. Comparison of machine learning methods for photovoltaic power forecasting based on numerical weather prediction. Renew. Sustain. Energy Rev. 2022, 161, 112364.
23. Oprea, S.V.; Bâra, A. A stacked ensemble forecast for photovoltaic power plants combining deterministic and stochastic methods. Appl. Soft Comput. 2023, 147, 110781.
24. Singh, Y.; Dubey, S.; Rajput, P.; Singh, K.; Pandey, K. Real-time and modelled performance assessment and validation studies of PV modules operating in varied climatic zones. Energy Built Environ. 2025, 6, 585–595.
25. Zou, B.; Wang, J.; Wen, F. Optimal investment strategies for distributed generation in distribution networks with real option analysis. IET Gener. Transm. Distrib. 2017, 11, 804–813.
26. Elboughdiri, N.; Kriaa, K.; Bakare, M.; Abdulkarim, A.; Alaneme, G.; Maatki, C. Intelligent demand-side energy management via optimized ANFIS–gene expression programming in hybrid renewable–grid systems. Sci. Rep. 2025, 15, 43065.
27. Cisse, B.; Rashed, G.; Badjan, A.; Haider, H.; Gony, H.; Ershad, A. Optimized hybrid deep learning framework for reliable multi-horizon photovoltaic power forecasting in smart grids. Electricity 2026, 7, 4.
28. Nguyen Trong, T.; Vu Xuan Son, H.; Do Dinh, H.; Takano, H.; Nguyen Duc, T. Short-term PV power forecast using hybrid deep learning model and variational mode decomposition. Energy Rep. 2023, 9, 712–717.
29. Seghiour, A.; Abbas, H.; Chouder, A.; Rabhi, A. Deep learning method based on autoencoder neural network applied to faults detection and diagnosis of photovoltaic system. Simul. Model. Pract. Theory 2023, 123, 102704.
30. Hamad, S.; Ghalib, M.; Munshi, A.; Alotaibi, M.; Ebied, M. Evaluating machine learning models comprehensively for predicting maximum power from photovoltaic systems. Sci. Rep. 2025, 15, 10750.
31. Xiang, X.; Li, X.; Zhang, Y.; Hu, J. A short-term forecasting method for photovoltaic power generation based on the TCN-ECANet-GRU hybrid model. Sci. Rep. 2024, 14, 6744.
32. Benitez, I.; Singh, J. A comprehensive review of machine learning applications in forecasting solar PV and wind turbine power output. J. Electr. Syst. Inf. Technol. 2025, 12, 54.
33. Kim, E.; Jeon, Y.; Park, Y.; Park, S.; Oh, S. Evaluation of photovoltaic generation forecasting using model output statistics and machine learning. Energies 2026, 19, 486.
34. Ahmadi, M.; Aly, H.; Gu, J. A comprehensive review of AI-driven approaches for smart grid stability and reliability. Renew. Sustain. Energy Rev. 2026, 226, 116424.
35. Eren, Y.; Küçükdemiral, İ. A comprehensive review on deep learning approaches for short-term load forecasting. Renew. Sustain. Energy Rev. 2024, 189, 114031.
36. Kraska, P.; Hanzel, K. Comparison of electricity production prediction models based on meteorological data for photovoltaic farms in Poland—Challenges and problems. Solar 2026, 6, 16.
37. Rodriguez-Leguizamon, C.; López-Sotelo, J.; Cantillo-Luna, S.; López-Castrillón, Y. PV power generation forecasting based on XGBoost and LSTM models. In Proceedings of the 2023 IEEE Workshop on Power Electronics and Power Quality Applications (PEPQA), Cali, Colombia, 5–6 October 2023; pp. 1–6.
38. Cortez, J.; Terada, L.; Bandeira, B.; Soares, J.; Vale, Z.; Rider, M. Comparative analysis of ARIMA, LSTM, and XGBoost for very short-term photovoltaic forecasting. In Proceedings of the 2023 15th Seminar on Power Electronics and Control (SEPOC), Santa Maria, Brazil, 22–25 October 2023; pp. 1–6.
39. Mollasalehi, A.; Farhadi, A. Solar and wind power forecasting: A comparative review of LSTM, Random Forest, and XGBoost models. arXiv 2025, arXiv:2509.24059.
40. Iheanetu, K. Solar photovoltaic power forecasting: A review. Sustainability 2022, 14, 17005.
41. Singh, U.; Singh, S.; Gupta, S.; Alotaibi, M.; Malik, H. Forecasting rooftop photovoltaic solar power using machine learning techniques. Energy Rep. 2025, 13, 3616–3630.
42. Bin Yousuf, K.; Akter, A.; Noor, H.; Hoque, A.; Ahmed, A. A hybrid XGBoost–LSTM model with physics-informed features and uncertainty quantification for solar power forecasting. Energy Rep. 2026, 15, 109068.
43. Abumohsen, M.; Owda, A.; Owda, M.; Abumihsan, A. Hybrid machine learning model combining CNN–LSTM–RF for time series forecasting of solar power generation. e-Prime Adv. Electr. Eng. Electron. Energy 2024, 9, 100636.
44. Zi, X.; Liu, F.; Liu, M.; Wang, Y. A deep learning method for photovoltaic power generation forecasting based on a time-series dense encoder. Energies 2025, 18, 2434.
45. Bouziane, A.; Bouziane, M.; Khatir, N. Enhancing photovoltaic power forecasting through hybrid deep learning models: A CNN-RNN approach for grid stability and renewable energy optimization. J. Renew. Energies 2024, 1, 59–71.
46. Fan, Y.; Wu, H.; Lin, J.; Li, Z.; Li, L.; Huang, X.; Chen, W.; Chen, B. A distributed photovoltaic short-term power forecasting model based on lightweight AI for edge computing. J. Phys. Conf. Ser. 2024, 2876, 012050.
Asiedu, S.; Nyarko, F.; Boahen, S.; Effah, F.; Asaaga, B. Machine learning forecasting of solar PV production using single and hybrid models over different time horizons. Heliyon 2024, 10, e28898. [Google Scholar] [CrossRef]
Velimirovici, L.; Paulescu, E.; Paulescu, M. Short-term solar irradiance forecasting using random forest-based models with a focus on mountain locations. Energies 2026, 19, 769. [Google Scholar] [CrossRef]
Husein, M.; Gago, E.; Hasan, B.; Pegalajar, M. Towards energy efficiency: A comprehensive review of deep learning-based photovoltaic power forecasting strategies. Heliyon 2024, 10, e33419. [Google Scholar] [CrossRef]
Seraj, H.; Abbaspour, A.; Bahadori-Jahromi, A. Interpretable data-driven models for energy performance assessment in residential buildings. Sustainability 2026, 18, 457. [Google Scholar] [CrossRef]
Zaghba, L.; Khennane Benbitour, M.; Fezzani, A.; Mekhilef, S.; Borni, A. Comprehensive performance assessment of two grid-tied photovoltaic systems in a hot arid climate: A three-year theoretical and experimental analysis. Renew. Sustain. Energy Rev. 2025, 216, 115643. [Google Scholar] [CrossRef]
Salameh, T.; Farag, M.; Hamid, A.K.; Hussein, M. Adaptive neuro-fuzzy inference system for accurate power forecasting for on-grid photovoltaic systems: A case study in Sharjah, UAE. Energy Convers. Manag. X 2025, 26, 100958. [Google Scholar] [CrossRef]
Ispir, M.; Aksoy, M.; Kalyoncu, M. Estimation of solar radiation and photovoltaic power potential of Türkiye using ANFIS. J. King Saud Univ. Eng. Sci. 2025, 37, 2. [Google Scholar] [CrossRef]
Annapoorani, I.; Rajaguru, V.; Vedanjali, N.; Pappula, R. Solar forecasting for a PV-battery powered DC system. Heliyon 2023, 9, e20667. [Google Scholar] [CrossRef] [PubMed]
Mohammed, H.; Mohd-Mokhtar, R.; Ali, H. An optimal adaptive neuro-fuzzy inference system for photovoltaic power system optimization under partial shading conditions. Energy Syst. 2025, Online first. [CrossRef]
Chicaiza, W.; Topa, A.; Sánchez, A.; Escaño, J.; Álvarez, J. Model design for photovoltaic facilities based on fuzzy neural network as core of its digital twin. Energy Convers. Manag. 2025, 342, 120001. [Google Scholar] [CrossRef]
Sungrow Power Supply Co., Ltd. SG33CX/SG40CX/SG50CX Multi-MPPT String Inverter for 1000 V_dc System—Datasheet, Version 1.4/1.21; User Manual SG30CX/SG33CX/SG40CX/SG50CX; Logger1000 Quick Guide for SG30-50-110CX Inverters, TD_202006_V1.4; Zero-Export Commissioning Guide; Sungrow Power Supply Co., Ltd.: Hefei, China, 2022; originally issued 2019–2022. Available online: https://support.sungrowpower.com (accessed on 20 May 2026).

Figure 2. PV system performance on three representative days (15 July, 18 July, and 18 June 2025). Left panels (a,c,e): power balance between PV output, load, and grid. Right panels (b,d,f): corresponding solar irradiance, PV module temperature, and ambient temperature profiles.

Figure 3. Correlation heatmap of key operational parameters in the photovoltaic system.

Figure 4. P–V characteristic of the PV array showing the two feasible FPPT operating voltages $V_{left} < V_{mpp} < V_{right}$ for a given curtailed reference $P_{ref} < P_{MPP}$. The branch selection is a property of the curtailment algorithm embedded in the inverter firmware and is not disclosed by the manufacturer for the SG50CX [57].
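
The two-branch behaviour described in the Figure 4 caption can be illustrated numerically. The following is a minimal Python sketch (the paper's own models were built in MATLAB) on a deliberately simplified, hypothetical P–V curve, $I(V) = I_{sc}(1 - e^{(V - V_{oc})/a})$, with $V_{oc}$ and $I_{sc}$ borrowed from Table 2 and an assumed shape parameter `a`. It is not the inverter's curtailment algorithm, which is undisclosed; all function names are illustrative. The MPP is located by ternary search and each monotone branch is then bisected for the voltage that delivers the curtailed reference power.

```python
import math

V_OC = 50.01  # open-circuit voltage from Table 2 (V)
I_SC = 11.45  # short-circuit current from Table 2 (A)
A = 3.0       # hypothetical curve-shape parameter (V), for illustration only


def power(v: float) -> float:
    """Toy P-V curve: P = V * I_sc * (1 - exp((V - V_oc) / a))."""
    return v * I_SC * (1.0 - math.exp((v - V_OC) / A))


def find_mpp(lo: float = 0.0, hi: float = V_OC, iters: int = 200) -> float:
    """Ternary search for the maximiser of the (concave) toy curve."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if power(m1) < power(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def bisect_power(p_ref: float, lo: float, hi: float, iters: int = 200) -> float:
    """Solve power(v) = p_ref on a branch where power is monotone."""
    rising = power(hi) > power(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # On a rising branch a too-low power means the root lies to the right;
        # on a falling branch it means the root lies to the left.
        if (power(mid) < p_ref) == rising:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fpp_voltages(p_ref: float) -> tuple[float, float, float]:
    """Return (v_left, v_mpp, v_right) for a curtailed reference p_ref < P_MPP."""
    v_mpp = find_mpp()
    if not 0.0 < p_ref < power(v_mpp):
        raise ValueError("p_ref must lie strictly between 0 and P_MPP")
    v_left = bisect_power(p_ref, 0.0, v_mpp)
    v_right = bisect_power(p_ref, v_mpp, V_OC)
    return v_left, v_mpp, v_right


if __name__ == "__main__":
    p_ref = 0.7 * power(find_mpp())  # curtail to 70 % of the toy MPP power
    v_left, v_mpp, v_right = fpp_voltages(p_ref)
    print(f"V_left={v_left:.2f} V, V_mpp={v_mpp:.2f} V, V_right={v_right:.2f} V")
```

Both returned voltages deliver the same power on this toy curve, so a predictor of the flexible power point alone cannot tell which branch the firmware has chosen.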

Figure 5. Distributed FPPT control loop of the studied plant.

Figure 6. Calendar-day membership of the three validation strategies.

Figure 7. Diagnostic six-panel figure for ANFIS under the three complementary validation strategies: (a) scatter of predicted vs. measured PV power under S1 (random split), $R^{2} = 0.9992$; (b) scatter under S2 (chronological split), $R^{2} = 0.9998$; (c) scatter under S3 (external 14-day hold-out, 11–24 July); (d) histogram of prediction residuals on the external hold-out (S2 model); (e) measured and predicted PV power time series over the S2 internal test set (1–10 July); and (f) measured and predicted PV power time series over the S3 external hold-out (11–24 July).

Figure 8. Detail of the ANFIS time-series tracking on the S1 test partition. Measured FPP (black line) and ANFIS prediction (red line) overlap closely across diurnal cycles, including during partial-cloud transients.

Figure 9. ANFIS model performance across the three data partitions. Measured vs. predicted PV power for: (a) S1 random split ($R^{2} = 0.9992$, RMSE = 654 W); (b) S2 chronological split ($R^{2} = 0.9998$, RMSE = 325 W); (c) S3 external generalisation ($R^{2} = 0.9997$, RMSE = 409 W). Residuals vs. predicted PV power for: (d) S2 chronological (internal) test set; (e) S3 external test set. Empirical residual histograms for: (f) S2 chronological (internal) test set; (g) S3 external test set.

Figure 10. Cross-model, cross-strategy comparison of the coefficient of determination $R^{2}$.

Figure 11. Cross-model, cross-strategy comparison of RMSE (in watts).

Figure 12. Cross-model, cross-strategy comparison of MAE (top) and NRMSE (bottom).

Figure 13. Multi-strategy validation panel for the SVM model. Measured vs. predicted PV power for: (a) S1 random split ($R^{2} = 0.9910$); (b) S2 chronological split ($R^{2} = 0.9714$); (c) S3 external generalisation ($R^{2} = 0.9490$). (d) Histogram of prediction residuals evaluated on the S3 external set using the S2-trained model; residuals are defined as measured − predicted power. (e) Time-series comparison of measured and predicted PV power over the S2 internal test period (1–10 July). (f) Time-series comparison over the S3 external test period (11–24 July).

Figure 14. Multi-strategy validation panel for the Decision Tree model. Measured vs. predicted PV power for: (a) S1 random split ($R^{2} = 0.8993$); (b) S2 chronological split ($R^{2} = 0.8800$); (c) S3 external generalisation ($R^{2} = 0.8774$). (d) Histogram of prediction residuals evaluated on the S3 external set using the S2-trained model; residuals are defined as measured − predicted power. (e) Time-series comparison of measured and predicted PV power over the S2 internal test period (1–10 July). (f) Time-series comparison over the S3 external test period (11–24 July).

Figure 15. Multi-strategy validation panel for the Random Forest model. Measured vs. predicted PV power for: (a) S1 random split ($R^{2} = 0.9400$); (b) S2 chronological split ($R^{2} = 0.9279$); (c) S3 external generalisation ($R^{2} = 0.9092$). (d) Histogram of prediction residuals evaluated on the S3 external set using the S2-trained model; residuals are defined as measured − predicted power. (e) Time-series comparison of measured and predicted PV power over the S2 internal test period (1–10 July). (f) Time-series comparison over the S3 external test period (11–24 July).

Figure 16. Multi-strategy validation panel for the KNN model. Measured vs. predicted PV power for: (a) S1 random split ($R^{2} = 0.9660$); (b) S2 chronological split ($R^{2} = 0.9562$); (c) S3 external generalisation ($R^{2} = 0.9133$). (d) Histogram of prediction residuals evaluated on the S3 external set using the S2-trained model; residuals are defined as measured − predicted power. (e) Time-series comparison of measured and predicted PV power over the S2 internal test period (1–10 July). (f) Time-series comparison over the S3 external test period (11–24 July).

Figure 17. Multi-strategy validation panel for the DNN model. Measured vs. predicted PV power for: (a) S1 random split ($R^{2} = 0.9955$); (b) S2 chronological split ($R^{2} = 0.9948$); (c) S3 external generalisation ($R^{2} = 0.9875$). (d) Histogram of prediction residuals evaluated on the S3 external set using the S2-trained model; residuals are defined as measured − predicted power. (e) Time-series comparison of measured and predicted PV power over the S2 internal test period (1–10 July). (f) Time-series comparison over the S3 external test period (11–24 July).

Figure 18. Learned Gaussian membership functions for the two fuzzy rules of the trained ANFIS model. Blue: Rule 1; Orange: Rule 2. The five subplots correspond to the five input variables: grid power $P_{grid}$, module temperature $T_{PV}$, ambient temperature $T_{amb}$, load power $P_{load}$, and solar irradiance $G$.

Table 2. PV module characteristics.

Parameter	Specification	Unit
Electrical Characteristics
Rated Power ($P_{stc}$)	460	Wp
Open Circuit Voltage ($V_{oc}$)	50.01	V
Maximum Power Voltage ($V_{mp}$)	42.13	V
Short Circuit Current ($I_{sc}$)	11.45	A
Maximum Power Current ($I_{mp}$)	10.92	A
Cell Technology	Monocrystalline Silicon (sc-Si)	–
Total Cell Count	144	cells
Physical Dimensions
Length	2112	mm
Width	1052	mm
Module Surface Area	2.22	m²
Performance Coefficients
Power Temperature Coefficient	−0.35	%/°C
Current Temperature Coefficient	0.044	%/°C
Voltage Temperature Coefficient	−0.272	%/°C
Normal Operating Cell Temp (NOCT)	45	°C
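
The entries of Table 2 can be cross-checked with a few lines of arithmetic. The Python sketch below (an illustration, not part of the authors' MATLAB workflow) verifies that $V_{mp} \cdot I_{mp}$ reproduces the rated power, that the stated dimensions give the listed surface area, and that the 117.76 kWp plant rating quoted in the abstract corresponds to a whole number of 460 Wp modules. The `module_power` helper applies the common first-order approximation $P \approx P_{stc}\,(G/1000)\,[1 + \gamma\,(T - 25)]$ with the power temperature coefficient from the table; the paper does not state that this formula is used anywhere in its pipeline, so it is an assumption made here only to show what the coefficient means.

```python
P_STC_W = 460.0                   # rated module power (Table 2)
V_MP, I_MP = 42.13, 10.92         # maximum-power voltage (V) and current (A)
LENGTH_MM, WIDTH_MM = 2112, 1052  # module dimensions (Table 2)
GAMMA_P = -0.35 / 100.0           # power temperature coefficient, per degree C
PLANT_WP = 117_760.0              # plant rating quoted in the abstract (117.76 kWp)


def module_power(g_wm2: float, t_module_c: float) -> float:
    """First-order estimate (assumed here): linear in irradiance, derated with temperature."""
    return P_STC_W * (g_wm2 / 1000.0) * (1.0 + GAMMA_P * (t_module_c - 25.0))


n_modules = PLANT_WP / P_STC_W             # implied module count
area_m2 = LENGTH_MM * WIDTH_MM / 1e6       # module surface area in square metres
p_mp_w = V_MP * I_MP                       # power at the maximum power point

if __name__ == "__main__":
    print(f"modules: {n_modules:.0f}")
    print(f"area: {area_m2:.2f} m^2")
    print(f"V_mp * I_mp: {p_mp_w:.2f} W")
    print(f"P at 1000 W/m^2 and 45 C module temperature: {module_power(1000.0, 45.0):.1f} W")
```

Under these assumptions a module at 1000 W/m² and 45 °C delivers about 7% less than its rated power, which is the order of the temperature effect the module-temperature input lets the model capture.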

Table 3. Statistical characteristics of the dataset.

Parameter	Count	Mean	Std	Min	25%	50%	75%	Max
PV power (W)	24,479	17,380.14	23,700.51	0	0	2516	29,428	106,295
Grid power (W)	24,479	33,491.99	48,925.41	0	6000	19,750	28,500	380,750
Load power (W)	24,479	50,684.59	63,859.69	0	19,082.5	26,000	38,886.5	420,294
Amb. Temp. (°C)	24,479	24.61	4.71	12.7	21.5	24.6	27.9	39.6
Irradiance (W/m²)	24,479	284.13	336.10	0	0	95	583.75	1206
Mod. Temp. (°C)	24,479	30.04	13.11	9.5	20.1	24.6	42.1	69.1

Table 4. Dataset summary for the three validation strategies.

Property	S1 (Random)	S2 (Chronological)	S3 (External Hold-Out)	Total Dataset
Training days	60	50	— (uses S1 or S2)	—
Validation days	—	11	—	—
Test days	11	10	14 (11–24 July)	—
Date range (test)	uniform [1 May–10 July]	1–10 July	11–24 July	1 May–24 July
Test samples	~3168	~2880	4032	24,479
Sampling interval	5 min
Number of features	5 ($P_{grid}$, $T_{PV}$, $T_{amb}$, $P_{load}$, $G$)
Target variable	$P_{FPP}$ (W)
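
The sample counts in Table 4 follow directly from the 5 min sampling interval. The short Python sketch below reproduces them from the day counts; the dictionary layout and the function name are illustrative choices, not taken from the paper. The counts are nominal (288 slots per full calendar day), which is why the table marks the S1 and S2 figures as approximate.

```python
SAMPLE_INTERVAL_MIN = 5
SAMPLES_PER_DAY = 24 * 60 // SAMPLE_INTERVAL_MIN  # 288 slots per calendar day

# Day counts per partition, transcribed from Table 4.
PARTITIONS = {
    "S1": {"train": 60, "test": 11},
    "S2": {"train": 50, "validation": 11, "test": 10},
}
EXTERNAL_DAYS = 14  # S3 hold-out, 11-24 July
TOTAL_DAYS = 85     # 1 May to 24 July 2025


def nominal_samples(days: int) -> int:
    """Number of 5 min slots in the given number of full days."""
    return days * SAMPLES_PER_DAY


if __name__ == "__main__":
    for name, days in PARTITIONS.items():
        used = sum(days.values()) + EXTERNAL_DAYS
        print(name, {k: nominal_samples(v) for k, v in days.items()}, "days used:", used)
    print("S3 test samples:", nominal_samples(EXTERNAL_DAYS))
    print("nominal total:", nominal_samples(TOTAL_DAYS))
```

Both strategies account for all 85 days once the 14-day hold-out is added, and 85 full days hold 24,480 slots, one more than the 24,479 valid samples reported.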

Table 5. ANFIS sensitivity analysis: effect of the cluster influence range (radius) on the number of fuzzy rules and on predictive performance under S1 and S2 (RMSE in W).

Radius	S1—Random Split					S2—Chronological Split
Radius	Rules	R² Train	RMSE Train	R² Test	RMSE Test	Rules	R² Train	RMSE Train	R² Test	RMSE Test
0.2	2	0.9973	1240.7	0.9992	653.6	2	0.9965	1344.4	0.9998	325.4
0.3	2	0.9979	1100.3	0.9989	791.3	2	0.9964	1370.4	0.9998	323.7
0.5	2	0.9972	1255.8	0.9992	683.5	2	0.9963	1388.8	0.9998	345.4
0.7	2	0.9978	1109.2	0.9988	817.1	2	0.9971	1239.3	0.9998	402.9
0.9	2	0.9979	1102.0	0.9988	805.1	2	0.9971	1230.5	0.9998	398.9
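
Table 5 can be read programmatically to see how flat the sensitivity to the radius is. The Python sketch below transcribes the test RMSE columns and reports the radius with the lowest value under each strategy. Ranking by test RMSE is used here only as a way of reading the table; this section does not state the criterion by which the authors chose the radius of 0.2 listed in Table 6, and the names `TABLE5` and `best_radius` are illustrative.

```python
# (radius, S1 test RMSE, S2 test RMSE) transcribed from Table 5, in watts.
TABLE5 = [
    (0.2, 653.6, 325.4),
    (0.3, 791.3, 323.7),
    (0.5, 683.5, 345.4),
    (0.7, 817.1, 402.9),
    (0.9, 805.1, 398.9),
]
S1_COL, S2_COL = 1, 2


def best_radius(column: int) -> float:
    """Radius whose test RMSE is lowest in the given column of TABLE5."""
    return min(TABLE5, key=lambda row: row[column])[0]


def rmse_at(radius: float, column: int) -> float:
    """Test RMSE for a given radius and strategy column."""
    return next(row[column] for row in TABLE5 if row[0] == radius)


if __name__ == "__main__":
    print("lowest S1 test RMSE at radius", best_radius(S1_COL))
    print("lowest S2 test RMSE at radius", best_radius(S2_COL))
    gap = rmse_at(0.2, S2_COL) - rmse_at(best_radius(S2_COL), S2_COL)
    print(f"S2 penalty of using radius 0.2: {gap:.1f} W")
```

On these figures the radius 0.2 is the best S1 entry and lies within 2 W of the best S2 entry, while every radius yields the same two-rule structure.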

Table 6. Complete hyperparameter configuration for all six models.

Model	Hyperparameter	Search Space/Options	Selected Value
ANFIS	Clustering method	Subtractive clustering (`genfis`)	Subtractive
	Cluster influence range (radius)	{0.2, 0.3, 0.5, 0.7, 0.9} (Table 5)	0.2
	Generated fuzzy rules (R)	Determined automatically by clustering	2
	MF type (input)	Gaussian (`gaussmf`, MATLAB default)	Gaussian
	MF type (output)	Sugeno first-order linear	Linear
	Learning algorithm	Hybrid (LSE + gradient descent)	Hybrid
	Training epochs	{200, 500, 1000}	1000
	Error tolerance (stopping criterion)	$10^{- 6}$	$10^{- 6}$
SVM (SVR)	Kernel type	RBF (Gaussian), linear, polynomial	RBF (Gaussian)
	Kernel scale ($γ$)	{`auto`, 0.1, 1, 10, 100}	`auto`
	Regularization parameter (C)	{0.1, 1, 10, 100, 1000}	1
	Feature standardization	Enabled (`Standardize = true`)	Applied
	MATLAB function	`fitrsvm`	`fitrsvm`
Decision Tree	Splitting criterion	MSE (variance reduction)	MSE
	Maximum number of splits	{10, 50, 100, 200, `Inf`}	100
	Minimum samples per leaf	{1, 2, 5, 10}	5
	MATLAB function	`fitrtree`	`fitrtree`
Random Forest	Number of trees (B)	{50, 100, 200, 500}	100
	Method	Regression (`TreeBagger`)	Regression
	Minimum samples per leaf	{1, 2, 5, 10}	5
	MATLAB function	`TreeBagger`	`TreeBagger`
KNN	Number of neighbors (k)	{1, 3, 5, 7, 10}	3
	Distance metric	Euclidean, Manhattan, Minkowski	Euclidean
	Feature standardization	Enabled (`Standardize = true`)	Applied
	Fallback implementation	Manual brute-force search (if `fitrknn` unavailable)	Automatic
	MATLAB function	`fitrknn` (with brute-force fallback)	`fitrknn`
DNN	Number of hidden layers	2, 3, 4	3
	Neurons per hidden layer ($h_{1}, h_{2}, h_{3}$)	{32, 64, 128, 256}³	(128, 64, 32)
	Activation function	ReLU, Tanh, Sigmoid	ReLU
	Dropout rate (p)	{0.0, 0.1, 0.2, 0.3}	0.2 (layers 1–2 only)
	Optimizer	Adam, SGD, RMSProp	Adam
	Maximum epochs	{50, 100, 200, 500}	100
	Mini-batch size	{16, 32, 64, 128}	32
	Shuffle strategy	Every epoch	Every epoch
	Input normalization	Min–max scaling to $[0, 1]$	Applied
	MATLAB function	`trainNetwork` (Deep Learning Toolbox)	`trainNetwork`
Software and implementation details
All models	Platform	MATLAB R2024b	—
ANFIS	Toolbox	Fuzzy Logic Toolbox (`anfis`, `anfisOptions`, `genfis`)	—
SVM, DT, RF, KNN	Toolbox	Statistics and Machine Learning Toolbox	—
DNN	Toolbox	Deep Learning Toolbox (`trainNetwork`)	—
All models	Random seed	`rng(42)` (set globally)	—

Table 7. ANFIS performance under the three complementary validation strategies.

Strategy	Evaluation Set	$R^{2}$	RMSE (W)	MAE (W)	NRMSE (%)
S1 (random)	Test (11 days)	0.9992	653.62	276.90	3.86
S1 (random)	External 14 d	0.9998	363.45	147.11	1.98
S2 (chronological)	Test (1–10 July)	0.9998	325.40	119.17	1.51
S2 (chronological)	External 14 d	0.9997	408.50	158.17	2.23

Table 8. Complete cross-strategy performance assessment of all six models.

Model	Strategy	Test (Own Strategy)				External 14-Day Hold-Out
Model	Strategy	$R^{2}$	RMSE (W)	MAE (W)	NRMSE (%)	$R^{2}$	RMSE (W)	MAE (W)	NRMSE (%)
ANFIS	S1 random	0.9992	653.62	276.90	3.86	0.9998	363.45	147.11	1.98
ANFIS	S2 chronological	0.9998	325.40	119.17	1.51	0.9997	408.50	158.17	2.23
SVM	S1 random	0.9910	2224.79	1173.46	13.14	0.9776	3445.66	2034.82	18.78
SVM	S2 chronological	0.9714	4331.97	1651.83	20.08	0.9490	5198.17	2688.50	28.34
Decision Tree	S1 random	0.8993	7440.15	2097.98	43.94	0.9772	3476.52	1756.49	18.95
Decision Tree	S2 chronological	0.8800	8879.71	3057.80	41.17	0.8774	8058.41	3128.58	43.93
Random Forest	S1 random	0.9400	5744.77	1857.59	33.93	0.9743	3687.08	1773.76	20.10
Random Forest	S2 chronological	0.9279	6879.61	2611.53	31.89	0.9092	6935.00	3095.02	37.81
KNN	S1 random	0.9660	4325.32	1901.70	25.55	0.9537	4951.80	2754.17	27.00
KNN	S2 chronological	0.9562	5364.56	2327.36	24.87	0.9133	6778.42	3824.00	36.95
DNN	S1 random	0.9955	1576.21	1181.80	9.31	0.9972	1226.42	873.25	6.69
DNN	S2 chronological	0.9948	1849.78	1346.32	8.58	0.9875	2576.15	2116.71	14.04
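
For readers who wish to recompute the four scores reported in Tables 7 and 8 on their own data, the following Python sketch gives one plausible implementation. Residuals are taken as measured − predicted, as in the figure captions. The NRMSE normaliser is an assumption: the sketch divides the RMSE by the mean measured power, which would be compatible with the two external hold-out rows of Table 7 (363.45 W at 1.98% and 408.50 W at 2.23% both imply a normaliser of about 18.3 to 18.4 kW), but the definition is not stated in this section. The function name and the toy data are illustrative.

```python
import math


def regression_metrics(measured: list[float], predicted: list[float]) -> dict[str, float]:
    """Return R^2, RMSE, MAE and NRMSE (percent of mean measured value, assumed)."""
    n = len(measured)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(measured) / n
    residuals = [y - p for y, p in zip(measured, predicted)]  # measured - predicted
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    if ss_tot == 0.0 or mean_y == 0.0:
        raise ValueError("measured values must vary and have a non-zero mean")
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "mae": sum(abs(r) for r in residuals) / n,
        "nrmse_pct": 100.0 * rmse / mean_y,
    }


if __name__ == "__main__":
    # Toy values in watts, chosen only to make the arithmetic easy to follow.
    measured = [0.0, 10_000.0, 20_000.0, 30_000.0]
    predicted = [500.0, 9_500.0, 20_500.0, 29_500.0]
    for name, value in regression_metrics(measured, predicted).items():
        print(f"{name}: {value:.4f}")
```

Because the normaliser depends on the evaluation set, NRMSE values are comparable across models on the same set but not directly across different test partitions.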

Table 10. Gaussian membership function parameters of the trained ANFIS model: centers $c_{j,k}$ and widths $σ_{j,k}$ for each input variable and fuzzy rule.

Rule	$P_{grid}$ (W)		$T_{PV}$ (°C)		$T_{amb}$ (°C)		$P_{load}$ (W)		G (W/m²)
Rule	$c$	$σ$	$c$	$σ$	$c$	$σ$	$c$	$σ$	$c$	$σ$
R1	26,923.1	25,500.0	5.5	18.54	4.8	22.24	29,719.3	25,500.0	86.8	+0.10
R2	26,923.1	13,750.0	6.4	16.50	3.2	19.83	29,719.3	13,750.0	83.7	−0.11

Table 11. First-order Sugeno consequent coefficients $β_{j,k}$ of the trained ANFIS model.

Rule	$β_{j, 0}$	$β_{j, P_{grid}}$	$β_{j, T_{PV}}$	$β_{j, T_{amb}}$	$β_{j, P_{load}}$	$β_{j, G}$
R1	−0.986	−26.154	+46.377	+0.985	+1.797	−542.581
R2	−0.801	+36.841	−29.210	+0.787	+13.602	+323.857
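
The structure that Tables 10 and 11 parameterise, Gaussian membership functions per input and rule followed by first-order Sugeno consequents with an intercept $β_{j,0}$, can be evaluated in a few lines. The Python sketch below follows the standard ANFIS forward pass (product of memberships as rule firing strength, normalisation, weighted sum of linear rule outputs) and uses the usual Gaussian form $\exp(-(x-c)^2 / (2σ^2)$), which is also the form of MATLAB's `gaussmf`. It is an independent illustration rather than the authors' code: the function names are hypothetical, and the numerical values of Tables 10 and 11 are deliberately not reused because their scaling (raw versus normalized units) is not stated in this section. The toy parameters are invented for the example.

```python
import math


def gaussmf(x: float, c: float, sigma: float) -> float:
    """Gaussian membership value exp(-(x - c)^2 / (2 sigma^2))."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def anfis_predict(x, centers, sigmas, betas) -> float:
    """First-order Sugeno output.

    x        -- list of n input values
    centers  -- centers[j][k]: MF center of input k in rule j
    sigmas   -- sigmas[j][k]: MF width of input k in rule j
    betas    -- betas[j]: [intercept, coefficient for input 0, ..., input n-1]
    """
    firing = [
        math.prod(gaussmf(xi, c, s) for xi, c, s in zip(x, cj, sj))
        for cj, sj in zip(centers, sigmas)
    ]
    total = sum(firing)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    rule_out = [b[0] + sum(bk * xk for bk, xk in zip(b[1:], x)) for b in betas]
    return sum(w * f for w, f in zip(firing, rule_out)) / total


def n_parameters(n_rules: int, n_inputs: int) -> int:
    """Premise parameters (c, sigma per MF) plus consequent coefficients."""
    return n_rules * n_inputs * 2 + n_rules * (n_inputs + 1)


if __name__ == "__main__":
    # Hypothetical two-input, two-rule system with constant rule outputs.
    centers = [[0.0, 0.0], [1.0, 1.0]]
    sigmas = [[0.5, 0.5], [0.5, 0.5]]
    betas = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
    print(anfis_predict([0.5, 0.5], centers, sigmas, betas))
    print(n_parameters(n_rules=2, n_inputs=5))
```

With two rules and five inputs, as in the trained model, this structure has 20 premise parameters and 12 consequent coefficients, 32 in total.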

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Boudouaoui, Y.; Seghiour, A.; Tadjeddine, A.A.; Mekri, A.; Kaddour, F.; Mostefaoui, I.M.; Chouder, A.; Rabhi, A. Adaptive Neuro-Fuzzy Inference System for High-Accuracy Flexible Power Point Prediction in Utility-Scale Grid-Connected Photovoltaic Plants. Electronics 2026, 15, 2430. https://doi.org/10.3390/electronics15112430
