1. Introduction
The global transition to renewable energy sources has made photovoltaic (PV) systems a cornerstone of sustainable energy generation [1,2,3]. However, the efficiency and reliability of PV systems are frequently challenged by faults, which can lead to significant power losses, reduced system performance, and increased maintenance costs. Accurate and reliable fault detection is therefore crucial for optimizing the operational efficiency of PV systems and ensuring their long-term viability. A robust PV array model plays a key role in effective monitoring and fault diagnosis, serving as the foundation for identifying deviations from expected performance. Despite advances in PV technology, fault detection under real-world conditions remains a complex problem [4]. This is primarily due to environmental variability and the diverse nature of potential faults, including sensor malfunctions that mask the true performance of the system.
While numerous fault detection and diagnosis techniques have been developed for PV systems, many of these methods still face significant challenges when applied to large-scale installations. For instance, existing methods often struggle to efficiently handle the scale and complexity of large PV arrays, which may include thousands of modules operating under dynamic environmental conditions.
A primary challenge in this context is the accurate prediction of PV system output, which is crucial for reliable system modeling and fault detection because it provides the expected energy generation under varying environmental conditions against which measurements can be compared. The energy generation of solar PV systems is heavily influenced by dynamic meteorological factors and site-specific environmental parameters, such as solar irradiation, wind velocity, ambient temperature, cloud cover, and module operating temperature [5]. These factors vary over time, making it difficult to predict energy yield precisely. Effective prediction of solar PV output is therefore essential not only for fault detection but also for optimizing grid management strategies, ensuring a stable power supply, and enabling the seamless integration of renewable energy into existing electrical infrastructure.
To predict solar PV output power, various methods have been developed, generally falling into three categories: physically based models [6,7], data-driven statistical models [8], and hybrid systems combining both approaches [9]. Physically based models simulate energy conversion from solar irradiation to electrical output using deterministic equations and rely on meteorological variables such as solar irradiation and temperature [10]. While these models are accurate under stable conditions, they struggle during periods of rapid environmental change. By contrast, data-driven statistical models learn patterns from historical data without explicitly modeling system physics, providing flexibility across diverse scenarios. Hybrid systems, which integrate both physical and statistical approaches, show promise for enhancing prediction accuracy under varying environmental conditions.
In recent years, data-driven statistical models have become essential in PV power prediction owing to their ability to capture the complex relationships between environmental factors and system performance. Within this category, Machine Learning (ML) techniques, a subset of Artificial Intelligence (AI), have demonstrated considerable potential for accurate PV output prediction. Methods such as artificial neural networks (ANN) [11], long short-term memory (LSTM) networks [12], and support vector machines (SVM) are widely used in PV output prediction. Advanced neural architectures such as multilayer perceptron neural networks (MLPNN), convolutional neural networks (CNN), and gated recurrent units (GRU) have proven capable of learning complex, nonlinear patterns from historical data [13]. Additionally, ensemble learning methods such as Random Forest (RF) [14,15] and instance-based methods such as K-Nearest Neighbors (KNN) [16,17,18] have gained attention for their interpretability and robustness.
The second challenge lies in choosing the most suitable fault detection and diagnosis methods, since their effectiveness directly determines how quickly faults are identified and classified, which is essential for minimizing losses. To address these challenges, various fault detection approaches have been proposed in the literature, including model-based methods [19,20,21], conventional threshold-based approaches [22,23,24], and machine learning techniques [25,26,27,28]. However, many of these methods rely heavily on a substantial amount of labeled training data, which is often difficult to obtain in practice. Furthermore, these methods sometimes fail to fully capture the correlation between subtle power losses and fault conditions, limiting their ability to identify underlying faults in a timely and accurate manner.
In traditional threshold-based methods, fault detection and diagnosis are typically performed by analyzing electrical parameters such as operating current, voltage, and generated output power. For example, Chouder et al. [29] proposed an approach for supervising and detecting faults in PV systems through power loss analysis. This method introduces four new indicators for fault detection and supervision: current ratio, voltage ratio, thermal capture losses, and miscellaneous capture losses. Additionally, Taghezouit et al. [30] presented a method for fault diagnosis in PV systems using behavioral modeling and performance analysis within the LabVIEW environment, applied to a 9.54 kWp grid-connected system. The technique enhances reliability through a diagnostic tool based on performance loss rates (PLR) and demonstrates high prediction accuracy, with an R² value of 0.99 for variables such as DC and AC power. However, a key disadvantage of the approach is its reliance on time-consuming parameter calibration: the parametric models require careful identification and adjustment of parameters for each specific PV installation, which can be resource-intensive and limits the method's scalability and practical application.
Regarding fault detection and identification based on machine learning techniques, several studies have been reported in the literature. For example, W. Chine et al. [31] presented a fault diagnosis technique for PV systems using Artificial Neural Networks (ANN), which compares attributes such as current, voltage, and I-V characteristics under varying conditions with field measurements. Validated with data from the Renewable Energy Laboratory in Algeria, the method showed high accuracy and can be implemented on an FPGA for real-time monitoring. However, the approach has limitations, including a reliance on accurate simulated data, which may not always reflect real-world conditions, and the difficulty of using machine learning for classification given the need for large, diverse datasets. Additionally, environmental variability may affect its generalization across different settings, which requires further research. More recently, Ledmaou et al. [32] introduced a convolutional neural network (CNN) model designed to classify anomalies in solar photovoltaic panels, such as dust accumulation and physical damage, achieving high accuracy and specificity. Their work leverages data augmentation and transfer learning from the VGG16 architecture to enhance model performance. However, the study notes several limitations, including the reliance on image-based data, which are sensitive to environmental factors such as lighting, and the model's limited ability to classify rare or unseen anomalies due to the limited diversity of the training data. Moreover, many of these methods do not directly address sensor faults, such as misaligned pyranometers, which can lead to inaccurate irradiation measurements and compromise fault detection accuracy.
To address these challenges, this paper presents a novel approach for fault detection and diagnosis in large-scale PV systems. It is based on the analysis of miscellaneous capture loss errors, as well as DC voltage and current errors, using predictive models built on the RF and KNN algorithms. The proposed methodology models the system's healthy behavior under normal operating conditions, which serves as a benchmark for identifying deviations caused by faults. By modeling the expected performance of the PV system, the predictive models (RF and KNN) simulate the healthy behavior of the system in real time. The measured data are then compared with the model predictions, and fault-induced discrepancies in power losses are detected when they exceed predefined reference thresholds. The fault type is subsequently identified by analyzing the errors in DC voltage and current between the measured data and the predictive models. Additionally, this paper proposes a method for detecting faulty sensors, in particular the misalignment of pyranometers and environmental measurement stations, which can lead to inaccurate irradiation measurements.
The main contributions of this work are as follows: (1) The integration of machine learning techniques with the analysis of power loss errors and current/voltage errors to detect and identify faults not only in PV modules but also in the irradiation sensor. While the machine learning models avoid the need for large databases by making predictions from reduced data samples, the analysis of power loss, voltage, and current errors provides robust and effective fault detection and identification. (2) A novel method for correcting erroneous datasets, addressing the environmental sensor inaccuracies that often hinder reliable performance prediction in PV systems. (3) A comparative performance study with other machine learning techniques to justify the choice of the RF and KNN predictive models. (4) The development and validation of various data-driven models for a large-scale, real-world grid-connected PV installation with a capacity of 500 kWp, together with the validation of the fault detection and identification method using experimental data from the station.
This paper is organized as follows: Section 2 presents an overview of the studied site and describes the PV power plant. Section 3 outlines the predictive models, including data pre-processing, tilt irradiation correction, and feature selection. Section 4 details the fault detection and diagnosis method based on the analysis of errors in power losses, DC voltage, and DC current. Finally, Section 5 discusses the results, and Section 6 presents the main conclusions of this study.
2. Experimental Setup Description
The dataset used in this study was collected from a grid-connected, ground-mounted photovoltaic (PV) system in Ain El-Melh, located in the Algerian highlands near the desert region. Positioned at 34°51′ N latitude and 04°11′ E longitude, at an elevation of 910 m above sea level, the PV plant is connected to the medium-voltage network of Ain El-Melh.
Part of a broader 400 MWp renewable energy initiative managed by SKTM, a subsidiary of Sonelgaz, this PV system contributes to Algeria's commitment to advancing renewable energy. Under the Algerian government's renewable energy directive, Sonelgaz has developed 23 PV power plants in the highlands and central regions. Spanning 40 hectares, the Ain El-Melh facility has an installed capacity of 20 MWp. The system uses polycrystalline silicon modules with an efficiency of 15%, installed on fixed, south-facing structures at a 33° tilt to maximize solar exposure. To minimize inter-row shading and enhance energy capture, the rows of modules are spaced 5 m apart.
The solar park consists of 80,080 polycrystalline PV modules (250 Wp each) organized into 40 identical 500 kW sub-fields. Each sub-field consists of 1936 modules distributed over 88 strings (22 modules connected in series per string). The modules of each sub-field are connected to a 500 kW SUNGROW inverter (500–850 VDC input, 315 VAC output). This configuration is replicated across all sub-fields, with every two sub-fields (1 MWp in total) sharing a 1250 kVA step-up transformer, as depicted in Figure 1. The electrical connection between the photovoltaic modules and their respective 500 kW inverter cabinets is made via 11 junction boxes (level 1), 3 parallel boxes (level 2), and 1 general box (level 3), all housed in shelters. This three-tier grouping of boxes reduces the total length of DC cables and minimizes ohmic losses. It also facilitates operation and maintenance (O&M). The generated AC power is then transmitted via 60 kV overhead power lines to the national grid, with a standardized layout for the 1 MWp blocks. This repeatable design ensures efficient energy flow from the modules through the inverters and transformers. This study focuses on the analysis, modeling, and fault diagnosis of the PV modules and DC power generation of a single 500 kW sub-field.
The electrical and environmental dataset was collected from the inverter cabinet junction boxes. It includes a comprehensive range of parameters, such as solar panel temperature, tilt irradiation, total irradiation, diffuse irradiation, direct irradiation, wind speed, humidity, pressure, and the PV power, current, and voltage. The data were gathered over one year, from 1 January 2023 to 31 December 2023, at 15-min intervals, resulting in a dataset of 69,195 data points. A summary of the environmental and electrical parameters of the PV system for 2023 is listed in Table 1.
4. Proposed Fault Detection and Diagnosis Method
Power losses in PV systems are crucial factors that directly impact their overall performance and efficiency. These losses arise from environmental and operational conditions, such as shading, dirt accumulation, and module temperature variations. Deviations from standard test conditions often lead to reduced efficiency in real-world settings. For example, when module temperatures rise above the standard 25 °C, energy losses occur, and additional inefficiencies may arise due to factors like maximum power point tracking failures, module mismatches, and partial shading. These power losses act as key indicators that can help detect faults early in the system, making it possible to diagnose and address issues before they lead to more significant inefficiencies or damage.
Therefore, fault detection and isolation through automated supervision systems have become crucial for alerting the operator to take corrective actions, such as reconfiguring the PV array layout or replacing defective modules. These actions, in turn, can reduce maintenance costs, maximize power production, and prevent long-term degradation of the PV array.
Various faults, such as string disconnection, short-circuited modules, and shading, cause significant power losses. For example, a string disconnection results in the complete loss of power from the affected string, while short-circuited modules cause excessive current flow, leading to overheating and further energy losses. Additionally, shading can reduce overall energy generation, especially in systems with series-connected panels. Monitoring these power losses and their deviations from the expected output allows early detection of faults and targeted corrective actions. Timely and accurate classification of these faults minimizes energy loss and helps maintain system reliability and optimal performance.
Table 2 outlines the various types of faults commonly encountered in photovoltaic systems, along with their locations, underlying causes, and the importance of timely detection and diagnosis for maintaining system efficiency and reliability [39,40].
By focusing on the power loss factors, the following sub-sections highlight how they can be utilized to identify inefficiencies and malfunctions within the system. This approach paves the way for proactive diagnostic strategies, which not only aid in the early detection of faults but also play a crucial role in improving the system’s overall performance.
4.1. Identification of Performance Indicators in a PV System
This subsection introduces the key performance indicators used to assess the efficiency and health of photovoltaic systems [41]. The reference yield ($Y_r$) [h/d] represents the number of hours per day during which the irradiance would have to remain at the reference level to deliver the plane-of-array irradiation actually received. It is calculated using Equation (6):

$$Y_r = \frac{H_t}{G_{\mathrm{STC}}} \tag{6}$$

where $H_t$ is the solar irradiation on the module plane, in kWh/m² per day, and $G_{\mathrm{STC}}$ is the irradiance under standard test conditions (STC), set at 1000 W/m².
The array yield ($Y_a$) [h/d] represents the equivalent number of hours per day the PV array would need to operate at its rated power to generate the measured DC energy. It is computed as follows:

$$Y_a = \frac{E_{\mathrm{DC}}}{P_0}$$

where $E_{\mathrm{DC}}$ is the DC energy generated by the photovoltaic array, in kWh per day, and $P_0$ is the rated power of the PV plant, in kilowatts peak (kWp).
The final yield ($Y_f$) [h/d] represents the equivalent number of hours per day during which the PV system would have to operate at its rated power to deliver the measured AC energy. It is determined as follows:

$$Y_f = \frac{E_{\mathrm{AC}}}{P_0}$$

where $E_{\mathrm{AC}}$ is the AC energy generated, in kWh per day.
The performance ratio ($PR$) [%] is a quality factor of the PV system rather than a direct efficiency measure. It reflects the extent to which system performance is affected by accumulated losses, that is, how far the actual system deviates from an ideal, lossless PV system. This ratio also enables comparisons between plants operating under different climatic and environmental conditions. It is defined as the ratio of the final yield to the reference yield:

$$PR = \frac{Y_f}{Y_r}$$
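The four indicators above are simple ratios of daily quantities. The following minimal Python sketch computes them for one day; the function name, the dataclass, and the input values (a hypothetical day for a 500 kWp sub-field) are illustrative assumptions, not measurements from the Ain El-Melh plant.

```python
from dataclasses import dataclass

G_STC_KW_PER_M2 = 1.0  # STC irradiance, 1000 W/m^2 expressed in kW/m^2


@dataclass
class DailyIndicators:
    y_r: float  # reference yield [h/d]
    y_a: float  # array yield [h/d]
    y_f: float  # final yield [h/d]
    pr: float   # performance ratio [-]


def daily_indicators(h_t: float, e_dc: float, e_ac: float, p0: float) -> DailyIndicators:
    """Compute the daily yields and performance ratio.

    h_t  : plane-of-array irradiation [kWh/m^2/day]
    e_dc : DC energy produced by the array [kWh/day]
    e_ac : AC energy delivered [kWh/day]
    p0   : rated power of the array [kWp]
    """
    if h_t <= 0 or p0 <= 0:
        raise ValueError("irradiation and rated power must be positive")
    y_r = h_t / G_STC_KW_PER_M2
    y_a = e_dc / p0
    y_f = e_ac / p0
    return DailyIndicators(y_r=y_r, y_a=y_a, y_f=y_f, pr=y_f / y_r)


# Hypothetical day for a 500 kWp sub-field (illustrative numbers, not plant data)
day = daily_indicators(h_t=6.5, e_dc=2800.0, e_ac=2700.0, p0=500.0)
print(f"Yr={day.y_r:.2f} h/d, Ya={day.y_a:.2f} h/d, Yf={day.y_f:.2f} h/d, PR={day.pr:.1%}")
```

For these hypothetical inputs the sketch gives a PR of roughly 83%, meaning that about 17% of the energy available in the plane of the array is lost before reaching the AC output.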
4.2. Identification of Power Loss Factors
Capture losses occur primarily on the DC side of the PV conversion process and are influenced by factors such as operating temperature, temperature sensitivity, PV efficiency, fluctuations in solar irradiation, shading, and losses due to high angles of incidence (AOI) of sunlight. Additional losses within the PV array arise from inaccuracies in maximum power point tracking (MPPT), module parameter mismatches, wiring losses, and system aging. In a predictive model of a PV plant, the effective irradiation and module temperature serve as key inputs, enabling the calculation of the predicted capture losses ($L_{c,p}$), excluding operational faults, as follows:

$$L_{c,p} = Y_r - Y_{a,p}(G, T_c)$$

where $Y_r$ is the measured reference yield and $Y_{a,p}(G, T_c)$ is the array yield predicted under irradiation $G$ and PV module temperature $T_c$.
On the other hand, thermal capture losses provide valuable insight into the energy lost to thermal effects under real-world operating conditions. The predicted normalized thermal capture losses ($L_{ct,p}$) are defined as the difference between the yield predicted at the standard temperature and that predicted at the actual operating temperature of the PV module [42]:

$$L_{ct,p} = Y_{a,p}(G, 25\,^{\circ}\mathrm{C}) - Y_{a,p}(G, T_c)$$

where $Y_{a,p}(G, 25\,^{\circ}\mathrm{C})$ is the normalized energy yield at the actual irradiation and a standard temperature of 25 °C, and $Y_{a,p}(G, T_c)$ denotes the array yield at the actual irradiation and module temperature $T_c$.
Miscellaneous capture losses encompass a range of losses, including wiring losses, string diode losses, low-irradiation losses, dirt accumulation, non-uniform irradiation, module mismatch, MPPT errors, and DC-side failures such as faulty strings, defective modules, partial shading, and short circuits. The predicted miscellaneous capture losses ($L_{cm,p}$) are determined from the capture losses and thermal capture losses as follows:

$$L_{cm,p} = L_{c,p} - L_{ct,p}$$
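With $L_{c,p} = Y_r - Y_{a,p}(G, T_c)$ and $L_{ct,p} = Y_{a,p}(G, 25\,^{\circ}\mathrm{C}) - Y_{a,p}(G, T_c)$, the $Y_{a,p}(G, T_c)$ terms cancel, so the predicted miscellaneous capture losses reduce to $L_{cm,p} = Y_r - Y_{a,p}(G, 25\,^{\circ}\mathrm{C})$. The Python sketch below evaluates the three predicted loss terms from two queries of a healthy predictive model (one at $T_c$, one at 25 °C). The function name and the numerical inputs are hypothetical, and the predictive model itself is not reproduced.

```python
from typing import Tuple


def predicted_capture_losses(y_r: float, y_a_tc: float, y_a_25: float) -> Tuple[float, float, float]:
    """Return (L_c,p, L_ct,p, L_cm,p), all in h/d.

    y_r    : measured reference yield
    y_a_tc : array yield predicted at the measured irradiance G and module temperature Tc
    y_a_25 : array yield predicted at the same G with the module temperature set to 25 degC
    """
    l_c = y_r - y_a_tc      # capture losses:         Yr - Ya,p(G, Tc)
    l_ct = y_a_25 - y_a_tc  # thermal capture losses: Ya,p(G, 25) - Ya,p(G, Tc)
    l_cm = l_c - l_ct       # miscellaneous losses:   Lc,p - Lct,p
    return l_c, l_ct, l_cm


# Hypothetical inputs: a healthy model queried once at Tc and once at 25 degC
l_c, l_ct, l_cm = predicted_capture_losses(y_r=6.5, y_a_tc=5.6, y_a_25=6.0)
print(f"Lc={l_c:.2f}  Lct={l_ct:.2f}  Lcm={l_cm:.2f} h/d")
```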
4.3. Fault Detection Based on the Analysis of the Miscellaneous Capture Loss Error
Define the error $E_{L_{cm}}$, which quantifies the deviation between the measured miscellaneous capture losses ($L_{cm,m}$) and their predicted counterpart ($L_{cm,p}$), such that:

$$E_{L_{cm}} = L_{cm,m} - L_{cm,p}$$
Fault detection in the PV system is therefore performed by continuously monitoring $E_{L_{cm}}$, which also serves as a key metric for assessing the accuracy of the prediction models. To reduce the risk of false detections, it is crucial to establish a well-defined threshold band for $E_{L_{cm}}$ that delimits the permissible deviation between predicted and measured values. Accordingly, the upper and lower limits of $E_{L_{cm}}$ representing the healthy state of the PV system are given as:

$$E_{L_{cm}}^{\mathrm{up}} = \Delta_{L_{cm}} + k\,\sigma_{L_{cm}}, \qquad E_{L_{cm}}^{\mathrm{low}} = \Delta_{L_{cm}} - k\,\sigma_{L_{cm}}$$

where $\Delta_{L_{cm}}$ is the reference deviation under healthy conditions between the measured and predicted miscellaneous capture losses, denoted $L_{cm,m}^{h}$ and $L_{cm,p}^{h}$, respectively:

$$\Delta_{L_{cm}} = L_{cm,m}^{h} - L_{cm,p}^{h}$$

$\sigma_{L_{cm}}$ is the standard deviation of $\Delta_{L_{cm}}$, calculated from one day of data under healthy conditions; its value is 0.0031. $k$ is a constant, empirically set to 1.25.
Figure 5 presents the flowchart of the fault detection procedure, illustrating the decision-making process based on the analysis of the miscellaneous capture loss error between the measured data and the predictions of the RF model.
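One way to realize the threshold band and the detection test is sketched below in Python. The text does not specify how the reference deviation and its spread are aggregated over the healthy day, so this sketch assumes the reference deviation is the mean of the healthy-day deviation series and that the standard deviation is the population one. The function names and the sample data are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Sequence, Tuple

Band = Tuple[float, float]  # (lower limit, upper limit)


def threshold_band(measured: Sequence[float], predicted: Sequence[float], k: float) -> Band:
    """Band = reference deviation +/- k * sigma, computed from one healthy day of data.

    Assumptions (not specified in the text): the reference deviation is the mean
    of the healthy deviation series and sigma is its population standard deviation.
    """
    deviations = [m - p for m, p in zip(measured, predicted)]
    reference = fmean(deviations)
    sigma = pstdev(deviations)
    return reference - k * sigma, reference + k * sigma


def detection_status(error: float, band: Band) -> str:
    """Classify one error sample relative to the healthy band."""
    lower, upper = band
    if error > upper:
        return "above"   # fault flagged: upper limit exceeded
    if error < lower:
        return "below"   # fault flagged: value under the lower limit
    return "inside"      # consistent with healthy operation


# Hypothetical healthy-day miscellaneous capture losses (measured vs. predicted)
band = threshold_band([0.50, 0.52, 0.48, 0.50], [0.50, 0.50, 0.50, 0.50], k=1.25)
for e in (0.0, 0.05, -0.05):
    print(e, detection_status(e, band))
```

Keeping the direction of the violation ("above" or "below") rather than a single boolean is useful because the diagnosis stage treats the two sides of the band differently.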
4.4. Fault Diagnosis
When the error indicator $E_{L_{cm}}$ leaves the predefined threshold band, a potential fault is flagged, and the next step is to diagnose the root cause of the anomaly. To isolate the fault and identify its type, two electrical indicators are used in addition to $E_{L_{cm}}$: the DC current error ($E_I$) and the DC voltage error ($E_V$). These indicators quantify the deviations between the measured and predicted DC current and voltage values, respectively, and are calculated as follows [23]:

$$E_I = I_{\mathrm{DC},m} - I_{\mathrm{DC},p}, \qquad E_V = V_{\mathrm{DC},m} - V_{\mathrm{DC},p}$$

where $I_{\mathrm{DC},m}$ and $V_{\mathrm{DC},m}$ are the measured DC current and voltage, respectively, $I_{\mathrm{DC},p}$ is the current predicted by the RF model, and $V_{\mathrm{DC},p}$ is the voltage predicted by the KNN model.
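For illustration, a K-nearest-neighbors regressor of the kind used for the voltage prediction can be written in a few lines of standard-library Python, after which the voltage error follows directly. The input features (two pre-scaled values standing in for irradiance and module temperature), the Euclidean metric, the uniform weighting, the value of k, and all sample values are assumptions for this sketch; the features and hyperparameters actually selected by the authors are described in Section 3, and the RF current model is not reproduced here.

```python
import math
from typing import List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[float],
                query: Sequence[float], k: int = 3) -> float:
    """Predict a target as the mean of its k nearest training samples.

    Assumes features are already scaled to comparable ranges; uses plain
    Euclidean distance and uniform weights (hypothetical choices).
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must lie between 1 and the number of training samples")
    neighbours = sorted(zip(train_x, train_y), key=lambda xy: math.dist(xy[0], query))
    return sum(y for _, y in neighbours[:k]) / k


# Hypothetical healthy samples: (scaled irradiance, scaled module temperature) -> V_DC [V]
X = [(0.2, 0.3), (0.5, 0.5), (0.8, 0.7), (0.9, 0.8)]
V = [640.0, 625.0, 610.0, 605.0]

v_pred = knn_predict(X, V, query=(0.85, 0.75), k=2)
v_meas = 606.0            # hypothetical measured DC voltage [V]
e_v = v_meas - v_pred     # DC voltage error E_V = V_DC,m - V_DC,p
print(f"V_pred={v_pred:.1f} V, E_V={e_v:.1f} V")
```

Because KNN simply averages healthy neighbours, its prediction stays within the range of healthy training voltages, which is what makes the residual $E_V$ meaningful as a fault signature.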
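Similarly, to avoid false fault isolation, upper and lower limits are defined beyond which an error is confirmed:

$$E_I^{\mathrm{up}} = \Delta_I + p\,\sigma_I, \qquad E_I^{\mathrm{low}} = \Delta_I - p\,\sigma_I$$

$$E_V^{\mathrm{up}} = \Delta_V + m\,\sigma_V, \qquad E_V^{\mathrm{low}} = \Delta_V - m\,\sigma_V$$

where $\Delta_I$ is the reference deviation under healthy conditions between the measured and predicted DC current supplied by the PV plant, denoted $I_{\mathrm{DC},m}^{h}$ and $I_{\mathrm{DC},p}^{h}$, respectively:

$$\Delta_I = I_{\mathrm{DC},m}^{h} - I_{\mathrm{DC},p}^{h}$$

and $\Delta_V$ is the reference deviation under healthy conditions between the measured and predicted DC voltage provided by the PV plant, denoted $V_{\mathrm{DC},m}^{h}$ and $V_{\mathrm{DC},p}^{h}$, respectively:

$$\Delta_V = V_{\mathrm{DC},m}^{h} - V_{\mathrm{DC},p}^{h}$$

$\sigma_I = 10.4535$ and $\sigma_V = 2.0964$ are the standard deviations of $\Delta_I$ and $\Delta_V$, respectively, obtained from one day of data collected under healthy conditions. $p$ and $m$ are two constants, empirically set to 0.5 and 0.4, respectively.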
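Each threshold band spans its reference deviation plus or minus a constant times the corresponding healthy-day standard deviation. With the constants reported above, the half-widths of the three bands work out as follows (the reference deviations on which the bands are centred are not given numerically in this section):

$$
\begin{aligned}
k\,\sigma_{L_{cm}} &= 1.25 \times 0.0031 = 0.003875\\
p\,\sigma_I &= 0.5 \times 10.4535 = 5.22675\\
m\,\sigma_V &= 0.4 \times 2.0964 = 0.83856
\end{aligned}
$$

Because each half-width scales with the healthy-day spread, a larger $k$, $p$, or $m$ widens the corresponding band, trading detection sensitivity for fewer false alarms.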
Accordingly, the diagnosis method first determines whether the fault actually affects the PV system or is only due to a sensor malfunction. This distinction is made by analyzing on which side of the threshold band the miscellaneous capture loss error deviates:
Sensor faults: detected when $E_{L_{cm}}$ falls below the lower threshold limit. Such a fault is likely due to shading that affects only the pyranometer without reaching the PV modules, leading to an inaccurate irradiation measurement.
PV system faults: identified when $E_{L_{cm}}$ exceeds the upper threshold limit, indicating potential operational inefficiencies or system degradation.
This approach provides a clear differentiation between system-level faults and sensor malfunctions, leading to more accurate fault diagnosis. Furthermore, it can be applied to other grid-connected PV systems, provided that the thresholds are calibrated for each specific system to ensure reliable fault detection.
In the second stage, once the fault is confirmed to originate from the PV system itself, additional processing is performed to distinguish between four possible cases. The proposed identification method relies on the two electrical indicators $E_I$ and $E_V$ and on whether each lies within or outside its threshold band. The following four scenarios are used to identify the possible sources of faults in the PV system:
$E_I$ and $E_V$ are both outside their respective threshold bands: the PV system might be affected by shading.
$E_I$ remains within its threshold band while $E_V$ is outside: this might be caused by short-circuited PV modules.
$E_I$ is outside its threshold band while $E_V$ remains inside: this might be caused by disconnected PV modules or strings.
$E_I$ and $E_V$ are both inside their respective threshold bands: the alarm raised by the fault detection algorithm is false, and the PV system is still operating in a healthy condition.
The flowchart in Figure 6 summarizes the fault diagnosis procedure, illustrating the tests performed to isolate the possible cause of a fault or to detect a false alarm.
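The two-stage decision logic described above can be summarized as a small function. This is a sketch, assuming the three threshold bands have already been computed as (lower, upper) pairs; the enum labels, function names, and the zero-centred example bands (built from the rounded half-widths $k\sigma$, $p\sigma_I$, $m\sigma_V$) are hypothetical and only restate the rules given in the text.

```python
from enum import Enum
from typing import Tuple

Band = Tuple[float, float]  # (lower limit, upper limit)


class Diagnosis(Enum):
    HEALTHY = "no fault detected"
    SENSOR_FAULT = "irradiation sensor fault (e.g. shaded or misaligned pyranometer)"
    SHADING = "partial shading"
    SHORT_CIRCUIT = "short-circuited PV module(s)"
    DISCONNECTION = "disconnected PV module(s) or string(s)"
    FALSE_ALARM = "false alarm, system healthy"


def is_outside(value: float, band: Band) -> bool:
    lower, upper = band
    return value < lower or value > upper


def diagnose(e_lcm: float, e_i: float, e_v: float,
             band_lcm: Band, band_i: Band, band_v: Band) -> Diagnosis:
    lower, upper = band_lcm
    # Detection: no alarm while E_Lcm stays inside its band
    if lower <= e_lcm <= upper:
        return Diagnosis.HEALTHY
    # Stage 1: below the band points to the irradiation sensor, above it to the PV system
    if e_lcm < lower:
        return Diagnosis.SENSOR_FAULT
    # Stage 2: isolate the PV-system fault using the DC current and voltage errors
    i_out, v_out = is_outside(e_i, band_i), is_outside(e_v, band_v)
    if i_out and v_out:
        return Diagnosis.SHADING
    if v_out:
        return Diagnosis.SHORT_CIRCUIT
    if i_out:
        return Diagnosis.DISCONNECTION
    return Diagnosis.FALSE_ALARM


# Hypothetical bands centred on zero (actual reference deviations are plant-specific)
BANDS = dict(band_lcm=(-0.0039, 0.0039), band_i=(-5.23, 5.23), band_v=(-0.84, 0.84))
print(diagnose(0.01, e_i=-20.0, e_v=-0.2, **BANDS))
```

The current and voltage errors are consulted only after $E_{L_{cm}}$ has exceeded its upper limit, mirroring the order of the two stages described in the text.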
6. Conclusions
This work has introduced and experimentally validated a comprehensive and effective methodology for fault detection and diagnosis of a large-scale PV system, based on the analysis of power losses and two electrical indicators computed using RF- and KNN-based prediction models. The proposed models provide reliable predictions of the supplied DC voltage, DC current, and power, making them suitable for fault identification and diagnosis. The KNN model predicted the DC voltage with an R² of 0.9967, while the RF model predicted DC current and power with R² values of approximately 0.9965 and 0.9945, respectively.
In this context, the miscellaneous capture loss error serves as a key signature for fault detection. A deviation beyond the upper threshold indicates a fault affecting the PV modules, while a deviation below the lower threshold suggests a potentially erroneous measurement from the environmental sensor. Additionally, the signatures of the DC current and voltage errors have been validated for classifying three types of PV faults. Specifically, a deviation of the DC current error alone beyond its threshold band suggests a disconnected string, whereas a deviation of the DC voltage error alone is characteristic of a short-circuited module. The final fault pattern, characterized by simultaneous deviations of both the voltage and current errors beyond their threshold bands, identifies an anomaly induced by partial shading.
Although the proposed method does not directly detect Maximum Power Point Tracking (MPPT) errors, which are primarily software-related, their effects are still captured by the overall fault detection process, for example through abnormally low output power that may be associated with MPPT inefficiencies.
Future work will focus on refining the proposed methodology to further improve accuracy and adaptability. Integrating deep learning models, such as convolutional and recurrent neural networks, will be explored to enhance fault classification capabilities. Additionally, the deployment of real-time fault detection mechanisms on edge computing platforms will also be considered to enable on-site diagnostics and predictive maintenance. Moreover, implementing adaptive thresholds and self-learning algorithms will be studied to enhance fault detection reliability under varying environmental and operational conditions. Finally, the integration of this system within smart grid infrastructures will be examined to optimize energy management and grid stability.