1. Introduction
The growing global demand for photovoltaic (PV) systems as a sustainable and clean energy source has highlighted the importance of ensuring their operational reliability and long-term efficiency [1,2]. Despite offering substantial environmental and economic advantages, PV systems are inherently susceptible to various faults such as open-circuit faults, partial short circuits, partial shading, and string-to-string mismatches. These faults can cause significant power losses and, in severe cases, may even pose safety hazards like electrical arcing or fire [2]. Conventional fault detection techniques such as manual inspections, infrared thermography, and I–V curve tracing have proven inadequate, especially in utility-scale PV installations. These methods are often labor-intensive, lack sensitivity to early-stage or internal faults, and are incapable of providing real-time diagnostics [3,4,5]. To overcome these limitations, intelligent Fault Detection and Diagnosis (FDD) systems based on data-driven approaches have gained increasing attention in recent years. Among various machine learning and artificial intelligence methods, Artificial Neural Networks (ANNs) have emerged as powerful tools due to their ability to learn complex non-linear patterns, adapt to noisy and uncertain inputs, and generalize across different operational environments [6,7,8,9]. Specifically, Multi-Layer Neural Networks (MLNNs) have demonstrated high effectiveness in recognizing subtle fault signatures in PV systems [10]. In this context, this study introduces an MLNN-based fault classification framework designed and implemented for a 250 kW grid-connected PV system simulated in MATLAB/Simulink (R2022b). The system is simulated under five representative operating conditions: normal operation, open-circuit fault, partial short-circuit, partial shading, and string-to-string fault. Unlike many previous studies that rely on engineered statistical features or small-scale datasets, this work employs raw signal data, including voltage, current, power, irradiance, and temperature, captured under varying environmental conditions to enhance realism and robustness. Moreover, the modular design of the proposed model provides a foundation for future integration with predictive fault diagnosis techniques, enabling proactive maintenance strategies. Consequently, the framework is not only suitable for current fault detection tasks but also scalable toward predictive analytics in real-time PV monitoring systems.
In recent years, substantial research efforts have been directed toward the development of intelligent FDD techniques for PV systems. Conventional methods such as I–V curve analysis, infrared thermography, and manual inspections have proven largely inadequate for large-scale PV installations due to their limited scalability, inability to detect internal or incipient faults, and lack of real-time responsiveness [3,4,5]. These limitations have prompted a growing shift toward Machine Learning (ML) and Artificial Intelligence (AI)-based solutions, which offer superior diagnostic accuracy, automation, and adaptability. Among AI methods, ANNs have gained prominence in PV fault diagnostics owing to their capacity to learn from historical data and identify complex, non-linear fault patterns. Chine et al. [11] introduced an ANN-based model that achieved high classification accuracy across several fault types. Similarly, Mellit et al. [6] employed feedforward neural networks, reporting notable improvements in fault detection performance. To further improve robustness and adaptability, hybrid approaches have also been explored, integrating ANNs with adaptive algorithms and real-time monitoring systems. For example, Sepúlveda-Oviedo et al. [9] combined AI-based algorithms with monitoring frameworks for enhanced real-time diagnostics, while Abubakar et al. [8] surveyed ANN-based hybrid frameworks for increased reliability. Additionally, other ML techniques such as Support Vector Machines (SVM) and Deep Learning architectures [12] have been investigated for their potential in PV fault classification. However, despite notable progress, several challenges remain unresolved. Many existing models exhibit poor generalization under variable environmental conditions or degraded performance in the presence of noisy inputs [13,14]. Moreover, a considerable proportion of the literature focuses on fault detection only after the event has occurred, offering limited predictive insights that could enable proactive maintenance strategies.
Table 1 summarizes the most relevant studies related to PV fault detection and classification. It highlights the systems, methods, limitations, and performance outcomes reported in the literature. As observed, most existing studies focused on small-scale systems or limited fault types, while the framework proposed in this study addresses these gaps through a large-scale 250 kW model and randomized fault scenarios.
This study aims to improve the precision and reliability of fault detection in PV systems by addressing the shortcomings identified in earlier research. A modular classification framework based on an MLNN is developed, utilizing raw measurements (current, voltage, power, irradiance, and temperature) across varying environmental conditions. The central research question addressed is whether a data-driven MLNN trained on raw signals can achieve more accurate and robust fault classification than conventional diagnostic approaches. Previous studies have primarily focused on small-scale PV systems or relied on engineered statistical features, limiting their scalability and generalization capability. To address these gaps, the present study makes the following key contributions:
A detailed 250 kW grid-connected PV system is developed using a modular, string-level configuration, providing higher scale and resolution than most existing studies (typically below 10 kW).
The proposed framework utilizes raw electrical signals instead of preprocessed or statistical features, enabling a richer and more flexible feature space for model training.
The system is evaluated under diverse irradiance and temperature conditions, ensuring robustness and generalizability of the results.
A comprehensive comparative analysis of three ML algorithms (MLNN, SVM, and Random Forest (RF)) is performed on a unified dataset to ensure consistent and fair benchmarking [15,16,17].
The proposed MLNN model achieves a test accuracy of 98%, ranking among the highest reported performances in simulation-based PV fault classification studies [18].
Table 1.
Summary of previous studies on PV fault detection.
| References | System Used | Methodology | Limitations | Outcomes of Study |
|---|---|---|---|---|
| Mellit et al., 2018 [2] | Various PV systems (review) | Surveyed fault detection and diagnosis (FDD) methods including I–V curve tracing, ANN, and SVM approaches | No unified test platform; lacked environmental variability and ML implementation details | Identified the need for adaptive, data-driven diagnostic models for large-scale PV arrays |
| Ghoneim et al., 2021 [5] | PV farm (MATLAB/Simulink) | Rule-based fault detection algorithms for maintaining service continuity | Sensitive to sensor errors; limited scalability for large systems | Achieved 85–90% accuracy under ideal conditions; failed under noisy data |
| Mellit & Kalogirou, 2022 [6] | 5 kW PV array simulation | Comparative study using ANN, SVM, and RF classifiers | Fixed fault locations; no cross-validation; narrow irradiance range | RF achieved 92% accuracy; ANN showed better generalization ability |
| Li et al., 2021 [7] | PV array (review) | Reviewed ANN-based FDD approaches | Focused on ANN only; ignored hybrid and ensemble approaches | Highlighted ANN’s effectiveness but noted limited robustness to noise |
| Patthi et al., 2024 [3] | PV string (Simulink) | Multi-Layer Neural Network (MLNN)-based optimization | Single fault case; no irradiance variability | MLNN reached 95% accuracy; limited dataset diversity |
| Amiri et al., 2024 [17] | Real PV plant data | Random Forest classifier for fault detection | Pre-processing steps and noise robustness not specified | RF achieved 91% accuracy; model performed well but lacked environmental validation |
| Liu & Wu, 2025 [12] | PV module datasets (review) | Deep learning-based fault detection survey | Theoretical study; no experimental verification | Highlighted the lack of reproducible datasets and standardized evaluation metrics |
| Proposed work | 250 kW grid-connected PV system (MATLAB/Simulink) | MLNN-based framework with randomized fault locations, variable irradiance–temperature, and cross-validation | Addresses gaps in previous works, which lacked randomized faults and detailed pre-processing | Achieved 98% accuracy and reproducible results under diverse environmental conditions |
Collectively, these contributions advance the development of robust and scalable fault detection and diagnosis (FDD) frameworks, forming a solid foundation for future research on predictive fault diagnosis and real-time monitoring applications.
The structure of this study is organized as follows:
Section 2 describes the overall configuration of the PV system under investigation.
Section 3 provides an overview of the PV system and summarizes the key parameters used in the simulation study.
Section 4 defines and categorizes the different types of faults considered in this work.
Section 5 introduces the machine learning-based fault classification approach and briefly explains the employed algorithms.
Section 6 outlines the adopted methodology, including data generation, preprocessing, model training, and evaluation procedures.
Section 7 presents and discusses the obtained results, highlighting the comparative performance of the proposed models.
Section 8 provides the main conclusions drawn from this study.
Finally, Section 9 discusses the study’s limitations and proposes directions for future research.
2. System Description
The overall PV system was developed in MATLAB/Simulink, where the main block represents a subsystem that encapsulates the entire 250 kW array. The system comprises four main string groups: the first three groups each contain a single series-connected string of seven SunPower SPR-415E-WHT-D modules, while the fourth group includes 85 parallel strings, each formed by seven series-connected modules. In total, the complete array contains 88 strings, yielding a rated capacity of 250 kW. This modular configuration facilitates precise fault injection and measurement at both the string and module levels, while maintaining a clear and well-organized top-level schematic. All four string groups are connected in parallel to form the overall PV array. The total output voltage of the system is approximately 510.3 V (7 × Vmp), while the total output current is around 500.72 A, resulting in an estimated power output of 255.5 kW. The configuration also supports the simulation of realistic fault conditions and enables the generation of labeled voltage and power data, which were subsequently used to train and evaluate ANN-based fault detection models. The electrical behavior of the PV module is represented by the widely adopted one-diode equivalent circuit, as illustrated in Figure 1. The output of the PV array is connected to a three-level IGBT inverter, followed by a step-up transformer (TR1), which adjusts the voltage to match grid requirements. The system is then interfaced with the utility grid and monitored using a power analyzer (P&O) to observe real-time voltage, current, and power characteristics [19,20]. This complete setup enables the evaluation of system dynamics under various operating and fault conditions, thereby facilitating accurate data acquisition for fault classification. The current delivered by the solar cell can accordingly be expressed as follows:
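For reference, the one-diode model is commonly written in the form below. This is the standard textbook relation rather than a formula confirmed by the study, and the symbol names are assumptions that may differ from the authors' notation:

```latex
I = I_{ph} - I_0\left[\exp\!\left(\frac{q\,(V + I R_s)}{n k T}\right) - 1\right] - \frac{V + I R_s}{R_{sh}}
```

Here $I_{ph}$ is the photogenerated current, $I_0$ the diode saturation current, $R_s$ and $R_{sh}$ the series and shunt resistances, $n$ the diode ideality factor, $q$ the electron charge, $k$ the Boltzmann constant, and $T$ the cell temperature in kelvin.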
Figure 2 presents the overall architecture of the proposed PV fault monitoring and classification system. Sub-figure (a) illustrates the power conversion section, where the PV array is connected to the MPPT and controller that regulate the inverter and transformer before grid connection. Sub-figure (b) shows the data-driven fault detection framework, which includes data acquisition, preprocessing, fault detection, classification, and notification. Measured electrical and environmental parameters (current, voltage, power, irradiance, and temperature) are processed to identify abnormal operating conditions and trigger corrective actions when faults occur.
4. Fault Types Analysis and Definitions
PV systems are inherently susceptible to various fault conditions that can significantly reduce energy yield, degrade overall performance, and, in some cases, pose potential safety hazards if left undetected. These faults are generally categorized into three main groups:
Electrical faults, such as open-circuit and short-circuit conditions.
Environmental faults, including partial shading, dust accumulation, snow coverage, and bird droppings.
Connection-related faults, such as string-to-string mismatches [21,22].
In addition to this functional classification, PV faults can also be grouped based on their temporal behavior into permanent, intermittent, and incipient faults. Intermittent faults are temporary in nature and are typically caused by environmental factors such as shading, contamination, high humidity, or the accumulation of leaves and snow. Permanent faults refer to irreversible damage, including open or short circuits, junction box failures, or interconnection damage. Incipient faults, which often precede permanent faults, result from gradual processes such as cell degradation, corrosion, or partial delamination within the PV module. Such early-stage faults can progressively deteriorate system performance, particularly under conditions of high temperature and humidity. This hierarchical classification is illustrated in Figure 3, providing a comprehensive perspective on how different fault types evolve over time and influence overall system operation [23].
Comparative Analysis of Normal and Fault Operating Conditions in PV System
To assess system performance under various fault situations, this study simulated and examined five key operational scenarios for a 250 kW grid-connected PV power plant: normal operation as the reference case and four typical fault types (open-circuit, partial short-circuit, partial shading, and string-to-string faults). Each scenario was modeled individually to capture its effect on total power, voltage, and current. The simulations offer insight into the dynamic behavior of PV systems under realistic disturbances, with the aim of increasing reliability and maintaining continuous energy generation with low losses. The following subsections present and compare the outcomes of all operational cases, showing the performance differences between faulty and healthy conditions. This systematic assessment evaluates the severity of each fault type, clarifies how it affects overall system performance, and highlights the significance of precise and trustworthy fault detection techniques.
Normal Operation: Figure 4 presents the measured total DC-side current (I System) and voltage (V System) under standard test conditions (irradiance = 1000 W/m² and temperature = 25 °C). Under fault-free conditions, all PV modules operate efficiently within their rated parameters. The system delivers a total current of approximately 500 A, a voltage of 511 V, and a total power output of about 255 kW, which collectively serve as the reference baseline for subsequent fault detection and comparative performance analysis. The corresponding waveforms obtained under normal operation are shown in Figure 5, where sub-figure (a) illustrates the total current, (b) represents the total voltage, and (c) depicts the total power at the DC side. This figure represents a sample from the larger dataset generated under various irradiance and temperature scenarios.
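The baseline figures can be cross-checked with simple series/parallel arithmetic. The sketch below is illustrative only: the per-module voltage `V_MP` and per-string current `I_MP` are not quoted in the text and are back-calculated here from the stated totals (510.3 V over 7 modules, 500.72 A over 88 strings).

```python
# Back-of-envelope check of the array's rated DC operating point.
# V_MP and I_MP are assumptions, back-calculated from the stated totals
# (510.3 V / 7 modules and 500.72 A / 88 strings).
MODULES_PER_STRING = 7
SINGLE_STRING_GROUPS = 3       # groups 1-3: one string each
PARALLEL_STRINGS_GROUP4 = 85   # group 4: 85 parallel strings
V_MP = 72.9   # volts per module (assumed)
I_MP = 5.69   # amperes per string (assumed)


def rated_operating_point():
    """Return (voltage in V, current in A, power in kW) for the full array."""
    n_strings = SINGLE_STRING_GROUPS + PARALLEL_STRINGS_GROUP4
    voltage = MODULES_PER_STRING * V_MP   # series modules add voltage
    current = n_strings * I_MP            # parallel strings add current
    power_kw = voltage * current / 1000.0
    return voltage, current, power_kw


voltage, current, power_kw = rated_operating_point()
print(f"{voltage:.1f} V, {current:.2f} A, {power_kw:.1f} kW")
```

Under these assumptions the result is about 510.3 V, 500.72 A, and 255.5 kW, consistent with the simulated baseline of roughly 511 V, 500 A, and 255 kW.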
Open-Circuit Fault: An open-circuit fault occurs when the electrical continuity within a photovoltaic module or string is interrupted due to connector detachment, solder joint degradation, or broken interconnections. Such an interruption prevents current from flowing through the affected branch, leading to an increase in voltage across the open section and causing a power imbalance among the interconnected strings. The fault is simulated under standard conditions (irradiance = 1000 W/m² and temperature = 25 °C) by cutting the connection between the first and second modules of String 1. The current in String 1 (I1) decreases to zero since no current flows through the open path. As a result, the array current drops slightly to 494.3 A and the overall power output drops to roughly 252 kW, while the total voltage stays almost constant at 511 V. If the fault remains undetected, it may cause uneven stress distribution among modules and affect long-term reliability. Maintaining optimal system performance therefore requires accurate fault detection. The PV system configuration under this fault is shown in Figure 6, while the variations in total current, voltage, and power at the DC side are illustrated in Figure 7, where sub-figure (a) illustrates the total current, (b) represents the total voltage, and (c) depicts the total power at the DC side.
Partial Short-Circuit Fault: A partial short-circuit fault arises when a low-resistance path forms between two nodes of a module or string, often due to insulation degradation or moisture intrusion. It results in localized heating and reverse current flow, which may accelerate cell damage or cause thermal runaway.
This fault is introduced in String 1 between the fourth and fifth modules and is modeled with a three-terminal ideal switch. A resistor of about 3.3 Ω is connected to terminal 1, and the opposite side of the resistor is grounded. Terminal 2 of the switch is placed between modules 4 and 5, and a step signal is applied to the gate terminal at 0.2 s to initiate the fault. A reverse current then flows in String 1 because of internal heating and mismatch effects, lowering the output power to 240 kW, the voltage to 505.9 V, and the overall current to 475.6 A. Figure 8 displays the PV system layout under this partial short-circuit fault, and the variations in total current, voltage, and power at the DC side are presented in Figure 9.
Partial Shading: Partial shading occurs when certain modules receive less irradiance due to obstacles such as trees, dust, or debris. The shaded modules produce lower current, causing power mismatch and multiple peaks in the P–V characteristic curve.
Here, a 30% reduction in irradiance is applied to String 4 (the 85-string group). As a result, the total current drops to 355.6 A, with the voltage decreasing slightly to 509.8 V. The power output is severely affected, decreasing to 180 kW, which highlights the impact of mismatch losses, as shown in Figure 10.
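The reported shaded current can be approximated with a first-order estimate. The sketch below assumes that string current scales linearly with irradiance and that each string delivers `I_MP = 5.69` A at full irradiance (a value back-calculated from 500.72 A over 88 strings, not quoted in the text); the helper `total_current` is hypothetical.

```python
# Idealized estimate of the total array current under partial shading.
# Assumptions: string current is proportional to irradiance, and
# I_MP = 5.69 A per string (back-calculated from 500.72 A / 88 strings).
I_MP = 5.69


def total_current(strings_per_group, irradiance_factors, i_mp=I_MP):
    """Sum string currents, scaling each group by its irradiance factor."""
    return sum(n * f * i_mp for n, f in zip(strings_per_group, irradiance_factors))


groups = [1, 1, 1, 85]  # number of strings in groups 1-4
healthy = total_current(groups, [1.0, 1.0, 1.0, 1.0])
shaded = total_current(groups, [1.0, 1.0, 1.0, 0.7])  # 30% less irradiance on group 4
print(f"healthy: {healthy:.1f} A, shaded: {shaded:.1f} A")
```

With these assumptions the shaded estimate is about 355.6 A, in line with the simulated value, which suggests that the current loss is dominated by the irradiance reduction on the 85-string group.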
String-to-String Fault: This fault is caused by insulation failure between adjacent strings, leading to unintended current exchange paths. It can result in severe imbalance, overheating, or in extreme cases, arcing between the strings.
This fault is simulated as an insulation failure that creates an unintentional connection between String 1 and String 2, as shown in Figure 11. The fault is modeled using an ideal switch controlled by a step signal, with a resistor included to reflect partial conductivity. As a result, abnormal current paths are introduced between the two strings. The total current under this condition reaches 497.9 A, the voltage remains stable at 509.7 V, and the power output is approximately 253 kW, as shown in Figure 12.
The simulated fault locations and their distribution across the PV array are illustrated in Figure 13. The faults were distributed across multiple strings within the PV array: the open-circuit and partial short-circuit faults were applied to String 1, the partial shading condition affected the 85-string array, and the string-to-string fault was introduced between Strings 1 and 2. This configuration ensures adequate spatial variation and realistic fault representation. Future work will further extend this setup to randomized fault placements for broader generalization.
Before presenting the detailed modeling and algorithmic procedures, the overall research workflow is outlined in Figure 14 to provide a clear overview of the study’s structure and methodological sequence.
6. Methodology
This study employs a structured methodology for the development and evaluation of a fault detection and classification system for PV arrays using ML models. The process includes simulation-based data generation, feature extraction and preprocessing, model development, performance evaluation, and a comparative analysis of three ML algorithms: MLNN, SVM, and RF.
The methodology begins with simulating a 250 kW grid-connected PV system in MATLAB/Simulink, composed of four PV string groups connected in parallel. Faults were systematically injected into the simulation to reflect five distinct operational scenarios: normal operation, open-circuit fault, partial short-circuit fault, partial shading, and string-to-string fault. Each fault scenario was applied at various string and module locations within the array to ensure diversity and enhance model generalization.
6.1. Data Generation and Labeling
For each scenario, data were generated by applying variable irradiance and temperature profiles to emulate real-world conditions. Faults were injected at different string and module positions to ensure data diversity and robustness.
During each simulation run, the following raw signals were recorded:
String currents (I1 to I6);
Total current (Itotal);
Station voltage (V_station);
Total output power (P_total);
Irradiance (Irr) and temperature (Tempr);
Voltage of string 1 (V_string1).
Figure 17 illustrates the data collection and preparation workflow. The process started by defining five operating scenarios. The generated signals (voltage, current, total power, irradiance, and temperature) were then processed to extract 12 features, which were organized into labeled datasets. In total, 1000 samples were used for training and 200 samples for testing, as summarized in Table 3, with all five operating conditions proportionally represented. During MLNN training, MATLAB automatically reserved approximately 15% of the training data for validation, used exclusively for monitoring network generalization and preventing overfitting. No data leakage occurred between the training, validation, and testing subsets.
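A class-proportional split of this kind can be sketched as follows. This is a minimal illustration rather than the study's MATLAB procedure: the equal class sizes (240 samples per class), the class names, the seeds, and the helper `stratified_split` are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, n_test_per_class, seed=0):
    """Return (train_idx, test_idx) with a fixed number of test samples per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        test_idx.extend(indices[:n_test_per_class])
        train_idx.extend(indices[n_test_per_class:])
    return train_idx, test_idx


# Hypothetical dataset: 1200 samples, 240 per class (equal class sizes assumed).
CLASSES = ["normal", "open_circuit", "partial_short",
           "partial_shading", "string_to_string"]
labels = [c for c in CLASSES for _ in range(240)]
train_idx, test_idx = stratified_split(labels, n_test_per_class=40)

# Hold out roughly 15% of the training indices for validation.
rng = random.Random(1)
rng.shuffle(train_idx)
n_val = round(0.15 * len(train_idx))
val_idx, fit_idx = train_idx[:n_val], train_idx[n_val:]
print(len(fit_idx), len(val_idx), len(test_idx))
```

Because the three index sets are built from disjoint slices, no sample can appear in more than one subset, which is the property the no-leakage statement relies on.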
6.2. Data Preprocessing and Model Training
Before model training, the dataset was preprocessed to ensure consistency and comparability among all algorithms. In practical PV systems, measurement signals are often affected by different types of noise (environmental, sensor, and electrical), as summarized in Table 4. Identifying and reducing these noise sources is an essential preprocessing step to improve the reliability of the training data. The primary steps of the proposed method, including data preparation, model training, and performance comparison of the MLNN, RF, and SVM classifiers, are illustrated in the flowchart shown in Figure 18. All input features, including voltage, current, power, irradiance, and temperature, were normalized using z-score normalization to eliminate scale differences and accelerate model convergence. The class labels representing the five operating conditions were encoded using the one-hot encoding method, which prevents artificial ordinal relationships between categories and allows the MLNN model to process categorical targets effectively through its softmax output layer. In contrast, the SVM and RF models used integer class labels internally. This preprocessing step ensured fair comparison and stable performance across all models. The input features used for all three algorithms (MLNN, RF, and SVM) are summarized in Table 5.
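The two preprocessing operations can be sketched in a few lines. This is an illustrative, library-free version: the helper names are hypothetical, the use of the sample standard deviation (n − 1) is an assumption, and fitting the statistics on the training rows only is a common convention that the text does not spell out. The example rows reuse voltage and current values reported for three of the scenarios.

```python
import math


def zscore_fit(rows):
    """Compute per-feature mean and sample standard deviation from training rows."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant features
    return means, stds


def zscore_apply(rows, means, stds):
    """Scale each feature to zero mean and unit variance using fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def one_hot(label, classes):
    """Encode a class label as a 0/1 vector with a single 1."""
    return [1 if label == c else 0 for c in classes]


CLASSES = ["normal", "open_circuit", "partial_short",
           "partial_shading", "string_to_string"]
# [voltage (V), current (A)] for normal, partial short-circuit, partial shading
train = [[511.0, 500.0], [505.9, 475.6], [509.8, 355.6]]
means, stds = zscore_fit(train)
scaled = zscore_apply(train, means, stds)
target = one_hot("partial_shading", CLASSES)
print(scaled[0], target)
```

After scaling, each feature column has zero mean, so voltage (hundreds of volts) and current (hundreds of amperes) no longer dominate features with smaller numeric ranges.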
6.2.1. MLNN Model
The MLNN consisted of an input layer, two hidden layers, and one output layer. The architecture of the MLNN is illustrated in Figure 19, showing the layered structure and neuron configuration.
The number of neurons in each layer was optimized empirically to achieve high accuracy while maintaining computational efficiency. The network structure and training parameters used for the MLNN model are summarized in Table 6.
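To make the layered structure concrete, the following sketch runs a forward pass through a network with 12 inputs (matching the 12 extracted features), two hidden layers, and a 5-class softmax output. The hidden-layer sizes (10 and 10), the tanh activation, and the random weights are assumptions for illustration; the actual configuration is the one given in Table 6, and no training is performed here.

```python
import math
import random


def dense(inputs, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights in [-0.5, 0.5] and zero biases (untrained placeholder)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Apply tanh on the hidden layers and softmax on the output layer."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = [math.tanh(v) for v in dense(x, weights, biases)]
    return softmax(dense(x, *output))


rng = random.Random(0)
sizes = [12, 10, 10, 5]  # hypothetical: 12 inputs, two hidden layers, 5 classes
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
sample = [rng.uniform(-1.0, 1.0) for _ in range(12)]  # stand-in for z-scored features
probs = forward(sample, layers)
predicted_class = probs.index(max(probs))
```

The softmax output pairs naturally with one-hot targets: each of the five entries in `probs` is the estimated probability of one operating condition, and the predicted class is the index of the largest entry.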
6.2.2. SVM Model
The SVM classifier utilized a linear kernel to separate the fault classes in the feature space. Since the data were already normalized and linearly separable to a reasonable extent, the linear kernel provided a good balance between accuracy and complexity. The structure and training parameters used for the SVM model are summarized in Table 7.
6.2.3. RF Model
The RF algorithm, based on ensemble learning, was implemented using 100 decision trees. Each tree was trained on a random subset of features and samples (bootstrap aggregation), and the final output was determined by majority voting among trees. The structure and training parameters used for the RF model are summarized in Table 8.
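The two ensemble mechanisms mentioned above, bootstrap sampling and majority voting, can be sketched without any ML library. The tree training itself is omitted; the per-tree votes below are hypothetical values chosen only to demonstrate the voting step, and the helper names are illustrative.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def majority_vote(predictions):
    """Return the most common label; ties go to the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
N_TREES = 100
N_TRAIN = 1000  # training-set size stated in the text
replicates = [bootstrap_indices(N_TRAIN, rng) for _ in range(N_TREES)]

# Hypothetical per-tree predictions for a single test sample.
tree_votes = ["open_circuit"] * 62 + ["partial_short"] * 30 + ["normal"] * 8
print(majority_vote(tree_votes))  # open_circuit
```

Each replicate contains repeated indices and omits others, so every tree sees a slightly different view of the training data, which is what makes the aggregated vote more stable than any single tree.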
9. Future Work and Limitations of the Study
The results of this study confirm that the proposed MLNN-based framework effectively identifies and classifies various types of PV faults using simulated data. However, the main limitation of this work lies in its reliance on simulation-generated datasets, which may not fully capture the stochastic nature of real-world operating conditions such as sensor noise, degradation, and environmental fluctuations.
Building upon these promising results, several directions for future research are proposed.
First, advanced deep learning methodologies, including Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) architectures, will be investigated to enhance feature extraction and temporal fault prediction accuracy.
Second, experimental validation will be conducted using real operational data from field-deployed PV systems to assess model robustness and generalization under realistic conditions.
Third, the integration of the proposed framework into IoT-based monitoring platforms will be explored to enable real-time fault detection and predictive maintenance. This step will transform the current simulation model into a practical, intelligent tool for autonomous PV system management.
Finally, the framework will be extended to address more complex fault scenarios, including grounding faults, bypass-diode failures, and arc faults. Additional simulation datasets will be generated to analyze their impacts on system performance and to further improve the diagnostic models’ generalization capability.