Fault Diagnosis in Solar Array I-V Curves Using Characteristic Simulation and Multi-Input Models

Abstract: Currently, fault identification in most photovoltaic systems primarily relies on experienced engineers conducting on-site tests or interpreting data. However, due to limited human resources, it is challenging to meet the vast demands of the solar photovoltaic market. Therefore, we propose to identify fault types through the current-voltage curves of solar arrays, obtaining curves for various conditions (normal, aging faults, shading faults, degradation faults due to potential differences, short-circuit faults, hot-spot faults, and crack faults) as training data for the model. We employ a multi-input model architecture that combines convolutional neural networks with deep neural networks, allowing both the imagery and feature values of the current-voltage curves to be used as input data for fault identification. This study demonstrates that by inputting the current-voltage curves, irradiance, and module specifications of solar string arrays into the trained model, faults can be identified quickly using actual field data.


Introduction
In recent years, with the development of the green energy industry and the implementation of sustainable energy policies by various countries, the construction of solar power plants has been expanding continuously, and the prospects of the solar energy market have been growing. However, due to prolonged operation in extreme and harsh natural environments, various faults are inevitable, leading to a significant reduction in actual service life. When a module fails, the direct risks include damage to the module itself, which in turn reduces the power generation efficiency; the indirect risks may cause the entire solar photovoltaic (PV) power generation system to malfunction, adversely affecting the grid and resulting in severe economic losses.
Image-based solar fault diagnostic methods use image processing and analyze different images, such as visible light, infrared thermography [1], and electroluminescence images [2,3], to detect partial shading, hot spot faults, and crack faults in solar modules. These methods require the use of additional sensors to collect data and, thus, are time-consuming and costly.
To address the maintenance burden created by the rapidly increasing demand for large-scale solar panel installations, this study applies deep learning to the string characteristics of solar systems. It establishes a fault diagnostic system that analyzes the I-V curves of solar strings with deep learning models, reducing manpower costs, maintaining solar arrays more efficiently, increasing power generation efficiency and stability, and reducing the potential hazards caused by faults.
The current-voltage (I-V) curve of a solar string is a graphical representation of the relationship between the current and voltage outputs under fixed irradiance and temperature conditions. From the curve, some main parameters can be found, including open-circuit voltage (Voc), short-circuit current (Isc), maximum power point voltage (Vmpp), maximum power point current (Impp), maximum power (Pmax), and fill factor (FF) [4][5][6].
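As an illustration of how these parameters can be read from a sampled curve, the following minimal sketch assumes the samples run from short circuit (V = 0) to open circuit (I = 0) with increasing voltage, and uses the usual definition FF = Pmax / (Voc × Isc). The function name and the toy numbers are hypothetical and are not data from this study.

```python
def iv_parameters(voltage, current):
    """Return Voc, Isc, Vmpp, Impp, Pmax and FF from a sampled I-V curve.

    Assumes the first sample is at (about) zero voltage and the last
    sample is at (about) zero current, with voltage increasing.
    """
    if len(voltage) != len(current) or len(voltage) < 2:
        raise ValueError("need two equally long sequences with >= 2 samples")

    isc = current[0]    # current at short circuit
    voc = voltage[-1]   # voltage at open circuit

    # Maximum power point: the sample with the largest V * I product.
    powers = [v * i for v, i in zip(voltage, current)]
    k = max(range(len(powers)), key=powers.__getitem__)
    vmpp, impp, pmax = voltage[k], current[k], powers[k]

    ff = pmax / (voc * isc)  # fill factor, dimensionless
    return {"Voc": voc, "Isc": isc, "Vmpp": vmpp,
            "Impp": impp, "Pmax": pmax, "FF": ff}


# Toy curve with made-up numbers.
v = [0.0, 10.0, 20.0, 30.0, 35.0, 40.0]
i = [10.0, 9.9, 9.7, 9.0, 6.0, 0.0]
params = iv_parameters(v, i)
```

On a measured curve the end points rarely sit exactly at V = 0 and I = 0, so an interpolation step would normally be added before reading Isc and Voc.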
Most current fault diagnostic methods based on the I-V curve follow one of two approaches. The first relies on feature extraction, summarizing the curve as a set of parameters and then using statistical methods [7] or support vector machines [8] to evaluate the parameters for fault classification. The second uses the entire I-V curve, employing automatic feature extraction with principal component analysis [9] or analysis through a two-dimensional convolutional neural network (2D-CNN) [10] or long short-term memory (LSTM) networks [11]. Unlike these methods, this study uses a multiple-input model that simultaneously receives the entire I-V curve and the feature values extracted from the curve, allowing the model to obtain more information from the curve to aid in fault classification. Most literature indicates that it is not easy to obtain a large quantity of current-voltage curve data for faults, and this study addresses this issue through simulation.
Considering the difficulty of collecting a large quantity of fault current-voltage curves from actual field sites as training data for the model, this study simulates a large number of fault characteristic curves of solar system strings using MATLAB Simulink version 10.7 (R2023a). The training data for the fault diagnostic model include the voltage-current curves, the irradiance, the specified short-circuit current and open-circuit voltage of the solar module, and the number of modules in a solar string. The simulated data are preprocessed before being input into the multiple-input model for training. This model can reduce the maintenance costs of large solar power plants, decrease personnel costs and hazards, and deliver fault information more promptly, so that faults are addressed quickly and the stability, reliability, and safety of solar power plant operations are enhanced, providing higher economic benefits for the solar power plants of the future.
In this paper, MATLAB Simulink is used to model various states of solar modules in Section 2, with the simulated curves serving as training data for the model. Section 3 introduces the data preprocessing methods employed in this study, emphasizing the necessity of preprocessing the training data before inputting it into the model for training. In Section 4, a model is presented that is designed to simultaneously utilize curve features and curve images as input data, followed by an analysis of its performance on real field data.

Current-Voltage Curves
Among the mathematical models for solar modules, the single-diode model is the most widely used solar cell model [12]. From the equivalent circuit, the current-voltage characteristic equation of the solar cell can be obtained. The Bishop Equation (1) [13] with reverse bias characteristics is as follows:
$$I = I_{ph} - I_o\left[\exp\left(\frac{q\,(V + I R_s)}{A_1 k T}\right) - 1\right] - \frac{V + I R_s}{R_{sh}}\left[1 + a\left(1 - \frac{V + I R_s}{V_b}\right)^{-m}\right] \qquad (1)$$
where I is the output current of the solar cell, V is the terminal voltage of the solar cell, q is the charge of an electron, A1 is a curve-fitting constant, k is the Boltzmann constant, T is the cell temperature, Rs is the series resistance of the cell, Rsh is the parallel resistance of the cell, a and m are curve-fitting coefficients, Vb is the reverse bias voltage of the cell, Io is the diode reverse saturation current, and Iph is the photocurrent of the solar cell.
The formulas for Io (2) and Iph (3) are as follows:
$$I_o = I_{or}\left(\frac{T}{T_r}\right)^{3}\exp\left[\frac{q E_{go}}{A_2 k}\left(\frac{1}{T_r} - \frac{1}{T}\right)\right] \qquad (2)$$
$$I_{ph} = \left[I_{ref} + K_i\,(T - T_r)\right]\frac{G}{1000} \qquad (3)$$
where Ior is the saturation reverse current of the reference diode, Tr is the reference temperature, Ego is the bandgap energy, A2 is a curve-fitting constant, Iref is the reference photocurrent, Ki is the temperature coefficient, and G is the irradiance in W/m²; q, k, and T are the electron charge, the Boltzmann constant, and the cell temperature, as in Equation (1).
Although a two-diode model is used for simulation in MATLAB Simulink, setting the solar cell model to the "5-parameter mode" corresponds to the single-diode model shown in Figure 1, where the second diode's saturation current is zero and the parallel resistance is infinite. Only the short-circuit current Isc, open-circuit voltage Voc, irradiance Ir0 used for measurement, quality factor N, and series resistance Rs need to be adjusted to simulate different I-V curve scenarios. Since the parallel connection of the modules cannot be adjusted in the "5-parameter mode", subsequent curve simulations are conducted by modeling the series-parallel connection between modules and strings.
Through MATLAB Simulink, seven states of the string current-voltage curves of the solar system (normal, aging fault, shading fault, PID fault, short-circuit fault, hot spot fault, and crack fault) are simulated to serve as training data for the solar system fault classification model, with 5000 curves for each state of the solar system string, totaling 35,000 training samples.
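As a rough illustration of what the single-diode model with infinite parallel resistance computes, the sketch below evaluates I = Iph − Io[exp((V + I·Rs)/(N·n·Vt)) − 1], rearranged to give V as a function of I for n series cells. It is not the Simulink block itself. The function name, the default quality factor of 1.5, the 25 °C cell temperature, the treatment of Rs as one lumped series resistance, and the approximations Iph ≈ Isc·G/Ir0 and Io fitted from Voc are all assumptions made for illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def string_voltage(i, isc, voc, n_cells, g=1000.0, ir0=1000.0,
                   quality=1.5, rs=0.0, temp_k=298.15):
    """Voltage (V) of n_cells series cells carrying current i (A).

    isc and voc are the short-circuit current and open-circuit voltage
    measured at irradiance ir0; g is the actual irradiance (W/m^2).
    """
    vt = K_BOLTZMANN * temp_k / Q_ELECTRON  # thermal voltage, about 25.7 mV
    a = quality * n_cells * vt              # diode voltage scale of the string
    iph = isc * g / ir0                     # assumption: photocurrent ~ Isc, linear in g
    io = isc / math.expm1(voc / a)          # assumption: chosen so V(0) = voc at ir0
    if i >= iph + io:
        raise ValueError("current is beyond the range of the diode equation")
    return a * math.log((iph - i) / io + 1.0) - i * rs


# Sample one curve: 330 cells of 1.35 V each, Isc = 10 A, full irradiance.
currents = [10.0 * k / 100 for k in range(101)]
voltages = [string_voltage(i, isc=10.0, voc=445.5, n_cells=330) for i in currents]
```

Because the parallel resistance is infinite, the voltage can be written in closed form; with a finite Rsh, as in Equation (1), the relation is implicit and needs a numerical solver.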

Normal Condition
To simulate the current-voltage curves of normal string connections using solar modules of different specifications, the number of cells connected in series within each module ranges from 330 to 450, and each cell has an open-circuit voltage of 1.35 V. The resulting simulated current-voltage (I-V) curves exhibit open-circuit voltages ranging from 445.5 V to 607.5 V and short-circuit currents from 8 A to 12 A, thereby allowing the model to be applicable to different site specifications, as shown in Figure 2. The irradiance is simulated from 600 W/m² to 1000 W/m² to mirror actual field conditions, as shown in Figure 3. The remaining fault conditions are also simulated following this method.
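The parameter ranges above can be turned into simulation cases along the following lines. This is only a sketch: the text gives the ranges but not how values are drawn from them, so the uniform sampling, the fixed seed, and the function name are assumptions. The open-circuit voltage follows from the cell count, for example 330 × 1.35 V = 445.5 V and 450 × 1.35 V = 607.5 V.

```python
import random

CELL_VOC = 1.35  # open-circuit voltage per cell, in volts


def sample_normal_case(rng):
    """Draw one parameter set for a fault-free string."""
    n_cells = rng.randint(330, 450)  # inclusive on both ends
    return {
        "n_cells": n_cells,
        "voc": n_cells * CELL_VOC,                  # string open-circuit voltage, V
        "isc": rng.uniform(8.0, 12.0),              # short-circuit current, A
        "irradiance": rng.uniform(600.0, 1000.0),   # W/m^2
    }


rng = random.Random(0)  # fixed seed so the draw is repeatable
cases = [sample_normal_case(rng) for _ in range(5000)]
```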

Aging Faults
Aging faults in solar modules refer to the gradual decline in performance and efficiency over time, primarily due to prolonged exposure to sunlight, temperature variations, humidity, and environmental pollution. As the modules age, their energy conversion capacity weakens, leading to reduced output power. Aging faults may also result in increased internal resistance, higher leakage currents, and changes in transient and temperature characteristics [15].
Aging faults can be classified into two types. The first changes the Rs parameter in the solar cell model to age all cells. The second connects two solar cell modules in series, where one module has an additional resistor in series with some cells to simulate partial cell aging; this allows the string's current-voltage curves to be simulated with different aging faults. These two types of aging faults exhibit distinct characteristics on the current-voltage (I-V) curves.
Each type of aging fault is simulated with 2500 instances. Total cell aging is simulated through a single solar module, as shown in Figure 4, where the parameter Rs is adjusted to achieve different levels of overall aging. The number of cells in series ranges from 330 to 450, with the series resistance Rs adjusted from 60 mΩ to 80 mΩ. Partial cell aging divides the cell string into two groups, as shown in Figure 5, one group being normal and the other experiencing aging faults with an additional resistor. The percentage of aged cells is between 18% and 33% of the total, and the resistance connected in series is adjusted from 40 Ω to 60 Ω to simulate varying degrees of partial cell aging. On the I-V curve, total cell aging shows the maximum power point moving slightly towards the origin, while partial cell aging shows a change in slope near the open-circuit voltage, as illustrated in Figure 6.

Shading Faults
Shading faults in solar modules occur when shadows cast on the surface of the solar panels affect part of their surface, preventing the shaded areas from effectively converting sunlight into electrical energy. This could be due to buildings, trees, or other obstacles obstructing sunlight at certain times or seasons. Shading faults reduce the overall performance of the solar panels, thereby decreasing the output power [16].
The simulation of shading faults is conducted by connecting three solar cell modules and providing different levels of irradiance to each module, as shown in Figure 7. The irradiance for the shaded parts ranges from 100 W/m² to 500 W/m². The simulation also varies the shaded area coverage, with small portions covering 5% to 10% and larger areas covering up to 50%, to achieve different degrees of shading fault simulation, as shown in Figure 8.
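The stair-step shape that unequal irradiance produces can be sketched with three series modules, as follows. This is a simplified stand-in for the Simulink circuit, not a reproduction of it: the ideal bypass diode with a 0.7 V drop, the module parameters (110 cells, 148.5 V, 10 A), the quality factor of 1.5, and the function names are all assumptions made for illustration.

```python
import math

VT = 0.025693  # thermal voltage at 25 degC, in volts


def module_voltage(i, irradiance, isc=10.0, voc=148.5, n_cells=110,
                   quality=1.5, v_bypass=0.7):
    """Voltage of one module at string current i, with an ideal bypass diode."""
    a = quality * n_cells * VT
    io = isc / math.expm1(voc / a)
    iph = isc * irradiance / 1000.0  # photocurrent scales with irradiance
    if i >= iph:
        return -v_bypass             # module is bypassed (assumed diode drop)
    return a * math.log((iph - i) / io + 1.0)


def string_voltage(i, irradiances):
    """Series string: the same current flows through every module."""
    return sum(module_voltage(i, g) for g in irradiances)


# Two fully lit modules and one shaded to 300 W/m^2.
irradiances = [1000.0, 1000.0, 300.0]
currents = [k * 0.1 for k in range(100)]  # 0.0 A to 9.9 A
curve = [(string_voltage(i, irradiances), i) for i in currents]
```

In this sketch the shaded module can supply at most about 3 A, so the string voltage drops by roughly one module voltage as the current crosses that value, which is the step seen in shading curves.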

PID (Potential Induced Degradation) Faults
In solar systems using transformerless inverters, there is no electrical isolation, and the negative pole of the solar module string does not need to be grounded. In grid-connected solar systems, solar modules are typically connected in series to form high-voltage outputs, and the module frames are grounded for safety reasons. This results in floating potentials; half of the modules are under positive bias and half under negative bias. The potential difference causes leakage currents to flow from the module frames to the solar cells. Under external voltage, sodium ions in the glass migrate and accumulate on or enter the surface of the cells, causing shunting and reducing their efficiency. PID faults occur after long-term operation of the module [17].
When a solar module exhibits PID faults, characteristics such as decreased Rsh, increased , and reduced open-circuit voltage (Voc) appear. The current-voltage (I-V) curve will show characteristics as depicted in Figure 9. By paralleling a resistor and a diode with the solar cell module and adjusting the parallel resistance, internal , and Voc parameters of the solar cell module as shown in Figure 10, the I-V curve of a PID fault is simulated as shown in Figure 11.

Short-Circuit Faults
Short-circuit faults in solar modules occur when the current in the solar panel forms a low-resistance path on an unexpected route, causing the current to flow directly without passing through the load. This can be due to faults in the connecting wires or circuit components within the solar panel or due to external factors such as overheating or physical damage causing contact between two or more electrodes. Short-circuit faults divert the solar panel's current from its normal path, potentially causing system overheating, damage to the battery components, or even severe consequences like fires.
As shown in Figure 12, two solar cell modules are connected in series, each paralleled with a diode, and one module is short-circuited to simulate varying degrees of short-circuit faults as shown in Figure 13.

Hot Spot Faults
Hot spot faults in solar modules primarily arise from mismatched power output among some solar cells within the module. Cells that are mismatched can become reverse-biased, transforming from power-generating elements to load-bearing ones, consuming energy and generating significant heat. When a solar module exhibits a hot spot fault [19], its current-voltage (I-V) curve will display characteristics that vary depending on the distribution of the hot spots. Cells affected by hot spot faults exhibit lower short-circuit currents than normal cells, leading to power mismatches among them.
As shown in Figure 14, among all fault types, only this fault employs module parallel connection. To simulate the distribution of hot spots in half-cut cells, the solar cell string is divided into three modules. Two of them are connected in series, with one being a normal module and the other experiencing hot spot faults. This series pair is then paralleled with another normal solar cell module, and a resistor is paralleled with the module experiencing the hot spot fault. By changing its internal parameter Isc and the value of the parallel resistor, a distorted-curve hot spot is simulated, as shown in Figure 15. As shown in Figure 16, three solar cell modules are connected in series; the top module is normal, while the two below exhibit varying degrees of hot spot faults with different resistor values paralleled, simulating a double-step-curve hot spot, as shown in Figure 17.

Crack Faults
Solar modules may develop cell cracks during transportation from the factory to the installation site, during installation, and subsequently when exposed to repetitive weather conditions such as strong winds. Cracks in solar modules can lead to loss of connection between cells, decreased output power, insulation failure, non-compliance with safety standards, and potential safety hazards such as leakage currents.
When a solar cell module exhibits a crack fault [20,21], the current-voltage (I-V) curve will display stair-step features similar to those seen in shading faults but with the distinctive feature that the steps have a convex function characteristic. As shown in Figure 18, crack faults are simulated by connecting two solar cell modules in series, each paralleled with a diode, providing different levels of irradiance to produce stair-step curves, and adjusting the internal parameter  of the module with the crack fault so that the steps show convex function characteristics, as shown in Figure 19.

Data Preprocessing
The current-voltage (I-V) curves generated through MATLAB Simulink must undergo data preprocessing to be converted into a suitable input format for the multi-input model prior to model training.

Feature Values
From the I-V curves, parameters such as open-circuit voltage (Voc), short-circuit current (Isc), maximum power point voltage (Vmpp), and maximum power point current (Impp) are calculated to obtain the required feature values for the model. Additional features like irradiance, module open-circuit voltage, short-circuit current, and the number of modules in a solar string are also used as model features. A total of 10 feature values used in the model are shown in Table 1. By using the specifications of the modules, such as open-circuit voltage and short-circuit current, along with the number of modules, the open-circuit voltage (Voc′) and short-circuit current (Isc′) of the string under normal conditions (1000 W/m² irradiance without any faults) can be calculated. Feature 7 in Table 1 calculates the difference between the measured open-circuit voltage (Voc) and the open-circuit voltage under normal conditions (Voc′), aiding the model in identifying faults that would cause a decrease in the open-circuit voltage.
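A minimal sketch of the reference values Voc′ and Isc′ follows, under the assumption that the string is a simple series connection, so that the module voltages add and the string current equals the module current. The function names, the sign convention of the deviation (measured minus reference), and the 430 V measurement are hypothetical; the module specifications are those of the field string described in the Model Validation section.

```python
def string_reference(module_voc, module_isc, n_modules):
    """Open-circuit voltage and short-circuit current of a healthy series string."""
    voc_ref = module_voc * n_modules  # series connection: voltages add
    isc_ref = module_isc              # series connection: same current
    return voc_ref, isc_ref


def voc_deviation(measured_voc, voc_ref):
    """Difference between measured and reference open-circuit voltage."""
    return measured_voc - voc_ref


# 11 modules of 40.38 V and 10.85 A each.
voc_ref, isc_ref = string_reference(40.38, 10.85, 11)
deviation = voc_deviation(430.0, voc_ref)  # 430 V is a made-up measurement
```

A clearly negative deviation points towards faults that lower the open-circuit voltage, such as a short-circuited module.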

Curve Rescaling
Most solar array fault diagnostic systems are designed for specific solar field settings; thus, the simulated I-V curves closely match the measured data from these sites.While models trained on these curves perform well in these specific settings, their accuracy might decline when applied to solar modules of different specifications at other sites.The purpose of curve rescaling is to address the aforementioned issue by rescaling all curves to fixed specifications of open-circuit voltage and short-circuit current before inputting them into the model.This allows the model to be applicable across different specifications of solar modules in various field scenarios.
To address the differences in module specifications across various solar field scenarios, curve rescaling was used to improve the accuracy and versatility of the model, as shown in Figure 20. This study anticipates that this model can be applied universally in solar array fields where the open-circuit voltage is below 800 V and the short-circuit current is below 8 A. Therefore, this study first calculates feature values from the original curves, then rescales the curves using a current scaling factor (If) of 8 and a voltage scaling factor (Vf) of 800, standardizing them to a short-circuit current of 8 A and an open-circuit voltage of 800 V before converting them into image inputs for training in the multi-input model. The formulas for current rescaling (4) and voltage rescaling (5) are as follows:
$$I_{scaled} = I_{original} \times \frac{I_f}{I_{max}} \qquad (4)$$
$$V_{scaled} = V_{original} \times \frac{V_f}{V'_{OC}} \qquad (5)$$
where Iscaled is the scaled current, Ioriginal is the original current, If is the current scaling factor, Imax is the maximum current from the original data, Vscaled is the scaled voltage, Voriginal is the original voltage, Vf is the voltage scaling factor, and VOC′ is the module specification open-circuit voltage. Current rescaling uses the maximum current from the original data as the denominator to scale any I-V curve to the target current size, and voltage rescaling uses the specification open-circuit voltage as the denominator to scale the I-V curve to the specified open-circuit voltage while retaining fault characteristics in the curve.
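The rescaling step can be sketched as follows, with the current divided by the maximum current of the curve and the voltage divided by the specification open-circuit voltage, then multiplied by the scaling factors of 8 and 800. The function and argument names are hypothetical, and the three-point curve is made up for illustration.

```python
def rescale_curve(voltage, current, voc_spec, i_f=8.0, v_f=800.0):
    """Rescale an I-V curve to a fixed current and voltage range.

    voc_spec stands for VOC' (the specification open-circuit voltage);
    i_f and v_f are the current and voltage scaling factors.
    """
    i_max = max(current)
    if i_max <= 0 or voc_spec <= 0:
        raise ValueError("i_max and voc_spec must be positive")
    scaled_i = [i * i_f / i_max for i in current]
    scaled_v = [v * v_f / voc_spec for v in voltage]
    return scaled_v, scaled_i


# Made-up curve for a string with a 444.18 V specification voltage.
v = [0.0, 222.09, 444.18]
i = [10.85, 10.0, 0.0]
scaled_v, scaled_i = rescale_curve(v, i, voc_spec=444.18)
```

Because the voltage is divided by the specification value rather than by the measured maximum, a curve whose open-circuit voltage has dropped still ends below 800 V after rescaling, so that fault signature is preserved.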

Adding Gaussian Noise to Curves
The current-voltage curves measured in actual field conditions may exhibit slight noise due to various external factors. Therefore, to make the simulated training data more closely resemble the actual measured data, Gaussian noise is added to the simulated curves, as shown in Figure 21. The curves with added Gaussian noise are then converted into images to serve as input data for the CNN part of the multi-input model, enhancing the model's accuracy on actual field data. After multiple tests, the study generates random noise from a Gaussian distribution with a mean of 0 and a standard deviation of 0.03 and adds it to the original current data to simulate real-world environmental noise; noise at this level preserves the original features of the curve while still introducing a realistic degree of disturbance. Actual field data need no added Gaussian noise and can be input directly into the trained model for fault diagnosis.
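The noise step can be sketched as follows, using a mean of 0 and a standard deviation of 0.03 as stated above. The function name and the seed handling are hypothetical, and the sketch applies the noise to whatever current values it is given, without assuming a particular unit or scaling.

```python
import random


def add_gaussian_noise(current, sigma=0.03, seed=None):
    """Return a copy of the current samples with zero-mean Gaussian noise added."""
    rng = random.Random(seed)  # a fixed seed makes the noise repeatable
    return [i + rng.gauss(0.0, sigma) for i in current]


clean = [8.0] * 1000
noisy = add_gaussian_noise(clean, seed=1)
```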

Solar Array Fault Classification Model
Convolutional Neural Networks (CNN) are a type of deep learning model primarily used for tasks involving data with a grid structure, such as image recognition and processing.They extract features from input data through convolutional and pooling layers and perform tasks like classification or regression through fully connected layers.CNNs are known for their ability to handle high-dimensional image data and generalize well across different settings.Therefore, in this study, CNNs are employed to extract features from the current-voltage (I-V) curve images of solar strings.
Deep Neural Networks (DNN) are a type of artificial neural network (ANN) with a deep architecture consisting of multiple hidden layers. DNNs learn and analyze input data in-depth, typically used for processing both structured and unstructured data. Through the backpropagation algorithm, DNNs can automatically learn and extract high-level features from input data, facilitating various tasks such as classification, regression, and generation. Since a 2D-CNN alone cannot simultaneously input images and feature values derived from the curves, a DNN is chosen to process the computed feature values.
Multi-input models differ from standard CNN models in that they can handle various types of data inputs, including numeric data, images, and categorical data. These inputs are processed by the model to learn and then output predictions. To enable the deep learning model to simultaneously input images and feature values for training, the multi-input model integrates CNNs with DNNs. The architecture of the model is shown in Figure 22. The activation function is ReLU for all layers except the output layer, which uses Softmax. The current-voltage curve image serves as the input for the CNN (Input1) and passes through three convolutional layers and three pooling layers that extract features from the curve image. The convolutional layers employ 32, 64, and 128 filters of size 3 × 3, respectively. Max pooling with a 2 × 2 kernel size is used to reduce the number of parameters, prevent overfitting, and retain important features. Subsequently, the flatten layer transforms the multidimensional input into one dimension, followed by a fully connected layer with 128 neurons. A dropout layer discards 20% of the neurons, and the output is passed through another fully connected layer with eight neurons.
The 10 features mentioned in Section 3.1 are used as the input for the DNN (Input2), passing through a fully connected layer with 32 neurons.Dropout layers discard 20% of the neurons, followed by another fully connected layer with eight neurons.The outputs of these two components are combined into 16 neurons through a concatenation layer and then processed through a fully connected layer with 16 neurons.Finally, the output layer utilizes a Softmax activation function to generate predictions for seven categories.The overall structure of the study, as depicted in Figure 23, shows how the required data, after data preprocessing, is input into the model, which then outputs the status of the solar strings.
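To make the data flow through the two branches concrete, the sketch below traces output shapes and trainable-parameter counts layer by layer in plain Python, without a deep learning framework. The input image size (64 × 64, one channel), convolutions that preserve the spatial size, and pooling with stride 2 are assumptions, since the text does not state them; dropout layers are omitted because they have no parameters. The layer names are hypothetical.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of a k x k convolution."""
    return (k * k * in_ch + 1) * out_ch


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out


def describe(image_size=64, channels=1, n_features=10, n_classes=7):
    """List (name, output shape, parameter count) for every layer."""
    rows = []
    size, ch = image_size, channels
    # CNN branch (Input1): three conv + max-pool stages.
    for idx, filters in enumerate((32, 64, 128), start=1):
        rows.append((f"conv{idx}", (size, size, filters), conv_params(ch, filters)))
        ch = filters
        size //= 2  # 2 x 2 max pooling halves each spatial dimension
        rows.append((f"pool{idx}", (size, size, ch), 0))
    flat = size * size * ch
    rows.append(("flatten", (flat,), 0))
    rows.append(("cnn_dense128", (128,), dense_params(flat, 128)))
    rows.append(("cnn_dense8", (8,), dense_params(128, 8)))
    # DNN branch (Input2): the 10 feature values.
    rows.append(("dnn_dense32", (32,), dense_params(n_features, 32)))
    rows.append(("dnn_dense8", (8,), dense_params(32, 8)))
    # Merge: 8 + 8 values, then a 16-neuron layer and the softmax output.
    rows.append(("concat", (16,), 0))
    rows.append(("merged_dense16", (16,), dense_params(16, 16)))
    rows.append(("output_softmax", (n_classes,), dense_params(16, n_classes)))
    return rows


layers = describe()
total_params = sum(params for _, _, params in layers)
```

Under these assumptions almost all parameters sit in the first fully connected layer after flattening, which is why the pooling stages matter for keeping the model small.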

Model Training
The 35,000 preprocessed data points were split into 80% for the training set and 20% for the testing set. Figure 24 shows the accuracy and loss curves after 300 iterations, with the model achieving a final training accuracy of 99.9%. It is observed that before 15 iterations, the accuracy and loss change rapidly, and the model converges quickly. After surpassing 15 iterations, the accuracy and loss gradually stabilize. Based on the results in Figure 24, the optimal number of epochs is set to 100. To visually assess the classification results of the trained model, a confusion matrix is used to display the relationship between predicted results and actual labels. The results for the test set are shown in Figure 25, where the numbers indicate the count of data points for each predicted label, with the diagonal representing correct predictions for each category, including 'no' for normal, 'ag' for aging faults, 'ps' for shading faults, 'PID' for PID faults, 'sc' for short-circuit faults, 'hs' for hot spot faults, and 'ck' for crack faults. The model predicted all test data correctly except for one instance where a hot spot fault was misclassified as normal. These results demonstrate the high recognition accuracy of the method.
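The split sizes and the resulting test accuracy can be checked with a short sketch. The shuffling with a fixed seed is an assumption, since the text does not say how the split was drawn; the function name is hypothetical.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(35_000)

# One misclassified test sample, as reported for the test set.
test_accuracy = (len(test_idx) - 1) / len(test_idx)
```

With 7000 test samples and a single error, the test accuracy works out to about 99.99%.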

Model Validation
In a real solar field, current-voltage curves were obtained from a string of 11 solar modules connected through an inverter, with the modules having an open-circuit voltage of 40.38 V and a short-circuit current of 10.85 A. The real solar field data comprised 23 entries, including 6 with minor crack faults, 14 with minor hot spot faults, 2 with small-area shading faults, and 1 with a large-area shading fault, all collected under an irradiance of 1000 W/m²; the three types of faults are shown in Figure 26. After preprocessing, the real field data were input into the trained fault classification model. The results are displayed in Figure 27. The model correctly identified all shading and hot spot faults. However, it correctly identified only one of the crack faults, misclassifying the remaining five as hot spot faults, resulting in a model prediction accuracy of 78%. The misclassification can be attributed to the subtle differences between minor hot spot and crack faults in the current-voltage curves in real-world conditions, which are not distinctly different, leading to errors in model predictions.
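The reported field accuracy follows directly from these counts, as the sketch below shows by rebuilding the label lists and tallying a confusion matrix. The label codes match those used for the test set ('ck', 'hs', 'ps'); the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# 6 crack, 14 hot spot, and 3 shading curves (2 small-area + 1 large-area).
true = ["ck"] * 6 + ["hs"] * 14 + ["ps"] * 3
# Only one crack fault was recognized; the other five were labelled hot spot.
pred = ["ck"] * 1 + ["hs"] * 5 + ["hs"] * 14 + ["ps"] * 3

counts = confusion_counts(true, pred)
field_accuracy = accuracy(true, pred)  # 18 correct out of 23
```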

Conclusions
This study's solar array fault diagnostic system utilizes a multi-input model architecture combining Convolutional Neural Networks (CNN) and fully connected layers, enabling simultaneous training with images and feature values. Through MATLAB Simulink, normal and six types of fault current-voltage (I-V) curves for solar strings were simulated. From the original simulated data, ten feature values were calculated, and Gaussian noise was added before converting them into curve images for training the multi-input model.
In the validation of actual field data, the model correctly identified shading and hot spot faults; however, it misclassified crack faults as hot spot faults. Future improvements might include calculating the slope at the steps in the I-V curves of hot spot and crack faults as a new feature, enhancing the model's accuracy in distinguishing these faults. This study only requires obtaining the current-voltage curves of solar strings to quickly identify faults. Although hot spot and crack faults may be misclassified, shading faults can be correctly identified. Hot spot and shading faults exhibit similar step features, yet this model can distinguish the differences between them. Additionally, the study currently includes only shading, hot spot, and crack faults for field validation. To ensure the model's accuracy in real-world applications, it is hoped that future work will incorporate actual field data from other types of faults to validate the model, training it with new fault categories.

Figure 2. Simulation circuit diagram of the normal current-voltage curve.

Figure 4. Aging current-voltage curve simulation circuit diagram with Rs parameter change.

Figure 5. Aging current-voltage curve simulation circuit diagram with series resistor.

Figure 14. Distorted curve hot spot fault current-voltage curve simulation circuit diagram.

Figure 16. Double-step curve hot spot fault current-voltage curve simulation circuit diagram.
calculates the difference between the current open-circuit voltage (Voc) and the open-circuit voltage under normal conditions (Voc′), aiding the model in identifying faults that would cause a decrease in the open-circuit voltage.

Figure 23. Architecture of the Solar Array Fault Classification Model.

Figure 25. Confusion Matrix for the Model Test Set.

Figure 27. Confusion Matrix of Actual Field Measurement Data.

Table 1. List of Feature Values.