Article

Predictive Maintenance in PV Systems: A Copula-Based Approach with Digital Twin Technique

1
College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
2
Hainan Institute of Zhejiang University, Sanya 572025, China
3
Department of Electrical Engineering, The Hong Kong Polytechnic University, Hong Kong, China
*
Author to whom correspondence should be addressed.
Energies 2026, 19(11), 2686; https://doi.org/10.3390/en19112686
Submission received: 8 April 2026 / Revised: 18 May 2026 / Accepted: 26 May 2026 / Published: 2 June 2026

Abstract

Currently, solar photovoltaic (PV) systems are a priority for end-use decarbonization, aimed at reducing reliance on fossil fuels. However, PV systems are typically exposed to outdoor conditions, making them more susceptible to aging and damage. In this paper, a predictive maintenance approach that integrates digital twin technology with a copula-based model is proposed. This integration enables accurate simulation of the PV system’s condition and precise representation of the correlation between the power output of the digital twin and that of the actual system. Given the power output of the digital twin, predictive maintenance is performed based on the conditional cumulative distribution function (CDF) of the actual power output, which is derived from the copula model. A comprehensive case study was conducted to evaluate the proposed approach, named OCAD (Optimal Copula-based Anomaly Detector), which achieved an accuracy of 92.51% and an F1-score of 92.13%. The approach significantly outperforms conventional models, including SVM, KNN, and ANN, demonstrating the effectiveness of the proposed predictive maintenance strategy.

1. Introduction

The growing demand for power supply has led to significant consumption of fossil energy resources in recent decades, resulting in harmful effects on ecosystems and climates. With the maturation of Photovoltaic (PV) manufacturing technology, such as monocrystalline silicon, the price per watt (PPW) of solar power systems has decreased, fostering the development of distributed PV power plants. Solar energy has emerged as a viable alternative and has been widely integrated into power systems [1]. According to the “Renewable energy statistics 2025” from the International Renewable Energy Agency (IRENA), global solar PV capacity increased by 452 GW in 2024, raising the total cumulative installed PV capacity to 1859 GW. However, since PV systems are exposed to outdoor conditions, they are particularly susceptible to aging and damage. Therefore, effective and precise predictive maintenance methodologies have become crucial in advancing solar PV development.
Predictive maintenance of PV systems has garnered considerable research attention to date [2]. Various anomaly detection methods have been proposed to ensure the stable and reliable operation of PV systems. Current research in this field can be broadly categorized into three main approaches: model-based, data-driven [3], and image-based methods.
Model-based methods can be further divided into two types: physics-based models and empirical models. While model-based methods offer precise analysis of system characteristics, they may not accurately capture system behaviors under all conditions [4,5,6]. Data-driven methods, on the other hand, rely on machine learning algorithms such as artificial neural networks (ANN), random forests (RF), and support vector machines (SVM) to analyze the measured current-voltage (I-V) curves of a PV array. These methods do not require detailed knowledge of system parameters and can learn directly from the data [7,8,9,10,11,12]. However, they demand a large amount of training data, which may not be available in every situation.
The image-based approach has gained traction in recent years with the advancement of computer vision (CV) technology. This method typically employs drones and infrared thermal imaging to swiftly detect faults, such as hotspots [13,14]. Despite its effectiveness, the image-based method has limitations: it is generally less effective for non-thermal faults and entails high deployment costs, making it less suitable for distributed PV power stations.
The concept of the Digital Twin (DT) was initially introduced in the aerospace domain and has since been widely adopted across various industrial sectors [15,16]. A digital twin is a virtual replica designed to accurately represent a physical entity, effectively mirroring its internal structure and behavior in response to the external environment. Fan and Li [17] integrated digital twin modeling of renewable energy sources with a cloud-fog computing architecture to enable real-time monitoring and optimal energy management in power grids. Yao et al. [18] developed a digital twin platform for industrial energy systems incorporating photovoltaic generation to facilitate data-driven solar load forecasting and adaptive resource allocation. Yang et al. [19] proposed a real-time machine learning-driven digital twin framework for floating photovoltaic systems, achieving an overall coefficient of determination of 0.990 in predicting dynamic responses under diverse irradiance and wave conditions. However, relatively limited research has been devoted to the application of digital twin technology for fault diagnosis in photovoltaic systems [20]. To address this gap, we apply the DT concept to develop a comprehensive predictive maintenance framework. Furthermore, we employ a copula model to characterize the dependence structure between the actual power output and the digital twin’s power output, with particular attention to the tail dependence inherent in PV power output distributions.
This study presents the design, development, and experimental validation of a predictive maintenance strategy based on digital twin technology and the copula model, named OCAD (Optimal Copula-based Anomaly Detector). The proposed approach constructs a digital twin model, which is used to detect anomalies through an optimal copula model of the digital twin’s power output and the measured power output. The main contributions of this experimental study are summarized as follows:
1. A digital twin model is established to simulate the status of the PV system. The digital twin model is based on fundamental physical models constructed with internal electrical and thermal parameters, and its parameters are dynamically adjusted in real time according to external environmental inputs.
2. The copula model is applied to represent the correlation between the digital twin’s power output and the measured output. In light of the characteristics of the PV output, several copula functions are experimentally compared. The results indicate that the Student’s t copula is most appropriate for modeling the dependence structure of PV power output.
3. A predictive maintenance approach for PV systems based on the digital twin and Student’s t copula is proposed. Given the power output of the digital twin model, the conditional cumulative distribution function (CDF) of the actual power output is obtained using the copula model. Anomalies in the measured values are detected based on the confidence interval (CI). Experimental results demonstrate the effectiveness of the proposed approach in identifying potential faults in PV systems.
The rest of this paper is organized as follows. In Section 2, common faults in PV systems are summarized. In Section 3, predictive maintenance strategies, especially anomaly detection of PV modules based on copula, are described. In Section 4, a case study is carried out to verify the feasibility and accuracy of the proposed predictive maintenance method. Finally, conclusions are drawn in Section 5.

2. Fault Mechanisms

Predictive maintenance for photovoltaic (PV) systems focuses primarily on identifying anomalies that lead to a decrease in the system’s energy production compared to anticipated levels. The energy output from PV modules fluctuates due to environmental factors, such as solar irradiance and temperature, which makes identifying low-current faults challenging [3]. The immediate identification and rectification of such faults are crucial to maintaining the system’s operational continuity and financial viability. Consequently, a thorough analysis of these failure conditions is essential for executing efficient maintenance strategies.

2.1. Physical Faults

In PV systems, physical faults predominantly arise from direct physical damage to hardware or loss of functionality, often triggered by mechanical impacts or thermal variations. These disruptions significantly compromise system performance. A notable example is micro-cracks in the silicon solar cells of PV modules, induced by mechanical stress or pressure. These cracks are imperceptible to the naked eye, detectable only through specialized tests, and a principal cause of PV module malfunction or failure. Hot spots on PV panels are another prevalent physical fault: they cause localized heating within the modules, which reduces efficiency, accelerates degradation, and eventually damages the module. Similarly, delamination or debonding of modules, where layers separate or the adhesive fails, allows moisture to enter the module’s protective layers, which severely hastens degradation and leads to eventual failure.

2.2. Electrical Faults

Electrical faults are prevalent in PV systems. They range from line-to-line faults, caused by wiring errors or inadequate contacts between components, to line-to-ground faults, which arise from unintended connections to the ground and can lead to current leakage and safety concerns. The system may also encounter open-circuit faults, in which a break in the electrical circuit compromises the system’s output, or short-circuit faults, in which an inadvertent direct connection between terminals leads to excessive current flow and risks equipment damage or safety hazards. Timely identification and rectification of these electrical faults are crucial for maintaining the operational efficiency, safety, and longevity of PV systems.

2.3. Environmental Faults

Environmental faults in PV systems are typically temporary and include conditions such as partial shading [21], snow coverage, bird droppings, or dust accumulation on the surface of the PV panels [22]. While these faults may not cause immediate damage to the PV components, a lack of timely maintenance can lead to more severe harm. For instance, prolonged partial shading of the solar module or inadequate connections between cells can result in energy dissipation in the form of heat rather than electricity, further leading to the development of hot spots.
Among the various types of faults, environmental faults pose the most significant challenge for detection within PV systems. This difficulty arises because the system’s performance is influenced by the complex and dynamic nature of environmental conditions. Consequently, identifying environmental faults requires systematic collection of environmental data, followed by detailed comparison and analysis with the PV system’s outputs.

3. Predictive Maintenance Based on OCAD

The proposed predictive maintenance methodology, OCAD, detects potential faults in PV systems, such as hot spots, wiring issues, and degradation. The complete flowchart of the predictive maintenance process is depicted in Figure 1. Using weather data inputs, including irradiance and temperature, the digital twin model generates a sequence of estimated power outputs. From the measured and estimated power outputs, the copula model evaluates the conditional probability of each measured value. Measured values whose conditional probability falls outside the CI are classified as abnormal.

3.1. Digital Twin of PV System

To construct the digital twin model of a PV system, it is crucial to simulate the DC output of the PV cells. The equivalent circuit equation is widely used to calculate the power output of the PV cell. According to relevant research [23], the single-diode model is more suitable for high fluctuations of irradiance than the double-diode model. Moreover, the single-diode model’s accuracy generally meets the requirements, and its lower computational complexity allows for faster computation under the same hardware conditions. Therefore, the classic five-parameter single-diode model was adopted to simulate a typical PV cell, as shown in Figure 2. According to the single-diode equivalent circuit equation, the output current of a single PV cell can be expressed as follows:
$$I = I_L - I_D - I_{sh} \tag{1}$$
where $I_L$ denotes the light-generated current, $I_D$ denotes the voltage-dependent recombination current, and $I_{sh}$ represents the current lost through the shunt resistance.
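Specifically, the diode current $I_D$ is given by
$$I_D = I_0 \left[ \exp\left( \frac{V + I R_s}{n V_T} \right) - 1 \right] \tag{2}$$
where $I_0$ is the reverse saturation current, $V$ is the terminal voltage, $R_s$ is the series resistance, $n$ denotes the diode ideality factor, and $V_T$ represents the thermal voltage.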
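The current lost through the shunt resistance $R_{sh}$, denoted $I_{sh}$, is expressed as follows:
$$I_{sh} = \frac{V + I R_s}{R_{sh}} \tag{3}$$
Substituting the above expressions into Equation (1) yields the following:
$$I = I_L - I_0 \left[ \exp\left( \frac{V + I R_s}{n V_T} \right) - 1 \right] - \frac{V + I R_s}{R_{sh}} \tag{4}$$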
The light-generated current $I_L$ is determined by the following expression [24]:
$$I_L = \left[ I_{L,ref} + K_t \left( T_c - T_{ref} \right) \right] \frac{G}{G_{ref}} \tag{5}$$
where $I_{L,ref}$ denotes the light-generated current at the reference condition, $K_t$ denotes the temperature coefficient, $T_c$ denotes the cell temperature, $T_{ref}$ represents the reference temperature (298.15 K), $G$ represents the irradiance incident on the PV cell, and $G_{ref}$ is the reference irradiance, typically set to 1000 W/m².
Accordingly, the complete formulation of the classical five-parameter single-diode equivalent circuit model can be expressed as follows:
$$I = \left[ I_{L,ref} + K_t \left( T_c - T_{ref} \right) \right] \frac{G}{G_{ref}} - I_0 \left[ \exp\left( \frac{V + I R_s}{n V_T} \right) - 1 \right] - \frac{V + I R_s}{R_{sh}} \tag{6}$$
The reverse saturation current of the diode, $I_0$, is determined by the following expression:
$$I_0 = I_{0,ref} \left( \frac{T_c}{T_{c,ref}} \right)^{3} \exp\left[ \frac{q E_g}{n k} \left( \frac{1}{T_{ref}} - \frac{1}{T_c} \right) \right] \tag{7}$$
where $I_{0,ref}$ denotes the saturation current at the reference temperature, $T_{c,ref}$ is the reference cell temperature, $q$ is the elementary charge, $E_g$ represents the bandgap energy of the semiconductor material, and $k$ is the Boltzmann constant.
As can be seen in Equations (6) and (7), the power output of the PV cells is highly related to the solar irradiance and temperature. Thus, to construct the digital copy of the PV system, we need the real-time data of global horizontal irradiance (GHI) and PV module temperature [25].
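The dependence of Equations (5) and (7) on irradiance and temperature can be illustrated with a minimal Python sketch. The function names and the numeric values in the usage note (reference current, temperature coefficient, ideality factor) are illustrative assumptions rather than parameters of the studied system, and the sketch assumes that $T_{c,ref}$ and $T_{ref}$ are both 298.15 K.

```python
import math

Q = 1.602176634e-19  # elementary charge q [C]
K_B = 1.380649e-23   # Boltzmann constant k [J/K]


def light_current(g, t_c, i_l_ref, k_t, g_ref=1000.0, t_ref=298.15):
    """Light-generated current I_L, following Equation (5).

    g   : irradiance on the cell [W/m^2]
    t_c : cell temperature [K]
    """
    return (i_l_ref + k_t * (t_c - t_ref)) * g / g_ref


def saturation_current(t_c, i_0_ref, e_g_ev, n, t_ref=298.15):
    """Reverse saturation current I_0, following Equation (7).

    e_g_ev is the bandgap in electron-volts, so q * E_g is in joules.
    Assumption: T_c,ref and T_ref are the same reference temperature.
    """
    exponent = Q * e_g_ev / (n * K_B) * (1.0 / t_ref - 1.0 / t_c)
    return i_0_ref * (t_c / t_ref) ** 3 * math.exp(exponent)
```

With the illustrative values `i_l_ref=9.0`, `k_t=0.005`, `i_0_ref=1e-9`, `e_g_ev=1.12` and `n=1.3`, both functions return their reference values at 1000 W/m² and 298.15 K, halving the irradiance halves $I_L$, and raising the cell temperature increases $I_0$ sharply.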
Assuming that all cells are connected in series, receive uniform and equal irradiance, and operate in the same state, the current and voltage of a PV module are as follows:
$$I_m = I_c$$
$$V_m = N \times V_c$$
where $I_m$ and $V_m$ represent the current and voltage of the PV module, $I_c$ and $V_c$ represent the current and voltage of a PV cell, and $N$ is the number of cells in a PV module. Given the in-plane irradiance and module temperature, as well as the above equations, we can derive the maximum power point of the array via $P_m = I_m \times V_m$, where $P_m$ is the power output of the PV module. The DC power outputs of the individual PV modules are combined and fed to the converter, and the resulting AC power of the PV system is delivered to the grid. The typical digital twin model of the PV system is shown in Figure 3. Note that while the figure illustrates the complete PV system architecture including the inverter and AC-side components, the proposed detection method operates exclusively on the DC side, as indicated by the dashed detection boundary in the figure. The inverter model is therefore not included in the mathematical formulation.
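Because the single-diode equation is implicit in $I$, the cell current has to be found numerically before the module power can be evaluated. The following sketch solves the equation by bisection and then scans the cell voltage for the maximum of $P_m = I_m \times V_m$. It is one possible approach, not necessarily the solver used in this study. The helper names, the product `n_vt` (standing for $n V_T$), the voltage grid, and all numeric values in the usage note are assumptions for illustration.

```python
import math


def cell_current(v, i_l, i_0, r_s, r_sh, n_vt, iters=60):
    """Solve the implicit single-diode equation for I at cell voltage v >= 0.

    The residual below is strictly decreasing in I, so bisection on
    [0, I_L + 1] finds the unique non-negative root when one exists.
    """
    def residual(i):
        v_d = v + i * r_s  # voltage across the diode and shunt branch
        return i_l - i_0 * math.expm1(v_d / n_vt) - v_d / r_sh - i

    lo, hi = 0.0, i_l + 1.0
    if residual(lo) <= 0.0:
        # At or beyond open-circuit voltage; this sketch clamps the current at zero.
        return 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def module_mpp(n_cells, i_l, i_0, r_s, r_sh, n_vt, v_max=0.8, steps=800):
    """Scan the cell voltage and return (P_m, V_m, I_m) at maximum power.

    Uses I_m = I_c and V_m = N * V_c for identical series-connected cells.
    """
    best = (0.0, 0.0, 0.0)
    for s in range(steps + 1):
        v_c = v_max * s / steps
        i_c = cell_current(v_c, i_l, i_0, r_s, r_sh, n_vt)
        v_m, i_m = n_cells * v_c, i_c
        p_m = v_m * i_m
        if p_m > best[0]:
            best = (p_m, v_m, i_m)
    return best
```

For example, `module_mpp(60, 9.0, 1e-9, 0.005, 100.0, 0.0334)` evaluates a hypothetical 60-cell module. A grid scan is used here for transparency; a golden-section search or a dedicated PV library would be a natural refinement.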

3.2. Copula-Based Dependency Modeling

In the digital twin model, implementing electrical fault detection is typically easier than detecting faults in PV modules. For PV panels, temporary shading caused by cloud cover, tree shadows, and building shadows can reduce PV output. Additionally, bird droppings, dust, and degradation can lead to long-term reductions in the PV output power. Distinguishing between power loss caused by transient weather factors and that caused by permanent factors remains a persistent challenge.
To improve the detection of environmental faults, it is essential to collect environmental data as a reference, particularly local real-time radiation intensity and temperature. Given the irradiance and temperature data, the expected power output of the PV module can be calculated using the physical model discussed in Section 3.1.
Unlike simple and stable circuit analysis, the conversion from irradiance to current in PV modules is affected by many factors, leading to a fluctuating output curve. A comparison of the expected and actual output curves shows significant and inconsistent fluctuations. Correlation analysis has traditionally been employed to compare these sequences. However, correlation-based methods typically require a sufficiently large sliding window of historical data to produce reliable estimates, which may reduce sensitivity to abrupt short-duration faults. In contrast, the proposed copula-based approach enables fault assessment at each individual data point through the conditional CDF, with the 5-min detection window serving only as a post-classification filter to suppress transient false alarms.
Inspired by research on financial risk assessment, we employ the copula approach to identify faults in PV modules. The copula is a statistical tool widely used in finance to model the dependence structure between random variables. More precisely, a copula is a multivariate CDF with uniform marginals that links the marginal distributions of multiple variables to their joint distribution [26]. Compared to other statistical techniques like Gaussian processes (GPs) and Bayesian networks (BNs), the copula model offers distinct advantages, including its flexibility in modeling non-linear dependencies and ability to handle multi-dimensional relationships with diverse marginal distributions [27,28,29].
Consider a continuous random vector $(X_1, X_2, \ldots, X_N)$. The cumulative distribution functions (CDFs) of the marginals can be expressed as follows:
$$F_n(x_n) = P(X_n \le x_n), \quad 1 \le n \le N$$
Each marginal can be transformed into a uniform distribution over the interval $[0, 1]$ using the probability integral transform:
$$(U_1, U_2, \ldots, U_N) = \left( F_1(X_1), F_2(X_2), \ldots, F_N(X_N) \right)$$
The copula function $C$ is defined as the joint CDF of $(U_1, U_2, \ldots, U_N)$:
$$C(u_1, \ldots, u_n, \ldots, u_N) = P(U_1 \le u_1, \ldots, U_n \le u_n, \ldots, U_N \le u_N)$$
where $0 \le u_n \le 1$ for $1 \le n \le N$. Note that each marginal of the copula is itself uniform, so that for $1 \le n \le N$,
$$C(u_n) = C(1, \ldots, 1, u_n, 1, \ldots, 1) = u_n$$
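In practice the marginal CDFs are unknown and must be estimated before the probability integral transform can be applied. One common choice, assumed here for illustration, is the empirical CDF rescaled by $n + 1$ so that the transformed values stay strictly inside $(0, 1)$. The function name is hypothetical, and the sketch assumes continuous data without ties.

```python
def pseudo_observations(sample):
    """Approximate U = F(X) by rank / (n + 1) for each observation.

    Assumes no tied values, as expected for continuous measurements.
    """
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u
```

Applying this function separately to the expected and measured power series yields the pair $(U_1, U_2)$ on which a bivariate copula can be fitted.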
Selecting the appropriate copula function is crucial when constructing a copula model for multiple variables, as the copula family includes various function types, each suited to different data distributions. Therefore, the initial step is to select the best copula function for the PV system. For continuous variables, commonly used copula functions include the Frank copula, Clayton copula, Gumbel copula, Gaussian copula, and t-copula.
The Gaussian copula assumes that the dependency structure between variables is multivariate normal. Gaussian copulas have light tails, meaning they may not capture extreme events well. In solar energy management, this means a Gaussian copula might underestimate the likelihood of simultaneous extreme events. The Gaussian copula with correlation matrix $R \in [-1, 1]^{d \times d}$ can be written as follows:
$$C_R^{\mathrm{Gauss}}(u) = \Phi_R\left( \Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d) \right)$$
where $\Phi^{-1}$ is the inverse cumulative distribution function of a standard normal and $\Phi_R$ is the joint cumulative distribution function of a multivariate normal distribution.
The t-copula assumes that the dependency structure between the variables follows a multivariate Student’s t-distribution. Student’s t-distribution has the probability density function (PDF) given by
$$f(t) = \frac{\Gamma\left( \frac{\nu + 1}{2} \right)}{\sqrt{\nu \pi}\, \Gamma\left( \frac{\nu}{2} \right)} \left( 1 + \frac{t^2}{\nu} \right)^{-(\nu + 1)/2}$$
where $\nu$ is the number of degrees of freedom and $\Gamma$ is the gamma function. For $\nu = 1$ the Student’s t distribution becomes the standard Cauchy distribution, whereas for $\nu \to \infty$ it becomes the standard normal distribution $N(0, 1)$. Thus, the t-copulas have heavier tails compared to Gaussian copulas, which makes them more suitable for modeling events in the tails of the distribution.
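The two limiting cases of the Student’s t density can be checked numerically with a short sketch. The function name is an assumption, and the gamma functions are evaluated in log space to avoid overflow for large $\nu$.

```python
import math


def student_t_pdf(t, nu):
    """Student's t probability density with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))
```

At $t = 0$ the density equals $1/\pi$ for $\nu = 1$ (the Cauchy case) and approaches $1/\sqrt{2\pi}$ for very large $\nu$ (the normal case). In the tail, for instance at $t = 5$, the $\nu = 1$ density is several orders of magnitude larger than the near-normal one.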
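The bivariate t-copula is then defined as follows:
$$C_{\rho,\nu}^{t}(u_1, u_2) = t_{\rho,\nu}\left( t_\nu^{-1}(u_1), t_\nu^{-1}(u_2) \right)$$
where $t_\nu^{-1}$ denotes the inverse CDF (quantile function) of the univariate Student’s t distribution with $\nu$ degrees of freedom, and $t_{\rho,\nu}$ denotes the joint CDF of the bivariate Student’s t distribution with correlation parameter $\rho$ and $\nu$ degrees of freedom. The corresponding probability density function of the bivariate t-copula is given by
$$c_{\rho,\nu}^{t}(u_1, u_2) = \frac{1}{\sqrt{1 - \rho^2}} \cdot \frac{\Gamma\left( \frac{\nu + 2}{2} \right) \Gamma\left( \frac{\nu}{2} \right)}{\Gamma\left( \frac{\nu + 1}{2} \right)^{2}} \cdot \frac{\left( 1 + \frac{\xi_1^2}{\nu} \right)^{\frac{\nu + 1}{2}} \left( 1 + \frac{\xi_2^2}{\nu} \right)^{\frac{\nu + 1}{2}}}{\left( 1 + \frac{\xi_1^2 + \xi_2^2 - 2 \rho \xi_1 \xi_2}{\nu (1 - \rho^2)} \right)^{\frac{\nu + 2}{2}}}$$
where $\xi_i = t_\nu^{-1}(u_i)$, $i = 1, 2$.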
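The t-copula density above can be evaluated directly once the quantiles $\xi_1$ and $\xi_2$ are available. The sketch below takes the quantiles as inputs, on the assumption that they are computed elsewhere (for example by a statistics library), and works in log space for numerical stability. The function name is hypothetical.

```python
import math


def t_copula_density(xi1, xi2, rho, nu):
    """Bivariate t-copula density evaluated at xi_i = t_nu^{-1}(u_i)."""
    one_minus_rho2 = 1.0 - rho * rho
    log_const = (math.lgamma((nu + 2) / 2) + math.lgamma(nu / 2)
                 - 2.0 * math.lgamma((nu + 1) / 2)
                 - 0.5 * math.log(one_minus_rho2))
    # Numerator: product of the two univariate t kernels, inverted.
    log_margins = (nu + 1) / 2 * (math.log1p(xi1 * xi1 / nu)
                                  + math.log1p(xi2 * xi2 / nu))
    # Denominator: bivariate t kernel with correlation rho.
    quad = (xi1 * xi1 + xi2 * xi2 - 2.0 * rho * xi1 * xi2) / (nu * one_minus_rho2)
    log_joint = (nu + 2) / 2 * math.log1p(quad)
    return math.exp(log_const + log_margins - log_joint)
```

Two sanity checks follow from the formula: the density is symmetric in its two arguments, and with $\rho = 0$ and very large $\nu$ it approaches 1, the density of the independence copula. With positive $\rho$, concordant points such as $(1, 1)$ receive more density than discordant points such as $(1, -1)$.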
The Frank, Clayton, and Gumbel copulas are Archimedean copulas, each constructed from a generator function. An Archimedean copula is expressed as follows:
$$C(u_1, \ldots, u_N; \theta) = \psi^{[-1]}\left( \psi(u_1; \theta) + \cdots + \psi(u_N; \theta); \theta \right)$$
where the generator function $\psi : [0, 1] \times \Theta \to [0, \infty)$ is a continuous, strictly decreasing and convex function [30]. The parameter $\theta$ plays a pivotal role in shaping the correlation and dependence structure. Different values of $\theta$ yield varying degrees of dependence and distinct geometric configurations. Smaller values of $\theta$ typically indicate weak correlations or outright independence, while larger values of $\theta$ signify stronger dependence. The pseudo-inverse of the generator $\psi$ is defined by
$$\psi^{[-1]}(t; \theta) = \begin{cases} \psi^{-1}(t; \theta), & \text{if } 0 \le t \le \psi(0; \theta) \\ 0, & \text{if } \psi(0; \theta) \le t \le \infty \end{cases}$$
The Archimedean copulas, generator functions, and Kendall functions in this paper are presented in Table 1.
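As an illustration of the generator construction, the sketch below builds a copula from a generator and its inverse and instantiates it with the Clayton generator in its common parameterization, $\psi(t; \theta) = (t^{-\theta} - 1)/\theta$ with $\theta > 0$. This parameterization is an assumption and may differ from the one listed in Table 1. Since $\psi(0; \theta) = \infty$ for the Clayton generator, the pseudo-inverse coincides with the ordinary inverse.

```python
def clayton_generator(t, theta):
    """Clayton generator psi(t; theta) = (t^(-theta) - 1) / theta, theta > 0."""
    return (t ** (-theta) - 1.0) / theta


def clayton_generator_inverse(s, theta):
    """Inverse of the Clayton generator: (1 + theta * s)^(-1 / theta)."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def archimedean_copula(us, theta, psi, psi_inv):
    """C(u_1, ..., u_N; theta) = psi_inv(sum of psi(u_n; theta); theta)."""
    return psi_inv(sum(psi(u, theta) for u in us), theta)
```

Calling `archimedean_copula([0.3, 0.7], 2.0, clayton_generator, clayton_generator_inverse)` reproduces the closed-form Clayton copula $(u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$, and setting one argument to 1 returns the other argument, as the uniform-marginal property requires.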
One of the properties of Archimedean copulas is that an Archimedean copula can be uniquely determined by the Kendall distribution function
$$K_\psi(t) = t - \frac{\psi(t)}{\psi'(t)}, \quad \text{for } 0 < t < 1$$
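As a worked example, take the Kendall distribution function in its standard form $K_\psi(t) = t - \psi(t)/\psi'(t)$ and the Clayton generator in its common parameterization (an assumption that may differ from Table 1). The Kendall function then reduces to a closed form:

```latex
\psi(t;\theta) = \frac{t^{-\theta} - 1}{\theta}, \qquad
\psi'(t;\theta) = -\,t^{-\theta - 1}, \qquad \theta > 0

\frac{\psi(t;\theta)}{\psi'(t;\theta)}
  = -\,\frac{\left(t^{-\theta} - 1\right) t^{\theta + 1}}{\theta}
  = -\,\frac{t - t^{\theta + 1}}{\theta}

K_\psi(t) = t - \frac{\psi(t;\theta)}{\psi'(t;\theta)}
          = t + \frac{t\left(1 - t^{\theta}\right)}{\theta},
  \qquad 0 < t < 1
```

For large $\theta$ the second term vanishes and $K_\psi(t) \to t$, the Kendall function of perfectly dependent variables.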
The Kendall distribution function of a bivariate Archimedean copula represents the distribution of $C(u, v)$. However, in reality, the data distribution we encounter does not fully conform to any Archimedean copula. Therefore, to find the optimal Archimedean copula function, we need an empirical estimate of the Kendall distribution function that can be compared with those of the candidate Archimedean copulas. The empirical estimate, $K_n(t)$, from a random sample of size $n$ is given by
$$K_n(t) = \frac{\#\{ T_i \le t \}}{n + 1}$$
where the pseudo-observations $T_i$ are given by
$$T_i = H_n(X_i, Y_i) = \frac{\sum_{j=1}^{n} I\left( X_j \le X_i,\ Y_j \le Y_i \right)}{n + 1}, \quad i = 1, 2, \ldots, n$$
and $I(\cdot)$ is the indicator function.
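The empirical Kendall function can be computed directly from its definition. The sketch below follows the two formulas as written, including the $n + 1$ denominators. The function names are hypothetical, and the quadratic-time loop is chosen for readability rather than speed.

```python
def kendall_pseudo_observations(xs, ys):
    """T_i = #{j : X_j <= X_i and Y_j <= Y_i} / (n + 1) for each i."""
    n = len(xs)
    return [
        sum(1 for j in range(n) if xs[j] <= xs[i] and ys[j] <= ys[i]) / (n + 1)
        for i in range(n)
    ]


def empirical_kendall(t_obs, t):
    """K_n(t) = #{T_i <= t} / (n + 1)."""
    return sum(1 for t_i in t_obs if t_i <= t) / (len(t_obs) + 1)
```

For a perfectly concordant toy sample such as `xs = ys = [1, 2, 3]`, the pseudo-observations are `[0.25, 0.5, 0.75]`. Evaluating `empirical_kendall` on a grid of $t$ values and comparing the result with each candidate's theoretical $K_\psi(t)$ gives a simple goodness-of-fit check.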
To conduct the analysis, we select a representative PV output curve, as illustrated in Figure 4. A visual inspection suggests that the PV power output does not conform to a normal distribution. To formally verify this observation, a Shapiro-Wilk test is performed on the PV DC power output, yielding a test statistic of $W = 0.8656$ ($p < 0.001$), which strongly rejects the null hypothesis of normality. This result confirms the non-Gaussian nature of PV output and thereby justifies the exclusion of the Gaussian copula as a candidate dependence model. Furthermore, the distribution exhibits pronounced left-skewness. For such distributions, the Gumbel copula, which exhibits upper-tail dependence, is generally considered more appropriate than other Archimedean copulas [31,32]. A comparative evaluation of several copula functions is presented in Section 4 to verify whether the Gumbel copula indeed provides superior performance.

3.3. Selection Criteria of Optimal Copulas

Log-likelihood, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are three commonly used criteria for model selection, typically representing the model’s degree of fit.
The log-likelihood is a metric for probability models that measures how probable the observed data are under the fitted model. A higher log-likelihood indicates a better fit of the model to the observed data. While log-likelihood directly reflects the model’s fit to the data, it does not account for the model’s complexity. The log-likelihood is often expressed as $\ln(L)$, where $L$ represents the likelihood function.
AIC is an estimator of prediction error and thereby the relative quality of statistical models for a given set of data [33]. Suppose that we have a statistical model of some data. Let $k$ be the number of parameters estimated within the model, and let $\hat{L}$ denote the maximized value of the model’s likelihood function. Then the AIC is given by
$$AIC = 2k - 2\ln(\hat{L})$$
Models are then evaluated based on their AIC values, with a preference for models with lower AIC values, indicating a superior balance of model complexity and fit.
Based on the principles of Bayesian theory, BIC severely penalizes the presence of excess parameters to avoid overfitting. The definition of BIC is presented as follows:
$$BIC = k\ln(n) - 2\ln(\hat{L})$$
where $n$ is the number of observations.
Consistent with the AIC criterion, the model with the lower BIC value is preferred.
The values of Log-likelihood, AIC, and BIC depend on research objectives, sample size, and other factors. By comparing the Log-likelihood, AIC, and BIC values across different models, we can identify the copula model that is most suitable for the PV system.

3.4. Anomaly Detection Based on OCAD

Let $U_1 = F_1(P_{exp})$ and $U_2 = F_2(P_{meas})$ denote the probability integral transforms of the expected power output from the digital twin and the measured power output from the PV system, respectively, where $F_1$ and $F_2$ are the corresponding marginal CDFs. Given an observed expected power $P_{exp}$, the corresponding value $u_1 = F_1(P_{exp})$ is obtained. The conditional CDF of $U_2$ given $U_1 = u_1$ is then derived from the copula model as follows:
$$C_{U_2|U_1}(u_2 \mid u_1) = \frac{\partial C(u_1, u_2)}{\partial u_1}$$
A confidence interval (CI) with significance level $\alpha = 0.05$ is constructed based on this conditional distribution. The lower and upper bounds of the CI are determined by
$$u_2^{L} = C_{U_2|U_1}^{-1}\left( \frac{\alpha}{2} \,\Big|\, u_1 \right), \quad u_2^{U} = C_{U_2|U_1}^{-1}\left( 1 - \frac{\alpha}{2} \,\Big|\, u_1 \right)$$
The measured power output $P_{meas}$ is then classified as follows:
$$\text{Status} = \begin{cases} \text{Normal}, & \text{if } u_2^{L} \le F_2(P_{meas}) \le u_2^{U} \\ \text{Abnormal}, & \text{otherwise} \end{cases}$$
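For the bivariate t-copula, the conditional CDF and its inverse are available in closed form. The expressions below are the standard result for the t-copula (often called its h-function) rather than formulas stated in this paper. They use $\xi_i = t_\nu^{-1}(u_i)$ and the univariate Student’s t CDF $t_{\nu+1}$ with $\nu + 1$ degrees of freedom:

```latex
C_{U_2|U_1}(u_2 \mid u_1)
  = t_{\nu+1}\!\left(
      \frac{\xi_2 - \rho\,\xi_1}
           {\sqrt{\dfrac{(\nu + \xi_1^2)(1 - \rho^2)}{\nu + 1}}}
    \right)

C_{U_2|U_1}^{-1}(p \mid u_1)
  = t_{\nu}\!\left(
      \rho\,\xi_1
      + \sqrt{\frac{(\nu + \xi_1^2)(1 - \rho^2)}{\nu + 1}}\;
        t_{\nu+1}^{-1}(p)
    \right)
```

Setting $p = \alpha/2$ and $p = 1 - \alpha/2$ in the second expression gives the bounds $u_2^{L}$ and $u_2^{U}$ for a given $u_1$.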
It is worth noting that the detection is performed on a point-by-point basis. At each time step, the digital twin receives real-time irradiance and temperature inputs and computes $P_{exp}$. The copula model then evaluates the conditional probability of the observed $P_{meas}$ given this specific $P_{exp}$, independent of the overall power distribution. Consequently, weather-induced power fluctuations (e.g., during cloudy periods) do not cause false alarms, as both $P_{exp}$ and $P_{meas}$ decrease simultaneously, preserving the dependence structure captured by the copula model.
To balance the timeliness of detection and minimize false alarms caused by transient output fluctuations, a detection window of 5 min is adopted. An anomaly alarm is triggered only when all data points within the window are classified as abnormal.
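The point-wise classification and the window filter can be sketched as follows. The sketch assumes that the CI bounds have already been computed for each time step, that the data arrive at the 1-min resolution described in Section 4.1 (so that a 5-min window spans five consecutive samples), and that the window slides one sample at a time. The last assumption is a choice made here, since the text does not specify whether windows overlap. All names are hypothetical.

```python
def classify(u2, lower, upper):
    """Point-wise status: Normal if u2 lies inside the CI, else Abnormal."""
    return "Normal" if lower <= u2 <= upper else "Abnormal"


def window_alarms(statuses, window=5):
    """Flag each step at which the last `window` statuses are all Abnormal."""
    alarms = []
    run = 0  # length of the current streak of consecutive abnormal points
    for status in statuses:
        run = run + 1 if status == "Abnormal" else 0
        alarms.append(run >= window)
    return alarms
```

A streak of four abnormal points followed by a normal one raises no alarm, whereas five or more consecutive abnormal points raise an alarm from the fifth point onward.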

4. Case Study

In this section, a case study is conducted on the proposed predictive maintenance method for PV systems based on digital twins and copula modeling. Section 4.1 presents the application scenario and related dataset of the case study, while Section 4.2 compares and analyzes the proposed method with other methods.

4.1. Dataset

The PV system examined is located on the balcony of laboratory AG713 at The Hong Kong Polytechnic University, as shown in Figure 5. The system consists of eight mono-crystalline PV panels connected to a single phase of the grid through five inverters, with an overall capacity of 3.34 kWp.
To monitor the individual panel behavior, six DC meters (AcuDC243-60-A2-P1-X5-C, Accuenergy, Toronto, ON, Canada) and four AC meters (EV190-5A-Modbus, Accuenergy, Toronto, ON, Canada) were employed to measure the crucial electrical parameters, such as DC/AC current, DC/AC voltage, DC/AC active/reactive power, power factor, energy, etc. The configuration of these meters is also illustrated in Figure 5 and Figure 6.
The on-site weather conditions are recorded using a micro weather station. Two pyranometers with different tilt angles were installed to measure solar irradiance at varying incidence angles. One pyranometer was installed horizontally for Global Horizontal Irradiance (GHI) measurement, while the other was installed in the same orientation as the PV panels to measure Plane-of-Array (POA) irradiance. The testing dataset was collected at a 1-min resolution from 5:00 am to 8:00 pm daily throughout the year 2021. The dataset comprises normal data as well as data representing different fault conditions recorded over the course of one year.
The dataset is partitioned into training and test sets in chronological order. The training set comprises data collected from January to September 2021, while the test set spans October to December 2021 and includes both normal operating data and fault data, covering partial shading, complete shading, open circuit, and short circuit conditions. It is worth noting that the training strategies differ across the evaluated methods. For the proposed OCAD approach, only normal operating data are used during the training phase to fit the marginal distributions and estimate the copula parameters, as the copula model characterizes the dependence structure under healthy operating conditions and detects anomalies as deviations from this baseline. In contrast, the comparative methods, including SVM, KNN, and ANN, require both normal and fault samples during training to learn the decision boundaries between different operating states. Since the number of fault samples is significantly smaller than that of normal samples, the normal samples in the test set are randomly downsampled to construct a balanced dataset, thereby ensuring an unbiased evaluation across all methods.
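The partitioning and balancing steps described above can be sketched as follows. The record layout (dictionaries with `month` and `label` keys), the function names, and the fixed random seed are assumptions made for illustration. They are not the data format used in the study.

```python
import random


def chronological_split(records):
    """Split records into January-September (train) and October-December (test)."""
    train = [r for r in records if r["month"] <= 9]
    test = [r for r in records if r["month"] >= 10]
    return train, test


def normal_only(records):
    """Training subset for OCAD: healthy operating data only."""
    return [r for r in records if r["label"] == "normal"]


def balance_test_set(test, seed=0):
    """Randomly downsample normal records to match the number of fault records."""
    faults = [r for r in test if r["label"] != "normal"]
    normals = [r for r in test if r["label"] == "normal"]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = rng.sample(normals, k=min(len(faults), len(normals)))
    return faults + kept
```

The supervised baselines would be trained on both normal and fault records, whereas the copula model would be fitted on the output of `normal_only` alone.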

4.2. Performance Analysis

The PV power output under ideal conditions can be estimated using internal parameters, POA irradiance, and module temperature. As shown in Figure 7, the measured and expected values of all PV panels, except for the String Panel, are generally consistent. Even the String Panel, used as a shaded negative sample, shows a high degree of correlation despite the relatively large difference between the measured and expected values. Therefore, correlation analysis alone is inadequate for identifying faults such as panel degradation and shading.
In this work, we classified weather conditions into four types: sunny, occasionally cloudy, cloudy, and rainy. We analyzed the correlation between the measured output values of PV panels and the expected values from the digital twin model under different weather conditions, as illustrated in Figure 8. It was observed that the PV system’s output exhibited different correlation distributions depending on the weather condition. Under sunny or rainy conditions, the measured and predicted values of the PV system showed a strong correlation. However, during cloudy conditions, the effect of cloud cover on the system’s output increased with solar irradiance, demonstrating a nonlinear correlation. Consequently, the identification of faults solely based on correlation thresholds proved to be insufficient. Thus, we advocate for the development of a probabilistic model to establish the confidence interval for the PV output values. Evaluating these against the confidence interval allows for a more precise determination of potential faults.
We fitted several copula functions to the dataset and generated the joint probability density map of each copula function, as illustrated in Figure 9. Ideally, the marginal distributions of $U_1, U_2, \ldots, U_N$ are uniform on $[0, 1]$. Various methods, such as maximum likelihood estimation and Kendall’s tau approximation [30,34], are used to approach this ideal. Note that because the data used to construct the model represent only a portion of the total data, the copula model fitted to the known data is biased and can only approximate the true distribution as closely as possible.
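As one illustration of the Kendall’s tau approach mentioned above, the sketch below maps samples to rank-based pseudo-observations and inverts the well-known relations between Kendall’s $\tau$ and the copula parameter: $\theta = 2\tau/(1-\tau)$ for Clayton, $\theta = 1/(1-\tau)$ for Gumbel, and $\rho = \sin(\pi\tau/2)$ for the Gaussian and Student’s t copulas. The function names and the toy data are hypothetical, ties are assumed absent, and the pairwise $\tau$ computation is written for clarity rather than speed.

```python
import math

def pseudo_observations(x):
    """Map a sample to (0, 1) using ranks divided by (n + 1); assumes no ties."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u

def kendall_tau(x, y):
    """Kendall's tau by direct pair counting, O(n^2); assumes no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)

def theta_clayton(tau):
    return 2 * tau / (1 - tau)

def theta_gumbel(tau):
    return 1 / (1 - tau)

def rho_elliptical(tau):
    return math.sin(math.pi * tau / 2)

# Toy data (hypothetical): expected vs. measured power with one discordant pair.
expected = [10.0, 20.0, 30.0, 40.0, 50.0]
measured = [9.5, 19.0, 31.0, 29.0, 48.0]

u = pseudo_observations(expected)
v = pseudo_observations(measured)
tau = kendall_tau(u, v)  # tau is rank-based, so raw data would give the same value
```

Such moment-type estimates are often used as starting values for maximum likelihood estimation.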
As illustrated in Figure 9, the data fitted through different copula models exhibit distinct characteristics. To find the most suitable copula model for the PV systems, we calculate the optimal parameters for each copula model and evaluate the models using the log-likelihood, AIC, and BIC criteria. As shown in Table 2, the Student’s t copula achieves the best performance, with $\rho = 0.99$ and $\nu = 1.34$. Comparing the shapes of the fitted PDFs with the probability density distribution of PV output power in Figure 4 shows that the fitted t-copula is the closest to the distribution of PV output power, indicating that it provides the best fit. Therefore, we select this copula model as our assessment model for predictive maintenance. The joint probability density of the optimal copula model is illustrated in Figure 10.
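For reference, the information criteria used in this comparison are defined as $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k \ln n - 2\ln L$, where $k$ is the number of fitted parameters and $n$ the number of samples. The sketch below applies these definitions to the log-likelihoods in Table 2. The parameter counts (one for the Archimedean and Gaussian families, two for Student’s t) and the sample size `n_samples` are assumptions of this example, so the computed values may differ slightly from those tabulated.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion; lower is better."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion; lower is better."""
    return k * math.log(n) - 2 * log_likelihood

# name -> (log-likelihood from Table 2, assumed number of fitted parameters)
candidates = {
    "Clayton": (1844.60, 1),
    "Frank": (1782.34, 1),
    "Gumbel": (1592.44, 1),
    "Gaussian": (1578.34, 1),
    "Student's t": (1924.54, 2),
}
n_samples = 1000  # hypothetical sample size, not stated in this section

scores = {
    name: (aic(ll, k), bic(ll, k, n_samples))
    for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Under these assumptions both criteria select the Student’s t copula, in line with the ranking reported in Table 2.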
Based on the optimal copula model we have obtained, we propose the OCAD method. To verify the effectiveness of the proposed method, we compare it with classical machine learning methods, including Support Vector Machine (SVM) [35], K-Nearest Neighbor (KNN) [36], and Artificial Neural Network (ANN) [37,38]. These methods are selected as baselines because the proposed approach targets deployment on resource-constrained edge devices such as smart inverters, where advanced deep learning models (e.g., Transformer-based architectures) are not computationally feasible. The test dataset includes various fault types, such as partial shading, complete shading of PV panels, open circuit, and short circuit. Notably, during the test, we found the impact of soiling on PV power output is almost negligible, and research in reference [39] also shows that the standard industry assumption of soiling losses ranges from 1% to 4% on an annual basis. Therefore, it is difficult to identify soiling faults through the output characteristics of the PV system. Currently, image recognition is widely used for diagnosis of soiling faults. Hence, we exclude the soiling fault from this test. Since the number of fault samples collected over the year is significantly smaller than that of normal samples, normal operating data are randomly sampled to construct a balanced test dataset containing an equal number of normal and fault instances, thereby ensuring an unbiased comparison across all methods.
Table 3 presents a comparative analysis of predictive maintenance strategies for evaluating PV panel performance. The results reveal that traditional methods, such as SVM and KNN, exhibit limited effectiveness in detecting anomalies. Conversely, methodologies incorporating digital twin technology and copula models demonstrate superior detection capabilities across all evaluation metrics. Among these, the proposed OCAD strategy achieves the highest accuracy (92.51%), precision (96.72%), and F1-score (92.13%). Although ANN attains the highest recall (88.47%), the recall of OCAD remains competitive and closely approaches this best-performing value. The high precision indicates that the method produces very few false alarms, which is desirable for practical deployment. This result also aligns with the performance derived from the copula model fittings in Table 2. Consequently, the application of the OCAD method represents a significant advancement over conventional predictive maintenance techniques, offering a viable and more accurate solution for predictive maintenance in PV systems.
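The four metrics in Table 3 follow the standard confusion-matrix definitions, which can be sketched as below. Treating a fault as the positive class is an assumption of this example, and the counts are hypothetical values for a balanced test set of 100 fault and 100 normal samples, not the study’s data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of raised alarms that are real faults
    recall = tp / (tp + fn)      # share of real faults that raise an alarm
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 88 faults detected, 12 missed, 3 false alarms, 97 correct normals.
metrics = classification_metrics(tp=88, fp=3, fn=12, tn=97)
```

A detector with high precision and slightly lower recall, as in this toy case, raises few false alarms at the cost of missing some faults, which is the trade-off discussed above.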
In addition to detection accuracy, we compare the computational complexity of each method, as summarized in Table 4. The proposed OCAD model requires only $O(n)$ for parameter estimation and $O(1)$ for single-point inference, meaning that once the model is fitted, each incoming data point can be evaluated independently in constant time. This makes the method highly efficient for real-time deployment. In contrast, SVM and KNN exhibit higher computational costs that scale with the training set size, which may limit their applicability in online monitoring scenarios.

5. Conclusions

In this paper, a predictive maintenance approach for PV systems based on digital twin technology and a copula model is proposed. By constructing a digital copy of the entire PV system, the digital twin can accurately perceive the system’s running status. Additionally, the copula model is applied to represent the correlation between the digital twin’s power output and the measured output. Anomalies in the measured values are then detected based on the confidence interval. A case study was conducted to evaluate the performance of the proposed method among the candidates. Results showed that the Student’s t-copula is more suitable for PV anomaly detection, particularly for left-skewed distributions. In the test, the proposed method achieved an accuracy of 92.51% and an F1-score of 92.13% in anomaly detection. The methodology and results may provide valuable insights for future researchers on leveraging digital twin technology.
Several challenges remain to be addressed in future work: (1) automatic calibration of the internal parameters of the PV system. Currently, the digital twin relies on factory parameters, which may differ from the actual parameters and affect the accuracy of anomaly detection; (2) identification of fault categories. The proposed approach detects anomalies but cannot classify the fault type, and further research will be conducted in this area; (3) integration into smart inverters. The proposed method operates on DC-side measurements and features $O(1)$ single-point inference complexity, making it a promising candidate for embedding into smart inverters as a lightweight, real-time fault detection module.

Author Contributions

Conceptualization, S.Z. and X.Y.; methodology, S.Z. and X.Y.; software, S.Z. and X.Y.; validation, S.Z.; formal analysis, S.Z.; investigation, X.Y.; resources, Z.X. and M.W.; data curation, S.Z.; writing—original draft preparation, S.Z. and X.Y.; writing—review and editing, D.Q. and M.W.; visualization, Y.Y.; supervision, D.Q. and Z.X.; project administration, D.Q. and Z.X.; funding acquisition, D.Q. and Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was jointly supported by the National Natural Science Foundation of China (Grant No. 52467024, No. 62001416), the Hainan Provincial Sanya Yazhou Bay Science and Technology Innovation Joint Project (Grant No. ZDYF2025GXJS142), the Project of Sanya Yazhou Bay Science and Technology City (Grant No: SKJC-JYRC-2025-53), and the Hong Kong Research Grants Council Theme-Based Program under contract No. 152443/16E.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Specifications of selected PV modules at STC.

| Panel | Type A | Type B | Type C |
|---|---|---|---|
| Technology | Half-Cut Mono | Bifacial Mono | Shingled Mono PERC |
| Maximum power ($P_{max}$) | 405 W | 540 W | 370 W |
| Optimum operating voltage ($V_{mp}$) | 42.0 V | 41.75 V | 39.54 V |
| Optimum operating current ($I_{mp}$) | 9.65 A | 12.94 A | 9.36 A |
| Open circuit voltage ($V_{oc}$) | 49.2 V | 49.54 V | 47.47 V |
| Short circuit current ($I_{sc}$) | 10.54 A | 13.89 A | 9.90 A |
| Efficiency | 20.10% | 20.80% | 19.9% |
| Temp. coefficient of $P_{max}$ | −0.37 %/°C | −0.36 %/°C | −0.38 %/°C |
| No. of cells per module | 144 (6 × 24) | 144 (6 × 24) | 420 (6 × 70) |
| Dimensions (L × W × H) | 2008 × 1002 × 35 mm | 2287 × 1134 × 35 mm | 1842 × 1008 × 35 mm |

STC (Standard Test Conditions): irradiance 1000 W/m², module temperature 25 °C, wind speed 0 m/s, AM 1.5, light incidence angle 0°.

References

  1. Yang, D.; Li, W.; Yagli, G.M.; Srinivasan, D. Operational solar forecasting for grid integration: Standards, challenges, and outlook. Sol. Energy 2021, 224, 930–937.
  2. Youssef, A.; El-Telbany, M.; Zekry, A. The role of artificial intelligence in photo-voltaic systems design and control: A review. Renew. Sustain. Energy Rev. 2017, 78, 72–79.
  3. Pillai, D.S.; Rajasekar, N. A comprehensive review on protection challenges and fault diagnosis in PV systems. Renew. Sustain. Energy Rev. 2018, 91, 18–40.
  4. Ge, L.; Xian, Y.; Yan, J.; Wang, B.; Wang, Z. A hybrid model for short-term PV output forecasting based on PCA-GWO-GRNN. J. Mod. Power Syst. Clean Energy 2020, 8, 1268–1275.
  5. Eskandari, A.; Aghaei, M.; Milimonfared, J.; Nedaei, A. A weighted ensemble learning-based autonomous fault diagnosis method for photovoltaic systems using genetic algorithm. Int. J. Electr. Power Energy Syst. 2023, 144, 108591.
  6. Oviedo, E.H.S.; Travé-Massuyès, L.; Subias, A.; Alonso, C.; Pavlov, M. Feature extraction and health status prediction in PV systems. Adv. Eng. Inform. 2022, 53, 101696.
  7. Chen, Z.; Wu, L.; Cheng, S.; Lin, P.; Wu, Y.; Lin, W. Intelligent fault diagnosis of photovoltaic arrays based on optimized kernel extreme learning machine and IV characteristics. Appl. Energy 2017, 204, 912–931.
  8. Mekki, H.; Mellit, A.; Salhi, H. Artificial neural network-based modelling and fault detection of partial shaded photovoltaic modules. Simul. Model. Pract. Theory 2016, 67, 1–13.
  9. Chen, Z.; Han, F.; Wu, L.; Yu, J.; Cheng, S.; Lin, P.; Chen, H. Random forest based intelligent fault diagnosis for PV arrays using array voltage and string currents. Energy Convers. Manag. 2018, 178, 250–264.
  10. Huang, J.M.; Wai, R.J.; Yang, G.J. Design of hybrid artificial bee colony algorithm and semi-supervised extreme learning machine for PV fault diagnoses by considering dust impact. IEEE Trans. Power Electron. 2019, 35, 7086–7099.
  11. Zhu, L.; Wen, W.; Li, J.; Hu, Y. Integrated data-driven power system transient stability monitoring and enhancement. IEEE Trans. Power Syst. 2023, 39, 1797–1809.
  12. Amiri, A.F.; Oudira, H.; Chouder, A.; Kichou, S. Faults detection and diagnosis of PV systems based on machine learning approach using random forest classifier. Energy Convers. Manag. 2024, 301, 118076.
  13. Tsanakas, J.A.; Ha, L.; Buerhop, C. Faults and infrared thermographic diagnosis in operating c-Si photovoltaic modules: A review of research and future challenges. Renew. Sustain. Energy Rev. 2016, 62, 695–709.
  14. Afridi, M.; Kumar, A.; ibne Mahmood, F.; Tamizhmani, G. Hotspot testing of glass/backsheet and glass/glass PV modules pre-stressed in extended thermal cycling. Sol. Energy 2023, 249, 467–475.
  15. Liu, M.; Fang, S.; Dong, H.; Xu, C. Review of digital twin about concepts, technologies, and industrial applications. J. Manuf. Syst. 2021, 58, 346–361.
  16. Semeraro, C.; Lezoche, M.; Panetto, H.; Dassisti, M. Digital twin paradigm: A systematic literature review. Comput. Ind. 2021, 130, 103469.
  17. Fan, X.; Li, Y. Energy management of renewable based power grids using artificial intelligence: Digital twin of renewables. Sol. Energy 2023, 262, 111867.
  18. Yao, C.; Wang, J.; Sun, H.; Chu, H.; Jin, T.; Xiang, Q. A Data-driven method for adaptive resource requirement allocation via probabilistic solar load and market forecasting utilizing digital twin. Sol. Energy 2023, 250, 368–376.
  19. Yang, D.; Mi, C.; Lyu, X.; Xie, Y.; Luo, Z.; Huang, L. Real-time machine learning-driven digital twin framework of a floating solar system in waves. Energy Convers. Manag. 2026, 356, 121373.
  20. Jain, P.; Poon, J.; Singh, J.P.; Spanos, C.; Sanders, S.R.; Panda, S.K. A digital twin approach for fault diagnosis in distributed photovoltaic systems. IEEE Trans. Power Electron. 2019, 35, 940–956.
  21. Madeti, S.R.; Singh, S. A comprehensive study on different types of faults and detection techniques for solar photovoltaic system. Sol. Energy 2017, 158, 161–185.
  22. Herraiz, Á.H.; Marugán, A.P.; Márquez, F.P.G. Photovoltaic plant condition monitoring using thermal images analysis by convolutional neural network-based structure. Renew. Energy 2020, 153, 334–348.
  23. Gray, J.L. The physics of the solar cell. In Handbook of Photovoltaic Science and Engineering, 2nd ed.; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2011.
  24. Motahhir, S.; El Ghzizal, A.; Sebti, S.; Derouich, A. Modeling of photovoltaic system with modified incremental conductance algorithm for fast changes of irradiance. Int. J. Photoenergy 2018, 2018, 3286479.
  25. Chaibi, Y.; Allouhi, A.; Malvoni, M.; Salhi, M.; Saadani, R. Solar irradiance and temperature influence on the photovoltaic cell equivalent-circuit models. Sol. Energy 2019, 188, 1102–1110.
  26. Yan, J. Enjoy the joy of copulas: With a package copula. J. Stat. Softw. 2007, 21, 1–21.
  27. Wu, H.; Yuan, Y.; Zhu, J.; Xu, Y. Assessment model for distributed wind generation hosting capacity considering complex spatial correlations. J. Mod. Power Syst. Clean Energy 2021, 10, 1194–1206.
  28. Zhang, G.; Li, Z.; Zhang, K.; Zhang, L.; Hua, X.; Wang, Y. Multi-objective interval prediction of wind power based on conditional copula function. J. Mod. Power Syst. Clean Energy 2019, 7, 802–812.
  29. Hu, W.; Min, Y.; Zhou, Y.; Lu, Q. Wind power forecasting errors modelling approach considering temporal and spatial dependence. J. Mod. Power Syst. Clean Energy 2017, 5, 489–498.
  30. Genest, C.; Rivest, L.P. Statistical inference procedures for bivariate Archimedean copulas. J. Am. Stat. Assoc. 1993, 88, 1034–1043.
  31. Kole, E.; Koedijk, K.; Verbeek, M. Selecting copulas for risk management. J. Bank. Financ. 2007, 31, 2405–2423.
  32. Sreekumar, S.; Sharma, K.C.; Bhakar, R. Gumbel copula based multi interval ramp product for power system flexibility enhancement. Int. J. Electr. Power Energy Syst. 2019, 112, 417–427.
  33. Stoica, P.; Selen, Y. Model-order selection: A review of information criterion rules. IEEE Signal Process. Mag. 2004, 21, 36–47.
  34. Joe, H. Multivariate Models and Dependence Concepts; Chapman & Hall/CRC: Boca Raton, FL, USA, 1997.
  35. Wang, J.; Gao, D.; Zhu, S.; Wang, S.; Liu, H. Fault diagnosis method of photovoltaic array based on support vector machine. Energy Sources Part A Recover. Util. Environ. Eff. 2023, 45, 5380–5395.
  36. Swarna, K.; Vinayagam, A.; Ananth, M.B.J.; Kumar, P.V.; Veerasamy, V.; Radhakrishnan, P. A KNN based random subspace ensemble classifier for detection and discrimination of high impedance fault in PV integrated power network. Measurement 2022, 187, 110333.
  37. Yuan, Z.; Xiong, G.; Fu, X. Artificial Neural Network for Fault Diagnosis of Solar Photovoltaic Systems: A Survey. Energies 2022, 15, 8693.
  38. Chine, W.; Mellit, A.; Lughi, V.; Malek, A.; Sulligoi, G.; Pavan, A.M. A novel fault diagnosis technique for photovoltaic systems based on artificial neural networks. Renew. Energy 2016, 90, 501–512.
  39. Detrick, A.; Kimber, A.; Mitchell, L. Performance evaluation standards for photovoltaic modules and systems. In Conference Record of the Thirty-first IEEE Photovoltaic Specialists Conference, 2005; IEEE: New York, NY, USA, 2005; pp. 1581–1586.
Figure 1. Flowchart of the proposed methodology.
Figure 2. Equivalent circuit model of the PV cell.
Figure 3. The digital twin framework of the PV system.
Figure 4. The typical PV power output and its probability density distribution based on a single representative day.
Figure 5. The PV experiment platform. Panel types and specifications are detailed in Table A1.
Figure 6. The digital copy of PV system with real-time data.
Figure 7. Measured and expected DC power outputs for each panel. The String Panels exhibit a notable deficit due to partial shading, serving as a faulty sample in this study.
Figure 8. Comparison of measured power outputs and expected power outputs under typical weather.
Figure 9. PDF surface for the copula models. (a) Clayton. (b) Frank. (c) Gaussian. (d) Gumbel. (e) t-Copula ($\nu = 1$). (f) t-Copula ($\nu = 3$).
Figure 10. PDF surface for the optimal Copula.
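Table 1. Archimedean copulas.

| Copula Family | Frank | Clayton | Gumbel |
|---|---|---|---|
| Bivariate Copula $C_\theta(u, v)$ | $-\frac{1}{\theta}\ln\left(1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1}\right)$ | $\left(u^{-\theta} + v^{-\theta} - 1\right)^{-1/\theta}$ | $\exp\left(-\left[(-\ln u)^{\theta} + (-\ln v)^{\theta}\right]^{1/\theta}\right)$ |
| $\theta$ | $\theta \in \mathbb{R} \setminus \{0\}$ | $\theta > 1$ | $\theta \geq 1$ |
| Generator $\psi(t)$, $0 < t < 1$ | $-\ln\frac{e^{-\theta t} - 1}{e^{-\theta} - 1}$ | $t^{-\theta} - 1$ | $(-\ln t)^{\theta}$ |
| Generator Inverse $\psi^{-1}(t)$, $0 < t < 1$ | $-\frac{1}{\theta}\ln\left(1 + (e^{-\theta} - 1)e^{-t}\right)$ | $(t + 1)^{-1/\theta}$ | $\exp\left(-t^{1/\theta}\right)$ |
| Kendall Distribution Function $K_\psi(t) = t - \frac{\psi(t)}{\psi'(t)}$, $0 < t < 1$ | $t - \frac{1}{\theta}\ln\left(\frac{e^{-\theta t} - 1}{e^{-\theta} - 1}\right)\left(e^{\theta t} - 1\right)$ | $t - \frac{t^{\theta + 1} - t}{\theta}$ | $t - \frac{t \ln t}{\theta}$ |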
Table 2. Comparison of fitting results for copulas.

| Copula Function | Parameter(s) | Log-Likelihood | AIC | BIC |
|---|---|---|---|---|
| Clayton | $\theta = 18.26$ | 1844.60 | −3687.20 | −3682.29 |
| Frank | $\theta = 46.22$ | 1782.34 | −3562.68 | −3557.77 |
| Gumbel | $\theta = 9.10$ | 1592.44 | −3182.88 | −3177.97 |
| Gaussian | $\rho = 0.98$ | 1578.34 | −3154.69 | −3149.78 |
| Student’s t | $\rho = 0.99$, $\nu = 1.34$ | **1924.54** | **−3847.09** | **−3842.18** |

Bold values indicate the best performance.
Table 3. Comparison of PV anomaly detection results (%) for different methods.

| Method | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| OCAD (Proposed) | **92.51** | **96.72** | 88.01 | **92.13** |
| Clayton Copula | 91.85 | 95.10 | 88.25 | 91.54 |
| Frank Copula | 90.34 | 93.64 | 86.56 | 89.95 |
| Gumbel Copula | 86.41 | 90.10 | 81.83 | 85.77 |
| Gaussian Copula | 86.83 | 90.65 | 82.12 | 86.18 |
| SVM | 84.32 | 85.18 | 83.11 | 84.13 |
| KNN | 82.17 | 84.66 | 78.58 | 81.50 |
| ANN | 88.64 | 88.76 | **88.47** | 88.60 |

Bold values indicate the best performance.
Table 4. Computational complexity of different methods.

| Method | Training | Inference |
|---|---|---|
| **OCAD (Proposed)** | $O(n)$ | **$O(1)$** |
| SVM | $O(n^2)$ to $O(n^3)$ | $O(s \cdot d)$ |
| KNN | **$O(1)$** | $O(n \cdot d)$ |
| ANN (MLP) | $O(e \cdot n \cdot H^2)$ | $O(H^2)$ |

n: number of training samples; d: feature dimensionality; s: number of support vectors; e: number of training epochs; H: number of hidden neurons. Bold method name indicates the proposed approach; bold complexity value indicates the lowest in the column.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, S.; Yang, X.; Qi, D.; Xu, Z.; Wang, M.; Yan, Y. Predictive Maintenance in PV Systems: A Copula-Based Approach with Digital Twin Technique. Energies 2026, 19, 2686. https://doi.org/10.3390/en19112686
