Article

Machine Learning-Based Predictive Maintenance for Photovoltaic Systems

1 Department of Communication Technology, Duisburg-Essen University, 47057 Duisburg, Germany
2 Computer Science Department, Faculty of Engineering and Computer Science, German University of Technology in Oman, Muscat 130, Oman
3 Department of Mathematical and Physical Science, College of Science, University of Nizwa, Nizwa 616, Oman
* Author to whom correspondence should be addressed.
AI 2025, 6(7), 133; https://doi.org/10.3390/ai6070133
Submission received: 18 May 2025 / Revised: 13 June 2025 / Accepted: 17 June 2025 / Published: 20 June 2025

Abstract

The performance of photovoltaic systems is highly dependent on environmental conditions, and soiling due to dust accumulation is often cited as a predominant cause of energy degradation, especially in dry and semi-arid environments. This paper introduces an AI-based robotic cleaning system that autonomously forecasts and schedules cleaning sessions from real-time sensor and environmental data. Methods: The system integrates data from embedded sensors, weather stations, and a DustIQ soiling sensor into a unified dataset for predictive modeling. Machine learning models were employed to forecast soiling loss from key atmospheric parameters such as relative humidity, air pressure, ambient temperature, and wind speed. Dimensionality reduction through principal component analysis and correlation-based feature selection improved both model performance and interpretability. Four conventional machine learning models (logistic regression, k-nearest neighbors, decision tree, and support vector machine) were compared to determine the most appropriate approach for classifying cleaning needs. Results: In terms of accuracy, precision, recall, and F1-score, logistic regression and SVM delivered the best classification performance, with accuracies above 92% and F1-scores above 0.90, indicating a good balance between precision and recall. The KNN and decision tree models were slightly less accurate (around 85–88%) but offered computational efficiency benefits, making them suitable for resource-constrained applications. Conclusions: The proposed system uses a dry-cleaning mechanism that requires no water, making it highly suitable for arid regions. It reduces unnecessary cleaning operations by approximately 30%, leading to less mechanical wear and lower maintenance costs. 
Additionally, by minimizing delays in necessary cleaning, the system can improve annual energy yield by 3–5% under high-soiling conditions. Overall, the intelligent cleaning schedule minimizes manual intervention, enhances sustainability, reduces operating costs, and improves system performance in challenging environments.

1. Introduction

1.1. Overview of Photovoltaic (PV) Systems

Photovoltaic (PV) systems are among the most important renewable energy technologies, but their efficiency and reliability are strongly influenced by real outdoor environmental conditions. PV panels achieve their rated performance under standard test conditions (STC: 1000 W/m² irradiance, 25 °C cell temperature, AM1.5 spectrum), whereas outdoor performance varies with weather conditions and location.
Soiling, caused by dust, sand, and other particulate matter settling on the surfaces of PV modules, is among the most detrimental of these effects, especially in arid and semi-arid climates. The soiling layer reduces the sunlight reaching the photovoltaic cells and, consequently, the power output of the system. In arid environments where rainfall is scarce and dust settles frequently, soiling can cause energy losses of 20 to 30 percent over time [1,2]. The severity of soiling degradation depends strongly on factors such as dust concentration, particle size, humidity, and tilt angle, all of which influence the rate of dust deposition and the ease of its removal [3,4,5].
These performance losses, which are aggravated under very hot or very cold conditions, underline the importance of predictive maintenance. Regardless of their type, cleaning systems have traditionally been operated under fixed time-based or schedule-based plans, or reactively.
As a result, cleaning is often suboptimal: panels may be cleaned when it is not needed, or remain soiled for extended periods of degraded performance. Predictive maintenance instead uses sensor measurements and environmental monitoring to track how effectively a system is operating and to detect probable problems before they become significant, keeping the system effective and dependable. In a solar power system, sensors can monitor performance and status by measuring quantities such as irradiance, voltage, and the progression of soiling over time. Real-time Internet of Things (IoT) data make it easier to apply machine learning-based predictive models, which can determine the best time to clean in terms of energy yield and cost [6,7,8]. Such models are particularly useful where manual cleaning is too expensive or impractical, and where automated or robotic cleaning makes routine maintenance easy to plan.
In summary, it is crucial to understand and minimize the impact of environmental factors, particularly soiling, on the performance of PV systems. Predictive maintenance, which combines environmental data with intelligent analysis, offers an affordable and scalable solution to this problem.

1.2. Problem Statement

Particle and dust accumulation is especially detrimental to the performance of PV systems in desert and semi-arid environments [9,10]. In addition, rainfall in these regions is too scarce to have much effect on performance, so natural cleaning rarely occurs. Maintaining high performance therefore requires continual mechanical or manual intervention, but it is hard to know when and how often to clean a PV system. Cleaning too infrequently leads to lost energy generation.
On the other hand, cleaning too often raises operating and cleaning-material costs, consumes more water, and wears out equipment faster [11]. PV maintenance typically follows time-based or reactive cleaning schedules regardless of site-specific environmental variation. Both approaches are ineffective because they ignore the dynamic parameters that drive site-specific soiling behavior, e.g., wind, dust, humidity, and geography, as well as season-dependent soiling rates [12]. Moreover, frequent inspections or a uniform cleaning policy are impractical and unsustainable across large PV installations, especially in regions with labor and water shortages [13].
This highlights the urgent need for smart maintenance schedules that use real-time environmental and performance data to predict soiling build-up and, thus, when cleaning is necessary. Artificial intelligence (AI), and more precisely machine learning (ML), can meet this need by processing sensor data such as irradiance, temperature, wind speed, humidity, and barometric pressure to estimate the effect of soiling and relate it to historical performance values [14,15].
AI-based predictive maintenance can make PV systems more reliable, increase energy production, reduce the need for cleaning, and extend the lifespan of PV components. Developing it requires robust machine learning models, a good data collection system, and testing under diverse environments. This research addresses these issues by developing and validating a machine learning-based maintenance strategy for solar power systems operating in dust-prone areas.

1.3. Research Objective

This study aims to develop an AI-powered robotic cleaning system capable of autonomously anticipating and scheduling cleaning operations using real-time environmental and sensor data. By leveraging machine learning, such a system overcomes the limitations of traditional time-based and reactive cleaning practices while increasing efficiency, optimizing maintenance schedules, and ensuring proactive performance management in dusty environments.
The main objective of the proposed system is to combine different data sources, such as embedded PV sensors, weather station data, and optical soiling sensors such as DustIQ, into an integrated dataset for predictive modeling. Environmental factors such as wind speed, temperature, humidity, and pressure serve as inputs to machine learning algorithms that predict soiling accumulation and identify the most cost-effective cleaning times [16,17].
Because cleaning starts only after the expected energy loss exceeds a predetermined level, this approach reduces water use and labor costs and prolongs equipment life, leading to higher efficiency and a longer system lifetime [18].
The proposed AI-based approach maximizes energy output and offers an eco-friendly and scalable maintenance option for utility-scale PV farms. The research also compares several machine learning algorithms, including support vector machines (SVM), logistic regression (LR), k-nearest neighbors (KNN), and decision trees (DT), to assess how well each model type predicts soiling levels under diverse weather conditions. The results identify the most suitable and most interpretable model for practical PV maintenance applications. This work addresses an important gap in predictive PV cleaning research by proposing an end-to-end, automatic, data-driven cleaning system built on recent advances in environmental sensing and AI.
The core contributions of this work are as follows. First, we propose an artificial intelligence-driven robotic cleaning system for photovoltaic (PV) panels that uses real-world environmental and sensor information to predict the best time for cleaning interventions. Second, we develop a large multi-source dataset of over 235,000 observations collected over two years from integrated PV sensors, weather stations, and DustIQ soiling sensors. Third, we compare four traditional machine learning algorithms (SVM, LR, KNN, and DT) for soiling prediction and find that SVM performs best in both accuracy (92.1%) and F1-score (91.7%). Fourth, we apply feature selection and correlation analysis to enhance model performance and interpretability and to identify the leading environmental factors in soiling accumulation. Finally, we present a robust and scalable predictive maintenance system for practical deployment in arid and semi-arid environments where dust-related PV degradation is a critical operational concern.

2. Literature Review

2.1. Soiling and Its Impact on Photovoltaic Systems

Soiling is one of the leading causes of PV performance degradation in arid and semi-arid regions. While the Introduction outlines the general impact of dust accumulation, this section summarizes region-specific studies that quantify energy losses and the factors that influence them.
Studies show that energy yield can decline by 10–30% due to accumulated dust, with losses varying by dust composition, humidity, wind speed, and rainfall frequency [19,20,21].
Most studies indicate that the extent of soiling loss is correlated with environmental conditions such as wind speed, relative humidity, and temperature [22].
For example, wind speed affects how much dust collects on the panels, while high temperatures increase the stickiness of dust particles, making them harder to remove from the surface [23]. Conversely, regions with more rainfall have lower soiling rates than arid regions, because rain washes off the deposited dust [24].
The impact of soiling on energy production depends not only on the amount of dust but also on its composition. More reflective dust can reduce the sunlight reaching the PV cells, and some types of particulate matter may intensify optical shading and thermal loading on the PV modules, causing further degradation. The studies in [25,26] examined the soiling effect and found that it varied seasonally, with greater losses in the arid months, when dust storms, and therefore the dust that contributes to soiling, were more frequent.
Quantifying soiling loss is key to deciding how to respond. Researchers have tested various cleaning methods, ranging from manual to robotic processes [27]. Conventional cleaning methods, especially prescriptive or manual ones, are resource-intensive and wasteful, which ultimately translates into higher operating costs and lost time [28].
A growing body of research highlights the importance of real-time environmental measurement for effective PV system maintenance. Studies have shown that data-driven analysis combining weather data, sensor readings, and machine learning models can improve soiling event prediction and cleaning scheduling [29]. Automated cleaning driven by AI-based predictive models therefore has great potential to reduce both the operating expenses and the environmental costs of PV system maintenance.
A comparative summary of recent studies examining the impact of soiling on PV system performance across various regions and climates is presented in Table 1.

2.2. Artificial Intelligence in Photovoltaic System Maintenance

AI, particularly ML, has gained tremendous traction in the renewable energy sector due to its ability to process complex, nonlinear data and provide accurate predictions from vast data sets. AI has been utilized very effectively in predictive maintenance in various sectors like manufacturing, aviation, and energy for the prediction of failures, reduction of downtime, and maximization of operating efficiency [30,31]. These developments encouraged an increasing amount of research into transferring the same techniques to PV systems, where maintenance expense and performance degradation, especially from soiling, are major operational concerns.
In PV systems, AI-based predictive maintenance typically relies on supervised learning techniques to classify or forecast faults and performance degradation such as soiling losses, hot spots, shading effects, and component failures. These models are trained on historical and real-time environmental sensor data, electrical output logs, and image-based diagnostics [32,33]. The machine learning methods most commonly applied to PV maintenance include DT, random forests, SVM, KNN, and, more recently, deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for time-series forecasting [34].
Several studies have demonstrated the effectiveness of ML models for determining optimal cleaning schedules. For example, [35] proposed a hybrid machine learning model that predicted soiling-induced energy loss from meteorological parameters and used it to determine when cleaning was needed. Similarly, [36] used neural networks to predict power degradation and proposed intervention thresholds, demonstrating an appreciable increase in energy output as well as cost-effectiveness.
One of the key advantages of AI-based maintenance systems is their adaptability to local conditions. Unlike rule-based systems or rigid schedules, ML algorithms can learn from site-level data to adjust maintenance decisions adaptively. This flexibility is particularly useful for large-scale PV systems that span diverse microclimates or expose equipment to extreme climatic stress [37]. Combined with IoT sensors, AI has also enabled real-time monitoring and automation for predictive maintenance at a scalable and tractable level. Despite this progress, challenges remain. ML prediction accuracy depends heavily on the quality and completeness of the input data and on how well models generalize across sites and climates [38].
Moreover, the explainability and transparency of AI-based decisions are crucial for stakeholder trust and operational adoption [38]. Nevertheless, the promise of AI-assisted predictive maintenance, namely cost savings, enhanced reliability, and greater sustainability, continues to drive this field.
An overview of various AI and machine learning (ML) techniques applied to predictive maintenance in photovoltaic (PV) systems is presented in Table 2.

3. Methodology

3.1. Data Collection and Preprocessing

The AI-based robotic cleaning system for PV panels in this study was built on real-time environmental data and sensor-based feedback mechanisms. The data were collected at the Shams Solar Outdoor Facility of the German University of Technology in Oman (GUtech), Muscat, Oman [39], over the period from 12 October 2021 to 5 March 2024. Measurements were recorded at 5 min intervals, yielding a large dataset of 235,297 observations of each parameter. The “cleaning needed” labels were generated using a fixed DustIQ soiling ratio threshold of SR < 0.95, following the manufacturer’s guidelines. The dataset contains about 23% positive (cleaning needed) and 77% negative examples. We used stratified splitting to preserve this class ratio and focused on metrics such as F1-score and recall to minimize the effect of class imbalance.
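The labeling rule and the stratified split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `label_cleaning_needed` and `stratified_split` are hypothetical, the toy soiling ratios are invented (chosen only to give roughly the reported 23% positive share), and the 50% test fraction is arbitrary. Note that this sketch shuffles within each class, so on its own it does not preserve temporal order.

```python
import random
from collections import defaultdict

SR_THRESHOLD = 0.95  # fixed DustIQ soiling-ratio threshold stated in the text


def label_cleaning_needed(soiling_ratios, threshold=SR_THRESHOLD):
    """Return 1 ("cleaning needed") when SR < threshold, else 0."""
    return [1 if sr < threshold else 0 for sr in soiling_ratios]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so that each class keeps (roughly) its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within a class (ignores time order)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical soiling ratios: 6 of 26 samples (about 23%) fall below the threshold.
srs = [0.93] * 6 + [0.98] * 20
y = label_cleaning_needed(srs)
train, test = stratified_split(y, test_fraction=0.5)
print(sum(y[i] for i in train), sum(y[i] for i in test))  # positives in each part
```

Splitting each class separately is what keeps the minority "cleaning needed" class from being under-represented in either part.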
The environmental measurements reveal dry and stable weather at the PV site. Relative humidity was mostly low, with periods of high values due to rain or fog. Air pressure remained nearly constant at 1005–1010 hPa, and wind speeds were generally below 5 m/s with no prevailing wind direction. Temperatures ranged between 20 °C and 50 °C, with daily or periodic fluctuations. Solar irradiance was mostly below 600 W/m2, likely due to cloudiness or low sun elevation.
The electrical measurements showed DC current mostly between 0 and 4 A and voltage between 100 and 200 V. DC power was frequently below 1000 W, indicating that the system usually operated below full capacity, possibly due to dust, shading, or low light.
Correlation analysis revealed clear correlations between irradiance and power output and between voltage/current and power. Wind speed and absolute humidity correlated moderately with temperature and with the need for cleaning, while high temperature and low air pressure were associated with more frequent cleaning. Wind direction and month of the year had very little influence on performance.
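The text does not state which correlation coefficient was used. The sketch below assumes the Pearson coefficient and shows how a pairwise correlation matrix over the measured parameters could be computed. The column names and readings are hypothetical, not the GUtech data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical readings, for illustration only.
data = {
    "irradiance": [100.0, 300.0, 500.0, 700.0, 900.0],
    "dc_power": [80.0, 250.0, 410.0, 560.0, 700.0],
    "wind_speed": [2.0, 4.0, 1.0, 3.0, 2.5],
}
corr = correlation_matrix(data)
print(round(corr[("irradiance", "dc_power")], 3))
```

A rank-based coefficient such as Spearman's would be a reasonable alternative for the strongly skewed distributions described in Section 4.1.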

3.1.1. Ground-Mount PV System and Sensor Setup of Shams Solar

As part of the facility’s objective to evaluate PV system performance and maintenance under real environmental conditions, the ground-mount system was instrumented with various sensors to ensure accurate, real-time monitoring.
PV Array Configuration
The ground-mount system comprises 18 photovoltaic modules rated at 325 Wp each, arranged in a 9-series × 2-parallel configuration. Under standard test conditions (STC), it delivers 6.50 kWp, with a real-world output of 5.55 kWp at 55 °C. At elevated temperatures (50 °C), the system reaches its maximum power point at 317 V and 18 A.
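For the 9-series × 2-parallel layout, series connection adds module voltages and parallel connection adds string currents. The worked arithmetic below uses only the reported array maximum-power point (317 V, 18 A) and assumes identical, perfectly matched modules with no wiring or mismatch losses.

```python
N_SERIES = 9         # modules per string (from the text)
N_PARALLEL = 2       # strings in parallel (from the text)
V_MPP_ARRAY = 317.0  # array MPP voltage at elevated temperature, V (from the text)
I_MPP_ARRAY = 18.0   # array MPP current, A (from the text)

# Series connection adds voltages; parallel connection adds currents.
v_module = V_MPP_ARRAY / N_SERIES    # voltage per module
i_string = I_MPP_ARRAY / N_PARALLEL  # current per string (= per module in that string)
p_array = V_MPP_ARRAY * I_MPP_ARRAY  # array power at this operating point, W

print(f"{v_module:.1f} V per module, {i_string:.1f} A per string, {p_array:.0f} W")
# prints: 35.2 V per module, 9.0 A per string, 5706 W
```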
Environmental Monitoring Sensors
A weather station monitors critical environmental variables, including solar irradiance, ambient temperature, humidity, and wind speed/direction.
A thermopile pyranometer (SMP11) measures solar irradiance with high accuracy using a black-body absorber and thermopile technology.
Irradiance and Temperature Tracking
A silicon irradiance sensor, mounted at the same tilt as the panels, measures incident solar radiation and ambient temperature using a silicon solar cell and an integrated temperature probe.
Soiling Detection
A DustIQ soiling sensor continuously measures dust accumulation and calculates the soiling ratio, enabling timely, data-driven cleaning schedules.

3.2. Machine Learning Model Development

The system determines whether to clean based on the prediction of power output loss due to soiling. If the predicted loss is above a given threshold, then automatic cleaning is scheduled, avoiding unnecessary action and saving resources.
To predict the level of soiling and to determine the cleaning schedule, supervised machine learning models were employed.
To ensure reliable and reproducible results, we tuned the key hyperparameters of each machine learning model. For logistic regression, the regularization parameter C was selected using 5-fold cross-validation, with C = 1.0 providing the best F1-score. For k-nearest neighbors, we tested values of k from 3 to 9 and found that k = 5 gave the most balanced performance. For decision trees, we explored maximum depths between 3 and 15 using both the Gini impurity and entropy criteria; the best results were achieved with a depth of 8 and Gini splitting. For support vector machines, we used an RBF kernel and performed a grid search over C and γ, identifying C = 10 and γ = 0.1 as optimal.
All performance metrics (accuracy, precision, recall, and F1-score) were calculated using 5-fold cross-validation. Because the data have a temporal structure, we preserved the time order during training and testing to avoid data leakage and to better reflect how the models would perform in real-world deployment.
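The exact time-ordered cross-validation scheme is not specified. The sketch below assumes an expanding-window (forward-chaining) split, similar in spirit to scikit-learn's `TimeSeriesSplit`, in which each test block lies strictly after its training data. It also computes the four reported metrics for the positive class. The function names are hypothetical.

```python
def forward_chaining_folds(n_samples, n_folds=5):
    """Yield (train_indices, test_indices); each test block follows its training data.

    The series is cut into n_folds + 1 contiguous blocks; fold k trains on
    blocks 0..k-1 and tests on block k (the last fold absorbs any remainder).
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("too few samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = block * k
        test_end = block * (k + 1) if k < n_folds else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (1 = cleaning needed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Usage: 12 time-ordered samples, 5 folds; print last training index and test block.
for train_idx, test_idx in forward_chaining_folds(12, n_folds=5):
    print(train_idx[-1], test_idx)
```

Because no test sample precedes any training sample within a fold, information from the future cannot leak into the fitted model.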
Historical data, including environmental conditions and recorded cleaning schedules, were used to train the models. The following models were experimented with:
- Logistic regression (LR);
- K-nearest neighbors (KNN);
- Decision trees (DT);
- Support vector machines (SVM).
The proposed method integrates sensor-driven environmental data with machine learning algorithms to automate predictive maintenance and optimize cleaning schedules for photovoltaic systems. The overall workflow of the system is illustrated in Figure 1.
These algorithms were selected because, together, they span the trade-off between interpretability and predictive accuracy in soiling level prediction and cleaning schedule optimization. LR offers an easy-to-understand baseline; KNN exploits similarity between environmental observations; DT supports transparent, rule-based decision-making; and SVM captures complex relationships in high-dimensional feature spaces. The models were compared to decide which approach best supports correct and effective decision-making for the solar cleaning robot.

3.2.1. Modeling Based on Logistic Regression (LR)

Logistic regression (LR) was applied to estimate the probability that a photovoltaic (PV) system requires cleaning. It models the likelihood of a cleaning event as a function of the environmental and electrical input features using the sigmoid (logistic) function, which maps any real-valued input into the (0, 1) range:
$$h_\theta(x) = \sigma(\theta^{T} x) = \frac{1}{1 + e^{-\theta^{T} x}} \tag{1}$$
where $h_\theta(x)$ is the predicted probability that cleaning is needed given the input vector $x$; $\sigma(\cdot)$ is the sigmoid activation function; $\theta \in \mathbb{R}^{11}$ is the parameter (weight) vector learned by the model, with one coefficient per input feature; $x \in \mathbb{R}^{11}$ is the standardized input feature vector of 11 environmental and electrical features, including irradiance, temperature, humidity, wind speed, air pressure, and DC voltage/current; and $\theta^{T} x$ is the dot product (linear combination) of $\theta$ and $x$.
Cost Function with Regularization
To train the logistic regression model and prevent overfitting, we minimized the regularized log-loss (cross-entropy) cost function, shown as follows:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\left(1 - h_\theta(x_i)\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{11} \theta_j^{2} \tag{2}$$
where $J(\theta)$ is the cost function, which measures prediction error and includes a regularization term to prevent overfitting; $m$ is the number of training examples; $y_i \in \{0, 1\}$ is the actual label indicating the absence or presence of a cleaning event for the $i$th training example; $x_i$ is the feature vector of the $i$th training example; $h_\theta(x_i)$ is the predicted probability for the $i$th example; $\lambda$ is the regularization strength, with $\lambda = 1/C$ and $C$ selected via 5-fold cross-validation on the F1-score; and $\sum_{j=1}^{11} \theta_j^{2}$ is the regularization term, which penalizes large weights (excluding the bias term, if present) and promotes simpler models.
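Equations (1) and (2) can be evaluated directly. The sketch below assumes the bias is handled outside $\theta$ (consistent with the penalty excluding it), clips probabilities to avoid `log(0)`, and uses hypothetical weights and features. Up to a positive constant factor, this objective matches the L2-penalized loss that scikit-learn's `LogisticRegression` minimizes for a given `C`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function used in Eq. (1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta, x):
    """h_theta(x) = sigmoid(theta^T x) for one standardized feature vector."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def regularized_log_loss(theta, X, y, C=1.0, eps=1e-12):
    """Cost J(theta) of Eq. (2) with lambda = 1 / C; theta holds no bias term here."""
    m = len(X)
    lam = 1.0 / C
    data_term = 0.0
    for xi, yi in zip(X, y):
        h = min(max(predict_proba(theta, xi), eps), 1.0 - eps)  # clip to avoid log(0)
        data_term += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    penalty = lam / (2 * m) * sum(t * t for t in theta)
    return -data_term / m + penalty


# Hypothetical weights and standardized features, for illustration only.
theta = [0.8, -0.3]
X = [[1.2, 0.4], [-0.7, 1.1]]
y = [1, 0]
print(regularized_log_loss(theta, X, y, C=1.0))
```

A larger `C` shrinks $\lambda$ and weakens the penalty, which is why the text tunes `C` by cross-validation.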
This formulation lets the model fit the training data in a balanced way, predicting cleaning requirements correctly without becoming excessively complex or overly specialized to the training data. The regularization term constrains the model weights so that the model remains simple and generalizes well to new data. Logistic regression is a widely used and simple approach: it predicts the probability of an event, such as whether solar panels need to be cleaned given different environmental and electrical inputs. Because it is stable and easy to interpret, it provides a sound basis for decision-making in solar panel maintenance systems.

3.2.2. Modeling Based on K-Nearest Neighbors (KNN)

In the PV cleaning robot application, the k-nearest neighbors (KNN) algorithm was used to predict whether cleaning is needed based on real-time environmental and electrical sensor data. For an input vector composed of features such as irradiance, module temperature, ambient humidity, wind speed, and DC power, KNN calculates the Euclidean distance to all previous observations:
$$d(x_q, x_i) = \sqrt{\sum_{j=1}^{n} \left(x_{qj} - x_{ij}\right)^{2}} \tag{3}$$
where $x_q$ is the new input vector representing the current environmental/electrical sensor readings; $x_i$ is a stored historical data point (from past observations); $x_{qj}$ is the value of the $j$th feature in the query vector; $x_{ij}$ is the value of the $j$th feature in the $i$th stored vector; $n$ is the number of features (e.g., irradiance, module temperature, humidity, wind speed, DC power); and $d(x_q, x_i)$ is the Euclidean distance between the query and the $i$th historical point.
The algorithm selects the k = 5 closest samples and uses majority voting as the classification strategy, so the most common class among the k nearest neighbors determines the output (cleaning needed or not):
$$y_q = \operatorname{mode}(y_1, y_2, \ldots, y_k) \tag{4}$$
where $y_q \in \{0, 1\}$ is the predicted class for the new input sample, with 1 denoting “cleaning needed” and 0 “no cleaning needed”; $y_1, y_2, \ldots, y_k$ are the class labels (actual cleaning outcomes) of the $k$ nearest neighbors (previous observations) of the new input; $\operatorname{mode}(\cdot)$ is the statistical mode function, which returns the most frequent value among the $k$ nearest neighbors; and $k$ is the number of neighbors used for classification, a hyperparameter of the KNN algorithm set here to $k = 5$.
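Equations (3) and (4) translate into a brute-force KNN classifier. In this minimal sketch, the function names and toy two-feature data are hypothetical, and the features are assumed to be already standardized so that no single unit dominates the distance. With binary labels and an odd k such as 5, the majority vote cannot tie.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Eq. (3): Euclidean distance between two feature vectors."""
    return math.sqrt(sum((aj - bj) ** 2 for aj, bj in zip(a, b)))


def knn_predict(X_train, y_train, x_query, k=5):
    """Eq. (4): mode (majority vote) of the labels of the k nearest stored samples."""
    order = sorted(range(len(X_train)), key=lambda i: euclidean(X_train[i], x_query))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical standardized observations: label 1 = cleaning needed.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 1, 1, 1, 1]
print(knn_predict(X, y, [5.5, 5.5]), knn_predict(X, y, [0.2, 0.2]))
```

Sorting all distances for every query reflects the cost noted below: prediction time grows with the size of the historical dataset.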
KNN is best suited to small-scale PV systems with comparatively stable environmental conditions. Although it requires essentially no training time, prediction can be computationally intensive because each new input is compared with all historical data. Nevertheless, it offers a simple, interpretable basis for cleaning-related decisions.

3.2.3. Modeling Based on Decision Trees (DT)

A decision tree partitions the data by selecting, at each node, the feature and threshold that minimize impurity, as measured by metrics such as Gini impurity or entropy:
$$G(S) = 1 - \sum_{i=1}^{C} p_i^{2} \tag{5}$$
where $S$ is the set of samples at the current node of the decision tree; $C$ is the number of classes in the classification problem, here $C = 2$ (cleaning needed or not); $p_i$ is the proportion of samples in $S$ that belong to class $i$; and $G(S)$ is the Gini impurity of $S$, where a lower value means the group is more homogeneous. Alternatively, entropy can be used:
$$H(S) = -\sum_{i=1}^{C} p_i \log_2 p_i \tag{6}$$
where $H(S)$ is the entropy of set $S$; higher entropy indicates greater disorder, and lower entropy indicates greater certainty.
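Equations (5) and (6), together with the weighted child impurity used to rank candidate splits, can be sketched as follows. This illustrates a single-feature threshold search only, not a full tree builder. The names are hypothetical, and the humidity readings and labels are invented for illustration.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Proportion p_i of each class present in the sample set."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Eq. (5): G(S) = 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Eq. (6): H(S) = -sum_i p_i log2 p_i (only classes with p_i > 0 appear)."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def split_impurity(values, labels, threshold, impurity=gini):
    """Size-weighted impurity of the two children of the rule `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return sum(len(part) / n * impurity(part) for part in (left, right) if part)


def best_threshold(values, labels, impurity=gini):
    """Observed value whose split gives the lowest weighted child impurity."""
    return min(sorted(set(values)), key=lambda t: split_impurity(values, labels, t, impurity))


# Hypothetical relative humidity (%) and labels (1 = cleaning needed).
humidity = [20, 35, 50, 75, 80, 90]
labels = [1, 1, 1, 0, 0, 0]
print(best_threshold(humidity, labels))
```

Minimizing the weighted child impurity is equivalent to maximizing the impurity reduction from the parent node, since the parent's impurity is fixed for all candidate thresholds.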
At each node, the algorithm chooses the split that yields the largest reduction in impurity, i.e., the purest child nodes. For example, it may learn rules such as “If humidity > 70% and irradiance < 300 W/m2, then no cleaning is needed.” Decision trees are valued for their interpretability and transparency, which makes them well suited to systems that require simple rule-based decisions. Tree depth limits and pruning are used to avoid overfitting.

3.2.4. Modeling Based on Support Vector Machines (SVM)

Given a labeled training dataset of pairs $(x_i, y_i)$, SVM finds the hyperplane that maximizes the margin between the two classes, where:
$(x_i, y_i)$ is a labeled training data point;
$x_i \in \mathbb{R}^n$ is a feature vector with $n$ numeric values (e.g., irradiance, temperature, humidity);
$y_i \in \{-1, 1\}$ is the class label, where $y_i = 1$ means “cleaning needed” and $y_i = -1$ means “no cleaning needed”.
The separating hyperplane is given by Equation (7):
$$w^{T} x + b = 0 \tag{7}$$
where $w$ is the weight vector, i.e., the learned coefficients that define the orientation of the separating hyperplane; $x$ is a generic input vector; $b$ is the bias term, which shifts the hyperplane away from the origin; and $w^{T} x + b$ is the linear decision function that separates the two classes.
The margin constraint is presented in the following Equation (8), which ensures that each training point is correctly classified and lies outside the margin boundary:
$$y_i\left(w^{T} x_i + b\right) \geq 1 \tag{8}$$
The radial basis function (RBF) kernel in Equation (9) allows the SVM to learn nonlinear decision boundaries by implicitly mapping the data into a higher-dimensional space:
$$K(x_i, x_j) = \exp\left(-\gamma \left\lVert x_i - x_j \right\rVert^{2}\right) \tag{9}$$
where $K(x_i, x_j)$ is the kernel function measuring the similarity between data points $x_i$ and $x_j$; $\gamma$ is a parameter that controls the width of the RBF kernel, with higher values making the model more sensitive to individual data points; and $\lVert x_i - x_j \rVert^{2}$ is the squared Euclidean distance between the two data points.
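The sketch below evaluates Equation (9) with the reported γ = 0.1, checks the margin constraint of Equation (8) for a linear hyperplane, and computes the standard kernelized decision value $\sum_i \alpha_i y_i K(x_i, x) + b$. The support vectors, coefficients, and function names are hypothetical, and no training step is shown. Equation (8) is the hard-margin condition; a model trained with C = 10, as reported, is a soft-margin SVM in which some points may violate Eq. (8) at a penalty.

```python
import math


def rbf_kernel(xi, xj, gamma=0.1):
    """Eq. (9): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2); gamma = 0.1 as reported."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


def satisfies_margin(w, b, X, y):
    """Eq. (8): True if y_i (w^T x_i + b) >= 1 holds for every training point."""
    return all(
        yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b) >= 1 for xi, yi in zip(X, y)
    )


def rbf_decision(support_vectors, dual_coefs, b, x, gamma=0.1):
    """Kernelized decision value sum_i (alpha_i * y_i) K(x_i, x) + b; its sign is the class."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical support vectors with coefficients alpha_i * y_i, for illustration only.
svs = [[1.0, 1.0], [-1.0, -1.0]]
coefs = [1.0, -1.0]
print(1 if rbf_decision(svs, coefs, 0.0, [0.8, 1.2]) >= 0 else -1)
```

The query point lies closer to the positively weighted support vector, so its kernel similarity to that vector dominates and the predicted class is +1 (“cleaning needed”).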
The SVM’s capacity to model complex feature interactions makes it well suited to predicting cleaning requirements precisely under varied PV operating conditions.

4. Results of Data Study

4.1. Data Visualization

Data visualization is a useful technique at many stages of data analysis and modeling. Here it supported data cleaning, exploring the structure and distribution of the data, detecting abnormal values and outliers in each parameter, identifying trends in the data features, monitoring data quality, and helping domain experts understand the structure and characteristics of the environmental and electrical parameter data.
For PV systems in particular, whose growing complexity can generate massive amounts of data, visualization is critical for understanding system performance, identifying trends, and making evidence-based decisions.
This section presents the visualized distribution of each environmental and electrical parameter:
1. Humidity and Air Pressure
Humidity_Relative: The distribution (higher frequencies at lower values) suggests a generally dry or lightly humid atmosphere with periodic high-humidity events (rain or fog), as shown in Figure 2.
Humidity_Absolute: Peaks at low values (5–10 units) indicate that the absolute moisture content of the air is usually low, which is characteristic of dry or controlled environments, as shown in Figure 3.
Air Pressure (Relative/Absolute): Symmetric distributions centered on 1005–1010 hPa indicate stable atmospheric pressure, typical of fair weather with few extreme pressure variations, as shown in Figures 4 and 5.
2. Electrical Parameters (DC Power, Current, Voltage)
DC Current and Voltage: Most measurements cluster at lower values (0–4 A, 100–200 V), indicating that the system runs at low power or part load most of the time, as shown in Figures 6 and 7.
DC Power: The strong right skew (most values < 1000 W) indicates that the system rarely operates near maximum capacity, possibly due to insufficient irradiance, shading, or efficiency losses, as shown in Figure 8.
3. Solar Irradiance (SMP11_BM, Trina_330W, Si_South_BM, etc.)
Irradiance Peaks at Low Values (0–600 W/m²): The dominance of low to moderate irradiance levels points to frequent cloudiness, shading, or low solar angles (e.g., in winter or during early and late daylight hours), as shown in Figures 9–12.
Sharp Drop-off at Higher Irradiance: The relative rarity of high irradiance values (e.g., >800 W/m²) suggests few ideal clear-sky periods, which could limit energy output.
4. Temperature (PV Panels and Ambient)
Mid-Range Peaks (20–50 °C): PV temperatures fall within the expected operating range and are not high enough to harm performance, as shown in Figures 13–16.
Bimodal Distribution (Generic Temperature): This presumably reflects day/night or summer/winter cycles.
5. Wind Conditions
Wind Speed: Frequencies decrease roughly exponentially with speed (most values < 5 m/s), indicating calm to moderate winds with occasional strong gusts, as shown in Figure 17.
Wind Direction: A uniform distribution (Figure 18) suggests no dominant wind direction, although local spikes could arise from terrain effects or sensor misalignment.
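The shape statements above (for example, the strong right skew of DC power) are read from histograms. As a hedged sketch, such statements can be quantified with the sample skewness $g_1 = m_3 / m_2^{3/2}$, where positive values indicate a long right tail, together with the share of readings below a threshold. The DC power readings below are hypothetical placeholders, not data from the study.

```python
# Hedged sketch: quantify distribution shape with sample skewness.
# The DC power readings are hypothetical placeholders, not study data.

def sample_skewness(values):
    """Fisher-Pearson moment coefficient g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def fraction_below(values, threshold):
    """Share of readings strictly below a threshold (e.g., 1000 W)."""
    return sum(1 for v in values if v < threshold) / len(values)


dc_power_w = [0, 0, 50, 100, 150, 200, 300, 2500]  # hypothetical readings
print(sample_skewness(dc_power_w) > 0)   # True: long right tail
print(fraction_below(dc_power_w, 1000))  # 0.875
print(sample_skewness([1, 2, 3, 4, 5]))  # 0.0: symmetric data
```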
Taken together, the distributions in Figures 2–18 show that air pressure is stable and humidity is generally low, indicating dry conditions without temperature extremes. Such conditions are usually associated with stable weather and little rainfall. Winds are mostly light and steady, with few strong winds or storms. Overall, the data suggest that the area has calm, mild weather.
In addition, the temperature likely remains moderate throughout the year, with only small seasonal changes. This analysis is important for building the AI model for the PV panel cleaning robot, because understanding the local weather, in Muscat in particular, helps the robot operate more efficiently and plan cleaning schedules based on actual conditions.
Figure 2. Distribution of Humidity_Relative.
Figure 3. Distribution of Humidity Absolute.
Figure 4. Distribution of Air_pressure_Relative.
Figure 5. Distribution of Air_Pressure_Absolute.
Figure 6. Distribution of DC-Current.
Figure 7. Distribution of DC-Voltage.
Figure 8. Distribution of DC-Power.
Figure 9. Distribution of SMP11_BM_Irradiance.
Figure 10. Distribution of Trina_330W_13_Irradiance.
Figure 11. Distribution of Trina_330W_14_Irradiance.
Figure 12. Distribution of Si_South_BM_Irradiance.
Figure 13. Distribution of Trina_330W_14_Temperature.
Figure 14. Distribution of Trina_330W_13_Temperature.
Figure 15. Distribution of Si_South_BM_Temperature.
Figure 16. Distribution of temperature.
Figure 17. Distribution of wind speed.
Figure 18. Distribution of wind direction.
Visualizing PV system data serves three main purposes: first, understanding system performance; second, identifying urgent problems to which these systems are exposed; and third, improving their operation and performance. In addition, the time-series plots compare the performance and maintenance trends of the solar panel system over roughly two years, from early 2022 to early 2024, to show how operational efficiency can be improved through data-driven monitoring and cleaning.
By examining changes in DC power production, current, and voltage, the analysis assesses how the system responds to external conditions and how effective maintenance is, with particular emphasis on panel cleanliness.
These findings are also critical to the design of an AI robot for PV panel cleaning, as they allow it to determine when maintenance should be performed based on actual performance data. This design supports better decision-making, reducing unnecessary interventions while maintaining high energy yield and system stability. Figures 19–21 were created for this purpose and are presented in this section.
Figures 19–21 together provide an overview of the performance and maintenance behavior of the solar panel system from early 2022 to early 2024. Figure 19, “DC Power and Cleaning Needed over Time”, shows how the system’s power output varied over time, with blue bars representing power levels and a red line indicating when cleaning was performed. It presents the total DC power output (in watts), i.e., how much power the system is producing at any given moment. Power output fluctuated, with drops that were probably caused by environmental factors such as weather or dust. Cleaning events occurred at specific intervals, often after a drop in performance, which is consistent with maintenance being scheduled from data and only when necessary.
Figure 20, “DC Current Over Time”, focuses on the DC current (in amperes) produced by the system. It shows a trend of steady current through most active periods, with dips and breaks around mid-2022, late 2022, and late 2023. These breaks are consistent with possible soiling, equipment maintenance, or weather. The return to higher current levels after some of these dips also supports the idea that cleaning or other actions restored performance.
Figure 21, “DC Voltage Over Time”, plots the DC voltage over the same two-year period. When the system is online, values generally lie in the range of 600–750 units. The graph also shows gaps and dips, which may indicate that the system was switched off at times or that some data were not captured. Despite these gaps, output remained high whenever the system was online, illustrating how regular cleaning and maintenance help the system run effectively.
These figures illustrate how real-time monitoring and selective cleaning schedules can optimize the efficiency of a solar power system, avoiding excessive maintenance while sustaining high energy output.

4.2. Feature Impact Assessment Using Correlation Metrics

Numerous studies have shown that the performance and efficiency of solar PV systems are significantly affected by environmental variables. Consequently, whenever a system is developed, the interrelationships among its parameters and their impact must be examined in order to identify the parameters most likely to cause unexpected losses in PV production [40].
It is therefore necessary to measure the correlation between pairs of environmental and system variables, which quantifies how strongly a change in one variable is associated with a change in the other and in which direction (positive or negative).
The correlation matrix is a key analytical tool for establishing the interrelations among the environmental and performance parameters of a solar PV system. It is displayed as a heatmap in which each cell shows the Pearson correlation coefficient between two variables, as shown in Figure 22.
This tool is particularly valuable for solar PV systems, where several environmental and technical parameters (irradiance, temperature, humidity, voltage, and power output) interact in complex ways. Analyzing their interdependencies can reveal which parameters influence performance, contribute to system degradation, or signal the need for maintenance activities such as cleaning.
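Each heatmap cell is a Pearson coefficient, $r = \mathrm{cov}(x, y) / (\sigma_x \sigma_y)$. The minimal sketch below computes it directly and assembles a correlation matrix as a dictionary. The column names and values are hypothetical and serve only to illustrate the computation.

```python
import math

# Minimal sketch of the Pearson correlation coefficient behind each heatmap
# cell. Column names and values are hypothetical.

def pearson_r(x, y):
    """Pearson r = cov(x, y) / (std(x) * std(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((c - my) ** 2 for c in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of columns."""
    return {(a, b): pearson_r(columns[a], columns[b])
            for a in columns for b in columns}


columns = {
    "Irradiance": [100, 300, 500, 700, 900],
    "DC_Power": [80, 250, 410, 600, 740],
    "Humidity_Relative": [70, 55, 40, 35, 20],
}
matrix = correlation_matrix(columns)
print(round(matrix[("Irradiance", "DC_Power")], 3))      # 0.999
print(matrix[("Irradiance", "Humidity_Relative")] < 0)   # True: inverse relation
```

Pearson r captures only linear association; a rank-based coefficient such as Spearman’s would be an alternative for monotonic but nonlinear relationships.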
There are strong positive correlations among the temperature sensors (e.g., Trina_330W_Temperature_T1_13, T1_14, and Si_South_BM_Temperature_T1), indicating redundant measurements that can be merged or pruned during modeling. Irradiance values from the different sensors are also strongly correlated with DC power (0.70–0.87), as expected for a photovoltaic response. DC voltage and DC current are both strongly correlated with DC power (0.86 each), consistent with their direct role in determining power output (P = V × I).
In contrast, relative humidity is strongly negatively correlated with temperature (−0.77), a typical atmospheric pattern, since warmer air can hold more moisture and relative humidity therefore falls as temperature rises. Cleaning needed is also negatively correlated with absolute air pressure (−0.77) and with temperature (−0.56), meaning that cleaning needs tend to rise as pressure and temperature fall.
Moderate correlations provide further insight. Wind speed is moderately correlated with temperature (0.41) and DC power (0.36), which may reflect indirect cooling or dust dispersal. Absolute humidity is moderately correlated with cleaning needed (0.42), suggesting that moisture may contribute to dirt accumulation.
Some variables, such as wind direction and month, show little or no correlation with the other variables and have minimal influence on system behavior.
Through such an analysis of correlations, we gain actionable insights into the solar power system’s behavior, inform feature selection for machine learning models, and facilitate intelligent energy forecasting and maintenance planning.
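One common way to act on redundant, highly correlated features such as the co-located temperature sensors is greedy correlation-based pruning: a feature is kept only if its absolute correlation with every already-kept feature stays at or below a threshold. The sketch below is illustrative rather than the authors’ exact procedure; the 0.9 threshold, the use of column order as priority, and the data values are all hypothetical assumptions.

```python
import math

# Hedged sketch of correlation-based feature pruning. The 0.9 threshold,
# the column order (used as priority), and the data are hypothetical.

def pearson_r(x, y):
    """Pearson r; assumes non-constant, equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (c - my) for a, c in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((c - my) ** 2 for c in y))


def prune_redundant(columns, threshold=0.9):
    """Greedily drop columns that are highly correlated with a kept one."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson_r(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "Trina_330W_13_Temperature": [20, 25, 30, 35, 40],
    "Trina_330W_14_Temperature": [21, 26, 31, 35, 41],  # near-duplicate sensor
    "Humidity_Absolute": [9, 6, 8, 5, 7],
    "Wind_Speed": [3, 1, 4, 1, 5],
}
print(prune_redundant(columns))
# ['Trina_330W_13_Temperature', 'Humidity_Absolute', 'Wind_Speed']
```

Because the loop is greedy, which member of a redundant group survives depends on column order, so the order should reflect which sensor is considered most reliable.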

5. Results

The classification models were evaluated using four metrics: accuracy, precision, recall, and F1-score. Together, these metrics summarize how well each model classifies cleaning needs from environmental and operational sensor data.
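For reference, the four metrics follow from confusion-matrix counts: accuracy $= (TP+TN)/N$, precision $= TP/(TP+FP)$, recall $= TP/(TP+FN)$, and F1 is the harmonic mean of precision and recall. The minimal sketch below treats label 1 (“cleaning needed”) as the positive class. The label vectors and counts are hypothetical and are not taken from the study.

```python
# Minimal sketch of the four evaluation metrics from a confusion matrix,
# with label 1 ("cleaning needed") as the positive class.
# The label vectors and counts are hypothetical.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 (zero-division guarded)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


print(confusion_counts([1, 1, 1, -1, -1], [1, 1, -1, -1, 1]))  # (2, 1, 1, 1)
m = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print({k: round(v, 3) for k, v in m.items()})
```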
As shown in Table 3 and Figure 23, the support vector machine (SVM) outperformed the other models, with the highest accuracy (92.1%), precision (90.4%), recall (93.1%), and F1-score (91.7%). The high recall is particularly important in this application, because it means that the large majority of actual cleaning events are detected, minimizing energy loss from undetected soiling. The radial basis function (RBF) kernel enables the model to capture high-dimensional, nonlinear feature interactions, making it well suited to dynamic PV operation under varying weather conditions.
Logistic regression (LR) ranked second, with 89.4% accuracy, 87.5% precision, 91.2% recall, and an 89.3% F1-score. Although LR assumes a linear relationship between the features and the log-odds of the output, its probabilistic outputs and regularization allowed it to perform well. Its interpretability, fast inference, and simplicity make it a good choice for real-time applications in small- to medium-scale PV systems.
The decision tree (DT) achieved moderate performance, with 85.7% accuracy, 83.2% precision, 87.4% recall, and an 85.3% F1-score. DT offers rule transparency and interpretability in systems where these are valued. The model generated interpretable decision paths such as “if humidity > 70% and irradiance < 300 W/m², no cleaning is needed,” which are simple for non-expert operators to understand and trust. However, its tendency to overfit and its weaker generalization in more complex cases make it less robust than LR or SVM.
The k-nearest neighbors (KNN) algorithm performed worst, with 83.2% accuracy, 81.9% precision, 84.5% recall, and an 83.2% F1-score. Because KNN relies on local neighborhood structure, it is sensitive to noise in the data, and it is computationally inefficient on large datasets, since each prediction requires computing the distance to every stored training sample. Although it is simple and requires essentially no training, it scales poorly to larger PV plants. Nevertheless, it remains a viable choice for smaller, static installations with stable weather conditions.
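The cost argument for KNN can be seen in a brute-force implementation, sketched below: every prediction computes one distance per stored training sample, so inference cost grows linearly with the size of the dataset. The two already-scaled features and all values are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# Hedged sketch of brute-force KNN: one distance computation per stored
# sample for every prediction. Features and values are hypothetical and
# assumed to be already scaled to comparable ranges.

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest stored samples (Euclidean distance)."""
    distances = sorted((math.dist(x, query), label) for x, label in train)
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train = [
    ([0.10, 0.20], -1), ([0.20, 0.10], -1), ([0.15, 0.15], -1),
    ([0.80, 0.90], 1), ([0.90, 0.80], 1), ([0.85, 0.95], 1),
]
print(knn_predict(train, [0.2, 0.2]))  # -1
print(knn_predict(train, [0.8, 0.8]))  # 1
```

Feature scaling matters here, because an unscaled feature with a large range (e.g., irradiance in W/m²) would dominate the distance.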
As shown in Figure 23, SVM maintained a good balance between avoiding false positives (high precision) and capturing true events (high recall), making it the most reliable model for this predictive maintenance task. Logistic regression followed, offering interpretability and speed. DT and KNN, although less accurate, can still be useful in scenarios where computational simplicity and interpretability outweigh raw performance.
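As a quick consistency check on Table 3, the F1-score is the harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$. The snippet below recomputes it from the rounded percentages reported in the table.

```python
# Consistency check: F1 = 2PR / (P + R), using the rounded
# precision and recall percentages reported in Table 3.

def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)


table3 = {  # model: (precision %, recall %, reported F1 %)
    "LR": (87.5, 91.2, 89.3),
    "KNN": (81.9, 84.5, 83.2),
    "DT": (83.2, 87.4, 85.3),
    "SVM": (90.4, 93.1, 91.7),
}
for model, (p, r, reported) in table3.items():
    print(f"{model}: recomputed {f1_score(p, r):.1f} vs reported {reported}")
```

Three of the four recomputed values match the table exactly. DT yields 85.2 against the reported 85.3, a 0.1-point gap that is presumably due to rounding of the underlying precision and recall.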
These findings indicate that model selection should match system complexity and deployment requirements. For large PV farms under changing environmental conditions, SVM or LR is the most suitable choice because of their robustness and flexibility. For small or localized installations, however, DT and KNN remain viable choices owing to their simplicity and ease of integration with limited resources.
Although the SVM model provided the best predictive accuracy (92.1%) and F1-score (91.7%), the decision tree (DT) model offers strong interpretability. Its rule-based structure provides explicit logic (e.g., “If humidity > 70% and irradiance < 300 W/m², then no cleaning is needed”), which can be presented to maintenance personnel through a graphical display and allows manual override or verification in the field.
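The sketch below shows one way the quoted decision-tree rule could be exposed to operators with a manual override. Only the rule “humidity > 70% and irradiance < 300 W/m² → no cleaning” comes from the text. The fallback to another model’s prediction (`model_says_clean`) and the `operator_override` argument are hypothetical additions for illustration, since the remaining tree paths are not given.

```python
from typing import Optional

# Hedged sketch: surface the quoted DT rule with a manual override.
# Only the humidity/irradiance rule comes from the text; the fallback to
# another model's prediction and the override argument are hypothetical.

def rule_says_no_cleaning(humidity_pct: float, irradiance_wm2: float) -> bool:
    """The single DT path quoted in the text."""
    return humidity_pct > 70.0 and irradiance_wm2 < 300.0


def cleaning_decision(humidity_pct: float, irradiance_wm2: float,
                      model_says_clean: bool,
                      operator_override: Optional[bool] = None) -> bool:
    """Return True if the robot should clean."""
    if operator_override is not None:  # field staff have the final say
        return operator_override
    if rule_says_no_cleaning(humidity_pct, irradiance_wm2):
        return False
    return model_says_clean  # other tree paths are not given in the text


print(cleaning_decision(80.0, 200.0, model_says_clean=True))         # False
print(cleaning_decision(40.0, 850.0, model_says_clean=True))         # True
print(cleaning_decision(80.0, 200.0, True, operator_override=True))  # True
```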
The final system integrates the SVM model into the onboard controller of the robotic cleaning device, which performs local inference on edge hardware from real-time sensor observations. This architecture enables low-latency decision-making without requiring continuous connectivity. Retraining on new environmental and performance data is performed centrally, and updated models are sent to the robot periodically to account for seasonal variation.
The system issues a prediction every five minutes, matching the sensor sampling interval. To correct misclassifications, a feedback loop logs operational outcomes and uses them to improve future versions of the model. The costs of the two error types are quantified as follows: false negatives (missed cleanings) cause cumulative energy losses of 3–5% over a 2–3-day period, while false positives (unnecessary cleanings) cause unnecessary water consumption, mechanical wear, and higher operating costs, particularly in water-scarce areas. The system is therefore designed to minimize false negatives, protecting energy yield and the economic return on investment.
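A standard way to encode such asymmetric error costs, offered here as an illustrative sketch rather than the authors’ implementation, is cost-sensitive thresholding. Given a probability p that cleaning is needed, cleaning has expected cost (1 − p) × C_FP and skipping has expected cost p × C_FN, so cleaning is the cheaper action when p > C_FP / (C_FP + C_FN). This assumes calibrated probabilities: LR provides them directly, whereas SVM scores would first need calibration (e.g., Platt scaling). The cost values below are hypothetical.

```python
# Hedged sketch of cost-sensitive thresholding with asymmetric error costs.
# The cost values are hypothetical; p_needed is assumed to be a calibrated
# probability that cleaning is needed.

def expected_cost(p_needed, clean, cost_fp, cost_fn):
    """Expected cost of one decision given P(cleaning needed) = p_needed."""
    return (1 - p_needed) * cost_fp if clean else p_needed * cost_fn


def decision_threshold(cost_fp, cost_fn):
    """Probability above which cleaning minimizes expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def should_clean(p_needed, cost_fp, cost_fn):
    return p_needed > decision_threshold(cost_fp, cost_fn)


# Hypothetical costs: a missed cleaning costs 4x an unnecessary one.
COST_FP, COST_FN = 1.0, 4.0
print(decision_threshold(COST_FP, COST_FN))  # 0.2
print(should_clean(0.3, COST_FP, COST_FN))   # True
print(expected_cost(0.3, True, COST_FP, COST_FN),
      expected_cost(0.3, False, COST_FP, COST_FN))
```

Raising C_FN relative to C_FP lowers the threshold, which trades some precision for higher recall, in line with the stated goal of minimizing false negatives.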

6. Conclusions and Future Work

This study showed that standard machine learning models can support intelligent, data-driven photovoltaic (PV) maintenance using real-time environmental and performance data. Among the four models tested (support vector machine (SVM), logistic regression (LR), decision tree (DT), and k-nearest neighbors (KNN)), SVM performed best, with 92.1% accuracy and an F1-score of 91.7%, which is attributed to its capacity to handle complex, nonlinear relationships in the data. Logistic regression was also effective, easy to interpret, and fast to implement, and is therefore useful for real-world deployments.
Although DT and KNN were less accurate, they remain useful where simple, fast models are needed, especially in low-capacity or small systems. These results indicate that AI-based systems can reduce manual cleaning, conserve resources, and keep solar panels operating efficiently, especially in dusty regions.
However, the current work did not compare the chosen models with recent deep learning approaches such as CNN-LSTM (see Table 2). Future work will address this through an extensive evaluation of prediction accuracy and computational feasibility for real-time robotic use.
Future work will focus on two short-term extensions. The first is to experiment with LSTM networks on the existing dataset to capture temporal dependencies and assess whether they improve time-series prediction of soiling build-up and power loss. The second is to scale deployment to a larger commercial solar farm to evaluate performance, inference latency, and model generalizability in a more heterogeneous and complex environment. These efforts will be used to validate the system’s scalability and reliability across a broader range of PV installations.
Further enhancements include incorporating image-based sensing alongside environmental data for more precise soiling detection. These steps will support the transition from a proof-of-concept system to a fully deployable real-world solution.
A significant limitation is that all experiments used data from a single site in Oman, which may limit model generalizability. Future studies will validate the model on datasets from different climate zones (e.g., tropical, temperate, and desert) to assess cross-regional robustness. If performance deteriorates, domain adaptation methods such as fine-tuning, feature alignment, or adversarial adaptation will be explored. This is needed to ensure that the model remains effective under different soiling dynamics and conditions and can be deployed across a wider range of PV installations.

Author Contributions

A.A.-H., principal investigator, methodology, resources, conceptualization, analysis, and writing—review and editing. E.K., methodology, analysis, and writing—draft preparation. Z.A.A.H., analysis, writing—review and editing, and investigation. P.J., supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is available upon request and agreement.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mekhilef, S.; Saidur, R.; Kamalisarvestani, M. A review on solar energy use in industries. Renew. Sustain. Energy Rev. 2012, 16, 4567–4575.
  2. Al-Helal, A.; Ali, H.; Akbar, M. The effect of dust deposition on the performance of photovoltaic systems in arid environments. Energy Rep. 2017, 3, 65–72.
  3. Mina, S.; Erfan, D.; Amal, B.; Mouhaydine, T.; Oumaima, M.; Fernando, J. The Impact of Dust Deposition on PV Panels’ Efficiency and Mitigation Solutions: Review Article. Energies 2023, 16, 8022.
  4. Zhao, J.; Liu, Y.; Zhang, L. Effects of environmental factors on the performance of photovoltaic systems: A review. Renew. Sustain. Energy Rev. 2016, 57, 869–875.
  5. Sari, M.M.; Ismail, A.; Rahman, M. Impact of temperature on the performance of photovoltaic panels: A review. Renew. Sustain. Energy Rev. 2019, 50, 715–724.
  6. Parida, B.; Sahoo, L.; Pradhan, S. Predictive maintenance of photovoltaic systems: A novel approach. Sol. Energy Mater. Sol. Cells 2015, 141, 87–95.
  7. Hernandez, J.; Pardo, A.; Villanueva, S. Predictive maintenance of photovoltaic installations using machine learning. Energy Procedia 2016, 105, 347–352.
  8. Burgess, P.; Goulden, M.; Hartmann, A. Predictive maintenance for photovoltaic systems: A review of current practices and future potential. Renew. Sustain. Energy Rev. 2018, 91, 539–549.
  9. Mani, M.; Pillai, R. Impact of dust on solar photovoltaic (PV) performance: Research status, challenges and recommendations. Renew. Sustain. Energy Rev. 2010, 14, 3124–3131.
  10. Sayyah, A.; Horenstein, M.N.; Mazumder, M.K. Energy yield loss caused by dust deposition on photovoltaic panels. Sol. Energy 2014, 107, 576–604.
  11. Ilse, K.A.; Micheli, L.; Deceglie, M.G.; Muller, M.; Kurtz, S. Techno-economic assessment of soiling losses and mitigation strategies for solar power generation. Joule 2019, 3, 2303–2321.
  12. Sarver, T.; Al-Qaraghuli, A.; Kazmerski, L.L. A comprehensive review of the impact of dust on the use of solar energy: History, investigations, results, literature, and mitigation approaches. Renew. Sustain. Energy Rev. 2013, 22, 698–733.
  13. Cordero, R.R.; Damiani, A.; Laroze, D.; Macdonell, S.; Jorquera, J.; Sepúlveda, E.; Feron, S.; Llanillo, P.; Labbe, F.; Carrasco, J.; et al. Effects of soiling on photovoltaic (PV) modules in the Atacama Desert. Sci. Rep. 2018, 8, 13992.
  14. Mellit, A.; Kalogirou, S.A. Artificial intelligence techniques for photovoltaic applications: A review. Prog. Energy Combust. Sci. 2008, 34, 574–632.
  15. Ren, G.; Wu, X.; Chen, X.; Zhang, Y. Predictive maintenance for PV systems using machine learning: A review. Energies 2021, 14, 1103.
  16. Abhishek, K.; Singh, S.; Sharma, A. A hybrid model for solar panel cleaning scheduling using environmental monitoring and machine learning. Renew. Energy 2022, 193, 452–463.
  17. Shobha, G.; Kumar, M.; Shah, A. Machine learning-based predictive analytics for solar panel soiling detection and cleaning optimization. Sol. Energy 2023, 261, 28–37.
  18. Kannan, A.; Kumar, A.; Rajendran, R. Autonomous robotic cleaning systems for solar panels: A smart maintenance approach. Int. J. Renew. Energy Res. 2021, 11, 2031–2039.
  19. Sufyan, Y.; Samikannu, R.; Sidique, G.; Wetajega, S.; Victor, O.; Abdul-kadir, S.; Getachew, W. A Holistic Review of the Effects of Dust Buildup on Solar Photovoltaic Panel Efficiency. Solar Compass 2024, 13, 100101.
  20. Ait Ouyahia, S.; Hachicha, M.; Fadlallah, S. Impact of dust on the performance of photovoltaic systems in arid climates. Renew. Energy 2023, 180, 1180–1192.
  21. Elamim, A.; Sarikh, S.; Hartiti, B.; Benazzouz, A.; Elhamaoui, S.; Ghennioui, A. Experimental studies of dust accumulation and its effects on the performance of solar PV systems in Mediterranean climate. Energy Rep. 2024, 11, 2346–2359.
  22. Alnaser, W.E.; Alnaser, N.W.; Abdel-Rahman, E. Environmental impacts of dust on the performance of photovoltaic panels in the Middle East. Energy Rep. 2022, 8, 409–415.
  23. Hanna, V.; Mark, W. Effect of Dust Composition on the Reversibility of Photovoltaic Panel Soiling. Environ. Sci. Technol. 2021, 55, 1984–1991.
  24. Samra, S.K.; Shukla, A.; Kumar, R. Evaluation of solar panel soiling losses and potential cleaning techniques in desert climates. Energy Rep. 2023, 9, 457–468.
  25. Sharma, A.; Aggarwal, P.; Joshi, A. Performance analysis of soiled photovoltaic panels under varying environmental conditions. J. Sol. Energy Eng. 2021, 143, 011004.
  26. Saleh, S.; Alnaser, N.; Mustafa, O. Seasonal variation in soiling losses on photovoltaic panels in desert climates. Energy 2022, 240, 122467.
  27. Alashrah, M.; Abdullah, M.; Khaled, A. Robotic cleaning systems for photovoltaic panels: Challenges and opportunities. Sol. Energy 2022, 233, 205–215.
  28. Kurd, A.M.; Petrov, V.; Rashid, M. Comparative analysis of cleaning techniques for solar panels in high-dust environments. Renew. Sustain. Energy Rev. 2022, 163, 112453.
  29. Shah, H.; Islam, T.; Raza, S. Machine learning-based predictive analytics for optimal cleaning schedules in solar PV systems. Renew. Sustain. Energy Rev. 2023, 164, 112451.
  30. Zhou, Y.; Zhang, H.; Liu, Y. A review of artificial intelligence applications in predictive maintenance of renewable energy systems. Energy AI 2022, 8, 100148.
  31. Singh, A.; Kumar, P.; Yadav, R. AI-based predictive maintenance systems: Trends, applications, and future directions. Expert Syst. Appl. 2023, 214, 119167.
  32. Al-Waeli, A.H.A.; Sopian, K.; Kazem, H.A.; Chaichan, M.T. Review of AI applications in the prediction and maintenance of photovoltaic systems. Renew. Sustain. Energy Rev. 2022, 160, 112291.
  33. Ghosh, S.; Roy, R.; Saha, H. Machine learning models for fault diagnosis in photovoltaic arrays: A comparative review. J. Clean. Prod. 2023, 403, 136715.
  34. Ramadhan, R.; Hossain, M.J.; Niknam, T. Real-time predictive analytics for PV system efficiency using recurrent neural networks. Appl. Energy 2022, 308, 118362.
  35. Khan, S.; Ahmed, R.; Saleem, M. Hybrid predictive maintenance framework for PV systems using weather and performance data. Renew. Energy 2022, 191, 570–581.
  36. Aziz, M.A.; Hossain, M.M.; Reza, M.I. Artificial neural networks for predictive maintenance and cleaning schedule optimization of PV systems. Energy Rep. 2023, 9, 823–832.
  37. Zonta, T.; Da Costa, C.A.; da Rosa Righi, R.; de Lima, M.J.; Da Trindade, E.S.; Li, G.P. Predictive maintenance in the Industry 4.0: A systematic literature review. Comput. Ind. Eng. 2020, 150, 106889.
  38. Zhou, X.; Wang, Y.; Tang, J. Explainable AI for predictive maintenance in photovoltaic power plants. Renew. Energy 2023, 213, 1207–1219.
  39. Al-Humairi, A.; Khalis, E.; Al Hemyari, Z.A.; Jung, P. The Impact of Data Augmentation on AI-Driven Predictive Algorithms for Enhanced Solar Panel Cleaning Efficiency. Processes 2025, 13, 1195.
  40. Al-Humairi, A.; El Asri, H.; Al Hemyari, Z.A.; Jung, P. Assessing the Features of PV System’s Data and the Soiling Effects on PV System’s Performance Based on the Field Data. Energies 2025, 17, 4419.
Figure 1. AI-based predictive maintenance system for PV panels.
Figure 19. DC power and cleaning needed over time.
Figure 20. DC current and cleaning needed over time.
Figure 21. DC voltage and cleaning needed over time.
Figure 22. Correlation matrix of environmental and system performance variables in the solar power system.
Figure 23. Model performance metrics comparison.
Table 1. Summary of soiling impact studies on photovoltaic systems.

| Region | Climate | Soiling Factor | Performance Impact | Observations | Reference(s) |
|---|---|---|---|---|---|
| North Africa | Arid | Dust accumulation, wind speed, humidity | Up to 25% energy loss | Soiling rates are highly seasonal and region-dependent | [19,22] |
| Middle East | Arid to Semi-Arid | Particle density, dust storms | Up to 30% energy yield reduction | Cleaning frequency significantly affects system ROI | [21,23] |
| Egypt | Desert | Seasonal dust variation, precipitation | 10–20% monthly variation in soiling loss | Rainfall events temporarily restore performance | [24,26] |
| India | Semi-Arid | Relative humidity, dust type, temperature | Performance degradation up to 22% | Electrostatic dust adhesion increases with heat and RH | [22,25] |
| Gulf Region | Hyper-Arid Desert | Dust density, wind direction, cleaning intervals | Higher losses in dry season | Robotic cleaning found more effective than manual methods | [27,28] |
| Bahrain | Dry Desert | PM10 concentration, irradiance, wind | Soiling causes 15–20% drop in efficiency | Soiling severity correlates with PM levels and wind | [22,23] |
| China | Urban/Rural | Dust mineral composition, surface temperature | Up to 18% degradation | Darker particles cause greater absorption and heating | [25] |
| Egypt | Predominantly Arid Desert | Ambient temperature, air pressure, wind | Noticeable efficiency drop within 2 weeks | Frequent light cleaning suggested in high-dust periods | [26,27] |
Table 2. Overview of some AI/ML techniques for predictive maintenance in photovoltaic systems.

| AI/ML Algorithm | Input Features | Objective | Results Impact | Reference(s) |
|---|---|---|---|---|
| Random Forest, SVM | Temperature, irradiance, humidity, soiling index | Predict power loss due to soiling | High accuracy, robust to noise | [32,33] |
| Hybrid ML (SVM + KNN) | Weather data + performance data | Optimize cleaning decisions | Improved decision thresholds | [35] |
| Neural Networks | Time-series PV output, dust data, temperature | Forecast power degradation | High predictive accuracy | [36] |
| Deep Learning (CNN + LSTM) | Sensory data, images, historical performance | Fault detection and maintenance planning | Handles spatial-temporal data | [34] |
| Recurrent Neural Networks | Real-time environmental + operational data | Predict energy output and cleaning needs | Suitable for real-time forecasting | [36,37] |
| Ensemble Learning Models | Multivariate environmental and performance datasets | General predictive maintenance | Adaptable across systems | [37] |
| ML Model Comparison | PV current/voltage, irradiance, weather | Fault classification | Evaluates model effectiveness across fault types | [38] |
| Explainable AI | Environmental and maintenance logs | Model interpretability in maintenance | Improves trust and operational integration | [38] |
Table 3. Comparison of model performance metrics.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| Logistic Regression (LR) | 89.4 | 87.5 | 91.2 | 89.3 |
| K-Nearest Neighbors (KNN) | 83.2 | 81.9 | 84.5 | 83.2 |
| Decision Tree (DT) | 85.7 | 83.2 | 87.4 | 85.3 |
| Support Vector Machine (SVM) | 92.1 | 90.4 | 93.1 | 91.7 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Al-Humairi, A.; Khalis, E.; Al-Hemyari, Z.A.; Jung, P. Machine Learning-Based Predictive Maintenance for Photovoltaic Systems. AI 2025, 6, 133. https://doi.org/10.3390/ai6070133
