1. Introduction
The world has witnessed a significant increase in demand for conventional energy sources such as oil, accompanied by rising costs because all industries depend on oil, directly or indirectly. This trend has gone hand in hand with the steady growth of human wealth and the steady expansion of industry. In addition, over the past two decades, continuous emphasis has been placed on developing energy generation systems that are free of greenhouse gas emissions, because these gases cause many problems for the planet, including climate change and global warming.
One of the most promising sources of clean energy is photovoltaic (PV) energy, which is characterized by a low cost and the abundant availability of its main resource, sunlight. This energy has therefore shown significant environmental and economic benefits when compared to fossil resources [
1,
2].
It is well-known that the “solar energy and photovoltaic (PV) systems became an essential part of the global energy profile” [
3], and the uses of renewable energy in the world have become common and increasingly important for all countries. Currently, the interest in renewable energy is increasing as it is considered an essential resource for future energy worldwide. At the same time, the contribution of solar energy to electricity production in the world is still low, amounting to only 3.6%. Despite this small percentage, solar energy has become the second most installed renewable energy resource after hydroelectric energy [
4,
5,
6,
7].
Fossil fuels are consumed on a massive scale worldwide, accounting for an estimated share approaching 80% of global energy consumption. In addition, this type of fuel is being depleted more rapidly than previously expected. This problem must therefore be addressed urgently; it requires unconventional actions and plans, the most important of which remains the search for sustainable alternatives such as renewable energy (ibid). Solar energy is thus a suitable alternative with a promising future. Pursuing it demands great effort from countries that aspire to develop solar energy, essentially by expanding the construction of PV power plants. Furthermore, research efforts should focus on how to develop this promising source in any country that aspires to develop PV.
PV systems play a pivotal role in sustainable energy production, offering a clean and renewable alternative to conventional energy sources. However, the efficiency of solar panels, a critical component of PV systems, can be significantly affected by various environmental factors. One such factor that poses a considerable challenge to solar panel performance is the accumulation of soiling, encompassing dust, dirt, pollen, and other particulate matter on the panel surface. The impact of soiling on solar panel efficiency has been a subject of extensive research due to its adverse effects on the energy output and economic viability.
Indeed, solar panel systems are exposed to a wide range of environmental conditions, including dust storms, urban pollution, agricultural activities, and seasonal variations, all of which contribute to the accumulation of soiling on panel surfaces. The gradual buildup of dirt and debris can diminish the transparency of the cover glass, reduce sunlight absorption, and impede heat dissipation, which leads to decreased energy conversion efficiency. Additionally, soiling may exacerbate issues such as hot spots and potential-induced degradation, further compromising the long-term reliability and performance of PV systems. Experiments have also shown that one of the main factors affecting the efficiency of PV panels, and hence their performance or production, is the amount of soiling, that is, the dirt that accumulates on the panels due to environmental factors such as wind and rain [
1,
2].
Moreover, understanding the magnitude and dynamics of soiling effects is crucial for optimizing the design, maintenance, and operation of solar panel installations. By quantifying the relationship between soiling levels and energy output, and by studying the properties of the random variables that affect the production of PV systems, researchers and practitioners can develop effective cleaning strategies. This can be achieved by identifying the significant random variables that affect the level of production, modeling PV system data accurately, forecasting future production, and establishing performance monitoring protocols, with the aim of maximizing production, mitigating the adverse effects of soiling, and enhancing the overall reliability and profitability of solar energy systems.
While laboratory studies provide valuable insights into the effects of soiling on solar panels under controlled conditions, real-world performance data are essential for a comprehensive understanding of the phenomenon. Field experiments conducted on operational PV systems offer a unique opportunity to assess the actual impact of soiling in diverse environmental settings and under varying operating conditions.
It is widely recognized that PV systems comprise diverse components, some with electrical characteristics, others exhibiting thermal behaviors, and others possessing mechanical properties, often with considerable complexity. Consequently, evaluating the efficiency of solar systems solely through physical models is impractical. Hence, numerous studies resort to mathematical or experimental models to elucidate the performance of different components within solar systems [
8,
9,
10].
The process of modeling the performance of PV systems depends, in one way or another, on the components of the mathematical model to be developed and on whether each component affects the efficiency of the PV panels positively (so that it can be emphasized) or negatively (so that it can be avoided or its impact reduced). In addition, modeling solar energy systems requires two important types of input. For the first type, the design specifications and the environmental parameters or factors (variables) must be taken into account (solar radiation, irradiance, current, power, average solar panel temperature, humidity, air pressure, wind speed and direction, volume of snow, and so on) [
11,
12,
13,
14]. These are commonly referred to as descriptive data, although this label is misleading because such data do not, by themselves, describe any property or characteristic of the data (ibid).
Moreover, it should be noted that the established models for analyzing solar data encompass various methodologies such as, but not limited to, Linear models, Non-linear models, Unrestricted models, Polygonal models, Triple estimation models, Fuzzy-genetic models, and Neural Network models [
15]. Nonetheless, numerous misconceptions surround the behavior of these variables, with one prevalent belief being the assumption of static environmental conditions. This presumption often leads to the utilization of historical data to estimate the efficiency or production of PV systems. However, with the undeniable realities of climate change and global warming, relying solely on past environmental data becomes futile. The shifting landscape renders historical data unreliable, thereby jeopardizing the accuracy of PV system estimates (ibid).
From these observations, it becomes evident that a thorough examination of both current and historical environmental data is imperative before embarking on any modeling or estimation of solar systems [
16,
17,
18,
19,
20,
21,
22,
23]. Furthermore, the literature on this subject highlights a significant gap in the scientific research: the dearth of studies that delve into the intricacies of solar system-related data.
In this research, a framework is proposed for examining the data related to the environmental factors of solar systems. It provides a foundation for understanding the attributes and properties of the data so that they can be modeled with models that suit their nature and therefore give the most accurate and efficient results. As an application of the proposed framework, Shams Solar Facility is taken as an example to study the properties and characteristics of the environmental variables.
This paper presents an experimental investigation that evaluates the effects of soiling on the performance of solar panel systems using real-life data collected from installed arrays. We present the methodology and results of this study, which involved monitoring the performance of solar panel systems installed on the German University of Technology in Oman (GUtech) campus over an extended period. By analyzing the correlation between the soiling levels, environmental parameters, and energy production data, we aim to elucidate the complex interplay between soiling and solar panel performance in real-world conditions.
The structure of this paper is as follows.
Section 2 presents a review of the most pertinent works.
Section 3 discusses the methodology followed in this research.
Section 4 presents the results and the discussion. Lastly,
Section 5 and
Section 6 discuss the conclusions and future studies, respectively.
2. Previous Works
The available literature on the analysis and study of PV systems is vast and diverse. Some studies treat the subject in a preliminary manner, some approach it from a physical perspective, and others address it through mathematical and statistical models. In this section, we therefore review a selection of the available papers, focusing on general papers and on papers related to mathematical and statistical applications of PV systems.
Solar energy production is a multifaceted field that encompasses various aspects and the related published work of this topic is extensive. Therefore, the related published papers will be implicitly divided into a number of areas, and this section discusses the most relevant works.
PV systems have been the subject of extensive research in recent years, with a focus on improving their performance, efficiency, and cost-effectiveness [
24]. This was driven by the increasing demand for renewable energy sources and the need to reduce greenhouse gas emissions. The literature highlights the growth of the PV industry, with a focus on improving the efficiency and cost-effectiveness of PV systems [
25,
26]. In fact, research has been conducted on various aspects of PV systems, including the development of new materials and device structures, enhancements in power conversion efficiency, and the integration of PV systems with other technologies such as energy storage systems [
26].
Moreover, one significant area of progress is the development of new materials and device structures for PV cells, such as perovskite solar cells and thin-film technologies [
26]. In fact, these advancements have led to improvements in the power conversion efficiency and have reduced manufacturing costs. Additionally, research has also been conducted on enhancing the performance of PV systems under various operating conditions, such as high temperatures and partial shading.
In addition to technical and economic aspects, the literature also addresses environmental and sustainability issues related to PV systems. A systematic literature review by [
27] examines the solar PV value chain, including the performance and environmental impact of PV installations throughout their lifetime and at the end-of-life stage.
A significant area of investigation across these studies is the impact of dust deposition on solar PV panels, a critical concern for installations in arid and semi-arid regions. Dust accumulation on PV panels is, indeed, a significant concern, particularly in regions with a high dust intensity such as the Middle East and North Africa (MENA) and parts of Asia [
28]. In fact, dust deposition reduces the transparency of the panel surface, preventing sunlight from reaching the solar cells and decreasing the output power and efficiency of the PV system [
29]. It may be mentioned that the problem of soiling on PV panels has been studied based on the socio-environmental parameters using different models [
1].
Refs. [
30,
31] both conduct detailed analyses of how dust accumulation affects PV efficiency, with [
30] focusing on Iran’s desert environment and [
31] examining the situation in Qatar. Despite the geographical differences, both studies agree on the substantial negative impact of dust on PV performance, highlighting the necessity for effective cleaning strategies. The composition and morphology of dust particles, as investigated by both studies, reveal that the specific characteristics of dust vary by region, which influences the degree of efficiency loss and suggests the need for localized mitigation strategies.
By further exploring the ramifications of environmental factors on solar energy systems, Ref. [
32] delves into the economic impacts of soiling on CPV modules in Spain, emphasizing the financial losses incurred from decreased efficiency due to dust accumulation. This economic perspective is complemented by [
4], who expand the discussion to the economic analysis of hybrid energy systems, underscoring the potential for such systems to mitigate the effects of environmental variables on solar energy reliability and cost-effectiveness.
Ref. [
32] adds a broader perspective by discussing dust-related challenges in both the MENA region and India, highlighting the universal challenge of soiling across diverse geographical landscapes. Its overview suggests that the quest for solutions is a global endeavor, with innovations in cleaning technologies and system optimization holding the key to overcoming these barriers.
It may be mentioned here that equivalent-circuit (electro-optical) models of a solar cell, namely the single-diode, two-diode, and three-diode models, have been developed for the problem of the soiling of PV systems by Asbayou et al. (2023) and Kumar and Mary (2022) [
33,
34].
Ref. [
35] aimed to contribute to an improved understanding of paraffin as a phase change material and its application in storing solar energy. Paraffin has a high latent heat capacity with stable properties, and it is the most widely used building material in storing solar energy. Hence, correct knowledge of its spectral properties is required to optimize this material’s performance. In that work, a multi-thickness model is proposed together with an advanced heuristic algorithm, IQPSO, to perform an inverse task in measuring the spectral refractive index and absorption index with very high accuracy. The numerical experiments conducted in that study indicate that the inverse technique is a potential tool for characterizing paraffin materials, and that the process is key to further improving the performance of solar energy storage systems and the energy efficiency of buildings.
Moreover, Ref. [
36] contributed to solving coupled radiation and conduction heat transfer problems by coupling the discrete ordinate method with the finite volume method. The researchers demonstrated that, with temperature data as an input to SFSM, it is possible to provide a transient heat flux estimate with reasonable accuracy for various semitransparent materials. Applying SFSM to heat fluxes that change abruptly revealed some of its limitations. The paper further examined the effects of future time steps and measurement errors on the accuracy of the heat flux estimate, advocating that a future time step of 3 is appropriate for optimal results, even with increased random deviations of up to 1.5. Similarly, Ref. [
37] presented a new hybrid optimization technique in which DFIM is coupled with the Sequential Quadratic Programming algorithm to simultaneously reconstruct the thermal boundary conditions and physical properties of participating media. The finite volume method was combined with the discrete ordinate method to solve the coupled radiation–conduction heat transfer problem, which was formulated as an inverse problem using the surface temperature and exit radiative intensity data in the reconstruction process.
It should be noted that the above models lie outside the direction of this paper.
From another perspective, recent research has focused on innovative methods for dust detection and measurement, including the use of artificial intelligence techniques, drones, and sensors to optimize cleaning schedules [
32]. Computational fluid dynamics (CFD) simulations have also been used to analyze the impact of wind-blown sand on PV arrays [
38].
Several studies have focused on developing mathematical models for PV systems to simulate their output characteristics and predict their performance. These models are essential for optimizing PV system design and operation [
39]. Indeed, research in the area of solar PV technology has focused on the modeling, economic feasibility, and efficiency of solar energy systems. The results of several studies are combined in this part of the review to give a cogent summary of the progress made in understanding and maximizing PV system performance in the face of financial and environmental difficulties.
One of the key papers in this area is by [
40], which proposes a mathematical model for PV panels in the range of 10–25 V with approximately 50 W of power generation that accurately represents the I-V and P-V curves of PV modules and can be used for maximum power point tracking (MPPT) algorithms [
40].
The problem of “determining the reliability, maintainability, and availability of solar panels” [
8] was addressed in Indonesia using the standard deviation (σ) and the Gamma probability density function (PDF). Ref. [
41] modeled the production of PV systems by developing several statistical techniques.
In addition, the problem of “evaluating the availability of solar energy in the NEOM region on a quantitative and qualitative basis, and a database of weather conditions such as temperature and wind speed is collected and processed” [
11] was addressed by calculating the arithmetic mean of solar radiation and weather conditions for each day of the NASA Power data from 2017 to 2021. Also, Ref. [
12] developed a machine learning technique based on several statistical measures in order to forecast a PV generation system in Manchester Metropolitan University, UK. Moreover, Ref. [
42] developed a statistical forecasting technique for PV production using quantile regression and a non-parametric approach. That work is a case study conducted at the University of Liège, Belgium, on the production of the university’s PV solar power plant. Likewise, Ref. [
33] developed mathematical models based on the tilt angle to model the energy produced by solar panels, and the data were collected from ten solar panels with different angles.
Ref. [
43] developed a probabilistic PV forecaster utilizing deep learning methods. The input of the deep learning models is based on the weather forecasts from the regional climate model provided by the Lab. In a more recent paper, Ref. [
44] discussed the sequential modeling, simulation, and analysis of a PV module using MATLAB/Simulink 2020a. The authors developed a mathematical model for the PV module and validated it through simulation and experimental results; the model can be used to analyze the impact of various environmental factors, such as temperature and irradiance, on PV performance [
45].
The juxtaposition of these studies reveals a consensus on the critical impact of environmental soiling on PV system performance, alongside a recognition of the economic implications of such efficiency losses. Moreover, the exploration of hybrid energy systems by [
46] introduces an additional dimension of solution-oriented research, pointing towards the integration of multiple renewable energy sources as a strategy to enhance system resilience and economic viability.
3. Processes and Methods
The flowchart of the proposed processes and methods, including the steps and techniques discussed in this section and the meaning of each step, is given in
Figure 1.
3.1. Problem Selection
The research problem, as explained in
Section 1, is summed up by proposing a framework for how to examine and scrutinize the features and characteristics of historical and actual data for environmental factors that affect the performance of solar systems, and also to investigate the intricate relationship between soiling and the solar panel performance. This is necessary in order to model the data for these factors according to the appropriate mathematical/empirical models that take into account and fit the features and properties of the historical and actual data.
This research investigates the performance of two identical sets of strings in PV systems, each consisting of 9 solar panels connected in series to function as a single large panel. The panels in both strings share the same tilt, direction, and physical properties to ensure a uniform baseline for comparison. The only difference between the two strings is that one is kept clean (String 1) while the other accumulates dust over time (String 2). Therefore, this study focuses on analyzing various parameters, including DC_Current_String1 and DC_Current_String2, DC_Power_String1 and DC_Power_String2, and DC_Voltage_String1 and DC_Voltage_String2, to assess their behavior under different conditions. By examining these parameters, this research aims to understand how environmental factors, such as dust accumulation, influence the efficiency and electricity production of PV systems. The results provide insights into the impact of maintenance and environmental conditions on PV system performance.
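The comparison between the clean and the soiled string can be illustrated with a per-sample power ratio. The following is a minimal sketch under stated assumptions: the function name `soiling_ratio`, the low-power threshold `min_power`, and the sample values are hypothetical and are not taken from the study, which does not specify how the two strings were compared sample by sample.

```python
from typing import Iterable, List, Optional


def soiling_ratio(
    clean_power: Iterable[Optional[float]],
    soiled_power: Iterable[Optional[float]],
    min_power: float = 50.0,  # hypothetical cut-off in watts, not from the paper
) -> List[float]:
    """Return the per-sample ratio of soiled-string to clean-string DC power.

    Samples with a missing reading, or with clean-string power below
    `min_power` (e.g. near sunrise or sunset), are skipped because the
    ratio is unstable when the denominator is small.
    """
    ratios = []
    for p_clean, p_soiled in zip(clean_power, soiled_power):
        if p_clean is None or p_soiled is None:
            continue
        if p_clean < min_power:
            continue
        ratios.append(p_soiled / p_clean)
    return ratios


# Illustrative values only (not measurements from the facility).
dc_power_string1 = [1200.0, 1500.0, 20.0, None, 2000.0]   # clean string
dc_power_string2 = [1140.0, 1440.0, 18.0, 900.0, 1900.0]  # soiled string

ratios = soiling_ratio(dc_power_string1, dc_power_string2)
mean_ratio = sum(ratios) / len(ratios)
print(f"samples used: {len(ratios)}, mean ratio: {mean_ratio:.4f}")
```

A ratio below 1 indicates that the soiled string produced less power than the clean one at that instant; because both strings share tilt, direction, and physical properties, the difference can be attributed mainly to dust accumulation.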
It is worth mentioning that the power, current, and voltage data are collected from the Alberter and downloaded automatically at five-minute intervals. This process is fully automated to ensure consistent and reliable data capture. Other environmental parameters like wind speed and temperature are also recorded automatically through different sensors. Moreover, all measurement devices are interconnected with MetaControl, from which we extracted the collected data for analysis.
3.2. The Environmental Factors
Drawing from an extensive literature review on the environmental factors impacting solar PV systems [
9,
11,
12,
14,
47,
48,
49], the considered factors encompass an array of variables such as the DC current, DC power, DC voltage, irradiance, temperature, air pressure, relative humidity, absolute humidity, wind speed, and wind direction. Despite Muscat’s high temperatures, abundant sunlight, and minimal cloud cover, the daily electrical energy output from solar sources exhibits variability dictated by specific standards and instantaneous environmental conditions. This variability hinges on a multitude of factors or parameters, as documented in prior studies [
48,
49]. Key among these factors are irradiance, temperature, air pressure, and relative humidity, along with the aforementioned variables. This variability underscores the complexity inherent in assessing solar energy generation and highlights the significance of comprehensively understanding and accounting for environmental influences in PV system analysis.
3.3. Data Resources
For this study, the data from Shams Solar Facility, located on the premises of the German University of Technology in Oman (GUtech), were collected as a representative sample to illustrate the application of the proposed guidelines. The choice of Shams Solar Facility was deliberate, considering its prominence and relevance within the renewable energy sector. The time period considered spans from 3 April 2021 at 9:00 AM to 12 March 2022 at 5:05 PM. A large dataset containing 36,851 observations of each parameter (factor) was collected for this study.
The solar training facility at the German University of Technology in Oman (GUtech), shown in the figure, comprises four solar systems. A ground-mount system with 20 solar modules generates around 6 kWp and is installed in portrait orientation facing south. A pitched-roof system with 12 solar modules generates around 3.9 kWp and is installed in landscape orientation facing south. Two flat-roof systems, each with 12 solar modules and each generating around 3.9 kWp, complete the installation: one faces south in portrait orientation, while the other has half of its modules facing east and the other half facing west. This site is a testament to the collaboration between Shams Global Solutions (SGS) and BP Oman’s Social Investment Program. Equipped with four different solar systems that can together generate 18 kWp (kilowatt peak) of solar power and four training zones, including a dedicated interface protection zone, the facility is designed to meet the exact specifications of the Authority for Electricity Regulation’s (AER) grid-connected solar regulations that were released in 2017.
Figure 1 showcases the solar facility.
Moreover, the dataset comprises various pertinent variables including, but not limited to, energy production, environmental factors, operational parameters, and economic indicators. The size and scope of the dataset provides ample opportunities for comprehensive analysis and meaningful insights into the dynamics of solar energy production. Access to reliable and extensive resources pertaining to Shams Solar Facility facilitated a thorough examination of the subject matter, ensuring the robustness and credibility of the study.
3.4. Data Checking
In the introduction, we highlighted the necessity of scrutinizing data pertaining to factors affecting solar stations. Given the multifaceted influences on these factors’ features and properties, data integrity is paramount. It is imperative to acknowledge that some values may be lost or become outliers due to various phenomena. Therefore, an examination is crucial to ensure the inclusion of only valid data points in our analysis, enhancing the accuracy and reliability of our findings.
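A data check of this kind can be sketched as follows. The paper does not state which rule was used to identify outliers, so the interquartile-range (IQR) rule with the conventional multiplier of 1.5 is an assumption made here for illustration; the function names and the sample readings are likewise hypothetical.

```python
import math
from statistics import quantiles
from typing import List, Optional, Sequence, Tuple


def drop_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Remove lost readings, represented here as None or NaN."""
    return [v for v in values if v is not None and not math.isnan(v)]


def iqr_bounds(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper fences of the IQR rule (k = 1.5 is a convention)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the cleaned sample
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(
    values: Sequence[Optional[float]], k: float = 1.5
) -> Tuple[List[float], List[float]]:
    """Return (kept, flagged) after removing missing values."""
    clean = drop_missing(values)
    low, high = iqr_bounds(clean, k)
    kept = [v for v in clean if low <= v <= high]
    flagged = [v for v in clean if v < low or v > high]
    return kept, flagged


# Illustrative ambient temperatures in degrees Celsius (not facility data).
raw = [25.1, 24.8, None, 26.0, float("nan"), 25.5, 25.3, 90.0, 24.9]
kept, flagged = split_outliers(raw)
print("kept:", kept)
print("flagged:", flagged)
```

Flagged values are returned rather than silently discarded, so that each one can be inspected before deciding whether it is a sensor fault or a genuine extreme.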
3.5. Data Analysis
Many complex mathematical techniques were carefully applied to reveal the subtleties and underlying properties of the gathered dataset. In order to fully understand its complexity, these mathematical techniques should be taken into account in modeling the PV System’s performance. These statistical measures were carefully selected to examine different aspects of the dataset, such as its distributional features, central tendency, and even possible correlations across variables [
16,
17,
18,
19,
20,
21,
49].
The use of descriptive statistical metrics, such as the variance, skewness, kurtosis, mean, and median, allowed for a deeper comprehension of the underlying patterns and fluctuations in the dataset. The median presented a strong measure of a central tendency that was less vulnerable to the impact of outliers, whereas the mean just gave a snapshot of the average value. Moreover, the variance unveiled the extent of dispersion among data points, shedding light on the dataset’s variability, while skewness and kurtosis unveiled the asymmetry and shape of its distribution, respectively (ibid).
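The descriptive measures named above can be computed as in the following minimal sketch. It assumes population (biased) moment estimators, for which a normal distribution has a kurtosis of 3; the paper does not state which estimator variant was used, and the function name `describe` and the sample values are hypothetical.

```python
from statistics import fmean, median, pvariance
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Mean, median, variance, skewness, and kurtosis of a sample.

    Uses population moments (divisor n). Skewness is m3 / sigma^3 and
    kurtosis is m4 / sigma^4, so a normal distribution gives 0 and 3.
    """
    n = len(values)
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    std = var ** 0.5
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "mean": mean,
        "median": median(values),
        "variance": var,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / var ** 2,
    }


# Illustrative sample with one large value (not facility data).
stats = describe([1.0, 2.0, 3.0, 4.0, 10.0])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

In this sample the single large value pulls the mean (4) above the median (3) and produces a positive skewness, which illustrates why the median is the more robust measure of central tendency.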
Moreover, the utilization of the Pearson correlation coefficient facilitated a more profound investigation of the linear correlations between variables, revealing possible links that could have remained hidden in other circumstances. This made it easier to understand the interdependencies within the dataset more comprehensively and provided insightful knowledge about the dynamics controlling the generation of solar energy (ibid).
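For reference, the Pearson coefficient between two equally long series can be computed as in the sketch below. The function name `pearson_r` and the sample series are hypothetical and serve only to illustrate the calculation.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Illustrative values only: power rising linearly with irradiance.
irradiance = [200.0, 400.0, 600.0, 800.0, 1000.0]  # W/m^2
dc_power = [500.0, 1000.0, 1500.0, 2000.0, 2500.0]  # W
r = pearson_r(irradiance, dc_power)
print(f"r = {r:.3f}")
```

The coefficient captures only linear association, so a value near zero does not rule out a nonlinear dependence between two environmental variables.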
The use of normality tests, such as the Kolmogorov–Smirnov test, was essential in ensuring the reliability of the subsequent statistical analyses. By thoroughly examining the normality assumption of the dataset, these tests established a strong basis for interpreting the results, thereby strengthening the validity and dependability of the research findings. Furthermore, graphs of kernel density estimates provided an intuitive visual depiction of the probability density function of the dataset. These graphical depictions brought intricate distributional characteristics to light, providing deeper insights into the underlying patterns and structures inherent in the data (ibid).
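Both tools can be sketched with the standard library alone. The sketch below computes the Kolmogorov–Smirnov distance between a sample and a normal distribution fitted to it, and builds a Gaussian kernel density estimate. It is an illustration under assumptions: the function names, the fixed bandwidth, and the samples are hypothetical. Note also that when the mean and standard deviation are estimated from the same sample, the classical Kolmogorov–Smirnov critical values are no longer exact (the Lilliefors correction addresses this), so only the statistic is computed here.

```python
from math import erf, exp, pi, sqrt
from statistics import fmean, stdev
from typing import Callable, Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def ks_statistic_normal(values: Sequence[float]) -> float:
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    mu, sigma = fmean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from i/n to (i + 1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def gaussian_kde(values: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a density function built from Gaussian kernels."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * sqrt(2.0 * pi))

    def density(x: float) -> float:
        return norm * sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


# Illustrative samples only (not facility data).
symmetric = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
skewed = [1.0] * 9 + [50.0]
d_symmetric = ks_statistic_normal(symmetric)
d_skewed = ks_statistic_normal(skewed)
print(f"D symmetric = {d_symmetric:.3f}, D skewed = {d_skewed:.3f}")

kde = gaussian_kde([1.0, 2.0, 3.0], bandwidth=0.5)
print(f"estimated density at 2.0: {kde(2.0):.3f}")
```

The strongly skewed sample yields a much larger distance than the symmetric one, which is the kind of departure from normality that a density plot also makes visible.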
These analytical methodologies were employed in a systematic manner to address the objectives of the study comprehensively. By meticulously applying these advanced statistical techniques, this research endeavors to offer profound insights into the intricate dynamics and patterns governing solar energy production. Ultimately, these insights are poised to inform enlightened decision making and foster sustainable practices within the renewable energy sector, thereby contributing to the collective pursuit of a greener and more sustainable future.
3.6. Software Selection
In the analysis of this study, a comprehensive suite of packages for the R software (version 4.3.3) was employed to ensure the accuracy and reliability of the results concerning the environmental factors. These packages included ‘ggplot2’, which facilitated data visualization for clear and insightful representations of complex datasets, and ‘quantreg’, which enabled a robust quantile regression analysis and provided a nuanced understanding of variable relationships.
In addition, the diagnostic tests and other robust techniques were conducted using ‘lmtest’ and ‘robustbase’, respectively, to ensure the validity and robustness of statistical inferences. The ‘MASS’ package contributed a wide array of statistical methods, enhancing the depth and breadth of the analyses conducted. ‘CAR’ aided in spatial data analysis, capturing geographical nuances in environmental factors. ‘boot’ facilitated bootstrapping procedures to assess the stability and reliability of results through resampling techniques.
Finally, ‘forecast’ empowered the generation of accurate predictions and projections based on environmental factor analyses, offering valuable insights into future trends and patterns. Through the strategic utilization of these R software packages, this study aimed to deliver rigorous and accurate results, advancing the understanding of environmental factors and their implications.
3.7. Finding the Results
The results are used to compare the environmental parameters of the soiled and non-soiled panels. A careful study of the results and the statistical measures is necessary in order to ascertain the characteristics of the environmental parameters.
3.8. Reporting the Conclusion
This step reports the materials, results, conclusions, limitations, and future directions.
4. Results and Discussion
The analyses in this paper are based on a large dataset containing 36,851 observations of each parameter (factor) of the study. This section presents the outcomes of our analysis and highlights several pivotal aspects revealed by this research. In short, the results of the location measures, variability measures, symmetry measures, relationship measures, and normality measures for the data of the PV system of Shams Solar Facility are presented and discussed in this section.
The ensuing discussion delves into these findings, shedding light on their implications for the environmental parameters that affect PV systems and on their contribution to the existing body of knowledge on this issue.
4.1. The Location Measures
The dataset was recorded automatically by a number of sensors: irradiance and solar panel temperature sensors; a weather station capable of measuring weather data such as the humidity, temperature, and air speed; and sensors measuring the current, voltage, and power. The parameters related to solar energy generation include the DC current (ampere), DC power (watt), DC voltage (volt), irradiance (watts per square meter), solar panel temperature (Celsius), air pressure (hectopascal), humidity (grams per cubic meter of air), ambient temperature (Celsius), wind speed (meters per second), and wind direction (degrees). Calculating the mean, maximum, and minimum values for each parameter provides valuable insights into their location, distribution, and range within the dataset. These statistics reveal the typical values as well as the extremes encountered, which helps us understand the performance of each factor and its context when developing future plans, i.e., they aid in understanding the behavior of the solar energy system under different environmental conditions (ibid).
Analyzing the mean, maximum, and minimum values of the key parameters reveals significant trends. DC_Power_String1 and DC_Power_String2 exhibit high mean values of 1420.373 W and 1368.112 W, respectively, with maximum values exceeding 2800 W, indicating substantial power generation. Si_South_Irradiance and its variants display considerable mean values of around 570–575 W/m², with maximum values exceeding 1100 W/m², suggesting strong irradiance levels.
Table 1 presents the parameters and their respective mean, maximum, and minimum values.
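As an illustration of the location measures summarized in Table 1, the following is a minimal sketch in Python using only the standard library (the study itself used R). The function name `location_measures` and the sample readings are hypothetical and are not taken from the Shams Solar Facility dataset.

```python
from statistics import fmean


def location_measures(values):
    """Return the mean, maximum, and minimum of a sequence of readings."""
    data = [float(v) for v in values]
    if not data:
        raise ValueError("at least one observation is required")
    return {"mean": fmean(data), "max": max(data), "min": min(data)}


# Hypothetical DC power readings in watts (illustrative values only).
sample_dc_power = [0.0, 512.4, 1380.9, 2790.3, 1422.1]
summary = location_measures(sample_dc_power)
print(summary)
```

In practice the same function would be applied to each of the recorded parameters in turn to build a table of this kind.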
4.2. The Variability Measures
Moreover, the standard deviation and the standard error of the mean for each parameter offer insight into the variability, or dispersion, of the values around the mean. The standard deviation quantifies the extent to which data points deviate from the average, providing a measure of the parameter’s consistency or volatility within the dataset, while the standard error of the mean indicates how precisely that average is estimated. By analyzing these two measures, we can discern the degree of variability inherent in each parameter, which is crucial for understanding the stability and reliability of the solar energy system under different environmental conditions (ibid).
Table 2 presents the parameters with their respective standard deviation and standard error.
This analysis of the standard deviations and standard errors of the mean for several important metrics sheds light on how our solar energy system behaves. The investigation shows significant differences in the DC current, power, and voltage, which display varying degrees of consistency and volatility. Furthermore, variations in external factors like the temperature, humidity, and wind speed highlight how the system responds to shifting circumstances. These results underline the challenges associated with producing solar energy and the need for customized approaches for maximizing the system’s performance.
Firstly, the standard deviation (SD) serves as a metric to gauge the dispersion or spread of the dataset. Notably, parameters such as DC_Power_String1 and DC_Power_String2 exhibit high SD values, indicating significant variability in the power output of each string of the solar system. This variability may stem from factors such as fluctuations in sunlight intensity, shading effects, or temperature differentials impacting panel efficiency.
Furthermore, parameters related to solar irradiance, including Si_South_Irradiance, IR_S01_RM_Trina_330W_13_Irradiance, IR_S01_LM_Trina330W14_Irradiance, and SMP11_BM_1_51_Irradiance, also demonstrate elevated SD values. This suggests fluctuations in the sunlight intensity over time, likely influenced by atmospheric conditions such as cloud cover or changes in the position of the sun throughout the day.
Turning to the standard error of the mean (SE), this metric estimates the precision of the sample mean and indicates how much sample means drawn from the same population would vary. Notably, parameters such as DC_CurrentString1 exhibit a high SE relative to their SD, suggesting a smaller sample size relative to the observed variability. Conversely, DC_CurrentString2 demonstrates a very low SE, indicating a large sample size relative to the observed variability, thus implying a higher level of precision in estimating the population mean.
Additionally, parameters like Wind_direction showcase a relatively high SE compared to their SD. This may be attributed to the cyclic and directional nature of wind data, indicating that the estimated mean wind direction carries a higher degree of uncertainty due to the inherent variability in wind patterns.
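The variability measures discussed above can be sketched as follows. This is a minimal Python illustration that assumes the usual definitions (sample SD with an n − 1 denominator, and SE = SD/√n); the paper does not state which estimator its software used, and the function name and sample readings are hypothetical. Under this definition the ratio SE/SD depends only on the number of valid observations, so for a parameter with all 36,851 observations it would be about 0.0052.

```python
import math
from statistics import stdev


def variability_measures(values):
    """Return the sample size, sample SD, and standard error of the mean."""
    data = [float(v) for v in values]
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    sd = stdev(data)          # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)    # standard error of the mean (assumed definition)
    return {"n": n, "sd": sd, "se": se}


# With SE = SD / sqrt(n), the ratio SE/SD is fixed by n alone.
N_OBS = 36851                            # observations per parameter, as stated in the paper
ratio_full_sample = 1 / math.sqrt(N_OBS)

# Hypothetical readings (illustrative values only).
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
result = variability_measures(readings)
print(result, ratio_full_sample)
```

A parameter whose SE/SD ratio differs noticeably from this value would, under these assumptions, point to a different number of valid readings for that sensor rather than to a property of the readings themselves.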
4.3. The Symmetrical Measures
The measures of skewness and kurtosis provide valuable insights into the distributional characteristics of the variables. Skewness measures the asymmetry of the distribution, where a value closer to zero indicates symmetry, negative values indicate left skew (tail extends towards the left), and positive values indicate right skew (tail extends towards the right). Kurtosis measures the heaviness of the tails of the distribution, where higher positive values indicate heavier tails and lower values indicate lighter tails compared to a normal distribution (ibid).
The skewness and kurtosis analyses reveal distinct distributional characteristics across the variables. For instance, DC_CurrentString1 displays a slightly left-skewed distribution (−0.301) with relatively lighter tails (−1.239) compared to a normal distribution. Similarly, DC_CurrentString2 exhibits a similar left-skewed distribution (−0.245) with comparable lighter tails (−1.230). Both DC_Power_String1 and DC_Power_String2 demonstrate slightly left-skewed distributions (−0.381 and −0.329, respectively) with relatively lighter tails (−1.185 and −1.187, respectively).
In addition, both DC Voltage String 1 and String 2 exhibit notably high absolute values of both skewness and kurtosis. The highly negative skewness values (−8.682 for String 1 and −8.283 for String 2) indicate a pronounced left skew, i.e., a long tail on the left side of the distribution formed by a relatively small number of extreme low-voltage readings. Moreover, the high kurtosis values (110.110 for String 1 and 101.780 for String 2) indicate heavy-tailed distributions with a higher probability of extreme deviations from the mean compared to a normal distribution. This suggests that voltage readings in these strings are prone to outliers or extreme values, potentially indicating underlying issues such as equipment malfunction, measurement errors, or system irregularities.
Table 3 presents the parameters with their respective skewness and Kurtosis values.
In contrast, DC_Voltage_String1 and DC_Voltage_String2 showcase highly left-skewed distributions with extremely heavy tails (skewness: −8.682 and −8.283; kurtosis: 110.110 and 101.780, respectively). Si_South_Irradiance and Si_South_Temperature_1 present slightly left-skewed distributions with lighter tails, as indicated by their skewness and kurtosis values. Both Air_pressure_relative_1 and Air_pressure_absolute demonstrate near-zero skewness and negative kurtosis, suggesting approximately symmetric distributions with lighter tails. Humidity_relative_1 and Humidity_absolute_1 are also nearly symmetric with lighter tails: the former is very slightly right-skewed and the latter very slightly left-skewed (skewness: 0.075 and −0.085; kurtosis: −0.883 and −1.055, respectively).
Also, Temperature features a slightly left-skewed distribution (−0.084) with lighter tails (−0.233), while wind_speed exhibits a highly right-skewed distribution (skewness: 2.536) with heavy tails (kurtosis: 17.673). In the context of wind speed, high kurtosis suggests that the distribution of wind speeds is more peaked and has fatter tails than a normal distribution. This could imply that extreme wind speeds are more likely to occur, which may have implications for structural integrity, energy production (e.g., wind turbines), and safety considerations in affected areas.
Wind_direction indicates a slightly right-skewed distribution (skewness: 0.155) with relatively lighter tails (kurtosis: −1.580). Lastly, IR_S01_RM_Trina_330W_13_Irradiance, IR_S01_LM_Trina330W14_Irradiance, and SMP11_BM_1_51_Irradiance present similar slightly left-skewed distributions with lighter tails compared to a normal distribution.
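The skewness and kurtosis values discussed in this subsection can be illustrated with a short Python sketch. It assumes the simple moment-based estimators and reports excess kurtosis (zero for a normal distribution), which matches the negative kurtosis values quoted above. Statistical packages often apply small-sample bias corrections, so the exact estimator used in the study may differ slightly. The function name and sample data are hypothetical.

```python
from statistics import fmean


def shape_measures(values):
    """Moment-based skewness and excess kurtosis of a sample."""
    data = [float(v) for v in values]
    if len(data) < 2:
        raise ValueError("at least two observations are required")
    mean = fmean(data)
    # Central moments of order 2, 3, and 4 (population form, divisor n).
    m2 = fmean([(x - mean) ** 2 for x in data])
    m3 = fmean([(x - mean) ** 3 for x in data])
    m4 = fmean([(x - mean) ** 4 for x in data])
    if m2 == 0:
        raise ValueError("all observations are identical")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0   # 0 for a normal distribution
    return {"skewness": skewness, "excess_kurtosis": excess_kurtosis}


# Hypothetical samples (illustrative values only).
symmetric = shape_measures([1, 2, 3, 4, 5])       # symmetric, light tails
right_skewed = shape_measures([1, 1, 1, 1, 10])   # one large value on the right
print(symmetric, right_skewed)
```

A negative excess kurtosis, as obtained for the symmetric sample, corresponds to the "lighter tails" wording used in the text; a positive skewness, as for the second sample, corresponds to a tail extending to the right.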
4.4. The Relationship Measure
Examining the relationship between DC_CurrentString1, representing the clean solar panel, and the other parameters reveals several noteworthy correlations. Significant or highly significant relationships between parameters/factors that a model treats as independent random variables will negatively affect the modeling of the PV System’s performance and lead to incorrect estimation results. Thus, care should be taken in studying the relationships between the parameters (ibid). The results of this section (correlation coefficients) are given in Table A1 in Appendix A.
DC_CurrentString1 exhibits strong positive correlations with DC_CurrentString2 (0.972), DC_Power_String1 (0.998), and DC_Power_String2 (0.976), indicating significant relationships with these parameters. Moreover, it demonstrates weak negative correlations with DC_Voltage_String1 (−0.121) and DC_Voltage_String2 (−0.083), suggesting a slight inverse relationship. This implies that as the DC current increases in the clean panel, there is a corresponding increase in the DC current and power in both clean and dusty panels, while a slight decrease in the DC voltage is observed.
DC_CurrentString2, representing the dusty solar panel, displays similar patterns to DC_CurrentString1, with strong positive correlations with DC_Power_String1 (0.969) and DC_Power_String2 (0.998). Additionally, it shows a strong positive correlation with Si_South_Irradiance (0.938), implying a significant connection between the solar irradiance and the DC current in the dusty panel. However, it also exhibits a weak negative correlation with DC_Voltage_String1 (−0.127), suggesting a slight inverse relationship between the current in the dusty panel and the voltage in the clean panel.
DC_Power_String1, representing the clean solar panel, demonstrates a very high positive correlation with DC_Power_String2 (0.976) and strong positive correlations with both DC_CurrentString1 (0.998) and DC_CurrentString2 (0.969). This suggests that an increase in the power generated in the clean panel is accompanied by a corresponding increase in the power in the dusty panel, as well as in the DC current in both panels. Moreover, there is a strong positive correlation between the DC power in the clean panel and Si_South_Irradiance (0.956), indicating the influence of solar irradiance on power generation.
DC_Power_String2, representing the dusty solar panel, shows similar correlations to DC_Power_String1, with very high positive correlations with DC_Power_String1 (0.976) and DC_CurrentString2 (0.998). Moreover, it exhibits a strong positive correlation with Si_South_Irradiance (0.937), suggesting a significant relationship between solar irradiance and power generation in the dusty panel. However, it demonstrates a very weak positive correlation with both air pressure parameters, implying minimal influence.
DC_Voltage_String1, representing the clean solar panel, demonstrates a very high positive correlation with DC_Voltage_String2 (0.982), but shows weak negative correlations with Si_South_Irradiance (−0.057) and Si_South_Temperature_1 (−0.107). This suggests that as the voltage increases in the clean panel, there is a corresponding increase in the voltage in the dusty panel, while higher voltage is only slightly associated with lower solar irradiance and temperature.
DC_Voltage_String2, representing the dusty solar panel, shows similar patterns to DC_Voltage_String1, with a very high positive correlation with DC_Voltage_String1 (0.982). Additionally, it exhibits weak negative correlations with Si_South_Irradiance (−0.028) and Si_South_Temperature_1 (−0.086), suggesting a slight inverse relationship between the voltage in the dusty panel and both solar irradiance and temperature.
Furthermore, Si_South_Irradiance, a measure of solar irradiance, displays strong positive correlations with both DC_Power_String1 (0.956) and DC_Power_String2 (0.937), indicating a significant influence of solar irradiance on power generation in both clean and dusty panels. Additionally, it exhibits a very high positive correlation with Si_South_Temperature_1 (0.922), suggesting a strong relationship between solar irradiance and temperature. This implies that as the solar irradiance increases, there is a corresponding increase in temperature in the south-facing direction.
Si_South_Temperature_1 demonstrates strong positive correlations with both DC_Power_String1 (0.849) and DC_Power_String2 (0.846), indicating a significant relationship between the temperature and power generation in both clean and dusty panels. Its reported correlations with the humidity-related parameters differ markedly: moderate and positive with Humidity_relative_1 (0.659) but close to zero with Humidity_absolute_1 (0.006). This suggests a connection between the temperature and the relative humidity, whereas the absolute humidity shows essentially no linear association with it.
The air pressure parameters, Air_pressure_relative_1 and Air_pressure_absolute, demonstrate weak positive correlations with most other parameters, suggesting minimal influence on the overall system. However, they exhibit moderate negative correlations with humidity-related parameters, indicating a slight inverse relationship between air pressure and humidity levels.
Humidity_relative_1 and Humidity_absolute_1 exhibit weak negative correlations with Si_South_Irradiance (−0.295 and −0.096, respectively), implying a slight inverse relationship between humidity levels and solar irradiance. Moreover, they demonstrate negative correlations with _Temperature (−0.595 and −0.142, respectively), moderate for the relative humidity and weak for the absolute humidity, suggesting an inverse relationship between humidity levels and temperature.
The Solar Panel Temperature shows a moderate positive correlation with Si_South_Irradiance (0.407) and a very strong positive correlation with Si_South_Temperature_1 (0.908), indicating a close association between the panel temperature, solar irradiance, and the local temperature. In contrast, its correlations with the wind-related parameters are weak or negligible: Wind_speed (0.132) and Wind_direction (−0.008).
Wind_speed demonstrates a weak positive correlation with Si_South_Irradiance (0.115), indicating a slight association between wind speed and solar irradiance. Its correlations with the temperature-related parameters, Si_South_Temperature_1 (−0.013) and _Temperature (0.012), are close to zero, suggesting no meaningful linear relationship between wind speed and temperature.
Wind_direction displays very weak negative correlations with Si_South_Irradiance (−0.017) and Si_South_Temperature_1 (−0.016), implying essentially no linear relationship between the wind direction and either solar irradiance or temperature. Additionally, it exhibits a weak-to-moderate positive correlation with _Temperature (0.362), suggesting some association between the wind direction and temperature.
The Solar Panel Temperature (the average of the readings from the many sensors that measure the solar panels) has an inverse relationship with the voltage: a higher temperature means a lower voltage, which in turn means less power.
Lastly, the irradiance parameters, IR_S01_RM_Trina_330W_13_Irradiance, IR_S01_LM_Trina330W14_Irradiance, and SMP11_BM_1_51_Irradiance, demonstrate very high positive correlations with each other (0.967 to 0.985), indicating consistent irradiance measurements across different sensors. This implies that variations in irradiance are consistent across different locations within the system.
Appendix A presents the relationship measure table.
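To make the relationship measure concrete, the following minimal Python sketch computes a Pearson correlation coefficient and attaches a verbal label to it. It is assumed here that the reported coefficients are Pearson coefficients. The cut-offs in `strength_label` are an illustrative convention introduced for this sketch, not thresholds stated in the paper, and the sample series are hypothetical.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def strength_label(r):
    """Verbal label for |r|; the cut-offs are an assumed convention."""
    a = abs(r)
    if a >= 0.9:
        return "very strong"
    if a >= 0.7:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "negligible"


# Hypothetical current and power series (illustrative values only).
current = [1.0, 2.0, 3.0, 4.0]
power = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(current, power)
print(r, strength_label(r))
```

Applying one fixed convention of this kind to every coefficient in Table A1 keeps the wording consistent, for example labeling 0.998 as very strong, −0.595 as moderate, and 0.006 as negligible.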
4.5. Normality Testing
Modeling the data of any phenomenon, including the PV System’s data, means fitting a mathematical/statistical model to the phenomenon’s data. Each model usually includes a number of parameters/factors, and the majority of such models assume that their variables are normally distributed. This means that those models, and any subsequent steps such as estimation and forecasting, depend entirely on this assumption (ibid).
Even when the sample sizes of historical or current data are large, this condition is rarely met because of climate changes. Therefore, the PV System’s data must first be examined by applying a test of normality, such as the Kolmogorov–Smirnov test, in order to determine whether the parameters of the PV System’s data can be assumed to follow a normal distribution.
In this section, we conducted a series of Kolmogorov–Smirnov tests to assess the normality of various parameters within the dataset. It may be worth mentioning that the Kolmogorov–Smirnov test is a non-parametric test that compares the Cumulative Distribution Function (CDF) of the data to the CDF of a theoretical normal distribution.
Table 4 presents the Kolmogorov–Smirnov tests of normality.
Across all parameters examined, the Kolmogorov–Smirnov test statistics yielded statistically significant results (Sig. < 0.05), indicating that the null hypothesis of normality is rejected for each variable. This suggests that the distribution of these parameters significantly deviates from a normal distribution. Specifically, parameters such as DC Voltage String 1 and String 2, Si South Irradiance, Temperature, and Wind Direction exhibited higher Kolmogorov–Smirnov test statistics, indicating relatively larger deviations from normality compared to other parameters. These findings are consistent with the nature of environmental and electrical data, which often display non-normal distributions due to complex underlying processes and factors.
It is noteworthy that the significance (Sig.) of the Kolmogorov–Smirnov test statistic is reported as 0.000 for all parameters, i.e., the p-values are below 0.001. This further reinforces the rejection of the null hypothesis of normality for each parameter.
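The comparison of the empirical CDF with a normal CDF described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the study: the normal distribution is fitted with the sample mean and SD, and the p-value comes from the asymptotic Kolmogorov distribution, which is only approximate when the parameters are estimated from the same data (a Lilliefors-type correction would normally be applied). The function names and sample data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and SD sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_normal(values):
    """One-sample Kolmogorov-Smirnov statistic D and approximate p-value."""
    data = sorted(float(v) for v in values)
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    mu, sigma = fmean(data), stdev(data)
    if sigma == 0:
        raise ValueError("all observations are identical")
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = normal_cdf(x, mu, sigma)
        # Largest gap between the empirical CDF (a step function) and f.
        d = max(d, i / n - f, f - (i - 1) / n)
    lam = math.sqrt(n) * d
    if lam < 0.3:
        return d, 1.0  # the tail probability is effectively 1 in this range
    # Asymptotic Kolmogorov tail probability (alternating series).
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical samples: two separated clusters, and evenly spaced normal quantiles.
bimodal = [0.01 * i for i in range(200)] + [10 + 0.01 * i for i in range(200)]
near_normal = [NormalDist().inv_cdf((i - 0.5) / 200) for i in range(1, 201)]
d_bi, p_bi = ks_normal(bimodal)
d_nn, p_nn = ks_normal(near_normal)
print(d_bi, p_bi, d_nn, p_nn)
```

With tens of thousands of observations, even modest departures from normality give a large value of √n·D, which is consistent with significance values reported as 0.000.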
4.6. Kernel Density Estimation
Many important measures and tests have been conducted in this paper. In order to complete the study of the features and characteristics of the solar power plant factors, and to confirm the results of the Kolmogorov–Smirnov test, the probability distribution of each factor is estimated and plotted by applying the method of Kernel Density Estimation (KDE) (ibid).
The KDE method has been applied to a number of factors in this paper, some of which are given below. We can observe that the density curves in Figure 2 have either one, two, or three distinct peaks, and that their shapes do not correspond to any specific or known distribution. In addition, the density curves of all the other parameters are similar to those in Figure 2; in short, none of the estimated densities follows a normal distribution.
Moreover, this result is very important, as we conclude from it that the majority of the mathematical and statistical models that are widely used in this field, including those in many of the papers cited here, cannot be applied to our data, and most probably not to any data of similar factors. Therefore, it is necessary to search for other models or to develop data transformations, which is not an easy task.
In this section, the kernel density plots of all the parameters are presented in 20 figures in Appendix B.
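The density estimates discussed in this section can be illustrated with a minimal Gaussian KDE in Python. The kernel and bandwidth used in the study are not stated, so this sketch assumes a Gaussian kernel with the normal-reference rule-of-thumb bandwidth 1.06·SD·n^(−1/5); the helper `count_peaks` and the sample data are hypothetical and only show how distinct peaks could be counted on a grid.

```python
import math
from statistics import stdev


def gaussian_kde(values, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    data = [float(v) for v in values]
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption of this sketch).
        bandwidth = 1.06 * stdev(data) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in data)

    return density


def count_peaks(density, lo, hi, steps=400):
    """Count strict local maxima of the density on an evenly spaced grid."""
    ys = [density(lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


# Hypothetical sample with two clusters (illustrative values only).
cluster = [0.05 * i for i in range(-20, 21)]
bimodal = cluster + [10 + c for c in cluster]
kde = gaussian_kde(bimodal)
peaks = count_peaks(kde, -5.0, 15.0)
print(peaks)
```

The number of peaks found depends on the bandwidth: a larger bandwidth smooths neighbouring peaks together, so the one, two, or three peaks seen in a plot should be read with the smoothing choice in mind.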
5. Conclusions
PV systems are of great importance for generating electric power, and so is the research concerned with modeling their energy production and the factors affecting it. However, there is a large gap in the majority of the research published in the field: the features and characteristics of the data of the parameters (environmental factors) of PV systems are rarely studied, and overlooking them affects the results of modeling. To address this gap, several explanatory techniques have been developed in this paper.
Specifically, we proposed a practical and scientific framework based on a number of mathematical standards and tests in order to fill the gap or deficiency in how to study the characteristics and features of environmental variables data. The proposed framework was applied to the data collected from Shams Solar Facility (GUtech, Oman) and the evidence/results showed that the features or the characteristics of the data of the PV systems were completely different from what was expected.
The results of the framework proposed in this paper will help researchers in PV systems to recognize the necessity of scrutinizing the specifications and characteristics of the parameters/variables that affect the production of electricity from PV systems, and will help them avoid wasting effort on modeling data with inappropriate mathematical models.
Based on the correlation analysis provided, it is evident that there are strong positive correlations between the DC current and DC power within each string, with correlation coefficients close to 1 (0.998 for both string 1 and string 2), and between the current of one string and the power of the other (0.976 and 0.969). This suggests that as the DC current increases, the DC power output also increases proportionally, indicating a direct relationship between the two variables. This correlation is consistent regardless of whether the PV panel is clean or dirty, as indicated by the high correlation coefficients observed for both configurations.
However, when examining the relationship between the DC current and DC voltage, it is notable that there is a weak negative correlation between them, regardless of the panel’s cleanliness. This implies an inverse connection, which means that the DC voltage tends to drop slightly as the DC current increases. This inverse relationship may be explained by system impedance or resistance, which affect the voltage drop that occurs as current passes through the circuit. The reduction in performance due to soiling is evident in both the current and power measurements, with the clean panel consistently demonstrating a superior performance.
Air pressure, however, does not significantly correlate with any of the other variables, suggesting that variations in air pressure have no appreciable effect on the recorded voltage, power, or DC current. Furthermore, as there is no discernible link between the wind direction and any of the other variables, the wind direction does not appear to have a major impact on the correlations that have been detected. All things considered, these results offer insightful information on the connections between the various elements of the PV system.
Additionally, by examining the correlations involving other environmental factors such as Si_South_Irradiance, Si_South_Temperature_1, and _Temperature, we observe moderate-to-strong positive correlations between these factors and the DC current and DC power, whereas their correlations with the DC voltage are weak and negative. This indicates that as irradiance and temperature increase, the current and power output of the PV panels also tend to increase. However, it is important to note that these correlations may vary based on the specific characteristics of the PV system and the environmental conditions.
The above results show the first violation of the properties/characteristics assumed by the models fitted to the data of the parameters (environmental factors) of PV systems, because the majority of the models in use assume that there is no correlation or significant relationship between these parameters.
It is striking that the results of the skewness measure, kurtosis measure, Kolmogorov–Smirnov test, and Kernel Density Estimation of Section 3 agree with one another and show that the data of the PV system of the Shams Solar Facility, even with such a large dataset (36,851 observations of each parameter/factor of the study), do not follow the normal distribution.
In addition, we believe that the above results hold generally for the data of various PV systems, and not only for the Shams Solar Facility of GUtech. Moreover, the comprehensibility and complexity of the methods and results of this paper may be the main reasons why the properties/characteristics of the data of PV systems have not been studied in the various published papers on PV systems.
Finally, the features of the parameters (environmental factors) of PV systems when the solar panels are soiled are similar to those when the solar panels are clean; i.e., we have not observed any impact of soiling on the features of the data.
Thus, a high level of care should be taken when modeling any data of PV systems.