Article

AI-Enhanced Photovoltaic Power Prediction Under Cross-Continental Dust Events and Air Composition Variability in the Mediterranean Region

by
Pavlos Nikolaidis
Department of Electrical Engineering, Cyprus University of Technology, P.O. Box 50329, Limassol 3603, Cyprus
Energies 2025, 18(14), 3731; https://doi.org/10.3390/en18143731
Submission received: 6 June 2025 / Revised: 10 July 2025 / Accepted: 12 July 2025 / Published: 15 July 2025

Abstract

Accurate short-term forecasting of photovoltaic power generation is vital for the operational stability of isolated energy systems, especially in regions with increasing renewable energy penetration. This study presents a novel AI-based forecasting framework applied to the island of Cyprus. Using machine learning methods, particularly regression trees, the proposed approach evaluates the impact of key environmental variables on PV performance, with an emphasis on atmospheric dust transport and air composition variability. A distinguishing feature of this work is the integration of cross-continental dust events and diverse atmospheric parameters into a structured forecasting model. A new clustering methodology is introduced to classify these inputs and analyze their correlation with PV output, enabling improved feature selection for model training. Importantly, all input parameters are sourced from publicly accessible, internet-based platforms, facilitating wide reproducibility and operational application. The obtained results demonstrate that incorporating dust deposition and air composition features significantly enhances forecasting accuracy, particularly during severe dust episodes. This research not only fills a notable gap in the PV forecasting literature but also provides a scalable model for other dust-prone regions transitioning to high levels of solar energy integration.

1. Introduction

The global shift toward sustainable energy systems has highlighted the critical need for accurate forecasting of renewable energy sources, particularly photovoltaic (PV) power. In systems with high solar penetration, forecast errors can deteriorate grid stability, increase operational costs, and limit the integration of renewable energy into national and regional energy markets. This challenge is even more pronounced in isolated power networks, where the absence of large-scale interconnections amplifies the importance of precise PV output prediction [1].
Cyprus, located in the Eastern Mediterranean, represents a pertinent test case for advanced solar forecasting solutions. As an island system with growing reliance on PV installations, Cyprus is actively working toward meeting its national energy target of 50% renewable electricity production by 2030. Ensuring the reliability of PV power forecasts in such a system is essential for balancing supply and demand, optimizing energy dispatch and supporting further PV expansion.
Traditional forecasting approaches mainly rely on core meteorological inputs, such as solar irradiance, ambient temperature, and cloud coverage. However, emerging evidence suggests that broader atmospheric conditions, including wind characteristics, relative humidity, and aerosol presence, can significantly influence PV efficiency. In particular, the Mediterranean region is frequently affected by intense dust intrusions originating from North Africa, resulting in reduced solar radiation due to scattering and absorption, as well as panel soiling from particulate deposition.
Despite the established physical impact of dust and complex air composition on solar energy generation, their integration into AI-based PV forecasting models remains largely unexplored. This study addresses this gap by developing an AI-enhanced framework that incorporates a comprehensive set of atmospheric parameters, including cross-continental dust events and detailed air composition features, into a regression tree-based predictive model. The proposed methodology also introduces a novel clustering approach to group and evaluate input parameters based on statistical correlation with PV power output, improving both model accuracy and interpretability.

1.1. Literature Survey

Accurate forecasting of PV power generation is essential for the integration of solar energy into modern electricity grids, especially in isolated or small-scale systems with limited storage or interconnection capacity. In recent years, artificial intelligence and machine learning (ML) methods have become widely used for solar power prediction due to their ability to model complex, nonlinear relationships between environmental inputs and PV output.
A variety of ML techniques, including artificial neural networks (ANNs), support vector machines (SVMs), random forests, and regression trees, have been applied to solar forecasting problems. Among these, tree-based methods such as decision trees, gradient boosting, and regression trees have gained popularity due to their interpretability, efficiency, and robustness against overfitting, especially when dealing with high-dimensional or noisy data [2,3]. The majority of existing PV forecasting models incorporate common meteorological variables, such as solar irradiance, ambient temperature, and cloud cover. These inputs primarily influence the amount of solar radiation reaching the PV modules and are typically derived from either ground-based sensors, satellite imagery, or numerical weather prediction (NWP) outputs.
Recent advancements in artificial intelligence and machine learning have significantly improved photovoltaic power forecasting accuracy. Traditional methods, including autoregressive models and NWP-based approaches, have been widely used for day-ahead or longer-term forecasts. However, their performance often suffers due to the complexity of modeling nonlinear dependencies between environmental inputs and PV output, especially in real-time applications. In contrast, data-driven approaches such as ANNs, SVMs, and ensemble methods like random forests and gradient boosting have demonstrated higher adaptability and improved performance across different forecast horizons [4,5,6]. More recently, deep learning models such as long short-term memory (LSTM) and hybrid architectures have gained traction for their ability to model temporal sequences and capture spatiotemporal dependencies in short-term and ultra-short-term forecasting scenarios [7,8].
Forecasting frameworks in the literature are typically divided into direct and indirect methods. Direct forecasting models predict PV output directly using historical PV and meteorological data, while indirect models forecast solar radiation or irradiance first and subsequently estimate PV power. Across both frameworks, a wide range of input features are employed, including solar irradiance, ambient and module temperature, wind speed and direction, relative humidity and cloud cover. In some cases, advanced pre-processing techniques such as principal component analysis (PCA), self-organizing maps (SOM), and feature clustering have been used to enhance parameter selection and reduce redundancy. Hybrid models combining decomposition techniques like empirical mode decomposition (EMD) with ML algorithms have also been shown to improve forecasting accuracy by isolating distinct signal components [9].
Despite the progress in model sophistication, the vast majority of forecasting studies still rely on a limited set of conventional meteorological inputs, with minimal incorporation of atmospheric composition parameters or soiling factors. However, the impact of dust deposition and particulate matter on PV performance has been well-documented in experimental and observational studies, showing reductions in output ranging from 10% to over 40% in dust-prone environments. Yet, such parameters remain underrepresented in AI-based forecasting frameworks. This omission is particularly critical for regions affected by frequent transboundary dust transport, such as the Mediterranean, where PV output can be substantially degraded during dust events [10].
Global reviews and meta-analyses have quantified the average energy yield losses from dust deposition at between 2% and 50%, depending on geographical location, environmental conditions, and system characteristics [11]. In extreme cases, such as in arid or semi-arid regions, losses of up to 80% have been reported for uncleaned PV systems exposed over extended periods [12,13]. A techno-economic analysis by Klemens et al. [14] emphasized that soiling is not just a technical but also a major economic issue, especially for large-scale solar farms.
Moreover, the nature of the dust (e.g., particle size and chemical composition), environmental dynamics (e.g., wind speed and humidity), and panel tilt angle play significant roles in determining the extent and variability of soiling losses [15]. For instance, Hussein et al. [11] found that fine particles like fly ash and soot tend to adhere more strongly to surfaces, resulting in more persistent power degradation compared to larger, sand-based particles. Likewise, Roman et al. [16] demonstrated how dust storms and volcanic ash could significantly alter the spectral transmittance of incoming solar radiation, disproportionately affecting PV output during such events.
Despite this extensive body of work, most AI-based PV forecasting models still neglect soiling-related variables, relying predominantly on conventional meteorological inputs such as irradiance, temperature, and cloud cover. As noted by Mani and Pillai [13], this omission presents a significant blind spot, especially in regions frequently affected by transboundary dust transport, such as the Middle East and Mediterranean basin.

1.2. Objective and Contribution

A notable limitation in the current literature is the lack of integration of atmospheric composition parameters that can directly affect PV efficiency beyond irradiance attenuation. This gap underlines the pressing need to integrate dust-related parameters, such as PM concentrations, aerosol optical depth, or deposition forecasts, into predictive frameworks. Incorporating such features into machine learning models not only reflects the true operational dynamics of PV systems but also enhances the forecasting reliability under real-world dusty conditions. For example, particulate matter (PM)—including PM10 and PM2.5—and aerosols can scatter and absorb solar radiation, reducing the effective irradiance reaching the PV panels. Additionally, dust deposition on panel surfaces during intense transport events can significantly reduce power output by creating a physical barrier to light absorption. These effects are especially relevant in regions prone to transboundary dust transport, such as the Mediterranean basin, yet they remain underrepresented in predictive modeling efforts.
While several studies highlight the seasonal and climatic influence of soiling on PV efficiency, most treat these factors as post-event observations rather than real-time forecasting variables. There is also limited research into the correlation structure among environmental inputs or the use of clustering methods to optimize input parameter grouping. These techniques could enhance model performance by improving feature relevance and reducing redundancy.
Another critical shortcoming is the reliance on costly or proprietary data sources. Many forecasting systems depend on commercial weather services or private satellite data, thus limiting scalability and reproducibility. Models that leverage open-access, internet-based data—while still delivering high forecast accuracy—are rare but increasingly important for supporting widespread PV adoption. Despite these findings, the integration of dust-related parameters into AI-based PV forecasting models remains limited. Most existing models do not account for the dynamic effects of dust accumulation and atmospheric composition variability, which can lead to significant forecasting errors, especially in regions prone to dust events.
The current work addresses the aforementioned research gap by integrating air composition variables—retrieved entirely from public datasets—into a regression tree-based forecasting framework, enhanced by a novel input clustering and feature correlation strategy. This approach contributes a replicable and interpretable model to the field of solar forecasting, offering significant potential for implementation in other dust-affected regions. It presents an AI-enhanced forecasting framework based on regression trees, integrating not only classical meteorological parameters but also dust transport indicators and air composition variability. It introduces (a) a novel clustering method to evaluate input parameter correlations; (b) a fully open-data-driven approach using internet-available sources; and (c) a real-world application in an isolated Mediterranean system (Cyprus), where the impact of transboundary dust is especially pronounced.
To investigate both spatial and seasonal relationships between varying irradiance levels and PV power output, the model integrates a diverse set of predictors derived from publicly accessible and reliable online platforms. These inputs enable the estimation of renewable energy generation while capturing the influence of key environmental variables. The forecasting framework simultaneously addresses the power outputs of both PV parks and residential systems, incorporating a broad spectrum of input features. Its performance is evaluated across multiple real-world case studies, with results showing variation due to dynamic weather patterns and short-term fluctuations. These changes are used to iteratively adjust model parameters—such as weights and biases—in support of nonlinear regression. The overarching goal is to deliver a robust day-ahead forecasting tool for PV output that relies entirely on accessible, realistic datasets. This model is designed to serve as a reference framework, adaptable to different geographical settings, temporal scales, and training methodologies. While advanced AI techniques like deep reinforcement learning [17] and hierarchical reinforcement learning have demonstrated effectiveness in complex decision-making contexts, their application in energy forecasting is often hindered by high computational demands and increased dimensionality [18]. The growing integration of weather-sensitive and renewable technologies further complicates the forecasting landscape. Therefore, to handle the inherent uncertainty associated with cross-continental dust intrusions and fluctuating atmospheric composition, this study adopts regression tree algorithms for their interpretability and efficiency.
In contemporary energy systems, each independent power producer is required to submit its expected generation schedule to the corresponding system operator one day in advance. Aggregators managing distributed PV and residential systems coordinate with distribution system operators (DSOs), while utility-scale PV plants are accountable to transmission system operators (TSOs) for reporting hourly production forecasts. A bidirectional exchange of information between DSOs and TSOs enables the calculation of residual load, facilitating optimal unit commitment and economic dispatch planning for conventional thermal units connected at either the transmission or distribution level. Concurrently, load forecasts are independently generated by DSOs, emphasizing the necessity for accurate predictions, particularly for demand-side actors such as prosumers, participants in demand–response programs, and those engaged in demand-side management. Within this framework, day-ahead PV power forecasting plays a pivotal role for TSOs, as it supports the consolidation of generation and demand profiles, accounting for both system losses and exogenous variables such as temperature, wind speed, and time of day. Additionally, the forecasting process must ensure the privacy of sensitive information across diverse stakeholders, including consumers, prosumers, renewable and conventional energy producers, aggregators, and grid operators. Consequently, the quality, selection, and management of historical data emerge as critical factors in enhancing forecasting reliability and operational efficiency. Table 1 summarizes the most recent research carried out in PV power forecasting.
The rest of this paper is organized as follows. Section 2 presents a focused analysis of (a) PV generation patterns and their temporal profiles, (b) meteorological variables relevant to irradiance availability and PV system efficiency, and (c) the influence of dust transport events and particulate matter on power output degradation. The clustering approach for data harvesting purposes is explained in Section 3, while the forecast model and the underlying mathematical framework are presented in Section 4. Finally, conclusions are drawn in Section 5.

2. Problem Formulation

A robust PV power forecasting model must account for a wide range of dynamic factors that influence photovoltaic generation. These include intrinsic system characteristics, such as the temporal behavior of PV generation itself; external atmospheric drivers, including temperature, humidity, and cloud cover; and highly influential phenomena such as dust transport and deposition. In this section, the forecasting problem is defined by analyzing these three core dimensions. The objective is to identify and characterize the input variables most strongly correlated with PV output fluctuations in the context of the Mediterranean region, where meteorological variability and transboundary dust events frequently disrupt solar-to-electricity performance.

2.1. Photovoltaic Generation

Photovoltaic systems convert solar radiation directly into electrical energy through semiconductor materials. The amount of electricity generated by a PV array depends primarily on the conversion efficiency of the modules (ηPV), the incoming global solar irradiance (GA), and the surrounding ambient temperature (Ta). To estimate the actual power output of a system, several standard reference conditions are used, typically provided by manufacturers. These include the reference solar irradiance (GSC, usually 1000 W/m²), standard temperature (TSC, typically 25 °C), and the temperature coefficient of the PV module (CT), which quantifies the efficiency drop per degree increase above TSC.
Under these conditions, the nominal output of a PV system (PSC) can be adjusted for actual environmental conditions using the following relationship:
$$P_{PV} = P_{SC} \cdot \eta_{PV} \cdot \frac{G_A}{G_{SC}} \cdot \left[ 1 + \left( T_a - T_{SC} \right) \cdot C_T \right] \tag{1}$$
This expression captures the reduction in performance associated with elevated ambient temperatures, which can reduce conversion efficiency by approximately 0.3–0.5% per degree Celsius increase, depending on the module’s temperature sensitivity [27,28].
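As a rough illustration, the relationship above can be read as P_PV = P_SC · η_PV · (G_A / G_SC) · [1 + (T_a − T_SC) · C_T], with C_T taken as negative so that output falls as temperature rises. The minimal Python sketch below evaluates it for the 7 kW rooftop system mentioned in this section. The function name, the default C_T of −0.4% per °C (chosen from within the 0.3–0.5% range quoted above), and η_PV = 1.0 are assumptions made for illustration only, not values reported by the study.

```python
def pv_power(p_sc, eta_pv, g_a, g_sc=1000.0, t_a=25.0, t_sc=25.0, c_t=-0.004):
    """Estimate PV output by scaling nominal power to actual conditions.

    p_sc   : nominal output at standard conditions (e.g., kW)
    eta_pv : conversion-efficiency factor (dimensionless)
    g_a    : actual global irradiance (W/m^2)
    g_sc   : reference irradiance (W/m^2), 1000 by convention
    t_a    : ambient temperature (deg C)
    t_sc   : standard temperature (deg C), 25 by convention
    c_t    : temperature coefficient per deg C (assumed value, negative)
    """
    irradiance_ratio = g_a / g_sc
    temperature_factor = 1.0 + (t_a - t_sc) * c_t
    return p_sc * eta_pv * irradiance_ratio * temperature_factor


# Illustrative inputs only: 7 kW system, 800 W/m^2, 35 deg C ambient.
estimate = pv_power(p_sc=7.0, eta_pv=1.0, g_a=800.0, t_a=35.0)
print(f"{estimate:.3f} kW")  # about 5.376 kW
```

With these inputs, the 10 °C excess over the standard temperature removes about 4% of the irradiance-scaled output, which is the temperature penalty the text describes.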
In Cyprus, the seasonal variation in solar azimuth and elevation angles governs the sun’s apparent path across the sky, resulting in higher solar elevation and longer daylight periods during summer months, and lower elevation with shorter days during winter, which directly influences the incident irradiance on PV surfaces throughout the year. Figure 1 depicts the sun path throughout a year in Cyprus.
The orientation and movement of PV panels relative to the sun also play a vital role in determining daily and seasonal energy yield. Tilted modules experience variable angles of solar incidence throughout the day and year. Dual-axis tracking systems, which adjust the tilt in both horizontal (β) and vertical (α) planes, can significantly enhance solar capture by maintaining optimal alignment with the sun’s position. While β changes throughout the day, ranging from 0° to 180°, α adjusts seasonally from 0° to 90°, enhancing performance during winter and summer extremes [30]. Based on extensive measurements on direct solar irradiation at a particular site in Cyprus (coastal area of Limassol), the average values for some critical parameters can be observed in Figure 2.
The specific location was selected as a representative case study due to its semi-arid Mediterranean climate, frequent cross-continental dust events, and rapidly expanding solar PV capacity, which pose unique challenges for accurate power forecasting. These conditions are shared by numerous regions across Southern Europe, North Africa, and the Middle East, making the proposed methodology directly transferable to other dust-prone, high-solar-resource environments. As such, the model developed and validated in this study offers a replicable solution for similar power systems aiming to integrate high shares of renewables while minimizing grid uncertainty and reserve dependency.
Solar irradiance itself is subject to substantial temporal variability, often influenced by atmospheric conditions such as cloud cover. To quantify the transparency of the atmosphere, a dimensionless clearness index is often employed. This index typically ranges from 0.25 (indicating overcast skies) to 0.75 (clear-sky conditions), reflecting the fraction of sunlight that reaches the Earth’s surface after atmospheric attenuation. In addition to clouds, airborne pollutants and dust particles further diminish irradiance by scattering and absorbing sunlight. Studies suggest that total PV energy losses due to both suspended particulate matter and deposited dust can reach up to 17–25%, highlighting the significance of air composition in PV performance assessment [31]. Considering a residential rooftop PV system of 7 kW installed capacity at the specified location, the hourly averaged and total photovoltaic power output is given in Figure 3 by month.
Temperature effects add another layer of complexity to forecasting, as high irradiance often coincides with increased temperatures, which in turn degrade module efficiency. These competing effects necessitate accurate temperature modeling to avoid overestimating generation during peak sunlight hours. Besides these primary drivers, other factors, such as inverter efficiency, shading, surface soiling, and panel degradation over time, also introduce variability. These physical and environmental uncertainties underscore the need for advanced modeling strategies that can incorporate real-time data and account for spatial and temporal variation in solar resource availability.

2.2. Meteorological Variables

Meteorological conditions play a critical role in photovoltaic energy conversion, as they govern both the availability of solar radiation and the operational efficiency of PV modules. In addition to solar irradiance itself, variables such as ambient temperature, wind speed and direction, relative humidity, and cloud index significantly affect the PV power output. These variables influence either the thermal balance of the PV modules or the transmittance of solar energy through the atmosphere, and their impacts vary across both daily (diurnal) and seasonal time scales.
PV-module efficiency decreases with increasing cell temperature, which is influenced by ambient temperature and radiative conditions, as per the modified form of the standard PV power output equation, Equation (1). The thermal loss becomes more significant during midday and in summer, when irradiance and ambient temperatures are highest. These losses exhibit a diurnal pattern, peaking in early afternoon, and a seasonal pattern with higher degradation observed in warmer months. Research confirms that every 1 °C rise above 25 °C can lead to an efficiency drop of 0.3–0.5%, depending on the panel material (e.g., silicon vs. thin-film technologies) [32].
Wind contributes to convective cooling of PV modules, thereby indirectly improving their efficiency by reducing cell temperature. The heat transfer from the panel to the surrounding air can be approximated by Newton’s law of cooling:
$$q = h \cdot A \cdot \left( T_{module} - T_a \right)$$
Here, the convective heat transfer coefficient, h, increases with wind speed, v, resulting in more efficient cooling and thus higher PV output. A common empirical equation used to estimate h for air in natural environments is h = 2.56 v + 8.55 [33]. Wind direction further affects this cooling mechanism, especially for fixed-tilt systems, where airflow can vary significantly depending on panel orientation relative to wind vectors. Empirical studies have shown that moderate wind speeds (2–5 m/s) can improve PV output by 1–3% under high irradiance conditions [34].
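To make the cooling effect concrete, the sketch below combines Newton's law of cooling, q = h · A · (T_module − T_a), with the empirical relation h = 2.56 v + 8.55 quoted above. The units (v in m/s, h in W/(m²·K)), the function names, and the example module area and temperatures are assumptions for illustration; the source does not state them.

```python
def convective_coefficient(wind_speed):
    """Empirical convective coefficient h = 2.56*v + 8.55.

    Assumed units: wind_speed in m/s, result in W/(m^2 K).
    """
    return 2.56 * wind_speed + 8.55


def convective_heat_loss(wind_speed, area, t_module, t_ambient):
    """Heat removed from the module by convection, q = h * A * (T_module - T_a)."""
    h = convective_coefficient(wind_speed)
    return h * area * (t_module - t_ambient)


# Illustrative inputs only: 1.6 m^2 module at 50 deg C, air at 30 deg C.
for v in (0.0, 3.0, 5.0):
    q = convective_heat_loss(v, area=1.6, t_module=50.0, t_ambient=30.0)
    print(f"v = {v:.1f} m/s -> q = {q:.1f} W")
```

Because h grows linearly with wind speed, the heat removed at a fixed temperature difference grows linearly too, which is why moderate wind helps keep cell temperature, and hence thermal losses, down.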
Relative humidity influences solar radiation transmittance through aerosol–water interactions, which lead to increased scattering and absorption. High RH is often associated with haze, fog, or high dew point temperatures—all of which reduce the optical clarity of the atmosphere. While RH does not directly affect module performance, it correlates with reduced global irradiance and increased likelihood of condensation or surface moisture, which can temporarily degrade PV efficiency. These effects are more pronounced in the early morning and late afternoon and seasonally, during transitional periods like spring and autumn [35].
Clouds significantly alter the incident solar radiation on PV systems by both attenuating direct irradiance and enhancing diffuse irradiance under certain conditions. The clearness index is defined as follows:
$$K_t = \frac{G_A}{G_0}$$
where G0 is the extraterrestrial solar irradiance. The index serves as a proxy for cloud coverage and is widely used in PV performance modeling. Values close to 1 indicate clear-sky conditions, while lower values denote cloudier skies. The impact of clouds follows a strong diurnal cycle, with rapid fluctuations possible within minutes, and varies seasonally depending on cloud climatology (e.g., frequent winter cloud cover in Mediterranean coastal areas). Accurate estimation of the cloud index enables more responsive and adaptive PV forecasting models, especially when derived from satellite imagery or sky cameras [36]. Figure 4 illustrates a meteorological representation of the critical variables discussed.
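The clearness index is simply the ratio of surface irradiance G_A to extraterrestrial irradiance G_0, so it is straightforward to compute as a model feature. The sketch below does that and attaches a coarse sky label. The label thresholds reuse the 0.25 (overcast) and 0.75 (clear-sky) figures quoted earlier in this section as cut-offs, which is an assumption made here for illustration, as are the function names and sample irradiance values.

```python
def clearness_index(g_a, g_0):
    """Clearness index K_t = G_A / G_0 (dimensionless)."""
    if g_0 <= 0:
        # Extraterrestrial irradiance is zero at night; K_t is undefined then.
        raise ValueError("g_0 must be positive")
    return g_a / g_0


def sky_label(k_t, overcast_max=0.25, clear_min=0.75):
    """Coarse sky condition from K_t (thresholds are illustrative assumptions)."""
    if k_t <= overcast_max:
        return "overcast"
    if k_t >= clear_min:
        return "clear"
    return "partly cloudy"


# Illustrative irradiance values in W/m^2.
k_t = clearness_index(g_a=650.0, g_0=1300.0)
print(k_t, sky_label(k_t))  # 0.5 partly cloudy
```

In a forecasting pipeline, the numeric index would normally be passed to the model directly; the label is only a convenience for inspecting or grouping days.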
Meteorological variables introduce both short-term variability and long-term trends in PV generation. Their influences are nonlinear and interdependent, reinforcing the need for their inclusion in machine learning-based forecasting frameworks. Ambient temperature and wind modify system efficiency, while humidity and cloud cover primarily modulate irradiance availability. Understanding their temporal behavior, both within the day and across seasons, is crucial to building robust, generalizable forecasting models.
Effectively managing the uncertainties tied to both seasonal and diurnal variability is essential for achieving accurate photovoltaic power forecasting. Solar irradiance and related meteorological factors follow distinct daily cycles and undergo noticeable seasonal fluctuations, which demand resilient and adaptive modeling strategies. Additionally, uncertainties arise from the availability and reliability of input data, including both historical records and real-time observations. These may be affected by measurement errors, missing values, or limitations in sensor technologies. The potential influence of climate change on solar radiation patterns is an emerging challenge that could affect the long-term consistency of forecast models. Furthermore, integrating PV power into the grid requires careful assessment of uncertainties stemming from transmission constraints and system-level dynamics. Finally, inherent aspects of the forecasting methodology—such as model architecture, forecast horizon, and update intervals—play a significant role in determining the overall predictive reliability of PV power forecasts.

2.3. Dust Impact

Cyprus is a pioneer in electricity production from photovoltaic technology. It continually offers incentives to consumers through a variety of support schemes and recognizes the importance of the green transition. Although an electricity storage market is entirely absent, prosumers are incentivized via net metering, net billing, virtual net metering, and virtual net billing schemes to keep expanding residential PV installations. Recently, larger PV systems have been planned for installation on the terraces of most schools on the island [37,38]. Figure 5 shows the distribution of these systems per size.
Accurate PV power forecasting is of great importance for the seamless integration of the ever-increasing PV systems into the islanded electrical grid in order to reduce the need for backup power sources. Dust in the atmosphere is a type of air pollution that consists of small particles of solid matter suspended in the air. These particles can come from a variety of sources and can vary in size, composition, and concentration. Although the incident radiation on a particle may exhibit reflection, refraction, diffraction, and absorption, the amount of its reduced energy is estimated as the sum of scattering and absorption [39]. Figure 6 shows the interaction of radiation with suspended matter. Atmospheric particulate matter, encompassing both suspended aerosols and deposited dust on PV modules, significantly influences solar energy conversion efficiency. The attenuation of solar radiation by PM occurs primarily through scattering and absorption processes, which alter the spectral composition and intensity of the incident irradiance.
The attenuation of solar radiation due to PM can be quantitatively described using the Beer–Lambert law, which models the exponential decrease in irradiance as it traverses an absorbing and scattering medium:
$$I = I_0 \cdot e^{-\tau}$$
where I represents the transmitted irradiance; I0 is the incident irradiance at the top of the atmosphere; and τ denotes the optical depth, a dimensionless parameter that encapsulates the cumulative effects of absorption and scattering by atmospheric constituents, including PM. In general, the vertical optical depth, τ(λ), from some height, z, and above can be written as follows:
$$\tau(\lambda, z) = \int_{z}^{\infty} \beta(\lambda, z') \, dz'$$
where the coefficient β(λ,z) is equal to the attenuation resulting from scattering and absorption. The total optical depth of the atmosphere, τ, is wavelength-dependent and can be expressed as the sum of individual contributions in the ultraviolet range:
$$\tau = \tau_{aerosol} + \tau_{Rayleigh} + \tau_{UMG} + \tau_{SO_2} + \tau_{H_2O} + \tau_{O_3}$$
In this context, SO2 from urban pollution, water vapor (H2O), and ozone (O3), together with the uniformly mixed gases (UMGs), mainly CO2 and O2, only absorb, whereas molecular O2 and N2 only scatter (Rayleigh or molecular scattering). τaerosol accounts for the attenuation due to aerosols, which both absorb and scatter, and is directly influenced by PM concentration levels. The aerosol optical depth (AOD) is a critical parameter in assessing the impact of PM on solar radiation and is often derived from ground-based measurements or satellite observations. The scattering and absorption of solar radiation by PM not only reduce the total irradiance reaching the PV modules but also modify its spectral distribution. This spectral shift can lead to suboptimal performance of PV systems, especially those sensitive to specific wavelength ranges. Studies have shown that PM concentrations, particularly PM10, significantly affect the spectral power distribution (SPD) of solar radiation, with notable reductions observed in the range from 380 to 540 nm [40]. Figure 7 depicts a representative example of radiance attenuation, comparing a normal day with a sudden dust storm.
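The attenuation chain described above can be sketched numerically: the component optical depths are summed into a total τ, and the Beer–Lambert law I = I_0 · e^(−τ) gives the transmitted irradiance. In the Python sketch below, every optical-depth value is a placeholder chosen only to contrast a low-aerosol day with a dust episode. None of the values are measurements from the study, and the function and dictionary names are hypothetical.

```python
import math


def total_optical_depth(components):
    """Sum the individual optical-depth contributions (all dimensionless)."""
    return sum(components.values())


def transmitted_irradiance(i_0, tau):
    """Beer-Lambert attenuation: I = I_0 * exp(-tau)."""
    return i_0 * math.exp(-tau)


# Placeholder optical depths for the non-aerosol terms (illustrative only).
background = {"rayleigh": 0.10, "umg": 0.01, "so2": 0.01, "h2o": 0.05, "o3": 0.03}

# Only the aerosol term differs between the two hypothetical days.
normal_day = dict(background, aerosol=0.10)
dust_event = dict(background, aerosol=1.00)

i_0 = 1000.0  # incident irradiance in W/m^2 (illustrative)
for name, parts in (("normal day", normal_day), ("dust event", dust_event)):
    tau = total_optical_depth(parts)
    print(f"{name}: tau = {tau:.2f}, I = {transmitted_irradiance(i_0, tau):.0f} W/m^2")
```

Because the dependence on τ is exponential, an increase in the aerosol term alone is enough to cut transmitted irradiance sharply, which is the motivation for treating AOD and PM concentrations as forecasting inputs.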
In addition to atmospheric effects, the deposition of dust particles on the surface of PV modules—commonly referred to as soiling—further exacerbates energy losses. Soiling leads to the formation of a dust layer that scatters and absorbs incoming light before it reaches the photovoltaic cells, thereby diminishing the effective irradiance and, consequently, the power output. The extent of soiling-induced losses depends on various factors, including dust composition; particle size distribution; deposition rate; and environmental conditions such as humidity and wind speed [42].
In regions like Cyprus, characterized by arid climates and frequent dust events, understanding and modeling the impact of PM on PV systems are crucial for accurate energy forecasting and efficient system operation. Incorporating real-time PM data and predictive models into PV performance assessments can enhance the reliability of solar energy generation and support the integration of renewable energy sources into the power grid. Deserts are one of the main sources of sand and dust. Cyprus is significantly affected by dust events, primarily due to its geographical position at the crossroads of Europe, Asia, and Africa. The most critical sources of particulate matter, especially PM10 and PM2.5, impacting Cyprus include the Sahara Desert, Middle-Eastern deserts, and North African deserts. Dust from the Sahara, particularly from regions in Libya and Egypt, is a major contributor to PM levels in Cyprus [43]. These dust particles are transported across the Mediterranean, often during spring and summer months, leading to elevated PM10 concentrations. Arid regions in the Middle East, including parts of Syria, Jordan, Iraq, and Saudi Arabia, also contribute to dust events in the island. These events are often associated with specific meteorological conditions that facilitate the long-range transport of dust particles. Desert areas in Chad, Algeria, and Libya constitute additional sources of dust. Depending on wind direction and weather systems, dust from these regions can reach the island, particularly during the spring months [44]. The main deserts are found in the “dust-belt” in the northern hemisphere, and a worldwide representation is offered in Figure 8.
The finest dust particles can be transported over long distances from these sources by the winds. To model the combined effects of atmospheric PM and soiling on PV performance, comprehensive frameworks that integrate aerosol transmittance functions and soiling ratio metrics must be developed. These models will facilitate the prediction of PV output under varying environmental conditions and inform maintenance strategies, such as optimal cleaning schedules, to mitigate the adverse effects of dust accumulation.

3. Methodology

The accuracy and reliability of PV power-forecasting models depend heavily on the quality and relevance of the input variables. In this section, the selection of features was grounded in physical and environmental factors known to influence PV system behavior. These inputs fall into three main categories: (a) weather-related parameters; (b) temporal factors capturing solar geometry; and (c) PV system output characteristics, including operational history and capacity scaling.
A feature clustering strategy is proposed to enhance the interpretability and efficiency of the forecasting task by grouping meteorological and temporal inputs with similar statistical behavior. This clustering stage supports more robust predictor handling by identifying patterns, reducing redundancy, and improving model generalization. Then, the data harvesting and processing phase is described, where key predictors are selected based on a dual-criterion approach using Pearson correlation coefficients and mutual information metrics. Both methods are applied to historical datasets to quantify the relevance and predictive strength of each input variable with respect to the PV power output, both on an annual and seasonal basis. The combined methodology ensures that only the most informative and non-redundant features are retained for training the forecasting model, contributing to the accuracy and stability of the final predictive results.

3.1. Feature Clustering

Meteorological and atmospheric variables are primary drivers of PV system performance. The PV-module temperature, closely related to ambient temperature and influenced by wind cooling, directly affects conversion efficiency—typically reducing output by 0.3–0.5% per degree Celsius above standard test conditions (25 °C). Based on this direction, ambient temperatures (Ta) are imported as a positive deviation from 25 °C for each hourly interval, t, so that we have the following:
$$\delta T_a(t) = \max\bigl(0,\; T_a(t) - 25\bigr) \tag{7}$$
Depending on the season, ambient temperature exhibits greater fluctuation during spring, summer, and autumn, and lower fluctuation in winter. These fluctuations can be observed with the aid of Figure 9.
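As a short worked example of the temperature deviation defined above, consider two illustrative hourly ambient temperatures, 32 °C and 18 °C (assumed values, not taken from the dataset). Applying the 0.3–0.5% per degree range quoted above to the ambient deviation is only indicative, since that coefficient strictly refers to the module temperature.

$$\delta T_a = \max(0,\; 32 - 25) = 7\ ^\circ\mathrm{C} \;\Rightarrow\; 7 \times (0.3\ \text{to}\ 0.5)\,\% \approx 2.1\ \text{to}\ 3.5\,\%\ \text{output reduction}$$

$$\delta T_a = \max(0,\; 18 - 25) = 0\ ^\circ\mathrm{C} \;\Rightarrow\; \text{no temperature penalty is passed to the model}$$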
A completely different behavior occurs with humidity, which may vary from minimum to maximum during the course of a day. Relative humidity modifies atmospheric transmittance by influencing light scattering and absorption due to moisture-laden air. Higher humidity is often correlated with cloud presence or haze, thereby reducing direct irradiance. Figure 10 shows the annual fluctuation of relative humidity in Cyprus per season.
Wind speed and direction influence the convective heat loss from the panel surface, with optimal airflow enhancing efficiency through cooling, particularly during periods of high solar irradiance. To investigate whether the wind direction, WD(t), can affect the overall PV performance, the cardinal direction, CD(t), was taken into consideration. To limit computational complexity, the directions ranging between 0° and 360° were normalized using Equation (8), with the compass points north (0°), east (90°), south (180°), and west (270°) as references.
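$$\widehat{CD}(t) = \begin{cases} 1, & 315^\circ < W_d \le 45^\circ \\ 2, & 45^\circ < W_d \le 135^\circ \\ 3, & 135^\circ < W_d \le 225^\circ \\ 4, & 225^\circ < W_d \le 315^\circ \end{cases} \tag{8}$$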
The stochastic behavior of wind speed across the wind direction is depicted in Figure 11, whereas the normalized cardinal values are shown in Figure 12. Both figures refer to the wind-speed and wind-direction variability during the selected week of 15–21 May 2024. This specific period was chosen because it exhibited unusually intense and rapidly shifting wind conditions, making it ideal for evaluating the sensitivity of PV performance to convective cooling effects and directional wind patterns. The observed fluctuations ranged from near-calm conditions to wind speeds exceeding seasonal norms, combined with significant directional variability across different hours and days. Such dynamic behavior is particularly relevant for PV forecasting, as wind not only influences panel temperature through cooling but can also interact with localized dust resuspension and deposition processes.
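The sector mapping of Equation (8) can be sketched in a few lines. The function name `cardinal_code` is hypothetical, and wrapping the angle with a modulo so that 360° is treated like 0° is an assumption about how the northern sector (which straddles 0°) is handled.

```python
def cardinal_code(wd_deg):
    """Map a wind direction in degrees to the sector codes 1-4 of Equation (8)."""
    wd = wd_deg % 360.0  # assumption: wrap so that 360 deg is treated like 0 deg
    if wd > 315.0 or wd <= 45.0:
        return 1  # sector centred on north
    if wd <= 135.0:
        return 2  # sector centred on east
    if wd <= 225.0:
        return 3  # sector centred on south
    return 4      # sector centred on west

# Example: encode a short series of hourly wind directions.
hourly_directions = [10.0, 80.0, 200.0, 300.0, 350.0]
codes = [cardinal_code(d) for d in hourly_directions]
print(codes)
```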
Cloudiness is a dominant factor in irradiance variability, introducing significant fluctuations in available solar energy within short time scales. The cloud index (CI) is a metric used to quantify the effect of cloud cover on the availability of solar irradiance on the Earth’s surface. It is defined as the ratio of the actual solar irradiance to the clear-sky solar irradiance at a specific geographic location and time. This index provides insight into the degree of solar attenuation caused by atmospheric conditions, primarily clouds, and is essential for evaluating solar resource variability and forecasting photovoltaic (PV) system performance. To estimate the cloud index based on PV system output, one must measure both the electrical power generated by the PV modules and the incident solar irradiance using a calibrated solar radiation sensor, typically expressed in watts per square meter (W/m2). The cloud index can then be computed using the following equation:
$$CI(t) = \frac{P_{gen}(t)}{\eta_{PV}\, G_A(t)} \tag{9}$$
Here, PV-module efficiency refers to the manufacturer’s rated efficiency under standard test conditions, as provided in the module’s datasheet. It is important to note that this calculation assumes optimal operating conditions, specifically that the modules are functioning at their maximum power point (MPP) and that environmental factors such as temperature remain constant. Deviations from these assumptions, such as thermal fluctuations or partial shading, can introduce inaccuracies in the calculated cloud index. In this regard, the hourly CI(t) is calculated as a percentage deviation of the actual generated PV power from its monthly average equivalent:
$$CI(t) = \frac{P_{avg}(t) - P_{gen}(t)}{P_{avg}(t)} \cdot 100\% \tag{10}$$
Subsequently, a normalization process is applied to the resulting values to define clear, partly cloudy, and mostly cloudy weather as follows.
$$\widehat{CI}(t) = \begin{cases} 0.25, & \text{mostly cloudy} \\ 0.5, & \text{partly cloudy} \\ 1, & \text{clear} \end{cases} \tag{11}$$
The monthly average over the actual generated PV power during a representative week is presented in Figure 13a, while the respective cloud index is illustrated in Figure 13b.
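A minimal sketch of the percentage-deviation cloud index and its three-level normalization is given below. The text does not state the CI boundaries that separate clear, partly cloudy, and mostly cloudy conditions, so the `partly` and `mostly` thresholds are hypothetical parameters chosen only to make the example runnable.

```python
def cloud_index(p_gen, p_avg):
    """Percentage deviation of generated PV power from its monthly average."""
    if p_avg <= 0:
        raise ValueError("p_avg must be positive (daylight hours only)")
    return (p_avg - p_gen) / p_avg * 100.0

def normalize_cloud_index(ci, partly=20.0, mostly=50.0):
    """Map CI (%) to 1 (clear), 0.5 (partly cloudy), or 0.25 (mostly cloudy).

    The two thresholds are hypothetical; they are not given in the article.
    """
    if ci >= mostly:
        return 0.25
    if ci >= partly:
        return 0.5
    return 1.0

# Example: 60 MW generated against a 100 MW monthly average for that hour.
ci = cloud_index(60.0, 100.0)
print(ci, normalize_cloud_index(ci))
```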
Dust extent, often quantified through particulate matter concentrations (e.g., PM2.5 and PM10), attenuates incoming solar radiation through both atmospheric scattering and surface deposition, leading to short-term losses and long-term soiling effects. As the number of fine particles increases, more light is absorbed and scattered, resulting in less clarity, color, and visual range. Light absorption by gases and particles is sometimes the cause of discolorations in the atmosphere but usually does not contribute very significantly to visibility degradation at the ground level. Haze is traditionally an atmospheric phenomenon in which dust, smoke, and other dry particulates suspended in air obscure visibility and the clarity of the sky. PM pollution is the major cause of reduced visibility (haze) in parts of the island. To model both the atmospheric scattering in terms of hourly PM concentration, QPM(t), and panel-surface deposition, DPM(t), the actual and previous PM concentrations are taken into account. This way, the deposition is estimated by means of the previous day and the previous two days’ measures, resulting in the following expression:
$$D_{PM}(t) = Q_{PM}(t) + Q_{PM}(t - 24) + Q_{PM}(t - 48) \tag{12}$$
Exploiting Equation (12), the actual PM concentrations, along with their deposited impact on PV panels for a representative week, are shown in Figure 14.
The dust extent is normalized based on a typical air quality index (AQI) categorization which assigns the value 1 to clean, 0.75 to light, and 0.5 to dense dust in the atmosphere, according to Equation (13) [45].
$$AQI(t) = \begin{cases} 0.5, & Q_{PM}(t) \ge 100 \\ 0.75, & 50 < Q_{PM}(t) < 100 \\ 1, & Q_{PM}(t) \le 50 \end{cases} \tag{13}$$
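Equations (12) and (13) translate directly into code. The sketch below assumes the hourly PM concentrations are held in a plain list indexed by hour; the function names and the toy three-day series are illustrative and not part of the original study.

```python
def pm_deposition(q_pm, t):
    """D_PM(t): current concentration plus the values 24 h and 48 h earlier."""
    if t < 48:
        raise IndexError("at least 48 hours of history are required")
    return q_pm[t] + q_pm[t - 24] + q_pm[t - 48]

def aqi_class(q):
    """Normalized dust extent: 1 (clean), 0.75 (light), 0.5 (dense)."""
    if q >= 100:
        return 0.5
    if q > 50:
        return 0.75
    return 1.0

# Toy series: a clean day, a light-dust day, then a dense-dust day.
q_pm = [20.0] * 24 + [80.0] * 24 + [150.0] * 24
t = 48  # first hour of the third day
print(pm_deposition(q_pm, t), aqi_class(q_pm[t]))
```

The example shows why the deposition feature can stay elevated after the air clears: it keeps a memory of the two preceding days.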
Temporal factors are essential for capturing the cyclical nature of solar-energy availability. Hourly and daily time resolutions were selected to model diurnal and seasonal solar variations. The position of the sun changes throughout the day (solar elevation and azimuth) and year (declination angle), affecting the angle of incidence on the PV surface and, thus, the effective irradiance. Hourly resolution allows the model to capture intraday dynamics, while daily aggregates help reflect broader trends influenced by seasonality and cloud climatology.
PV power output is not only a function of external conditions but also of the system’s historical performance and installed capacity. The generation at the previous hour reflects system inertia and operational lag in irradiance response, while previous-day values provide insights into daily meteorological patterns and persistent atmospheric conditions. Additionally, the effects of permanent factors such as fixed panel-tilt angles, efficiency degradation, and nearby shading obstacles—which influence irradiance incidence and remain constant over time—are implicitly captured through these historical generation features. By learning from past generation behavior under similar environmental conditions, the model can internalize location-specific and system-specific characteristics that are otherwise difficult to quantify explicitly. Furthermore, as PV capacity on the island of Cyprus has been expanded over the study period, the installed generation capacity acts as a scaling factor that must be accounted for to normalize or calibrate forecast outputs over time.
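One possible way to build the historical-generation features is sketched below. Dividing each lagged value by the installed capacity at that hour is an assumption about how the capacity scaling could be applied; the function name and the toy series are hypothetical.

```python
def lag_features(p_gen, capacity, t):
    """Previous-hour and previous-day generation, scaled by installed capacity.

    p_gen and capacity are hourly lists of equal length (MW); scaling by
    capacity is one assumed way to keep values comparable as the fleet grows.
    """
    if t < 24:
        raise IndexError("at least 24 hours of history are required")
    return {
        "prev_hour": p_gen[t - 1] / capacity[t - 1],
        "prev_day": p_gen[t - 24] / capacity[t - 24],
    }

# Toy data: generation ramps linearly, capacity doubles after the first day.
p_gen = [float(h) for h in range(48)]
capacity = [100.0] * 24 + [200.0] * 24
features = lag_features(p_gen, capacity, 30)
print(features)
```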

3.2. Temporal Data Harvesting and Processing

All hourly data relating to the prevailing weather conditions during 2023, including ambient temperature, relative humidity, wind speed, wind direction, and cloudiness, were collected from the official website of BBC weather [46] and verified by the Cyprus Meteorology Department. Regarding the air-quality historical data, the World Air Quality Index website [47] was utilized, and the values were checked and modified appropriately with the aid of Cyprus Air Quality Monitoring Network. To gain a broader overview, the concentration of various pollutants is given in Figure 15 as hourly and daily alternations.
It is worth noting that the indicated sources are internet-based platforms that provide publicly accessible, day-ahead-forecasted meteorological values. This feature is particularly important, as it enables their direct integration as input data into the proposed AI-enhanced prognosis tool, supporting its applicability for operational planning and real-time decision-making. Finally, all necessary information pertaining to the PV power (both actual and estimated mean) was retrieved by Global Solar Atlas [29] and cross-checked with those regularly reported by the Energy Regulatory Authority, Distribution System Operator, and Electricity Authority of Cyprus.
Before introducing the model developed by leveraging modern machine learning technologies, all aforementioned variables must be evaluated to define their importance and overall impact on PV-power forecasting. Due to the non-parametric input–output relationship, sensitivity and uncertainty analyses constitute less preferable methods. On the other hand, feature-selection techniques, including principal component analysis, stepwise regression, and random forest, offer untraceable solutions toward optimization, increasing the computational efforts and complicating overall tasks. This way, correlation analysis based on mathematical process becomes advantageous and provides a more rigorous mechanism for rapid inference.
To ensure the model is trained with the most informative predictors, a systematic feature evaluation process was applied. This process combines two complementary statistical techniques, namely Pearson correlation coefficient and mutual information (MI), to assess both linear and nonlinear dependencies between input variables and the target PV power output. This dual approach improves the robustness of feature selection by accounting for direct correlations, as well as more complex, non-monotonic relationships [48].
The Pearson correlation coefficient (ρ) is used to measure the strength and direction of the linear relationship between each predictor, xi, and the target variable, Y (i.e., PV power output). Using Equations (14) and (15), ρ is defined as the covariance between two variables divided by the product of their standard deviations.
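$$\rho_{x,Y} = \frac{\mathrm{cov}(x, Y)}{\sigma_x\, \sigma_Y} \tag{14}$$
$$\rho_{x_i,Y} = \frac{\sum_{j=1}^{n} \left(x_{i,j} - \bar{x}_i\right)\left(Y_j - \bar{Y}\right)}{\sqrt{\sum_{j=1}^{n} \left(x_{i,j} - \bar{x}_i\right)^2 \cdot \sum_{j=1}^{n} \left(Y_j - \bar{Y}\right)^2}} \tag{15}$$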
where xi,j is the j-th sample of predictor xi; Yj is the corresponding PV output value; x̄i and Ȳ are the sample means of the predictor and output, respectively; and n is the total number of data points. The Pearson coefficient ρ ∈ [−1, +1], where values close to ±1 indicate strong positive or negative correlation, respectively. In this study, features with an absolute correlation of at least 0.10 (i.e., ∣ρ∣ ≥ 0.10) were considered sufficiently relevant for model training.
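The sample form of the Pearson coefficient and the ∣ρ∣ ≥ 0.10 screening rule can be sketched as follows. The helper names and the toy feature columns are hypothetical and serve only to show the mechanics.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if den == 0:
        raise ValueError("correlation is undefined for a constant series")
    return num / den

def select_by_pearson(features, target, threshold=0.10):
    """Keep the features whose absolute correlation reaches the threshold."""
    selected = {}
    for name, column in features.items():
        rho = pearson(column, target)
        if abs(rho) >= threshold:
            selected[name] = rho
    return selected

# Toy data (illustrative only): one strongly related and one unrelated column.
pv_output = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {"cloud": [5.0, 4.0, 3.0, 2.0, 1.0],
            "noise": [1.0, -1.0, 0.0, -1.0, 1.0]}
selected = select_by_pearson(features, pv_output)
print(selected)
```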
To complement this linear measure, mutual information was used to quantify the mutual dependence between input variables and the PV output. MI is capable of capturing nonlinear relationships by measuring the reduction in uncertainty of one variable given knowledge of another. Assuming the input–output pair (X,Y) is treated as a joint random variable with discrete states, Sn and Sm, the mutual information, I(X;Y), is defined as follows:
$$I(X;Y) = \sum_{n \in S_n} \sum_{m \in S_m} P(x_n, y_m)\, \log \frac{P(x_n, y_m)}{P(x_n) \cdot P(y_m)} \tag{16}$$
where P(xn, ym) is the joint probability distribution of X and Y, whereas P(xn) and P(ym) are the marginal distributions. Mutual information is non-negative (i.e., I ∈ [0, +∞)), and higher values reflect stronger statistical dependency. For the purposes of this analysis, an MI threshold of 0.25 or greater was used to retain features with significant influence on PV output.
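For continuous inputs, the discrete-state definition of mutual information requires a binning step. The sketch below uses equal-width bins and the natural logarithm; both choices, as well as the number of bins, are assumptions, since the article does not specify its discretization. The toy example uses a symmetric, non-monotonic relationship, for which the Pearson coefficient is zero while the mutual information is not.

```python
import math
from collections import Counter

def discretize(values, bins):
    """Assign each value to one of `bins` equal-width intervals (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=4):
    """Histogram estimate of I(X;Y) in nats from paired samples."""
    xs, ys = discretize(x, bins), discretize(y, bins)
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

# Toy data: y depends on x through a symmetric, non-monotonic curve.
x = [float(v) for v in range(8)]
y = [(v - 3.5) ** 2 for v in x]
mi = mutual_information(x, y)
print(round(mi, 3), mi >= 0.25)
```

Histogram estimates of this kind are sensitive to the bin count and sample size, so any threshold comparison should be read with the chosen discretization in mind.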
By applying both methods across the selected feature set, including ambient temperature, wind speed, wind direction, relative humidity, cloud index and PM concentration, previous hour/day PV output, and time indicators, the most relevant variables were identified for inclusion in the forecasting model. This hybrid filtering approach ensures that only the features contributing meaningful predictive value are used in the training process, improving model performance and interpretability. To further enhance the robustness of the feature-selection process, the correlation and mutual information analyses were conducted not only on an annual basis but also separately for each season. The categorized results are tabulated in Table 2.
The results confirmed distinct, seasonally varying relationships. Cloud cover exhibited the strongest negative correlation with PV output (–0.68 annually), reflecting its dominant role in attenuating solar irradiance, particularly during autumn months (–0.75). Ambient temperature showed a moderate inverse correlation (–0.25), more pronounced in summer (–0.40), consistent with the known thermal losses in PV efficiency at higher temperatures. Wind speed, which contributes to convective cooling of the PV modules, displayed a weak-to-moderate positive correlation (+0.12), supporting its role in partially offsetting heat-induced efficiency drops during high-irradiance periods. Relative humidity showed a noticeable negative correlation (–0.66), indicating its indirect effect on PV performance, possibly via increased scattering and moisture-related soiling. These findings support the inclusion of all four variables in the model, not only due to their physical relevance but also their quantitative linkage to PV output variability, which is essential for robust day-ahead forecasting.
Observing the resulting coefficients, one can see that only wind has a positive effect on PV production. Apart from the expected impact of cloudiness, relative humidity acts negatively, though its effect is indirect and often secondary compared to factors like solar irradiance and temperature. High relative humidity is associated with increased water vapor and cloud cover, which scatter and absorb sunlight. Water vapor in humid air increases the scattering and absorption of solar radiation before it reaches the PV panels. The negative influence of temperature is most pronounced in summer, when the highest values occur and persist for the longest period overall. Consequently, wind speed appears to be very helpful for energy production during summer, moderating the temperature of the PV panels. Figure 16 offers a demonstration of the obtained coefficients by category and season.
Regarding wind direction, northerly winds have almost no effect, mainly because the PV panels are oriented to receive maximum irradiance. Southerly winds are the most advantageous, while westerly winds contribute little, as they coincide with reduced wind speeds during the late afternoon. Finally, although dense PM concentrations clearly degrade PV performance, dust deposition has a stronger negative impact, since it acts additively and grows with consecutive light and/or dense concentrations in the atmosphere.

4. Regression-Based Architecture and Performance Evaluation

This section presents the proposed data-driven model for day-ahead forecasting of photovoltaic power generation using regression tree algorithms. Unlike deep neural networks, which often require extensive training and large datasets, regression trees offer an interpretable, computationally efficient alternative that is well-suited to medium-sized, uncertain, and weather-dependent data environments like those characterizing the power system of Cyprus [49]. The model leverages both historical PV generation and meteorological features, with its performance tested over multi-year datasets under real environmental conditions.
Analytical models for renewable forecasting are often computationally intensive and require explicit physical modeling of all system components. In contrast, machine learning techniques like regression trees enable the discovery of patterns and relationships within observed data without requiring predefined structural assumptions. A regression tree recursively partitions the input space using conditional rules of the form xi ≤ c or xi > c, producing a set of terminal nodes (leaves), each corresponding to a specific subset of the data and a constant predicted output.
Formally, let x = (x1, x2, …, xd) ∈ ℝ^d represent a d-dimensional input vector of predictors (e.g., temperature, humidity, wind speed, and cloud index) and y ∈ ℝ denote the target PV output. The regression tree grows from a learning set, L = {(xi, yi)}, i = 1, …, n, by selecting splits that minimize the within-node variance of y. The recursive process creates M terminal regions, with the prediction function defined as follows:
$$T(x) = \sum_{m=1}^{M} c_m \cdot I(x \in R_m) \tag{17}$$
where Rm is the m-th region (leaf), cm is the average PV output in region Rm, and I is the indicator function. In a more generalized form, the regression tree function can also be expressed using product splines:
$$T(x) = \sum_{m=1}^{M} c_m \cdot B_m(x) \tag{18}$$
$$B_m(x) = \prod_{l=1}^{L_m} I\left(x_{l,m} \le c_{l,m}\right) \tag{19}$$
where Lm is the number of splits (depth) leading to region m, and cl,m represents the threshold value at each internal node.
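The recursive partitioning described above can be illustrated with a deliberately small, dependency-free regression tree. This is a sketch of the generic variance-minimizing procedure, not the implementation used in the study: candidate thresholds are taken at observed feature values, `max_depth` and `min_leaf` are the only hyperparameters, and the four training samples are invented.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean (node variance times node size)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def partition(X, y, j, c):
    """Split samples by the rule x_j <= c (left) versus x_j > c (right)."""
    left = [(row, t) for row, t in zip(X, y) if row[j] <= c]
    right = [(row, t) for row, t in zip(X, y) if row[j] > c]
    return left, right

def best_split(X, y, min_leaf):
    """Return (feature, threshold) minimizing the summed within-node SSE, or None."""
    best, best_cost = None, sse(y)
    for j in range(len(X[0])):
        for c in sorted({row[j] for row in X})[:-1]:
            left, right = partition(X, y, j, c)
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([t for _, t in left]) + sse([t for _, t in right])
            if cost < best_cost:
                best, best_cost = (j, c), cost
    return best

def grow(X, y, depth=0, max_depth=3, min_leaf=1):
    """Grow the tree recursively; each leaf stores the mean target of its region."""
    split = best_split(X, y, min_leaf) if depth < max_depth else None
    if split is None:
        return {"leaf": mean(y)}
    j, c = split
    left, right = partition(X, y, j, c)
    return {
        "feature": j, "threshold": c,
        "left": grow([r for r, _ in left], [t for _, t in left],
                     depth + 1, max_depth, min_leaf),
        "right": grow([r for r, _ in right], [t for _, t in right],
                      depth + 1, max_depth, min_leaf),
    }

def predict(tree, x):
    """Follow the decision path to a leaf and return its constant output."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

# Invented samples: [normalized cloud index, temperature deviation] -> PV output.
X = [[1.0, 0.0], [1.0, 5.0], [0.5, 0.0], [0.25, 2.0]]
y = [10.0, 9.0, 5.0, 2.0]
tree = grow(X, y, max_depth=1)
print(tree)
```

With a single split allowed, the tree separates the clear-sky samples from the cloudy ones, and each prediction can be traced back to one explicit threshold rule, which is the interpretability property emphasized in the text.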
Hyperparameters such as the maximum tree depth, minimum leaf size, and splitting criteria (e.g., least-squares minimization) are tuned during training to prevent overfitting and improve generalization. More advanced techniques such as boosted regression trees or ensemble methods (e.g., gradient boosting) may be adopted to enhance predictive accuracy further. The input dataset includes meteorological predictors (ambient temperature, wind speed/direction, humidity, cloud index, and PM concentration), time indices (hour, day, month, season, etc.), and past PV generation (previous hour and day). Constants such as tilt angle and permanent shading conditions are inherently reflected in historical generation patterns and therefore do not need to be explicitly modeled.
The model is structured as a multi-input, single-output regressor: xi → yi, where each xi is a feature vector of dimension, d, and yi is the PV output for the forecasted hour. Data were collected at an hourly resolution, ensuring the model could capture diurnal and seasonal cycles. To reduce residual errors due to unpredictable noise, particularly during days with steep irradiance fluctuations caused by dust or cloud anomalies, a post-processing step was applied. Forecasted daily PV outputs were grouped using k-means clustering based on their diurnal profiles (i.e., 24-dimensional vectors per day), aiming to minimize intra-cluster variance as follows:
$$W = \sum_{i=1}^{k} \sum_{y_j \in S_i} \left\lVert y_j - \bar{y}_i \right\rVert^2 \tag{20}$$
where k ≤ n is the number of clusters; Si is the set of points in cluster i; and yj and ȳi are a data point and the centroid (mean) of Si, while ‖yj − ȳi‖² constitutes the squared Euclidean distance between them. A linear correction was then applied within each cluster to refine predictions as follows:
$$P_{corrected} = b\, P_{observed} + a \tag{21}$$
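A per-cluster linear adjustment of the kind expressed by Equation (21) can be sketched with an ordinary least-squares fit. The sketch reads the correction as a mapping from the raw forecast onto the observations, which is an interpretation rather than something the equation states explicitly. The cluster labels are assumed to come from a preceding k-means step that is not implemented here, and all function names and numbers are illustrative.

```python
def fit_linear_correction(raw_forecast, observed):
    """Least-squares intercept a and slope b such that observed ~ b * raw + a."""
    n = len(raw_forecast)
    mx, my = sum(raw_forecast) / n, sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in raw_forecast)
    if sxx == 0:
        raise ValueError("raw forecasts in this cluster are constant")
    sxy = sum((x - mx) * (y - my) for x, y in zip(raw_forecast, observed))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def fit_per_cluster(raw_forecast, observed, labels):
    """Fit one (a, b) pair for every cluster label."""
    coeffs = {}
    for k in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        coeffs[k] = fit_linear_correction([raw_forecast[i] for i in idx],
                                          [observed[i] for i in idx])
    return coeffs

def apply_correction(value, label, coeffs):
    """Apply the cluster-specific linear adjustment to a raw forecast value."""
    a, b = coeffs[label]
    return b * value + a

# Invented data: two clusters with different systematic biases.
raw = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
obs = [3.0, 5.0, 7.0, 5.0, 10.0, 15.0]
labels = [0, 0, 0, 1, 1, 1]  # assumed output of a prior clustering step
coeffs = fit_per_cluster(raw, obs, labels)
print(coeffs, apply_correction(4.0, 0, coeffs))
```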
In Equation (21), the coefficients a and b are optimized via least-squares fitting between predicted and observed values within the cluster. This step accounts for systematic biases that are temporally or meteorologically induced. The evaluation of the proposed model is realized using real-world data from the island of Cyprus (2023–2024), a region particularly impacted by transboundary dust transport and seasonal atmospheric variability. In addition, monthly PV capacity increments were integrated to normalize generation levels. The model performance is assessed by making use of both seasonal and annual metrics, including mean absolute error (MAE), root mean square error (RMSE), and mean absolute range normalized error (MARNE).
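$$MAE = \frac{1}{\tau} \sum_{t=1}^{\tau} \left| P_a(t) - P_p(t) \right| \tag{22}$$
$$RMSE = \sqrt{\frac{1}{\tau} \sum_{t=1}^{\tau} \left( P_a(t) - P_p(t) \right)^2} \tag{23}$$
$$MARNE = \frac{1}{\tau} \sum_{t=1}^{\tau} \frac{\left| P_a(t) - P_p(t) \right|}{\max_t P_a(t)} \times 100 \tag{24}$$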
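The three error metrics can be computed with a few lines of code. The sketch below follows the definitions given above, with MARNE normalized by the largest actual value in the evaluation window; the sample series is invented and does not reproduce any result from this study.

```python
import math

def _check(actual, predicted):
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def marne(actual, predicted):
    """Mean absolute range normalized error, in percent of the peak actual value."""
    return mae(actual, predicted) / max(actual) * 100.0

# Invented hourly values (MW) for four daylight hours.
p_actual = [10.0, 20.0, 30.0, 40.0]
p_predicted = [12.0, 18.0, 33.0, 40.0]
print(mae(p_actual, p_predicted), rmse(p_actual, p_predicted), marne(p_actual, p_predicted))
```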
The actual PV production (Pa) over the predicted (Pp) is depicted in Figure 17. As can be seen, the proposed model based on regression trees consistently captured intra-day variations and responded well to PM and cloud-related fluctuations, making it especially suitable for environments with rapid weather changes and limited data infrastructure. For comparison purposes, the performance metrics are listed in Table 3. Importantly, only daylight hours were included in the analysis, thereby excluding nighttime intervals, where PV output is zero by definition. This adjustment guarantees that performance metrics reflect the model’s true forecasting capability during solar-generation windows and are not artificially improved by periods of zero-output predictability.
The model achieved annual mean absolute errors of 1.44 and 1.61 in 2023 and 2024, respectively, indicating low forecast deviation even under varying environmental and operational conditions. The mean absolute range normalized error (MARNE) decreased significantly from 0.73 in 2023 to 0.34 in 2024, highlighting the model’s growing generalization capability as more data were assimilated. Seasonal results confirmed the model’s adaptability to the highly dynamic Mediterranean climate, with peak performance during summer months (e.g., MARNE of 0.27 in summer 2024), when stable solar conditions prevail, and only modest degradation in accuracy during dust-prone spring or cloud-heavy winter periods.
A detailed comparison of root mean square error across seasons further supports the model’s robustness, with values remaining below 0.11 in all scenarios and as low as 0.03 annually in 2023. This indicates effective mitigation of both bias and variance in PV power predictions. The seasonal granularity of the evaluation also confirmed the effectiveness of the post-processing correction mechanism, which significantly stabilized performance during transitional periods, such as spring and autumn. Notably, the slight performance drop from 2023 to 2024 in winter MARNE (from 0.89 to 0.41) may be attributed to the higher variability in dust and humidity levels observed in early 2024. Nevertheless, the model retained an exceptionally low relative error, showcasing its capacity to maintain reliability across years and datasets.
The seasonal breakdown underscores the model’s robustness across a range of weather conditions. Notably, the model maintains high predictive accuracy during summer, when irradiance is stable and predictable, and performs reasonably well in spring and autumn, when atmospheric dust and intermittent cloud cover introduce significant variability. The winter period shows slightly reduced accuracy, primarily due to increased cloud cover and lower irradiance levels, though the model still maintains acceptable performance. On high-PM days, the model shows a moderate increase in MAE and MARNE. However, the inclusion of PM concentration as a predictor variable, coupled with the post-processing correction step via clustering and linear adjustment, successfully mitigated a substantial portion of the forecast error.
The model’s ability to maintain accuracy under high temporal and seasonal variability, particularly during dust-prone spring months and cloudy winter periods, highlights the superiority of regression tree algorithms in operational PV forecasting. These findings further support the adoption of regression trees in forecasting frameworks where interpretability and agility are key, such as isolated or weakly interconnected grids with rising renewable energy shares.

5. Conclusions

In this work, a robust forecasting framework for day-ahead PV power prediction was presented. The interpretable and data-efficient tool was tailored to the environmental and operational characteristics of an isolated Mediterranean power system. The model offers a powerful alternative to conventional black-box approaches, such as artificial neural networks and Gaussian process regression. Through a detailed analysis incorporating meteorological inputs, particulate matter concentrations, temporal indicators, and historical generation profiles, the proposed mechanism has demonstrated both high predictive accuracy and practical scalability.
A key strength of this approach lies in its interpretability and computational efficiency. Unlike deep learning methods, which require extensive tuning and are often opaque in their decision-making, regression trees provide a clear decision path for each prediction, enabling real-time traceability. This is especially critical in operational environments where transparency and accountability are prerequisites for grid management. Additionally, the regression tree model delivers competitive and, in many cases, superior performance with fewer computational resources, making it well-suited for deployment in regions with limited data infrastructure or real-time processing capabilities. The experimental evaluation across multiple years and seasons revealed the model’s resilience to environmental variability, including conditions affected by transboundary dust transport. The integration of air composition metrics, particularly PM2.5 and PM10 concentrations, significantly improved the model’s responsiveness to real-world disturbances that typically degrade PV output but are overlooked in conventional forecasting approaches. Furthermore, the application of a post-processing correction layer using clustering and linear adjustments enabled the model to correct systematic biases and enhance short-term precision, especially during episodes of high atmospheric variability. The scalability of this framework means it can be readily adapted to other regions with similar climatic or infrastructural conditions, serving as a blueprint for accelerating clean energy integration while maintaining grid reliability.
Future research could focus on hybridizing regression trees with probabilistic techniques, such as quantile regression forests or tree-based ensemble learning with Bayesian updates. This would enable the provision of confidence intervals alongside point forecasts, equipping system operators with uncertainty-aware predictions. Furthermore, expanding the model to jointly forecast multi-source renewable portfolios, including PV, wind, and demand-side response, within a single, unified framework would enhance its system-level applicability. Incorporating real-time satellite imagery and sky-camera data through feature-fusion techniques may further improve intra-hour resolution and cloud-event detection. The proposed regression tree-based model constitutes a flexible, high-performing tool for PV power forecasting and offers a replicable solution for other RES-dominant power systems aiming to balance accuracy, efficiency, and interpretability in the transition toward a cleaner energy world.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed toward the author.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

| Symbol | Description |
|---|---|
| $P_a$ | Actual PV power output (MW) |
| $T_{module}$ | Actual PV-module temperature (°C) |
| AQI | Air quality index (0.5, 0.75, 1) |
| $T_a$ | Ambient temperature (°C) |
| $A$ | Area of the PV panels (m²) |
| $\beta$ | Attenuation coefficient |
| $\widehat{CD}$ | Cardinal directions (°) |
| $K_t$ | Clearness index (0.25, 0.5, 0.75) |
| $CI$ | Cloud index (%) |
| $h$ | Convective heat transfer coefficient |
| $G_0$ | Extraterrestrial solar irradiance (W/m²) |
| $G_A$ | Global solar irradiation (W/m²) |
| $t$ | Hourly intervals |
| $I_o$ | Incident irradiance (W/m²) |
| $c_m$ | Internal-node threshold constant |
| $P(x_n, y_m)$ | Joint probability distribution |
| MAE | Mean absolute error |
| MARNE | Mean absolute range normalized error |
| $P_{avg}$ | Monthly averaged PV power output (MW) |
| $I(X;Y)$ | Mutual information |
| $P_{SC}$ | Nominal PV output under standard conditions (kW) |
| $K$ | Number of clusters |
| $L_m$ | Number of nodal splits |
| $M$ | Number of terminal regions |
| $\tau$ | Optical depth (m) |
| PMx | Particulate matter with diameter up to x μm |
| $\rho_{x,Y}$ | Pearson correlation coefficient |
| $\eta_{pv}$ | Photovoltaic conversion efficiency (%) |
| $Q_{PM}$ | PM concentration (μg/m³) |
| $D_{PM}$ | PM deposition (μg/m²) |
| $P_p$ | Predicted PV output power (MW) |
| $T_{SC}$ | PV panel temperature under standard conditions (°C) |
| $P_{pv}$ | PV power output (MW) |
| $C_T$ | PV temperature coefficient (%) |
| $G_{SC}$ | Reference solar irradiance (1000 W/m²) |
| $x_i$ | Regression tree input predictor |
| $y_i$ | Regression tree output target |
| $H$ | Relative humidity (%) |
| RMSE | Root mean square error |
| $G_{SC}$ | Solar radiation under standard conditions (W/m²) |
| $\sigma$ | Standard deviation |
| $C_T$ | Temperature coefficient |
| $\delta T$ | Temperature deviation (°C) |
| $I$ | Transmitted irradiance (W/m²) |
| $\lambda$ | Wavelength (nm) |
| $v_w$ | Wind speed (m/s) |

References

  1. Nikolaidis, P.; Poullikkas, A. Evolutionary Priority-Based Dynamic Programming for the Adaptive Integration of Intermittent Distributed Energy Resources in Low-Inertia Power Systems. Eng 2021, 2, 643–660.
  2. Suresh, V.; Janik, P.; Rezmer, J.; Leonowicz, Z. Forecasting solar PV output using convolutional neural networks with a sliding window algorithm. Energies 2020, 13, 723.
  3. Gu, B.; Shen, H.; Lei, X.; Hu, H.; Liu, X. Forecasting and uncertainty analysis of day-ahead photovoltaic power using a novel forecasting method. Appl. Energy 2021, 299, 117291.
  4. Mohamad Radzi, P.N.L.; Akhter, M.N.; Mekhilef, S.; Mohamed Shah, N. Review on the Application of Photovoltaic Forecasting Using Machine Learning for Very Short- to Long-Term Forecasting. Sustainability 2023, 15, 2942.
  5. Gupta, P.; Singh, R. PV power forecasting based on data-driven models: A review. Int. J. Sustain. Eng. 2021, 14, 1733–1755.
  6. Iheanetu, K.J. Solar Photovoltaic Power Forecasting: A Review. Sustainability 2022, 14, 17005.
  7. Scott, C.; Ahsan, M.; Albarbar, A. Machine learning for forecasting a photovoltaic (PV) generation system. Energy 2023, 278, 127807.
  8. Ahmed, R.; Sreeram, V.; Togneri, R.; Datta, A.; Arif, M.D. Computationally expedient Photovoltaic power Forecasting: A LSTM ensemble method augmented with adaptive weighting and data segmentation technique. Energy Convers. Manag. 2022, 258, 115563.
  9. Brester, C.; Kallio-Myers, V.; Lindfors, A.V.; Kolehmainen, M.; Niska, H. Evaluating neural network models in site-specific solar PV forecasting using numerical weather prediction data and weather observations. Renew. Energy 2023, 207, 266–274.
  10. Li, Y.; Song, L.; Zhang, S.; Kraus, L.; Adcox, T.; Willardson, R.; Komandur, A.; Lu, N. A TCN-Based Hybrid Forecasting Framework for Hours-Ahead Utility-Scale PV Forecasting. IEEE Trans. Smart Grid 2023, 14, 4073–4085.
  11. Kazem, H.A.; Chaichan, M.T.; Al-Waeli, A.H.A.; Sopian, K. A review of dust accumulation and cleaning methods for solar photovoltaic systems. J. Clean. Prod. 2020, 276, 123187.
  12. Sayyah, A.; Horenstein, M.N.; Mazumder, M.K. Energy yield loss caused by dust deposition on photovoltaic panels. Sol. Energy 2014, 107, 576–604.
  13. Mani, M.; Pillai, R. Impact of dust on solar photovoltaic (PV) performance: Research status, challenges and recommendations. Renew. Sustain. Energy Rev. 2010, 14, 3124–3131.
  14. Ilse, K.; Micheli, L.; Figgis, B.W.; Lange, K.; Daßler, D.; Hanifi, H.; Wolfertstetter, F.; Naumann, V.; Hagendorf, C.; Gottschalg, R.; et al. Techno-Economic Assessment of Soiling Losses and Mitigation Strategies for Solar Power Generation. Joule 2019, 3, 2303–2321.
  15. Kazem, H.A.; Chaichan, M.T.; Al-Waeli, A.H.A.; Al-Badi, R.; Fayad, M.A.; Gholami, A. Dust impact on photovoltaic/thermal system in harsh weather conditions. Sol. Energy 2022, 245, 308–321.
  16. Román, R.; Antón, M.; Valenzuela, A.; Gil, J.E.; Lyamani, H.; De Miguel, A.; Olmo, F.J.; Bilbao, J.; Alados-Arboledas, L. Evaluation of the desert dust effects on global, direct and diffuse spectral ultraviolet irradiance. Tellus Ser. B Chem. Phys. Meteorol. 2013, 65, 19578.
  17. Xu, S.; Guo, S. Distributed Reactive Power Optimization for Energy Internet via Multiagent Deep Reinforcement Learning with Graph Attention Networks. IEEE Trans. Ind. Inform. 2024, 20, 8696–8706.
  18. Akhter, M.N.; Mekhilef, S.; Mokhlis, H.; Ali, R.; Usama, M.; Muhammad, M.A.; Khairuddin, A.S.M. A hybrid deep learning method for an hour ahead power output forecasting of three different photovoltaic systems. Appl. Energy 2022, 307, 118185.
  19. Behera, M.K.; Nayak, N. A comparative study on short-term PV power forecasting using decomposition based optimized extreme learning machine algorithm. Eng. Sci. Technol. Int. J. 2020, 23, 156–167.
  20. Yin, W.; Han, Y.; Zhou, H.; Ma, M.; Li, L.; Zhu, H. A novel non-iterative correction method for short-term photovoltaic power forecasting. Renew. Energy 2020, 159, 23–32.
  21. Moreira, M.O.; Balestrassi, P.P.; Paiva, A.P.; Ribeiro, P.F.; Bonatto, B.D. Design of experiments using artificial neural network ensemble for photovoltaic generation forecasting. Renew. Sustain. Energy Rev. 2021, 135, 110450.
  22. Rodríguez, F.; Martín, F.; Fontán, L.; Galarza, A. Ensemble of machine learning and spatiotemporal parameters to forecast very short-term solar irradiation to compute photovoltaic generators’ output power. Energy 2021, 229, 120647.
  23. Sharma, N.; Mangla, M.; Yadav, S.; Goyal, N.; Singh, A.; Verma, S.; Saber, T. A sequential ensemble model for photovoltaic power forecasting. Comput. Electr. Eng. 2021, 96, 107484.
  24. Ma, X.Y.; Zhang, X.H. A short-term prediction model to forecast power of photovoltaic based on MFA-Elman. Energy Rep. 2022, 8, 495–507.
  25. Niu, Y.; Wang, J.; Zhang, Z.; Luo, T.; Liu, J. De-Trend First, Attend Next: A Mid-Term PV forecasting system with attention mechanism and encoder–decoder structure. Appl. Energy 2024, 353, 122169.
  26. Lee, D.S.; Son, S.Y. PV Forecasting Model Development and Impact Assessment via Imputation of Missing PV Power Data. IEEE Access 2024, 12, 12843–12852.
  27. Konstantinou, M.; Peratikou, S.; Charalambides, A.G. Solar photovoltaic forecasting of power output using LSTM networks. Atmosphere 2021, 12, 124.
  28. Peratikou, S.; Charalambides, A.G. Estimating clear-sky PV electricity production without exogenous data. Sol. Energy Adv. 2022, 2, 100015.
  29. Global Solar Atlas. 2025. Available online: https://globalsolaratlas.info/ (accessed on 15 March 2025).
  30. Nikolaidis, P. Pulsed-Supplied Water Electrolysis via Two-Switch Converter for PV Capacity Firming. Electricity 2022, 3, 131–144.
  31. Bergin, M.H.; Ghoroi, C.; Dixit, D.; Schauer, J.J.; Shindell, D.T. Large Reductions in Solar Energy Production Due to Dust and Particulate Air Pollution. Environ. Sci. Technol. Lett. 2017, 4, 339–344.
  32. Kayabaşi, R.; Kaya, M. Effect of module operating temperature on module efficiency in photovoltaic modules and recovery of photovoltaic module heat by thermoelectric effect. J. Therm. Eng. 2023, 9, 191–204.
  33. Hayakawa, Y.; Sato, D.; Yamada, N. Measurement of the Convective Heat Transfer Coefficient and Temperature of Vehicle-Integrated Photovoltaic Modules. Energies 2022, 15, 4818.
  34. Xu, W. How important of the effect of temperature on the efficiency of solar photovoltaic cells? Adv. Eng. Innov. 2024, 10, 85–100.
  35. Mekhilef, S.; Saidur, R.; Kamalisarvestani, M. Effect of dust, humidity and air velocity on efficiency of photovoltaic cells. Renew. Sustain. Energy Rev. 2012, 16, 2920–2925.
  36. Carpentieri, A.; Folini, D.; Nerini, D.; Pulkkinen, S.; Wild, M.; Meyer, A. Intraday probabilistic forecasts of surface solar radiation with cloud scale-dependent autoregressive advection. Appl. Energy 2023, 351, 121775.
  37. Georgiou, G.S.; Rouvas, C.; Nathanael, D. Enhancing expansion of rooftop PV systems through Mixed Integer Linear Programming and Public Tender Procedures. Renew. Energy 2022, 187, 347–361.
  38. Naxakis, I.; Nikolaidis, P.; Pyrgioti, E. Performance of an installed lightning protection system in a photovoltaic park. In Proceedings of the 2016 IEEE International Conference on High Voltage Engineering and Application (ICHVE), Chengdu, China, 19–22 September 2016; no. 2, pp. 2–5.
  39. Kitchener, B.G.B.; Wainwright, J.; Parsons, A.J. A review of the principles of turbidity measurement. Prog. Phys. Geogr. 2017, 41, 620–642.
  40. Ye, S.; Xue, P.; Fang, W.; Dai, Q.; Peng, J.; Sun, Y.; Xie, J.; Liu, J. Quantitative effects of PM concentrations on spectral distribution of global normal irradiance. Sol. Energy 2021, 220, 1099–1108.
  41. Mamouri, R.E.; Ansmann, A.; Nisantzi, A.; Solomos, S.; Kallos, G.; Hadjimitsis, D.G. Extreme dust storm over the eastern Mediterranean in September 2015: Satellite, lidar, and surface observations in the Cyprus region. Atmos. Chem. Phys. 2016, 16, 13711–13724.
  42. Psiloglou, B.E.; Kambezidis, H.D. Performance of the meteorological radiation model during the solar eclipse of 29 March 2006. Atmos. Chem. Phys. 2007, 7, 6047–6059.
  43. Stafoggia, M.; Zauli-Sajani, S.; Pey, J.; Samoli, E.; Alessandrini, E.; Basagaña, X.; Cernigliaro, A.; Chiusolo, M.; DeMaria, M.; Díaz, J.; et al. Desert dust outbreaks in Southern Europe: Contribution to daily PM10 concentrations and short-term associations with mortality and hospital admissions. Environ. Health Perspect. 2016, 124, 413–419.
  44. Michaelides, S.; Tymvios, F.; Athanasatos, S.; Papadakis, M. Trends of Dust Transport Episodes in Cyprus Using a Classification of Synoptic Types Established with Artificial Neural Networks. J. Climatol. 2013, 2013, 280248.
  45. White, J.E.; Wayland, R.A.; Dye, T.; Chan, A. Airnow Air Quality Notification and Forecasting System. 2016, pp. 1–6. Available online: https://www.researchgate.net/publication/268057874_AIRNow_AIR_QUALITY_NOTIFICATION_AND_FORECASTING_SYSTEM (accessed on 15 March 2025).
  46. BBC Weather. 2019. Available online: https://www.bbc.com/weather/146268 (accessed on 15 March 2025).
  47. World Air Quality Index. 2025. Available online: https://aqicn.org/map/cyprus/ (accessed on 15 March 2025).
  48. Nikolaidis, P. Wind power forecasting in distribution networks using non-parametric models and regression trees. Discov. Energy 2022, 2, 6.
  49. Nikolaidis, P. Smart Grid Forecasting with MIMO Models: A Comparative Study of Machine Learning Techniques for Day-Ahead Residual Load Prediction. Energies 2024, 17, 5219.
Figure 1. Sun path throughout a 365-day cycle in Cyprus [29].
Figure 2. Annual irradiation metrics measured at coastal Limassol area [29].
Figure 3. (a) Hourly averaged and (b) monthly summed PV generation at coastal Limassol area.
Figure 4. Meteorological map of (a) maximum temperatures, (b) relative humidity, (c) wind, and (d) cloud cover at different locations in Cyprus.
Figure 5. Dispersion of 405 rooftop PV systems in public schools in Cyprus.
Figure 6. Interaction of incident solar radiation with suspended particle.
Figure 7. A comparative illustration of a sandstorm in the coastal areas of Cyprus [41].
Figure 8. Main sources of sand and dust around the world [41].
Figure 9. Ambient temperature: (a) actual recordings and (b) standard deviation.
Figure 10. Annual fluctuation of relative humidity.
Figure 11. Weekly variation in wind speed and wind direction.
Figure 12. Wind-direction variation and cardinal normalization.
Figure 13. Cloudiness evaluation: (a) PV power generation over (b) cloud-cover circumstances.
Figure 14. Particulate matter: (a) hourly concentration and (b) normalized deposition extent.
Figure 15. Concentrations of various pollutants in (a) hourly and (b) daily resolutions.
Figure 16. Input/output correlation coefficients for main predictors.
Figure 17. Obtained results for the PV power (MW) forecasting for representative weeks in 2023.
Table 1. Representative research on PV power forecasting.

| Study Year | Method | Input Predictors | Data Harvesting | Performance Metrics | Ref. |
|---|---|---|---|---|---|
| 2020 | Decomposition-based optimized extreme learning machine | Solar irradiance and module temperature | - | 2.39 (RMSE), 1.89 (MAE) | [19] |
| 2020 | Auto-regression | Daily PV generation | - | 20.58 (MSE), 21.71 (RMSE) | [4] |
| 2020 | Numerical weather prediction | Solar irradiance | Feature selection | 4.5 (RMSE), 2.6 (MAE) | [20] |
| 2020 | Convolutional neural networks | Irradiation, module temperature, ambient temperature, and wind speed | Sliding window | 11.95 (RMSE), 4.48 (MAE) | [2] |
| 2021 | ANN | Cloudiness, temperature, precipitation, and humidity | - | 4.7 (MAPE) | [21] |
| 2021 | Feed-forward and recurrent NN | Solar irradiance | Spatial and temporal optimization | 5.3–21.5 (MAPE), 9.6–72.7 (RMSE) | [22] |
| 2021 | Least squares SVM | Wind speed, solar irradiance intensity, ambient temperature, and humidity | Fuzzy c-means | 2.55–6.03 (RMSE) | [3] |
| 2021 | Sequential ensemble model | Active power, wind speed, temperature, humidity, global horizontal radiation, diffuse horizontal radiation, wind direction, and daily rainfall | Time-series decomposition | 3.01–16.49 (MAPE) | [23] |
| 2022 | Modified Firefly Algorithm–Elman | Light intensity, temperature, humidity, wind speed, and atmospheric pressure | k-means clustering | 1.3 (RMSE) | [24] |
| 2022 | Long short-term memory algorithm | Wind speed, ambient temperature, global horizontal radiation, wind direction, air pressure, and daily rainfall | Data segmentation | 6–9 (MAPE) | [8] |
| 2022 | Hybrid deep learning | Module temperature, ambient temperature, solar irradiance, and wind speed | Experimental | 15.4–19.1 (RMSE), 10.8–22.9 (MAE) | [18] |
| 2023 | ANNs | Temperature, global radiation, diffuse radiation, wind speed, wind direction, precipitation intensity, total cloud fraction, day of the year, and hour of the day | Weather observations | 6.38 (RMSE), 4.03 (MAE) | [9] |
| 2023 | Temporal convolutional network | Original daily irradiance series and cloud events | Conventional detector site selection | 10.48 sunny, 32.93 rainy, and 57.68 cloudy (RMSE) | [10] |
| 2024 | Multi-layer perceptron and gated recurrent unit | Undefined meteorological variables and solar irradiance | Isolation forest | 8.727 (MSE), 2.132 (MAE) | [25] |
| 2024 | Hybrid convolutional NN and gated recurrent unit | Sky status | k-nearest neighbor and generative adversarial net | 18.30 sunny, 20.71 partly cloudy, and 19.77 cloudy (RMSE) | [26] |
| 2025 | Regression trees | Temperature, cloud index, humidity, wind speed, wind direction, increasing capacity, previous generation, and dust extent (concentration and deposition) | Pearson correlation and mutual information | 1.44 (MAE), 0.05 (RMSE), 0.34 (MARNE) | This work |

Table 2. Correlation coefficients between input variables and PV power output.

| Periods | Temperature | Relative Humidity | Cloudiness | Wind Speed | Wind Direction (North) | Wind Direction (East) | Wind Direction (South) | Wind Direction (West) | Particulate Matter (Concentration) | Particulate Matter (Deposition) |
|---|---|---|---|---|---|---|---|---|---|---|
| Winter | 0.00 | −0.66 | −0.69 | 0.03 | 0.01 | 0.09 | 0.03 | −0.03 | −0.17 | −0.19 |
| Spring | −0.30 | −0.65 | −0.64 | 0.11 | 0.02 | 0.19 | 0.38 | 0.04 | −0.12 | −0.16 |
| Summer | −0.40 | −0.68 | −0.67 | 0.25 | 0.10 | 0.22 | 0.62 | 0.07 | −0.11 | −0.15 |
| Autumn | −0.25 | −0.67 | −0.75 | 0.14 | 0.07 | 0.15 | 0.41 | −0.02 | −0.12 | −0.14 |
| Annual | −0.25 | −0.66 | −0.68 | 0.12 | 0.04 | 0.14 | 0.37 | 0.01 | −0.13 | −0.14 |

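For readers who wish to reproduce coefficients of the kind listed in Table 2, the following sketch computes the Pearson correlation coefficient between each candidate predictor and PV output, then ranks the predictors by absolute correlation as a simple screening step. The series and predictor names are hypothetical placeholders, and the ranking rule is an assumption for illustration rather than the selection procedure of this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical hourly samples (not taken from the study's dataset).
pv_output = [48.0, 40.0, 30.0, 22.0, 10.0]          # MW
predictors = {
    "cloud_index": [10.0, 30.0, 50.0, 70.0, 90.0],  # %
    "wind_speed": [3.0, 5.0, 2.0, 6.0, 4.0],        # m/s
}

# Correlate every predictor with the target and rank by |rho|.
rho = {name: pearson(values, pv_output) for name, values in predictors.items()}
ranked = sorted(rho.items(), key=lambda item: abs(item[1]), reverse=True)

for name, value in ranked:
    print(f"{name}: {value:+.2f}")
```

Because the Pearson coefficient captures only linear dependence, a screening step of this kind is usually complemented by a nonlinear measure such as mutual information.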
Table 3. Comparative annual and seasonal performance metrics.

| Years | Periods | MAE | RMSE | MARNE |
|---|---|---|---|---|
| 2023 | Winter | 1.75 | 0.08 | 0.89 |
| 2023 | Spring | 1.64 | 0.07 | 0.83 |
| 2023 | Summer | 1.15 | 0.05 | 0.58 |
| 2023 | Autumn | 1.50 | 0.06 | 0.76 |
| 2023 | Annual | 1.44 | 0.03 | 0.73 |
| 2024 | Winter | 2.19 | 0.11 | 0.41 |
| 2024 | Spring | 1.77 | 0.09 | 0.39 |
| 2024 | Summer | 0.99 | 0.10 | 0.27 |
| 2024 | Autumn | 1.27 | 0.07 | 0.24 |
| 2024 | Annual | 1.61 | 0.05 | 0.34 |

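The error measures named in Table 3 can be sketched as follows. MAE and RMSE follow their textbook definitions. The MARNE normalization shown here (MAE divided by the range of the actual values, in percent) is an assumption made for illustration, and the definition given in the article's methodology takes precedence. The sample values are hypothetical, and the units or scaling of the columns in Table 3 may differ from these textbook forms.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def marne(actual, predicted):
    """ASSUMED form: MAE normalized by the range of actual values, in percent."""
    value_range = max(actual) - min(actual)
    return 100.0 * mae(actual, predicted) / value_range

# Hypothetical actual and predicted PV output (MW).
p_actual = [0.0, 10.0, 30.0, 40.0, 20.0]
p_pred = [1.0, 12.0, 27.0, 40.0, 24.0]

mae_value = mae(p_actual, p_pred)      # 2.0
rmse_value = rmse(p_actual, p_pred)    # sqrt(6), about 2.449
marne_value = marne(p_actual, p_pred)  # 5.0
print(mae_value, round(rmse_value, 3), marne_value)
```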
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nikolaidis, P. AI-Enhanced Photovoltaic Power Prediction Under Cross-Continental Dust Events and Air Composition Variability in the Mediterranean Region. Energies 2025, 18, 3731. https://doi.org/10.3390/en18143731
