The primary objective of this study is to forecast the number of new photovoltaic installations and the average capacity of a single PV installation using an LSTM model trained exclusively on empirical data from one of the largest PV installers in Poland. This methodology allows for capturing the actual variability of the market and accurately reflecting its dynamics.
The results of the analysis indicate that LSTM models can effectively represent demand fluctuations in the PV sector, providing businesses and decision-makers with a tool to support sales management, distribution network development, and strategic planning. The approach also opens new perspectives for applying predictive modeling to energy planning, logistics, and the low-emission energy transition. Whereas existing research predominantly focuses on forecasting electricity generation, this study emphasizes the forecasting of market demand, namely the number of completed PV installations and their average unit capacity. This distinction is an important novelty, as it directly addresses the decision-making needs of enterprises and market operators rather than purely technical aspects of generation.
1.1. The Photovoltaic Market in Poland and Europe
Over the past two decades, the photovoltaic (PV) market has undergone a profound transformation, both in the geographical structure of production and in the pace of technological advancement. At the beginning of the 21st century, China and Taiwan assumed a dominant position in PV cell production, and since 2008 their output has exceeded that of all other regions worldwide, a trend that persists to the present day. In Poland, by 2023, nearly all newly installed PV systems were based on monocrystalline modules. The share of polycrystalline modules fell to virtually zero; the few remaining polycrystalline installations most likely drew on existing warehouse inventories [4]. The European PV industry, with the exception of support structure manufacturing, remained active until mid-2023; by the first half of 2024, however, it had almost entirely ceased operations. The current market situation clearly highlights both the decisive role of large-scale subsidization of the Chinese PV industry and the serious problem of oversupply: in 2024, global production capacity exceeded 1 TW, while worldwide demand was less than half that level.
In the face of such significant oversupply and mounting price pressure, the importance of precise production planning, logistics, and supply chain management is steadily increasing. In this context, analytical tools based on machine learning (ML) and artificial intelligence (AI) are playing an increasingly critical role, enabling demand forecasting, operational process optimization, and adaptation to changing market conditions. The application of these technologies supports PV component manufacturers and distributors in making more informed decisions regarding resource allocation, production and delivery scheduling, as well as mitigating the risks associated with inefficient oversupply management.
At the end of 2023, the total installed capacity of photovoltaic (PV) systems in the European Union reached 257 GW, an increase of 51 GW over the previous year. The growth rate remained at 25%, similar to that observed in 2021–2022. The EU market nevertheless grew more slowly than the Polish market, where the increase amounted to 38%. Germany has the largest installed capacity (81.7 GW), followed by Spain (31 GW) and Italy (29.8 GW) (Figure 1). The reversal in ranking between Spain and Italy compared with 2022 resulted primarily from revised data for Spain published in the IRENA report. As in the previous year, Poland remains the only Central and Eastern European country among the top six EU member states in terms of total installed PV capacity [5].
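The 25% EU growth rate quoted above can be checked from the two stated figures, since the end-2022 base is 257 - 51 = 206 GW. A minimal Python sketch using only numbers from the text (the helper name `yoy_growth` is ours, chosen for illustration):

```python
def yoy_growth(end_capacity_gw: float, added_gw: float) -> float:
    """Year-over-year growth rate from end-of-year capacity and annual additions."""
    start_capacity_gw = end_capacity_gw - added_gw  # capacity at the end of the previous year
    return added_gw / start_capacity_gw


# EU figures for 2023 quoted in the text: 257 GW installed in total, 51 GW added during the year.
eu_growth = yoy_growth(257.0, 51.0)
print(f"EU PV capacity growth in 2023: {eu_growth:.1%}")
```

The result, about 24.8%, rounds to the reported 25%.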
In terms of annual capacity additions in 2023, Poland ranked fourth in the European Union, behind Germany, Spain, and Italy. Germany again recorded the highest increase, over 14.2 GW, compared with 4.8 GW in 2022 [5]. The annual growth of the Polish PV market has remained at a double-digit level for the past eight years. This has kept Poland among the leading European countries in terms of newly installed capacity, and forecasts indicate that this trend will continue in the coming years. In 2008, the total installed PV capacity in Poland amounted to only 800 kW, of which 600 kW was off-grid. By 2010, this figure had risen to 1.1 MW (including 0.8 MW off-grid), and by 2012 it had reached 3.6 MW, of which 2.2 MW was off-grid. According to data from the Energy Regulatory Office (URE), as of March 2015 there were 119 photovoltaic installations operating in Poland with a combined capacity of 21 MWp.
The photovoltaic market in Poland is currently developing at an exceptionally rapid pace, setting successive records of installed capacity. By the end of 2023, the total installed PV capacity amounted to approximately 17.08 GW, increasing to 17.73 GW by the end of the first quarter of 2024. For comparison, in 2020 this value was only 2.9 GW [6] (Figure 2). Such a rapid expansion in the scale of investment and the number of new PV installations brings growing challenges in demand forecasting, resource management, and infrastructure planning. In this context, tools based on machine learning and artificial intelligence become crucial to ensuring operational efficiency and market stability.
An important factor influencing the development of renewable energy in Poland, particularly in the field of photovoltaics, is the phenomenon of so-called prosumer energy. The term “prosumer” was introduced in the 1980s by futurist Alvin Toffler as a combination of the words producer and consumer. In the field of information technology, the term describes software users who not only consume free software but also actively co-create it by adding their own code fragments or developing new programs. Other interpretations of the concept link it with the terms professional and consumer, referring to advanced users of electronic equipment, or pro-active and consumer, defining individuals who consciously select high-quality products, actively compare offers, and optimize their purchases.
In the context of energy, prosumers play a crucial role in the development of the photovoltaic market in Poland. They are simultaneously producers and consumers of electricity generated from renewable sources. By installing PV micro-installations on the rooftops of their homes, they contribute to the increase in the total installed photovoltaic capacity in the country. Their active involvement enables the decentralization of energy production, which helps to reduce the load on the power grid and strengthens local supply sources. Prosumers often benefit from support programs such as subsidies or net metering schemes, which further encourage investments in renewable energy. As a result, their activity accelerates Poland’s energy transition, increasing the share of renewable energy in the national energy mix and supporting sustainable development.
Prosumer micro-installations currently dominate the market structure, accounting for approximately 66% of total installed capacity at the end of 2023. In the same year, however, micro-installation capacity additions amounted to 2,022 MW, compared with 3,217 MW in 2022, and their share of newly installed capacity fell from 69% to 43%. This slowdown can be attributed to the limited forms of support available to customers [4].
In the face of a growing number of prosumers and evolving regulatory conditions, forecasting market behavior and the demand for PV micro-installations has become a significant analytical challenge. Machine learning and artificial intelligence tools are playing an increasingly important role in this area, as they enable the analysis of large datasets, the identification of trends, and the modeling of the impact of various support scenarios on the development of the prosumer market. This, in turn, allows for improved public policy planning, the adjustment of market offerings, and the optimization of energy management at both the local and national levels.
It is also worth noting that Poland ranks fourth in the world in installed PV capacity per capita, with nearly twice the per capita PV capacity of China. These figures confirm that countries that previously lagged in the energy transition are now investing intensively in solar power. However, such dynamic development of the PV sector in Poland also brings challenges, such as grid connection refusals and administrative barriers, which may hinder further market growth [6].
In summary, at the end of 2023 prosumer micro-installations accounted for 66.3% of total PV capacity, a slight decline from a 75% share in 2022. These installations included not only household systems but also systems owned by enterprises and mounted on service facilities, commercial buildings, and religious buildings, operated by so-called autoproducers.
The installed capacity structure of PV systems in Poland at the end of 2023 consisted of the following (Figure 3):
Micro-installations (up to 50 kW), constituting the prosumer segment, with a total capacity of over 11.3 GW;
Small installations with capacities between 50 kW and 1 MW, totaling 4.1 GW;
Photovoltaic farms with capacities exceeding 1 MW, with a total capacity of 1.6 GW.
The total installed capacity of photovoltaic systems at the end of 2023 amounted to 17,057 MW.
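The segment shares implied by this structure can be recomputed directly from the listed values. A minimal Python sketch using only the figures above; because the micro-installation value is given as "over 11.3 GW", 11,300 MW is treated here as a lower bound:

```python
# Installed PV capacity in Poland at the end of 2023, by segment (MW), as listed in the text.
# The micro-installation figure is stated as "over 11.3 GW", so 11,300 MW is a lower bound.
segments_mw = {
    "micro (<= 50 kW)": 11_300,
    "small (50 kW - 1 MW)": 4_100,
    "farms (> 1 MW)": 1_600,
}
total_mw = 17_057  # reported total installed capacity

for name, mw in segments_mw.items():
    print(f"{name}: {mw / total_mw:.1%} of total")

# The rounded segment values do not add up exactly to the reported total.
print(f"sum of segments: {sum(segments_mw.values())} MW vs. reported {total_mw} MW")
```

The micro-installation share comes out at about 66.2%, a lower bound consistent with the 66.3% quoted earlier, and the rounded segment values sum to 17,000 MW, 57 MW short of the reported total.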
In light of the presented information on the dynamic development of the photovoltaic market in Poland, Europe, and worldwide, the use of modern analytical tools in managing this sector becomes particularly important. Such tools can support not only demand forecasting but also production planning, logistics, and infrastructure development. The integration of machine learning and artificial intelligence into management processes is becoming essential for the efficient growth of the photovoltaic sector and for mitigating investment risk.
1.2. The Importance of Data Analytics and Artificial Intelligence in the Renewable Energy Sector
In the digital era, where data constitutes the foundation of most economic sectors, the energy industry, and particularly the renewable energy segment, is increasingly dependent on advanced analytical tools and artificial intelligence technologies. The growing number of photovoltaic installations, the rise of prosumer energy models, and the proliferation of distributed generation have produced vast volumes of data, both in real time and as historical records. These data originate from diverse sources, including PV modules, inverters, monitoring systems, weather forecasts, Internet of Things (IoT) devices, grid management systems, and satellite and geospatial platforms.
A key application of AI in photovoltaics is energy production forecasting; related artificial neural network techniques can also be used to estimate the number of installed PV systems and the average capacity of individual installations. Advanced machine learning models can generate highly accurate forecasts of electricity generation based on meteorological parameters, seasonal variations, panel tilt angles, and shading effects. Such models facilitate improved balancing of energy systems, particularly in microgrids, PV farms, and smart grids. Under conditions of high solar irradiance variability, AI algorithms can not only predict production fluctuations but also respond to them in real time through dynamic load management, energy source switching, or energy storage control.
Consequently, data analytics serves not only as a tool supporting operational management of installations but also as a foundation for strategic planning, optimization, and automation of entire energy systems. Artificial intelligence acts as a catalyst in the energy transition process by enabling rapid and precise analysis of large datasets (Big Data), forecasting complex phenomena, and supporting decision-making under uncertainty.
An important application of AI is the condition monitoring of photovoltaic installations. Through analysis of data collected from sensors and SCADA systems, AI can detect operational irregularities, diagnose faults, and localize performance degradation caused by soiling, module aging, mechanical damage, or inverter malfunction. When combined with imaging technologies—such as thermal imaging from drones or industrial cameras—AI can automatically identify defective cells, hot spots, and other anomalies that are otherwise time-consuming and costly to detect using traditional methods.
AI is also extensively employed in energy storage management, a critical component of renewable energy systems. Decision-making algorithms analyze forecasts of production and demand, market prices, and grid conditions to determine optimal battery charge and discharge times, as well as the proportion of energy to inject into the grid versus retain for self-consumption. Such intelligent optimization enhances the economic viability of installations and improves overall system stability.
Another significant application concerns the planning and optimization of PV investments. AI leverages spatial data (GIS), land use information, slope, insolation, historical weather data, and local infrastructure constraints to identify optimal locations for new PV installations. It also models investment profitability by considering dynamic tariffs, energy prices, component costs, and evolving technology trends. These capabilities underpin solar potential mapping tools increasingly utilized by municipalities and energy developers.
Moreover, AI supports automatic management of energy networks with high shares of renewable sources by addressing generation variability and grid instability. Algorithms predict local overloads, analyze energy flows in real time, control power dispersion, and autonomously manage energy distribution and balancing within the grid.
AI applications extend beyond technical domains to include social sentiment analysis and regulatory risk assessment, for instance, through social media monitoring, public consultation analysis, and legislative trend evaluation. This enables modeling of social acceptance levels for renewable energy projects, a critical factor in regional and national infrastructure development.
Despite its immense potential, AI implementation in the renewable energy sector faces challenges. Data quality remains a primary limitation, as imprecise, incomplete, or inconsistent data can lead to inaccurate conclusions and suboptimal decisions. Additionally, successful AI deployment requires adequate infrastructure—including computational resources, data storage capabilities, and skilled analytical and technical personnel. Ethical and legal considerations also arise, notably regarding algorithmic transparency, auditability, data privacy, and accountability for decisions made by autonomous systems.
Nevertheless, AI continues to be a pivotal driver of renewable energy development. Combined with cloud computing, IoT, blockchain technology, and 5G networks, AI forms the backbone of the modern digital energy paradigm, in which intelligent data analysis is as critical as the physical energy infrastructure itself [7]. Within photovoltaics, this translates into better-designed, more reliable, efficient, and economically viable systems capable not only of producing energy but also of dynamically responding to the requirements of users, grids, and markets.
In this context, AI facilitates the creation of flexible, self-learning predictive models that integrate historical, meteorological, behavioral, and economic data to forecast future energy consumption at the system, regional, household, or individual site level. Machine learning enables the development of increasingly complex and accurate models that can accommodate both typical and unforeseen demand fluctuations. Demand forecasting in photovoltaics is intrinsically linked to production forecasting (estimating daily energy output from weather forecasts, seasonality, or geographic factors), and the two together enable integrated energy balancing. These approaches also extend to forecasting PV installation sales in specific regions, serving as decision-support tools for sales departments. Neural network architectures such as LSTM, ensemble methods such as XGBoost, and hybrid models combining multiple techniques can generate highly accurate short- and long-term forecasts, supporting better energy system management, optimized operation of energy storage and electric vehicle chargers, and flexible demand-side management and sales strategies.
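Sequential forecasters such as LSTM, as well as feedforward networks such as MLP, are commonly trained on fixed-length windows of past observations, each paired with the value that follows. The sketch below shows this supervised framing for a univariate series; the function name `make_windows`, the window length of 3, and the counts are hypothetical and do not come from the study's data:

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Split a univariate time series into (past `window` values, next value) training pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(series) - window):
        inputs.append(list(series[start:start + window]))  # lagged observations (model input)
        targets.append(series[start + window])              # the value that follows (target)
    return inputs, targets


# Hypothetical monthly counts of completed installations (illustrative only).
monthly_counts = [410, 455, 520, 610, 580, 495]
X, y = make_windows(monthly_counts, window=3)
for inp, out in zip(X, y):
    print(inp, "->", out)
```

A series of length n yields n - window pairs, so short business time series leave relatively few training samples. For an LSTM each window is fed as a sequence of time steps, whereas an MLP receives the same values as one flat input vector.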
As a result, AI has become an indispensable instrument for stakeholders across the energy value chain—from prosumer micro-installations, through commercial enterprises, to transmission system operators and policymakers responsible for energy governance.
The development of renewable energy sources, including photovoltaics, represents a fundamental pillar in the transition towards a sustainable and low-carbon energy system. The increasing deployment of PV installations and their growing share in the energy mix drive the demand for advanced methods of management, forecasting, and operational optimization. Within this context, artificial intelligence techniques, particularly machine learning, have gained significant attention due to their capability to efficiently process and analyze large volumes of data generated by PV systems.
Machine learning, as a subfield of AI, enables the construction of models that learn autonomously from input data without explicit programming of decision rules. In recent years, ML methods have been widely applied across various facets of PV system operation, including short-term energy yield forecasting, fault detection and anomaly diagnosis, maximum power point tracking (MPPT), and energy storage management.
The integration of ML in photovoltaics not only enhances the energy efficiency and reliability of PV systems but also facilitates improved grid integration and operational optimization under variable weather conditions and dynamic load profiles. Advances in ML algorithms—such as artificial neural networks (ANNs), convolutional neural networks (CNNs), long short-term memory (LSTM) networks, ensemble methods (e.g., XGBoost, random forest), reinforcement learning (RL), and automated machine learning (AutoML)—offer novel opportunities for automation and intelligent energy management.
Among the most extensively researched and implemented ML applications in PV is the forecasting of electricity generation. Accurate power forecasting is critical for efficient grid management, supply–demand balancing, and optimization of energy storage systems. Especially in decentralized renewable energy systems, short-term forecasts (ranging from minutes to several hours) reduce energy losses and enhance supply stability.
To elucidate the potential and diversity of AI and ML applications in the PV sector, a comprehensive review of recent studies is indispensable. Current research highlights the predominance of deep learning algorithms, notably LSTM and other recurrent neural networks and CNNs, alongside ensemble learning techniques such as XGBoost and random forest. For instance, in [8], a deep extreme learning machine (DELM) model coupled with a novel ECBO-VMD (enhanced chaotic bat optimization–variational mode decomposition) signal decomposition method produced accurate PV production forecasts over a 15 min to 4 h horizon, with significantly shorter training times than traditional deep learning models.
AutoML-based frameworks for automatic model and feature selection, as presented in [9], further advance forecasting capabilities. The authors developed a multi-model predictive system combining ElasticNet regression, gradient boosting, and random forest algorithms, with input variables optimized by a genetic algorithm. This approach was successfully validated across diverse Japanese regions and proved robust even with limited historical data.
In [10], the fusion of LSTM networks with wavelet packet decomposition enabled the capture of both local and global temporal features in PV generation data. Another promising strategy, described in [11], employed transfer learning to carry knowledge from solar irradiance time series over to PV power generation, which is particularly advantageous at new locations with scarce data.
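Wavelet packet decomposition differs from the plain discrete wavelet transform in that both the approximation and the detail branches are split again at every level, giving a full tree of frequency sub-bands. A minimal pure-Python sketch with the Haar wavelet follows; the choice of the Haar wavelet, the helper names, and the toy signal are our assumptions, and the cited work may use a different mother wavelet and depth:

```python
import math
from typing import Dict, List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail) coefficients."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_packet(x: Sequence[float], levels: int) -> Dict[str, List[float]]:
    """Full wavelet packet tree: unlike the plain DWT, BOTH branches are split at each level.
    Labels use 'a' (approximation) and 'd' (detail), e.g. 'ad' = detail of the approximation."""
    nodes: Dict[str, List[float]] = {"": list(x)}
    for _ in range(levels):
        next_nodes: Dict[str, List[float]] = {}
        for label, coeffs in nodes.items():
            a, d = haar_step(coeffs)
            next_nodes[label + "a"] = a
            next_nodes[label + "d"] = d
        nodes = next_nodes
    return nodes


signal = [2.0, 4.0, 6.0, 8.0, 8.0, 6.0, 4.0, 2.0]  # toy daily-like generation profile
leaves = haar_packet(signal, levels=2)
for label, coeffs in sorted(leaves.items()):
    print(label, [round(c, 3) for c in coeffs])
```

Because the Haar transform is orthonormal, the total energy of the leaf coefficients equals that of the input signal. In a hybrid setup, each sub-band series could be forecast separately or supplied as additional input features to an LSTM.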
Probabilistic forecasting methods have attracted growing interest because they provide not only point estimates but also uncertainty quantification. The quantile CNN model proposed in [12] offers interval forecasts together with reliability metrics such as the prediction interval normalized average width (PINAW) and the prediction interval coverage probability (PICP), supporting improved risk management in renewable energy systems. Other studies [13,14] implemented Gaussian process regression (GPR) and LSTM models with bootstrap-derived confidence intervals to similar effect.
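The two interval metrics named above can be computed in a few lines. The sketch below assumes the commonly used definitions: PICP is the share of observations falling inside their predicted interval, and PINAW is the mean interval width divided by the range of the observed target. The observations and intervals are hypothetical:

```python
from typing import Sequence


def picp(y: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Prediction interval coverage probability: share of observations inside [lower, upper]."""
    hits = sum(1 for yi, lo, hi in zip(y, lower, upper) if lo <= yi <= hi)
    return hits / len(y)


def pinaw(y: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Prediction interval normalized average width: mean width divided by the target range."""
    target_range = max(y) - min(y)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return mean_width / target_range


# Hypothetical observations and interval forecasts (illustrative numbers only).
observed = [3.0, 5.0, 4.0, 8.0, 6.0]
lower = [2.0, 4.5, 3.0, 6.0, 6.5]
upper = [4.0, 6.0, 5.0, 7.5, 8.0]
print(f"PICP  = {picp(observed, lower, upper):.2f}")
print(f"PINAW = {pinaw(observed, lower, upper):.3f}")
```

Here two of five observations fall outside their intervals, so PICP is 0.60. A well-calibrated model keeps PICP close to the nominal coverage level while keeping PINAW as small as possible; the two metrics therefore have to be read together.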
Furthermore, ML solutions optimized for deployment in resource-constrained environments are critical for practical adoption. The edge computing system described in [15], running on a Raspberry Pi platform, combines LightGBM with temporal pattern optimization and weather data clustering via tree-structured self-organizing maps (TS-SOM), achieving a favorable balance between accuracy and computational efficiency.
Beyond forecasting, ML techniques are extensively applied in fault detection, anomaly diagnostics, MPPT tracking, and PV system design and management optimization. The inherent variability of environmental conditions, process nonlinearity, and evolving data distributions constitute challenges well-suited for ML approaches, underscoring their relevance in advancing PV technology and grid integration.
One of the key practical applications of machine learning in photovoltaic systems is fault and anomaly detection. Owing to the large number of measurement sensors and the availability of thermographic (IR) and electroluminescence (EL) imaging, PV systems generate data that can be analyzed for early detection of defects such as shading, hot spots, cracks, soiling, and module degradation. The authors of the review [16] presented a model based on support vector machines (SVMs) that uses 41 features extracted from IR images to classify modules as healthy, affected by harmless hot spots, or genuinely faulty. In [17], a long short-term memory model supported by the discrete wavelet transform (DWT) was used to detect various types of faults in PV systems, outperforming classical algorithms such as SVMs and decision trees. An interesting approach was applied in [18], where one-dimensional measurement data were transformed into two-dimensional scalogram images and then analyzed with a pretrained AlexNet network. Transfer learning enabled the model to classify multiple fault types with high accuracy, including arc faults, partial shading, and open and short circuits.
It is important to note that many PV installations, particularly industrial ones, have few labeled fault cases. Researchers therefore increasingly adopt unsupervised learning and clustering techniques. For instance, in [19], an anomaly detector based on an auto-Gaussian mixture model (Auto-GMM) was proposed to identify deviations from normal operation in measurement data, after which fault types were classified by an XGBoost classifier using frequency-domain features (Fourier spectrum). The authors emphasize the importance of the local anomaly index (LAI) for precise fault identification.
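Frequency-domain features of the kind mentioned above can be obtained from the magnitudes of the discrete Fourier transform of a measurement window. The sketch below covers only this feature-extraction step; the Auto-GMM detector and the XGBoost classifier are not reproduced, and the helper name, window length, and number of bins are hypothetical. A naive O(N²) transform keeps the example self-contained; in practice an FFT routine would be used:

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(x: Sequence[float], n_bins: int) -> List[float]:
    """Normalized magnitudes of the first `n_bins` DFT coefficients (naive O(N^2) transform)."""
    n = len(x)
    mags: List[float] = []
    for k in range(n_bins):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(coeff) / n)  # divide by the window length
    return mags


# Toy measurement window: a DC offset plus a sinusoid completing 4 cycles in 32 samples.
window = [1.0 + math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
features = dft_magnitudes(window, n_bins=8)
print([round(f, 3) for f in features])
```

Bin 0 captures the mean level and bin 4 the periodic component, so a feature vector like this one can separate, for example, steady operation from oscillatory fault signatures before it is passed to a classifier.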
Another significant ML application in photovoltaics is maximum power point tracking (MPPT). Under partial shading, variable temperature, or module soiling, the power-voltage (P-V) characteristic of a PV system may exhibit multiple local maxima, only one of which is the global maximum power point (GMPP). Traditional tracking algorithms such as perturb and observe or incremental conductance may become trapped at a local maximum under these circumstances, creating opportunities for intelligent adaptive algorithms. A comprehensive review [20] presents a wide range of ML-based approaches, including artificial neural networks (ANNs), fuzzy logic control (FLC), particle swarm optimization (PSO), genetic algorithms (GAs), and, increasingly, reinforcement learning (RL). RL methods adapt dynamically to variable PV system operating conditions by learning control strategies from feedback (rewards) without requiring an explicit physical model of the system. These approaches have proved particularly effective in unstable and unpredictable environments and can be implemented in embedded systems with limited computational resources.
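The classical perturb and observe (P&O) algorithm referred to above is a simple hill climber: it keeps perturbing the operating voltage in the same direction while power rises and reverses direction when power falls. The sketch below runs it on a toy single-peak P-V curve; the current model and its parameters (`ISC`, `VOC`, `A`) are hypothetical and are not a calibrated module model:

```python
import math
from typing import Callable

# Hypothetical single-peak PV characteristic (toy parameters, not a calibrated module model).
ISC = 8.0   # short-circuit current [A]
VOC = 36.0  # open-circuit voltage [V]
A = 2.0     # sharpness of the I-V "knee" [V]


def pv_power(v: float) -> float:
    """Power P = V * I(V) for a simplified exponential I-V characteristic."""
    current = ISC * (1.0 - math.exp((v - VOC) / A))
    return v * current


def perturb_and_observe(power: Callable[[float], float], v_start: float,
                        step: float, iterations: int) -> float:
    """Classical P&O: keep the perturbation direction while power rises,
    reverse it whenever the measured power falls."""
    v, p_prev, direction = v_start, power(v_start), 1.0
    for _ in range(iterations):
        v_new = v + direction * step
        p_new = power(v_new)
        if p_new < p_prev:
            direction = -direction  # power dropped: we overshot the peak, turn back
        v, p_prev = v_new, p_new
    return v


v_op = perturb_and_observe(pv_power, v_start=20.0, step=0.2, iterations=200)
print(f"operating point ~ {v_op:.2f} V, {pv_power(v_op):.1f} W")
```

On this single-peak curve the operating point ends up oscillating within about one step of the true maximum. On a partially shaded curve with several peaks, the same logic stops at whichever local maximum it reaches first, which is the limitation that motivates the ML and RL trackers discussed above.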
In recent work [21], a hybrid machine learning and bootstrap approach was applied to probabilistic demand forecasting in the Southeast region of the Mexican power system in 2024. The method provides not only accurate point forecasts but also well-calibrated prediction intervals, explicitly addressing the uncertainty inherent in energy demand data. Compared with deterministic approaches, such probabilistic forecasting has been shown to deliver more reliable information for system operators and decision-makers, especially under conditions of variability and market risk.
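A bootstrap prediction interval can be built around any point forecaster. The sketch below shows one simple variant, a residual bootstrap: in-sample residuals are resampled with replacement, added to the point forecast, and the empirical quantiles are taken. This is an illustration of the general idea only; the cited hybrid method may differ (for example, by refitting models on bootstrap resamples), and the residuals and forecast are hypothetical:

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_interval(point_forecast: float, residuals: Sequence[float],
                       alpha: float = 0.1, n_boot: int = 2000,
                       seed: int = 0) -> Tuple[float, float]:
    """Approximate (1 - alpha) prediction interval: resample in-sample residuals with
    replacement, add them to the point forecast, and take empirical quantiles."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    simulated: List[float] = sorted(point_forecast + rng.choice(residuals) for _ in range(n_boot))
    lower = simulated[int((alpha / 2) * (n_boot - 1))]
    upper = simulated[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Hypothetical residuals (actual minus fitted) of a demand model and a next-period forecast.
residuals = [-12.0, -7.0, -3.0, -1.0, 0.0, 2.0, 4.0, 6.0, 9.0, 15.0]
low, high = bootstrap_interval(500.0, residuals)
print(f"90% interval: [{low:.1f}, {high:.1f}]")
```

The interval inherits the asymmetry of the residuals, which is one practical advantage over assuming normally distributed errors; its reliability, however, depends on having enough representative residuals.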
In a recent study [22], the authors developed an optimized LSTM model for forecasting short-term electricity consumption under dynamic pricing and demand response schemes in smart grids. The approach emphasizes combining forecasting accuracy with practical application in flexible tariff systems, where precise demand forecasts are essential for pricing and load-shifting strategies. The results show that the optimized LSTM model significantly reduces forecasting errors compared with conventional statistical techniques and alternative neural network models, thereby improving the efficiency of load management and system balancing. The study also highlights the potential of such forecasting methods to support energy suppliers and system operators in their decision-making under dynamically changing market conditions.
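LSTM-based forecasters such as those discussed here rest on a gated memory cell. The sketch below implements the standard LSTM cell equations for a single scalar unit: forget gate f, input gate i, output gate o, candidate g, cell state update c = f·c + i·g, and hidden output h = o·tanh(c). The hand-picked weights are hypothetical and untrained. Real models use vector-valued states, learn their weights by backpropagation through time in a deep learning framework, and the optimized architecture of [22] is not reproduced here:

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class GateParams:
    w: float  # weight on the current input x_t
    u: float  # weight on the previous hidden state h_{t-1}
    b: float  # bias

    def pre(self, x: float, h: float) -> float:
        """Gate pre-activation w * x_t + u * h_{t-1} + b."""
        return self.w * x + self.u * h + self.b


@dataclass
class ScalarLSTMCell:
    forget: GateParams
    inp: GateParams
    out: GateParams
    cand: GateParams

    def step(self, x: float, h: float, c: float) -> Tuple[float, float]:
        f = sigmoid(self.forget.pre(x, h))  # how much of the old cell state to keep
        i = sigmoid(self.inp.pre(x, h))     # how much new information to write
        o = sigmoid(self.out.pre(x, h))     # how much of the cell state to expose
        g = math.tanh(self.cand.pre(x, h))  # candidate update
        c_new = f * c + i * g
        h_new = o * math.tanh(c_new)
        return h_new, c_new

    def run(self, xs: Sequence[float]) -> Tuple[float, float]:
        """Process a sequence from a zero initial state; return the final (h, c)."""
        h, c = 0.0, 0.0
        for x in xs:
            h, c = self.step(x, h, c)
        return h, c


# Hand-picked, untrained (hypothetical) parameters, for illustration only.
cell = ScalarLSTMCell(
    forget=GateParams(w=0.0, u=0.0, b=3.0),  # forget gate close to 1: long memory
    inp=GateParams(w=1.0, u=0.5, b=0.0),
    out=GateParams(w=0.5, u=0.5, b=0.0),
    cand=GateParams(w=1.0, u=0.0, b=0.0),
)
h_last, c_last = cell.run([0.2, 0.5, -0.1, 0.8])  # e.g. a scaled input series
print(f"h = {h_last:.4f}, c = {c_last:.4f}")
```

The forget gate is what allows information from earlier time steps to persist in c, which is why LSTMs suit series with seasonal or lagged effects better than models that see each input independently.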
The examples above illustrate the broad and diverse applications of ML in PV systems. Machine learning not only enables more accurate energy production forecasting but also automates fault diagnosis, improves system efficiency through adaptive MPPT, and supports decision-making under uncertainty. ML therefore has significant potential to become one of the pillars of modern intelligent energy systems based on photovoltaics. Fully realizing this potential, however, requires further methodological advances, access to high-quality data, and close collaboration between academia, industry, and the public sector. Such an integrated approach also helps to optimize the operational costs of PV solutions. To provide a clearer overview of the use of AI methods in photovoltaic systems, a comparative table summarizing the most important issues and contributions of the selected studies was prepared (Table 1).
Based on the comparative table, and as will be described in detail in the following sections, the modeling approach presented in this study distinguishes itself from previous studies primarily in terms of data coverage and application area. Most previous work focused on forecasting photovoltaic energy production using meteorological and generation data. In contrast, our study focuses on the installation market, forecasting the number of completed micro-installations and their average unit power. The analysis is based on a unique, confidential dataset from one of the largest photovoltaic installers in Poland, covering over 12,000 micro-installations implemented over 24 months. This approach enables direct support for sales planning, logistics, and resource management within the company, rather than simply estimating energy production. The contribution of this study is therefore (i) the application of ML models in a new context—forecasting market demand rather than energy production; (ii) the use of a unique business dataset; and (iii) providing empirical evidence for the superiority of sequential modeling (LSTM) over conventional modeling (MLP) in this scenario.
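Comparing LSTM and MLP forecasts requires a common set of error measures computed on the same hold-out periods. The sketch below assumes three widely used point-error metrics, MAE, RMSE and MAPE; this metric set and all numbers are hypothetical illustrations, not the metrics or results reported in this study:

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and MAPE (in %) for two equally long series; MAPE assumes no zero actuals."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical hold-out periods and forecasts (NOT results from this study).
actual = [500, 520, 480, 610]
lstm_forecast = [510, 515, 470, 600]
mlp_forecast = [540, 490, 450, 650]
for name, pred in [("LSTM", lstm_forecast), ("MLP", mlp_forecast)]:
    metrics = forecast_errors(actual, pred)
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

RMSE penalizes occasional large misses more heavily than MAE, while MAPE expresses errors relative to the level of demand; reporting them together gives a fuller picture than any single metric when ranking competing models.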
Given the increasing importance of PV as an energy source for households, enterprises, and entire regions, economic and operational analyses are gaining particular significance in determining the viability of PV installation deployment and operation. Evaluations encompassing both investment and operational perspectives form the basis for rational decision-making in the renewable energy sector. Automation of processes, improved energy production forecasts, and early fault detection contribute to reducing operational losses, lowering maintenance costs, and enhancing energy consumption and sales planning. Consequently, ML-based technologies provide tangible support in achieving economic and managerial benefits.
It should be noted that explainability (XAI) techniques, which have become an important trend in recent forecasting studies, were not applied in the present work. The main objective of this study was to evaluate and compare forecasting performance between LSTM and MLP models using real-world company-level data. Nevertheless, incorporating explainability into future analyses is considered a promising extension of this research.