1. Introduction
The global demand for energy has continued to rise significantly in recent years. The excessive exploitation and consumption of fossil fuels have resulted in substantial air pollution and other environmental issues [1]. With the intensifying global energy crisis, renewable energy sources such as photovoltaic (PV) power generation, wind energy, and hydropower have garnered worldwide attention for their low-carbon, clean characteristics, which help mitigate environmental pollution [2]. Among these, PV power generation, as one of the most widely adopted renewable energy technologies, plays a crucial role in ensuring the continuous, stable, and economical operation of power systems [3]. According to statistics, the global renewable energy capacity increased by 260 GW in 2020, with solar photovoltaics accounting for nearly half of this expansion [4].
The accurate forecasting of PV power output enables smart grids to efficiently manage and integrate solar energy generation [5]. PV power forecasting in smart grids can be practically applied from four perspectives. First, energy management enables grid operators to plan and adjust power supply to reliably meet demand. By accurately predicting renewable energy generation, operators can dynamically regulate electricity supply in real time, reduce the risk of blackouts, and enhance grid reliability. Second, long-term planning for future capacity requirements supports informed investment decisions in grid infrastructure. Third, forecasting assists in energy market operations by helping to set prices and facilitating renewable energy credit trading. Finally, risk management involves identifying potential supply disruptions and taking appropriate measures to minimize their impact [6]. However, PV power production is highly susceptible to weather conditions and other environmental factors, exhibiting inherent intermittency, volatility, and randomness. These characteristics pose considerable challenges to achieving high-precision PV power prediction and smart control.
Numerous researchers have been actively engaged in the study of PV power forecasting. PV power forecasting methods can be generally categorized into physical models, statistical learning methods, and machine learning models [7]. Physical models establish mathematical relationships between PV power output and solar radiation, typically calculated using numerical weather prediction or satellite-derived models [8]. These approaches require detailed considerations such as PV system location, panel tilt angle and orientation, and weather conditions. However, the effectiveness of physical methods is often constrained by model complexity and computational burden, particularly for high-precision modeling [9]. Statistical methods establish mathematical models by extracting patterns of variation from historical data. Compared with physical models, these approaches are generally more suitable for short-term forecasting [10]. Existing statistics-based time series methods have been applied to capture correlations in PV power curves, including exponential smoothing [11], ARMA [12], ARIMA [13], and SARIMA [14]. Although these methods are computationally efficient, their simple structures limit their ability to model complex nonlinear relationships.
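As an illustration of how simple these statistical models are, the following is a minimal sketch of simple exponential smoothing, the most basic member of the family listed above. The function name, the smoothing factor `alpha = 0.5`, and the sample readings are assumptions chosen for illustration and are not taken from the cited works or from the datasets used in this paper.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        return []
    levels = [float(series[0])]  # initialise the level with the first observation
    for x in series[1:]:
        levels.append(alpha * x + (1.0 - alpha) * levels[-1])
    return levels


# Hypothetical PV power readings (kW); illustrative values only.
pv_power = [0.0, 1.0, 3.0, 4.0, 2.0]
levels = simple_exponential_smoothing(pv_power, alpha=0.5)
one_step_forecast = levels[-1]  # the forecast for the next step is the last level
```

Because every forecast is a fixed weighted average of past observations, a model of this kind is cheap to fit but cannot represent nonlinear interactions between power output and weather variables.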
With the rapid advancement of machine learning and deep learning technologies, statistical methods have been gradually supplanted by deep learning approaches. In contrast to traditional methods, deep learning methods exhibit stronger nonlinear modeling capabilities and adaptability. They are better equipped to handle complex time series data [15]. The mainstream methods for PV power prediction can be classified into RNN-based and CNN-based models. For instance, LSTM and BiLSTM, two improved versions derived from RNN, are employed to extract the inherent temporal relationships within PV sequences [16]. Wang et al. [17] enhanced the LSTM model for PV power prediction by incorporating a frequency domain decomposition method, and this approach demonstrated superior prediction performance. Agga et al. [18] proposed CNN-LSTM and ConvLSTM models to forecast the power generation of PV power plants. The results indicated that both CNN-LSTM and ConvLSTM outperformed the LSTM model. Additionally, TCN, a convolutional architecture designed for sequential modeling, has shown excellent performance in PV power generation prediction and has been reported to outperform deep learning models such as CNN and LSTM [19]. Xiang et al. [20] introduced a hybrid model combining TCN and LSTM and found that it was more effective in capturing the complex long-term dependencies of spatial and temporal features compared with the CNN-LSTM model. Previous studies on PV power generation prediction using TCN typically involved constructing models by integrating shallow TCN with LSTM. Nevertheless, shallow TCN might not be able to capture the latent temporal patterns and fine-grained temporal information in PV sequences [21]. Moreover, previous methods often relied on a single neural network to extract the spatial relationships among variables, overlooking the fact that the spatial and temporal relationships in multivariate time series are intertwined.
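The limitation of shallow TCNs noted above can be made concrete through the receptive field, that is, the number of past time steps that can influence one output value. The sketch below assumes a stack with one dilated causal convolution per layer; the kernel size and dilation schedules are illustrative choices, not the configuration used in this paper or in the cited works.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal convolutions (one conv per dilation)."""
    return 1 + (kernel_size - 1) * sum(dilations)


def dilated_causal_conv(series, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - j * dilation], treating x as zero before t = 0."""
    out = []
    for t in range(len(series)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:  # causal: only the current and past samples contribute
                total += w * series[idx]
        out.append(total)
    return out


# Illustrative comparison of a 2-layer and a 5-layer stack with kernel size 3.
shallow = receptive_field(kernel_size=3, dilations=[1, 2])
deeper = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8, 16])
```

Under these assumptions the two-layer stack sees 7 time steps and the five-layer stack sees 63, which suggests why a shallow TCN may miss patterns that span long stretches of a PV sequence.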
Different forms of neural networks such as CNN [22], RNN [23], and Transformer [24] have been applied in photovoltaic power generation prediction. These neural networks have shown significant advantages in modeling real-world time series data. However, one of the major limitations of the above methods is that they do not model the hidden spatial relationships between time series [25]. Photovoltaic power generation is influenced by environmental factors such as solar irradiance, temperature, humidity, and wind speed, and in multivariate PV sequences these factors interact with each other and change over time [26].
A graph is a data structure that can naturally model complex relationships among a set of entities in real-world scenarios. In practice, many types of data inherently exhibit graph-like properties, such as social networks and e-commerce user–item interactions. Numerous time series are likewise spatiotemporally correlated in nature [27]. For such time series, modeling them in the form of networks or graphs can effectively leverage both the data itself and its spatial dependencies to improve the forecasting accuracy. In recent years, graph neural networks (GNNs) have emerged as a powerful tool for modeling real-world time series data, capable of capturing complex intervariable and temporal relationships. This approach has gained widespread attention and application in the fields of traffic prediction [28] and PV forecasting. Hasnat et al. [29] proposed a graph attention network (GAT)-based solar power forecasting framework constructed according to geographical distances. The framework adapts to prediction horizons ranging from several minutes to multiple days by adjusting individual modules within the architecture. GNNs have been widely employed in forecasting for distributed PV power stations, where graph structures are used to represent the relationships among distributed sites. Wang et al. [30] developed a domain-adversarial graph neural network-based method for ultra-short-term distributed PV power forecasting, addressing the challenge of data scarcity that arises in virtual power plants due to newly constructed sites or data-sharing limitations. Wang et al. [31] further proposed a dynamic graph network for ultra-short-term distributed PV power forecasting based on a shape–amplitude loss function. In this approach, dynamic graphical data are used to represent interstation correlations, and a dynamic graph network is constructed as the forecasting model. Lin et al. [32] introduced an end-to-end deep learning model for the short-term probabilistic forecasting of regional PV generation. The model employed a directed graph-based dynamic spatial convolutional graph neural network, in which multi-source inputs are used to determine the contribution of one PV station to another. Wang et al. [33] also proposed a domain-adversarial graph neural network approach that utilized a GNN encoder to extract spatial features and capture inter-site spatial correlations, thereby improving ultra-short-term distributed PV forecasting under data-scarce conditions. GNNs have become powerful tools for learning non-Euclidean data representations [34], providing new ideas for modeling real-world time series data and capturing the relationships between different variables in multivariable sequences. Combining GNNs with existing time series frameworks is expected to further improve model performance [25]. Han et al. [35] combined the attention mechanism with an adaptive graph neural network to achieve accurate building energy consumption forecasting and optimize energy structure design. Gao et al. [36] proposed an attention-driven spatiotemporal hybrid model that integrated multi-graph structures and attention-based feature fusion to enhance both single-site and multi-site PV power forecasting performance.
This paper proposes a PV power generation prediction model based on an adaptive GNN, which takes environmental factors into account and feeds them into the model together with the PV power generation data. In addition, unlike most existing works, which focus on short-term forecasting, this work shows a significant improvement in long-term forecasting results, which is more important for real-world PV applications. The main contributions of this work are summarized as follows:
- (1) A customized graph neural network (GNN) architecture was designed to model the hidden relationship between photovoltaic power generation and environmental factors. The proposed model is a structure where TCN and MLP layers alternate with graph neural network layers, which is conducive to capturing the coupled spatiotemporal features in the data while paying attention to both the global change patterns and local trends of photovoltaic power generation.
- (2) An adaptive graph neural network was used to learn the latent variable relationships from the data. Compared with related works, using directed graphs in long-span prediction tasks can better model the interrelationships between variables in real scenarios, thereby improving the prediction accuracy of the model.
- (3) The proposed method was applied to the real-world photovoltaic power generation prediction of three photovoltaic power sites. In the prediction tasks of the three stations, the proposed model achieved the highest prediction accuracy at prediction steps of 384 and 768, demonstrating good robustness and significant superiority in capturing the peaks, troughs, and fluctuations in long-term photovoltaic power generation.
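To illustrate what learning a directed graph over variables can look like, the sketch below builds a sparse directed adjacency matrix from two node-embedding tables, a formulation used in parts of the multivariate time series GNN literature. It is an assumption made for illustration and not necessarily the layer used in this paper: the function names, the embedding values, the scaling factor `alpha`, and the neighbour count `k` are all hypothetical. In a trained model the embeddings would be learnable parameters rather than fixed numbers.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def adaptive_directed_adjacency(src_emb, dst_emb, k, alpha=1.0):
    """Build a sparse directed adjacency matrix from two node-embedding tables.

    score[i][j] = relu(tanh(alpha * (src_i . dst_j - src_j . dst_i)))
    The difference is antisymmetric, so at most one of the edges (i -> j) and
    (j -> i) survives the ReLU, which makes the graph directed. Each row then
    keeps only its k largest entries.
    """
    n = len(src_emb)
    adj = []
    for i in range(n):
        row = []
        for j in range(n):
            diff = dot(src_emb[i], dst_emb[j]) - dot(src_emb[j], dst_emb[i])
            row.append(max(0.0, math.tanh(alpha * diff)))
        # indices of the k strongest candidate neighbours in this row
        keep = set(sorted(range(n), key=lambda j: row[j], reverse=True)[:k])
        adj.append([row[j] if j in keep else 0.0 for j in range(n)])
    return adj


# Hypothetical variables: PV power, irradiance, temperature, humidity.
src = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.3]]
dst = [[0.3, 0.7], [0.6, 0.4], [0.2, 0.9], [0.8, 0.1]]
adj = adaptive_directed_adjacency(src, dst, k=2)
```

In this construction an edge from variable i to variable j excludes the reverse edge, so the matrix can express one-way influence (for example, irradiance driving power output) that a symmetric, undirected graph cannot.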
5. Conclusions
This study presented a long-term photovoltaic (PV) power forecasting model based on customized graph neural networks (GNNs) designed to capture complex spatiotemporal dependencies among multiple variables and to leverage environmental information for enhanced predictive accuracy. The model’s performance was evaluated using MSE and MAE across datasets from three PV power sites. The main experiments and findings are as follows:
- (1) Superior accuracy and robustness: Compared with baseline models, the proposed model achieved the highest accuracy and demonstrated stronger robustness in forecasting horizons of 384 and 768 steps. It improved the MSE and MAE by an average of 2.19% and 1.57% at the 384-step horizon, and 2.81% and 2.47% at the 768-step horizon, respectively, relative to the best-performing baseline. Furthermore, models that captured hidden inter-variable relationships consistently outperformed those focusing solely on temporal patterns or spatial relationships.
- (2) Enhanced long-term predictive capability: To more intuitively demonstrate the model’s predictive performance in long-term photovoltaic (PV) power forecasting, the predicted and true power curves for the 76-step horizon were visualized. The proposed model showed significant advantages in capturing the peak, trough, and fluctuation patterns compared with all baseline models, achieving superior fitting performance.
- (3) Impact of correlation coefficient k: The correlation coefficient k determines the number of relevant variables used for prediction, and an optimal value of k exists. The model achieved its best performance when k = 4, particularly for longer forecasting horizons of 384 and 768 steps. An excessive number of relevant variables introduced informational noise, while too few variables led to the insufficient utilization of contextual information. Both extremes resulted in decreased prediction accuracy.
- (4) Impact of graph structure: The type of graph structure also notably affects the forecasting performance. Incorporating graph information improved the accuracy in longer-horizon predictions. Compared with using an undirected graph, the proposed model with a directed graph achieved the best results at 384 and 768 steps, improving the MSE and MAE by 0.67% and 0.99% at the 384-step horizon and 1.12% and 0.58% at the 768-step horizon, respectively.
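For reference, the two error metrics and a percentage improvement of the kind quoted in the list above can be computed as in the following minimal sketch. It assumes that the improvement is the relative error reduction against a baseline, which this section does not state explicitly; the function names and sample values are hypothetical and are not results from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical normalised power values; illustrative only.
y_true = [0.0, 2.0, 4.0, 2.0]
baseline_pred = [1.0, 1.0, 3.0, 3.0]
model_pred = [0.5, 1.5, 3.5, 2.5]

mse_gain = relative_improvement(mse(y_true, baseline_pred), mse(y_true, model_pred))
mae_gain = relative_improvement(mae(y_true, baseline_pred), mae(y_true, model_pred))
```

Because MSE squares each error, it penalises large misses at peaks and troughs more heavily than MAE does, so the two metrics can report different improvement percentages for the same pair of models.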
The proposed GNN-based framework integrates environmental information to address the challenge of accuracy degradation in long-term PV power forecasting, demonstrating strong robustness, adaptability, and scalability in complex and dynamic environments.
Despite the promising performance of the proposed model, certain limitations remain. In particular, the adjacency matrix employed by the adaptive graph neural network is dynamic and may lack sufficient stability, limiting its ability to fully capture and explain the relationships between PV power generation and the surrounding environmental factors. In future work, we will further explore the interpretability of graph-based models. These efforts aim to facilitate optimal PV installation planning by comprehensively incorporating environmental information, thereby maximizing the power generation efficiency.