Article

Ultra-Short-Term Power Prediction for Distributed Photovoltaics Based on Time-Series LLMs

1 State Grid Energy Research Institute Co., Ltd., Beijing 102209, China
2 School of Economics and Management, North China Electric Power University, Beijing 102206, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(22), 4519; https://doi.org/10.3390/electronics14224519
Submission received: 9 October 2025 / Revised: 10 November 2025 / Accepted: 12 November 2025 / Published: 19 November 2025

Abstract

Distributed photovoltaic (PV) power generation is volatile and intermittent, and its output is therefore difficult to predict accurately. Previous studies have focused on physical or mathematical modeling methods, which struggle to capture the complexity and variability of historical data, so their prediction accuracy is limited. To address these challenges, this paper proposes Solar-LLM, a novel prediction framework that adapts a pre-trained Large Language Model (LLM) for time-series forecasting. By freezing the core LLM and reprogramming only its input and output layers, Solar-LLM efficiently translates numerical time-series data into a format the model can understand. This approach leverages the LLM’s inherent ability to capture long-term dependencies and complex patterns, enabling effective learning even from limited data. Experiments conducted on a dataset from five photovoltaic power stations show that Solar-LLM significantly outperforms baseline models, proving it to be a highly effective and feasible solution for distributed PV power prediction.

1. Introduction

Distributed photovoltaic (PV) power generation is becoming increasingly popular as a clean and sustainable energy source, driven by the changing global energy landscape and the rapid development of renewable energy. However, the intermittent and erratic nature of PV generation poses difficulties for the stable operation of the power grid. Reducing operating costs, improving energy consumption efficiency, and optimizing grid scheduling all depend on accurate PV power forecasting.
Current research on distributed PV power prediction typically uses physical models or deep learning techniques. Because physical modeling often fails to match actual operating conditions, given the multitude of factors that affect distributed PV power, most studies adopt a data-driven approach based on machine learning algorithms. Sun et al. [1] built a PV prediction model based on BP neural networks and used sensors to gather data from PV cells and the surrounding environment, but the prediction was made for only one site, with an EMAPE of 6.96% on sunny days. Wang et al. [2] proposed a hybrid prediction approach for distributed PV power that accounts for the time shift in meteorological data and uses an ensemble learning framework to create a hybrid mechanistic-data-driven model, effectively reducing the prediction error caused by geometric and meteorological offsets. Wang et al. [3] screened source-domain data with high similarity using instance transfer learning to produce accurate PV power predictions under small-sample conditions; validation on datasets from different time periods at two sites achieved an optimal EMAE of 7.28%. Golestaneh et al. [4] proposed a nonparametric method for ultra-short-term probabilistic prediction using the extreme learning machine; it achieved an average RMSE of 12.8% across four seasons, demonstrating strong accuracy, but it cannot incorporate auxiliary data such as cloud maps. With the advancement of computer science, some studies have improved forecasting by utilizing deep learning methods such as Long Short-Term Memory (LSTM) networks for time-series forecasting. Li et al. [5] combined weather information with deep learning: fuzzy weather information in the dataset was clustered, an LSTM network was then used for distributed PV power prediction, and the effectiveness of the method was verified under a variety of typical meteorological conditions. Wang et al. [6] improved the LSTM model by adding a Dropout layer and established a neural network structure that comprehensively considers spatio-temporal correlation characteristics to perform distributed PV power interval prediction. Zhang et al. [7] considered the spatio-temporal coupling characteristics of distributed PV output, constructed spatial feature extraction based on a graph attention network, and, combined with meteorological information, built an iDGA-LSTM probabilistic prediction model with improved quantile regression, which improved the accuracy of probabilistic interval prediction. Khan et al. [8] used an improved generalized stacking ensemble algorithm combined with an ANN-LSTM model and integrated the base models’ predictions with extreme gradient boosting (XGBoost), effectively improving prediction accuracy. Ye et al. [9] proposed a prediction model that selects, as the similar day, the historical day with the smallest Euclidean distance to the day to be predicted in terms of weather type and other factors, combining a genetic algorithm with a fuzzy radial basis function neural network. Nastić et al. [10] proposed an optimized CatBoost model using PVGIS simulations and Open-Meteo data for hourly PV forecasting at newly commissioned stations with limited historical data; with SFS and Optuna tuning on five inputs, it achieved an R2 of 0.83–0.90 across three real-world sites. While effective to a degree, these deep learning approaches often require extensive datasets and can struggle to capture very long-term dependencies, particularly in the context of highly variable distributed PV systems.
Currently, there has been some related research on prediction based on large language models. Liu et al. [11] proposed Timer, a generative pre-trained time-series model based on the Transformer, which unifies heterogeneous time series into a single sequence format, enabling it to handle multiple time-series tasks. Das et al. [12] proposed a decoder-only architecture for a time-series foundation model, which employs a causal attention mechanism in which each output token can depend only on its preceding input tokens, enabling the pre-trained base model to adapt to different context and prediction lengths. Wu et al. [13] combined spatial and temporal cues to apply an LLM to wind speed prediction, enhancing the model’s understanding of spatio-temporal patterns and its predictive ability, used a decomposition architecture to handle the trend and seasonal components separately, and verified the effectiveness of the method on four datasets. Rasul et al. [14] proposed a decoder-only, lag-embedded foundation model that, once pre-trained on a large corpus of univariate time series, delivers strong zero-shot probabilistic forecasting and can be quickly fine-tuned to surpass specialized baselines on downstream datasets.
To address these challenges, this paper explores the burgeoning paradigm of leveraging LLM for time-series forecasting. Rather than building a native time-series foundation model from scratch, which requires vast, heterogeneous datasets and may obscure domain-specific relationships, we propose Solar-LLM, a novel framework based on the principle of reprogramming. This approach adapts a pre-trained, general-purpose LLM for the specific task of PV power prediction. Our rationale is twofold: first, this parameter-efficient fine-tuning strategy is highly effective in data-scarce scenarios typical of distributed PV systems. Second, unlike models that universalize data formats, our method preserves the inherent multi-variable structure of PV data, maintaining the critical, physically grounded relationships between factors like solar irradiance and power output. This allows Solar-LLM to not only model complex temporal patterns but also to uniquely integrate numerical data with semantic, contextual information through prompting. We also compare our work with the literature in Table 1.
The primary contributions of this work are as follows:
(1) 
A dedicated LLM reprogramming framework. We propose an efficient approach that freezes the pre-trained LLM parameters while redesigning only the input and output adaptation layers. This transforms numerical PV time-series data into embedding-compatible representations for direct processing by the LLM, which greatly reduces training costs, mitigates the need for large labeled datasets, and facilitates effective few-shot learning.
(2) 
A Prompt-as-Prefix (PaP) based cross-modal prompting mechanism. We design a PaP strategy to integrate meteorological information, statistical features, and task instructions into textual prompts. These prompts are fused with the reprogrammed time-series embeddings as prefix tokens, allowing the LLM to fully leverage its pre-trained global reasoning capacity to improve forecasting accuracy and stability across multiple PV stations and under varying weather conditions.
(3) 
A cross-attention data reconstruction module for numerical-text modality alignment. We propose a module that maps normalized time-series slices into the LLM’s semantic space and aligns them with pre-trained word embeddings via multi-head cross-attention. This approach effectively captures nonlinear dependencies and complex spatio-temporal patterns, enabling accurate ultra-short-term PV predictions without retraining the entire LLM.

2. Materials and Methods

2.1. Problem Formulation and Feature Analysis

The task of distributed PV power forecasting is formally defined as: given a multivariate time series of the past L time steps, predict the power output for the next H time steps. The input data includes various factors such as meteorological conditions, historical generation data, and geospatial information [15]. To ensure the model utilizes the most effective information, this study conducted a Pearson correlation analysis to select input features. This analysis quantifies the linear relationship strength between various influencing factors and the target variable (PV power output). The correlation coefficient Ri for each factor Fi is calculated as follows:
$$R_i = \frac{\mathrm{cov}(P_{pv}, F_i)}{\sigma_{P_{pv}} \, \sigma_{F_i}}$$
where $\mathrm{cov}(P_{pv}, F_i)$ is the covariance between the PV output power time series $P_{pv}$ and the influencing factor $F_i$, and $\sigma_{P_{pv}}$ and $\sigma_{F_i}$ are their respective standard deviations.
As shown in Table 2 and related work [16], the analysis revealed a range of correlation strengths. For a principled feature selection, we adopted a widely recognized threshold where an absolute correlation coefficient |Ri| ≥ 0.5 indicates a relationship of ‘moderate’ strength or higher. This criterion is established to balance model complexity with predictive power, a crucial consideration for PV forecasting data which often contains numerous interacting variables. This ensures that only the most influential factors are included, while excluding those with weaker linear relationships that could introduce statistical noise rather than valuable predictive signal [17]. Based on this threshold, solar irradiance (Ri = 0.921), historical PV power (Ri = 0.793), distance to adjacent stations (Ri = 0.712), and temperature (Ri = 0.536) were identified as having a substantial correlation with the target variable. Therefore, these four features were selected as inputs for the Solar-LLM to optimize forecasting performance.
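For illustration, the feature-selection step can be sketched as follows. This is a minimal sketch, not the authors' code: the column names are hypothetical, and the 0.5 threshold follows the criterion described above.

```python
import pandas as pd

def select_features(df: pd.DataFrame, target: str = "pv_power", threshold: float = 0.5) -> list:
    """Keep features whose absolute Pearson correlation with PV power is >= threshold."""
    corr = df.corr(method="pearson")[target].drop(target)   # R_i for every candidate factor F_i
    return corr[corr.abs() >= threshold].index.tolist()

# Hypothetical usage; the paper's selected features were irradiance, historical PV power,
# distance to adjacent stations, and temperature.
# df = pd.read_csv("pv_station.csv")
# selected = select_features(df)
```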

2.2. LLM Prediction Methodology

2.2.1. Transformer Model

The core of the Solar-LLM is a pre-trained GPT-2 model, which is based on the Transformer [18] decoder architecture. Unlike recurrent models that process data sequentially, the Transformer architecture allows for parallel processing of sequences and effectively models long-term temporal dependencies through its self-attention mechanism which is shown in Figure 1. This foundational capability is leveraged in our framework, where the pre-trained knowledge of the GPT-2 model is adapted for the specific task of photovoltaic power forecasting.
Multi-head self-attention [18]: the input sequences are mapped into queries (Q), keys (K), and values (V), and the attention scores are calculated using a scaled dot-product:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where $d_k$ is the dimension of the key vectors. This mechanism allows the model, when processing an element in a sequence, to simultaneously attend to all other elements and assign different weights based on their relevance. It is this capability that enables the Transformer to excel at capturing long-range dependencies, thereby overcoming issues such as vanishing gradients or information bottlenecks that traditional models face when processing long sequences.
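The scaled dot-product above can be written compactly in PyTorch. This is a generic sketch (not the authors' implementation) operating on tensors shaped (batch, heads, sequence, d_k):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # pairwise relevance between positions
    weights = F.softmax(scores, dim=-1)                  # attention distribution over the sequence
    return weights @ V
```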

2.2.2. Pre-Training and Fine-Tuning Paradigm

Pre-training refers to training a model on a large amount of data in the pre-training phase to learn generalized features, which improves the model’s performance and generalization ability on a target task. By learning from extensive data, pre-training provides effective initial weights, enhances model robustness, and helps avoid issues such as gradient explosion [19].
When adapting a pre-trained model to a downstream task, two main strategies are used: feature extraction and model fine-tuning [20]. In feature extraction, the parameters of the pre-trained model are frozen, and its intermediate outputs are used as fixed features for a new task-specific layer. This approach is suitable for scenarios where labeled data is scarce. In contrast, model fine-tuning unfreezes some or all of the pre-trained parameters and optimizes them end-to-end using the downstream task’s labeled data. This allows the model to preserve general features while efficiently adapting to the new task. Most current pre-training techniques use a Transformer as the feature extractor [21], as its parameters can be effectively adjusted to the data characteristics of a new task. This efficient adjustment capability is precisely what empowers these models to exhibit strong performance even with limited task-specific data, a hallmark of few-shot learning.

2.2.3. Adapting LLMs for Time-Series Forecasting

A fundamental challenge in applying pre-trained LLMs to our task is the “modality gap”: LLMs are natively trained on discrete textual tokens, whereas photovoltaic power data consists of continuous numerical time-series. To bridge this gap, two primary adaptation strategies have been explored: embedding-visible and text-visible adaptation. The text-visible adaptation paradigm is shown in Figure 2.
  • Embedding-visible Adaptation: This approach leverages the LLM as a powerful feature extractor. It entails designing a specialized input layer to convert numerical time-series data into vector embeddings that the LLM can process. The model is then fine-tuned to recognize temporal patterns within these new “numerical tokens.”
  • Text-visible Adaptation: By contrast, this method attempts to convert numerical data directly into natural language descriptions. It then utilizes the LLM’s inherent text comprehension and reasoning capabilities, guided by prompting, to generate forecasts.
Figure 2. Text-visible LLM adaptation paradigm.
The Solar-LLM framework proposed in this study employs an embedding-visible approach. This strategy focuses on converting the numerical time-series into a format that the LLM can process internally. It involves redesigning the input embedding layer to transform numerical data into high-dimensional vectors that are aligned with the LLM’s latent space. By doing so, the model learns to recognize temporal patterns within these new numerical embeddings, effectively enabling the pre-trained Transformer core to apply its powerful sequence processing capabilities to the forecasting task [22].

3. The Proposed Solar-LLM Framework

The decision to adopt this reprogramming paradigm, rather than developing a native time-series foundation model from scratch or pursuing a full fine-tuning of the LLM, is strategically motivated by the specific challenges of PV forecasting. Full fine-tuning represents a different paradigm with substantial computational and data requirements, aimed at creating a domain-specific model. In contrast, our approach preserves the inherent multi-variable structure of PV data, which is critical for physical plausibility. Furthermore, by leveraging a pre-trained LLM with a frozen core, Solar-LLM capitalizes on powerful, pre-existing sequence modeling capabilities, making it a parameter-efficient strategy that is highly effective for the data-scarce scenarios typical of distributed PV systems.
This framework falls into the category of embedding-visible LLM adaptation. It effectively reprograms a pre-trained large language model for generalized time-series forecasting, a process achieved by deconstructing the dynamic features of PV power sequences into interpretable textual prototypes and leveraging PaP optimization to activate the LLM’s reasoning capabilities on time-series data. This approach yields breakthrough advantages in scenario understanding and data generalization when compared to traditional prediction methods.
A key strength of the framework lies in its strong few-shot learning and transfer capabilities. Solar-LLM adopts a PaP fine-tuning strategy with parameter freezing, which leverages the extensive, generalized knowledge embedded within the LLM from its initial pre-training. This requires only a minimal update of model parameters to adapt to new scenarios. This low-rank adaptation mechanism significantly reduces the need for large volumes of labeled data, allowing the model to maintain reliable predictions even in data-scarce distributed PV systems.
Secondly, PV forecasting is a typical non-linear problem, influenced by various factors such as weather conditions, seasonal changes, and geographical location. The strength of Solar-LLM lies in its foundation on the Transformer architecture. However, its primary advantage extends beyond merely modeling non-linear numerical data. The core innovation of Solar-LLM is its ability to perform cross-modal fusion, integrating quantitative time-series data with qualitative, semantic information derived from textual features. The global attention mechanism enables this fusion, allowing the model to interpret both numerical patterns and semantic context directly. This provides a richer understanding of spatio-temporal correlations than models limited to numerical inputs, and bypasses intricate feature engineering to deliver more accurate and robust predictions.
In summary, Solar-LLM provides a feasible and effective method for distributed PV power prediction through the advantages of multi-feature capture, model robustness, and small-sample learning, and shows a broad application prospect in this field.

3.1. Overall Structure

Considering the low resource requirements and small data sample characteristics of distributed PV prediction scenarios, the core concept of Solar-LLM proposed in this study is to redesign the model input feature processing and output stages while freezing the LLM, so that only the model input and output layers require training. This makes Solar-LLM more flexible and adaptable to distributed PV power prediction tasks [23]. The overall structure is shown in Figure 3.
In the input layer module, Solar-LLM introduces a data processing module that patches the normalized multivariate time-series data. This process, which uses a reprogramming layer, partitions the data into contiguous, overlapping patches to capture local temporal data dependencies and transforms the prediction task into a linguistic task [24]. Then, these patches are converted into vector representations and aligned with the LLM’s semantic space via the cross-attention mechanism detailed in the data reconstruction module.
To fully activate Solar-LLM for the task of distributed PV power forecasting, after the reprogramming embedding of the input layer is completed, Solar-LLM provides the LLM with contextual and task-specific information through the prompt prefix technique, enhancing the LLM's reasoning on time-series data. Subsequently, the frozen LLM processes the input vector to generate an output vector.
In the output layer module, Solar-LLM performs output mapping on the derived output vectors, converts the output format of the language model to the desired time-series prediction values, and finally obtains the prediction data. Through the above process, the Solar-LLM realizes efficient and accurate distributed PV power prediction with low resource consumption [25].
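To make the pipeline concrete, the following is a minimal PyTorch sketch of the overall forward pass under the stated design (frozen GPT-2 core, trainable input and output adaptation layers). The class and attribute names (SolarLLM, reprogram, head) are illustrative, the input is treated as univariate for brevity, and the reprogramming layer is reduced here to a single linear projection; the prompt construction and cross-attention reconstruction are detailed in Sections 3.2 and 3.3.

```python
import torch
from transformers import GPT2Model

class SolarLLM(torch.nn.Module):
    """Illustrative forward pass: only the input/output adaptation layers are trainable."""
    def __init__(self, patch_len=32, d_model=768, horizon=16, n_patches=5):
        super().__init__()
        self.llm = GPT2Model.from_pretrained("gpt2")
        for p in self.llm.parameters():                  # freeze the pre-trained core
            p.requires_grad = False
        self.reprogram = torch.nn.Linear(patch_len, d_model)       # stand-in for the reprogramming layer
        self.head = torch.nn.Linear(n_patches * d_model, horizon)  # output projection

    def forward(self, patches, prompt_embeds):
        # patches: (B, N_p, patch_len); prompt_embeds: (B, n_prompt_tokens, d_model)
        x = self.reprogram(patches)                      # (B, N_p, d_model)
        x = torch.cat([prompt_embeds, x], dim=1)         # Prompt-as-Prefix: prompt tokens first
        h = self.llm(inputs_embeds=x).last_hidden_state
        h = h[:, -patches.size(1):, :]                   # keep the time-series positions
        return self.head(h.flatten(start_dim=1))         # map to the H-step forecast
```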

3.2. Prompt-as-Prefix Module

In the prompt prefix module, to fully activate the predictive capability of Solar-LLM, prompt prefix optimization is applied: the prompts enter the input layer together with the reprogrammed time-series blocks, directing the model to focus on the specific task and data characteristics and significantly improving the LLM's understanding of, and prediction performance on, time-series data [26]. The data related to multiple sites are decomposed into multiple representational prompts, and the spatio-temporal correlation features among sites are expressed through designed prompt words. Considering the pronounced spatio-temporal correlation characteristics of the task, this study designs prompt prefixes from three aspects: dataset context, task instructions, and statistical descriptions. The PaP optimization is exemplified in Figure 4.
  • Dataset context: The dataset context includes background information related to distributed PV power prediction, such as the data source, collection frequency, geographic location, and environmental conditions. For example, “This dataset comes from the historical power generation records of a PV power plant with a geographic location of 30° north latitude and 120° east longitude. The frequency of data collection is hourly, and the power generation and corresponding weather conditions, including sunny, cloudy, and rainy days, were recorded for the past year.” This information helps the LLM establish a basic understanding of the data and clarify its physical meaning and practical application scenarios.
  • Mission Directive: The mission directive specifies the objectives and requirements of the LLM in the current mission. For example, “Please predict the change in PV power over the next 48 h based on historical power generation data and weather conditions over the past week.” This enables the adaptation of the LLM to different downstream tasks and ensures that the model is tuned for specific forecasting needs.
  • Statistical description includes trends, cyclical characteristics, fluctuation ranges, and time delays of the data, for example, enter, “Daily generation power shows significant daily cyclical variations, with higher power in the morning and afternoon and lower power in the midday and evening. Power generation over the past month shows significant weekend fluctuations, with higher power on weekdays and lower on weekends.” The textualization of PV power data based on the characteristics of the data helped the LLM better understand and process the data.
The prompt prefixes are combined with the reprogrammed vector $O_i \in \mathbb{R}^{P \times D}$ to enrich the input PV power sequence information, and the result is fed into the LLM for prediction to facilitate pattern recognition and inference and to reduce prediction errors.
Figure 4. Prompt prefix optimization example diagram.
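A minimal sketch of how the three prompt components can be assembled into a single textual prefix is shown below. The function name, field names, and example values are illustrative only and are not taken from the authors' implementation.

```python
def build_prompt(context: str, instruction: str, stats: dict) -> str:
    """Assemble the dataset context, task instruction, and statistical description into one prefix."""
    stat_text = (
        f"Input statistics: min {stats['min']:.2f}, max {stats['max']:.2f}, "
        f"median {stats['median']:.2f}, overall trend {stats['trend']}."
    )
    return " ".join([context, instruction, stat_text])

prompt = build_prompt(
    context="This dataset comes from a PV plant at 30 degrees north, 120 degrees east; "
            "data are recorded hourly together with the weather conditions.",
    instruction="Predict the PV power for the next 4 hours from the past 24 hours of observations.",
    stats={"min": 0.0, "max": 0.87, "median": 0.31, "trend": "rising toward midday"},
)
```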

3.3. Data Reconstruction Module

3.3.1. Time-Series Patching

The process of vectorizing time-series data is crucial and directly affects the model’s prediction accuracy. This process involves two main steps: normalization and patching.
First, to eliminate distributional shifts in PV power data caused by factors like sudden weather changes or seasonal alternation, each input channel is individually normalized using Reversible Instance Normalization (RevIN). This ensures that the data for each channel has a zero mean and unit standard deviation, thereby improving the model’s robustness.
After normalization, a sliding window technique is used to partition the time-series data. For an input series of length $L_{in}$, we define a patch length $L_p$ and a stride $S$. In our experiments, we set $L_p = 32$ and $S = 16$. This partitions the series into $N_p = \lfloor (L_{in} - L_p)/S \rfloor + 1$ overlapping patches. This overlapping strategy ensures that local temporal context is captured robustly while transforming the series into a sequence of tokens suitable for the Transformer architecture. This structured patching enhances the model’s ability to capture short-term fluctuations and improves the parallelization efficiency of the model [27].
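A minimal sketch of the normalization and patching steps is given below. For brevity, RevIN is reduced to its forward instance-normalization step (the learnable affine parameters and the de-normalization step are omitted), and the tensor shapes assume the 96-step input window used in the experiments.

```python
import torch

def instance_normalize(x: torch.Tensor, eps: float = 1e-5):
    """Per-channel instance normalization (forward step of RevIN); stats are kept for de-normalization."""
    mean = x.mean(dim=1, keepdim=True)
    std = x.std(dim=1, keepdim=True) + eps
    return (x - mean) / std, mean, std

def make_patches(x: torch.Tensor, patch_len: int = 32, stride: int = 16) -> torch.Tensor:
    """Slice a series of length L_in into N_p = (L_in - patch_len) // stride + 1 overlapping patches."""
    return x.unfold(dimension=1, size=patch_len, step=stride)   # (B, N_p, patch_len)

series = torch.randn(8, 96, 1)                  # a batch of 96-step input windows, one channel
normed, mu, sigma = instance_normalize(series)
patches = make_patches(normed.squeeze(-1))      # (8, 5, 32) with L_in = 96, L_p = 32, S = 16
```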

3.3.2. Time-Series Data Reprogramming

The essence of time-series data reconstruction is to map low-dimensional time-series slice data to the high-dimensional latent features of LLMs, thereby aligning them with the prompt prefix embedding results and serving as inputs for language models. This resolves the conflict of heteromorphic data between numerical and text modalities by converting it into a language that LLMs can understand for prediction purposes [28].
The reprogramming steps are shown in Figure 5. The segmented data are mapped to the latent space dimension of the LLMs through a learnable linear projection layer. Each data segment identifies its corresponding mapping from the text set by employing the multi-head self-attention mechanism. Pre-trained word vectors are incorporated to enhance the semantic comprehension of time-series patterns. The interaction between time series features and text semantics is facilitated by a multi-head cross-attention mechanism. This process establishes associations between the key features of time-series embeddings and text semantics (e.g., “fluctuations” and “peak”), thereby activating the cross-modal reasoning capability of LLMs.
First, each numerical patch $p_j \in \mathbb{R}^{L_p}$ is projected into an initial embedding $E_j \in \mathbb{R}^{d_{model}}$ using a learnable linear layer:
$$E_j = p_j W_E + b_E$$
where $W_E \in \mathbb{R}^{L_p \times d_{model}}$ is a learnable projection matrix and $b_E \in \mathbb{R}^{d_{model}}$ is a bias vector. This step produces a sequence of patch embeddings, $E_{patch} = [E_1, E_2, \ldots, E_{N_p}]$.
Next, to align these numerical embeddings with the LLM’s semantic space, a multi-head cross-attention mechanism is employed. This mechanism uses the patch embeddings to query a set of learnable textual prototypes, $T_{proto} \in \mathbb{R}^{N_{proto} \times d_{model}}$, which act as anchors in the LLM’s semantic space. For the $k$-th attention head, the Query (Q), Key (K), and Value (V) matrices are defined as:
$$Q_k^i = E_{patch}^i W_k^Q, \quad K_k^i = T_{proto} W_k^K, \quad V_k^i = T_{proto} W_k^V$$
where $W_k^Q$, $W_k^K$, and $W_k^V$ are learnable weight matrices. The cross-attention output for a single head is then calculated as:
$$Z_k^i = \mathrm{Attention}(Q_k^i, K_k^i, V_k^i) = \mathrm{softmax}\!\left(\frac{Q_k^i (K_k^i)^{T}}{\sqrt{d_k}}\right)V_k^i$$
Each head produces $Z_k^i \in \mathbb{R}^{P \times d}$; the outputs from all attention heads are concatenated into $Z^i \in \mathbb{R}^{P \times d_{model}}$ and linearly projected to produce the final reprogrammed vector $O_i \in \mathbb{R}^{P \times D}$. This process effectively translates the temporal patterns contained in the numerical patches into a semantically rich representation that the frozen LLM can interpret, thereby bridging the modality gap and enabling accurate forecasting.
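As a concrete illustration of this reprogramming step, the sketch below implements the patch projection and prototype cross-attention using PyTorch's built-in multi-head attention, which internally provides the per-head weight matrices and the final output projection. The number of prototypes and the 768-dimensional hidden size are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class ReprogrammingLayer(nn.Module):
    """Minimal sketch: numerical patches query learnable textual prototypes via cross-attention."""
    def __init__(self, patch_len=32, d_model=768, n_proto=100, n_heads=8):
        super().__init__()
        self.patch_proj = nn.Linear(patch_len, d_model)                 # E_j = p_j W_E + b_E
        self.prototypes = nn.Parameter(torch.randn(n_proto, d_model))   # T_proto, anchors in the LLM space
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (B, N_p, patch_len)
        q = self.patch_proj(patches)                                    # queries from numerical patches
        kv = self.prototypes.unsqueeze(0).expand(patches.size(0), -1, -1)  # keys/values from prototypes
        out, _ = self.cross_attn(q, kv, kv)                             # reprogrammed embeddings O_i
        return out                                                      # (B, N_p, d_model)
```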

3.4. Output Projection

The prediction results of the LLM are usually in vector form and must be converted to the required data format for the prediction task. The output mapping module converts the frozen LLM prediction results into the expected output by flattening and processing them with a linear projection. Subsequently, they are converted into a standard prediction data format, and the prediction performance is evaluated using the model accuracy metrics. The detailed steps are as follows, and the process is shown in Figure 6.
(1)
Flattening: This process involves compressing a multidimensional data structure into a one-dimensional structure to simplify processing by subsequent fully connected or other layers. For distributed photovoltaic power prediction tasks, flattening helps to convert complex prediction results into a more manageable form. When the prediction task includes multiple time steps and features, the results must be converted to standard prediction data. If the input data $X$ has dimensions $(T, F)$, where $T$ indicates the number of time steps and $F$ indicates the number of features per step, and $n$ standard prediction data points need to be output, the flattening process is as follows:
$$X_{flatten} = \mathrm{Flatten}(X), \quad X_{flatten} \in \mathbb{R}^{T \cdot F}$$
where $X_{flatten}$ is the data after flattening, and it is a one-dimensional array.
(2)
Linear projection processing: Linear projection uses a weight matrix $W$ and a bias vector $b$ to map the flattened data $X_{flatten}$ to the target feature space through a linear transformation, thereby outputting the final predicted result $Y$. The linear projection is as follows:
$$Y = X_{flatten} W + b, \quad Y \in \mathbb{R}^{n}$$
where $W$ is the weight matrix, whose input dimension is $T \cdot F$, and $b$ is the bias vector that adjusts the output value.
(3)
Output result mapping: Through flattening and linear projection processing, complex prediction results are converted into a standardized output format that is easy to process. The output result is $Y = \{y_1, y_2, \ldots, y_n\}$, which represents the model’s prediction of photovoltaic power generation over the future prediction steps.
Figure 6. Schematic diagram of the output mapping module.
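Putting the flattening and projection steps together, a minimal sketch of the output head is shown below. The patch count, hidden size, and horizon are assumptions matching the experimental configuration (5 patches, a 768-dimensional GPT-2 hidden state, and 16 prediction steps).

```python
import torch.nn as nn

class OutputProjection(nn.Module):
    """Flatten the LLM output over patch positions and project it to the H-step forecast."""
    def __init__(self, n_patches=5, d_model=768, horizon=16):
        super().__init__()
        self.flatten = nn.Flatten(start_dim=1)               # (B, N_p, d_model) -> (B, N_p * d_model)
        self.proj = nn.Linear(n_patches * d_model, horizon)  # Y = X_flatten W + b

    def forward(self, llm_out):
        return self.proj(self.flatten(llm_out))              # (B, H) predicted power values
```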

3.5. Parameter-Efficient Fine-Tuning Strategy

A pivotal aspect of the Solar-LLM framework is its parameter-efficient fine-tuning (PEFT) strategy. This approach is designed to adapt the pre-trained LLM to the specialized task of PV power forecasting with minimal computational overhead and without requiring extensive datasets, which is particularly crucial for distributed PV scenarios with limited data. The core principle is to maintain the pre-trained knowledge within the LLM by keeping the vast majority of its parameters frozen, while enabling adaptation through a small set of trainable parameters in the task-specific components we introduced. The specific allocation of frozen and trainable parameters across different modules is systematically summarized in Table 3.
In summary, this PEFT strategy achieves an optimal balance between accuracy and efficiency through a modular design. As shown in Figure 3, the frozen pre-trained LLM core serves as a stable backbone, preserving general sequence capabilities, while lightweight adapters, such as the data reconstruction module, enable precise modality alignment via cross-attention mechanisms. By updating a very small fraction of total parameters, this approach significantly reduces the risk of overfitting on limited PV data, accelerates the training process, and is fundamental to the model’s demonstrated capabilities in few-shot learning and cross-scenario generalization, as evidenced by the experimental results in Section 4.
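The freezing logic can be illustrated with a short helper. This sketch assumes the frozen backbone is registered under an attribute named llm, as in the earlier pipeline sketch; every other module (reprogramming layer, prompt embedding, output head) remains trainable.

```python
import torch

def configure_peft(model: torch.nn.Module):
    """Freeze the LLM backbone and collect only the adapter parameters for the optimizer."""
    for name, p in model.named_parameters():
        p.requires_grad = not name.startswith("llm.")   # train only the input/output adaptation layers
    trainable = [p for p in model.parameters() if p.requires_grad]
    n_train = sum(p.numel() for p in trainable)
    n_total = sum(p.numel() for p in model.parameters())
    print(f"trainable parameters: {n_train:,} / {n_total:,} ({100 * n_train / n_total:.2f}%)")
    return trainable

# optimizer = torch.optim.Adam(configure_peft(model), lr=1e-3)
```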

4. Results and Discussion

4.1. Experimental Setup and Operating Environment

We conducted a case study using a publicly available dataset comprising five photovoltaic power stations in Hebei Province, China (approximately 36° N–39° N latitude and 113° E–118° E longitude) [29]. The rated capacity of these stations ranges from approximately 6600 MW to 20,000 MW. This dataset comprises 184 days of photovoltaic power station data collected at 15 min intervals from 1 July to 30 November 2018. To ensure sufficient model training and adhere to the temporal dependency requirements of time-series forecasting, the original dataset was partitioned into training, validation, and test sets in a 3:1:1 ratio [30]. Each sample used 24 h of time-series data (i.e., 96 steps) as input to predict the photovoltaic output over the subsequent four hours (i.e., 16 steps). This partitioning strategy preserves temporal continuity and implements stratified sampling across the five photovoltaic plants, ensuring equal representation of each plant in all subsets. Since photovoltaic power generation is significantly affected by weather conditions, we conducted experiments under three weather states: sunny, cloudy, and rainy, to accurately assess the predictive accuracy of our models. Due to the lack of explicit weather labels in the dataset, and drawing on studies of the linear relationship between PV peak power and irradiance as well as empirical data on power attenuation under different weather conditions, we employed a data-driven classification method based on the maximum peak power (Ppeak) observed within a day relative to a sunny-day reference value [31,32,33]. Specifically, we first determine the historical maximum peak power (Pmax) for each photovoltaic plant, which is derived from the highest recorded power generation under clear-sky conditions in the current dataset. Then, based on the relative magnitude of Ppeak to Pmax, we classified each day’s weather according to the following criteria:
Sunny Day: Ppeak > 0.9 × Pmax
Cloudy Day: 0.6 × Pmax ≤ Ppeak ≤ 0.9 × Pmax
Rainy Day: Ppeak < 0.6 × Pmax
It should be noted that the maximum power output is related to the highest solar irradiance and the system efficiency under ideal conditions on that day. Therefore, indirectly inferring the day’s irradiance conditions through the highest power output achievable during the day effectively reflects the key meteorological features that dominate the photovoltaic output for that day.
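The weather-labeling rule can be expressed directly as a small helper; the function name is illustrative and the thresholds follow the criteria listed above.

```python
def classify_day(p_peak: float, p_max: float) -> str:
    """Label a day by its peak power relative to the station's clear-sky maximum P_max."""
    ratio = p_peak / p_max
    if ratio > 0.9:
        return "sunny"
    elif ratio >= 0.6:
        return "cloudy"
    return "rainy"
```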
The experiment was conducted using the Ubuntu 18.04 operating system and PyTorch 2.3 deep learning framework. The experiments were performed using an NVIDIA RTX 3080 GPU. The various methods are described below.

4.2. Parameter Design

In this paper, the performance of the proposed model is evaluated through designed experiments. We compare this model with a variety of classical time series prediction techniques, including Temporal Convolutional Network (TCN), CNN-BiLSTM, Informer, and GCN-LSTM. TCN can effectively learn the local dependencies of time series data by leveraging the receptive field of convolution kernels [34]. CNN-BiLSTM combines a convolutional neural network (CNN) and bidirectional long short-term memory networks (BiLSTMs) for time series prediction, enabling it to deeply capture data features and temporal dependencies [35]. As an improved model based on the self-attention mechanism, Informer has been optimized for the processing of long time series data, thereby achieving higher computational efficiency and better prediction performance [36]. GCN-LSTM integrates the advantages of graph convolutional network (GCN) and long short-term memory (LSTM) networks, and can simultaneously explore the spatial correlations and temporal dynamic patterns in the data. Specifically, the adjacency matrix of the GCN was generated by calculating the Euclidean distance based on the geographical locations of the photovoltaic power plants. Power plants with distances less than the threshold were selected as adjacent nodes, and an association matrix was constructed [37].
To ensure the fairness and rigor of the evaluation, all comparison models are trained on the same data splits, and their performance on the specific task is optimized through careful hyperparameter tuning. The detailed parameter design is shown in Table 4.

4.3. Evaluation Metrics

The maximum number of training iterations for all methods is set to 50. If no performance improvement is observed on the validation set for 10 consecutive iterations, training will be terminated early. To evaluate the model’s prediction performance, three error metrics commonly used in the literature are employed: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination R2 [38,39,40].
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T_{pred}}\left(y_{i,t}-\hat{y}_{i,t}\right)^{2}}, \quad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T_{pred}}\left|y_{i,t}-\hat{y}_{i,t}\right|, \quad R^{2} = 1-\frac{\sum_{i=1}^{N}\sum_{t=1}^{T_{pred}}\left(y_{i,t}-\hat{y}_{i,t}\right)^{2}}{\sum_{i=1}^{N}\sum_{t=1}^{T_{pred}}\left(y_{i,t}-\bar{y}\right)^{2}}$$
where $N$ is the number of test samples, $T_{pred}$ is the number of prediction periods, $y_{i,t}$ is the actual value of sample $i$ at time $t$, $\hat{y}_{i,t}$ is the predicted value of sample $i$ at time $t$, and $\bar{y}$ is the sample mean.
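For reference, the three metrics can be computed as follows. This is a minimal sketch that averages over all samples and prediction steps jointly, which may differ from the normalization in the formulas above by a constant factor of $T_{pred}$; R2 is unaffected.

```python
import numpy as np

def forecast_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE, RMSE, and R2 for arrays of shape (N, T_pred)."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                                       # average absolute error
    rmse = np.sqrt(np.mean(err ** 2))                                # root mean squared error
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "R2": r2}
```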

4.4. Performance Comparison

We verified the predictive performance of each model using a test set. Table 5 shows the predictive performance for the three weather conditions (clear, cloudy, and rainy) in the test set. Each sample considers the prediction results for the next 15 min, 1 h, and 4 h (i.e., prediction time periods $T_{pred}$ = 1, 4, and 16). The true $y_{i,t}$ and predicted $\hat{y}_{i,t}$ values used in the calculation were both normalized based on the installed photovoltaic capacity. The best results are shown in bold.
As shown in Table 5, the Solar-LLM proposed in this study outperforms other methods in terms of various performance metrics under different weather conditions and prediction time periods. This demonstrates the superiority of the model. Solar-LLM maintained good prediction accuracy under extreme weather conditions, such as cloudy and rainy days, demonstrating its robustness and versatility. Owing to the rich prior knowledge acquired during pre-training and the LLM’s advanced reasoning capabilities, the proposed model demonstrates a deep understanding of photovoltaic forecasting tasks. Unlike traditional deep learning models, its performance is not constrained by network structure limitations. Solar-LLM leverages textual prompts and the statistical features of input sequences to adaptively identify current meteorological conditions and infer trends in photovoltaic power output more accurately for subsequent time periods. Thus, methods based on LLMs can consistently perform well across various forecasting horizons.
Figure 7 illustrates the distribution of absolute prediction errors for each method when predicting the output for the next 15 min in the test set. As shown, the proposed method generally has lower prediction errors, smaller error variances, and fewer extreme error cases (represented by the circles), demonstrating its accuracy and robustness.

4.5. Small Sample Learning Ability Analysis

Owing to their extensive prior knowledge, pretrained large models typically demonstrate significant few-shot learning capabilities. This means that they can adapt effectively to specific prediction tasks with a small amount of training data, avoiding excessive reliance on data volume. To validate the generalization and small-sample learning capabilities of the proposed model, we evaluated the predictive performance of each method on the test set using different training data volumes (25%, 50%, and 100%). Table 6 shows the mean absolute error (MAE) of each method in predicting the photovoltaic output for the next 15 min on the test set under different training data volumes.
As shown in Table 6, the Solar-LLM proposed in this study outperforms the other comparison methods in terms of accuracy across different training data sizes. In addition, the proposed method demonstrated the strongest small-sample learning capability: the predictive performance of the model trained with 25% of the training data was only slightly different from that of the model trained with 100% of the training data. In contrast, the other methods are highly sensitive to the amount of training data and perform poorly with limited data. This highlights the fundamental advantage of the LLM-based approach: its extensive pre-training provides a robust foundation of generalized knowledge, significantly reducing the need for large task-specific datasets, so its test performance is not significantly constrained by data volume, demonstrating strong practical value.
Figure 8 further illustrates the impact of training data volume on the predictive performance of the different methods, using three typical days from the test set as examples. The Solar-LLM proposed in this study exhibited highly stable performance across the range of training data volumes, achieving relatively accurate predictions even with as little as 25% of the training data, which is particularly significant in applications with limited sample sizes. Conversely, methods such as Informer struggle to fully extract the latent features of time-series data when the sample size is limited, leading to suboptimal performance with 25% of the training data; as the data volume increases, their performance gradually improves. This finding underscores that deep learning models tailored for specific tasks require a certain sample size, as inadequate samples impede optimal performance, whereas LLM-based methods, with their inherent reasoning capabilities, consistently perform well in small-sample training tasks.

4.6. Ablation Study

To validate the contributions of the key components within our proposed reprogramming framework, we conducted a series of ablation studies. Our architectural choice is fundamentally centered on the parameter-efficient paradigm of leveraging a frozen, pre-trained LLM. Therefore, these experiments are designed to isolate the impact of our novel adaptation modules—the prompt prefix and the data reconstruction layer—rather than comparing this paradigm to the alternative approach of full model fine-tuning. To quantitatively assess the contribution of each module, we calculate the relative improvement ( R I ) in MAE compared to the baseline model (without the module) as follows:
$$RI = \frac{MAE_{base} - MAE_{variant}}{MAE_{base}} \times 100\%$$
Table 7 presents the model’s prediction performance with and without prompt words. The performance without prompt words can be obtained by setting the prompt words to an empty string. As shown in Table 7, prompt words play a role in improving prediction task performance. The task description in the prompt words helps language models understand the prediction task, and the statistical information in the prompt words promotes reasoning and calculation.
Table 8 shows the prediction performance with and without data reconstruction (patch reprogramming). In the control example, the data reconstruction layer is replaced by a linear mapping network with input dimension patch_len and output dimension d_llm. The results indicate that data reconstruction improves prediction performance. This improvement occurs because the data reconstruction module applies a cross-attention mechanism to the temporal data and the vocabulary embeddings from the LLM’s pretraining, effectively translating the temporal data. This allows the LLM to better understand the meaning of the data and extract its features, thereby improving prediction accuracy.
Figure 9 systematically presents the performance comparison of various model variants under different weather conditions in the ablation study. Specifically, the full model demonstrates optimal and most stable performance across all test scenarios, with significantly lower prediction errors than the variant models, while consistently maintaining high R2 values, thereby validating the effectiveness of the model architecture design. The removal of any module leads to performance degradation, particularly under rainy conditions with complex meteorological factors and longer prediction horizons.

4.7. Comparison with Traditional Time-Series Forecasting Models

To further validate the effectiveness of the proposed Solar-LLM framework, we conducted a comparative analysis of several traditional time-series forecasting methods, including back-propagation neural network (BP), extreme learning machine (ELM), and support vector regression (SVR). The experimental settings for the traditional methods were kept consistent with those of the deep learning model described in Section 4.2. Specifically, we used the same dataset from five photovoltaic plants in Hebei Province, China, with identical training, validation, and test splits to ensure optimal performance for each method. The same evaluation metrics (MAE, RMSE, and R2) were used to provide a consistent basis for comparison. Table 9 shows the predictive performance of Solar-LLM, BP, ELM, and SVR across different forecast horizons.
The results presented in Table 9 clearly demonstrate the superior performance of the proposed Solar-LLM framework compared to traditional time series forecasting methods across all prediction horizons. This highlights the effectiveness of leveraging a pre-trained LLM for ultra-short-term photovoltaic power forecasting. While traditional methods are effective in some time series forecasting tasks, they often struggle to capture the complex, non-linear relationships and long-term dependencies present in photovoltaic power generation data.

4.8. Discussion on Solar-LLM Generalizability

A key objective of this study is to evaluate the model’s generalizability and robustness, particularly its ability to perform reliably across different climatic seasons.
The initial validation relied on operational data from July to November 2018, covering summer and autumn. To rigorously test the model under low-irradiance conditions, we conducted supplementary experiments using a dataset from the same photovoltaic plants but collected during the winter (December 2018 to February 2019). This period includes more complex weather scenarios, such as low-irradiance sun, persistent cloud cover, and snow, providing a robust testbed for seasonal generalization.
The 4 h forecast performance during the winter period is presented in Table 10. Analysis of these results yields two key findings. First, as anticipated, the more challenging winter conditions led to a marginal decline in overall predictive accuracy for all models compared to the summer-autumn performance in Table 5. Second, despite this slight, physically explainable performance dip, Solar-LLM demonstrated superior robustness. Its 4 h MAE on sunny winter days (0.079) was only marginally higher than in summer (0.074) and remained significantly lower than that of all baseline models (e.g., 0.098 for Informer). This trend holds for cloudy and rainy/snowy conditions, confirming that Solar-LLM’s performance advantage is consistently maintained across different seasonal distributions.
The effectiveness of this generalization stems from the Solar-LLM architecture. The framework leverages a frozen, pre-trained LLM core, which serves as a universal sequence processor. This provides a stable foundation of sequential inductive biases that are not overfitted to a specific season. Adaptability is achieved through parameter-efficient fine-tuning of specific modules, most notably the Prompt-as-Prefix (PaP) module. The PaP module allows the model to be dynamically conditioned on domain-specific metadata, such as seasonal indicators (e.g., ‘winter’) or specific weather characteristics (‘snowy’). This guides the LLM’s inference process to adapt to new data distributions without requiring extensive retraining, providing a pragmatic and efficient pathway for broader deployment.

5. Conclusions

This paper addresses the issue of insufficient prediction accuracy in ultra-short-term power forecasting for distributed photovoltaic systems. The proposed method, dubbed Solar-LLM, is designed for the prediction of power output in ultra-short-term scenarios for distributed photovoltaic systems. The method utilizes a process of reprogramming the input and output layers of LLMs. It employs Transformer modules to align language sequences with power generation data sequences. The method then performs time series forecasting based on this alignment. This approach facilitates precise prediction of distributed photovoltaic power generation in scenarios where historical data is limited. The following conclusions are derived from the experiments presented in this paper:
(1)
The Solar-LLM demonstrates a notable reduction in MAE and RMSE, together with an improvement in R2, when compared to traditional TCN, CNN-BiLSTM, Informer, and GCN-LSTM models, a result of its meticulously designed input and output layers. Specifically, in 15 min clear-sky forecasting, Solar-LLM achieves a 43.6% reduction in MAE and a 43.9% reduction in RMSE compared to TCN. Compared to Informer, the MAE is reduced by 39.7% and the RMSE by 48.3%. Similarly, in 4 h rainy-day forecasting, Solar-LLM shows a 13.5% reduction in MAE and a 0.7% reduction in RMSE compared to CNN-BiLSTM. Compared to Informer, the MAE is reduced by 6.3% and the RMSE by 9.0%. These figures indicate the model’s high proficiency in PV forecasting tasks and its consistently high performance across various forecasting time horizons. Additionally, it exhibits strong small-sample learning capabilities. The model’s broad applicability and superiority in photovoltaic forecasting tasks have been validated by relevant experiments.
(2)
It has been demonstrated that the prompt prefix module of the Solar-LLM plays a certain role in improving the performance of the prediction task. The task description in the prompt encompasses not only the LLM’s comprehension of the prediction task but also incorporates statistical data concerning the input data, thereby facilitating the LLM’s reasoning and computational processes. The paper designed relevant ablation experiments, and the results showed that the use of prompts resulted in varying degrees of reduction in MAE values across all weather types and prediction periods, with reductions ranging from 3.03% to 15.91%. This further validates the effectiveness of the prompt prefix module in the Solar-LLM.
(3)
The data reconstruction module in the Solar-LLM has been demonstrated to enhance prediction performance. The cross-attention mechanism enables the data reconstruction module to effectively integrate time-series data with the pre-trained word embeddings of the LLM. This integration enables the model to gain a deeper understanding of data meaning and extract data features. The experimental results of the paper indicate that data reconstruction reduces MAE across all weather types and forecast horizons, with relative improvements in a single experiment ranging from approximately 1.54% to 15.09%, and an average relative improvement across nine experimental groups of about 7.7%. These quantitative results further validate that the integration of spatiotemporal features not only enhances the model’s understanding and feature representation of the data but also further improves prediction accuracy.
(4)
Experiments indicate that Solar-LLM has demonstrated excellent performance in ultra-short-term PV output forecasting, with high reliability, rapid adaptability, and controllable fine-tuning capabilities. This enables timely and accurate predictions of PV output fluctuations, thereby enhancing grid robustness and optimizing the dispatch strategy for distributed PV generation. By providing reliable forecasts, Solar-LLM improves the efficiency of electricity market operations and reduces reliance on traditional peaking resources. The practical significance for grid dispatch is substantial. Solar-LLM’s forecasting capability enables operators to optimize resource allocation and maintain grid stability amid renewable energy fluctuations. Future development will focus on integrating it into broader grid control architectures, exploring its potential in microgrid management and energy storage optimization, thereby contributing to a more resilient and sustainable energy future.

Author Contributions

Conceptualization, C.L. and H.F.; Methodology, Z.Z. and W.R.; Software, Z.Z.; Validation, Z.Z. and W.R.; Formal analysis, C.L., Z.Z., L.Y. and Y.Y.; Investigation, C.L. and L.Y.; Resources, H.F., M.F., W.R. and D.L.; Data curation, C.L., M.F. and L.Y.; Writing—original draft, C.L.; Writing—review & editing, H.F., Z.Z., M.F., L.Y., Y.Y. and D.L.; Visualization, C.L., Z.Z. and Y.Y.; Supervision, H.F., W.R. and D.L.; Project administration, H.F.; Funding acquisition, H.F. and D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was mainly supported by the Science and Technology Project of SGCC “Design and Key Technology Research on Distributed Trading Mechanism of County Active Distribution Network for Energy Autonomous Microgrid (Group) Large scale Integration” (5400-202357828A-4-1-KJ).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Chen Lv and Menghua Fan were employed by the company State Grid Energy Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Sun, R.; Wang, L.; Wang, Y.; Ding, R.; Xu, H.; Wang, J.; Li, Q. Ultra-short-term Forecasting of Photovoltaic Power Generation Output Based on Digital Twin. Power Syst. Technol. 2021, 45, 1258–1264. [Google Scholar]
  2. Wang, B.; Lyu, Y.; Chen, Z.; Zhao, Q.; Zhang, Z.; Tian, J. Mechanistic-data Hybrid-driven Short-term Power Forecasting of Distributed Photovoltaic Considering Information Time Delay. Autom. Electr. Power Syst. 2022, 46, 67–74. [Google Scholar]
  3. Wang, X.; Ai, X.; Wang, T. Short-term Prediction of Photovoltaic Power with Small Samples Based on Instance Transfer Learning. Acta Energiae Solaris Sin. 2024, 45, 325–333. [Google Scholar] [CrossRef]
  4. Golestaneh, F.; Pinson, P.; Gooi, H.B. Very short-term nonparametric probabilistic forecasting of renewable energy generation—With application to solar energy. IEEE Trans. Power Syst. 2016, 31, 3850–3863. [Google Scholar] [CrossRef]
  5. Li, F.; Wang, L.; Zhao, J.; Zhang, J.; Zhang, S.; Tian, Y. A Short-Term Power Forecasting Method for Distributed Photovoltaic Based on Weather Fusion and LSTM Network. Electr. Power 2022, 55, 149–154. [Google Scholar]
  6. Wang, H.; Ju, R.; Dong, Y. Interval Forecasting of Distributed Photovoltaic Power Based on Spatiotemporal Correlation Characteristics and B-LSTM Model. Electr. Power 2024, 57, 74–80. [Google Scholar]
  7. Zhang, K.; Ma, L.; Zhang, T.; Zhong, H.; Tan, W.; Wei, Y.; Lin, Z. A Spatiotemporal Collaborative Probabilistic Forecasting Method for Distributed Photovoltaic Output Based on iDGA-LSTM. Autom. Electr. Power Syst. 2025, 49, 128–138. [Google Scholar]
  8. Khan, W.; Walker, S.; Zeiler, W. Improved solar photovoltaic energy generation forecast using deep learning-based ensemble stacking approach. Energy 2022, 240, 122812. [Google Scholar] [CrossRef]
  9. Ye, L.; Chen, Z.; Zhao, Y.; Zhu, Q.W. A Photovoltaic Power Generation Output Forecasting Model Based on Genetic Algorithm-Fuzzy Radial Basis Function Neural Network. Autom. Electr. Power Syst. 2015, 39, 16–22. [Google Scholar]
  10. Nastić, F.; Jurišević, N.; Nikolić, D.; Končalović, D. Harnessing open data for hourly power generation forecasting in newly commissioned photovoltaic power plants. Energy Sustain. Dev. 2024, 81, 101512. [Google Scholar] [CrossRef]
  11. Liu, Y.; Zhang, H.; Li, C.; Huang, X.; Wang, J.; Long, M. Timer: Generative pre-trained transformers are large time series models. arXiv 2024, arXiv:2402.02368. [Google Scholar]
  12. Das, A.; Kong, W.; Sen, R.; Zhou, Y. A decoder-only foundation model for time-series forecasting. arXiv 2024, arXiv:2310.10688v4. [Google Scholar]
  13. Wu, T.; Ling, Q. STELLM: Spatio-temporal enhanced pre-trained large language model for wind speed forecasting. Appl. Energy 2024, 375, 124034. [Google Scholar] [CrossRef]
  14. Rasul, K.; Ashok, A.; Williams, A.R.; Khorasani, A.; Adamopoulos, G.; Bhagwatkar, R.; Biloš, M.; Ghonia, H.; Hassen, N.; Schneider, A. Lag-Llama: Towards foundation models for time series forecasting. In R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models. arXiv 2023, arXiv:2310.08278. [Google Scholar]
  15. Ma, Z.; Mei, G. A hybrid attention-based deep learning approach for wind power prediction. Appl. Energy 2022, 323, 119608. [Google Scholar] [CrossRef]
  16. Zheng, R.; Li, G.; Han, B.; Wang, K.; Peng, D. Day-ahead Power Forecasting of Distributed Photovoltaic Power Generation Based on Weighted Extended Daily Feature Matrix. Electr. Power Autom. Equip. 2022, 42, 99–105. [Google Scholar]
  17. Liu, Y.J.; Chen, Y.L.; Liu, J.Y.; Zhang, X.M.; Wu, X.Y.; Kong, W.Z. Distributed photovoltaic power generation day-ahead prediction based on ensemble learning. China Electr. Power 2022, 55, 38–45. [Google Scholar]
  18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  19. Yue, Z.; Ye, X.; Liu, R. A Review of Pre-training Technologies Based on Language Models. J. Chin. Inf. Process. 2021, 35, 15–29. [Google Scholar]
  20. Cai, R.; Ge, J.; Sun, Z.; Hu, B.; Xu, Y.; Sun, Z. Review of the Development of AI Pre-trained Large-scale Models. J. Chin. Comput. Syst. 1–12. Available online: http://kns.cnki.net/kcms/detail/21.1106.tp.20240510.1900.010.html (accessed on 21 August 2024).
  21. Sun, K.; Luo, X.; Luo, Y. Review on the Applications of Pre-trained Language Models. Comput. Sci. 2023, 50, 176–184. [Google Scholar]
  22. Zhang, H.; Yan, J.; Liu, Y.; Gao, Y.; Han, S.; Li, L. Multi-source and temporal attention network for probabilistic wind power prediction. IEEE Trans. Sustain. Energy 2021, 12, 2205–2218. [Google Scholar] [CrossRef]
  23. Wang, S.; Ma, L.; Chu, Z.; Zhang, J.Y.; Shi, X.; Chen, P.-Y.; Liang, Y.; Li, Y.-F.; Pan, S.; Wen, Q. Time-llm: Time series forecasting by reprogramming large language models. arXiv 2023, arXiv:2310.01728. [Google Scholar]
  24. Wu, H.; Meng, K.; Fan, D.; Zhang, Z.; Liu, Q. Multistep short-term wind speed forecasting using transformer. Energy 2022, 261, 125231. [Google Scholar] [CrossRef]
  25. Liu, X.; Zhou, J. Short-term wind power forecasting based on multivariate/multi-step LSTM with temporal feature attention mechanism. Appl. Soft Comput. 2024, 150, 111050. [Google Scholar] [CrossRef]
  26. Xue, H.; Salim, F.D. Utilizing language models for energy load forecasting. In Proceedings of the 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Istanbul, Turkey, 15–16 November 2023; pp. 224–227. [Google Scholar]
  27. Ilharco, G.; Wortsman, M.; Gadre, S.Y.; Song, S.; Hajishirzi, H.; Kornblith, S.; Farhadi, A.; Schmidt, L. Patching open-vocabulary models by interpolating weights. Adv. Neural Inf. Process. Syst. 2022, 35, 29262–29277. [Google Scholar]
  28. Asudani, D.S.; Nagwani, N.K.; Singh, P. Impact of word embedding models on text analytics in deep learning environment: A review. Artif. Intell. Rev. 2023, 56, 10345–10425. [Google Scholar] [CrossRef] [PubMed]
  29. Yao, T.; Wang, J.; Wu, H.; Zhang, P.; Li, S.; Wang, Y.; Chi, X.; Shi, M. A photovoltaic power output dataset: Multi-source photovoltaic power output dataset with Python toolkit. Sol. Energy 2021, 230, 122–130. [Google Scholar] [CrossRef]
  30. Tang, Y.G.; Yang, K.; Zhang, S.J.; Zhang, Z. Photovoltaic power forecasting: A hybrid deep learning model incorporating transfer learning strategy. Renew. Sustain. Energy Rev. 2022, 162, 112473. [Google Scholar] [CrossRef]
  31. Zhao, H.; Zhu, D.; Yang, Y.; Li, Q.; Zhang, E. Study on photovoltaic power forecasting model based on peak sunshine hours and sunshine duration. Energy Sci. Eng. 2023, 11, 4570–4580. [Google Scholar] [CrossRef]
  32. Wang, F.; Mi, Z.Q.; Zhen, Z.; Yang, G.; Zhou, H.M. A classification prediction method for photovoltaic power plant generation power based on weather state pattern recognition. Chin. J. Electr. Eng. 2013, 33, 75–82. [Google Scholar]
  33. Yang, J.H.; Zhang, L.; Liang, F.Z.; Yang, Y.J.; Zhang, W. A Prediction Method for Surface Solar Radiation Based on Secondary Weather Classification. Distrib. Energy 2024, 9, 54–63. [Google Scholar]
  34. Zhou, X.; Pang, C.; Zeng, X.; Jiang, L.; Chen, Y. A short-term power prediction method based on temporal convolutional network in virtual power plant photovoltaic system. IEEE Trans. Instrum. Meas. 2023, 72, 1–10. [Google Scholar] [CrossRef]
  35. Tang, C.; Zhang, Y.; Wu, F.; Tang, Z. An improved cnn-bilstm model for power load prediction in uncertain power systems. Energies 2024, 17, 2312. [Google Scholar] [CrossRef]
  36. Phan, Q.T.; Wu, Y.K.; Phan, Q.D. An innovative hybrid model combining informer and K-Means clustering methods for invisible multisite solar power estimation. IET Renew. Power Gener. 2024, 18, 4318–4333. [Google Scholar] [CrossRef]
  37. Chen, H.; Zhu, M.; Hu, X.; Wang, J.; Sun, Y.; Yang, J. Research on short-term load forecasting of new-type power system based on GCN-LSTM considering multiple influencing factors. Energy Rep. 2023, 9, 1022–1031. [Google Scholar] [CrossRef]
  38. Vaz, A.G.R.; Elsinga, B.; van Sark, W.G.J.H.M.; Brito, M. An artificial neural network to assess the impact of neighbouring photovoltaic systems in power forecasting in Utrecht, the Netherlands. Renew. Energy 2016, 85, 631–641. [Google Scholar] [CrossRef]
  39. Gensler, A.; Henze, J.; Sick, B.; Raabe, N. Deep Learning for solar power forecasting-An approach using AutoEncoder and LSTM Neural Networks. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016; pp. 2858–2865. [Google Scholar]
  40. Bouzerdoum, M.; Mellit, A.; Massi Pavan, A. A hybrid model (SARIMA-SVM) for short-term power forecasting of a small-scale grid-connected photovoltaic plant. Sol. Energy 2013, 98, 226–235. [Google Scholar] [CrossRef]
Figure 1. Transformer model mechanism.
Figure 3. The overall architecture of the Solar-LLM.
Figure 5. Schematic diagram of the time series data patch reprogramming module.
Figure 7. Distribution of prediction errors for each method tested.
Figure 8. Prediction results with different amounts of training data.
Figure 9. Comparison results of indicators in the ablation experiments.
Table 1. Contrasting our work with the literature.
Models | Reference | Architectural Paradigm | Training Requirement | Spatio-Temporal Correlation | Small Sample Learning | Long-Term Dependencies | Cross-Modal Fusion
Machine/Ensemble Learning Models | [1,4,8] | Classical statistical/ML | Task-specific training | × | × | × | ×
Deep Learning Models | [3] | Task-specific DL | Task-specific training | × | ✓ | × | ×
Deep Learning Models | [10] | Task-specific DL | Task-specific training | × | ✓ | × | ×
Deep Learning Models | [6,7] | Task-specific DL | Task-specific training | ✓ | × | ✓ | ×
LLM-based Models | [11,12,14] | Native time-series foundation model | Large-scale pre-training | × | × | ✓ | ×
LLM-based Models | [13] | Native time-series foundation model | Two-stage pre-training | ✓ | × | ✓ | ×
Our work | — | Reprogramming pre-trained LLM | Lightweight fine-tuning | ✓ | ✓ | ✓ | ✓
Table 2. Influencing factors of distributed photovoltaic prediction.
Influencing Factor | Ri | Influencing Factor | Ri
temp | 0.536 | distance to adjacent stations | 0.712
humidity | −0.423 | historical PV power | 0.793
solar irradiance | 0.921 | historical power consumption | 0.217
air velocity | 0.161 | inverter efficiency | 0.255
air visibility | 0.143 | historical grid exchange power | 0.402
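The Ri values in Table 2 read as correlation coefficients between each candidate factor and the PV output, used to screen model inputs. As a minimal sketch of such screening, assuming a plain Pearson correlation and hypothetical column names (e.g., pv_power for the target), the following Python snippet ranks factors by the magnitude of Ri:

```python
import pandas as pd

def factor_correlations(df: pd.DataFrame, target: str = "pv_power") -> pd.Series:
    """Rank candidate influencing factors by their correlation with PV output.

    Assumes df holds aligned time series, one column per candidate factor plus
    the target column; the Ri-style score here is the plain Pearson coefficient.
    """
    correlations = df.drop(columns=[target]).corrwith(df[target])
    # Order by absolute correlation while keeping the original signed values.
    return correlations.reindex(correlations.abs().sort_values(ascending=False).index)

# Hypothetical usage: keep factors whose |Ri| clears a screening threshold.
# ranked = factor_correlations(data)
# selected = ranked[ranked.abs() > 0.2].index.tolist()
```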
Table 3. Specification of parameter freezing and fine-tuning in Solar-LLM.
Module | Parameter State | Key Components
Pre-trained LLM core | frozen | all Transformer parameters
Data reconstruction module | fine-tuned | W_E, b_E, T_proto
Cross-attention mechanism | fine-tuned | W_K, W_Q, W_V
Prompt embeddings | fine-tuned | vector representations of task context and statistical features
Output projection layer | fine-tuned | W, b
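A minimal PyTorch sketch of the split in Table 3, assuming the GPT-2 backbone listed in Table 4; the module names (patch_embed, reprogramming, output_proj) are illustrative placeholders rather than the authors' identifiers:

```python
import torch.nn as nn
from transformers import GPT2Model  # pre-trained backbone assumed from Table 4

class SolarLLMSketch(nn.Module):
    """Frozen LLM core with lightweight trainable input/output layers (Table 3)."""

    def __init__(self, d_model: int = 768, patch_len: int = 32, n_heads: int = 8):
        super().__init__()
        self.llm = GPT2Model.from_pretrained("gpt2")
        for p in self.llm.parameters():
            p.requires_grad = False                       # freeze all Transformer parameters
        self.patch_embed = nn.Linear(patch_len, d_model)  # data reconstruction: W_E, b_E
        self.reprogramming = nn.MultiheadAttention(       # cross-attention: W_Q, W_K, W_V
            embed_dim=d_model, num_heads=n_heads, batch_first=True)
        self.output_proj = nn.Linear(d_model, 1)          # output projection: W, b

model = SolarLLMSketch()
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
# Only the patch-embedding, cross-attention, and projection parameters remain trainable.
```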
Table 4. The detailed parameters of various models.
Classification | Prediction Model | Parameter | Value/Type
Our work | Solar-LLM | Base model | GPT-2
Our work | Solar-LLM | Feature dimension | 768
Our work | Solar-LLM | Patch length | 32
Our work | Solar-LLM | Stride | 16
Our work | Solar-LLM | Hidden layer feature dimension | 32
Our work | Solar-LLM | Attention heads | 8
Hybrid Models | CNN-BiLSTM | CNN kernel | 8 × 1
Hybrid Models | CNN-BiLSTM | BiLSTM hidden layer feature dimension | 32
Hybrid Models | GCN-LSTM | GCN/LSTM layers | 2
Hybrid Models | GCN-LSTM | LSTM hidden layer feature dimension | 32
Hybrid Models | GCN-LSTM | GCN node embedding dimension | 16
CNN-based | TCN | Layers | 2
CNN-based | TCN | Hidden layer feature dimension | 32
CNN-based | TCN | Kernel size | 16
Transformer-based | Informer | Encoder/Decoder layers | 2
Transformer-based | Informer | Attention heads | 8
Transformer-based | Informer | Attention network feature dimension | 32
Transformer-based | Informer | Feed-forward network feature dimension | 64
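The patch length and stride in Table 4 control how the historical series is segmented before reprogramming (cf. Figure 5). A minimal sketch of that segmentation, using a hypothetical helper built on torch.Tensor.unfold:

```python
import torch

def make_patches(series: torch.Tensor, patch_len: int = 32, stride: int = 16) -> torch.Tensor:
    """Split a univariate series of shape (batch, length) into overlapping patches.

    With the Table 4 settings (patch length 32, stride 16) adjacent patches overlap
    by half, so a 96-step input window yields 5 patches of 32 points each.
    """
    return series.unfold(dimension=-1, size=patch_len, step=stride)  # (batch, n_patches, patch_len)

x = torch.randn(4, 96)        # e.g. a 24 h window of 15-min resolution samples
print(make_patches(x).shape)  # torch.Size([4, 5, 32])
```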
Table 5. The results of the basic experiments.
Weather Type | Prediction Model | MAE (15 min) | RMSE (15 min) | R2 (15 min) | MAE (1 h) | RMSE (1 h) | R2 (1 h) | MAE (4 h) | RMSE (4 h) | R2 (4 h)
Sunny | Solar-LLM | 0.044 | 0.060 | 0.964 | 0.056 | 0.079 | 0.938 | 0.074 | 0.110 | 0.877
Sunny | TCN | 0.078 | 0.107 | 0.884 | 0.072 | 0.108 | 0.880 | 0.096 | 0.150 | 0.769
Sunny | CNN-BiLSTM | 0.058 | 0.090 | 0.917 | 0.065 | 0.102 | 0.896 | 0.103 | 0.162 | 0.730
Sunny | Informer | 0.073 | 0.116 | 0.865 | 0.067 | 0.109 | 0.879 | 0.081 | 0.132 | 0.820
Sunny | GCN-LSTM | 0.059 | 0.097 | 0.903 | 0.061 | 0.099 | 0.902 | 0.093 | 0.143 | 0.787
Cloudy | Solar-LLM | 0.048 | 0.080 | 0.899 | 0.064 | 0.109 | 0.814 | 0.087 | 0.142 | 0.687
Cloudy | TCN | 0.084 | 0.131 | 0.730 | 0.079 | 0.130 | 0.734 | 0.090 | 0.146 | 0.680
Cloudy | CNN-BiLSTM | 0.064 | 0.106 | 0.822 | 0.070 | 0.120 | 0.771 | 0.112 | 0.167 | 0.563
Cloudy | Informer | 0.086 | 0.148 | 0.660 | 0.081 | 0.140 | 0.691 | 0.089 | 0.151 | 0.658
Cloudy | GCN-LSTM | 0.066 | 0.113 | 0.800 | 0.068 | 0.121 | 0.769 | 0.108 | 0.169 | 0.565
Rainy | Solar-LLM | 0.037 | 0.065 | 0.878 | 0.057 | 0.098 | 0.714 | 0.090 | 0.151 | 0.476
Rainy | TCN | 0.083 | 0.133 | 0.464 | 0.082 | 0.136 | 0.451 | 0.096 | 0.156 | 0.438
Rainy | CNN-BiLSTM | 0.060 | 0.105 | 0.659 | 0.068 | 0.119 | 0.576 | 0.104 | 0.152 | 0.406
Rainy | Informer | 0.078 | 0.138 | 0.397 | 0.081 | 0.143 | 0.366 | 0.096 | 0.166 | 0.410
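The MAE, RMSE and R2 columns in Table 5 above and Tables 9 and 10 below are assumed to follow the standard definitions, with y_i the measured normalized power, ŷ_i the prediction, ȳ the mean of the measurements, and n the number of test points:

```latex
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\bigl|\hat{y}_i-y_i\bigr|,\qquad
\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\hat{y}_i-y_i\bigr)^{2}},\qquad
R^{2}=1-\frac{\sum_{i=1}^{n}\bigl(\hat{y}_i-y_i\bigr)^{2}}{\sum_{i=1}^{n}\bigl(y_i-\bar{y}\bigr)^{2}}
```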
Table 6. Comparison of MAE performance with different training data sizes.
Weather Type | Prediction Model | MAE (25% data) | MAE (50% data) | MAE (100% data)
Sunny | Solar-LLM | 0.048 | 0.047 | 0.044
Sunny | TCN | 0.152 | 0.096 | 0.078
Sunny | CNN-BiLSTM | 0.073 | 0.063 | 0.058
Sunny | Informer | 0.140 | 0.090 | 0.073
Sunny | GCN-LSTM | 0.081 | 0.072 | 0.059
Cloudy | Solar-LLM | 0.051 | 0.049 | 0.048
Cloudy | TCN | 0.110 | 0.097 | 0.084
Cloudy | CNN-BiLSTM | 0.097 | 0.076 | 0.064
Cloudy | Informer | 0.129 | 0.100 | 0.086
Cloudy | GCN-LSTM | 0.086 | 0.074 | 0.066
Rainy | Solar-LLM | 0.041 | 0.038 | 0.037
Rainy | TCN | 0.119 | 0.100 | 0.083
Rainy | CNN-BiLSTM | 0.102 | 0.073 | 0.060
Rainy | Informer | 0.122 | 0.096 | 0.078
Rainy | GCN-LSTM | 0.075 | 0.067 | 0.059
Table 7. Comparison of MAE with and without prompts.
Weather Type | Forecast Period | MAE (with Prompts) | MAE (without Prompts) | RI (%)
Sunny | 15 min | 0.044 | 0.048 | 8.33
Sunny | 1 h | 0.056 | 0.059 | 5.08
Sunny | 4 h | 0.074 | 0.080 | 7.50
Cloudy | 15 min | 0.048 | 0.052 | 7.69
Cloudy | 1 h | 0.064 | 0.066 | 3.03
Cloudy | 4 h | 0.087 | 0.097 | 10.31
Rainy | 15 min | 0.037 | 0.044 | 15.91
Rainy | 1 h | 0.057 | 0.065 | 12.31
Rainy | 4 h | 0.090 | 0.101 | 10.89
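The RI column in Table 7 and in Table 8 below is consistent with the relative reduction in MAE against the ablated variant; this reading is inferred from the tabulated values rather than stated explicitly in this section:

```latex
\mathrm{RI}=\frac{\mathrm{MAE}_{\text{without}}-\mathrm{MAE}_{\text{with}}}{\mathrm{MAE}_{\text{without}}}\times 100\%,
\qquad\text{e.g. sunny, 15 min: }\frac{0.048-0.044}{0.048}\times 100\%\approx 8.33\%.
```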
Table 8. Comparison of MAE with and without reprogramming.
Weather Type | Forecast Period | MAE (with Reprogramming) | MAE (without Reprogramming) | RI (%)
Sunny | 15 min | 0.044 | 0.049 | 10.20
Sunny | 1 h | 0.056 | 0.059 | 5.08
Sunny | 4 h | 0.074 | 0.077 | 3.90
Cloudy | 15 min | 0.048 | 0.054 | 11.11
Cloudy | 1 h | 0.064 | 0.065 | 1.54
Cloudy | 4 h | 0.087 | 0.092 | 5.43
Rainy | 15 min | 0.037 | 0.043 | 13.95
Rainy | 1 h | 0.057 | 0.059 | 3.39
Rainy | 4 h | 0.090 | 0.106 | 15.09
Table 9. Comparison with traditional time-series forecasting models.
Prediction Model | MAE (15 min) | RMSE (15 min) | R2 (15 min) | MAE (1 h) | RMSE (1 h) | R2 (1 h) | MAE (4 h) | RMSE (4 h) | R2 (4 h)
Solar-LLM | 0.045 | 0.068 | 0.946 | 0.059 | 0.091 | 0.901 | 0.087 | 0.135 | 0.781
BP | 0.079 | 0.104 | 0.853 | 0.089 | 0.113 | 0.814 | 0.128 | 0.162 | 0.616
ELM | 0.084 | 0.108 | 0.834 | 0.098 | 0.125 | 0.776 | 0.146 | 0.185 | 0.508
SVR | 0.060 | 0.079 | 0.906 | 0.068 | 0.091 | 0.879 | 0.098 | 0.128 | 0.757
Table 10. Comparison of 4 h forecasting performance in winter scenarios.
Weather Type | Model | MAE | RMSE | R2
Sunny | Solar-LLM | 0.079 | 0.118 | 0.864
Sunny | TCN | 0.112 | 0.168 | 0.718
Sunny | CNN-BiLSTM | 0.121 | 0.181 | 0.687
Sunny | Informer | 0.098 | 0.151 | 0.776
Sunny | GCN-LSTM | 0.110 | 0.163 | 0.742
Cloudy | Solar-LLM | 0.094 | 0.153 | 0.662
Cloudy | TCN | 0.109 | 0.166 | 0.613
Cloudy | CNN-BiLSTM | 0.128 | 0.189 | 0.504
Cloudy | Informer | 0.105 | 0.171 | 0.608
Cloudy | GCN-LSTM | 0.123 | 0.185 | 0.521
Rainy/Snowy | Solar-LLM | 0.101 | 0.168 | 0.453
Rainy/Snowy | TCN | 0.118 | 0.179 | 0.374
Rainy/Snowy | CNN-BiLSTM | 0.124 | 0.182 | 0.359
Rainy/Snowy | Informer | 0.113 | 0.189 | 0.368
Rainy/Snowy | GCN-LSTM | 0.109 | 0.174 | 0.402
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
