1. Introduction
Reducing CO2 emissions in construction is essential to achieve climate neutrality by 2050 and reach the EU target of a 55% reduction in emissions by 2030 [
1]. Within this framework, buildings contribute about 36% of total greenhouse gas emissions and consume about 40% of energy in the EU, positioning the sector as a key player in energy efficiency and decarbonization strategies [
2]. Moreover, the most recent European directive on energy efficiency stipulates that, from 2030, all new buildings must comply with the zero-emission standard, and that existing buildings must be gradually transformed to meet this requirement by 2050 [
3].
The mass integration of photovoltaic (PV) systems for self-consumption in smart buildings can be key to moving towards sustainable and resilient energy models. These systems allow buildings to generate their own clean energy, as well as optimize consumption and reduce costs, while contributing to decarbonization and urban energy efficiency [
4]. However, the inherent variability in solar radiation and local weather conditions poses significant challenges for the accurate prediction of PV production, a critical aspect for efficient demand management, grid stability, and optimization of distributed energy resources [
5]. In this context, accurate forecasting of the power generated by PV systems for self-consumption is essential to anticipate the availability of renewable energy, facilitate operational planning, and improve the integration of these systems into the grid, thereby maximizing their benefits and minimizing the risks associated with the intermittency of solar generation [
6].
Faced with these challenges, digital twins (DTs) applied to photovoltaic installations for self-consumption emerge as transformative solutions by creating dynamic virtual replicas that simulate physical installations in real time, allowing their continuous monitoring, operational optimization, and predictive maintenance [
7]. However, the effectiveness of these twins depends critically on the selection of the underlying predictive model, which must balance accuracy, robustness, and computational efficiency under real operating conditions [
8].
Digital twins are revolutionizing the management of intelligent buildings by enabling the continuous, real-time monitoring, simulation, and optimization of their systems and energy consumption [
9]. This technology virtually replicates the operation of a building, integrating data from renewable sources, energy storage, and load management systems. This opens up new possibilities for achieving greater efficiency and sustainability [
10]. DTs enable a proactive response to changes in demand, improving operational efficiency and reducing costs [
11]. Regarding the use of predictive maintenance, implementing DTs enables the early detection of failures and anomalies in critical equipment, facilitating predictive maintenance strategies that reduce breakdowns and prolong the useful life of the systems [
12]. DTs also allow for the advanced monitoring of a building’s services and interior environment. They adjust the air conditioning, lighting, and ventilation in real time to maximize comfort and energy efficiency [
13]. DTs help determine when to store or release energy by considering electricity prices, demand curves, and renewable production. Thus, smart strategies can be designed to minimize consumption peaks and take advantage of both dynamic tariffs and distributed storage, contributing to more resilient and sustainable grids [
14].
A traditional approach to the development of predictive tools in DTs has been the formulation of mathematical models capable of accurately describing the dynamics of the system. This approach is particularly suitable when a solid knowledge of the physical behavior of the components is available, allowing their fundamental relationships to be expressed by well-established equations and laws [
15]. However, this method has certain limitations when applied to complex systems, especially those that are highly nonlinear or involve many interrelated factors. In such cases, achieving an accurate representation of reality is difficult [
8]. In addition, the lack of sufficient information or reliable data can lead to inaccurate or even erroneous results, compromising the usefulness of these models [
16]. Traditional mathematical approaches are generally designed for very specific cases, making them difficult to scale or adapt to larger systems or those requiring a high degree of customization [
17]. To simplify handling, approximations are often incorporated that sacrifice important details of the actual behavior of the system, which can negatively affect the fidelity of the predictions obtained [
18].
In response to these limitations, the advance of artificial intelligence, and particularly artificial neural networks, has opened up new modeling possibilities for digital twins. These networks have demonstrated an outstanding ability to model multifactorial dynamics, such as those present in energy, biomedical, and electronic systems, extending the range of technical and industrial applications beyond the typical uses of artificial intelligence [
19]. Their main strength lies in the integration and processing of data in real time, which enables the digital twin to remain up to date and to adapt continuously to new operating conditions, thereby increasing both the accuracy and robustness of predictions and diagnostics [
20]. In addition, neural networks are used for predictive control and process optimization, overcoming the limitations of classical approaches through their ability to capture complex relationships between variables and learn from large volumes of historical and operational data [
21]. The use of convolutional networks brings added value in image and signal analysis tasks, which favors pattern recognition, human–machine interaction, and advanced monitoring in fields such as manufacturing and robotics, thus consolidating neural networks as versatile and essential tools for the development of intelligent digital twins in complex industrial environments [
22]. This versatility and explanatory power make neural networks an ideal tool for the development of DTs in environments where uncertainty, variability, or lack of information prevent the use of classical mathematical models alone [
23].
Recently, hybrid frameworks combining physical models with neural networks have been developed to take advantage of the accuracy of physical laws and the flexibility of data learning. This improves the generalization capability and adaptability of DTs in complex systems [
24]. For instance, in designing flat-plate solar collectors, where simplified, trial-and-error methods are traditionally used, physics-based neural networks have been suggested to predict the optimal design conditions in regions with unique environmental conditions, such as the highlands of Ecuador [
25]. Evolutionary algorithms and language models have been employed to design and optimize hybrid DT architectures, improving their efficiency and applicability in scenarios with limited data [
26].
Therefore, neural networks can be assumed as a suitable option that can model nonlinear relationships and complex temporal patterns in the data, surpassing traditional methods such as linear regression or physical models [
27]. However, their optimal implementation requires a critical evaluation of the operational context, the granularity of the available data, and the specific computational constraints [
28].
The use of neural networks as a basis for the development of digital twins in PV installations has enabled the accurate simulation and prediction of system behavior under a wide variety of operating conditions. For example, digital twins incorporating hybrid neural network architectures are able to simulate with high fidelity the characteristics of PV panels in changing contexts, providing versatile tools for energy monitoring and optimization [
29]. Furthermore, the integration of recurrent neural networks has facilitated the real-time estimation of PV power generation, even when weather conditions are variable, which is essential for dynamic energy resource management [
30]. Such forecasts should have a short time horizon, covering intervals of minutes or hours, and no more than a few days, in order to adapt production precisely to consumption needs at any given time [
31]. Other variants, such as models based on Multilayer Perceptron (MLP) and Elman networks, allow realistic and reliable estimates of the power generated, adapting to the complex nature of the data collected [
32]. The ability of neural networks to process large volumes of information in real time is especially leveraged in short-term forecasting applications, where the digital twin uses IoT (Internet of Things) data to anticipate power generation and adjust operational strategy almost instantaneously [
33]. Additionally, advanced architectures such as the FFNN-LSTM have outperformed traditional physical models and other artificial intelligence techniques in accurately estimating PV power [
34]. In the field of predictive maintenance, multi-twin digital twins supported by deep networks excel at diagnosing faults in strings of photovoltaic modules, achieving accuracy levels of over 98%, which translates into greater reliability and operational safety [
35]. Finally, comparisons between digital twins of a physical nature and those driven by neural networks in the context of combined photovoltaic and battery systems show the advantages of the artificial intelligence-based approach, particularly in terms of adaptability, accuracy, and scalability in the face of the complexity of modern systems [
36].
The scientific literature has explored a wide range of neural network models to improve the prediction accuracy of PV generation, addressing both the nonlinear nature and time dependence of the data, distinguishing between long-, medium-, short-, and very short-term predictions [
37]. In [
38], neural network models, in particular the Multilayer Perceptron (MLP), demonstrated a high predictive ability to estimate the power generated by a PV system under real conditions, reaching R2 values higher than 0.93 and mean absolute errors (MAEs) lower than 0.08 in the experimental validation. The results presented in [
39] showed that the MLP model achieved a very satisfactory performance in the very short-term (5 min) prediction of PV production, achieving accuracy comparable to recurrent architectures, but with significantly shorter training and inference times. This indicates that, in contexts where computational resources are limited or frequent model updating is required, the MLP represents a more efficient and practical alternative to more complex models such as LSTM (Long Short-Term Memory). These results show that simple structures, such as the MLP, can provide more accurate fits than other modern models and also reduce the prediction time in different fields, as some works have pointed out [
40]. However, the results obtained in [
41] show that the combination of LSTM with self-attention mechanisms and the integration of historical and forecast meteorological data increased the coefficient of determination (R2) by 26.4% with respect to the basic LSTM, thus achieving superior accuracy and adaptability in both short- and long-term forecast horizons. Therefore, it can be said that the LSTM model is designed to remember relevant information over long periods, which allows it to model the temporal evolution of PV power and anticipate changes due to variable meteorological conditions. The GRU (Gated Recurrent Unit) model, like the LSTM, is designed to learn patterns and relationships over time, which is essential to anticipate the evolution of PV power under changing weather conditions. In addition, it is simpler and faster to train than other recurrent models such as the LSTM; in fact, it is a simplification of the latter, which allows its use in real-time applications and with large volumes of data [
42].
Neural networks, particularly the MLP, LSTM, and GRU models, are arguably driving a transformation in DTs for buildings. These models enable predictive monitoring and more efficient energy management. However, the full adoption of these technologies still faces technical hurdles, primarily regarding interoperability between Building Information Modeling (BIM) systems, the Internet of Things (IoT), and artificial intelligence (AI) algorithms [
43]. MLPs have been used for prediction and classification tasks, such as estimating CO2 emissions or analyzing energy consumption patterns in DTs [
44]. LSTMs tend to be more accurate for energy consumption prediction, though GRUs can match or even outperform LSTM models in certain cases, especially when greater computational efficiency is required [
45]. In addition, GRUs have been proven to offer better results in occupancy and trajectory estimation within buildings. They achieve lower errors and have fewer parameters to adjust, which facilitates their training and scalability [
46]. Integrating neural networks into DTs offers clear benefits, such as improving the visualization and understanding of critical information, automating monitoring, and enabling the implementation of data-driven management strategies, thereby increasing operational efficiency [
47].
The main contribution and originality of this work lies in the realization of an exhaustive and homogeneous experimental comparison between a validated mathematical model and three neural network architectures (MLP, LSTM, and GRU) for the prediction of PV production in a real environment, using high-temporal-resolution data and complete annual coverage of the installation of the School of Industrial Engineering of the University of Extremadura. Unlike previous studies, which tend to focus on specific prediction horizons, limited datasets, or partial comparisons between models, this work objectively evaluates the accuracy, robustness, and seasonal adaptability of each approach under standardized metrics (MSE, RMSE, MAE, and R2), following a transparent and reproducible methodological protocol. This approach allows not only for the identification of the most suitable model for its integration in digital twins of PV installations but also for the provision of practical recommendations for its deployment in real scenarios, considering the seasonal variability and the usual operational constraints in the sector. Thus, this study contributes to closing the existing gap in the literature on the comprehensive and contextualized comparison of predictive models for advanced energy management applications in smart buildings.
The rest of the article is organized as follows:
Section 2 describes the actual plant monitored, the data collected, the prediction methods used, and the metrics applied to evaluate the performances of the models. In
Section 3, the results achieved are presented and analyzed. Finally, in
Section 4, the conclusions are presented.
2. Materials and Methods
2.1. Actual Plant Description
This study was carried out on the photovoltaic installation located on the roof of the School of Industrial Engineering of the University of Extremadura, in Badajoz. The installation has a nominal power of 2.79 kWp, consisting of six JA SOLAR JAM72S20 monocrystalline photovoltaic modules (from JA SOLAR GmbH, München, Germany) connected in series, and a Huawei SUN2000-5KTL-M1 inverter (from Huawei Technologies Co. Ltd., Shenzhen, China). The orientation of the panels is south, with an inclination of 30° (see
Figure 1).
The dataset used in this study consisted of 36,823 records of solar irradiance, ambient temperature, wind speed, and actual DC power generated by the facility under study, collected every 5 min. Meteorological data acquisition was performed using a Davis Vantage Pro2™ Wireless weather station, complemented with a WeatherLink Live system and a DAVIS 6450 pyranometer installed in the same plane as the panels, next to the facility under study. The power generated was obtained through Excel file downloads from the Huawei manufacturer’s application.
The electrical parameters required for the mathematical model were obtained from the JA SOLAR JAM72S20 panel manufacturer’s datasheet.
2.2. Data Collection
The inputs to the DT should be those outdoor variables that are considered to affect the actual PV installation [
48]. Solar radiation and cell temperature are, among many other factors, some of the variables on which the energy generated by the PV installation depends [
49]. When studying meteorological variables, it is fundamental to analyze them by season, because the relationships, trends, and effects of these variables change significantly throughout the year. Meteorological conditions such as temperature, solar radiation, wind, and precipitation behave differently in each season [
50]. In the case of photovoltaic technology, an increase in ambient temperature causes an increase in PV cell temperature. This leads to a decrease in installation performance [
51]. In the comparison of different mathematical models for estimating the PV cell temperature used in [
52], it was shown that wind speed reduces the negative influence of temperature on power generation by up to 10%. Ultimately, solar irradiance, ambient temperature, and wind speed are critical factors for accurate prediction, and neural network models can effectively integrate them [
53]. Taking into account these considerations, the variables selected for this study were solar irradiance (W/m2), ambient temperature (°C), and wind speed (m/s). They were selected for their physical relevance and direct impact on photovoltaic production, allowing for a homogeneous and robust comparison between the different models evaluated.
In order to evaluate the robustness and generalization capability of the predictive models under different meteorological conditions, data from a complete annual cycle were required. The dataset, sampled at five-minute intervals, therefore covered all seasons of the year and the climatic conditions representative of the geographical location of the study. The monitoring period spanned from May 2024 to June 2025.
As an example, the data corresponding to a week (14–21 June 2024) are shown in
Figure 2. Clear-sky conditions on 14 June, for instance, result in high irradiance peaks (>900 W/m2), while cloudy days like 18 June show significantly reduced values. Wind speed fluctuates between 0 and 5 m/s, with isolated gusts corresponding to changes in weather conditions. Maximum temperatures reach approximately 35 °C on 14 June, coinciding with clear skies and high irradiance, while minimum peak temperatures are observed during the cloudiest periods (about 22 °C on 19 June). The power generated reflects these patterns, reaching up to 3 kW on days with the highest solar irradiance and decreasing during periods of lower irradiance. These data highlight the strong influence of weather variability, particularly solar irradiance and temperature, on the PV system’s generation. Statistical information on the four recorded variables is provided in
Table 1.
The collected data underwent preprocessing to ensure their quality. During this stage, erroneous readings and records with missing values were systematically eliminated. These missing values were primarily caused by connectivity interruptions or temporary failures in data acquisition. Next, the data corresponding to each variable were normalized using the StandardScaler technique, which adjusts the data to have a mean of zero and a standard deviation of one. This normalization ensures that all variables are on a homogeneous scale. This prevents differences in magnitude between parameters from negatively affecting the neural networks’ learning process and contributes to more stable and efficient model convergence during training.
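As a minimal sketch, the cleaning and normalization steps described above can be expressed with scikit-learn’s StandardScaler; the small array and its column layout (irradiance, temperature, wind speed, power) are illustrative dummy data, not the actual dataset:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative preprocessing sketch: remove records with missing values,
# then standardize each variable to zero mean and unit standard deviation.
# The array contents and column order are dummy placeholders.
raw = np.array([
    [850.0, 31.2, 2.1, 2.35],
    [np.nan, 30.8, 1.9, 2.10],  # incomplete record -> dropped
    [420.0, 24.5, 3.4, 1.05],
    [15.0, 18.1, 0.7, 0.02],
])
clean = raw[~np.isnan(raw).any(axis=1)]  # keep only complete rows

scaler = StandardScaler()
scaled = scaler.fit_transform(clean)  # each column: mean 0, std 1
```

Standardizing per column keeps the very different magnitudes of irradiance (hundreds of W/m2) and wind speed (a few m/s) from dominating the gradient updates during training.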
The simulations were performed using the Python 3.13.2 programming language together with the PyTorch 2.6.0 framework, widely used for the development and training of deep learning models. The modeling and training process was performed on a 64-bit personal computer equipped with an Intel Core i7-10750H processor (Intel Corporation, Santa Clara, CA, USA; 2.6 GHz, 6 cores, and 12 threads) and a dedicated NVIDIA GeForce RTX 2060 graphics card (NVIDIA Corporation, Santa Clara, CA, USA).
2.3. Forecasting Models
When developing a digital twin, the first option is to model the system using mathematical equations that describe its dynamics. Even when such a mathematical model performs accurately, other models should be tested to determine whether they can improve accuracy and, if so, to identify the one with the best results. In this work, a mathematical model of the PV plant is compared with three neural networks to determine which performs best. The selected neural models are the Multilayer Perceptron (MLP), one of the most widely used models and a proven, accurate, and reliable tool for regression and classification problems, and two deep learning models: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). LSTMs and GRUs have been used for translation and language-processing problems, as well as time-series forecasting, providing very good performance.
Neural networks have shown great potential as predictive models in the context of DTs [
19]. However, their implementation in real-time monitoring systems presents practical challenges. The most relevant challenges include the limited availability of training data in certain locations, the need for sufficient computational resources to train and run the models, and the existence of intermittent or unreliable connectivity in some application environments [
20]. These limitations can affect the frequency with which models are updated and their ability to adapt to changing conditions. Therefore, it is crucial to select efficient architectures and develop robust preprocessing and data transfer strategies to ensure system reliability [
23].
Both the mathematical model and the neural networks take solar irradiance, ambient temperature, and wind speed as inputs and provide a prediction of the generated DC power as output. Thus, each model receives an input vector made up of the values of these three variables at a given time point and provides one output: a prediction of the DC power that the PV system should deliver.
A neural network must be trained before it can perform any task. Therefore, the entire dataset must be divided into two subsets: one for training and one for validation. To properly perform the training and validation processes, this work randomly divided the entire dataset into 75% for training and 25% for validation using the hold-out method. This strategy is widely used in the literature and allows us to evaluate the models’ ability to process unseen data. This approach is particularly useful when the goal is to compare the performances of different architectures under consistent conditions. After training each model, performance evaluations were carried out on the same 25% of the data reserved for validation, both for the global metrics and for the seasonal analysis. This ensures that the results of each model are assessed comparably and objectively under the same conditions. It is worth noting that other training–validation partitions were tested (40–60% and 20–80%, as mentioned below when explaining the MLP training process), but the 75–25% partition yielded slightly better results.
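A minimal sketch of the hold-out partition described above, assuming scikit-learn’s train_test_split (the arrays are dummy placeholders for the real dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hold-out split sketch: a random 75/25 train-validation partition,
# as in the protocol above. The arrays are dummy placeholders.
rng = np.random.default_rng(0)
X = rng.random((1000, 3))  # irradiance, ambient temperature, wind speed
y = rng.random(1000)       # measured DC power

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42  # fixed seed for reproducibility
)
```

Fixing the random seed makes the partition reproducible, so every model is trained and validated on exactly the same records.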
The mathematical model only requires its parameters to be adjusted using information provided by the system’s manufacturers. Therefore, it does not require training; only the validation set must be simulated. This allows us to evaluate the model’s performance under the same conditions as the neural networks.
2.3.1. Mathematical Model
The mathematical model chosen to simulate the photovoltaic system in this work was selected based on its ability to incorporate module efficiency, enabling the simulations to accurately reflect the system’s real behavior [
54].
The first step in defining the mathematical model is to adapt the available information—solar irradiance, ambient temperature, and wind speed—to the variables of that model. As the selected model uses the irradiance and the PV cell temperature as internal variables, the irradiance is taken directly from the available data; however, the PV cell temperature must be estimated from the ambient temperature and wind speed (Equation (1)). In this equation, T_c represents the cell temperature, T is the ambient temperature, T_ref is a reference temperature, G is the solar irradiance, w is the wind speed, and ω is a coefficient that depends on the panel technology, which in our case is monocrystalline [55].
Once the cell temperature is obtained, the equations providing the cell’s current and voltage can be defined (Equations (2) and (3)) [54]. In these equations, I and V are the cell’s current and voltage, which depend on the solar irradiance and the cell temperature; I_ref and V_ref represent the cell’s current and voltage, respectively, obtained from the manufacturer’s datasheet at the maximum power point under reference conditions (G_ref = 1000 W/m2 and T_ref = 25 °C); their values for the panel used in this work were taken from the manufacturer’s datasheet. Finally, α and γ are the coefficients of variation of the current and the power with temperature, respectively. According to the manufacturer’s datasheet for the analyzed panel, their values are 0.044%/°C and −0.35%/°C, respectively [56].
As Equations (2) and (3) provide the current and voltage of a single cell, the corresponding current and voltage of the PV system are obtained by multiplying them by the number of cells in parallel (for the current) and in series (for the voltage): I_sys = N_p · I and V_sys = N_s · V. Therefore, the power provided by the system is P = V_sys · I_sys.
In this mathematical model, the voltage drop in the PV cable from the PV panel to the DC inverter (which translates into a power loss due to the Joule effect) is neglected. It is assumed that the installation company correctly sized the PV cable cross-section so that the voltage drop is less than 1.5%, in accordance with the Spanish instruction ITC-BT 40 [
57].
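As an illustration only, a model of this general shape can be sketched in Python. The temperature correlation and the Osterwald-type scaling below are assumed stand-ins for the exact equations of the cited references, and i_mp and v_mp are placeholders for the module’s datasheet values; only the temperature coefficients and reference conditions quoted above are carried over:

```python
# Sketch of a datasheet-based PV model with this general structure.
# The temperature correlation and the Osterwald-type scaling are
# illustrative assumptions, not the exact equations of the cited model;
# i_mp and v_mp are placeholders for the module's datasheet values.
ALPHA_I = 0.00044    # current temperature coefficient, 0.044 %/degC
GAMMA_P = -0.0035    # power temperature coefficient, -0.35 %/degC
G_REF, T_REF = 1000.0, 25.0  # STC reference conditions

def cell_temperature(t_amb, g, wind, omega=1.0):
    """Cell temperature from ambient temperature, irradiance, and wind
    (Skoplaki-style correlation, used here only as an assumption)."""
    return t_amb + omega * (0.32 / (8.91 + 2.0 * wind)) * g

def pv_power(g, t_amb, wind, i_mp, v_mp, n_series=6, n_parallel=1):
    """DC power of the array (6 modules in series in this installation)."""
    t_c = cell_temperature(t_amb, g, wind)
    i = i_mp * (g / G_REF) * (1.0 + ALPHA_I * (t_c - T_REF))
    v = v_mp * (1.0 + (GAMMA_P - ALPHA_I) * (t_c - T_REF))
    return (n_parallel * i) * (n_series * v)
```

The structure mirrors the text: irradiance is used directly, cell temperature is derived from ambient temperature and wind, and the system-level power is the product of the series and parallel scalings of the single-module current and voltage.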
2.3.2. Multilayer Perceptron (MLP)
The Multilayer Perceptron (MLP) [
58] is probably the most basic neural model used today. It has become a classic because it was one of the first models that could provide accurate and reliable results for classification and regression tasks. Despite its simplicity, it is still one of the most commonly used models for applications that are not closely related to human intelligence. In fact, it has been proven that an MLP with one hidden layer is a universal approximator, provided that it has enough neurons [
59]. It is a multilayer, feedforward structure, meaning its processing elements (neurons) are arranged into layers, with information flowing from the input layer to the output layer for processing (see
Figure 3). There is no feedback between neurons. In this multilayer structure, the input layer is not an actual processing layer but simply the vector of input data to the network. Similarly, the output layer is not intended to perform complex processing but rather to map the network outputs to the range of the data used.
The information the network receives is sequentially processed by all layers. Each neuron in one layer receives the outputs of all the neurons in the preceding layer (this is why these layers are usually known as fully connected layers) and processes all this information by means of a transfer function:

y_j^l = f( Σ_i w_ij^l · y_i^(l−1) + b_j^l )

In this expression, y_j^l represents the output of neuron j in layer l; y_i^(l−1) is the output of neuron i in layer l − 1; w_ij^l is the weight that defines the strength of the connection between the two neurons; and b_j^l is a bias term. f(·) is usually a sigmoid function (outputs between 0 and 1) or a hyperbolic tangent function (outputs between −1 and 1) for the neurons in the hidden layers, although other functions such as the ReLU (which is 0 for inputs lower than 0 and increases linearly for inputs greater than 0) can also be used. The transfer function for the neurons in the output layer is usually linear.
Neural networks can accurately reproduce the behavior of many complex systems because they can learn this behavior from data. To acquire this capability, the networks must be trained before being used for their intended task. Therefore, the available dataset must be divided into two subsets: one for training and one for validation. Usually, divisions ranging from 40–60% (training–validation) to 20–80% are used. The MLP training process is carried out using the well-known backpropagation algorithm [
44]. To apply the algorithm, the training dataset is organized as pairs of inputs (patterns) and their corresponding desired outputs. The patterns are then sequentially presented to the network, which processes them and provides the corresponding outputs. These outputs are compared with the desired outputs to measure the prediction error. The sum of these errors is then backpropagated to allow the algorithm to minimize its value by adjusting the weights of all neurons. This process is repeated until the desired level of accuracy is achieved.
In this work, an MLP with a single hidden layer with 128 neurons was used; the activation function of the neurons was the ReLU. The input layer has three components that correspond to the selected meteorological variables: solar irradiance, ambient temperature, and wind speed. The output layer has a single neuron that provides the prediction of the generated power. We tested different numbers of neurons in the hidden layer, but the configuration with 128 neurons performed better.
The MLP was trained using the Adam algorithm, a procedure that optimizes the backpropagation algorithm, with an initial learning rate of 0.001 and a batch size of 32 samples. The mean-squared error (MSE), which is described in
Section 2.4 below, was used as the loss function. To prevent overfitting and optimize the number of epochs, an early stopping strategy was implemented and executed after ten consecutive epochs without improvement in the loss function. The algorithm stops after a maximum of 200 epochs.
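Under the stated configuration, a PyTorch sketch of the MLP and its training loop might look as follows; the names and the data loader are illustrative, not the authors’ exact code:

```python
import torch
import torch.nn as nn

# Architecture as described: 3 inputs -> 128 ReLU hidden units -> 1 output.
class MLP(nn.Module):
    def __init__(self, n_inputs=3, n_hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

# Minimal training-loop sketch with Adam, MSE loss, and early stopping
# after 10 epochs without improvement (data loading is omitted).
def train(model, loader, max_epochs=200, patience=10):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    best, wait = float("inf"), 0
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for xb, yb in loader:        # batches of 32 samples
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            epoch_loss += loss.item()
        if epoch_loss < best - 1e-6:  # improvement check
            best, wait = epoch_loss, 0
        else:
            wait += 1
            if wait >= patience:      # early stopping
                break
    return model
```

A sketch like this trains on (batch, 3) tensors of standardized irradiance, temperature, and wind speed, with the measured DC power as the regression target.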
2.3.3. Long Short-Term Memories
The LSTM network [
60] is a complex neural model with a multilayer structure and feedback connections. The neurons in each layer can be organized into blocks containing multiple elements. Each neuron receives inputs from the preceding layer and feedback from the other neurons in the same layer (
Figure 4). LSTM neurons also store a type of “memory” of their past states, which is processed with inputs and feedback [
46]. Two control gates decide which portion of the neuron’s inputs (the new inputs and the feedback), controlled by the input gate, and which portion of the inner state (“memory”), controlled by the forget gate, will be processed to create a new inner state:

i_t = σ(W_i · z_t + b_i)
f_t = σ(W_f · z_t + b_f)

In these equations, x_t represents the new input data; h_(t−1) is the feedback from the neurons in the same layer; W_i and W_f are weight matrices; b_i and b_f represent biases; and σ(·) is a sigmoid function. To provide clearer and more compact expressions, both the new inputs, x_t, and the feedback, h_(t−1), are arranged into a single input vector, z_t.
The neuron calculates a temporary new inner state, c̃_t, from its inputs by means of the following:

c̃_t = tanh(W_c · z_t + b_c)

In this expression, W_c and b_c are the corresponding weights and bias. A fraction of this temporary inner state is then combined with a fraction of the stored one, both controlled by gates i_t and f_t, to obtain the new inner state:

c_t = f_t ⊙ c_(t−1) + i_t ⊙ c̃_t

where ⊙ denotes the element-wise product.
Finally, the neuron’s output is a fraction of this inner state after being processed by a hyperbolic tangent to limit its value between −1 and +1. The fraction of this value to be provided as the output is decided by a third gate (o_t), the output gate:

\[ o_t = \sigma(W_o z_t + b_o) \quad (12) \]
\[ y_t = o_t \odot \tanh(c_t) \quad (13) \]

The variables, parameters, and function (σ( )) in Equation (12) have the same meanings as those in Equations (8) and (9).
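The gate structure described above can be collected into a single forward step. The following NumPy sketch is illustrative: the dictionary-based weight layout and variable names are assumptions, not the paper’s implementation.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_cell_step(x_t, y_prev, c_prev, W, b):
    """One LSTM layer step following the gate equations above.
    W and b hold weights/biases for the input (i), forget (f),
    candidate (c), and output (o) transformations."""
    z_t = np.concatenate([x_t, y_prev])       # combined input vector z_t
    i_t = sigmoid(W["i"] @ z_t + b["i"])      # input gate
    f_t = sigmoid(W["f"] @ z_t + b["f"])      # forget gate
    c_tilde = np.tanh(W["c"] @ z_t + b["c"])  # temporal inner state
    c_t = f_t * c_prev + i_t * c_tilde        # new inner state ("memory")
    o_t = sigmoid(W["o"] @ z_t + b["o"])      # output gate
    y_t = o_t * np.tanh(c_t)                  # output, bounded in (-1, 1)
    return y_t, c_t
```

Because the output passes through tanh and is scaled by a sigmoid gate, its magnitude is always strictly below 1.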
The LSTM is trained with a modified version of backpropagation, adapted to take into account both the feedback and the “memory” present in its structure [60]. Two variants are employed according to the nature and role of the different weights and biases: truncated backpropagation through time (BPTT) for the output units and output gates, and real-time recurrent learning (RTRL) for the neuron’s inputs, input gates, and forget gates.
To explore possible temporal correlations in each input variable’s data, the LSTM’s input vector consisted of the six historical data points of each input variable preceding the PV power value to be predicted. In other words, sequences of six historical data points for each meteorological variable were provided in 30 min windows (six 5 min intervals). Thus, the input vector comprises 18 components. The model’s structure consists of a hidden LSTM layer with 50 neurons and a dropout layer, which randomly switches off neurons to create a more efficient network and prevent overfitting. The dropout rate was fixed at 0.2. This is followed by an output layer with a single neuron that provides the power prediction. Since the dropout layer can disable neurons, only the configuration with 50 neurons was tested: the model itself can effectively reduce the number of active neurons if doing so improves accuracy.
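Assuming the raw series are aligned 5 min samples, the 30 min input windows (six lags per variable, 18 components in total) can be built as follows; the function and array names are illustrative.

```python
import numpy as np

def build_windows(features, power, n_lags=6):
    """Build recurrent-model inputs: for each target power value, stack the
    n_lags preceding samples of every feature column into one flat vector.

    features : array of shape (T, n_vars) -- meteorological series
    power    : array of shape (T,)        -- PV power series
    Returns X of shape (T - n_lags, n_lags * n_vars) and y of shape (T - n_lags,).
    """
    T, n_vars = features.shape
    # order="F" groups the flat vector by variable: six lags of variable 1,
    # then six lags of variable 2, and so on.
    X = np.stack([features[t - n_lags:t].ravel(order="F")
                  for t in range(n_lags, T)])
    y = power[n_lags:]
    return X, y
```

With three meteorological variables and six lags, each row of X has the 18 components described above, and each y value is the power sample immediately following its window.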
As with the MLP, the training algorithms employed the Adam optimizer with a learning rate of 0.001 and the mean-squared error (MSE) as the loss function. The early stopping criterion waits 10 epochs before ending the training process, which concludes after a maximum of 200 epochs in any case.
2.3.4. Gated Recurrent Unit
With the aim of defining a simpler structure than the LSTM model while retaining its computational capability, a simplification has been proposed: the Gated Recurrent Unit (GRU) [47]. A first simplification assumes that a single gate controls the combination of the new inputs and the stored “memory”, providing a balanced combination of both:

\[ u_t = \sigma(W_u z_t + b_u) \]
\[ c_t = u_t \odot c_{t-1} + (1 - u_t) \odot \tanh(W_c z_t + b_c) \]

A second simplification consists of defining the neuron’s output as only a fraction of its new inner state:

\[ y_t = o_t \odot c_t \]
This model was proposed for speech recognition [61], although it has also been applied to time series forecasting [62].
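Assuming the same combined input vector and notation as in the LSTM section, the two simplifications above can be sketched as a single update step. The gate names and dictionary-based weight layout are illustrative assumptions, not a particular library’s implementation.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_cell_step(x_t, y_prev, c_prev, W, b):
    """One step of the simplified recurrent cell described above: a single
    update gate u_t blends the stored "memory" with the candidate state,
    and an output gate o_t provides a fraction of the result."""
    z_t = np.concatenate([x_t, y_prev])         # combined input vector
    u_t = sigmoid(W["u"] @ z_t + b["u"])        # single balancing gate
    c_tilde = np.tanh(W["c"] @ z_t + b["c"])    # candidate inner state
    c_t = u_t * c_prev + (1.0 - u_t) * c_tilde  # balanced combination
    o_t = sigmoid(W["o"] @ z_t + b["o"])        # output gate
    y_t = o_t * c_t                             # fraction of the new state
    return y_t, c_t
```

Compared with the LSTM step, one gate and one tanh evaluation disappear, which is where the GRU’s runtime advantage reported later comes from.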
The GRU model used in this work has a structure similar to that of the LSTM model described in the previous section; the only difference is that a GRU layer is used instead of an LSTM layer. The model consists of one hidden GRU layer with 50 neurons, a dropout layer with a 0.2 dropout rate, and a linear output layer with one neuron. This configuration prioritizes efficiency in learning sequential patterns without compromising predictive capability.
As with the LSTM, the input data have 18 components: three blocks of six historical data points, one for each meteorological variable considered. The training algorithm employs the Adam optimizer with a learning rate of 0.001 and the mean-squared error (MSE) as the loss function. The training algorithm runs a maximum of 200 epochs, stopping early if the MSE does not decrease after 10 epochs.
2.4. Model Assessment
To rigorously assess the accuracy and explanatory power of the four predictive models proposed, four widely recognized statistical metrics were calculated: the mean-squared error (MSE), which quantifies the average magnitude of the squared errors; the root-mean-squared error (RMSE), the square root of the MSE, which gives a value in the original units of the data and facilitates direct interpretation as the standard deviation of the prediction errors; the mean absolute error (MAE), which measures the average of the absolute differences between actual and estimated values; and the coefficient of determination (R2), which expresses the proportion of the variance of the dependent variable explained by the model. Together, these metrics provide a comprehensive view of performance by considering both the magnitude of the errors and the model’s ability to capture the variability of the data. Their mathematical expressions are as follows:

\[ \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2 \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2} \]
\[ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left| y_i - \hat{y}_i \right| \]
\[ R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2} \]

In these expressions y_i represents an actual data point, ŷ_i is its predicted value, and ȳ is the mean of all the data; N represents the total number of observations. For the MSE, RMSE, and MAE, lower values indicate better model accuracy. For the R2, values closer to 1 indicate better model performance.
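The four metrics can be computed directly from the validation data; the following NumPy sketch (the function name is illustrative) implements the definitions above.

```python
import numpy as np

def assessment_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R^2 for actual values y_true and predictions y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)                       # same units as the data
    mae = np.mean(np.abs(errors))
    ss_res = np.sum(errors ** 2)              # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                # fraction of variance explained
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```

For example, predicting [1, 2, 3, 5] against actual values [1, 2, 3, 4] gives MSE = 0.25, RMSE = 0.5, MAE = 0.25, and R2 = 0.8.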
3. Results
Table 2 shows the results of comparing the performances of the four models using the MSE, RMSE, MAE, and R2 metrics on the validation set. The single-layer MLP performed best overall, achieving the lowest mean-squared error (MSE = 0.0389) and the highest coefficient of determination (R2 = 0.931). This confirms its ability to accurately capture the relationships between the meteorological variables and the generated PV power. This performance surpasses that of the traditional mathematical model, which, despite also exhibiting good values (R2 = 0.914; MAE = 0.0752), is less precise in contexts with greater climatic variability, such as overall yearly predictions.
In contrast, the LSTM and GRU models, which are designed to process complex temporal dependencies, performed worse. They produced higher error metrics (MSE = 0.0588 for the LSTM and MSE = 0.0593 for the GRU) and less satisfactory fits (R2 = 0.896 for the LSTM and R2 = 0.895 for the GRU). The performances of these models can be interpreted by analyzing the dynamics of the data used. In this context, the relationship between the weather variables and PV power production varies smoothly over short time intervals, so much of the system’s behavior can be captured by models that consider only the most recent values of these variables. Incorporating recurrent mechanisms, such as those in the LSTM and GRU architectures, which were designed to capture strong temporal dependencies like those in language or some time series, therefore does not appreciably improve performance over simpler models for problems like the one studied here. In fact, these complex models increase the computational cost and may tend to induce overfitting.
Consequently, for real-time PV power forecasting scenarios, where data are updated with high frequency but there are no significant time dependencies, the Multilayer Perceptron (MLP) is the most efficient, robust, and accurate option.
To provide a graphical representation of the models’ performances that can aid the analysis, scatter plots of predictions versus actual data were obtained for each model. These plots, computed using only the data reserved for validation, are shown in Figure 5. The proximity and concentration of the points along the diagonal indicate the degree of predictive accuracy; closer alignment implies a lower prediction error and a higher coefficient of determination (R2).
The MLP model exhibits the highest density of points tightly clustered around the diagonal (R2 = 0.931), demonstrating its superior goodness of fit and predictive capability across the entire data range. The mathematical model, while slightly less precise than the MLP (R2 = 0.914), achieves a robust performance and maintains notable accuracy.
In contrast, both recurrent neural network architectures (LSTM and GRU) show a clearly wider dispersion of points, especially at intermediate and higher power values, indicating reduced predictive fidelity under these conditions (R2 = 0.896 for the LSTM and R2 = 0.895 for the GRU). This is consistent with the error metrics in Table 2 and may be due to the nature of the dataset, which is dominated by relatively static and direct relationships between the meteorological variables and the PV power. Unlike in scenarios with complex temporal relationships, the recurrent networks did not provide additional benefits here. Their time-dependent structure may even introduce a tendency toward overfitting or increase errors in cases with limited or no time dependencies.
Seasonal Analysis
Seasonal analysis is essential for assessing the robustness and adaptability of predictive photovoltaic (PV) generation models because solar energy production and algorithm performance vary significantly throughout the year due to changes in weather conditions and incident radiation [50].
To accomplish this, we evaluated the performances of the four models on a seasonal basis by dividing the entire dataset into four seasons: spring, summer, autumn, and winter. Each season was used to independently train and validate each model, just as was performed with the entire dataset. This allows for a homogeneous comparison of the accuracy and explanatory power of the models by analyzing their performances within a set of data with similar weather behavior.
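One straightforward way to perform this seasonal split, assuming timestamped records, is sketched below. The month-based season boundaries are an assumption for illustration; the paper does not state the exact convention used.

```python
from datetime import datetime

# Month-based season assignment (Northern Hemisphere, meteorological
# seasons) -- an illustrative convention, not necessarily the paper's.
SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}

def split_by_season(records):
    """Group (timestamp, sample) pairs into four seasonal subsets, each of
    which can then be independently split into training/validation sets."""
    seasons = {"winter": [], "spring": [], "summer": [], "autumn": []}
    for ts, sample in records:
        seasons[SEASON_BY_MONTH[ts.month]].append(sample)
    return seasons
```

Each seasonal subset is then treated exactly like the full dataset: its own training run and its own validation split.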
The results of the seasonal analysis are presented numerically in Table 3 and graphically in Figure 6 for easier interpretation. Overall, the single-layer MLP model shows the highest consistency and accuracy across all seasons. It achieves the lowest values of the MSE, RMSE, and MAE, as well as the highest coefficients of determination (R2), in three of the four seasons (spring, summer, and winter). It clearly outperforms the other models in spring (MSE = 0.0631; R2 = 0.9204), the season with the most fluctuating weather conditions. Nevertheless, the mathematical model performs slightly better than the MLP in autumn, when weather conditions are more stable. These results demonstrate the MLP’s ability to effectively capture the relationships between the meteorological variables and the generated power despite seasonal variability: it provides more balanced and accurate results across all weather conditions, whereas the mathematical model can only provide valuable predictions for seasons with relatively stable weather.
It is worth noting that in the season with the most stable weather conditions, summer, the four models were able to notably increase their accuracies: they all achieved values of the R2 very close to 1, with the MLP providing slightly better metrics (MSE = 0.0032; R2 = 0.9957).
In winter and spring, the mathematical model, LSTM, and GRU achieved clearly worse results than the MLP. The MLP has significantly lower MSE values (MSE = 0.0370 in winter and MSE = 0.0631 in spring) and higher R2 values (R2 = 0.9251 in winter and R2 = 0.9204 in spring) than those of the other three models. Notably, while the other three models experience a notable drop in forecasting accuracy due to the changing weather conditions typical of these seasons, the MLP is able to provide relatively accurate predictions. However, while the mathematical model slightly outperforms the LSTM and GRU in spring, it underperforms these two models in winter, demonstrating its limitations in dealing with changing weather conditions.
When comparing the forecasting performances of several models, especially those that will be implemented in real-time systems, it is important to consider their computational performances, i.e., how long it takes the model to provide a prediction.
Table 4 presents a comparative summary of the training and validation times measured for each model. For simplicity, only the values obtained with the entire annual dataset are used. Three times are provided: the training time, the validation time, and the sum of both. Since the mathematical model does not require training, only the validation time is provided. The validation time represents the time needed to predict the entire validation dataset.
Of the three neural network models, the MLP is the most time-efficient, requiring 41.84 s for training and practically instantaneous validation (0.09 s). The total time consumption is 41.93 s. This performance makes the MLP well-suited for implementations with moderate computational resources that require frequent updates. It is worth noting that, while training can be carried out offline, validation must be carried out in real time. Therefore, the validation time is the most significant factor in evaluating a model’s performance in real-time applications.
In contrast, recurrent models perform worse in terms of runtime. The GRU requires less time than the LSTM: 91.99 s for training, 1.04 s for validation, and 93.03 s in total versus 126.88 s for training, 1.10 s for validation, and 127.98 s in total. These results align with expectations, as the GRU has a simpler structure and fewer parameters than the LSTM. However, both models require substantially more prediction time than the MLP, which may limit their use in scenarios where speed and computational efficiency are priorities.
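In a real-time deployment, only the single-sample latency matters, while the times reported here cover the entire validation set. Both can be measured with the standard library as sketched below; `predict_all` is a placeholder for a real model’s prediction call, not the paper’s code.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def predict_all(samples):
    """Placeholder for a trained model's prediction over a batch."""
    return [2 * s for s in samples]

# Illustrative use: time a whole-validation-set pass versus a single sample.
_, t_full = timed(predict_all, list(range(10_000)))
_, t_one = timed(predict_all, [1])
```

`time.perf_counter` is preferred over `time.time` for interval measurement because it is monotonic and has the highest available resolution.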
Notably, all three neural models outperformed the mathematical model, whose prediction time was 1.86 s, by a significant margin, demonstrating that, once trained, neural models are computationally simpler structures.
4. Discussion
The results described in the preceding section show that the MLP yields more accurate and reliable predictions than the recurrent neural architectures (LSTM and GRU) in all seasons. The MLP performs similarly to the mathematical model under stable weather conditions (summer and autumn) but significantly outperforms it in seasons with changing conditions; moreover, in some of these cases (winter), the mathematical model performs even worse than the LSTM and GRU. The MLP stands out as the best forecasting option due to its consistent and balanced accuracy, providing the best metrics under the worst weather conditions, although the mathematical model remains competitive in scenarios with low variability, such as autumn.
It is worth noting that the four models yielded better accuracies when trained and validated using seasonal data than when the entire dataset was used. This is not surprising, since, when using the entire dataset, models must deal with data representing different weather behaviors, which complicates training and prediction. However, when seasonal datasets are used, the weather conditions are more stable, and the models can process the information more easily.
Including historical data as inputs to the LSTM and GRU models used in this study did not improve accuracy; in fact, it increased the prediction errors compared to the simpler MLP structure defined in this work. When the data carry no significant time dependencies, feeding more inputs through a more complex recurrent structure not only fails to improve accuracy but tends to degrade it, as the results obtained show.
Finally, it should be noted that the prediction times are not critical for any of the four models tested when used in the actual system analyzed in this work, because the monitoring system captures data every five minutes and the models require only a very short time to provide a prediction. Note that the times shown in Table 4 correspond to forecasting the entire validation dataset; in operation, the model provides a single prediction when the corresponding meteorological values are entered, a process that takes significantly less time.
Therefore, it can be concluded that neural networks are valuable tools for implementing DTs. As demonstrated in this work, they can outperform accurate mathematical models. Nevertheless, no single model provides good results in every case; several models must be tested to determine the best one for each particular application. New neural models [42], or combinations of neural models with other tools, i.e., hybrid models [41], are therefore valuable options that deserve to be tested. It is worth noting that more complex models do not always perform better, as some studies have pointed out [39,40] and this study has shown.
5. Conclusions
This study demonstrates that predicting PV power in real environments using high-resolution data and annual coverage benefits significantly from a rigorous comparative evaluation between traditional mathematical models and several neural network architectures. The results obtained show that the single-layer MLP model provides the best overall performance in terms of accuracy, with a lower MSE and higher R2. It outperforms both the mathematical model and the recurrent LSTM and GRU architectures.
Seasonal analysis reveals that the MLP maintains high consistency and accuracy across all seasons, achieving its highest accuracy in summer and autumn. The traditional mathematical model closely follows the MLP under conditions of low variability, such as in summer and autumn, but it does not perform as well in spring and winter. Conversely, the recurrent LSTM and GRU architectures underperform in most seasons, suggesting that their greater ability to capture temporal dependencies does not provide an advantage in contexts where PV generation patterns have low temporal complexity.
These findings highlight the importance of tailoring the selection and implementation of predictive models to the specific operational and seasonal characteristics of each PV installation. In particular, integrating the MLP into digital twins of PV installations is presented as an efficient, accurate, and cost-effective solution that facilitates real-time monitoring, optimization, and predictive maintenance, key aspects of smart energy management in sustainable buildings.
The methodological approach adopted, which includes a homogeneous comparison of models under standardized metrics and detailed seasonal analysis, contributes to closing the existing gap in the literature on comprehensive model evaluation for advanced energy management applications.
These findings underscore the importance of conducting thorough seasonal analyses when validating the DTs of photovoltaic (PV) installations. Such analyses identify the strengths and limitations of each approach under real operating conditions, and seasonal comparisons provide essential information for selecting and adjusting models according to the installation’s climatic and technological context, contributing to more efficient and resilient energy management throughout the year. This information is crucial for designing advanced maintenance, optimization, and early warning strategies for smart PV systems. Overcoming the challenges of seasonality and climate variability is essential for predicting photovoltaic energy with neural networks: models that explicitly account for these factors significantly improve accuracy and robustness, two essential aspects for application in distributed generation and energy management systems.
These predictive models enable a DT that allows for the proactive and optimized management of smart buildings by anticipating photovoltaic generation and adapting energy consumption in real time. This capability translates into reduced costs and emissions, as well as improved urban environment comfort and sustainability. Additionally, integrating DTs with energy storage systems and load management strategies creates new opportunities for balancing supply and demand, increasing system resilience, and maximizing renewable energy use. These synergies contribute together to a smart ecosystem geared toward increasingly autonomous and efficient buildings that can respond dynamically to environmental conditions and consumption needs.