AI-Based Virtual Assistant for Solar Radiation Prediction and Improvement of Sustainable Energy Systems

Gavilánez, Tomás; Zamora, Néstor; Navarrete, Josué; Vega, Nino; Vergara, Gabriela

doi:10.3390/su17198909

Open Access Article


by Tomás Gavilánez ¹, Néstor Zamora ², Josué Navarrete ¹, Nino Vega ¹,* and Gabriela Vergara ³

¹ Industrial Processes Research Group, Universidad Politécnica Salesiana, Guayaquil 090204, Ecuador

² Electronics Faculty of Technical Education for Development, Universidad Católica de Santiago de Guayaquil, Guayaquil 090504, Ecuador

³ A Career of Risks and Disasters, Escuela Superior Politécnica Agropecuaria de Manabí Manuel Félix López, Calceta 1701518, Ecuador

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(19), 8909; https://doi.org/10.3390/su17198909

Submission received: 5 September 2025 / Revised: 1 October 2025 / Accepted: 3 October 2025 / Published: 8 October 2025

(This article belongs to the Special Issue Advancing Sustainable Development Through Artificial Intelligence (AI))


Abstract

Advances in machine learning have improved the ability to predict critical environmental conditions, including solar radiation levels that, while essential for life, can pose serious risks to human health. In Ecuador, owing to its geographical location and altitude, UV radiation reaches extreme levels. This study presents a chatbot system driven by a hybrid artificial intelligence model that combines Random Forest, CatBoost, Gradient Boosting, and a 1D Convolutional Neural Network. The model was trained with meteorological data, optimized through hyperparameter tuning (iterations: 500–1500; depth: 4–8; learning rate: 0.01–0.3), and evaluated using MAE, MSE, R², and F1-Score. The hybrid model achieved superior accuracy (MAE = 13.77 W/m², MSE = 849.96, R² = 0.98), outperforming traditional methods. A 15% error margin was observed without significantly affecting classification. The chatbot, implemented on Telegram and hosted on Heroku, provided real-time personalized alerts, demonstrating an effective, accessible, and scalable solution for health safety and environmental awareness. It also supports decision-making for efficient renewable energy generation and a more sustainable energy transition, offering a practical instrument that links artificial intelligence with sustainability, clean energy integration, and climate change mitigation.

Keywords:

chatbot; artificial intelligence; solar radiation; energy sustainability; sustainable development


1. Introduction

Solar radiation is a fundamental source of energy for life on Earth, regulating climatic processes and sustaining essential biological functions. However, excessive exposure to ultraviolet (UV) radiation has long been recognized as a major risk factor for human health, contributing to skin cancer, premature aging, cataracts, and other dermatological and ocular conditions [1,2]. These risks are particularly acute in equatorial regions such as Ecuador, where geographical and atmospheric conditions intensify UV incidence [3]. Reports indicate that in both the inter-Andean region and coastal zones, the UV index frequently exceeds extreme thresholds, exposing millions of people to chronic radiation hazards.

Global concerns over UV exposure are further amplified by lifestyle changes, outdoor occupational activities, insufficient protective practices, and depletion of the ozone layer [4]. These factors highlight the urgency of preventive strategies that combine environmental monitoring, early warning systems, and public education [5]. While meteorological networks and satellite-based systems have enhanced forecasting capacity, they often require costly infrastructure and lack the adaptability needed to deliver real-time, user-centered alerts.

In this context, artificial intelligence (AI) and machine learning (ML) have emerged as powerful tools for processing large volumes of environmental data and producing accurate predictive models [6,7]. These techniques have shown strong performance in forecasting solar irradiance, photovoltaic energy production, and extreme weather events [8,9,10,11]. Hybrid approaches that combine algorithms such as Random Forest, Gradient Boosting, and CatBoost have demonstrated greater robustness, reducing overfitting and improving generalization [12,13,14]. Accurate prediction of solar radiation strengthens the link between artificial intelligence and sustainability: it makes it possible to anticipate photovoltaic energy production [15,16], reduce the operation and maintenance costs of these energy sources, minimize energy losses during climate variations, and mitigate risks to human health by anticipating episodes of extreme sun exposure [17,18,19,20,21].

Parallel to these advances, chatbots have been increasingly deployed as interfaces for environmental and health-related information [22]. By embedding AI models in real-time communication platforms, chatbots provide personalized interaction with users, enabling timely preventive alerts and informed decision-making. Previous applications have included air quality monitoring, climate awareness, and educational contexts, underscoring their accessibility and versatility [23,24,25].

Nevertheless, predicting solar radiation remains challenging due to the limited distribution of monitoring stations, heterogeneous data resolution, and the absence of key variables such as cloud cover [26,27]. These constraints reduce the accuracy of conventional models and emphasize the need for alternative approaches that leverage available meteorological data while optimizing predictive performance. In Ecuador, machine learning techniques have already been applied to predict global horizontal irradiance, improving model accuracy by increasing R² values from 0.607 to 0.876. Analyses have shown that between February and April, the UV index often reaches extreme levels of 14.3%, with fluxes of up to 1381.5 W/m², while in other months, values remain 1.6% below the March equinox [28,29,30,31,32].

Satellite-based early warning systems, such as GLE Alert++ and HENON, stand out for their precision in detecting solar radiation peaks and issuing alerts [33,34,35,36,37,38,39,40,41,42]. However, compared to these, chatbot-based systems offer greater adaptability and customization, allowing real-time alerts to be tailored to specific locations and user profiles [43,44,45,46,47]. Moreover, unlike satellite infrastructure such as ESPERTA, which requires high financial investment, AI-based solutions using meteorological API data offer an affordable and scalable alternative [48,49,50,51,52,53,54].

The present study addresses these challenges by implementing a hybrid AI model that integrates Random Forest, Gradient Boosting, CatBoost, and a 1D Convolutional Neural Network (CNN). The model was trained and validated with data from the ESMET-IAIRD meteorological station in Calceta, Ecuador, and embedded into a chatbot deployed on Telegram. This system delivers real-time, user-centered alerts on solar radiation exposure, contributing not only to the advancement of hybrid AI methods in environmental prediction but also to sustainable public health strategies and climate adaptation in high-UV regions.

Beyond their technological contributions, AI- and ML-based approaches to UV risk forecasting align directly with the broader goals of sustainability. By enabling more efficient environmental monitoring and fostering adaptive capacity, these systems support public health protection, inform resilient urban planning, and promote the responsible use of natural resources. In particular, the integration of predictive analytics into decision-making frameworks strengthens the capacity of communities to cope with climate variability and environmental stressors, advancing progress toward the United Nations Sustainable Development Goals (SDGs), especially SDG 3 (Good Health and Well-Being) and SDG 13 (Climate Action).

2. Materials and Methods

The development of the proposed system was carried out in several stages, each designed to ensure the accuracy and reliability of solar radiation predictions: (i) data preprocessing, (ii) implementation of prediction algorithms, and (iii) integration into a chatbot, as illustrated in Figure 1.

2.1. Data Acquisition and Preprocessing

The data used in this study were provided by the ESMET-IAIRD meteorological station, located at 0°49′45″ S, 80°11′8″ W, at an altitude of 2 m, in the town of Calceta (Ecuador). The historical dataset, collected over a 9-month period at 144 samples per day (one every 10 minutes), comprised approximately 38,880 records (144 × 270 days) of 18 meteorological variables.

Before model training, the data were cleaned by removing invalid records, such as null values, zeros, nighttime radiation measurements, and records affected by power failures or maintenance. Outliers were then identified using the interquartile range (IQR) method: values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR, where Q1 and Q3 are the first and third quartiles, were considered extreme and removed to improve the quality of the dataset.
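The cleaning rules above can be sketched with pandas as follows. This is a minimal illustration, not the authors' code: the column name `solar_radiation` is a hypothetical placeholder, nighttime and outage records are approximated as rows with zero radiation, and the IQR rule is applied to every numeric column.

```python
import numpy as np
import pandas as pd


def clean_radiation_data(df: pd.DataFrame, target: str = "solar_radiation") -> pd.DataFrame:
    """Drop invalid records, then remove IQR outliers (hypothetical column names)."""
    # Invalid records: missing values and zero radiation (nighttime or outages).
    out = df.dropna()
    out = out[out[target] != 0]

    # IQR rule: keep rows whose numeric values all lie in [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    numeric = out.select_dtypes(include=[np.number])
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    inside = ((numeric >= q1 - 1.5 * iqr) & (numeric <= q3 + 1.5 * iqr)).all(axis=1)
    return out[inside]
```

Because the bounds are recomputed after invalid rows are dropped, zeros and gaps do not distort the quartiles used for outlier detection.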

Initially, exploratory correlation plots were generated to visually inspect the relationships among variables and to identify anomalous values or data points without clear trends; these values were treated as outliers and filtered out to ensure data quality. A Spearman correlation analysis was then performed on all variables provided by the meteorological station to identify those most strongly associated with solar radiation. Variables with a correlation coefficient (r) greater than 0.3 or lower than −0.3 were retained for further modeling (Figure 2), following common practice for identifying moderate to strong relationships in environmental data.
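A minimal pandas sketch of the Spearman screening step, assuming a DataFrame with a hypothetical `solar_radiation` target column; the 0.3 threshold is the one stated in the text.

```python
import pandas as pd


def select_features(df: pd.DataFrame, target: str = "solar_radiation",
                    threshold: float = 0.3) -> list[str]:
    """Return predictors whose Spearman |r| with the target exceeds `threshold`."""
    # Spearman correlation is the Pearson correlation of the ranks, so it
    # captures monotonic (not only linear) relationships.
    rho = df.corr(method="spearman")[target].drop(target)
    return rho[rho.abs() > threshold].index.tolist()
```

Applied to the cleaned station data, the function returns the names of the predictors that pass the |r| > 0.3 filter.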

After the correlation analysis, 12 variables showing moderate to strong associations with solar radiation were retained for further modeling: Temperature, Thermal Sensation, Dew Point, Heat Index, Interior Humidity, Maximum Wind Gust, Average Wind Speed, Average Wind Direction, Atmospheric Pressure, Rainfall, Rain Intensity, and UV Index. Solar radiation itself served as the target variable in the subsequent training and validation stages (Figure 3).

2.2. Prediction Algorithms

In this stage, several predictive models were implemented to predict solar radiation: Linear Regression, Random Forest, CatBoost, a Fully Connected Neural Network (FCN), a Long Short-Term Memory (LSTM) network, and the proposed hybrid model. A one-dimensional Convolutional Neural Network (CNN) layer was tested independently to assess its capacity to extract local temporal features from the meteorological variables and to capture short-term fluctuations in solar radiation. The LSTM network was included because of its ability to model sequential dependencies. The purpose of these tests was not to replace the ensemble models but to verify whether convolutional feature extraction or sequential learning through the LSTM could offer complementary gains in prediction accuracy. The Mean Squared Error (MSE) and the Mean Absolute Error (MAE) were used as the primary evaluation metrics.

Training used the 12 variables most strongly correlated with solar radiation, and cross-validation was applied during the training process. Of this dataset, 75% was allocated to model training and the remaining 25% to validation.
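The text does not state the number of folds or the library used. The sketch below assumes scikit-learn, a random 75/25 hold-out split, and 5-fold cross-validation on the training portion, with a Random Forest as the example estimator.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split


def split_and_validate(X, y, n_splits: int = 5, seed: int = 42):
    """75/25 hold-out split plus k-fold cross-validated MAE on the training part."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # scikit-learn maximizes scores, so MAE is exposed as its negative.
    scores = cross_val_score(model, X_train, y_train, cv=cv,
                             scoring="neg_mean_absolute_error")
    return X_train, X_val, y_train, y_val, -scores
```

The returned per-fold MAE values indicate how stable the model is before it is scored once on the untouched 25% validation set.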

The three ensemble models (Random Forest, Gradient Boosting, and CatBoost) were chosen not only for their accuracy but also for their low computational complexity at inference time, which guarantees the real-time performance of the chatbot.

2.2.1. Linear Regression

In the Multiple Linear Regression (MLR) model, the p predictor variables selected in the preprocessing stage are denoted as x_{i1}, …, x_{ip} for observation i, while the dependent variable y_i corresponds to solar radiation. Equation (1) shows the relationship:

y_{i} = \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \dots + \beta_{p} x_{ip} + \varepsilon_{i}

(1)

where β₀ is the intercept term, β₁, β₂, …, β_p are the coefficients quantifying the effect of each predictor variable, and ε_i represents the error term. The coefficients β_j were estimated using the Ordinary Least Squares (OLS) method, which minimizes the sum of squared residuals; Equation (2) gives the closed-form estimator.

\hat{\beta} = (X^{T} X)^{-1} X^{T} Y

(2)

Here, X is the feature matrix, and Y is the vector of observed solar radiation values. The predictor variables were selected according to their Spearman correlation with solar radiation in the preprocessing step, ensuring that only variables with moderate to strong relationships were included [55].
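Equation (2) can be checked numerically with NumPy. The sketch below adds the intercept column explicitly and solves the normal equations rather than inverting XᵀX, which is numerically safer but mathematically equivalent.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Estimate [beta_0, beta_1, ..., beta_p] via the normal equations of Equation (2)."""
    # Prepend a column of ones so that beta_0 acts as the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    # Solve (X^T X) beta = X^T y instead of forming (X^T X)^-1 explicitly.
    return np.linalg.solve(X1.T @ X1, X1.T @ y)
```

On noise-free synthetic data the estimator recovers the generating coefficients exactly (up to floating-point error), which is a quick sanity check before fitting the station data.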

2.2.2. Random Forest

The Random Forest model is an ensemble learning algorithm that combines multiple decision trees to improve prediction accuracy and reduce overfitting. Each tree is constructed from training data generated by bootstrapping, and the final prediction is obtained by aggregating the outputs of all trees.

The Random Forest was trained using the 12 selected predictor variables and the target variable, solar radiation. The dataset can be represented as Equation (3).

D = \{(X_{1}, Y_{1}), (X_{2}, Y_{2}), \dots, (X_{n}, Y_{n})\}

(3)

where X_i represents a set of features (wind chill, rain, UV Index, etc.) and Y_i is the output variable (solar radiation). The Random Forest was trained with bagging: for each tree T_n, a training subset was generated by sampling with replacement.

Each tree T_n was trained on its bootstrap sample, and the final prediction P for a new observation X_i was obtained by averaging the predictions of all trees, as in Equation (4).

P = \frac{1}{N} \sum_{n = 1}^{N} T_{n} (X_{i})

(4)

where T_n(X_i) is the prediction of the n-th tree for sample X_i, and N is the total number of trees.
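Equation (4) can be illustrated by building the bagged ensemble by hand from scikit-learn decision trees. This sketch shows the averaging principle only; `RandomForestRegressor` performs a similar bootstrap-and-average procedure internally, and the number of trees and the `max_features` setting here are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagged_trees_predict(X_train: np.ndarray, y_train: np.ndarray, X_new: np.ndarray,
                         n_trees: int = 50, seed: int = 0) -> np.ndarray:
    """Equation (4): average N trees, each fitted on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    predictions = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # n rows drawn with replacement
        tree = DecisionTreeRegressor(max_features="sqrt",
                                     random_state=int(rng.integers(0, 2**31 - 1)))
        tree.fit(X_train[idx], y_train[idx])
        predictions.append(tree.predict(X_new))  # T_n(X)
    return np.mean(predictions, axis=0)  # P = (1/N) * sum over n of T_n(X)
```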

Since the target variable is continuous, the Random Forest was configured for regression tasks. Variable importance was evaluated based on the reduction in variance during tree construction. At each split, the feature that maximized the reduction in the variance of the target variable was selected as the splitting criterion. This approach, standard in regression-based Random Forest models, allows the identification of the most influential predictors of solar radiation. The variable that contributes the most to reducing variance across all trees is considered the most important, thus highlighting key environmental factors in the prediction of solar radiation.
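In scikit-learn, the variance-reduction importance described above corresponds to the impurity-based `feature_importances_` of a `RandomForestRegressor` trained with its default squared-error criterion. A minimal sketch, assuming hypothetical feature names:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_features(X: np.ndarray, y: np.ndarray, names: list, seed: int = 0) -> list:
    """Rank predictors by their mean variance reduction across all trees."""
    # The default regression criterion (squared error) chooses, at each split,
    # the feature and threshold with the largest variance reduction.
    rf = RandomForestRegressor(n_estimators=100, random_state=seed)
    rf.fit(X, y)
    importances = rf.feature_importances_  # normalized to sum to 1
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]
```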

2.2.3. CatBoost

The CatBoost model is based on an enhanced version of Gradient Boosting. This decision tree algorithm handles categorical data efficiently without requiring extensive preprocessing [56]. CatBoost encodes categorical features with ordered target statistics, computed only from the examples that precede each observation in a random permutation, which reduces the need for transformations such as one-hot encoding. It combines this with ordered boosting, which limits the prediction shift of standard gradient boosting and thus reduces the risk of overfitting. The objective function is defined in Equation (5).

L(Y_{\text{rad}}, F(X_{\text{weather}})) = \sum_{i=1}^{n} l\left(Y_{\text{rad},i}, F(X_{\text{weather},i})\right)

(5)

where Y_rad is the vector of observed solar radiation values (Y_rad,i for observation i), X_weather represents the meteorological predictor variables (e.g., temperature, UV index, thermal sensation), and F(X_weather) is the model prediction. The per-sample loss l is the squared error, so the objective corresponds to the Mean Squared Error (MSE), which is standard for regression tasks.

The training process follows an additive scheme. Starting from an initial estimate F₀, calculated as the mean of the target values, each subsequent tree approximates the negative gradient of the loss function, progressively reducing the residual errors. The additive model after N iterations is expressed in Equation (6).

F_{N}(X_{\text{weather}}) = F_{0}(X_{\text{weather}}) + \sum_{m=1}^{N} \gamma_{m} h_{m}(X_{\text{weather}})

(6)

where each tree h_m(X_weather) is fitted to correct the residuals of the previous iteration. The iterative construction of the CatBoost model can be generalized as Equation (7).

F_{m}(X_{\text{weather}}) = F_{m-1}(X_{\text{weather}}) + \gamma_{m} h_{m}(X_{\text{weather}})

(7)

where γ_m is the learning rate at iteration m, which regulates the contribution of each tree.

When a constant learning rate μ is used, μ scales the adjustment made at each iteration n to improve the prediction of solar radiation, as given by Equations (8) and (9).

F_{n}(X_{\text{weather}}) = F_{n-1}(X_{\text{weather}}) + \mu h_{n}(X_{\text{weather}})

(8)

F_{n}(X_{\text{weather}}) = F_{n-1}(X_{\text{weather}}) + \mu h_{n}(X_{\text{weather}})

(9)
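The additive update of Equations (6)–(8) can be sketched with plain gradient boosting on squared-error loss, where the negative gradient is simply the residual. This is a generic illustration built from scikit-learn regression trees; it does not reproduce CatBoost's ordered boosting or its symmetric trees, and the depth, number of iterations, and μ are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boost(X: np.ndarray, y: np.ndarray, n_iter: int = 100, mu: float = 0.1, depth: int = 3):
    """Additive model F_n = F_{n-1} + mu * h_n under squared-error loss."""
    f0 = float(np.mean(y))         # F_0: mean of the target values
    F = np.full(len(y), f0)
    trees = []
    for n in range(n_iter):
        residual = y - F           # negative gradient of 0.5 * (y - F)^2 w.r.t. F
        h = DecisionTreeRegressor(max_depth=depth, random_state=n).fit(X, residual)
        F = F + mu * h.predict(X)  # Equation (8)
        trees.append(h)

    def predict(X_new: np.ndarray) -> np.ndarray:
        # Unrolled form: F_0 plus the shrunken sum of all fitted trees.
        return f0 + mu * sum(t.predict(X_new) for t in trees)

    return predict, F
```

A smaller μ makes each step more conservative, which is why it is typically paired with a larger number of iterations, as in the 500–1500 range explored in the study.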

For model optimization, hyperparameters such as tree depth, learning rate, and number of iterations were tuned using cross-validation. This process ensured that the model maintained predictive performance while reducing the risk of overfitting.

2.2.4. Fully Connected Neural Network (FCN)

The Fully Connected Neural Network (FCN) was implemented to model nonlinear relationships between meteorological variables and solar radiation. The architecture consisted of an input layer with the 12 predictor variables, two hidden layers, and one output layer. Each hidden layer applied a weighted sum of its inputs followed by a nonlinear activation. The Rectified Linear Unit (ReLU) activation function was used in the hidden layers because of its efficiency and its ability to mitigate vanishing gradients, while the output layer used a linear activation suitable for regression tasks.

The forward propagation through the network can be expressed as Equation (10).

a^{(l)} = f(W^{(l)} a^{(l-1)} + b^{(l)})

(10)

where a^(l) is the activation vector of layer l, W^(l) and b^(l) are the weight matrix and bias vector, and f is the activation function.
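Equation (10) can be traced with a short NumPy forward pass. The hidden-layer widths (64 and 32) and the He-style random initialization are assumptions for illustration; the article specifies only two ReLU hidden layers and a linear output.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def forward(x, params: list) -> np.ndarray:
    """Equation (10) layer by layer: ReLU in hidden layers, linear output layer."""
    a = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        a = relu(W @ a + b)   # a^(l) = f(W^(l) a^(l-1) + b^(l))
    W_out, b_out = params[-1]
    return W_out @ a + b_out  # identity activation for regression


def init_params(sizes: list, seed: int = 0) -> list:
    """Random (W, b) pairs for layer sizes such as [12, 64, 32, 1] (assumed widths)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]
```

For example, `forward(x, init_params([12, 64, 32, 1]))` maps one 12-variable observation to a single radiation estimate.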

2.2.5. Long Short-Term Memory (LSTM)

The Long Short-Term Memory (LSTM) model was applied to capture temporal dependencies in solar radiation data. LSTMs are a specialized form of recurrent neural networks (RNNs) that address the vanishing gradient problem by incorporating memory cells and gating mechanisms. This allows the network to selectively retain or discard information, making it suitable for time series forecasting where past conditions influence present outcomes.

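Each LSTM cell operates through three main gates, the forget gate (f_t), the input gate (i_t), and the output gate (o_t), which control the flow of information, as given in Equations (11)–(13).

f_{t} = \sigma(W_{f} [h_{t-1}, x_{t}] + b_{f})

(11)

i_{t} = \sigma(W_{i} [h_{t-1}, x_{t}] + b_{i})

(12)

o_{t} = \sigma(W_{o} [h_{t-1}, x_{t}] + b_{o})

(13)

where x_t is the input at time step t, h_{t−1} is the previous hidden state, [h_{t−1}, x_t] denotes their concatenation, σ is the logistic sigmoid function, and W_f, W_i, W_o and b_f, b_i, b_o are the weight matrices and bias vectors of each gate.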
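The gates in Equations (11)–(13) act on a cell state that the equations above do not show. The NumPy sketch below follows the standard LSTM formulation, adding the usual candidate state c̃_t = tanh(W_c[h_{t−1}, x_t] + b_c) and the updates c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t and h_t = o_t ⊙ tanh(c_t); the dictionary-based weight layout is an assumption.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W: dict, b: dict):
    """One LSTM time step; W[k] has shape (hidden, hidden + inputs) for k in 'f', 'i', 'o', 'c'."""
    z = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])    # forget gate, Eq. (11)
    i_t = sigmoid(W["i"] @ z + b["i"])    # input gate, Eq. (12)
    o_t = sigmoid(W["o"] @ z + b["o"])    # output gate, Eq. (13)
    c_hat = np.tanh(W["c"] @ z + b["c"])  # candidate state (standard LSTM)
    c_t = f_t * c_prev + i_t * c_hat      # keep part of the old memory, add new content
    h_t = o_t * np.tanh(c_t)              # exposed hidden state
    return h_t, c_t
```

Iterating `lstm_step` over one day of readings (144 steps at the station's sampling rate) carries information forward through `c_t`, which is how the network can relate a measurement to earlier conditions.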

Although the dataset covered six months, its high temporal resolution (144 measurements per day) provided enough observations to capture short-term autocorrelation, full diurnal cycles, and multi-day variability, allowing the LSTM to effectively exploit sequential patterns in solar radiation.

2.2.6. Hybrid Model

The hybrid model integrates Random Forest (RF), Gradient Boosting (GB), and CatBoost (CB) to improve prediction accuracy and reduce error in solar radiation forecasting. Each component contributes complementary strengths: Random Forest handles feature interactions effectively, Gradient Boosting refines predictions by focusing on residuals, and CatBoost efficiently manages categorical data. By combining these models, the hybrid approach leverages their complementary features to enhance predictive performance.

The architecture of the hybrid model included a one-dimensional convolutional layer followed by two additional layers for feature extraction, as shown in Figure 4. This design allowed the model to capture both local and global patterns in the data, which is particularly important for time-series predictions such as solar radiation.
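The article does not detail the convolutional layer, so the following NumPy sketch only illustrates what a one-dimensional convolution over a window of meteorological readings computes: each filter slides along the time axis and produces one local feature per position. The shapes, filter values, and ReLU activation are assumptions.

```python
import numpy as np


def conv1d_features(window: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Valid 1D convolution over time followed by ReLU.

    window:  (T, C) time steps x meteorological variables
    kernels: (F, K, C) F filters of width K spanning all C variables
    returns: (T - K + 1, F) feature maps
    """
    T, _ = window.shape
    F, K, _ = kernels.shape
    out = np.empty((T - K + 1, F))
    for t in range(T - K + 1):
        patch = window[t:t + K]  # (K, C) local slice of consecutive readings
        out[t] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)
```

A filter such as `[-1, 1]` along time responds only where the signal changes, which is the kind of short-term fluctuation the convolutional stage is meant to expose.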

Model performance was evaluated using multiple metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), and the coefficient of determination (R²) for regression accuracy, as well as Precision, Recall, F1-Score, and Accuracy for classification of solar radiation into four categories (low, medium, high, and extreme). This combination of metrics provided a comprehensive assessment of model performance.
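Turning regression outputs into risk categories and scoring them can be sketched as below. The article does not give the category boundaries, so the W/m² thresholds are hypothetical placeholders; the metrics are macro-averaged with scikit-learn.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

LABELS = ["low", "medium", "high", "extreme"]
# Hypothetical category boundaries in W/m²; the article does not state them.
THRESHOLDS = [200.0, 500.0, 800.0]


def to_category(radiation: np.ndarray) -> np.ndarray:
    """Map radiation values to categories (x < 200 -> low, ..., x >= 800 -> extreme)."""
    return np.array(LABELS)[np.digitize(radiation, THRESHOLDS)]


def category_metrics(y_true_rad: np.ndarray, y_pred_rad: np.ndarray) -> dict:
    """Macro-averaged precision, recall, and F1, plus accuracy, on the derived categories."""
    t, p = to_category(y_true_rad), to_category(y_pred_rad)
    prec, rec, f1, _ = precision_recall_fscore_support(
        t, p, labels=LABELS, average="macro", zero_division=0
    )
    return {"precision": prec, "recall": rec, "f1": f1, "accuracy": accuracy_score(t, p)}
```

Because categories are derived from the regression output, a small numeric error only changes the classification when it pushes a prediction across a threshold.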

To develop the hybrid model, different combinations of base learners were tested (e.g., RF + GB, RF + GB + CB). Hyperparameters such as the number of iterations (500, 1000, 1500), tree depth (4, 6, 8), and learning rate (0.01, 0.2, 0.3) were tuned through cross-validation. This process was implemented to minimize overfitting and optimize generalization capacity.
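The hyperparameter search described above maps naturally onto scikit-learn's `GridSearchCV`. This sketch assumes the grid is applied to a `GradientBoostingRegressor`, with iterations mapped to `n_estimators` and depth to `max_depth`; the article does not say which library or fold count was used.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Values reported in the text, mapped onto scikit-learn parameter names (assumption).
PAPER_GRID = {
    "n_estimators": [500, 1000, 1500],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.01, 0.2, 0.3],
}


def tune(X, y, param_grid: dict = PAPER_GRID, n_splits: int = 5, seed: int = 0):
    """Exhaustive grid search scored by cross-validated MAE."""
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=seed),
        param_grid,
        cv=KFold(n_splits=n_splits, shuffle=True, random_state=seed),
        scoring="neg_mean_absolute_error",
    )
    search.fit(X, y)
    return search.best_params_, -search.best_score_
```

Calling `tune(X, y)` with `PAPER_GRID` evaluates all 27 combinations per fold; passing a smaller grid is useful for quick checks.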

2.3. Implementation of the Chatbot

The chatbot was designed as a user interface for the solar radiation prediction system, allowing real-time access to forecasts via the Telegram platform. The implementation was carried out in Python 3.13, using the python-telegram-bot library for interaction, asyncio for asynchronous event handling, and the trained hybrid model for predictions.

The chatbot's predictive engine was trained on a historical dataset in which categorical variables were transformed using a label encoder. After feature selection, the hybrid model was trained as a stacking regressor with Ridge regression as the meta-learner. This configuration combines multiple base learners to reduce overfitting and improve generalization.
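A stacking configuration of this kind can be expressed with scikit-learn's `StackingRegressor` and a `Ridge` meta-learner. The sketch assumes RF and GB base learners plus an optional `catboost.CatBoostRegressor` when the `catboost` package is installed; the hyperparameter values are illustrative, not those used in the study.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge


def build_hybrid(include_catboost: bool = True, seed: int = 0) -> StackingRegressor:
    """RF + GB (+ CatBoost if available) stacked under a Ridge meta-learner."""
    estimators = [
        ("rf", RandomForestRegressor(n_estimators=100, random_state=seed)),
        ("gb", GradientBoostingRegressor(random_state=seed)),
    ]
    if include_catboost:
        try:
            from catboost import CatBoostRegressor
            estimators.append(("cb", CatBoostRegressor(iterations=500, depth=6,
                                                       learning_rate=0.1,
                                                       random_seed=seed, verbose=0)))
        except ImportError:
            pass  # catboost not installed: fall back to RF + GB
    # The Ridge meta-learner is fitted on out-of-fold predictions of the base
    # learners (5-fold by default), which limits leakage from the training data.
    return StackingRegressor(estimators=estimators, final_estimator=Ridge(alpha=1.0))
```

Once fitted, `build_hybrid().fit(X_train, y_train).predict(X_new)` returns continuous radiation estimates, which the chatbot then maps to risk categories.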

Once trained, the system retrieves real-time weather data from Weather API, constructs a DataFrame with the required predictor variables, and generates a radiation risk category (low, medium, high, very high). The chatbot then communicates the results to users through text messages and visual alerts, and interactive buttons allow users to request predictions or retrieve recent station data. The bot was deployed on Heroku using long polling to maintain continuous interaction between the server and end users. The chatbot required minimal technological resources, with no need for significant storage or cloud computing capacity.
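The step from a predicted value to a risk category and a user-facing message can be sketched in plain Python with pandas. The thresholds, column names (`hour`, `predicted_radiation`), and message layout are hypothetical; the Telegram-specific sending logic is intentionally omitted.

```python
import pandas as pd

# Hypothetical lower bounds in W/m²; the article does not publish its cut-offs.
RISK_LEVELS = [(0.0, "low"), (200.0, "medium"), (500.0, "high"), (800.0, "very high")]


def risk_category(radiation_wm2: float) -> str:
    """Return the highest level whose lower bound the prediction reaches."""
    label = RISK_LEVELS[0][1]
    for lower, name in RISK_LEVELS:
        if radiation_wm2 >= lower:
            label = name
    return label


def format_alert(hourly: pd.DataFrame) -> str:
    """Build a plain-text hourly forecast from hypothetical 'hour'/'predicted_radiation' columns."""
    lines = ["Solar radiation forecast:"]
    for row in hourly.itertuples(index=False):
        value = float(row.predicted_radiation)
        lines.append(f"{int(row.hour):02d}:00  {value:4.0f} W/m²  {risk_category(value)}")
    return "\n".join(lines)
```

Keeping the formatting separate from the messaging library makes the alert text easy to test and to adapt to a user profile.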

To validate usability and system stability, pilot tests were conducted with 20 academic users over a two-week period. The group was composed of participants aged 24–65, selected for their ability to interpret forecasts and provide structured feedback. The limited sample size allowed for close monitoring of interactions and early detection of system issues in a controlled environment. Evaluation metrics included response time, number of successful interactions, and user satisfaction, the latter assessed through a short survey administered at the end of the testing period. Figure 5 illustrates the chatbot workflow, from the reception of weather data to the final delivery of radiation risk predictions through the user interface.

3. Results

The evaluation metrics highlight notable differences in performance across the models tested for solar radiation forecasting (Table 1).

In terms of MAE, the Hybrid model (RF + GB + CB) achieved the lowest error (13.93 W/m²), closely followed by CatBoost (14.21 W/m²) and Random Forest (14.41 W/m²). These results indicate that ensemble-based models were more effective in minimizing average prediction errors compared to linear (21.19 W/m²) and neural approaches (FCN: 34.39 W/m²; LSTM: 73.80 W/m²).

For MSE, the Hybrid model again reported the smallest error (871.70), confirming its superior ability to reduce squared deviations. CatBoost (897.88) and Random Forest (934.79) followed closely, while Linear Regression showed the highest error (1466.61). Interestingly, the FCN reported a low MSE (288.76) despite a higher MAE, suggesting that it captured the general trend but struggled with localized fluctuations.

Regarding R², the Hybrid model explained the highest proportion of variance (0.98), outperforming Random Forest (0.97), CatBoost (0.97), and Linear Regression (0.95). The FCN achieved a moderate R² (0.91), while the LSTM performed poorly (0.69), indicating limited capacity to model temporal dependencies despite the high-frequency dataset.

Finally, in terms of prediction counts, the Hybrid model correctly classified 9759 out of 9912 cases, slightly ahead of CatBoost (9747) and Random Forest (9721). Linear Regression reached 9592 correct predictions, while FCN (9652) and LSTM (9552) underperformed relative to ensemble methods.

Overall, the Hybrid model consistently outperformed the individual models across all metrics, confirming its robustness, stability, and suitability for solar radiation forecasting. From a sustainability perspective, the ability of ensemble approaches to combine complementary strengths not only improves predictive reliability but also enhances their potential for real-world applications such as solar energy management, climate adaptation strategies, and public health risk communication.

Based on the results shown in Table 1, the FCN and LSTM models were excluded from graphical comparisons, as their performance was considerably lower and their inclusion would not contribute to the visual interpretation of prediction accuracy.

The performance of the Multiple Linear Regression model suggested overfitting, as validation error remained higher than training error and did not improve with larger training sizes. Figure 6 shows the MAE learning curve against training size, where the gap between the training and validation lines illustrates the limited generalization capacity of this model.

Ensemble methods improved predictive accuracy. Random Forest reduced the MAE compared to linear regression, but the learning curve in Figure 7 shows that validation error plateaued above training error, reflecting moderate overfitting. This implies that the model effectively captures patterns in the training data but fails to generalize optimally to unseen data.

The CatBoost model showed improved stability compared to MLR and Random Forest. As the training size increased, the validation MAE decreased gradually and converged closer to the training error, as shown in Figure 8. However, a persistent gap remained between both curves, suggesting that while the model learned the training data effectively, its ability to generalize to unseen data was still constrained.
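Learning curves like those in Figures 6–8 can be produced with scikit-learn's `learning_curve`, using negated MAE as the score. A sketch, assuming 5-fold cross-validation and five training-set sizes:

```python
import numpy as np
from sklearn.model_selection import learning_curve


def mae_learning_curve(model, X, y, cv: int = 5):
    """Training and validation MAE as a function of training-set size."""
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=cv,
        scoring="neg_mean_absolute_error",
        train_sizes=np.linspace(0.1, 1.0, 5),
    )
    # Scores are negated MAE (higher is better); flip the sign and average over folds.
    return sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)
```

A validation curve that stays above the training curve as the training size grows corresponds to the persistent gap described for Random Forest and CatBoost.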

Figure 9 compares predicted and actual values through scatter plots. Each plot represents predicted versus observed solar radiation for one model: Linear Regression (top left), Random Forest (top right), CatBoost (bottom left), and the Hybrid model (bottom right). In all cases, the points cluster along the ideal diagonal, reflecting a positive correlation between predicted and observed values. Linear Regression exhibits greater dispersion, particularly at higher values. Random Forest and CatBoost reduce this scatter, showing closer alignment with the diagonal. The Hybrid model presents the tightest clustering, with most points concentrated around the line, indicating the best agreement between predictions and actual values.

Figure 10 displays the temporal evolution of observed versus predicted solar radiation for Linear Regression, Random Forest, CatBoost, and the Hybrid model (RF + GB + CB). Across all models, the predicted series generally followed the seasonal and daily variability of solar radiation. However, clear differences in alignment are evident. Linear Regression showed larger deviations from the observed values, particularly under peak radiation conditions, reflecting limited accuracy in capturing fluctuations. Random Forest and CatBoost presented improved correspondence, with predictions more closely tracking the observed series and fewer large deviations. The Hybrid model achieved the closest alignment with the real values, maintaining stable accuracy across the entire evaluation period, including both peaks and troughs. Overall, ensemble approaches, particularly the Hybrid model, demonstrated superior consistency and reduced error compared to individual methods.

To evaluate the ability of the models to categorize solar radiation into four levels (low, medium, high, very high), classification metrics were calculated (Figure 11). The results show that ensemble-based models consistently outperformed individual learners across all indicators. Precision and recall were higher for the Hybrid model (RF + GB + CB), reaching values close to 0.9, followed by Gradient Boosting and CatBoost. The F1-Score confirmed this trend, indicating that the Hybrid approach provided the best balance between precision and recall. Accuracy values were also highest for the Hybrid model, demonstrating superior robustness in assigning radiation categories correctly. These results complement the regression analysis, highlighting that the Hybrid model not only minimizes prediction errors but also provides reliable classification of solar radiation into practical risk levels.

The integration of the hybrid prediction model into the chatbot achieved robust performance: comparing Weather API data with the filtered datasets showed a deviation of approximately 15%, indicating that external data sources did not compromise prediction accuracy. Overall, the chatbot reached an accuracy of 95% in solar radiation predictions.

Figure 12 illustrates the operational interface of the solar radiation prediction chatbot implemented in Telegram. The system provides users with clear information on daily radiation levels, categorized into three risk levels (low, medium, high), and delivers hourly forecasts in both textual and graphical formats. Users can select the desired prediction day and receive structured outputs that facilitate interpretation and decision-making. This demonstrates the capacity of the chatbot to integrate the hybrid prediction model into a user-friendly platform, effectively bridging predictive analytics with practical communication.

User feedback confirmed the system’s applicability and acceptance. Despite the limited scope of the test population, acceptance was unanimous: 40% of participants agreed, and 60% strongly agreed on their willingness to receive radiation notifications from the system. Furthermore, when asked about the reliability of the chatbot compared to satellite APIs, 65% agreed and 35% strongly agreed that the chatbot provided more accurate and useful information, with no dissenting responses recorded.

In terms of usability, 90% of participants expressed satisfaction with the clarity and usefulness of the alerts. Based on this feedback, the message structure was refined to enhance readability and personalize recommendations according to the user profile. The classification of radiation levels into three categories (low, medium, high) was identified as particularly helpful, facilitating interpretation for non-specialist users.

These results highlight the chatbot as an effective tool for real-time solar radiation prediction and communication, combining high predictive accuracy with strong user acceptance and accessibility.

The chatbot is the central component of this research, since it integrates the prediction algorithms with a conversational interface that is easily accessible to non-specialist users. The system follows a client-server architecture (Figure 13), in which the backend implements the hybrid prediction model (RF, GB, and CB) and the frontend corresponds to the conversational chatbot. The chatbot receives weather parameters (temperature, relative humidity, wind speed, and atmospheric pressure) as input, either entered directly by the user or acquired automatically from the weather station. These data are preprocessed and sent to the hybrid prediction engine, which generates a real-time estimate of solar radiation. The response is then returned in textual and graphical formats within the conversational interface.

The interaction is structured on three levels: (i) data entry (manual or automatic), (ii) processing using the hybrid model, and (iii) feedback to the user in the form of numerical values, graphs, and explanatory messages. This design allows users without training in machine learning to interpret the results intuitively, favoring decision-making in renewable energy and smart agriculture applications.
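Each of the three interaction levels maps naturally onto a small function. The sketch below is illustrative only and rests on several assumptions. It expects manual input separated by whitespace or commas and uses hypothetical thresholds of 300 and 600 W/m². Graphs are replaced by a text bar, and `placeholder_model` stands in for the trained hybrid model. None of these names come from the study.

```python
from typing import Callable, Sequence


def parse_user_message(text: str) -> list[float]:
    """Level (i): read 'temperature humidity wind_speed pressure' from a chat message."""
    parts = text.replace(",", " ").split()
    if len(parts) != 4:
        raise ValueError("expected 4 values: temperature, humidity, wind speed, pressure")
    return [float(p) for p in parts]


def build_reply(estimate_wm2: float, thresholds: tuple[float, float] = (300.0, 600.0)) -> str:
    """Level (iii): numeric value, a coarse text bar and a short explanation."""
    if estimate_wm2 < thresholds[0]:
        level = "low"
    elif estimate_wm2 < thresholds[1]:
        level = "medium"
    else:
        level = "high"
    bar = "#" * int(estimate_wm2 // 100)  # one '#' per full 100 W/m²
    return f"Estimated radiation: {estimate_wm2:.0f} W/m² [{bar}] ({level} level)"


def handle_message(text: str, predict: Callable[[Sequence[float]], float]) -> str:
    features = parse_user_message(text)     # (i) data entry
    estimate = max(0.0, predict(features))  # (ii) model inference, negatives clipped
    return build_reply(estimate)            # (iii) feedback to the user


def placeholder_model(features: Sequence[float]) -> float:
    """Hypothetical stand-in for the trained hybrid model; NOT the authors' model."""
    return 14.0 * features[0] + features[1]


print(handle_message("27.5, 78, 3.2, 1009", placeholder_model))
```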

Computational efficiency is a determining factor in the viability of the chatbot system. Measurements showed that the total response time perceived by the user, from entering the prediction request until the chatbot displays the result, consistently fell between 2 and 3 s. This responsiveness, which is suitable for real-time applications, comes from separating the two computational phases. The complex models (CatBoost and LSTM) were trained offline, whereas the inference phase is optimized for low latency. The average processing time of the predictive core (pure inference) remained below 5 ms on a standard central processing unit (CPU). The difference between the total response time (2–3 s) and the inference time (<5 ms) is therefore attributed to network latency, natural language processing, and the chatbot’s communication overhead rather than to model computation. This low inference cost confirms that the system is computationally modest and does not require GPU acceleration. It also supports the system's commercial scalability and allows efficient deployment on microservices or low-cost hardware.
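To report the pure-inference figure separately from the end-to-end response time, the model call can be timed on its own. The sketch assumes a hypothetical `placeholder_model` in place of the trained ensemble. With a real model, the same function would wrap its prediction call. A timer placed around the whole message handler would then capture the network and messaging overhead that accounts for most of the 2–3 s total.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_inference_ms(predict: Callable[[Sequence[float]], float],
                         features: Sequence[float],
                         repeats: int = 1000) -> tuple[float, float]:
    """Return (mean, 95th percentile) inference time in milliseconds."""
    if repeats < 2:
        raise ValueError("need at least 2 repeats to estimate a percentile")
    predict(features)  # warm-up call, excluded from the statistics
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(features)
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[-1]
    return statistics.fmean(samples), p95


def placeholder_model(features: Sequence[float]) -> float:
    """Hypothetical stand-in for the trained hybrid model."""
    weights = (14.0, 1.0, -2.0, 0.01)
    return sum(w * x for w, x in zip(weights, features))


mean_ms, p95_ms = measure_inference_ms(placeholder_model, [27.5, 78.0, 3.2, 1009.0])
print(f"mean {mean_ms:.4f} ms, p95 {p95_ms:.4f} ms, within 5 ms budget: {p95_ms < 5.0}")
```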

4. Discussion

The results demonstrate that the hybrid model (RF + GB + CB) consistently outperformed the individual models in terms of MAE, MSE, and R², highlighting its robustness for solar radiation prediction. This aligns with findings from previous studies [57,58,59,60,61,62], confirming that combining multiple algorithms leverages complementary strengths to enhance predictive performance.
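The evaluation metrics, together with the simplest way of combining the three ensemble members, can be written out directly. This sketch assumes equal-weight averaging of the RF, GB, and CB outputs, which is the same rule scikit-learn's `VotingRegressor` applies when no weights are given. The architecture in Figure 4 also includes a 1D convolutional layer, so this is a simplification and not the authors' exact combiner. All data values are toy numbers.

```python
from statistics import fmean
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ensemble(*member_predictions: Sequence[float]) -> list[float]:
    """Equal-weight mean of several models' predictions, sample by sample."""
    return [fmean(values) for values in zip(*member_predictions)]


# Toy irradiance values (W/m²) and member predictions; illustrative only.
y = [100.0, 250.0, 400.0, 550.0, 700.0]
members = {
    "RF": [110.0, 240.0, 420.0, 530.0, 690.0],
    "GB": [95.0, 265.0, 390.0, 560.0, 715.0],
    "CB": [104.0, 246.0, 405.0, 545.0, 702.0],
}
hybrid = average_ensemble(*members.values())
for name, pred in {**members, "Hybrid": hybrid}.items():
    print(f"{name:<7} MAE={mae(y, pred):6.2f} MSE={mse(y, pred):7.2f} R2={r2(y, pred):.4f}")
```

Averaging reduces error only when the members' mistakes partly cancel, as they do in this toy example. The improvement is not guaranteed on real data.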

Recent research has also explored hybrid forecasting approaches in the context of integrated energy systems. For example, solar–hydropower optimization studies have applied advanced hybrid decomposed residual ensembling methods, combining statistical (ARIMA, STL) and deep learning models (Bi-LSTM) with optimization algorithms such as WOA, achieving remarkably low error rates (MAE = 1.31 W/m², RMSE = 1.85 W/m²) [53]. While these results demonstrate the potential of highly specialized hybrid models, they often require complex architectures [53]. In contrast, the present study adopts a simpler yet effective hybrid architecture (RF + GB + CB with CNN), which achieved robust accuracy (R² = 0.98; MAE = 13.77 W/m²) while maintaining generalizability and accessibility. This balance underscores the potential of hybrid ensemble approaches not only for technical accuracy but also for practical deployment through user-oriented platforms such as chatbots.
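Reference [53] reports RMSE, whereas this study reports MSE. Taking the square root of the hybrid model's MSE gives a directly comparable figure. The worked check below uses the value stated in the abstract (849.96) and the value in Table 1 (871.70):

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
\sqrt{849.96\ \mathrm{W^2/m^4}} \approx 29.15\ \mathrm{W/m^2}, \qquad
\sqrt{871.70\ \mathrm{W^2/m^4}} \approx 29.52\ \mathrm{W/m^2}
```

Both values exceed the 1.85 W/m² reported in [53] by more than an order of magnitude. The two studies use different sites and datasets, however, so the comparison is only indicative.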

Previous studies have emphasized the critical role of solar forecasting in the renewable energy transition, particularly for grid stability and load balancing. For instance, a data-driven contextual forecasting (DCF) framework combined Support Vector Machines (SVM) for short-term horizons with Facebook Prophet (FBP) for long-term horizons and achieved an average R² of 0.85 across multiple U.S. cities [55]. Such approaches adapt well to different prediction horizons. The present study instead integrates ensemble methods with convolutional layers to enhance prediction accuracy under high temporal resolution conditions. Unlike DCF, which partitions models by forecast horizon, the proposed hybrid model captures nonlinear relationships and temporal patterns simultaneously, achieving higher accuracy for short-term daily predictions. This distinction highlights the versatility of hybrid ML architectures for real-time applications, where both accuracy and accessibility (through a chatbot implementation) are essential for sustainable energy management.

Linear regression offered efficiency and interpretability, but its inability to capture nonlinear relationships limited its accuracy. Random Forest and CatBoost performed strongly, yet the signs of overfitting observed during testing (Figures 7 and 8) highlighted the advantages of the hybrid approach. Adding a 1D convolutional layer further improved the model’s ability to capture temporal patterns, reinforcing its suitability for time-series data such as solar radiation.
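A single hand-set kernel sliding over consecutive readings illustrates what a 1D convolutional layer computes. In a trained network, the kernel weights are learned and many kernels run in parallel. The readings and the `[-1, 0, 1]` "rising trend" kernel below are illustrative assumptions.

```python
from typing import Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> list[float]:
    """Single-channel 1D convolution, stride 1, no padding ('valid').

    Like deep-learning Conv1D layers, this computes a cross-correlation
    (the kernel is not flipped).
    """
    k = len(kernel)
    if k == 0 or k > len(signal):
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [bias + sum(w * signal[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal) - k + 1)]


def relu(values: Sequence[float]) -> list[float]:
    return [max(0.0, v) for v in values]


# Six consecutive 10-minute irradiance readings (illustrative values, W/m²).
readings = [120.0, 180.0, 260.0, 250.0, 200.0, 410.0]
# Hand-set kernel: responds positively where irradiance rises over two steps.
trend_kernel = [-1.0, 0.0, 1.0]
print(relu(conv1d_valid(readings, trend_kernel)))
```

After the ReLU, the dip between the third and fifth readings is zeroed, so the output highlights only the rising segments. This is the kind of local temporal pattern a trained convolutional layer can pass on to the rest of the model.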

In contrast to ensemble methods, the Fully Connected Neural Network (FCN) and the Long Short-Term Memory network (LSTM) achieved lower performance. The FCN, although capable of modeling nonlinear relationships, requires large datasets to optimize its parameters and avoid overfitting. In this study, the limited dataset size restricted its ability to generalize, resulting in higher MAE and MSE values compared to boosting algorithms. The LSTM exhibited the weakest performance overall, with an R² of 0.69. This outcome can be attributed to two main factors: (i) the dataset covered only a six-month period, which limited the availability of long-term temporal patterns that recurrent networks typically exploit, and (ii) the temporal resolution of the dataset (144 measurements per day) introduced noise and short-term variability, which LSTMs tend to overfit when not balanced with longer sequences. As a result, LSTM could not capture stable temporal dependencies as effectively as ensemble methods.
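The sequence-length argument can be made concrete. At 144 measurements per day, each sample spans 10 minutes, so a sequence model that looks back one day needs 144 steps per input window. The hypothetical helper `make_windows` below shows how such windows are built. Taking six months as roughly 182 days (an approximation), the data yield tens of thousands of heavily overlapping windows but only about 182 distinct daily cycles.

```python
from typing import Sequence

SAMPLES_PER_DAY = 144                            # resolution stated in the text
MINUTES_PER_SAMPLE = 24 * 60 // SAMPLES_PER_DAY  # 1440 / 144 = 10-minute spacing


def make_windows(series: Sequence[float], window: int,
                 horizon: int = 1) -> list[tuple[list[float], float]]:
    """Turn a series into (input window, target) pairs for a sequence model.

    The target is the value `horizon` steps after the last element of the window.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = list(series[start:start + window])
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


# About six months of 10-minute data with a one-day lookback (approximate count).
n_samples = 182 * SAMPLES_PER_DAY
n_windows = n_samples - SAMPLES_PER_DAY  # len(series) - window - horizon + 1, horizon = 1
print(MINUTES_PER_SAMPLE, n_samples, n_windows)
```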

Nevertheless, occasional misalignments between predicted and actual values remain inevitable due to the inherent complexity of solar radiation forecasting. Sudden weather changes, sensor inaccuracies, and unpredictable environmental variability introduce uncertainty that no model can fully capture. The 15% error margin observed illustrates these challenges, particularly in the categorization of radiation levels near threshold boundaries. This margin can lead to misclassification between categories (e.g., high vs. very high), reducing reliability for decision-making in borderline cases.
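This threshold sensitivity can be checked mechanically. If the true value may lie within ±15% of the prediction, a forecast is borderline whenever that band crosses a category boundary. The boundaries below (250, 500, and 750 W/m²) and the function names are hypothetical. The four labels follow Figure 11.

```python
from bisect import bisect_right

# Hypothetical category boundaries in W/m² (the study does not list its cut-offs).
THRESHOLDS = (250.0, 500.0, 750.0)
LEVELS = ("low", "medium", "high", "very high")


def possible_levels(prediction: float, rel_margin: float = 0.15) -> list[str]:
    """Categories the true value could fall into if it lies within ±rel_margin of the prediction."""
    if prediction < 0 or not 0 <= rel_margin < 1:
        raise ValueError("prediction must be >= 0 and rel_margin in [0, 1)")
    lowest = bisect_right(THRESHOLDS, prediction * (1.0 - rel_margin))
    highest = bisect_right(THRESHOLDS, prediction * (1.0 + rel_margin))
    return list(LEVELS[lowest:highest + 1])


def is_borderline(prediction: float, rel_margin: float = 0.15) -> bool:
    return len(possible_levels(prediction, rel_margin)) > 1


for p in (100.0, 600.0, 700.0):
    print(p, possible_levels(p), "borderline" if is_borderline(p) else "stable")
```

A chatbot could use such a flag to mention both candidate categories in its message instead of committing to a single level near a boundary.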

Similar challenges have been reported in tropical regions, where high variability in cloud cover complicates solar irradiance forecasting. A study conducted in Thailand achieved correlations above 0.8 for intra-day and short-term forecasts but still showed phase shifts in irradiance fluctuations during cloudy conditions [54]. These findings reinforce the notion that prediction errors are to some extent unavoidable in tropical climates, where convective processes introduce short-term variability that models struggle to capture. In line with these observations, the 15% error margin identified in the present study reflects not only model limitations but also the intrinsic unpredictability of tropical weather systems. This highlights the importance of integrating robust forecasting with user-oriented communication tools, ensuring that even imperfect predictions provide actionable value for decision-making in renewable energy management and public health protection.

Finally, the implementation of the chatbot on Telegram illustrates how predictive models can be effectively translated into practical tools. By offering real-time accessibility through a widely used communication platform, the system bridges advanced machine learning with user-centered sustainability applications. This combination of predictive accuracy and accessible delivery demonstrates significant potential for supporting renewable energy management and informed decision-making in everyday contexts.

The chatbot, built on a hybrid model trained with Random Forest, CatBoost, and Gradient Boosting, demonstrates the feasibility of integrating artificial intelligence into energy management. This predictive capability not only optimizes the use of solar radiation but can also contribute to the transition to clean energy by facilitating sustainable energy planning and reducing dependence on fossil fuels.

The results obtained are based on a nine-month dataset recorded at a meteorological station in Ecuador, which implies a limitation in the generalization of the model. Although the performance of the hybrid approach was superior to that of individual models in this context, it is reasonable to expect that its applicability will improve with larger datasets, including several years and stations from different geographic regions. In particular, the robustness of the ensemble algorithms employed (RF, GB, CB) suggests that the hybrid architecture can adapt to diverse climatic and geographic conditions, although its multi-center validation constitutes a line of future research.

5. Conclusions

In conclusion, this study demonstrates the effectiveness of a novel hybrid machine learning system for solar radiation forecasting, which integrates ensemble models with a 1D convolutional layer to enhance feature extraction. The hybrid model consistently outperformed individual approaches in terms of MAE, MSE, and R², highlighting its robustness and reliability. By selecting highly relevant meteorological variables and applying hyperparameter fine-tuning, the system achieved improved generalization while minimizing overfitting. Beyond its technical contributions, the implementation of the SolarPredictionBot on Telegram illustrates how advanced predictive models can be translated into practical, user-centered tools. By providing real-time alerts on radiation levels, the system supports informed decision-making, promotes public health protection, and demonstrates strong potential for real-world applications in renewable energy and environmental monitoring.

This research further contributes to sustainability by bridging advanced machine learning techniques with practical applications in environmental monitoring and public health. Accurate solar radiation forecasting not only enhances the efficient use of renewable energy resources but also mitigates health risks associated with excessive UV exposure. By integrating predictive modeling into an accessible chatbot interface, the system democratizes environmental information, empowering communities to make informed decisions aligned with sustainable energy practices, climate adaptation, and long-term environmental resilience.

Author Contributions

G.V., conceptualization; N.Z., J.N. and N.V., investigation; T.G., methodology; N.V. and J.N., formal analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Häder, D.; Kumar, H.; Smith, R.; Worrest, R. Effects of solar UV radiation on aquatic ecosystems and interactions with climate change. Photochem. Photobiol. Sci. 2007, 6, 267–285.
Wright, C.; Norval, M. Health risks associated with excessive exposure to solar ultraviolet radiation among outdoor workers in South Africa: An overview. Front. Public Health 2021, 9, 678680.
Weber, B.; Kohn, A.; Gleadow, A.; Nelson, D. Low-temperature Phanerozoic history of the northern Yilgarn Craton, Western Australia. Tectonophysics 2005, 400, 127–151.
Roller, S.; Dinan, E.; Goyal, N.; Ju, D.; Williamson, M.; Liu, Y.; Xu, J.; Smith, E.M.; Boureau, Y.-L.; Weston, J. Recipes for building an open-domain chatbot. arXiv 2020, arXiv:2004.13637.
Borda, B.; Lahura, N. Risks of solar radiation exposure for public cleaning workers, Lima (Peru). Yotantsipanko 2022, 2, 67–73.
Zhou, Y.; Liu, Y.; Wang, D.; Liu, X.; Wang, Y. A review on global solar radiation prediction with machine learning models in a comprehensive perspective. Energy Convers. Manag. 2021, 235, 113960.
Cornejo-Bueno, C.; Casanova-Mateo, J.; Justo, S.; Salcedo-Sanz, S. Machine learning regressors for solar radiation estimation from satellite data. Sol. Energy 2019, 183, 768–775.
Rahim, A.; Rifai, D.; Ali, K.; Abdalla, A.; Faraj, M. Solar irradiance measurement instrumentation and power solar generation forecasting based on artificial neural networks (ANN): A review of five years research trend. Sci. Total Environ. 2020, 715, 136848.
Mohamed, Z.E.; Saleh, H.H. Potential of Machine Learning Based Support Vector Regression for Solar Radiation Prediction. Comput. J. 2023, 66, 399–415.
Cumbicus, O.; Ludueñas, P.; Neira, L. Machine learning techniques for ransomware detection: A systematic literature review. J. Sci. Res. 2022, 7, 32–60.
Madhavan, B.; Ratnam, M. Impact of a solar eclipse on surface radiation and photovoltaic energy. Sol. Energy 2021, 223, 351–366.
Aybar, A.; Jiménez, S.; Cornejo, L.; Casanova, C.; Sanz, J.; Salvador, P.; Salcedo, S. A novel grouping genetic algorithm–extreme learning machine approach for global solar radiation prediction from numerical weather models inputs. Sol. Energy 2016, 132, 129–142.
Şenkal, O.; Kuleli, T. Estimation of solar radiation over Turkey using artificial neural network and satellite data. Appl. Energy 2009, 86, 1222–1228.
Chen, Y.; Wang, J.; Li, X.; Zhao, Q. A Hybrid Deep Learning Framework for Solar Irradiation Prediction Based on Regional Satellite Images and Data. Neural Comput. Appl. 2025, 37, 14327–14363.
Tercha, W.; Tadjer, S.A.; Chekired, F.; Canale, L. Machine Learning-Based Forecasting of Temperature and Solar Irradiance for Photovoltaic Systems. Energies 2024, 17, 1124.
Falope, T.O.; Lao, L.; Hanak, D. A Three-Step Weather Data Approach in Solar Energy Prediction Using Machine Learning. Renew. Energy Focus 2024, 50, 100615.
Demir, V.; Bayrak, H.; Taspinar, S.; Korkmaz, O.; Nuhoglu, S. Evaluation of Solar Radiation Prediction Models Using AI: A Performance Comparison in the High-Potential Region of Konya, Türkiye. Atmosphere 2025, 16, 398.
El-Shahat, D.; Tolba, A.; Abouhawwash, M.; Abdel-Basset, M. Machine Learning and Deep Learning Models Based Grid Search Cross Validation for Short-Term Solar Irradiance Forecasting. J. Big Data 2024, 11, 134.
Nadeem, A.; Hanif, M.F.; Naveed, M.S.; Hassan, M.T.; Gul, M.; Husnain, N.; Mi, J. AI-Driven precision in solar forecasting: Breakthroughs in machine learning and deep learning. AIMS Geosci. 2024, 10, 684–734.
Chodakowska, E.; Nazarko, J.; Nazarko, Ł.; Rabayah, H.S. Solar Radiation Forecasting: A Systematic Meta-Review of Current Methods and Emerging Trends. Energies 2024, 17, 3156.
Tandon, A.; Awasthi, A.; Pattnayak, K.C.; Tandon, A.; Choudhury, T.; Kotecha, K. Machine learning-driven solar irradiance prediction: A comparative analysis in Rajasthan. Discov. Appl. Sci. 2025, 7, 107.
Parmezan, G.; Souza, A.; Batista, P. Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model. Inf. Sci. 2019, 484, 302–337.
Cascaes, B.; Freitas, L.; Aguiar, M. Chatbot as decision-making support in the context of natural resource management. In Proceedings of the XII Workshop on Computing Applied to the Management of the Environment and Natural Resources (WCAMA 2021), Belgrade, Serbia, 21–23 October 2025.
Phaokla, N.; Netinant, P. Designing an environmental information chatbot system for a smart school framework. In Proceedings of the 2021 4th International Conference on Software Engineering and Information Management, Yokohama, Japan, 16–18 January 2021.
Chi, N.T.K. The effect of AI chatbots on pro-environment attitude and willingness to pay for environment protection. SAGE Open 2024, 14, 21582440231226001.
Islam, M.; Rashel, M.R.; Ahmed, M.T.; Islam, A.K.M.K.; Tlemçani, M. Artificial Intelligence in Photovoltaic Fault Identification and Diagnosis: A Systematic Review. Energies 2023, 16, 7417.
Belaid, S.; Mellit, A. Prediction of daily and mean monthly global solar radiation using support vector machine in an arid climate. Energy Convers. Manag. 2016, 118, 105–118.
Guanoluisa, R.; Arcos-Avilés, D.; Flores-Calero, M.; Martínez, W.; Guinjoan, F. Photovoltaic Power Forecast Using Deep Learning Techniques with Hyperparameters Based on Bayesian Optimization: A Case Study in the Galapagos Islands. Sustainability 2023, 15, 12151.
Parra, R.; Cadena, E.; Flores, R. Maximum UV Index Records (2010–2014) in Quito (Ecuador) and Its Trend Inferred from Remote Sensing Data (1979–2018). Atmosphere 2019, 10, 787.
Anchundia Troncoso, J.; Torres Quijije, Á.; Oviedo, B.; Zambrano-Vega, C. Solar Radiation Prediction in the UTEQ Based on Machine Learning Models. arXiv 2023, arXiv:2312.17659.
Ordóñez, F.; Vaca-Revelo, D.; López-Villada, J. Assessment of the Solar Resource in Andean Regions by Comparison between Satellite Estimation and Ground Measurements: Study Case of Ecuador. J. Sustain. Dev. 2019, 12, 62–70.
Ponce-Jara, M.; Cano Gordillo, C.A.; Velásquez, C.; Lazo Lazo, J.G.; Talavera, Á. Neural networks for solar radiation prediction in Manta, Ecuador. Rev. Campo História 2023, 8, 183–194.
Provinciali, L.; Calcagno, D.; Amabili, P.; Saita, G.; Riccobono, D.; Cicalo, S.; Marcucci, M.; Laurenza, M.; Zimbardo, G.; Landi, S.; et al. Main challenges of a space weather alert CubeSat mission. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 2–9 March 2024; pp. 1–12.
Mavromichalaki, H.; Paschalis, P.; Gerontidou, M.; Tezari, A.; Papailiou, M.-C.; Lingri, D.; Livada, M.; Stassinakis, A.; Crosby, N.; Dierckxsens, M. An Assessment of the GLE Alert++ Warning System. Atmosphere 2024, 15, 345.
Mavromichalaki, H.; Paschalis, P.; Gerontidou, M.; Papailiou, M.-C.; Paouris, E.; Tezari, A.; Lingri, D.; Livada, M.; Stassinakis, A.N.; Crosby, N.; et al. The Updated Version of the A.Ne.Mo.S. GLE Alert System: The Case of the Ground-Level Enhancement GLE73 on 28 October 2021. Universe 2022, 8, 378.
Crosby, N.; Mavromichalaki, H.; Malandraki, O.; Gerontidou, M.; Karavolos, M.; Lingri, D.; Makrantoni, P.; Papailiou, M.; Paschalis, P.; Tezari, A. Very High Energy Solar Energetic Particle Events and Ground Level Enhancement Events: Forecasting and Alerts. Space Weather 2024, 22, e2023SW003839.
Kang, N.; Zhang, L.; Zong, W.; Huang, P.; Zhang, Y.; Zhou, C.; Qiao, J.; Xue, B. A Multi-Satellite Space Environment Risk Prediction and Real-Time Warning System for Satellite Safety Management. Remote Sens. 2024, 16, 1814.
Miroshnichenko, L.I. Retrospective Analysis of GLEs and Estimates of Radiation Risks. J. Space Weather Space Clim. 2018, 8, A52.
Caballero-Lopez, R.A.; Manzano, R. Analysis of the Solar Cosmic-Ray Spectrum During Ground-Level Enhancements. Adv. Space Res. 2022, 70, 2602–2609.
da Rocha, Á.B.; Fernandes, E.D.M.; dos Santos, C.A.C.; Diniz, J.M.T.; Wanderley, F.A., Jr. Development of a Real-Time Surface Solar Radiation Measurement System Based on the Internet of Things (IoT). Sensors 2021, 21, 3836.
Laurenza, M.; Stumpo, M.; Zucca, P.; Mancini, M.; Benella, S.; Clark, L.; Alberti, T.; Marcucci, M.F. Updates on the ESPERTA forecasting tool for solar proton events. J. Space Weather Space Clim. 2024, 14, 8.
Ma, Z.; Dou, Z.; Zhu, Y.; Zhong, H.; Wen, J.R. One Chatbot Per Person: Creating Personalized Chatbots based on Implicit User Profiles. arXiv 2021, arXiv:2108.09355.
Aggarwal, A.; Tam, C.C.; Wu, D.; Li, X.; Qiao, S. Artificial Intelligence–Based Chatbots for Promoting Health Behavioral Changes: Systematic Review. J. Med. Internet Res. 2023, 25, e40789.
Sufi, F.; Alsulami, M. AI-Driven Chatbot for Real-Time News Automation. Mathematics 2025, 13, 850.
Peña-Cáceres, O.; Tavara-Ramos, A.; Correa-Calle, T.; More-More, M. WhatsApp cloud service-based chatbot application for emergencies or disasters. J. Adv. Inf. Technol. 2024, 15, 435–445.
Peña-Cáceres, O.; Tavara-Ramos, A.; Correa-Calle, T.; More-More, M. Integral Chatbot Solution for Efficient Incident Management and Emergency or Disaster Response: Optimizing Communication and Coordination. TEM J. 2024, 13, 50–61.
Alzahrani, A. Short-Term Solar Irradiance Prediction Based on Adaptive Extreme Learning Machine and Weather Data. Sensors 2022, 22, 8218.
Wang, L.; Li, X.; Hao, Y.; Zhang, Q. Ultra-Short-Term Solar Irradiance Prediction Using an Integrated Framework with Novel Textural Convolution Kernel for Feature Extraction of Clouds. Sustainability 2025, 17, 2606.
Ispir, M.; Aksoy, M.H.; Kalyoncu, M. Estimation of Solar Radiation and Photovoltaic Power Potential of Türkiye Using ANFIS. J. King Saud Univ. Eng. Sci. 2025, 37, 2.
Güzel, B.I.; Sevli, O.; Okatan, E. Forecasting Solar Radiation Based on Meteorological Data Using Machine Learning Techniques: A Case Study of Isparta. Int. J. Eng. Res. Dev. 2023, 15, 704–713.
Konduru, S.; Naveen, C.; Sathik, M.J. Advanced Solar Irradiance Forecasting Using Hybrid Ensemble Deep Learning and Multisite Data Analytics for Optimal Solar-Hydro Hybrid Power Plants. Int. Trans. Electr. Energy Syst. 2025, 2025, 6694504.
Chinnavornrungsee, P.; Kittisontirak, S.; Chollacoop, N.; Songtrai, S.; Sriprapha, K.; Uthong, P.; Yoshino, J.; Kobayashi, T. Solar irradiance prediction in the tropics using a weather forecasting model. Jpn. J. Appl. Phys. 2023, 62, SK1050.
Bendiek, P.; Taha, A.; Abbasi, Q.H.; Barakat, B. Solar Irradiance Forecasting Using a Data-Driven Algorithm and Contextual Optimisation. Appl. Sci. 2022, 12, 134.
Li, Q.; Bessafi, M.; Li, P. Mapping prediction of surface solar radiation with linear regression models: Case study over Reunion Island. Atmosphere 2023, 14, 1331.
Kim, H.; Park, S.; Park, H.; Son, H.; Kim, S. Solar radiation forecasting based on the hybrid CNN-CatBoost model. IEEE Access 2023, 11, 13492–13500.
Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer Science & Business Media: New York, NY, USA, 2013.
Solano, E.S.; Affonso, C.M. Solar Irradiation Forecasting Using Ensemble Voting Based on Machine Learning Algorithms. Sustainability 2023, 15, 7943.
Ramírez-Rivera, F.A.; Guerrero-Rodríguez, N.F. Ensemble Learning Algorithms for Solar Radiation Prediction in Santo Domingo: Measurements and Evaluation. Sustainability 2024, 16, 8015.
So, D.; Oh, J.; Leem, S.; Ha, H.; Moon, J. A Hybrid Ensemble Model for Solar Irradiance Forecasting: Advancing Digital Models for Smart Island Realization. Electronics 2023, 12, 2607.
Banik, R.; Biswas, A. Improving Solar PV Prediction Performance with RF-CatBoost Ensemble: A Robust and Complementary Approach. Renew. Energy Focus 2023, 46, 207–221.
Alam, M.S.; Al-Ismail, F.S.; Hossain, M.S.; Rahman, S.M. Ensemble Machine-Learning Models for Accurate Prediction of Solar Irradiation in Bangladesh. Processes 2023, 11, 908.
Guermoui, M.; Melgani, F.; Gairaa, K.; Mekhalfi, M.L. A Comprehensive Review of Hybrid Models for Solar Radiation Forecasting. J. Clean. Prod. 2020, 258, 120357.

Figure 1. Stages of solar radiation prediction (1) data capture, (2) data preprocessing, (3) implementation of prediction algorithms and (4) integration into a chatbot.

Figure 2. Spearman correlation matrix showing the relationships between meteorological variables. The matrix uses a blue gradient for negative correlations and a red gradient for positive correlations, with correlation values displayed for each pair of variables.

Figure 3. Average solar radiation per hour and month, ranging from 0 to 800 W/m², highlighting the daily and monthly variations in solar radiation.

Figure 4. Architecture of the proposed hybrid model.

Figure 5. Flow diagram of the solar radiation prediction process with the hybrid model and chatbot. Weather data inputs are used for training and validation of the hybrid model. The web server, hosting the hybrid algorithm, utilizes Weather API data to ensure accurate and real-time predictions, which are then communicated to the user through the chatbot interface.

Figure 6. Learning curve of the Multiple Linear Regression (MLR) model showing the Mean Absolute Error (MAE) as a function of training size. The blue line represents training error, and the red line represents validation error. The persistent gap between curves indicates limited generalization capacity and potential overfitting.

Figure 7. Learning curve of the Random Forest model showing the Mean Absolute Error (MAE) as a function of training size. The blue line represents training error, and the red line represents validation error. The persistent gap between curves indicates moderate overfitting and limited generalization capacity.

Figure 8. Learning curve of the CatBoost model showing the Mean Absolute Error (MAE) as a function of training size. The blue line represents training error, and the red line represents validation error. The reduction in the validation error indicates improved learning stability, although the persistent gap suggests limited generalization capacity.

Figure 9. Comparison of multivariable predictions versus real samples for Linear Regression (RL), Random Forest (RF), CatBoost (CB), and the Hybrid model (Assembler2 RF + GB + CB). Each graph shows the predicted values plotted against the actual values, with data points aligned along the diagonal.

Figure 10. Learning performance for solar radiation prediction from September 2023 to April 2024 using different models.

Figure 11. Comparative performance of Random Forest, CatBoost, Gradient Boosting, and Hybrid models (RF + GB + CB) in terms of precision, F1-Score, accuracy, and recall for solar radiation categorization into four levels (low, medium, high, very high).

Figure 12. Operational interface of the solar radiation prediction chatbot in Telegram, showing category-based alerts (low, medium, high) and an example of hourly forecast for user-selected days.

Figure 13. Architecture and flow diagram of the proposed chatbot.

Table 1. Prediction accuracy of solar radiation models.

Model	MAE (W/m²)	MSE (W²/m⁴)	R²	Correct Predictions	Total Predictions
Linear Regression	21.19	1466.61	0.95	9592	9912
Random Forest	14.41	934.79	0.97	9721	9912
CatBoost	14.21	897.88	0.97	9747	9912
FCN	34.39	288.76	0.91	9652	9912
LSTM	73.80	993.41	0.69	9552	9912
Hybrid (RF, GB, CB)	13.93	871.70	0.98	9759	9912
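The last two columns of Table 1 can be converted into a share of correct predictions per model. This conversion assumes that "Correct Predictions" counts test samples assigned to the right radiation category. The counts themselves are copied from the table.

```python
# (correct, total) counts copied from Table 1.
TABLE_1 = {
    "Linear Regression": (9592, 9912),
    "Random Forest": (9721, 9912),
    "CatBoost": (9747, 9912),
    "FCN": (9652, 9912),
    "LSTM": (9552, 9912),
    "Hybrid (RF, GB, CB)": (9759, 9912),
}


def hit_rate(correct: int, total: int) -> float:
    """Share of test samples counted as correct predictions."""
    return correct / total


ranked = sorted(TABLE_1.items(), key=lambda item: hit_rate(*item[1]), reverse=True)
for model, (correct, total) in ranked:
    print(f"{model:<21}{hit_rate(correct, total):8.2%}")
```

On these counts, the hybrid model has the highest share (about 98.5%) and LSTM the lowest (about 96.4%).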

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

