AI-Driven Digital Twin for Optimizing Solar Submersible Pumping Systems
Abstract
1. Introduction
- We design an AI-driven architecture capable of predicting inverter motor frequency and pump output power under fluctuating environmental conditions, ensuring reliable water delivery in remote areas.
- We integrate predictive intelligence with physical hydraulic modeling, enabling accurate estimation of water volume without additional sensors, reducing cost and complexity.
- Through extensive experimentation on real operational data, we demonstrate that the Random Forest model consistently outperforms other techniques, achieving high accuracy with minimal computational overhead, making the solution suitable for real-time and edge deployments.
2. Materials and Methods
2.1. System Design
- Solar panel: Suntech STP340-A72/Vfh, model specifications are shown in Table 1.
- Inverter motor: YASKAWA AC Drive GA700, model specifications are shown in Table 2.
- Pump: Vansan VSPss04090/6.B2, model specifications are shown in Table 3.
- Motor (powering the pump): Vansan VSMOF4/30T, model specifications are shown in Table 4.
2.2. Dataset
- UV index, measured in W/m2
- Air temperature, measured in degrees Celsius (°C)
- Wind speed, measured in m/s
- Wind direction, measured in degrees from North
- Humidity, measured as a percentage (%)
- Gust speed, measured in m/s
- Cloud cover, measured as a percentage (%)
- Frequency reference
- Output frequency
- Output current
- DC bus voltage
- Output power
- Output frequency fault
- Heatsink temperature
- Proportional-Integral-Derivative (PID) controller output
- PID input
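The weather and drive signals listed above can be assembled into a Phase 1 feature matrix. The sketch below is illustrative only: the column names and synthetic values are assumptions, not the published dataset's actual headers, while the five selected inputs follow the NN1 feature set (UV index, air temperature, wind speed, humidity, frequency reference) described in Section 2.3.1.

```python
import pandas as pd
import numpy as np

# Hypothetical column names mirroring the weather and drive signals listed
# above; the published dataset may use different headers.
rng = np.random.default_rng(0)
n = 8
df = pd.DataFrame({
    "uvIndex":             rng.uniform(0, 1000, n),   # W/m^2
    "air_temperature":     rng.uniform(10, 45, n),    # deg C
    "wind_speed":          rng.uniform(0, 12, n),     # m/s
    "humidity":            rng.uniform(5, 90, n),     # %
    "frequency_reference": rng.uniform(30, 50, n),    # Hz
    "output_frequency":    rng.uniform(30, 50, n),    # Hz (Phase 1 target)
})

# Phase 1 uses five environmental/drive inputs to predict output frequency.
PHASE1_FEATURES = ["uvIndex", "air_temperature", "wind_speed",
                   "humidity", "frequency_reference"]
X = df[PHASE1_FEATURES].to_numpy()
y = df["output_frequency"].to_numpy()
```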
2.3. Proposed Methodology
2.3.1. Phase 1: Output Frequency Prediction
Data Pre-Processing (Phase 1)
Machine Learning Models
- Random Forest: A Random Forest Regressor was implemented, an ensemble learning method that constructs multiple decision trees during training and outputs the average prediction of the individual trees. This model was chosen for its ability to model complex relationships between the environmental features and the output frequency without assuming linearity [47].
- Linear Regression: A Linear Regression model was implemented to find the linear relationship between the environmental input features and the output frequency by minimizing the sum of squared errors [43,50]. This model was included as a baseline to assess the linearity of the relationship between the environmental factors and the frequency output [46].
- Neural Network (NN1): A sequential neural network (NN1) was constructed. The network consists of an input layer accepting five features (UV Index, Air Temperature, Wind Speed, Humidity, Frequency Reference), connected to hidden layers that use the ReLU activation function [51]. The output layer has a single neuron with no activation, predicting the continuous output frequency. The model was trained with the Adam optimizer, an algorithm that adapts the learning rate during training [52], using Mean Squared Error (MSE) as the loss function; Mean Absolute Error (MAE) was tracked as an additional metric. This architecture was chosen to capture potential complex, non-linear interactions between the environmental inputs and the resulting frequency.
- XGBoost: An XGBoost Regressor, a highly effective and optimized gradient boosting implementation, was also implemented. XGBoost was selected for its exceptional predictive performance and its regularization capabilities, which make it resistant to overfitting while retaining high predictive power [48].
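As an illustration of the Phase 1 model comparison, the sketch below fits three of the candidate regressors on synthetic stand-in data. The data, settings, and scores are hypothetical, and scikit-learn's GradientBoostingRegressor stands in for XGBoost so the example needs no extra dependency.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for the five environmental/drive inputs.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (300, 5))
y = 40 + 8 * X[:, 4] + 3 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # xgboost.XGBRegressor would slot in here; sklearn's gradient boosting
    # is shown as a stand-in to keep the sketch dependency-free.
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score R^2 on the held-out split.
scores = {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
```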
Hyperparameter Tuning
2.3.2. Phase 2: Output Power Prediction
Data Pre-Processing (Phase 2)
Machine Learning Models
- Random Forest: The output of the Phase 1 Random Forest is used as the single input feature for training a new Random Forest Regressor, allowing the model to capture finer patterns within the Phase 1 predictions themselves [47].
- Linear Regression: A Linear Regression model is used to evaluate any linear trend between the Phase 1 Random Forest output and the final target variable.
- Neural Network (NN2): A sequential neural network (NN2) accepts a single input neuron, corresponding to the Phase 1 Random Forest output, and learns non-linear transformations of these predictions. The network uses the ReLU activation function and is optimized with Adam [52].
- Gradient Boosting: A Gradient Boosting Regressor builds decision trees sequentially on the single input feature obtained from the Phase 1 Random Forest, aiming to increase the predictive accuracy extracted from this refined signal [53].
- XGBoost: An XGBoost Regressor, an efficient gradient boosting implementation, likewise takes only the Phase 1 Random Forest output as input, exploiting its strong regularization and predictive capabilities on this pre-processed signal [48].
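The cascade described above, in which Phase 1 predictions feed Phase 2 as a single input feature, can be sketched as follows. The synthetic frequency and power relationships are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy stand-ins: environment -> frequency -> power.
rng = np.random.default_rng(2)
X_env = rng.uniform(0, 1, (300, 5))                        # Phase 1 inputs
freq = 30 + 20 * X_env[:, 4] + rng.normal(0, 0.2, 300)     # "true" frequency
power = 0.02 * freq ** 2 + rng.normal(0, 0.3, 300)         # "true" power

# Phase 1: environmental features -> output frequency.
rf1 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_env, freq)
freq_hat = rf1.predict(X_env)

# Phase 2: the Phase 1 prediction is the *single* input feature.
rf2 = RandomForestRegressor(n_estimators=100, random_state=0)
rf2.fit(freq_hat.reshape(-1, 1), power)
power_hat = rf2.predict(freq_hat.reshape(-1, 1))
```

Note that Phase 2 never sees the true frequency, only the Phase 1 estimate, which is why Phase 1 errors propagate into Phase 2 (see Section 4.2.3).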
Hyperparameter Tuning
2.3.3. Phase 3: Water Volume Prediction
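The paper estimates pumped water volume from the predicted output power without a dedicated flow sensor. One standard hydraulic relation consistent with that approach is P = ρgQH/η; the sketch below inverts it for flow and integrates over time. The head and efficiency values are placeholders, not the actual system parameters.

```python
import numpy as np

RHO = 1000.0     # water density, kg/m^3
G = 9.81         # gravitational acceleration, m/s^2
HEAD_M = 60.0    # assumed total dynamic head (placeholder value)
ETA = 0.55       # assumed wire-to-water efficiency (placeholder value)

def flow_m3_per_s(power_w):
    """Volumetric flow from electrical input power via P = rho*g*Q*H/eta."""
    return ETA * np.asarray(power_w) / (RHO * G * HEAD_M)

# Integrate predicted power samples (one per hour) into a daily volume.
hourly_power_w = np.full(10, 15000.0)  # toy 15 kW over 10 daylight hours
volume_m3 = float(np.sum(flow_m3_per_s(hourly_power_w) * 3600.0))
```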
2.4. Experiments
2.4.1. Experimental Setup
2.4.2. Performance Metrics
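The results tables report R2, MSE, RMSE, and MAE. A minimal reference implementation of these four metrics:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R^2, MSE, RMSE, and MAE as reported in the results tables."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "R2": 1.0 - ss_res / ss_tot,   # coefficient of determination
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
    }

m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```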
3. Results
3.1. Phase 1: Output Frequency Prediction
3.1.1. Comparative Model Metrics Across Datasets
3.1.2. Model Performance Summary
3.1.3. Statistical Significance Testing
3.1.4. Prediction Accuracy Visualization
3.1.5. Error Distribution Analysis
3.1.6. Computational Performance
3.1.7. Feature Importance Analysis
Feature Correlation Analysis
Multicollinearity Assessment
Feature Redundancy Assessment
3.2. Phase 2: Output Power Prediction
3.2.1. Comparative Model Metrics Across Datasets
3.2.2. Model Performance Summary
3.2.3. Statistical Significance Testing
3.2.4. Prediction Accuracy Visualization
3.2.5. Error Distribution Analysis
3.2.6. Computational Performance
3.3. Annual Water Volume Distribution
4. Discussion
4.1. Output Frequency Prediction Analysis
4.1.1. Model Performance and Generalization
4.1.2. Practical Significance of Prediction Errors
4.1.3. Feature Importance Discrepancies and Model Interpretability
Nonlinear Correlation and Feature Redundancy
4.1.4. Computational Efficiency Considerations
4.2. Output Power Prediction Analysis
4.2.1. Performance Differences Between Models
4.2.2. Practical Significance for Industrial Applications
4.2.3. Why Other Models Failed in Phase 2
- Increased Complexity: The relationship between environmental factors and output power may involve more complex interactions than the frequency prediction task. Power depends not only on frequency but also on load conditions, efficiency curves, and other non-linear dependencies.
- Error Propagation: Phase 2 models receive predicted frequency from Phase 1 as input rather than true frequency values. Any prediction errors from Phase 1 cascade into Phase 2, potentially amplifying errors for models less robust to input noise.
- Model Architecture Limitations: Linear models inherently cannot capture complex non-linear relationships. The Neural Network’s failure despite its theoretical capacity for non-linear modeling suggests inadequate architecture or poor hyperparameter tuning for this specific problem.
- Random Forest’s Advantage: Random Forest’s ensemble approach with bootstrap aggregating may provide superior robustness to noisy inputs and enhanced ability to model complex interaction effects compared to single-tree or linear approaches.
4.2.4. Computational Trade-Offs in Phase 2
4.3. Deployment Feasibility in Resource-Constrained Desert Environments
4.4. Water Volume Analysis and Seasonal Patterns
4.5. Summary of Discussion
5. Conclusions
Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yang, Y.; Wang, H.; Wang, C.; Zhou, L.; Ji, L.; Yang, Y.; Shi, W.; Agarwal, R.K. An entropy efficiency model and its application to energy performance analysis of a multi-stage electric submersible pump. Energy 2024, 288, 129741. [Google Scholar] [CrossRef]
- Zhou, Q.; Li, H.; Zeng, X.; Li, L.; Cui, S.; Du, Z. A quantitative safety assessment for offshore equipment evaluation using fuzzy FMECA: A case study of the hydraulic submersible pump system. Ocean Eng. 2024, 293, 116611. [Google Scholar] [CrossRef]
- Wei, A.; Wang, W.; Hu, Y.; Feng, S.; Qiu, L.; Zhang, X. Numerical and experimental analysis of the cavitation and flow characteristics in liquid nitrogen submersible pump. Phys. Fluids 2024, 36, 042109. [Google Scholar] [CrossRef]
- Salah, Y.; Shalash, O.; Khatab, E. A lightweight speaker verification approach for autonomous vehicles. Robot. Integr. Manuf. Control 2024, 1, 15–30. [Google Scholar] [CrossRef]
- García, J.A.; Asuaje, M.; Pereyra, E.; Ratkovich, N. Analysis of two-phase gas-liquid flow in an Electric Submersible Pump using A CFD approach. Geoenergy Sci. Eng. 2024, 233, 212510. [Google Scholar] [CrossRef]
- Yang, C.; Xu, Q.; Chang, L.; Dai, X.; Wang, H.; Su, X.; Guo, L. Interstage performance and power consumption of a multistage mixed-flow electrical submersible pump in gas–liquid conditions: An experimental study. J. Fluids Eng. 2024, 146, 051203. [Google Scholar] [CrossRef]
- Elkholy, M.; Shalash, O.; Hamad, M.S.; Saraya, M. Harnessing Machine Learning for Effective Energy Theft Detection Based on Egyptian Data. In Proceedings of the International Conference on Energy Systems, Istanbul, Turkey, 12–14 May 2025. [Google Scholar]
- Cui, B.; Chen, H.; Zhu, Z.; Sun, L.; Sun, L. Optimization of low-temperature multi-stage submersible pump based on blade load. Phys. Fluids 2024, 36, 035157. [Google Scholar] [CrossRef]
- Ahmadizadeh, M.; Heidari, M.; Thangavel, S.; Al Naamani, E.; Khashehchi, M.; Verma, V.; Kumar, A. Technological advancements in sustainable and renewable solar energy systems. In Highly Efficient Thermal Renewable Energy Systems; CRC Press: Boca Raton, FL, USA, 2024; pp. 23–39. [Google Scholar]
- Elkholy, M.; Shalash, O.; Hamad, M.S.; Saraya, M.S. Empowering the grid: A comprehensive review of artificial intelligence techniques in smart grids. In Proceedings of the 2024 International Telecommunications Conference (ITC-Egypt), Cairo, Egypt, 22–25 July 2024; pp. 513–518. [Google Scholar]
- Tripathi, A.K.; Aruna, M.; Elumalai, P.; Karthik, K.; Khan, S.A.; Asif, M.; Rao, K.S. Advancing solar PV panel power prediction: A comparative machine learning approach in fluctuating environmental conditions. Case Stud. Therm. Eng. 2024, 59, 104459. [Google Scholar] [CrossRef]
- Sun, Y.; Usman, M.; Radulescu, M.; Pata, U.K.; Balsalobre-Lorente, D. New insights from the STIPART model on how environmental-related technologies, natural resources and the use of the renewable energy influence load capacity factor. Gondwana Res. 2024, 129, 398–411. [Google Scholar] [CrossRef]
- Xu, G.; Yang, M.; Li, S.; Jiang, M.; Rehman, H. Evaluating the effect of renewable energy investment on renewable energy development in China with panel threshold model. Energy Policy 2024, 187, 114029. [Google Scholar] [CrossRef]
- Fawzy, H.; Elbrawy, A.; Amr, M.; Eltanekhy, O.; Khatab, E.; Shalash, O. A systematic review: Computer vision algorithms in drone surveillance. J. Robot. Integr. 2025. [Google Scholar] [CrossRef]
- Li, B.; Amin, A.; Nureen, N.; Saqib, N.; Wang, L.; Rehman, M.A. Assessing factors influencing renewable energy deployment and the role of natural resources in MENA countries. Resour. Policy 2024, 88, 104417. [Google Scholar] [CrossRef]
- Shahzad, U.; Tiwari, S.; Mohammed, K.S.; Zenchenko, S. Asymmetric nexus between renewable energy, economic progress, and ecological issues: Testing the LCC hypothesis in the context of sustainability perspective. Gondwana Res. 2024, 129, 465–475. [Google Scholar] [CrossRef]
- Yasser, M.; Shalash, O.; Ismail, O. Optimized decentralized swarm communication algorithms for efficient task allocation and power consumption in swarm robotics. Robotics 2024, 13, 66. [Google Scholar] [CrossRef]
- Gaber, I.M.; Shalash, O.; Hamad, M.S. Optimized inter-turn short circuit fault diagnosis for induction motors using neural networks with leleru. In Proceedings of the 2023 IEEE Conference on Power Electronics and Renewable Energy (CPERE), Luxor, Egypt, 19–21 February 2023; pp. 1–5. [Google Scholar]
- Khatab, E.; Onsy, A.; Abouelfarag, A. Evaluation of 3D vulnerable objects’ detection using a multi-sensors system for autonomous vehicles. Sensors 2022, 22, 1663. [Google Scholar] [CrossRef]
- Shalash, O.; Métwalli, A.; Sallam, M.H.; Khatab, E. High-Performance Polygraph-Based Truth Detection System: Leveraging Multi-Modal Data Fusion and Particle Swarm-Optimized Random Forest for Robust Deception Analysis. Authorea 2025. [Google Scholar] [CrossRef]
- Khatab, E.; Onsy, A.; Varley, M.; Abouelfarag, A. A lightweight network for real-time rain streaks and rain accumulation removal from single images captured by avs. Appl. Sci. 2022, 13, 219. [Google Scholar] [CrossRef]
- Said, H.; Mohamed, S.; Shalash, O.; Khatab, E.; Aman, O.; Shaaban, R.; Hesham, M. Forearm Intravenous Detection and Localization for Autonomous Vein Injection Using Contrast-Limited Adaptive Histogram Equalization Algorithm. Appl. Sci. 2024, 14, 7115. [Google Scholar] [CrossRef]
- Khaled, A.; Shalash, O.; Ismaeil, O. Multiple objects detection and localization using data fusion. In Proceedings of the 2023 2nd International Conference on Automation, Robotics and Computer Engineering (ICARCE), Wuhan, China, 14–16 December 2023; pp. 1–6. [Google Scholar]
- Elsayed, H.; Tawfik, N.S.; Shalash, O.; Ismail, O. Enhancing human emotion classification in human-robot interaction. In Proceedings of the 2024 International Conference on Machine Intelligence and Smart Innovation (ICMISI), Alexandria, Egypt, 12–14 May 2024; pp. 1–6. [Google Scholar]
- Métwalli, A.; Shalash, O.; Elhefny, A.; Rezk, N.; El Gohary, F.; El Hennawy, O.; Akrab, F.; Shawky, A.; Mohamed, Z.; Hassan, N.; et al. Enhancing hydroponic farming with Machine Learning: Growth prediction and anomaly detection. Eng. Appl. Artif. Intell. 2025, 157, 111214. [Google Scholar] [CrossRef]
- Issa, R.; Badr, M.M.; Shalash, O.; Othman, A.A.; Hamdan, E.; Hamad, M.S.; Abdel-Khalik, A.S.; Ahmed, S.; Imam, S.M. A data-driven digital twin of electric vehicle Li-ion battery state-of-charge estimation enabled by driving behavior application programming interfaces. Batteries 2023, 9, 521. [Google Scholar] [CrossRef]
- Alao, K.T.; Gilani, S.I.U.H.; Sopian, K.; Alao, T.O. A review on digital twin application in photovoltaic energy systems: Challenges and opportunities. JMST Adv. 2024, 6, 257–282. [Google Scholar] [CrossRef]
- Shalash, O.; Rowe, P. Computer-assisted robotic system for autonomous unicompartmental knee arthroplasty. Alex. Eng. J. 2023, 70, 441–451. [Google Scholar] [CrossRef]
- Olayiwola, O.; Cali, U.; Elsden, M.; Yadav, P. Enhanced Solar Photovoltaic System Management and Integration: The Digital Twin Concept. Solar 2025, 5, 7. [Google Scholar] [CrossRef]
- Alshireedah, A.; Yusupov, Z.; Rahebi, J. Optimizing Solar Water-Pumping Systems Using PID-Jellyfish Controller with ANN Integration. Electronics 2025, 14, 1172. [Google Scholar] [CrossRef]
- Sumathi, S.; Abitha, S. A novel method for solar water pumping system using machine learning techniques. AIP Conf. Proc. 2023, 2901, 070001. [Google Scholar]
- Don, M.G.; Liyanarachchi, S.; Wanasinghe, T.R. A digital twin development framework for an electrical submersible pump (ESP). Arch. Adv. Eng. Sci. 2025, 3, 35–43. [Google Scholar] [CrossRef]
- Cheacharoen, R.; Boyd, C.C.; Burkhard, G.F.; Leijtens, T.; Raiford, J.A.; Bush, K.A.; Bent, S.F.; McGehee, M.D. Encapsulating perovskite solar cells to withstand damp heat and thermal cycling. Sustain. Energy Fuels 2018, 2, 2398–2406. [Google Scholar] [CrossRef]
- Jordan, D.C.; Kurtz, S.R. Photovoltaic degradation rates—an analytical review. Prog. Photovolt. Res. Appl. 2013, 21, 12–29. [Google Scholar] [CrossRef]
- Hudișteanu, V.S.; Cherecheș, N.C.; Țurcanu, F.E.; Hudișteanu, I.; Romila, C. Impact of temperature on the efficiency of monocrystalline and polycrystalline photovoltaic panels: A comprehensive experimental analysis for sustainable energy solutions. Sustainability 2024, 16, 10566. [Google Scholar] [CrossRef]
- Hudișteanu, S.V.; Țurcanu, F.E.; Cherecheș, N.C.; Popovici, C.G.; Verdeș, M.; Huditeanu, I. Enhancement of PV panel power production by passive cooling using heat sinks with perforated fins. Appl. Sci. 2021, 11, 11323. [Google Scholar] [CrossRef]
- Rabaia, M.K.H.; Abdelkareem, M.A.; Sayed, E.T.; Elsaid, K.; Chae, K.J.; Wilberforce, T.; Olabi, A. Environmental impacts of solar energy systems: A review. Sci. Total Environ. 2021, 754, 141989. [Google Scholar] [CrossRef]
- Salah, Y.; Shalash, O.; Khatab, E.; Imam, S.; Hamad, M. Dataset for Solar-Powered Submersible Pump Systems. 2025. Available online: https://data.mendeley.com/datasets/wgfhmx37ng/1 (accessed on 9 September 2025).
- Rubin, D.B. Inference and missing data. Biometrika 1976, 63, 581–592. [Google Scholar] [CrossRef]
- Little, R.J.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2019. [Google Scholar]
- Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow; O’Reilly Media: Sebastopol, CA, USA, 2019. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Seber, G.A.F.; Lee, A.J. Linear Regression Analysis; Wiley: Hoboken, NJ, USA, 2012. [Google Scholar]
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]
- Elrefai, M.; Hamdy, R.A.; ElZawawi, A.; Hamad, M.S. Design and performance evaluation of a solar water pumping system: A case study. In Proceedings of the 2016 Eighteenth International Middle East Power Systems Conference (MEPCON), Cairo, Egypt, 27–29 December 2016; pp. 914–920. [Google Scholar]
- Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250. [Google Scholar] [CrossRef]
- Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012. [Google Scholar]
- Spearman, C. The proof and measurement of association between two things. Am. J. Psychol. 1904, 15, 72–101. [Google Scholar] [CrossRef]
- Kendall, M.G. A new measure of rank correlation. Biometrika 1938, 30, 81–93. [Google Scholar] [CrossRef]
- Dormann, C.F.; Elith, J.; Bacher, S. Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography 2013, 36, 27–46. [Google Scholar] [CrossRef]
- Dubey, S.; Sarvaiya, J.N.; Seshadri, B. Temperature dependent photovoltaic (PV) efficiency and its effect on PV production in the world—A review. Energy Procedia 2013, 33, 311–321. [Google Scholar] [CrossRef]
- Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W. Applied Linear Statistical Models; McGraw-Hill Irwin: Columbus, OH, USA, 2005. [Google Scholar]
- Skoplaki, E.; Palyvos, J. Operating temperature of photovoltaic modules: A survey of pertinent correlations. Renew. Energy 2009, 34, 23–29. [Google Scholar] [CrossRef]












Table 1. Suntech STP340-A72/Vfh solar panel specifications.

| PV Panel Model | Suntech STP340-A72/Vfh |
|---|---|
| Rated power | 340 W |
| Rated voltage | 37.9 V |
| Rated current | 8.98 A |
| Open circuit voltage | 46.0 V |
| Short circuit current | 9.5 A |
| Module efficiency | 17% |
| Total number of PV panels | 96 panels |
| Connection | 6 strings, each string consists of 16 series PV panels |
Table 2. YASKAWA AC Drive GA700 inverter specifications.

| Inverter Model | YASKAWA AC Drive GA700 |
|---|---|
| Rated power | 30 hp / 22 kW |
| Rated output current | 45 A |
| Rated voltage | 3 phases/380 V |
| Control method | V/F |
Table 3. Vansan VSPss04090/6.B2 pump specifications.

| Pump Model | Vansan VSPss04090/6.B2 |
|---|---|
| Rated power | 30 hp |
| Stages | 6 |
| Material | Stainless steel |
| Outlet connection diameter | 4 inches |
Table 4. Vansan VSMOF4/30T motor specifications.

| Motor Model | Vansan VSMOF4/30T |
|---|---|
| Rated power | 30 hp |
| Rated voltage | 3 phases/380 V |
| Shaft | Stainless steel |
| Outlet connection diameter | 4 inches |
Phase 1 tuned hyperparameters for the tree-based models:

| Hyperparameter | Random Forest | Gradient Boosting | XGBoost |
|---|---|---|---|
| Number of Estimators | 300 | 500 | 500 |
| Maximum Depth | 3 | 4 | 4 |
| Minimum Samples Split | 5 | 4 | – |
| Minimum Samples Leaf | 2 | 2 | – |
| Learning Rate | – | 0.03 | 0.03 |
| Subsample Ratio | – | 0.9 | 0.9 |
| Maximum Features | – | 0.8 | – |
| Column Subsample Ratio by Tree | – | – | 0.8 |
| Gamma | – | – | 0.1 |
Phase 1 tuned hyperparameters for the neural network (NN1):

| Hyperparameter | Value |
|---|---|
| Learning Rate | 0.005 |
| Batch Size | 128 |
| Epochs (Max) | 150 |
| Number of Hidden Layers | 2 |
| Neurons per Layer | 64 |
| Dropout Rate | 0.2 |
| Activation Function | ReLU |
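The neural-network hyperparameters tabulated above can be mirrored with scikit-learn's MLPRegressor as a stand-in for the original sequential network (MLPRegressor has no dropout layer, so that setting is noted but not reproduced). The training data here are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in for NN1 using the tabulated settings; dropout (0.2) is part of
# the reported configuration but is not supported by MLPRegressor.
nn1 = MLPRegressor(
    hidden_layer_sizes=(64, 64),   # 2 hidden layers, 64 neurons each
    activation="relu",
    solver="adam",
    learning_rate_init=0.005,
    batch_size=128,
    max_iter=150,                  # epochs (max)
    random_state=0,
)

# Toy data shaped like the five Phase 1 inputs.
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (256, 5))
y = 40 + 10 * X[:, 4] + rng.normal(0, 0.1, 256)
nn1.fit(X, y)
pred = nn1.predict(X[:4])
```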
Phase 2 tuned hyperparameters for the tree-based models:

| Hyperparameter | Random Forest | Gradient Boosting | XGBoost |
|---|---|---|---|
| Number of Estimators | 200 | 300 | 300 |
| Learning Rate | 0.02 | 0.05 | 0.05 |
| Maximum Depth | 2 | 4 | 4 |
| Minimum Samples Split | 4 | 4 | – |
| Minimum Samples Leaf | 2 | 2 | – |
| Subsample Ratio | – | 0.8 | 0.8 |
| Maximum Features | – | sqrt | – |
| Column Subsample by Tree | – | – | 0.8 |
| Gamma | – | – | 0.1 |
Phase 2 tuned hyperparameters for the neural network (NN2):

| Hyperparameter | Value |
|---|---|
| Learning Rate | 0.001 |
| Batch Size | 32 |
| Epochs (Max) | 100 |
| Early Stopping Patience | 10 |
| Number of Hidden Layers | 2 |
| Neurons per Layer | [64, 32] |
| Dropout Rate | 0.2 |
| Activation Function | ReLU |
Phase 1 (output frequency) results:

| Model | Train R2 | Train MSE | Train RMSE | Train MAE |
|---|---|---|---|---|
| Persistence Baseline | 0.0001 | 986.3 | 31.53 | 23.18 |
| Linear Regression | 0.9229 | 38.36 | 6.19 | 3.21 |
| Random Forest | 0.9895 | 5.21 | 2.28 | 0.75 |
| Gradient Boosting | 0.9607 | 19.56 | 4.42 | 1.83 |
| XGBoost | 0.9836 | 8.19 | 2.86 | 1.07 |
| Neural Network | 0.9568 | 21.53 | 4.64 | 1.89 |
| Model | Val R2 | Val MSE | Val RMSE | Val MAE |
|---|---|---|---|---|
| Persistence Baseline | 0.0001 | 986.3 | 31.53 | 23.18 |
| Linear Regression | 0.9232 | 38.39 | 6.20 | 3.23 |
| Random Forest | 0.9788 | 10.59 | 3.25 | 0.99 |
| Gradient Boosting | 0.9584 | 20.82 | 4.56 | 1.85 |
| XGBoost | 0.9762 | 11.92 | 3.45 | 1.21 |
| Neural Network | 0.9550 | 22.50 | 4.74 | 1.89 |
| Model | Test R2 | Test MSE |
|---|---|---|
| Persistence Baseline | 0.0001 | 986.3 |
| Linear Regression | 0.9213 | 39.27 |
| Random Forest | 0.9784 | 10.77 |
| Gradient Boosting | 0.9592 | 20.36 |
| XGBoost | 0.9770 | 11.47 |
| Neural Network | 0.9549 | 22.51 |
| Model | Test RMSE |
|---|---|
| Persistence Baseline | 31.53 |
| Linear Regression | 6.27 (Std Dev: 0.14, 95% CI: [5.98, 6.55]) |
| Random Forest | 3.28 (Std Dev: 0.13, 95% CI: [3.04, 3.53]) |
| Gradient Boosting | 4.51 (Std Dev: 0.13, 95% CI: [4.25, 4.76]) |
| XGBoost | 3.39 (Std Dev: 0.12, 95% CI: [3.15, 3.63]) |
| Neural Network | 4.74 (Std Dev: 0.14, 95% CI: [4.47, 5.01]) |
| Model | Test MAE |
|---|---|
| Persistence Baseline | 23.18 (Baseline) |
| Linear Regression | 3.24 (Std Dev: 0.05, 95% CI: [3.14, 3.34]) |
| Random Forest | 1.00 (Std Dev: 0.03, 95% CI: [0.95, 1.06]) |
| Gradient Boosting | 1.85 (Std Dev: 0.04, 95% CI: [1.77, 1.93]) |
| XGBoost | 1.20 (Std Dev: 0.03, 95% CI: [1.14, 1.27]) |
| Neural Network | 1.91 (Std Dev: 0.04, 95% CI: [1.82, 2.00]) |
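The Std Dev and 95% CI columns above are consistent with bootstrap resampling of the test set; the sketch below shows that procedure (the exact resampling scheme used in the paper is an assumption here).

```python
import numpy as np

def bootstrap_rmse_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for test RMSE by resampling test points with replacement."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    n = y_true.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                      # resample indices
        stats[b] = np.sqrt(np.mean((y_true[idx] - y_pred[idx]) ** 2))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(stats.mean()), float(stats.std()), (float(lo), float(hi))

# Toy test set: predictions off by noise with sd ~3.
rng = np.random.default_rng(4)
y = rng.normal(45, 5, 500)
y_hat = y + rng.normal(0, 3, 500)
mean_rmse, sd, (ci_lo, ci_hi) = bootstrap_rmse_ci(y, y_hat)
```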
| Model | Test RMSE | 95% CI | p vs. Linear Regression | p vs. Random Forest |
|---|---|---|---|---|
| Linear Regression | 6.27 | [5.98, 6.55] | – | 3.08 × 10⁻¹⁶⁴ |
| Random Forest | 3.28 | [3.04, 3.53] | 3.08 × 10⁻¹⁶⁴ | – |
| Gradient Boosting | 4.51 | [4.25, 4.76] | 8.15 × 10⁻¹⁶³ | 1.05 × 10⁻¹²² |
| Model | Test RMSE | p vs. Gradient Boosting | p vs. XGBoost |
|---|---|---|---|
| XGBoost | 3.39 | 3.33 × 10⁻¹⁶⁵ | – |
| Neural Network | 4.74 | – | 3.33 × 10⁻¹⁶⁵ |
| Model | Train Time (s) | Test Inf. Time (s) | Time/Sample (s) |
|---|---|---|---|
| Linear Regression | 0.0149 | 0.0013 | 0.000000 |
| Random Forest | 14.3678 | 0.1775 | 0.000016 |
| Gradient Boosting | 8.1279 | 0.0191 | 0.000002 |
| XGBoost | 0.5425 | 0.0250 | 0.000002 |
| Neural Network | 678.2115 | 0.6862 | 0.000062 |
| Feature | Pearson (r) | Spearman (ρ) | Kendall (τ) |
|---|---|---|---|
| Frequency Reference | 0.952 | 0.963 | 0.874 |
| uvIndex (W/m2) | 0.740 | 0.758 | 0.594 |
| Humidity (%) | −0.571 | −0.583 | −0.431 |
| Air Temperature (°C) | 0.437 | 0.451 | 0.326 |
| Wind Speed (m/s) | 0.401 | 0.409 | 0.294 |
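The three coefficients in the table can be reproduced directly with pandas. The toy data below (hypothetical, strongly monotone) illustrate why Pearson, Spearman, and Kendall all rank a dominant feature similarly.

```python
import numpy as np
import pandas as pd

# Toy stand-in: a feature strongly and monotonically related to the target.
rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 200)
df = pd.DataFrame({
    "frequency_reference": x,
    "output_frequency": 30 + 20 * x + rng.normal(0, 0.5, 200),
})

# pandas computes all three correlation measures natively.
pearson = df.corr(method="pearson").loc["frequency_reference", "output_frequency"]
spearman = df.corr(method="spearman").loc["frequency_reference", "output_frequency"]
kendall = df.corr(method="kendall").loc["frequency_reference", "output_frequency"]
```

Kendall's τ is typically smaller in magnitude than Spearman's ρ for the same data, which matches the pattern in the table above.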
| Feature | VIF |
|---|---|
| Humidity (%) | 3.04 |
| Air Temperature (°C) | 2.62 |
| Frequency Reference | 2.16 |
| uvIndex (W/m2) | 2.08 |
| Wind Speed (m/s) | 1.23 |
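Each VIF in the table is 1/(1 − R²_j), where R²_j comes from regressing feature j on the remaining features. A dependency-free sketch on hypothetical data:

```python
import numpy as np

def vif(X):
    """VIF_j = 1/(1 - R_j^2), regressing column j on the other columns."""
    X = np.asarray(X, float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        yj = X[:, j]
        # Design matrix: intercept plus all columns except j.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, yj, rcond=None)
        resid = yj - others @ beta
        r2 = 1.0 - resid.var() / yj.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Third column is deliberately collinear with the first.
rng = np.random.default_rng(6)
a = rng.normal(size=300)
b = rng.normal(size=300)
X = np.column_stack([a, b, 0.7 * a + 0.3 * rng.normal(size=300)])
vifs = vif(X)
```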
| Metric | Feature Pair | Correlation | Exceeds 0.80? |
|---|---|---|---|
| Pearson | uvIndex ↔ Air Temperature | −0.780 | No |
| Spearman | uvIndex ↔ Air Temperature | −0.821 | Yes |
| Kendall | uvIndex ↔ Air Temperature | −0.613 | No |
Phase 2 (output power) results:

| Model | Train R2 | Train MSE | Train RMSE | Train MAE |
|---|---|---|---|---|
| Persistence Baseline | −0.01 | 95.00 | 9.75 | 9.20 |
| Linear Regression | 0.0003 | 89.76 | 9.47 | 8.97 |
| Random Forest | 0.9321 | 6.10 | 2.47 | 1.21 |
| Gradient Boosting | 0.1079 | 80.10 | 8.95 | 8.39 |
| XGBoost | 0.1291 | 78.20 | 8.84 | 8.03 |
| Neural Network | 0.0044 | 89.39 | 9.45 | 8.94 |
| Model | Val R2 | Val MSE | Val RMSE | Val MAE |
|---|---|---|---|---|
| Persistence Baseline | −0.01 | 95.00 | 9.75 | 9.20 |
| Linear Regression | 0.0001 | 89.92 | 9.48 | 8.98 |
| Random Forest | 0.8896 | 9.93 | 3.15 | 1.46 |
| Gradient Boosting | 0.1053 | 80.46 | 8.97 | 8.41 |
| XGBoost | 0.1242 | 78.76 | 8.87 | 8.05 |
| Neural Network | 0.0052 | 89.47 | 9.46 | 8.94 |
| Model | Test R2 | Test MSE |
|---|---|---|
| Persistence Baseline | 0.0000 | 88.50 |
| Linear Regression | 0.0007 | 88.50 |
| Random Forest | 0.9007 | 8.89 |
| Gradient Boosting | 0.1102 | 78.20 |
| XGBoost | 0.1378 | 77.10 |
| Neural Network | 0.0051 | 88.90 |
| Model | Test RMSE | 95% CI (Std Dev) |
|---|---|---|
| Persistence Baseline | 9.41 | [9.37, 9.45] (0.02) |
| Linear Regression | 9.46 | [9.42, 9.50] (0.021) |
| Random Forest | 2.98 | [2.87, 3.09] (0.054) |
| Gradient Boosting | 8.92 | [8.88, 8.97] (0.023) |
| XGBoost | 8.79 | [8.72, 8.85] (0.032) |
| Neural Network | 9.44 | [9.40, 9.48] (0.021) |
| Model | Test MAE | 95% CI (Std Dev) |
|---|---|---|
| Persistence Baseline | 8.91 | [8.86, 9.00] (0.028) |
| Linear Regression | 8.94 | [8.89, 9.00] (0.029) |
| Random Forest | 1.39 | [1.35, 1.44] (0.024) |
| Gradient Boosting | 8.36 | [8.31, 8.42] (0.029) |
| XGBoost | 7.97 | [7.90, 8.03] (0.034) |
| Neural Network | 8.92 | [8.86, 8.97] (0.029) |
| Model | Test RMSE | 95% CI | p vs. Linear Regression | p vs. Random Forest |
|---|---|---|---|---|
| Linear Regression | 9.46 | [9.42, 9.50] | – | 1.02 × 10⁻¹⁵⁰ |
| Random Forest | 2.98 | [2.87, 3.09] | 1.02 × 10⁻¹⁵⁰ | – |
| Gradient Boosting | 8.92 | [8.88, 8.97] | 5.44 × 10⁻¹⁴⁵ | 3.21 × 10⁻¹³⁰ |
| Model | Test RMSE | p vs. Gradient Boosting | p vs. XGBoost |
|---|---|---|---|
| XGBoost | 8.79 | 3.21 × 10⁻¹³⁰ | – |
| Neural Network | 9.44 | – | 2.12 × 10⁻¹²⁸ |
| Model | Train Time (s) | Test Inf. Time (s) | Time/Sample (s) |
|---|---|---|---|
| Linear Regression | 0.0520 | 0.0014 | 0.000000 |
| Random Forest | 27.2417 | 0.3486 | 0.000000 |
| Gradient Boosting | 5.2221 | 0.0132 | 0.000000 |
| XGBoost | 0.4387 | 0.0238 | 0.000000 |
| Neural Network | 828.2635 | 0.6922 | 0.000100 |
| Month | Volume (m3) |
|---|---|
| January 2024 | 9784.20 |
| February 2024 | 11,752.69 |
| March 2024 | 11,212.82 |
| April 2024 | 16,455.45 |
| May 2024 | 15,737.80 |
| June 2024 | 16,928.75 |
| July 2024 | 18,276.96 |
| August 2024 | 15,111.29 |
| September 2024 | 13,894.36 |
| October 2024 | 12,219.22 |
| November 2024 | 12,389.35 |
| December 2024 | 12,369.88 |
| Total (Annual) | 166,132.77 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Salah, Y.; Shalash, O.; Khatab, E.; Hamad, M.; Imam, S. AI-Driven Digital Twin for Optimizing Solar Submersible Pumping Systems. Inventions 2025, 10, 93. https://doi.org/10.3390/inventions10060093

