Contextual Reuse of Big Data Systems: A Case Study Assessing Groundwater Recharge Influences
Abstract
1. Introduction
2. Related Work and Background
Our Approach in a Nutshell
3. Materials and Methods
3.1. The Domain and the Analysis of Groundwater Recharge
3.2. Reusable Domain Assets Already Available: Factors Influencing the Groundwater Level in Different Irrigation Periods
4. Detecting Water Table Level Influences Through Reusing Domain Assets
4.1. Create Reuse Case
4.1.1. Creating the Cases: nIVR and IVR
4.1.2. Common Hypotheses for Both Contextual Cases (nIVR and IVR)
4.1.3. An Additional Hypothesis for the Second Case (IVR)
4.2. Instantiate the Case
4.2.1. Instantiate Source Variety
First Case: The Non-Irrigation Period in Villa Regina (nIVR)
- 1888_station_VillaRegina.csv: Contains a daily report of the flow variables of the Río Negro River. Specifically, it contains four variables: date, height, IGN elevation, and corrected IGN elevation (IGN refers to the National Geographic Institute, which provides the official elevation reference system).
- weather_station_VillaRegina.csv: Contains 33 variables measured at 10 min intervals. Examples of the climate variables include outside humidity, atmospheric pressure, solar radiation, rain rate, wind speed, and low temperature.
- piezometers_VillaRegina.csv: Contains a monthly report for each of the 77 piezometers located in the Villa Regina region. These data are semi-structured and contain four variables: a piezometer identifier, elevation, month, and groundwater level in masl (meters above sea level).
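
The three sources above are reported at different temporal resolutions (10 min, daily, and monthly), so a joint analysis needs them on a common time step. The following is a minimal sketch of one possible alignment, assuming that averaging readings per calendar month is acceptable; the source does not state which aggregation was actually used. The function name `monthly_mean` and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_mean(readings):
    """Average (timestamp, value) readings per calendar month.

    `readings` is an iterable of (datetime, float) pairs. The result maps
    (year, month) to the mean of the values observed in that month.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[(timestamp.year, timestamp.month)].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Hypothetical 10 min humidity readings (illustrative values, not from the dataset).
sample = [
    (datetime(2020, 6, 1, 0, 0), 80.0),
    (datetime(2020, 6, 1, 0, 10), 82.0),
    (datetime(2020, 7, 1, 0, 0), 70.0),
]
print(monthly_mean(sample))
```

The same helper could be applied to the daily river records, which would bring both series to the monthly resolution of the piezometer reports.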
Second Case: The Irrigation Period in Villa Regina (IVR)
4.3. Instantiate Content Variety
4.3.1. First Case: The Non-Irrigation Period in Villa Regina (nIVR)
- The most influential weather variables associated with fluctuations in water table levels were dew point, humidity, and rain rate. These influences were stronger in the piezometers located farther from the course of the Río Negro River.
- The river flow also showed strong influences, especially in piezometers located closest to the course of the Río Negro River.
- Models based on dew point, humidity, and rainfall were highly effective in forecasting water table levels.
- Models based on river flow were also very effective in forecasting water table levels.
- Models based on weather variables showed slightly better performance than those based on river flow.
4.3.2. Second Case: The Irrigation Period in Villa Regina (IVR)
- Weather variables and river flow showed only a very low influence on water table fluctuations.
- The previous results showed that flood irrigation minimized or eliminated the influence of any other analyzed variables.
4.4. Instantiate Process Variety
4.4.1. First Case: The Non-Irrigation Period in Villa Regina (nIVR)
4.4.2. Second Case: The Irrigation Period in Villa Regina (IVR)
5. Analyze Results
- Mean Absolute Error (MAE): Represents the average absolute difference between the actual and predicted values. Its values range from 0 to $\infty$, where lower MAE values indicate better model performance. The formula is as follows:

  $$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

  where $n$ is the number of observations, $y_i$ represents the actual value, and $\hat{y}_i$ represents the predicted value, both for the $i$-th observation. The unit is the same as that of the target variable (water table level), which in this study is reported in masl.
- Mean Squared Error (MSE): Represents the average of the squared differences between the actual and predicted values, and therefore penalizes large errors more severely. Its values range from 0 to $\infty$. As with MAE, values closer to 0 indicate better model performance. The formula is as follows:

  $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

  where $n$, $y_i$, and $\hat{y}_i$ represent the same quantities as in MAE. The unit is the square of the target variable's unit.
- Root Mean Squared Error (RMSE): Represents the square root of MSE. Its values range from 0 to $\infty$. The formula is as follows:

  $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

  where $n$, $y_i$, and $\hat{y}_i$ represent the same quantities as in MAE. Taking the square root returns the error to the original units of the target variable.
- Coefficient of determination ($R^2$): Refers to how well the model's predictions approximate the true values. It represents the proportion of the variance in the dependent variable (water table level) that is explained by the independent variable (flow rate of the Río Negro River). A value of 1 indicates a perfect fit. The formula is as follows:

  $$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

  where $\bar{y}$ is the mean of the observed values.
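
The four metrics described above can be computed directly from their standard definitions. The sketch below is illustrative only and is not taken from the authors' implementation; the function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, and R2 for two equally long sequences."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")

    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)      # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)

    mean_actual = sum(actual) / n
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)  # total sum of squares
    # R2 is undefined when all observed values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical water table levels, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```

MAE and RMSE share the unit of the target variable, while MSE is in squared units and R2 is dimensionless.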
5.1. First Case: The Non-Irrigation Period in Villa Regina (nIVR)
5.2. Second Case: The Irrigation Period in Villa Regina (IVR)
5.3. Comparing the Cases: nIVR Versus IVR
6. Discussion
6.1. Threats to Validity
6.1.1. From the Analytics
6.1.2. From the Process to the Problem
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| BDS | Big Data Systems |
| SPL | Software Product Line |
| OVM | Orthogonal Variability Model |
| CoVaMaT | Context-based Variety Management Tool |
| ML | Machine Learning |
| DC | Domain Case |
| RC | Reusable Case |
| VR | Villa Regina |
| IVR | Irrigation Villa Regina |
| nIVR | Non-Irrigation Villa Regina |
| LSTM | Long Short-Term Memory |
| ANN | Artificial Neural Network |
| SVM | Support Vector Machine |
| R2 | Coefficient of determination |
| RMSE | Root Mean Squared Error |
| MSE | Mean Squared Error |
| MAE | Mean Absolute Error |
| Hyperparameters | 1st Configuration | 2nd Configuration | 3rd Configuration |
|---|---|---|---|
| Number of LSTM layers | 1 | 2 | 3 |
| Number of units per layer | 32 | 128–64 | 256–128–64 |
| Batch size | 16 | 16 | 32 |
| Number of epochs | 200 | 200 | 200 |
| Learning rate | 0.0005 | 0.001 | 0.001 |
| Activation function (LSTM) | tanh | tanh | tanh |
| Dropout rate (LSTM) | 0.2 | 0.3 | 0.3 |
| Optimization | Adam | Adam | Adam |
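
The three LSTM configurations in the table above can be captured as plain data, which makes it easy to loop over them when training and comparing models. This is a minimal sketch and not the authors' code; the class name `LSTMConfig` and the dictionary `CONFIGS` are hypothetical, and only the values come from the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LSTMConfig:
    """One column of the hyperparameter table."""
    units_per_layer: Tuple[int, ...]  # one entry per stacked LSTM layer
    batch_size: int
    epochs: int
    learning_rate: float
    dropout_rate: float
    activation: str = "tanh"   # same in all three configurations
    optimizer: str = "Adam"    # same in all three configurations

    @property
    def num_layers(self) -> int:
        return len(self.units_per_layer)


CONFIGS = {
    "1st": LSTMConfig(units_per_layer=(32,), batch_size=16, epochs=200,
                      learning_rate=0.0005, dropout_rate=0.2),
    "2nd": LSTMConfig(units_per_layer=(128, 64), batch_size=16, epochs=200,
                      learning_rate=0.001, dropout_rate=0.3),
    "3rd": LSTMConfig(units_per_layer=(256, 128, 64), batch_size=32, epochs=200,
                      learning_rate=0.001, dropout_rate=0.3),
}

for name, cfg in CONFIGS.items():
    print(name, cfg.num_layers, cfg.units_per_layer)
```

Deriving the number of layers from `units_per_layer` keeps the two table rows consistent by construction.
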
| Piezometer ID | Prediction (masl) | Actual Value (masl) | Difference (masl) |
|---|---|---|---|
| 60040 | 191.22 | 191.27 | 0.05 |

| Model | MSE | MAE | R2 | RMSE | MSE Improvement (%) | MAE Improvement (%) | R2 Improvement (ΔR2) | RMSE Improvement (%) |
|---|---|---|---|---|---|---|---|---|
| 1st LSTM | 0.0385 | 0.1422 | 0.9636 | 0.1963 | 12.47 | 1.90 | 0.0046 | 6.48 |
| 2nd LSTM | 0.0344 | 0.1419 | 0.9675 | 0.1856 | 2.03 | 1.69 | 0.0007 | 1.08 |
| 3rd LSTM | 0.1045 | 0.2250 | 0.9013 | 0.3233 | 67.75 | 38.00 | 0.0669 | 43.21 |
| ANN | 0.0337 | 0.1395 | 0.9682 | 0.1836 | 0.00 | 0.00 | 0.0000 | 0.00 |
| SVM | 0.0349 | 0.1447 | 0.9670 | 0.1869 | 3.44 | 3.59 | 0.0012 | 1.76 |
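
The percentage columns in the table above appear to express how much the best model (the row with 0.00) improves on each of the other models. The sketch below reproduces the reported figures under that assumption; the formula is inferred from the table values and is not stated in this text, and the function names are hypothetical.

```python
def relative_improvement(model_error, best_error):
    """Percentage by which the best model reduces a given model's error."""
    return (model_error - best_error) / model_error * 100.0


def r2_gain(best_r2, model_r2):
    """Absolute difference in R2 between the best model and a given model."""
    return best_r2 - model_r2


# Values for the 1st LSTM (model) and the ANN (best), taken from the table.
print(round(relative_improvement(0.0385, 0.0337), 2))  # MSE column
print(round(relative_improvement(0.1422, 0.1395), 2))  # MAE column
print(round(r2_gain(0.9682, 0.9636), 4))               # ΔR2 column
```

Small discrepancies in the last decimal place are expected because the table reports rounded metric values.
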

| Model | MSE | MAE | R2 | RMSE | MSE Improvement (%) | MAE Improvement (%) | R2 Improvement (ΔR2) | RMSE Improvement (%) |
|---|---|---|---|---|---|---|---|---|
| LSTM | 0.6745 | 0.6374 | 0.8213 | 0.3404 | 89.55 | 68.00 | 0.1098 | 22.00 |
| ANN | 0.0705 | 0.2040 | 0.9311 | 0.2655 | 0.00 | 0.00 | 0.0000 | 0.00 |
| SVM | 0.0939 | 0.2596 | 0.9082 | 0.3064 | 24.93 | 21.42 | 0.0229 | 13.35 |

| Model | MSE | MAE | R2 | RMSE | MSE Improvement (%) | MAE Improvement (%) | R2 Improvement (ΔR2) | RMSE Improvement (%) |
|---|---|---|---|---|---|---|---|---|
| LSTM | 0.0329 | 0.1313 | 0.9687 | 0.1813 | 0.00 | 0.00 | 0.0000 | 0.00 |
| ANN | 0.0482 | 0.1715 | 0.9545 | 0.2195 | 31.74 | 23.41 | 0.0142 | 17.40 |
| SVM | 0.6414 | 0.6174 | 0.3944 | 0.8009 | 94.87 | 78.73 | 0.5743 | 77.36 |

| Model | MSE | MAE | R2 | RMSE |
|---|---|---|---|---|
| ANN for H1 | 0.0337 | 0.1395 | 0.9682 | 0.1836 |
| ANN for H2 | 0.0705 | 0.2040 | 0.9311 | 0.2655 |
| LSTM for H3 | 0.0329 | 0.1313 | 0.9687 | 0.1813 |

| Model | MSE | MAE | R2 | RMSE | MSE Improvement (%) | MAE Improvement (%) | R2 Improvement (ΔR2) | RMSE Improvement (%) |
|---|---|---|---|---|---|---|---|---|
| 2nd LSTM | 0.0301 | 0.1304 | 0.9702 | 0.1735 | 0.00 | 0.00 | 0.0000 | 0.00 |
| ANN | 0.0302 | 0.1314 | 0.9707 | 0.1737 | 0.33 | 0.76 | −0.0005 | 0.12 |
| SVM | 0.0325 | 0.1411 | 0.9685 | 0.1802 | 7.38 | 7.59 | 0.0017 | 3.72 |

| Model | MSE | MAE | R2 | RMSE | MSE Improvement (%) | MAE Improvement (%) | R2 Improvement (ΔR2) | RMSE Improvement (%) |
|---|---|---|---|---|---|---|---|---|
| LSTM | 0.0753 | 0.2201 | 0.9291 | 0.3404 | 41.04 | 22.94 | 0.0291 | 38.07 |
| ANN | 0.0444 | 0.1696 | 0.9582 | 0.2108 | 0.00 | 0.00 | 0.0000 | 0.00 |
| SVM | 0.0472 | 0.1723 | 0.9555 | 0.2174 | 5.93 | 1.57 | 0.0027 | 3.04 |

| Model | MSE | MAE | R2 | RMSE | MSE Improvement (%) | MAE Improvement (%) | R2 Improvement (ΔR2) | RMSE Improvement (%) |
|---|---|---|---|---|---|---|---|---|
| LSTM | 0.0413 | 0.1434 | 0.9611 | 0.2032 | 13.80 | −6.49 | 0.0054 | 7.18 |
| ANN | 0.0356 | 0.1527 | 0.9665 | 0.1886 | 0.00 | 0.00 | 0.0000 | 0.00 |
| SVM | 0.0373 | 0.1622 | 0.9649 | 0.1932 | 4.56 | 5.86 | 0.0016 | 2.38 |

| Model | MSE | MAE | R2 | RMSE |
|---|---|---|---|---|
| 2nd LSTM for H1 | 0.0301 | 0.1304 | 0.9702 | 0.1735 |
| ANN for H2 | 0.0444 | 0.1696 | 0.9582 | 0.2108 |
| ANN for H3 | 0.0356 | 0.1527 | 0.9665 | 0.1886 |

| Hypothesis | nIVR Model | nIVR MSE | nIVR MAE | nIVR R2 | nIVR RMSE | IVR Model | IVR MSE | IVR MAE | IVR R2 | IVR RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| H1 | ANN | 0.0337 | 0.1395 | 0.9682 | 0.1836 | LSTM | 0.0301 | 0.1304 | 0.9702 | 0.1735 |
| H2 | ANN | 0.0705 | 0.2040 | 0.9311 | 0.2655 | ANN | 0.0444 | 0.1696 | 0.9582 | 0.2108 |
| H3 | LSTM | 0.0329 | 0.1313 | 0.9687 | 0.1813 | ANN | 0.0356 | 0.1527 | 0.9665 | 0.1886 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Buccella, A.; Cechich, A.; Garrido, W.; Montenegro, A. Contextual Reuse of Big Data Systems: A Case Study Assessing Groundwater Recharge Influences. Appl. Sci. 2026, 16, 1650. https://doi.org/10.3390/app16031650