Search Results (11)

Search Parameters:
Keywords = infilling missing data

30 pages, 3046 KB  
Article
Geostatistically Enhanced Learning for Supervised Classification of Wall-Rock Alteration Using Assay Grades of Trace Elements and Sulfides
by Abhishek Borah, Parag Jyoti Dutta and Xavier Emery
Minerals 2025, 15(11), 1128; https://doi.org/10.3390/min15111128 - 29 Oct 2025
Viewed by 1438
Abstract
The spatial zoning of wall-rock alteration is a useful guide for exploration of porphyry deposits. The current techniques to typify and quantify alteration types have a component of subjectivity and may not reconcile with mineralogical observations. An alternative is to apply machine learning (ML) to classify alteration based on geochemical and mineralogical feature variables. However, classification loses accuracy because of natural and artificial short-scale variability and missing information, or because it ignores the spatial correlations of the feature variables. Here we show that these inconveniences can be overcome by replacing these variables with proxies obtained through geostatistical simulation. The use of such proxies improves the accuracy scores by eight percentage points by removing the noise affecting the feature variables and infilling their missing values. Furthermore, the uncertainty in the classification predictions can be quantified accurately. Our results demonstrate how geostatistics enriches ML to achieve higher predictive performance and handle incomplete and noisy data sets in a spatial setting. This synergy has far-reaching consequences for decision making in mining exploration, geological modeling, and geometallurgical planning. Beyond the presented pioneering application, we expect our approach to be used in supervised classification problems that arise in varied disciplines of natural sciences and engineering and involve regionalized data. Full article

17 pages, 1111 KB  
Article
NLP-Based Restoration of Damaged Student Essay Archives for Educational Preservation and Fair Reassessment
by Julius Olaniyan, Silas Formunyuy Verkijika and Ibidun C. Obagbuwa
Electronics 2025, 14(16), 3189; https://doi.org/10.3390/electronics14163189 - 11 Aug 2025
Viewed by 678
Abstract
The degradation of physical student examination archives, particularly handwritten essay booklets, presents a significant barrier to longitudinal academic research, institutional record preservation, and student performance analysis. This study introduces a novel natural language processing (NLP)-based framework for the automated reconstruction of damaged academic essay manuscripts using a span-infilling transformer architecture. A synthetic dataset comprising 5000 paired samples of damaged Text and full Text was curated from archived Data Science examination scripts collected at the Center for Applied Data Science, Sol Plaatje University, South Africa. The proposed method fine-tunes a T5-based encoder–decoder model, leveraging span corruption and task-specific prompting to restore missing or illegible segments. Comprehensive evaluation using ROUGE-L, BLEU-4, and BERTScore demonstrates substantial improvements over baseline models including BERT and GPT-2. Qualitative assessments by academic experts further validate the fluency, coherence, and contextual relevance of restored texts. Training dynamics reveal stable convergence without overfitting, while ablation studies confirm the contribution of each architectural component. Token-level error analyses and confidence-scored predictions provide additional interpretability. The proposed framework offers a scalable and effective solution for educational institutions seeking to digitize and recover lost historical student essay records, with potential extensions to other domains, such as digital humanities and archival restoration. Full article

17 pages, 5791 KB  
Article
On the Use of Reanalysis Data to Reconstruct Missing Observed Daily Temperatures in Europe over a Lengthy Period of Time
by Konstantinos V. Varotsos, George Katavoutas and Christos Giannakopoulos
Sustainability 2023, 15(9), 7081; https://doi.org/10.3390/su15097081 - 23 Apr 2023
Cited by 3 | Viewed by 2472
Abstract
In this study, a methodology that can reconstruct missing daily values of maximum and minimum temperatures over a long time period under the assumption of a sparse network of meteorological stations is described. To achieve this, a well-established software package used for quality control, homogenization and the infilling of missing climatological series data, Climatol, is used to combine a mosaic of data, including daily observations from 15 European stations and daily data from two high-resolution reanalysis datasets, ERA5-Land and MESCAN-SURFEX; the aim is to reconstruct daily values over the 2000–2018 period. By comparing frequently used indices, defined by the Expert Team on Climate Change Detection and Indices (ETCCDI) in studies of climate change assessment and goodness-of-fit measures, the reconstructed time series are evaluated against the observed ones. The analysis reveals that the ERA5-Land reconstructions outperform the MESCAN-SURFEX ones when compared to the observations in terms of biases, the various indices evaluated, and in terms of the goodness of fit for both the daily maximum and minimum temperatures. In addition, the magnitude and significance of the observed long-term temporal trends are maintained in the reconstructions at the majority of the stations examined, for both the daily maximum and daily minimum temperatures, which is an issue of the greatest relevance in many climatic studies. Full article
(This article belongs to the Special Issue Climate Change and Urban Thermal Effects)

24 pages, 6532 KB  
Article
Improving Groundwater Imputation through Iterative Refinement Using Spatial and Temporal Correlations from In Situ Data with Machine Learning
by Saul G. Ramirez, Gustavious Paul Williams, Norman L. Jones, Daniel P. Ames and Jani Radebaugh
Water 2023, 15(6), 1236; https://doi.org/10.3390/w15061236 - 22 Mar 2023
Cited by 7 | Viewed by 2927
Abstract
Obtaining and managing groundwater data is difficult as it is common for time series datasets representing groundwater levels at wells to have large gaps of missing data. To address this issue, many methods have been developed to infill or impute the missing data. We present a method for improving data imputation through an iterative refinement model (IRM) machine learning framework that works on any aquifer dataset where each well has a complete record that can be a mixture of measured and input values. This approach corrects the imputed values by using both in situ observations and imputed values from nearby wells. We relied on the idea that similar wells that experience a similar environment (e.g., climate and pumping patterns) exhibit similar changes in groundwater levels. Based on this idea, we revisited the data from every well in the aquifer and “re-imputed” the missing values (i.e., values that had been previously imputed) using both in situ and imputed data from similar, nearby wells. We repeated this process for a predetermined number of iterations, updating the well values synchronously. Using IRM in conjunction with satellite-based imputation provided better imputation and generated data that could provide valuable insight into aquifer behavior, even when limited or no data were available at individual wells. We applied our method to the Beryl-Enterprise aquifer in Utah, where many wells had large data gaps. We found patterns related to agricultural drawdown and long-term drying, as well as potential evidence for multiple previously unknown aquifers. Full article
(This article belongs to the Section Hydrogeology)

17 pages, 3830 KB  
Article
Deterministic and Stochastic Generation of Evaporation Data for Long-Term Mine Pit Lake Water Balance Modelling
by Kristian Mandaran, Neil McIntyre and David McJannet
Water 2022, 14(24), 4123; https://doi.org/10.3390/w14244123 - 17 Dec 2022
Cited by 1 | Viewed by 2898
Abstract
Lakes commonly form in mine pits following the end of mining. A good understanding of the pit lake water balance over future decades to centuries is essential to understand and manage environmental risks from the lake. Evaporation is often the major or only outflow from the lake, thus being an important determinant of equilibrium lake level and environmental risks. A general lack of in situ measurements of pit lake evaporation has meant that estimates have usually been based on pan coefficients derived for other contexts or on alternative unvalidated evaporation models. Our research used data from an evaporation pan and weather station that were floated on a pit lake in semi-arid central Queensland, Australia. A deterministic aerodynamic evaporation model was developed from these data to infill missing values, and an adjusted aerodynamic model was used to reconstruct long-term historical daily evaporation data. With an average bias of 6.5% during the measurement period, this long-term model was found to be more accurate than alternative simple models (e.g., using the commonly used pan coefficient of 0.7 gave a bias of 45%). The reconstructed data were then used to fit and assess a stochastic model for the generation of future evaporation and rainfall realisations, assuming a stationary climate. Fitting stochastic models at a monthly time step was found to accurately represent the monthly evaporation statistics. For example, the cross-correlation between historical rainfall and evaporation was within the 25 and 75 percentiles of the modelled values in 11 of 12 months and always within the 2.5 and 97.5 percentiles. However, the stationary nature of the model presented limitations in capturing interannual anomalies, with continuous periods of up to 6 years, where the modelled annual rainfall was consistently lower and modelled annual evaporation consistently higher than the historical values. 
Fitting stochastic models at a daily time step had problems capturing a range of statistics of both rainfall and evaporation. For example, in 6 of the 12 months, the cross-correlation between historical rainfall and evaporation was outside the modelled 2.5 and 97.5 percentiles. This likely arises from the complex patterns in transitions from wet to dry days in the semi-arid climate of the case study. While the long-term model and monthly stochastic model are promising, further work is needed to understand the significance of the observed errors and refine the models. Full article
(This article belongs to the Section Hydrology)

15 pages, 2170 KB  
Technical Note
Parsimonious Gap-Filling Models for Sub-Daily Actual Evapotranspiration Observations from Eddy-Covariance Systems
by Danlu Guo, Arash Parehkar, Dongryeol Ryu, Quan J. Wang and Andrew W. Western
Remote Sens. 2022, 14(5), 1286; https://doi.org/10.3390/rs14051286 - 5 Mar 2022
Cited by 1 | Viewed by 3538
Abstract
Missing data and low data quality are common issues in field observations of actual evapotranspiration (ETa) from eddy-covariance systems, which necessitates gap-filling techniques to improve data quality and utility for further analyses. A number of models have been proposed to fill temporal gaps in ETa or latent heat flux observations. However, existing gap-filling approaches often use multi-variate models that rely on relationships between ETa and other meteorological and flux variables, highlighting a critical lack of parsimonious gap-filling models. This study aims to develop and evaluate parsimonious approaches to fill gaps in ETa observations. We adapted three gap-filling models previously used for other meteorological variables but never applied to infill sub-daily ETa or flux observations from eddy-covariance systems before. All three models are solely based on the observed diurnal patterns in the ETa data, which infill gaps in sub-daily data with sinusoidal functions (Sinusoidal), smoothing functions (Smoothing) and pattern matching (MaxCor) approaches, respectively. We presented a systematic approach for model evaluation, considering multiple patterns of data gaps during different times of the day. The three gap-filling models were evaluated together with another benchmarking gap-filling model, mean diurnal variation (MDV), which has been commonly used and has similar data requirements. We used a case study with field measurements from an EC system over summer 2020–2021, at a maize field in southeastern Australia. We identified the MaxCor model as the best gap-filling model, which informs the diurnal pattern of the day to infill by using another day with similar temporal patterns and complete data. Following the MaxCor model, the MDV and the Sinusoidal models show comparable performances. We further discussed the infilling models in terms of their dependence on data availability and their suitability for different practical situations.
The MaxCor model relies on high data availability for both days with complete data and the available records within each day to infill. The Sinusoidal model does not rely on any day with complete data, which makes it the ideal choice in situations where days with complete records are limited. Full article
(This article belongs to the Special Issue Accuracy and Quality Control of Remote Sensing Data)
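The mean diurnal variation (MDV) benchmark named in this abstract can be sketched as follows: a missing sub-daily value is replaced by the average of the valid observations made at the same time of day on neighbouring days. This is only a minimal illustration under stated assumptions, not the authors' implementation; the function name `mdv_fill`, the `window_days` parameter, and its default are hypothetical.

```python
import numpy as np

def mdv_fill(values, steps_per_day, window_days=3):
    """Fill NaN gaps with the mean of same-time-of-day values on nearby days.

    values        : 1-D sequence of sub-daily observations (NaN marks a gap)
    steps_per_day : number of observations per day (e.g. 48 for half-hourly)
    window_days   : hypothetical half-width, in days, of the averaging window
    """
    x = np.asarray(values, dtype=float)
    filled = x.copy()
    n = len(x)
    for i in np.flatnonzero(np.isnan(x)):
        # Indices of the same time step on the neighbouring days.
        idx = [i + d * steps_per_day
               for d in range(-window_days, window_days + 1) if d != 0]
        idx = np.array([j for j in idx if 0 <= j < n], dtype=int)
        neighbours = x[idx]
        neighbours = neighbours[~np.isnan(neighbours)]
        # Leave the gap open when no neighbouring day has a valid value.
        if neighbours.size:
            filled[i] = neighbours.mean()
    return filled
```

Only observed values are averaged (the original array `x` is read, never the partly filled copy), so a filled gap does not influence the filling of another gap.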

26 pages, 1520 KB  
Article
Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation
by Thelma Dede Baddoo, Zhijia Li, Samuel Nii Odai, Kenneth Rodolphe Chabi Boni, Isaac Kwesi Nooni and Samuel Ato Andam-Akorful
Int. J. Environ. Res. Public Health 2021, 18(16), 8375; https://doi.org/10.3390/ijerph18168375 - 7 Aug 2021
Cited by 21 | Viewed by 4731
Abstract
Reconstructing missing streamflow data can be challenging when additional data are not available, and studies that investigate how to ascertain the accuracy of imputation algorithms on real-world datasets are lacking. This study investigated the necessary complexity of missing data reconstruction schemes to obtain the relevant results for a real-world single station streamflow observation to facilitate its further use. This investigation was implemented by applying different missing data mechanisms spanning from univariate algorithms to multiple imputation methods accustomed to multivariate data taking time as an explicit variable. The performance accuracy of these schemes was assessed using the total error measurement (TEM) and a recommended localized error measurement (LEM) in this study. The results show that univariate missing value algorithms, which are specially developed to handle univariate time series, provide satisfactory results, but the ones which provide the best results are usually time and computationally intensive. Also, multiple imputation algorithms which consider the surrounding observed values and/or which can understand the characteristics of the data provide similar results to the univariate missing data algorithms and, in some cases, perform better without the added time and computational downsides when time is taken as an explicit variable. Furthermore, the LEM would be especially useful when the missing data are in specific portions of the dataset or where very large gaps of ‘missingness’ occur. Finally, proper handling of missing values of real-world hydroclimatic datasets depends on imputing and extensive study of the particular dataset to be imputed. Full article

15 pages, 1104 KB  
Article
Copula-Based Infilling Methods for Daily Suspended Sediment Loads
by Jenq-Tzong Shiau and Yu-Cheng Lien
Water 2021, 13(12), 1701; https://doi.org/10.3390/w13121701 - 19 Jun 2021
Cited by 11 | Viewed by 3221
Abstract
Less-frequent and inadequate sampling of sediment data has negatively impacted the long and continuous records required for the design and operation of hydraulic facilities. This data-scarcity problem is often found in most river basins of Taiwan. This study aims to propose a parsimonious probabilistic model based on copulas to infill daily suspended sediment loads using streamflow discharge. A copula-based bivariate distribution model of sediment and discharge of the paired recorded data is constructed first. The conditional distribution of sediment load given observed discharge is used to provide probabilistic estimation of sediment loads. In addition, four different methods based on the derived conditional distribution of sediment load are used to give single-value estimations. The obtained outcomes of these methods associated with the results of the traditional sediment rating curve are compared with recorded data and evaluated in terms of root mean square error (RMSE), mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE), and modified Nash-Sutcliffe efficiency (MNSE). The proposed approach is applied to the Janshou station located in eastern Taiwan with recorded daily data for the period of 1960–2019. The results indicate that the infilled sediments by the sediment rating curve exhibit better performance in RMSE and NSE, while the copula-based methods outperform in MAPE and MNSE. Additionally, the infilled sediments by the copula-based methods preserve scattered characteristics of observed sediment-discharge relationships and exhibit similar frequency distributions to that of recorded sediment data. Full article
(This article belongs to the Section Hydrology)

34 pages, 14401 KB  
Article
Cross Assessment of Twenty-One Different Methods for Missing Precipitation Data Estimation
by Asaad M. Armanuos, Nadhir Al-Ansari and Zaher Mundher Yaseen
Atmosphere 2020, 11(4), 389; https://doi.org/10.3390/atmos11040389 - 15 Apr 2020
Cited by 31 | Viewed by 6251
Abstract
The results of meteorological, hydrological, and environmental data analyses are mainly dependent on the reliable estimation of missing data. In this study, 21 classical methods were evaluated to determine the best method for infilling the missing precipitation data in Ethiopia. The monthly data collected from 15 different stations over 34 years from 1980 to 2013 were considered. Homogeneity and trend tests were performed to check the data. The results of the different methods were compared using the mean absolute error (MAE), root-mean-square error (RMSE), coefficient of efficiency (CE), similarity index (S-index), skill score (SS), and Pearson correlation coefficient (rPearson). The results of this paper confirmed that the normal ratio (NR), multiple linear regression (MLR), inverse distance weighting (IDW), correlation coefficient weighting (CCW), and arithmetic average (AA) methods are the most reliable methods of those studied. The NR method provides the most accurate estimations with rPearson of 0.945, mean absolute error of 22.90 mm, RMSE of 33.695 mm, similarity index of 0.999, CE index of 0.998, and skill score of 0.998. When comparing the observed results and the estimated results from the NR, MLR, IDW, CCW, and AA methods, the MAE and RMSE were found to be low, and high values of CE, S-index, SS, and rPearson were achieved. On the other hand, using the closest station (CS), UK traditional, linear regression (LR), expectation maximization (EM), and multiple imputations (MI) methods gave the lowest accuracy, with MAE and RMSE values varying from 30.424 to 47.641 mm and from 49.564 to 58.765 mm, respectively. The results of this study suggest that the recommended methods are applicable for different types of climatic data in Ethiopia and arid regions in other countries around the world. Full article
(This article belongs to the Section Meteorology)
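Two of the methods named in this abstract, the normal ratio (NR) and inverse distance weighting (IDW), have simple textbook forms that can be sketched as below. The exact variants evaluated in the article may differ; the function names, argument names, and the default exponent of 2 are assumptions made for illustration.

```python
def normal_ratio(target_normal, neighbour_normals, neighbour_values):
    """Textbook normal ratio estimate of a missing precipitation value.

    Each neighbour's value is scaled by the ratio of the target station's
    long-term normal to that neighbour's normal, then the results are averaged.
    """
    scaled = [target_normal / normal * value
              for normal, value in zip(neighbour_normals, neighbour_values)]
    return sum(scaled) / len(scaled)


def inverse_distance_weighting(distances, neighbour_values, power=2.0):
    """Textbook IDW estimate: closer neighbours receive larger weights.

    The exponent `power` is an assumed default, not a value from the article.
    """
    weights = [1.0 / d ** power for d in distances]
    weighted = sum(w * v for w, v in zip(weights, neighbour_values))
    return weighted / sum(weights)
```

For instance, with a target normal of 100 mm and two neighbours whose normals are 50 mm and 200 mm and whose values for the missing month are 10 mm and 40 mm, the normal ratio estimate is (2 × 10 + 0.5 × 40) / 2 = 20 mm.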

22 pages, 1337 KB  
Article
Infilling Missing Data in Hydrology: Solutions Using Satellite Radar Altimetry and Multiple Imputation for Data-Sparse Regions
by Iguniwari Thomas Ekeu-wei, George Alan Blackburn and Philip Pedruco
Water 2018, 10(10), 1483; https://doi.org/10.3390/w10101483 - 20 Oct 2018
Cited by 24 | Viewed by 6643
Abstract
In developing regions missing data are prevalent in historical hydrological datasets, owing to financial, institutional, operational and technical challenges. If not tackled, these data shortfalls result in uncertainty in flood frequency estimates and consequently flawed catchment management interventions that could exacerbate the impacts of floods. This study presents a comparative analysis of two approaches for infilling missing data in historical annual peak river discharge timeseries required for flood frequency estimation: (i) satellite radar altimetry (RA) and (ii) multiple imputation (MI). These techniques were applied at five gauging stations along the floodprone Niger and Benue rivers within the Niger River Basin. RA and MI enabled the infilling of missing data for conditions where altimetry virtual stations were available and unavailable, respectively. The impact of these approaches on derived flood estimates was assessed, and the return period of a previously unquantified devastating flood event in Nigeria in 2012 was ascertained. This study revealed that the use of RA resulted in reduced uncertainty when compared to MI for data infilling, especially for widely gapped timeseries (>3 years). The two techniques did not differ significantly for data sets with gaps of 1–3 years, hence, both RA and MI can be used interchangeably in such situations. The use of the original in situ data with gaps resulted in higher flood estimates when compared to datasets infilled using RA and MI, and this can be attributed to extrapolation uncertainty. The 2012 flood in Nigeria was quantified as a 1-in-100-year event at the Umaisha gauging station on the Benue River and a 1-in-50-year event at Baro on the Niger River. This suggests that the higher levels of flooding likely emanated from the Kiri and Lagdo dams in Nigeria and Cameroon, respectively, as previously speculated by the media and recent studies. 
This study demonstrates the potential of RA and MI for providing information to support flood management in developing regions where in situ data is sparse. Full article
(This article belongs to the Special Issue Hydrologic Modelling for Water Resources and River Basin Management)

36 pages, 5628 KB  
Case Report
Infilling Monthly Rain Gauge Data Gaps with Satellite Estimates for ASAL of Kenya
by William Githungo, Silvery Otengi, Jacob Wakhungu and Edward Masibayi
Hydrology 2016, 3(4), 40; https://doi.org/10.3390/hydrology3040040 - 22 Nov 2016
Cited by 18 | Viewed by 8440
Abstract
Design and operation of water resources management systems in sub-Saharan Africa suffer from inadequate observation data. Long running uninterrupted time series of data are often not available for water resource planning. Incomplete datasets with gaps are a challenge for users of the data. Inadequate data compromise results of analyses leading to wrong inference and conclusions of scientific assessments and research. Infilling of missing sections of data is necessary prior to the practical use of hydrometeorological time series. This paper proposes the use of Tropical Rainfall Measuring Mission satellite data as a viable alternate source of infill for missing rain gauge records. The least squares regression method, using satellite-based estimates of rainfall, was tested to fill in the missing data for 153 data points at nine rain gauge stations in Machakos, Makueni and the Kitui region of Kenya. Results suggest that the satellite rainfall estimates can be used as an alternative data source for rainfall series where the missing data gaps are large. The infilled data series were used in the development of monitoring, forecasting and drought early warning for Arid and Semi-Arid Lands (ASAL) in Kenya. Full article
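The regression-based infilling described in this abstract can be illustrated with a short sketch: fit a straight line of gauge rainfall against the satellite estimate over the months where both exist, then predict the gauge value where it is missing. This is an assumed, simplified form rather than the authors' procedure; the function name `regression_infill` and the clipping of negative predictions to zero are illustrative choices that do not come from the article.

```python
import numpy as np

def regression_infill(gauge, satellite):
    """Infill NaN gauge values from a least squares fit on satellite estimates.

    gauge     : monthly gauge rainfall, NaN where the record is missing
    satellite : satellite rainfall estimates for the same months (no gaps)
    """
    gauge = np.asarray(gauge, dtype=float)
    satellite = np.asarray(satellite, dtype=float)
    observed = ~np.isnan(gauge)
    # Ordinary least squares line: gauge ~ slope * satellite + intercept.
    slope, intercept = np.polyfit(satellite[observed], gauge[observed], 1)
    predicted = slope * satellite + intercept
    # Assumption: rainfall cannot be negative, so clip predictions at zero.
    predicted = np.maximum(predicted, 0.0)
    # Keep every observed value; use predictions only inside the gaps.
    return np.where(observed, gauge, predicted)
```

Observed gauge values are never overwritten, so the infilled series differs from the original record only inside the gaps.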