Search Results (483)

Search Parameters:
Keywords = ridge estimation

19 pages, 2664 KB  
Article
Machine Learning-Based Prediction of Multi-Year Cumulative Atmospheric Corrosion Loss in Low-Alloy Steels with SHAP Analysis
by Saurabh Tiwari, Seong Jun Heo and Nokeun Park
Coatings 2026, 16(4), 488; https://doi.org/10.3390/coatings16040488 - 17 Apr 2026
Viewed by 83
Abstract
Atmospheric corrosion of carbon and low-alloy steels causes direct economic losses estimated at around 3.4% of global GDP, and its accurate multi-year prediction is essential for protective coating selection, service-life estimation, and infrastructure maintenance scheduling. In this study, machine learning (ML) algorithms, including gradient boosting regressor (GBR), eXtreme gradient boosting (XGBoost), random forest (RF), support vector regression (SVR), and ridge regression, were trained on a 600-sample physics-grounded dataset to predict the cumulative atmospheric corrosion loss (µm) of low-alloy steels over 1–10 years of exposure. The dataset was constructed using the exact ISO 9223:2012 dose–response function (DRF) for the first-year corrosion rate and the ISO 9224:2012 power-law multi-year kinetic model (C(t) = C₁·t^0.5), spanning ISO 9223 corrosivity categories C2–CX across 11 environmental and material input features. All models were evaluated on the original (untransformed) corrosion scale under an 80/20 train/test split and five-fold cross-validation. Gradient boosting achieved the best overall performance, with test-set R² = 0.968, CV-R² = 0.969, RMSE = 10.58 µm, MAE = 5.99 µm, and MAPE = 12.6%. XGBoost was a close second (R² = 0.958, CV-R² = 0.960), and RF achieved an R² of 0.944. SHAP (SHapley Additive exPlanations) analysis identified SO₂ deposition rate, exposure time, relative humidity, Cl deposition rate, and temperature as the five most influential predictors. The dominance of the SO₂ deposition rate (mean |SHAP| = 26.37 µm) and the high second-place ranking of exposure time (13.67 µm) are fully consistent with the ISO 9223:2012 dose–response function and the ISO 9224:2012 power-law kinetics, respectively, while among the material features, Cu and Cr contents showed the strongest negative SHAP contributions, confirming their corrosion-inhibiting roles in weathering steels. These results establish a physics-consistent, interpretable ML benchmark exceeding R² = 0.90 for multi-year cumulative corrosion loss prediction and provide a quantitative tool for alloy screening, coating selection in aggressive atmospheric environments, and service-life planning.
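The power-law kinetics quoted in this abstract are simple enough to sketch directly. The following minimal NumPy snippet is an illustration of the ISO 9224-style model C(t) = C₁·t^b with b = 0.5, not the authors' code; the function name and example values are ours:

```python
import numpy as np

def cumulative_corrosion(c1, t, b=0.5):
    """ISO 9224-style power-law kinetics: C(t) = C1 * t**b.

    c1 : first-year corrosion loss (e.g. in µm)
    t  : exposure time in years (scalar or array)
    b  : kinetic exponent; 0.5 is the value quoted in the abstract above
    """
    t = np.asarray(t, dtype=float)
    return c1 * t ** b

# Example: a hypothetical first-year loss of 25 µm over 1-10 years of exposure
years = np.arange(1, 11)
loss = cumulative_corrosion(25.0, years)
```

With b < 1 the per-year increment decays over time, which is why exposure time ranks high but below the first-year driver (SO₂ deposition) in the SHAP analysis.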
20 pages, 2493 KB  
Article
Non-Destructive Determination of Moisture Content in White Tea During Withering Using VNIR Spectroscopy and Ensemble Modeling
by Qinghai He, Hongkai Shen, Zhiyuan Liu, Benxue Ma, Yong He, Zhi Lin, Weihong Liu, Pei Wang, Xiaoli Li and Peng Qi
Horticulturae 2026, 12(4), 488; https://doi.org/10.3390/horticulturae12040488 - 16 Apr 2026
Viewed by 201
Abstract
As one of the six major traditional tea types in China, white tea owes its quality formation primarily to the withering process. However, traditional methods for monitoring withering fail to achieve precise and stable control of moisture content. To address this issue, a total of 650 samples were collected at 13 withering time points (0–36 h), and the dataset was split into training and test sets at a 7:3 ratio. This study proposes a PRXBoost ensemble model for quantitative detection of withered white tea that integrates data augmentation and intelligent algorithms. The ensemble uses a Bagging-based weighted integration technique to combine Partial Least Squares Regression (PLSR), Ridge, and Extreme Gradient Boosting (XGBoost) models, and the decision-making process within the PRXBoost model is analyzed in depth. First, the effectiveness of the data augmentation strategy and the superiority of the gradient descent algorithm are verified through pre-modeling based on the PLSR model and hyperparameter pre-search using the XGBoost model, respectively. Additionally, a Bayesian algorithm is employed to optimize the weights of the sub-models, further enhancing overall predictive performance. The results show that the PRXBoost model achieved the best performance among the compared models on the test set, with R² = 0.854 and RMSE = 0.080, exceeding the highest R² of any single model by 6%. These results indicate that PRXBoost provided improved predictive performance for moisture estimation within the current dataset. Finally, the SHapley Additive exPlanations (SHAP) algorithm is used to analyze the influence of each input feature on the prediction results, identifying the 1916 nm and 1453 nm spectral bands as significant influencers of the prediction outcomes. These results suggest that the proposed model can support rapid, non-destructive monitoring of moisture evolution and provide actionable information for withering endpoint control.
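The Bagging-based weighted integration described above amounts to a convex combination of sub-model predictions. A hypothetical sketch (function name and toy numbers are ours; the paper's PRXBoost implementation is not given in this listing):

```python
import numpy as np

def weighted_ensemble(predictions, weights):
    """Combine sub-model predictions as a convex weighted average.

    predictions : (n_models, n_samples) array of per-model predictions
    weights     : (n_models,) nonnegative weights; normalized to sum to 1
    """
    p = np.asarray(predictions, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()          # normalize so the combination is convex
    return w @ p             # weighted average per sample

# Toy example: three sub-models (stand-ins for PLSR, Ridge, and XGBoost)
preds = np.array([[0.10, 0.20],
                  [0.12, 0.18],
                  [0.20, 0.30]])
combined = weighted_ensemble(preds, [1.0, 1.0, 2.0])
```

In the paper the weights themselves are tuned (via a Bayesian algorithm); the sketch takes them as given.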
23 pages, 1180 KB  
Article
Carbon Emission Prediction Model for Railway Passenger Stations on the Qinghai–Tibet Plateau
by Guanguan Jia and Qingqin Wang
Sustainability 2026, 18(8), 3881; https://doi.org/10.3390/su18083881 - 14 Apr 2026
Viewed by 275
Abstract
Controlling operation-stage carbon emissions (CE) from transport buildings is crucial for China’s dual-carbon goals, the ecological security of the Qinghai–Tibet Plateau (QTP), and the sustainable development of plateau transport infrastructure. For plateau railway passenger stations (RPS), limited monitoring and distinctive high-altitude, cold-climate operating conditions make daily CE prediction difficult with conventional measurement- or simulation-based methods. This study develops a machine-learning approach based on a Monte Carlo synthetic database and derives engineering-standard formulas for direct use. Building scale, meteorology, and passenger flow volume (PFV) were compiled for 12 representative RPS, and a large synthetic database of daily carbon emissions was generated under multiple distribution constraints. With daily mean temperature, heating degree days, altitude, station floor area, and PFV as inputs, four models were trained and assessed using mean absolute error, root mean square error, mean absolute percentage error (MAPE), and R². The results show that random forest (RF) performed best, achieving ~6% MAPE and R² > 0.99 on the test set, with markedly lower errors than multivariable linear regression. Interpretation of the RF model via feature importance and partial dependence shows that floor area, altitude, and PFV dominate emissions and exhibit nonlinear response patterns. To improve transparency and transferability, ridge regression was used to fit a linear surrogate to the RF predictions, producing engineering-standard formulas for daily and annual operation-stage CE. The formulas retain most of the predictive accuracy while requiring only readily obtainable variables, enabling rapid estimation and scenario analysis for cold, high-altitude RPS. The proposed workflow provides a replicable pathway for operational CE assessment in data-scarce regions and supports low-carbon planning, design, and operation of RPS on the QTP, thereby contributing to more sustainable infrastructure development in high-altitude regions.
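The ridge-surrogate idea in this abstract (fitting a closed-form linear model to a black-box model's predictions so that transparent formulas can be read off the coefficients) can be sketched as follows; the black box here is a stand-in nonlinear function, not the paper's random forest, and all names and values are ours:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge solution: beta = (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Stand-in for black-box (e.g. random-forest) predictions on the same inputs
black_box = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * np.tanh(X[:, 2])
beta = ridge_fit(X, black_box, alpha=0.1)   # linear-surrogate coefficients
surrogate = X @ beta                        # "engineering formula" predictions
```

The surrogate coefficients are the weights one would publish as an engineering-standard formula; accuracy is traded for interpretability wherever the black box is strongly nonlinear.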
(This article belongs to the Section Green Building)
17 pages, 834 KB  
Article
Improved Data-Driven Shrinkage Estimators for Regression Models Under Severe Multicollinearity
by Ali Rashash R. Alzahrani and Asma Ahmad Alzahrani
Mathematics 2026, 14(8), 1245; https://doi.org/10.3390/math14081245 - 9 Apr 2026
Viewed by 222
Abstract
Multicollinearity is a critical issue in regression analysis, often resulting in inflated variances and unstable parameter estimates. Ridge regression is a widely adopted solution to this challenge; however, existing ridge estimators are typically tailored to specific scenarios, limiting their universal applicability. Akhtar and Alharthi developed condition-adjusted ridge estimators (CAREs) to handle severe multicollinearity; however, their approach did not account for the error variances in the estimation process. In this study, we propose improvements to these CAREs by incorporating error variances, resulting in multiscale ridge estimators (MSRE1, MSRE2, MSRE3, and MSRE4) that more effectively address the challenges posed by severe multicollinearity. We compare the performance of the newly proposed estimators with ordinary least squares (OLS) and other existing ridge estimators using both simulation studies and real-life datasets. The evaluation, based on estimated mean squared error (MSE), demonstrates that the proposed estimators consistently outperform existing methods, particularly in scenarios with severe multicollinearity, larger sample sizes, and higher predictor dimensions. Results from three real-life datasets further validate the proposed estimators’ ability to reduce estimation error and improve predictive accuracy across diverse practical applications.
(This article belongs to the Special Issue Statistical Machine Learning: Models and Its Applications)
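For readers unfamiliar with data-driven ridge parameters, the classical Hoerl–Kennard estimator k = p·σ̂²/‖β̂_OLS‖² is the usual baseline that families such as the CAREs and MSREs refine. A minimal sketch of that textbook formula (not the paper's proposed estimators; the synthetic data are ours):

```python
import numpy as np

def hoerl_kennard_k(X, y):
    """Classical Hoerl-Kennard ridge parameter: k = p * sigma2 / ||beta_ols||^2."""
    n, p = X.shape
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    sigma2 = resid @ resid / (n - p)          # unbiased error-variance estimate
    return p * sigma2 / (beta_ols @ beta_ols)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=100)   # induce near-collinearity
y = X @ np.array([1.0, 0.5, -0.5, 1.0]) + rng.normal(size=100)
k = hoerl_kennard_k(X, y)
```

Note that, unlike the CAREs criticized in the abstract for ignoring error variances, this baseline already uses σ̂² but not the condition number of X'X.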
27 pages, 23751 KB  
Article
A Mathematical Framework for Retinal Vessel Segmentation: Fractional Hessian-Based Curvature Analysis
by Priyanka Harjule, Mukesh Delu, Rajesh Kumar and Pilani Nkomozepi
Fractal Fract. 2026, 10(4), 246; https://doi.org/10.3390/fractalfract10040246 - 8 Apr 2026
Viewed by 232
Abstract
This study proposes an improved retinal blood vessel segmentation method to enhance the diagnosis of microvascular retinal complications. The proposed method extracts local shape features from retinal images using a fractional Hessian matrix, which models blood vessels as surface structures characterized by ridges and valleys arising from variations in curvature. The methodology integrates adaptive principal curvature estimation with a new framework leveraging the fractional Hessian matrix with nonsingular and nonlocal kernels. The effectiveness of the proposed method is assessed using publicly accessible datasets, including DRIVE, HRF, and STARE, as well as real images obtained from a local hospital. The proposed segmentation achieves 96.77% accuracy and 98.82% specificity on the DRIVE database, 96.91% accuracy and 98.69% specificity on STARE, and 95.90% accuracy and 98.36% specificity on the HRF database. Optimal values for the fractional order and the Gaussian standard deviation were determined empirically by maximizing segmentation accuracy. Our findings show that the proposed approach achieves competitive performance compared to the listed methods, including several deep learning approaches, while maintaining significant computational efficiency. The output of the proposed method can be further utilized with deep learning techniques in the clinical context of diabetic retinopathy and glaucoma to identify abnormalities likely related to disease progression and stage.
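The curvature analysis described above builds on Hessian eigenvalues; for intuition, a classical integer-order version (not the paper's fractional operator; the toy image and names are ours) can be computed per pixel with finite differences:

```python
import numpy as np

def hessian_eigenvalues(img):
    """Per-pixel eigenvalues of the classical 2x2 image Hessian.

    Uses np.gradient twice; returns (lam1, lam2) ordered so that
    |lam1| <= |lam2|, i.e. lam2 captures the dominant (ridge) curvature.
    """
    gy, gx = np.gradient(img)
    gyy, _ = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    # Closed-form eigenvalues of the symmetric matrix [[gxx, gxy], [gxy, gyy]]
    half_tr = (gxx + gyy) / 2.0
    disc = np.sqrt(((gxx - gyy) / 2.0) ** 2 + gxy ** 2)
    lam_a, lam_b = half_tr + disc, half_tr - disc
    swap = np.abs(lam_a) > np.abs(lam_b)
    lam1 = np.where(swap, lam_b, lam_a)
    lam2 = np.where(swap, lam_a, lam_b)
    return lam1, lam2

# A dark horizontal line on a bright background behaves like a vessel ridge:
# the dominant eigenvalue is large and positive across the line.
img = np.ones((9, 9))
img[4, :] = 0.0
lam1, lam2 = hessian_eigenvalues(img)
```

Vesselness filters (Frangi-style) then score each pixel from the ratio and magnitude of these two eigenvalues; the paper replaces the integer-order derivatives with fractional ones.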
15 pages, 1474 KB  
Article
Prognostic Power of Ensemble Learning in Colorectal Cancer with Peritoneal Metastasis: A Multi-Institutional Analysis
by Yoshiko Bamba, Michio Itabashi, Hirotoshi Kobayashi, Kenjiro Kotake, Masayasu Kawasaki, Yukihide Kanemitsu, Yusuke Kinugasa, Hideki Ueno, Kotaro Maeda, Takeshi Suto, Kimihiko Funahashi, Heita Ozawa, Fumikazu Koyama, Shingo Noura, Hideyuki Ishida, Masayuki Ohue, Tomomichi Kiyomatsu, Soichiro Ishihara, Keiji Koda, Hideo Baba, Kenji Kawada, Yojiro Hashiguchi, Takanori Goi, Yuji Toiyama, Naohiro Tomita, Eiji Sunami, Yoshito Akagi, Jun Watanabe, Kenichi Hakamada, Goro Nakayama, Kenichi Sugihara and Yoichi Ajioka
Bioengineering 2026, 13(4), 434; https://doi.org/10.3390/bioengineering13040434 - 8 Apr 2026
Viewed by 439
Abstract
Background: Owing to significant clinical heterogeneity, accurate survival forecasting for individuals with colorectal cancer and peritoneal metastasis remains a complex undertaking. We aimed to transcend traditional prognostic limitations by evaluating machine learning boosting models against standard regression-based methods for estimating overall survival (OS). Methods: Using a multi-institutional registry of 150 patients diagnosed with synchronous peritoneal metastasis of colorectal cancer, we integrated 124 clinicopathological variables into our predictive models. Beyond standard preprocessing—including standardization and median imputation—we rigorously compared XGBoost and LightGBM against Ridge, Lasso, and linear regression via five-fold cross-validation. To specifically address right-censoring, an XGBoost Cox model was implemented and validated using Harrell’s C-index, with SHAP and LIME providing model interpretability. Results: Boosting models consistently outperformed linear alternatives, which struggled with high error rates and negative R² values. Specifically, XGBoost achieved an MAE of 475 ± 60 and an RMSE of 585 ± 88. The XGBoost Cox model reached a C-index of 0.64 ± 0.06. SHAP analysis highlighted inflammatory markers and peritoneal disease extent as the most influential prognostic drivers. Conclusions: While boosting models offer a clear accuracy advantage over linear methods, their prognostic power remains moderate. These findings underscore the potential of ensemble learning in oncology but mandate external validation before these tools can be integrated into clinical decision-making.
(This article belongs to the Section Biosignal Processing)
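Harrell's C-index, used here to validate the XGBoost Cox model, can be computed directly. A minimal pure-Python sketch under the usual convention (pairs are comparable only when the subject with the shorter time had an observed event; the toy data are ours):

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when the subject with the shorter time
    had an observed event; it is concordant when that subject also has
    the higher predicted risk. Ties in risk count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:   # i failed before j was observed
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data: higher predicted risk corresponds to shorter survival
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.4, 0.1]
c = harrell_c_index(times, events, risks)
```

A C-index of 0.5 is chance-level discrimination and 1.0 is perfect ranking, which puts the paper's 0.64 ± 0.06 in context as moderate prognostic power.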
30 pages, 507 KB  
Article
Beyond MSE in Poisson Ridge Regression: New Ridge Parameter Estimators with Additional Distributional Performance Criteria
by Selman Mermi
Mathematics 2026, 14(7), 1190; https://doi.org/10.3390/math14071190 - 2 Apr 2026
Viewed by 258
Abstract
Despite its widespread use for mitigating multicollinearity in count data models, Poisson ridge regression (PRR) remains methodologically constrained by the choice of the ridge parameter k. Existing studies predominantly evaluate ridge parameter estimators using only the mean squared error (MSE) criterion, largely neglecting their distributional properties and estimation stability. Such a narrow evaluation framework may yield unreliable inference, particularly under high correlation and small sample sizes. This study makes two original contributions to the PRR literature. First, we conduct a comprehensive comparison of 13 commonly used ridge parameter estimators and introduce two new estimators that exhibit superior empirical performance. Second, we extend performance evaluation beyond MSE by incorporating outlier ratios and conformity to normality, thereby establishing a multidimensional framework that explicitly addresses distributional robustness and estimator stability. Monte Carlo simulations across 180 scenarios—varying the number of predictors, sample size, correlation level, and intercept value—show that several estimators deemed optimal under MSE perform poorly in terms of outlier prevalence and normality. In contrast, the proposed estimators consistently achieve a balanced performance between error minimization and distributional stability. Two real-data applications further support these findings.
(This article belongs to the Special Issue Statistical Models and Their Applications)
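Poisson ridge regression with a fixed ridge parameter k can be fitted by penalized iteratively reweighted least squares (IRLS). The following sketch assumes a log link, penalizes all coefficients, and takes k as given, whereas the paper's contribution is precisely the data-driven choice of k, which is not reproduced here; the synthetic data are ours:

```python
import numpy as np

def poisson_ridge_irls(X, y, k=0.5, n_iter=50):
    """Poisson ridge regression via penalized IRLS (log link).

    Each iteration solves the ridge-penalized weighted least-squares
    problem (X'WX + kI) beta = X'Wz with Poisson working weights W = mu
    and working response z = eta + (y - mu) / mu.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)                      # current mean estimates
        W = mu                                # Poisson working weights
        z = eta + (y - mu) / mu               # working response
        A = X.T @ (W[:, None] * X) + k * np.eye(p)
        beta = np.linalg.solve(A, X.T @ (W * z))
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) * 0.5
true_beta = np.array([0.8, -0.4, 0.2])
y = rng.poisson(np.exp(X @ true_beta))
beta_hat = poisson_ridge_irls(X, y, k=0.5)
```

With k = 0 this reduces to ordinary Poisson maximum likelihood; increasing k trades bias for variance, which is what the ridge parameter estimators compared in the paper attempt to balance.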
23 pages, 2936 KB  
Article
A Global Multi-Hazard Framework for Projecting Climate Migration Flows to 2100 Along Shared Socioeconomic Pathways (SSPs)
by Zachary M. Hirsch, Danielle N. Medgyesi, Jasmina M. Buresch and Jeremy R. Porter
Climate 2026, 14(4), 81; https://doi.org/10.3390/cli14040081 - 2 Apr 2026
Viewed by 561
Abstract
Climate-induced migration is increasingly recognized as a major demographic consequence of environmental change, yet projections vary widely due to differences in spatial scale, hazard coverage, and modeling approaches. This study introduces the First Street Global Climate Migration Model (FS-GCMM), a globally consistent, multi-hazard framework that estimates climate-driven population redistribution at a 12.5 km resolution across all countries through 2100. The model integrates high-resolution global climate hazard datasets for flood (GloFAS), wind (IBTrACS and ERA5), drought (ERA5), wildfire (Global Fire Atlas), and extreme heat and cold (ERA5-LAND) with gridded population data from NASA SEDAC’s Gridded Population of the World (GPWv4) and Shared Socioeconomic Pathway (SSP) projections. To identify climate-related migration effects, we applied within-country propensity score matching to construct balanced samples of exposed and unexposed grid cells with similar socioeconomic, demographic, geographic, and governance characteristics. Hazard-specific impacts on annualized population change from 2000 to 2020 were then estimated using mixed-effects ridge regression with country-level random effects to account for cross-national heterogeneity and multicollinearity. These empirically derived coefficients were applied to the SSP1-2.6, SSP2-4.5, and SSP5-8.5 scenarios to project future climate-driven outmigration, which was subsequently redistributed using a spatial attractiveness framework incorporating economic opportunity, population density, climate safety, and geographic proximity. Results indicate statistically significant negative effects of all modeled hazards on population retention globally, with approximately 199.5 million people projected to experience climate-driven displacement by 2055 under SSP2-4.5.
16 pages, 3658 KB  
Article
Runoff and Sediment Flux on the North Coast of KwaZulu-Natal: Counter-Acting Beach Erosion from Rising Seas?
by Mark R. Jury
Coasts 2026, 6(2), 13; https://doi.org/10.3390/coasts6020013 - 1 Apr 2026
Viewed by 341
Abstract
A remote analysis of coastal sedimentation in northern KwaZulu-Natal (KZN), South Africa, describes how summer runoff and winter wave action operate within a highly variable climate. Despite rising sea levels, the sediment flux can sustain beaches under certain conditions. Daily satellite red-band reflectivity and ocean–atmosphere reanalysis datasets were studied over the period 2018–2025. Statistical results indicate that streamflow discharges are spread northward by oblique wave-driven currents. Sediment concentrations peak during late winter (>1 mg/L, May–October), when deep turbulent mixing (>40 m) mobilizes sand from the seabed. A case study from September 2021 revealed that ridging high-pressure/cut-off low weather patterns can simultaneously increase streamflow, wave energy, and wind power, creating a surf-zone sediment conveyor along the coast of northern KZN. Long-term climate diagnostics from 1981 to 2025 reveal upward trends in coastal runoff, vegetation, and turbidity (0.29 σ/yr) that point to an increasingly vigorous water cycle. The warming of the southeast Atlantic intensifies the sub-tropical upper-level westerlies and late-winter storms over southeast Africa. These processes occur in 5–8 year cycles and drive shoreline advance and retreat, from accretion of ~1 t/m to storm-surge inundations up to 5.5 m. Using Digital Earth, it was noted that ~1/4 of beaches around Africa are gaining sediment while ~1/3 are eroding. Although remote information could not close the sediment budget, realistic estimates of long-shore transport in the surf zone (>10⁴ kg/yr/m) and on the beach (>10³ kg/yr/m) were calculated. These provide an emerging explanation for the resilience of northern KZN beaches as sea levels rise at a rate of 0.6 cm/yr.
13 pages, 44672 KB  
Article
ARMANI: Dictionary-Learning-Inspired Data-Free Deep Generative Modeling with Meta-Attention and Implicit Preconditioning for Compressively Sampled Magnetic Resonance Imaging
by Ming Wu, Jing Cheng, Qingyong Zhu and Dong Liang
Electronics 2026, 15(7), 1402; https://doi.org/10.3390/electronics15071402 - 27 Mar 2026
Viewed by 266
Abstract
Magnetic resonance imaging (MRI) reconstruction from undersampled k-space data enables accelerated acquisition but leads to a severely ill-posed inverse problem. Although supervised deep learning methods have achieved strong performance, they typically rely on large paired datasets that are difficult to obtain in clinical practice. To address these limitations, we propose a dictionary-learning-inspired dAta-fRee deep generative modeling approach with Meta-Attention and implicit precoNditIoning for compressively sampled MRI (CS-MRI), termed ARMANI. Specifically, a meta-attention-augmented deep image prior (MA-DIP) generator performs a joint optimization over the latent input η and the network parameters θ, where η is regularized via gradient-domain sparsity and θ is constrained by a ridge penalty, mirroring the adaptive estimation of sparse coefficients and an empirical sparsifying dictionary. Furthermore, we integrate a single-step pseudo-orthogonal projection to achieve implicit preconditioning, which modulates the loss landscape and mitigates ill-conditioning of the forward operator. Experimental results demonstrate that ARMANI consistently outperforms existing state-of-the-art (SOTA) data-free and self-supervised methods and, with limited training data, achieves performance comparable to or slightly better than the supervised benchmark MoDL, with effective artifact suppression and faithful recovery of fine structural details. Overall, ARMANI shows strong scalability and potential for practical deployment in fully data-free CS-MRI reconstruction scenarios.
27 pages, 8176 KB  
Article
Climate and Vegetation Dominate Lake Eutrophication in the Inner Mongolia–Xinjiang Plateau (2000–2024)
by Yuzheng Zhang, Feifei Cao, Yuping Rong, Linglong Wen, Wei Su, Jianjun Wu, Yaling Yin, Zhilin Zi, Shasha Liu and Leizhen Liu
Remote Sens. 2026, 18(7), 988; https://doi.org/10.3390/rs18070988 - 25 Mar 2026
Viewed by 534
Abstract
Lakes on the Inner Mongolia–Xinjiang Plateau (IMXP) are increasingly vulnerable to eutrophication under climate change and human pressure, yet long-term monitoring remains limited by sparse field sampling. Here, we reconstruct multi-decadal trophic dynamics across the IMXP using Landsat time series and temporally transferable machine-learning models, and further quantify the underlying natural and anthropogenic drivers. We compiled monthly in situ water-quality observations (chlorophyll-a, Chl-a; total phosphorus, TP; total nitrogen, TN; Secchi depth, SD; and permanganate index, CODMn) and calculated the trophic level index (TLI). After rigorous quality control and monthly aggregation, we obtained a dataset of 1345 matched lake–month samples spanning 2000–2024 and divided it into a training set (n = 1076; ≤2019) and an independent test set (n = 269; 2020–2024) to evaluate temporal transferability. We used Google Earth Engine to generate monthly surface reflectance composites from Landsat 7 ETM+, Landsat 8 OLI, and Landsat 9 OLI-2. Four supervised regression algorithms—ridge regression (RR), support vector regression (SVR), random forest (RF), and eXtreme Gradient Boosting (XGBoost)—were trained to estimate TLI. On the independent test period, XGBoost performed best (R² = 0.780, RMSE = 3.290, MAE = 1.779), followed by RF (R² = 0.770, RMSE = 3.364), SVR (R² = 0.700, RMSE = 3.842), and RR (R² = 0.630, RMSE = 4.267); we then used XGBoost to reconstruct monthly and yearly TLI for 610 perennial grassland lakes from 2000 to 2024. Over this period, the annual mean TLI (48–49) across the IMXP exhibited a statistically significant upward trend (slope = 0.0158 TLI yr⁻¹; 95% confidence interval (CI) = 0.0050–0.0267; p = 0.006), while spatial heterogeneity was distinct (TLI: 41.51–59.70). High values were concentrated in endorheic and desert–oasis basins (e.g., Eastern Inner Mongolia Plateau, >51), whereas lower values characterized high-altitude regions (e.g., Yarkant River, <45). Overall, trends ranged from −0.49 to 0.51 yr⁻¹, increasing in 54% of lakes (15.6% significantly) and decreasing in 46% (15.4% significantly). Attribution analyses identified NDVI (33.92%) and temperature (21.67%) as the dominant drivers (55.59% combined), followed by precipitation (13.99%) and human proxies (30.42% combined: population 10.66%, grazing 10.31%, built-up land 9.45%). Across 53 sub-basins, NDVI was the primary driver in 28, followed by temperature (11), population (7), precipitation (3), grazing (3), and built-up land (1); notably, the top two drivers explained 56.6–87.1% of the variation. TWFE estimates revealed bidirectional NDVI effects (significant in 31/53 basins): positive associations in 22 basins were linked to nutrient retention, contrasting with negative effects in nine basins associated with agricultural return flows. Temperature effects were significant in 15 basins and predominantly negative (14/15), except on the Qiangtang Plateau. Overall, eutrophication risk across the IMXP lake region reflects the combined influences of climate, vegetation conditions, and human activities, with their relative contributions varying among basins.
34 pages, 1788 KB  
Article
A Two-Stage Comparative Framework for Predicting Photovoltaic Cleaning Schedules: Modeling and Comparisons Based on Real and Simulated Data
by Ali Al-Humairi, Enmar Khalis, Zuhair A. Al Hemyari and Peter Jung
Appl. Sci. 2026, 16(6), 2976; https://doi.org/10.3390/app16062976 - 19 Mar 2026
Viewed by 304
Abstract
This study develops and validates a two-stage comparative framework for predicting Photovoltaic (PV) cleaning schedules by integrating high-resolution operational data with regression-based simulated datasets generated from statistical models trained on real measurements. The work directly addresses the growing need to assess whether model-based regression-based simulated data can reliably substitute real measurements in predictive PV maintenance. These models are employed to generate clean-condition power baselines and to estimate daily energy losses attributable to soiling under two distinct paradigms: (i) using real historical PV performance and environmental measurements, and (ii) using regression-derived, regression-based simulated data representing idealized clean operating conditions. Model performance is rigorously quantified using correlation coefficients (R), coefficients of determination (R2), mean absolute deviations, and binary classification metrics including accuracy, precision, recall, and F1-score. The comprehensive results demonstrate that regression-based simulated datasets exhibit high fidelity with real measurements across key electrical variables. This is evident for datasets generated using PLSR, Ridge Regression, and Robust Regression. Strong correlations are observed for DC power (R2 = 0.9545) and DC current (R2 = 0.9520), with mean deviations consistently below 2.2%. When a threshold-based binary decision rule (“clean” versus “do not clean”) is applied, cleaning decisions derived from simulated and real datasets show near-perfect concordance, achieving a mean F1-score of 0.9792. These results indicate that for a fixed performance-loss threshold, models using regression-based simulated data reproduce real-data-based cleaning triggers with an accuracy exceeding 97%. Furthermore, the findings confirm that regression-based simulation frameworks constitute a reliable and scalable foundation for data-driven PV maintenance optimization. 
By enabling efficient cleaning scheduling, these frameworks can significantly reduce operational expenditure and maximize energy yield, particularly in regions where continuous, high-quality PV monitoring data are limited or difficult to obtain. Full article
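The threshold-based decision rule and F1-score concordance described above can be sketched in pure Python. The 5% threshold and the toy loss series below are illustrative assumptions, not values from the article; the real-data flags are treated as ground truth when scoring the simulated-data decisions.

```python
def cleaning_decisions(daily_loss_pct, threshold=5.0):
    """Binary rule: flag 'clean' (True) when the estimated daily energy
    loss exceeds the performance-loss threshold (percent).
    The 5% threshold is an illustrative assumption."""
    return [loss > threshold for loss in daily_loss_pct]

def f1_concordance(real_flags, sim_flags):
    """F1-score of simulated-data decisions against real-data decisions,
    treating the real-data flags as ground truth."""
    tp = sum(r and s for r, s in zip(real_flags, sim_flags))
    fp = sum((not r) and s for r, s in zip(real_flags, sim_flags))
    fn = sum(r and (not s) for r, s in zip(real_flags, sim_flags))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy daily soiling-loss series (percent): real measurements vs. simulation.
real_loss = [1.2, 3.8, 5.6, 7.1, 4.9, 8.3]
sim_loss = [1.0, 4.1, 5.9, 6.8, 5.2, 8.0]

real_flags = cleaning_decisions(real_loss)
sim_flags = cleaning_decisions(sim_loss)
score = f1_concordance(real_flags, sim_flags)
```

With these toy series the two decision streams disagree on a single borderline day (4.9% vs. 5.2%), which is exactly the kind of near-threshold discrepancy the article's concordance metrics quantify.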

26 pages, 1011 KB  
Article
A Study on Machine Learning-Based Cost Estimation Models for AI Training Data Construction
by Yoon-Seok Ko and Bong Gyou Lee
Appl. Sci. 2026, 16(6), 2891; https://doi.org/10.3390/app16062891 - 17 Mar 2026
Viewed by 649
Abstract
This study proposes an explainable machine learning framework for estimating the total project cost (TPC) of AI training-data construction, where cost information is difficult to structure due to heterogeneous workflows and quality requirements. Using 386 public AI training-data projects conducted between 2020 and 2022, we derive 24 numerical predictors from standardized final reports and construct three input tracks: a baseline feature set, a principal component analysis (PCA)-enhanced set, and a factor analysis (FA)-enhanced set capturing latent cost structures. Four regression models (Ridge, Random Forest, XGBoost, and LightGBM) are evaluated using nested cross-validation. XGBoost achieves the best overall performance across all three tracks (Baseline, PCA-enhanced, and FA-enhanced). Among them, PCA-enhanced XGBoost attains the highest predictive accuracy (R2 = 0.868; RMSE = 1084.9; MAE = 746.9; MAPE = 0.358; pooled out-of-fold), while Baseline XGBoost yields the lowest MAE (731.4; R2 = 0.863). To support transparent decision-making, SHapley Additive exPlanations (SHAP)-based attribution and scenario-based sensitivity analyses are conducted. Results show that project scale and process-level unit costs are dominant cost drivers, while cloud usage, expert participation, and de-identification requirements exhibit secondary effects. The proposed framework provides an interpretable, data-driven approach to cost information management and decision support for data-intensive AI projects. Full article
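Ridge regression, the simplest of the four baseline models above, has a closed-form solution, w = (XᵀX + λI)⁻¹Xᵀy. As a minimal sketch of that estimator (a real pipeline would use scikit-learn; the toy data here are illustrative assumptions), it can be implemented in pure Python with Gaussian elimination:

```python
def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y.
    Pure-Python illustration; lam=0 reduces to ordinary least squares."""
    n, p = len(X), len(X[0])
    # Build A = X^T X + lam * I (p x p) and b = X^T y (length p).
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the system [A | b].
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back-substitution.
    w = [0.0] * p
    for i in range(p - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w

# Toy data generated from y = 2*x1 + 3*x2 exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
w = ridge_fit(X, y, lam=0.0)  # recovers [2.0, 3.0]
```

Setting lam > 0 adds the λI term to XᵀX, shrinking the weights toward zero; this is the regularization that distinguishes Ridge from ordinary least squares in the model comparison above.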

30 pages, 1713 KB  
Article
Safe-Calibrated TCN–Transformer Transfer Learning for Reliable Battery SoH Estimation Under Lab-to-Field Domain Shift
by Kumbirayi Nyachionjeka and Ehab H. E. Bayoumi
World Electr. Veh. J. 2026, 17(3), 149; https://doi.org/10.3390/wevj17030149 - 17 Mar 2026
Viewed by 597
Abstract
Battery state-of-health (SoH) estimation is central to transportation electrification because it conditions safety limits, warranty accounting, power capability management, and long-horizon fleet optimization. Although deep temporal architectures can achieve high laboratory accuracy, field deployment is frequently limited by laboratory (Lab)-to-field (L2F) domain shift that alters input statistics, feature definitions, and noise regimes. Under such a shift, predictors may remain strongly monotonic, preserving degradation ordering, yet become operationally unreliable due to systematic output distortion (e.g., compression/warping of the SoH scale). A deployment-complete L2F transfer learning pipeline is presented, built around a gated Temporal Convolutional Network (TCN)–Transformer fusion backbone, domain-specific adapters and heads, alignment-regularized fine-tuning, and row-level inference via sliding-window overlap averaging. To address the dominant deployment failure mode, a Safe Calibration stage robustly filters calibration pairs and selects among candidate calibrators under a strict do-no-harm criterion. On an unseen deployment stream (2154 labeled rows), overlap-averaged raw inference achieves MAE = 0.0439, RMSE = 0.0501, and R2 = 0.7451, consistent with mid-to-high SoH range compression, while Safe Calibration (Isotonic-Balanced selected) corrects nonlinear scaling without violating monotonic structure, improving to MAE = 0.0188, RMSE = 0.0252, and R2 = 0.9357. To obtain a complete understanding of the challenges posed by domain shift, the evaluation is extended to additional architecture baselines, including TCN-only, Transformer-only, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Ridge regression. Explicit alignment and calibration ablations are also included, covering CORAL off/on and calibration modes (none vs. Safe-Global vs. Context-Aware) under identical leakage-safe splits and the same overlap-averaged deployment inference operator.
This work goes beyond peak-score reporting and examines the robustness of the pipeline under domain shift, quantified across four random seeds and multiple deployment streams, with uncertainty summarized via mean ± std and bootstrap confidence intervals for mean absolute error (MAE) and root-mean-square error (RMSE) computed from per-example absolute errors. Full article
(This article belongs to the Section Storage Systems)
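The Safe Calibration idea above can be sketched with plain isotonic regression (the standard pool-adjacent-violators algorithm) plus a do-no-harm check: the fitted calibrator is kept only if it does not worsen MAE on the calibration pairs. This is a simplified stand-in for the article's Isotonic-Balanced calibrator; the toy data and the step-function interpolation are illustrative assumptions.

```python
def pava(preds, targets):
    """Pool-Adjacent-Violators: fit a non-decreasing map from predictions
    (sorted ascending) to targets — standard isotonic regression."""
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    xs = [preds[i] for i in order]
    ys = [targets[i] for i in order]
    blocks = []  # each block is [value_sum, count]
    for yv in ys:
        blocks.append([yv, 1])
        # Merge while the previous block mean violates monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return xs, fitted  # knots and non-decreasing calibrated values

def calibrate(x, knots, values):
    """Piecewise-constant step interpolation through the isotonic knots."""
    for k, v in zip(knots, values):
        if x <= k:
            return v
    return values[-1]

def mae(a, b):
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

# Toy calibration set: raw SoH predictions compressed toward mid-range,
# mimicking the scale compression described in the abstract.
raw = [0.70, 0.75, 0.80, 0.85, 0.90]
true = [0.60, 0.70, 0.82, 0.93, 0.99]

knots, vals = pava(raw, true)
calibrated = [calibrate(p, knots, vals) for p in raw]

# Do-no-harm criterion: accept the calibrator only if MAE does not worsen.
use_calibrator = mae(calibrated, true) <= mae(raw, true)
```

Because the isotonic fit is non-decreasing by construction, the calibrator stretches the compressed SoH scale without reordering predictions, which matches the "corrects nonlinear scaling without violating monotonic structure" property claimed above.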

22 pages, 10555 KB  
Article
Deep Learning-Based Recognition of Arch-Back Direction in Bare-Root Strawberry Seedlings for Mechanized Transplanting
by Jinhao Zhou, Pengcheng Zhang, Menglei Wei, Wei Liu, Jiawei Shi, Youheng Tan and Jianping Hu
Agriculture 2026, 16(6), 657; https://doi.org/10.3390/agriculture16060657 - 13 Mar 2026
Viewed by 296
Abstract
Correct arch-back orientation is essential in ridge-based strawberry transplanting. Improper orientation can increase soil contact and soil-borne disease risk, leading to yield loss and reduced harvest efficiency. In current practice, arch-back orientation of bare-root seedlings is still mainly judged and corrected manually, which is labor-intensive and not always accurate under field conditions. Although plug seedlings are easier for mechanized transplanting, they are about three times more expensive than bare-root seedlings. Therefore, bare-root seedlings remain widely used for cost-effective production. However, accurate real-time orientation perception for bare-root seedlings is still challenging because stems are thin, morphology varies widely, and leaves often occlude key curvature cues. To address this gap, we propose a lightweight machine-vision method for bare-root strawberry seedlings that detects three characteristic keypoints on the new stem. The three-keypoint design is inspired by farmers’ practical judgement: farmers often determine arch-back direction by observing the stem and using manual touch to sense curvature changes. Similarly, three keypoints provide a simple geometric representation of curvature trend, enabling real-time estimation of both arch-back direction and bending angle. Physical tests on 100 bare-root seedlings achieved a 93% agronomically compliant orientation rate, with an MAE of 5.74° and an RMSE of 7.44° for bending-angle estimation. For edge deployment, the optimized model achieved real-time performance on an embedded GPU platform, reaching 152.51 FPS (FP16) and 154.26 FPS (INT8). Overall, the proposed method provides a practical perception module that can be integrated into strawberry transplanting machines to support cost-effective, orientation-aware mechanized transplanting. Full article
(This article belongs to the Section Agricultural Technology)
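One plausible way to turn the three stem keypoints above into an arch-back direction and bending angle is elementary 2D geometry: the deviation angle between the two stem segments, and the sign of their cross product for the bend direction. This is a hedged sketch of the geometric idea only, not the article's exact formulation.

```python
import math

def bend_from_keypoints(p1, p2, p3):
    """Estimate stem bending from three (x, y) keypoints along the new stem.
    Angle: deviation between segments p1->p2 and p2->p3, in degrees.
    Direction: sign of the 2D cross product (+1 = counter-clockwise bend,
    -1 = clockwise bend, 0 = straight). This geometric definition is an
    illustrative assumption."""
    v1 = (p2[0] - p1[0], p2[1] - p1[1])
    v2 = (p3[0] - p2[0], p3[1] - p2[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    angle = math.degrees(math.atan2(abs(cross), dot))
    direction = (cross > 0) - (cross < 0)
    return angle, direction

# Keypoints tracing a 45-degree counter-clockwise bend.
angle, direction = bend_from_keypoints((0.0, 0.0), (1.0, 0.0), (2.0, 1.0))
```

A downstream transplanting controller would compare `direction` against the agronomically required orientation and use `angle` to decide whether the bend is pronounced enough to act on, analogous to the bending-angle estimates evaluated in the abstract (MAE 5.74°).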
