Search Results (3,467)

Search Parameters:
Keywords = XGboost

23 pages, 10868 KiB  
Article
Quantitative Analysis and Nonlinear Response of Vegetation Dynamic to Driving Factors in Arid and Semi-Arid Regions of China
by Shihao Liu, Dazhi Yang, Xuyang Zhang and Fangtian Liu
Land 2025, 14(8), 1575; https://doi.org/10.3390/land14081575 (registering DOI) - 1 Aug 2025
Abstract
Vegetation dynamics are influenced in complex ways by multiple factors such as climate, human activities, and topography. In recent years, the frequency, intensity, and diversity of human activities have increased, placing substantial pressure on the growth of vegetation. Arid and semi-arid regions are particularly sensitive to climate change, and climate change and large-scale ecological restoration have led to significant changes in the dynamics of dryland vegetation. However, few studies have explored the nonlinear relationships between these factors and vegetation dynamics. In this study, we integrated trend analysis (using the Mann–Kendall test and Theil–Sen estimation) and machine learning algorithms (XGBoost-SHAP model) based on long time-series remote sensing data from 2001 to 2020 to quantify the nonlinear response patterns and threshold effects of bioclimatic variables, topographic features, soil attributes, and anthropogenic factors on vegetation dynamics. The results revealed the following key findings: (1) The kNDVI in the study area showed an overall significant increasing trend (p < 0.01) during the observation period, with 26.7% of the area showing a significant increase. (2) The water content index (Bio 23, 19.6%), the change in land use (15.2%), multi-year average precipitation (pre, 15.0%), population density (13.2%), and rainfall seasonality (Bio 15, 10.9%) were the key factors driving the dynamic change of vegetation, with the combined contribution of natural factors amounting to 64.3%. (3) Among the topographic factors, altitude had a more significant effect on vegetation dynamics, with higher altitude regions less likely to experience vegetation greening. Both natural and anthropogenic factors exhibited nonlinear responses and interactive effects, contributing to the observed dynamic trends.
This study provides valuable insights into the driving mechanisms behind the condition of vegetation in arid and semi-arid regions of China and, by extension, in other arid regions globally. Full article
(This article belongs to the Section Land Use, Impact Assessment and Sustainability)
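The trend analysis named in the abstract above (Mann–Kendall test with Theil–Sen slope estimation) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes one value per year with no ties or missing data, uses the normal approximation for the p-value, and the `kndvi` series is made up for demonstration.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(values):
    """Median of all pairwise slopes; the list index is used as the time axis."""
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    return median(slopes)


def mann_kendall(values):
    """Return (S, Z, two-sided p) for the Mann-Kendall test without tie correction."""
    n = len(values)
    # S counts concordant minus discordant pairs
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value under the normal approximation
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical 20-year series (illustrative values only, not data from the study)
kndvi = [0.20 + 0.005 * year for year in range(20)]
slope = theil_sen_slope(kndvi)
s_stat, z_stat, p_value = mann_kendall(kndvi)
```

For a per-pixel analysis the same two functions would be applied to each pixel's time series; a pixel is then typically flagged as significantly greening when the slope is positive and the p-value falls below the chosen level.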

12 pages, 869 KiB  
Article
Neonatal Jaundice Requiring Phototherapy Risk Factors in a Newborn Nursery: Machine Learning Approach
by Yunjin Choi, Sunyoung Park and Hyungbok Lee
Children 2025, 12(8), 1020; https://doi.org/10.3390/children12081020 (registering DOI) - 1 Aug 2025
Abstract
Background: Neonatal jaundice is common and can cause severe hyperbilirubinemia if untreated. The early identification of at-risk newborns is challenging despite the existing guidelines. Objective: This study aimed to identify the key maternal and neonatal risk factors for jaundice requiring phototherapy using machine learning. Methods: In this study hospital, phototherapy was administered following the American Academy of Pediatrics (AAP) guidelines when a neonate’s transcutaneous bilirubin level was in the high-risk zone. To identify the risk factors for phototherapy, we retrospectively analyzed the electronic medical records of 8242 neonates admitted between 2017 and 2022. Predictive models were trained using maternal and neonatal data. XGBoost showed the best performance (AUROC = 0.911). SHAP values interpreted the model. Results: Mode of delivery, neonatal feeding indicators (including daily formula intake and breastfeeding frequency), maternal BMI, and maternal white blood cell count were strong predictors. Cesarean delivery and lower birth weight were linked to treatment need. Conclusions: Machine learning models using perinatal data accurately predict the risk of neonatal jaundice requiring phototherapy, potentially aiding early clinical decisions and improving outcomes. Full article
(This article belongs to the Section Pediatric Nursing)

22 pages, 8105 KiB  
Article
Extraction of Sparse Vegetation Cover in Deserts Based on UAV Remote Sensing
by Jie Han, Jinlei Zhu, Xiaoming Cao, Lei Xi, Zhao Qi, Yongxin Li, Xingyu Wang and Jiaxiu Zou
Remote Sens. 2025, 17(15), 2665; https://doi.org/10.3390/rs17152665 (registering DOI) - 1 Aug 2025
Abstract
The unique characteristics of desert vegetation, such as different leaf morphology, discrete canopy structures, sparse and uneven distribution, etc., pose significant challenges for remote sensing-based estimation of fractional vegetation cover (FVC). The Unmanned Aerial Vehicle (UAV) system can accurately distinguish vegetation patches, extract weak vegetation signals, and navigate through complex terrain, making it suitable for applications in small-scale FVC extraction. In this study, we selected the floodplain fan with Caragana korshinskii Kom as the constructive species in Hatengtaohai National Nature Reserve, Bayannur, Inner Mongolia, China, as our study area. We investigated the remote sensing extraction method of desert sparse vegetation cover by placing samples across three gradients: the top, middle, and edge of the fan. We then acquired UAV multispectral images; evaluated the applicability of various vegetation indices (VIs) using methods such as supervised classification, linear regression models, and machine learning; and explored the feasibility and stability of multiple machine learning models in this region. Our results indicate the following: (1) We discovered that the multispectral vegetation index is superior to the visible vegetation index and more suitable for FVC extraction in vegetation-sparse desert regions. (2) By comparing five machine learning regression models, it was found that the XGBoost and KNN models exhibited relatively lower estimation performance in the study area. The spatial distribution of plots appeared to influence the stability of the SVM model when estimating fractional vegetation cover (FVC). In contrast, the RF and LASSO models demonstrated robust stability across both training and testing datasets. Notably, the RF model achieved the best inversion performance (R2 = 0.876, RMSE = 0.020, MAE = 0.016), indicating that RF is one of the most suitable models for retrieving FVC in naturally sparse desert vegetation. 
This study provides a valuable contribution to the limited existing research on remote sensing-based estimation of FVC and characterization of spatial heterogeneity in small-scale desert sparse vegetation ecosystems dominated by a single species. Full article

21 pages, 6231 KiB  
Article
Integrating In Vitro Propagation and Machine Learning Modeling for Efficient Shoot and Root Development in Aronia melanocarpa
by Mehmet Yaman, Esra Bulunuz Palaz, Musab A. Isak, Serap Demirel, Tolga İzgü, Sümeyye Adalı, Fatih Demirel, Özhan Şimşek, Gheorghe Cristian Popescu and Monica Popescu
Horticulturae 2025, 11(8), 886; https://doi.org/10.3390/horticulturae11080886 (registering DOI) - 1 Aug 2025
Abstract
Aronia melanocarpa (black chokeberry) is a medicinally valuable small fruit species, yet its commercial propagation remains limited by low rooting and genotype-specific responses. This study developed an efficient, callus-free micropropagation and rooting protocol using a Shrub Plant Medium (SPM) supplemented with 5 mg/L BAP in large 660 mL jars, which yielded up to 27 shoots per explant. Optimal rooting (100%) was achieved with 0.5 mg/L NAA + 0.25 mg/L IBA in half-strength SPM. In the second phase, supervised machine learning models, including Random Forest (RF), XGBoost, Gaussian Process (GP), and Multilayer Perceptron (MLP), were employed to predict morphogenic traits based on culture conditions. XGBoost and RF outperformed other models, achieving R2 values exceeding 0.95 for key variables such as shoot number and root length. These results demonstrate that data-driven modeling can enhance protocol precision and reduce experimental workload in plant tissue culture. The study also highlights the potential for combining physiological understanding with artificial intelligence to streamline future in vitro applications in woody species. Full article
(This article belongs to the Special Issue Tissue Culture and Micropropagation Techniques of Horticultural Crops)

15 pages, 1635 KiB  
Article
Modeling the Abrasive Index from Mineralogical and Calorific Properties Using Tree-Based Machine Learning: A Case Study on the KwaZulu-Natal Coalfield
by Mohammad Afrazi, Chia Yu Huat, Moshood Onifade, Manoj Khandelwal, Deji Olatunji Shonuga, Hadi Fattahi and Danial Jahed Armaghani
Mining 2025, 5(3), 48; https://doi.org/10.3390/mining5030048 (registering DOI) - 1 Aug 2025
Abstract
Accurate prediction of the coal abrasive index (AI) is critical for optimizing coal processing efficiency and minimizing equipment wear in industrial applications. This study explores tree-based machine learning models, namely Random Forest (RF), Gradient Boosting Trees (GBT), and Extreme Gradient Boosting (XGBoost), to predict AI using selected coal properties. A database of 112 coal samples from the KwaZulu-Natal Coalfield in South Africa was used. Initial predictions using all eight input properties revealed suboptimal testing performance (R2: 0.63–0.72), attributed to outliers and noisy data. Feature importance analysis identified calorific value, quartz, ash, and pyrite as dominant predictors, aligning with their physicochemical roles in abrasiveness. After data cleaning and feature selection, XGBoost achieved superior accuracy (R2 = 0.92), outperforming RF (R2 = 0.85) and GBT (R2 = 0.81). The results highlight XGBoost’s robustness in modeling non-linear relationships between coal properties and AI. This approach offers a cost-effective alternative to traditional laboratory methods, enabling industries to optimize coal selection, reduce maintenance costs, and enhance operational sustainability through data-driven decision-making. Additionally, quartz and ash content were identified as the most influential parameters on AI using the Cosine Amplitude technique, while calorific value had the least impact among the selected features. Full article
(This article belongs to the Special Issue Mine Automation and New Technologies)

48 pages, 2506 KiB  
Article
Enhancing Ship Propulsion Efficiency Predictions with Integrated Physics and Machine Learning
by Hamid Reza Soltani Motlagh, Seyed Behbood Issa-Zadeh, Md Redzuan Zoolfakar and Claudia Lizette Garay-Rondero
J. Mar. Sci. Eng. 2025, 13(8), 1487; https://doi.org/10.3390/jmse13081487 - 31 Jul 2025
Abstract
This research develops a dual physics-based machine learning system to forecast fuel consumption and CO2 emissions for a 100 m oil tanker across six operational scenarios: Original, Paint, Advanced Propeller, Fin, Bulbous Bow, and Combined. The combination of hydrodynamic calculations with Monte Carlo simulations provides a solid foundation for training machine learning models, particularly in cases where dataset restrictions are present. The XGBoost model demonstrated superior performance compared to Support Vector Regression, Gaussian Process Regression, Random Forest, and Shallow Neural Network models, achieving near-zero prediction errors that closely matched physics-based calculations. The physics-based analysis demonstrated that the Combined scenario, which combines hull coatings with bulbous bow modifications, produced the largest fuel consumption reduction (5.37% at 15 knots), followed by the Advanced Propeller scenario. The results demonstrate that user inputs (e.g., engine power: 870 kW, speed: 12.7 knots) match the Advanced Propeller scenario, followed by Paint, which indicates that advanced propellers or hull coatings would optimize efficiency. The obtained insights help ship operators modify their operational parameters and designers select essential modifications for sustainable operations. The model maintains its strength at low speeds, where fuel consumption is minimal, making it applicable to other oil tankers. The hybrid approach provides a new tool for maritime efficiency analysis, yielding interpretable results that support International Maritime Organization objectives, despite starting with a limited dataset. The model requires additional research to enhance its predictive accuracy using larger datasets and real-time data collection, which will aid in achieving global environmental stewardship. Full article
(This article belongs to the Special Issue Machine Learning for Prediction of Ship Motion)
18 pages, 1584 KiB  
Article
What Determines Carbon Emissions of Multimodal Travel? Insights from Interpretable Machine Learning on Mobility Trajectory Data
by Guo Wang, Shu Wang, Wenxiang Li and Hongtai Yang
Sustainability 2025, 17(15), 6983; https://doi.org/10.3390/su17156983 (registering DOI) - 31 Jul 2025
Abstract
Understanding the carbon emissions of multimodal travel—comprising walking, metro, bus, cycling, and ride-hailing—is essential for promoting sustainable urban mobility. However, most existing studies focus on single-mode travel, while underlying spatiotemporal and behavioral determinants remain insufficiently explored due to the lack of fine-grained data and interpretable analytical frameworks. This study proposes a novel integration of high-frequency, real-world mobility trajectory data with interpretable machine learning to systematically identify the key drivers of carbon emissions at the individual trip level. Firstly, multimodal travel chains are reconstructed using continuous GPS trajectory data collected in Beijing. Secondly, a model based on Calculate Emissions from Road Transport (COPERT) is developed to quantify trip-level CO2 emissions. Thirdly, four interpretable machine learning models based on gradient boosting—XGBoost, GBDT, LightGBM, and CatBoost—are trained using transportation and built environment features to model the relationship between CO2 emissions and a set of explanatory variables; finally, Shapley Additive exPlanations (SHAP) and partial dependence plots (PDPs) are used to interpret the model outputs, revealing key determinants and their non-linear interaction effects. The results show that transportation-related features account for 75.1% of the explained variance in emissions, with bus usage being the most influential single factor (contributing 22.6%). Built environment features explain the remaining 24.9%. The PDP analysis reveals that substantial emission reductions occur only when the shares of bus, metro, and cycling surpass threshold levels of approximately 40%, 40%, and 30%, respectively. Additionally, travel carbon emissions are minimized when trip origins and destinations are located within a 10 to 11 km radius of the central business district (CBD). 
This study advances the field by establishing a scalable, interpretable, and behaviorally grounded framework to assess carbon emissions from multimodal travel, providing actionable insights for low-carbon transport planning and policy design. Full article
(This article belongs to the Special Issue Sustainable Transportation Systems and Travel Behaviors)

21 pages, 3532 KiB  
Article
Machine Learning Prediction of CO2 Diffusion in Brine: Model Development and Salinity Influence Under Reservoir Conditions
by Qaiser Khan, Peyman Pourafshary, Fahimeh Hadavimoghaddam and Reza Khoramian
Appl. Sci. 2025, 15(15), 8536; https://doi.org/10.3390/app15158536 (registering DOI) - 31 Jul 2025
Abstract
The diffusion coefficient (DC) of CO2 in brine is a key parameter in geological carbon sequestration and CO2-Enhanced Oil Recovery (EOR), as it governs mass transfer efficiency and storage capacity. This study employs three machine learning (ML) models, Random Forest (RF), Gradient Boost Regressor (GBR), and Extreme Gradient Boosting (XGBoost), to predict DC based on pressure, temperature, and salinity. The dataset, comprising 176 data points, spans pressures from 0.10 to 30.00 MPa, temperatures from 286.15 to 398.00 K, salinities from 0.00 to 6.76 mol/L, and DC values from 0.13 to 4.50 × 10⁻⁹ m²/s. The data was split into 80% for training and 20% for testing to ensure reliable model evaluation. Model performance was assessed using R2, RMSE, and MAE. The RF model demonstrated the best performance, with an R2 of 0.95, an RMSE of 0.03, and an MAE of 0.11 on the test set, indicating high predictive accuracy and generalization capability. In comparison, GBR achieved an R2 of 0.925, and XGBoost achieved an R2 of 0.91 on the test set. Feature importance analysis consistently identified temperature as the most influential factor, followed by salinity and pressure. This study highlights the potential of ML models for predicting CO2 diffusion in brine, providing a robust, data-driven framework for optimizing CO2-EOR processes and carbon storage strategies. The findings underscore the critical role of temperature in diffusion behavior, offering valuable insights for future modeling and operational applications. Full article
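The three evaluation metrics named in the abstract above (R2, RMSE, MAE) have standard definitions, sketched here in plain Python. The function name `regression_metrics` and the sample values are illustrative assumptions and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R^2, RMSE and MAE from paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Made-up observed and predicted values, for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(y_true, y_pred)

# On one set of residuals, RMSE is never smaller than MAE
assert metrics["rmse"] >= metrics["mae"]
```

Both error metrics carry the units of the target, so they are only comparable across studies when the target is scaled the same way.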

23 pages, 7266 KiB  
Article
Intelligent ESG Evaluation for Construction Enterprises in China: An LLM-Based Model
by Binqing Cai, Zhukai Ye and Shiwei Chen
Buildings 2025, 15(15), 2710; https://doi.org/10.3390/buildings15152710 (registering DOI) - 31 Jul 2025
Abstract
Environmental, social, and governance (ESG) evaluation has become increasingly critical for company sustainability assessments, especially for enterprises in the construction industry with a high environmental burden. However, existing methods face limitations in subjective evaluation, inconsistent ratings across agencies, and a lack of industry-specificity. To address these limitations, this study proposes a large language model (LLM)-based intelligent ESG evaluation model specifically designed for the construction enterprises in China. The model integrates three modules: (1) an ESG report information extraction module utilizing natural language processing and Chinese pre-trained language models to identify and classify ESG-relevant statements; (2) an ESG rating prediction module employing XGBoost regression with SHAP analysis to predict company ratings and quantify individual statement contributions; and (3) an ESG intelligent evaluation module combining knowledge graph construction with fine-tuned Qwen2.5 language models using Chain-of-Thought (CoT). Empirical validation demonstrates that the model achieves 93.33% accuracy in the ESG rating classification and an R2 score of 0.5312. SHAP analysis reveals that environmental factors contribute most significantly to rating predictions (38.7%), followed by governance (32.0%) and social dimensions (29.3%). The fine-tuned LLM integrated with knowledge graph shows improved evaluation consistency, achieving 65% accuracy compared to 53.33% for standalone LLM approaches, constituting a relative improvement of 21.88%. This study contributes to the ESG evaluation methodology by providing an objective, industry-specific, and interpretable framework that enhances rating consistency and provides actionable insights for enterprise sustainability improvement. This research provides guidance for automated and intelligent ESG evaluations for construction enterprises while addressing critical gaps in current ESG practices. 
Full article
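The "relative improvement of 21.88%" quoted in the abstract above follows from the two reported accuracies (65% with the knowledge graph, 53.33% for the standalone LLM). A short check, where the helper name `relative_improvement` is only an illustrative choice:

```python
def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline`, relative to `baseline`."""
    return (new - baseline) / baseline * 100


# Accuracies as reported in the abstract, in percent
gain = relative_improvement(65.0, 53.33)
print(f"{gain:.2f}%")
```

The absolute difference is 11.67 percentage points; dividing by the 53.33% baseline gives the relative figure.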

20 pages, 3593 KiB  
Article
A Feature Engineering Framework for Smart Meter Group Failure Rate Prediction
by Yihong Li, Xia Xiao, Zhengbo Zhang and Wenao Liu
Mathematics 2025, 13(15), 2472; https://doi.org/10.3390/math13152472 - 31 Jul 2025
Abstract
Smart meters play a significant role in power systems, but their condition assessment faces challenges such as inconsistent evaluation criteria and inaccurate assessment results. This paper proposes feature engineering including feature construction and feature selection for smart meter group failure rate prediction. First, the basic structure and common fault types of smart meters are introduced. Smart meters are grouped by batch and distribution area. Next, 25 condition features are constructed based on failure mechanisms and technical specifications. Then, an evolutionary multi-objective feature selection algorithm combining NSGA-II, Jaccard similarity, and XGBoost is developed, where feature subsets are encoded as binary individuals optimized for three objectives: MSE, 1 − R2, and the number of features. The experimental results demonstrate that the proposed method not only reduces the number of features (25→7) but also improves the prediction accuracy (MSE: 0.0049 → 0.0042, R2: 0.6638 → 0.7228) of smart meter group failure rates. Comparative studies with other feature selection methods further confirm the superiority of our approach. The optimized features enhance interpretability and computational efficiency, providing a practical solution for large-scale smart meter condition assessment in power systems. Full article
(This article belongs to the Special Issue Evolutionary Algorithms and Applications)
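The abstract above describes feature subsets encoded as binary individuals and compared on three minimized objectives (MSE, 1 − R2, number of features). The sketch below shows two generic building blocks for that kind of search: Jaccard similarity between two binary masks and a Pareto-dominance check. How the paper combines these with NSGA-II and XGBoost is not stated in the abstract, so the function names and the example masks are assumptions; only the objective values are taken from the reported results.

```python
def jaccard(mask_a, mask_b):
    """Jaccard similarity of the feature sets selected by two binary masks."""
    a = {i for i, bit in enumerate(mask_a) if bit}
    b = {i for i, bit in enumerate(mask_b) if bit}
    union = a | b
    # Two empty selections are treated as identical
    return len(a & b) / len(union) if union else 1.0


def dominates(obj_a, obj_b):
    """True if obj_a is no worse on every objective and strictly better on one.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(obj_a, obj_b))
    better = any(x < y for x, y in zip(obj_a, obj_b))
    return no_worse and better


# Objective tuples (MSE, 1 - R2, feature count) built from the reported results
full_set = (0.0049, 1 - 0.6638, 25)
selected = (0.0042, 1 - 0.7228, 7)

# Hypothetical 4-feature masks, for illustration only
similarity = jaccard([1, 1, 0, 0], [1, 0, 1, 0])
```

With these numbers the 7-feature subset dominates the full 25-feature set, since it is better on all three objectives.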

34 pages, 17155 KiB  
Article
Machine Learning Ensemble Methods for Co-Seismic Landslide Susceptibility: Insights from the 2015 Nepal Earthquake
by Tulasi Ram Bhattarai and Netra Prakash Bhandary
Appl. Sci. 2025, 15(15), 8477; https://doi.org/10.3390/app15158477 (registering DOI) - 30 Jul 2025
Abstract
The Mw 7.8 Gorkha Earthquake of 25 April 2015 triggered over 25,000 landslides across central Nepal, with 4775 events concentrated in Gorkha District alone. Despite substantial advances in landslide susceptibility mapping, existing studies often overlook the compound role of post-seismic rainfall and lack robust spatial validation. To address this gap, we validated an ensemble machine learning framework for co-seismic landslide susceptibility modeling by integrating seismic, geomorphological, hydrological, and anthropogenic variables, including cumulative post-seismic rainfall. Using a balanced dataset of 4775 landslide and non-landslide instances, we evaluated the performance of Logistic Regression (LR), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost) models through spatial cross-validation, SHapley Additive exPlanations (SHAP) explainability, and ablation analysis. The RF model outperformed all others, achieving an accuracy of 87.9% and a Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) value of 0.94, while XGBoost closely followed (AUC = 0.93). Ensemble models collectively classified over 95% of observed landslides into High and Very High susceptibility zones, demonstrating strong spatial reliability. SHAP analysis identified elevation, proximity to fault, peak ground acceleration (PGA), slope, and rainfall as dominant predictors. Notably, the inclusion of post-seismic rainfall substantially improved recall and F1 scores in ablation experiments. Spatial cross-validation revealed the superior generalizability of ensemble models under heterogeneous terrain conditions. The findings underscore the value of integrating post-seismic hydrometeorological factors and spatial validation into susceptibility assessments. We recommend adopting ensemble models, particularly RF, for operational hazard mapping in earthquake-prone mountainous regions. 
Future research should explore the integration of dynamic rainfall thresholds and physics-informed frameworks to enhance early warning systems and climate resilience. Full article
(This article belongs to the Section Earth Sciences)

22 pages, 579 KiB  
Article
Automated Classification of Crime Narratives Using Machine Learning and Language Models in Official Statistics
by Klaus Lehmann, Elio Villaseñor, Alejandro Pimentel, Javiera Preuss, Nicolás Berhó, Oswaldo Diaz and Ignacio Agloni
Stats 2025, 8(3), 68; https://doi.org/10.3390/stats8030068 - 30 Jul 2025
Abstract
This paper presents the implementation of a language model–based strategy for the automatic codification of crime narratives for the production of official statistics. To address the high workload and inconsistencies associated with manual coding, we developed and evaluated three models: an XGBoost classifier with bag-of-words features and word embeddings features, an LSTM network using pretrained Spanish word embeddings as a language model, and a fine-tuned BERT language model (BETO). Deep learning models outperformed the traditional baseline, with BETO achieving the highest accuracy. The new ENUSC (Encuesta Nacional Urbana de Seguridad Ciudadana) workflow integrates the selected model into an API for automated classification, incorporating a certainty threshold to distinguish between cases suitable for automation and those requiring expert review. This hybrid strategy led to a 68.4% reduction in manual review workload while preserving high-quality standards. This study represents the first documented application of deep learning for the automated classification of victimization narratives in official statistics, demonstrating its feasibility and impact in a real-world production environment. Our results demonstrate that deep learning can significantly improve the efficiency and consistency of crime statistics coding, offering a scalable solution for other national statistical offices. Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
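The certainty threshold described in the abstract above, which separates cases suitable for automation from those sent to expert review, can be sketched as a simple routing rule. The function name `route`, the threshold of 0.9, and the example labels are hypothetical; the abstract does not state the actual threshold or API.

```python
def route(probabilities, threshold=0.9):
    """Pick the most probable label and decide whether it can be auto-coded.

    `probabilities` maps each candidate label to the model's predicted probability.
    Returns (label, "auto") when the top probability reaches the threshold,
    otherwise (label, "review") so a human coder checks the case.
    """
    label = max(probabilities, key=probabilities.get)
    decision = "auto" if probabilities[label] >= threshold else "review"
    return label, decision


# Hypothetical model outputs for two narratives
confident = route({"robbery": 0.95, "fraud": 0.05})
uncertain = route({"robbery": 0.55, "fraud": 0.45})
```

Raising the threshold sends more cases to review and lowers the automation rate, so the value is a trade-off between workload and coding quality.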

30 pages, 3319 KiB  
Article
A Pilot Study on Thermal Comfort in Young Adults: Context-Aware Classification Using Machine Learning and Multimodal Sensors
by Bibars Amangeldy, Timur Imankulov, Nurdaulet Tasmurzayev, Serik Aibagarov, Nurtugan Azatbekuly, Gulmira Dikhanbayeva and Aksultan Mukhanbet
Buildings 2025, 15(15), 2694; https://doi.org/10.3390/buildings15152694 - 30 Jul 2025
Abstract
While personal thermal comfort is critical for well-being and productivity, it is often overlooked by traditional building management systems that rely on uniform settings. Modern data-driven approaches often fail to capture the complex interactions between various data streams. This pilot study introduces a high-accuracy, interpretable framework for thermal comfort classification, designed to identify the most significant predictors from a comprehensive suite of environmental, physiological, and anthropometric data in a controlled group of young adults. Initially, an XGBoost model using the full 24-feature dataset achieved the best performance at 91% accuracy. However, after using SHAP analysis to identify and select the most influential features, the performance of our ensemble models improved significantly; notably, a Random Forest model’s accuracy rose from 90% to 94%. Our analysis confirmed that for this homogeneous cohort, environmental parameters (specifically temperature, humidity, and CO₂) were the dominant predictors of thermal comfort. The primary strength of this methodology lies in its ability to create a transparent pipeline that objectively identifies the most critical comfort drivers for a given population, forming a crucial evidence base for model design. The analysis also revealed that the predictive value of heart rate variability (HRV) diminished when richer physiological data, such as diastolic blood pressure, were included. For final validation, the optimized Random Forest model, using only the top 10 features, was tested on a hold-out set of 100 samples, achieving a final accuracy of 95% and an F1-score of 0.939, with all misclassifications occurring only between adjacent comfort levels. These findings establish a validated methodology for creating effective, context-aware comfort models that can be embedded into intelligent building management systems. Such adaptive systems enable a shift from static climate control to dynamic, user-centric environments, laying the critical groundwork for future personalized systems while enhancing occupant well-being and offering significant energy savings.
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)
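The hold-out metrics quoted in the abstract above (accuracy, F1-score, and the observation that errors fall only between adjacent comfort levels) can be illustrated with a minimal sketch. It assumes comfort levels are encoded as consecutive integers and that the F1-score is macro-averaged; the abstract states neither, and the labels below are made-up toy values, not the study's data.

```python
# Minimal sketch: accuracy, macro-averaged F1, and an "adjacent errors only"
# check for ordinal class labels. The integer encoding, the macro averaging,
# and the toy labels are assumptions for illustration only.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def errors_only_adjacent(y_true, y_pred):
    """True if every prediction is at most one ordinal level from the truth."""
    return all(abs(t - p) <= 1 for t, p in zip(y_true, y_pred))

# Hypothetical labels: 0 = cool, 1 = neutral, 2 = warm
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 1]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
print(errors_only_adjacent(y_true, y_pred))
```

The adjacency check matters for ordinal targets such as comfort votes: a model that confuses neighboring levels is making milder errors than one that confuses opposite ends of the scale, which plain accuracy does not show.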
24 pages, 1686 KiB  
Review
Data-Driven Predictive Modeling for Investigating the Impact of Gear Manufacturing Parameters on Noise Levels in Electric Vehicle Drivetrains
by Krisztián Horváth
World Electr. Veh. J. 2025, 16(8), 426; https://doi.org/10.3390/wevj16080426 - 30 Jul 2025
Viewed by 19
Abstract
Reducing gear noise in electric vehicle (EV) drivetrains is crucial because the absence of internal combustion engine noise makes even minor acoustic disturbances noticeable. Manufacturing parameters significantly influence gear-generated noise, yet traditional analytical methods often fail to predict these complex relationships accurately. This research addresses this gap by introducing a data-driven approach using machine learning (ML) to predict gear noise levels from manufacturing and sensor-derived data. The presented methodology encompasses systematic data collection from various production stages, including soft and hard machining, heat treatment, honing, rolling tests, and end-of-line (EOL) acoustic measurements. Predictive models employing Random Forest, Gradient Boosting (XGBoost), and Neural Network algorithms were developed and compared to traditional statistical approaches. The analysis identified critical manufacturing parameters, such as surface waviness, profile errors, and tooth geometry deviations, that significantly influence noise generation. Advanced ML models, specifically Random Forest, XGBoost, and deep neural networks, demonstrated superior prediction accuracy, providing early-stage identification of gear units likely to exceed acceptable noise thresholds. Integrating these data-driven models into manufacturing processes enables early detection of potential noise issues, reduces quality assurance costs, and supports sustainable manufacturing by minimizing prototype production and resource consumption. This research enhances the understanding of gear noise formation and offers practical solutions for real-time quality assurance.
21 pages, 14469 KiB  
Article
The Downscaled GOME-2 SIF Based on Machine Learning Enhances the Correlation with Ecosystem Productivity
by Chenyu Hu, Pinhua Xie, Zhaokun Hu, Ang Li and Haoxuan Feng
Remote Sens. 2025, 17(15), 2642; https://doi.org/10.3390/rs17152642 - 30 Jul 2025
Viewed by 32
Abstract
Sun-induced chlorophyll fluorescence (SIF) is an important indicator of vegetation photosynthesis. While remote sensing enables large-scale monitoring of SIF, existing products face the challenge of trade-offs between temporal and spatial resolutions, limiting their applications. To select the optimal model for SIF data downscaling, we used a consistent dataset combined with vegetation physiological and meteorological parameters to evaluate four different regression methods in this study. The XGBoost model demonstrated the best performance during cross-validation (R² = 0.84, RMSE = 0.137 mW/m²/nm/sr) and was, therefore, selected to downscale GOME-2 SIF data. The resulting high-resolution SIF product (HRSIF) has a temporal resolution of 8 days and a spatial resolution of 0.05° × 0.05°. The downscaled product shows high fidelity to the original coarse SIF data when aggregated (correlation = 0.76). The reliability of the product was ensured through cross-validation with ground-based and satellite observations. Moreover, the finer spatial resolution of HRSIF better matches the footprint of eddy covariance flux towers, leading to a significant improvement in the correlation with tower-based gross primary productivity (GPP). Specifically, in the mixed forest vegetation type with the best performance, the R² increased from 0.66 to 0.85, representing an increase of 28%. This higher-precision product will support more effective ecosystem monitoring and research.
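For readers who want to reproduce the kind of figures quoted above, the following is a minimal sketch of how R², RMSE, and a relative increase are typically computed. The function names and the toy observation/prediction values are assumptions for illustration and are not taken from the study; only the 0.66 and 0.85 values come from the abstract.

```python
import math

# Minimal sketch of the evaluation metrics named in the abstract.
# Function names and toy data are hypothetical, not the authors' code.

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def relative_increase_percent(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Toy SIF-like values (made up)
observed = [0.2, 0.5, 0.9, 1.4]
predicted = [0.25, 0.45, 1.0, 1.3]
print(r_squared(observed, predicted))
print(rmse(observed, predicted))

# R² values reported in the abstract for the mixed forest type
print(round(relative_increase_percent(0.66, 0.85), 1))
```

With the reported values, the relative increase works out to roughly 28.8%, which the abstract states as 28%.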