MDPI - Publisher of Open Access Journals

28 pages, 13851 KiB

Open Access Article

A Spatially Aware Machine Learning Method for Locating Electric Vehicle Charging Stations

by Yanyan Huang, Hangyi Ren, Xudong Jia, Xianyu Yu, Dong Xie, You Zou, Daoyuan Chen and Yi Yang

World Electr. Veh. J. 2025, 16(8), 445; https://doi.org/10.3390/wevj16080445 - 6 Aug 2025

Abstract

The rapid adoption of electric vehicles (EVs) has driven a strong need to optimize the locations of electric vehicle charging stations (EVCSs). Previous methods for locating EVCSs rely on statistical and optimization models, but these methods have limitations in capturing complex nonlinear relationships and spatial dependencies among the factors influencing EVCS locations. To address this research gap and better understand the spatial impacts of urban activities on EVCS placement, this study presents a spatially aware machine learning (SAML) method that combines a multi-layer perceptron (MLP) model with a spatial loss function to optimize EVCS sites. Additionally, the method uses the Shapley additive explanation (SHAP) technique to investigate nonlinear relationships embedded in EVCS placement. Using the city of Wuhan as a case study, the SAML method reveals that parking site (PS), road density (RD), population density (PD), and commercial residential (CR) areas are key factors in determining optimal EVCS sites. The SAML model classifies grid cells into no EVCS demand (0 EVCSs), low EVCS demand (1 to 3 EVCSs), and high EVCS demand (4+ EVCSs) classes. The model performs well in predicting EVCS demand. Findings from ablation tests also indicate that including spatial correlations in the model’s loss function significantly enhances the model’s performance. Additionally, case studies of other metropolitan cities validate the model’s effectiveness in predicting EVCSs.

(This article belongs to the Special Issue Fast-Charging Station for Electric Vehicles: Challenges and Issues)
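
SHAP attributions recur throughout these abstracts. The sketch below illustrates what a Shapley value measures. It computes exact Shapley values for a toy three-feature function by enumerating every feature coalition, and it replaces "absent" features with a single baseline vector. The baseline-replacement scheme, the helper `exact_shapley`, and the toy model are assumptions chosen for clarity. Practical SHAP tools approximate these values with tree- or kernel-based estimators, because full enumeration grows exponentially with the number of features. Nothing here reproduces any of the listed authors' pipelines.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of `model` at `x`, with absent features set to `baseline`.

    Hypothetical helper for illustration; it needs O(n * 2^n) model calls.
    """
    n = len(x)

    def coalition_value(members: set[int]) -> float:
        # Features in the coalition keep their real value; the rest take the baseline.
        z = [x[k] if k in members else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            for subset in combinations(others, size):
                members = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (coalition_value(members | {i}) - coalition_value(members))
    return phi


def toy_model(z: Sequence[float]) -> float:
    # A linear term plus a pairwise interaction between features 1 and 2.
    return 3.0 * z[0] + 2.0 * z[1] * z[2]


phi = exact_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The linear feature receives its full effect of 3. The interaction's contribution of 2 is split equally between features 1 and 2. The attributions sum to toy_model(x) - toy_model(baseline) = 5, which is the efficiency property that SHAP explanations rely on.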

27 pages, 4506 KiB

Open Access Feature Paper Article

Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data

by Yanhe Wang, Wei Wei, Zhuodong Liu, Jiahe Liu, Yinzhen Lv and Xiangyu Li

Mathematics 2025, 13(15), 2526; https://doi.org/10.3390/math13152526 - 6 Aug 2025

Abstract

High-dimensional prediction problems with complex non-linear feature interactions present significant algorithmic challenges in machine learning, particularly when dealing with imbalanced datasets and multicollinearity issues. This study proposes an innovative Shapley Additive Explanations (SHAP)-enhanced machine learning framework that integrates SHAP with advanced ensemble methods for interpretable financialization prediction. The methodology simultaneously addresses high-dimensional feature selection using 40 independent variables (19 CSR-related and 21 financialization-related), multicollinearity issues, and model interpretability requirements. Using a comprehensive dataset of 25,642 observations from 3776 Chinese A-share companies (2011–2022), we implement nine optimized machine learning algorithms with hyperparameter tuning via the Hippopotamus Optimization algorithm and five-fold cross-validation. XGBoost demonstrates superior performance with 99.34% explained variance, achieving an RMSE of 0.082 and R² of 0.299. SHAP analysis reveals non-linear U-shaped relationships between key predictors and financialization outcomes, with critical thresholds at approximately 10 for CSR_SocR, 1.5 for CSR_S, and 5 for CSR_CV. SOE status, EPU, ownership concentration, firm size, and housing prices emerge as the most influential predictors. Notable shifts in factor importance occur during the COVID-19 pandemic period (2020–2022). This work contributes a scalable, interpretable machine learning architecture for high-dimensional financial prediction problems, with applications in risk assessment, portfolio optimization, and regulatory monitoring systems.

(This article belongs to the Special Issue Advances in Artificial Intelligence, Machine Learning and Optimization)

25 pages, 4450 KiB

Open Access Article

Analyzing Retinal Vessel Morphology in MS Using Interpretable AI on Deep Learning-Segmented IR-SLO Images

by Asieh Soltanipour, Roya Arian, Ali Aghababaei, Fereshteh Ashtari, Yukun Zhou, Pearse A. Keane and Raheleh Kafieh

Bioengineering 2025, 12(8), 847; https://doi.org/10.3390/bioengineering12080847 - 6 Aug 2025

Abstract

Multiple sclerosis (MS), a chronic disease of the central nervous system, is known to cause structural and vascular changes in the retina. Although optical coherence tomography (OCT) and fundus photography can detect retinal thinning and circulatory abnormalities, these findings are not specific to MS. This study explores the potential of infrared scanning laser ophthalmoscopy (IR-SLO) imaging to uncover vascular morphological features that may serve as MS-specific biomarkers. Using an age-matched, subject-wise stratified k-fold cross-validation approach, a deep learning model originally designed for color fundus images was adapted to segment the optic disc, optic cup, and retinal vessels in IR-SLO images, achieving Dice coefficients of 91%, 94.5%, and 97%, respectively. This process included tailored pre- and post-processing steps to optimize segmentation accuracy. Clinically relevant features were then extracted. Statistical analyses followed by SHapley Additive exPlanations (SHAP) identified vessel fractal dimension, vessel density in zones B and C (circular regions extending 0.5–1 and 0.5–2 optic disc diameters from the optic disc margin, respectively), and vessel intensity and width as key differentiators between MS patients and healthy controls. These findings suggest that IR-SLO can non-invasively detect retinal vascular biomarkers that may serve as additional or alternative markers for MS diagnosis, complementing current invasive procedures.

(This article belongs to the Special Issue AI in OCT (Optical Coherence Tomography) Image Analysis)
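
The segmentation scores above are Dice coefficients. For a predicted mask A and a reference mask B, Dice = 2|A ∩ B| / (|A| + |B|). Below is a minimal sketch on flattened binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the study.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (1 = foreground)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One of two predicted foreground pixels overlaps the single true pixel: 2 * 1 / (2 + 1).
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```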

23 pages, 3831 KiB

Open Access Article

Estimating Planetary Boundary Layer Height over Central Amazonia Using Random Forest

by Paulo Renato P. Silva, Rayonil G. Carneiro, Alison O. Moraes, Cleo Quaresma Dias-Junior and Gilberto Fisch

Atmosphere 2025, 16(8), 941; https://doi.org/10.3390/atmos16080941 - 5 Aug 2025

Abstract

The planetary boundary layer height (PBLH) is a key metric for air quality, weather forecasting, and climate modeling. This study investigates the use of Random Forest (RF), an artificial intelligence (AI) model, to estimate PBLH over Central Amazonia from data on climatic elements collected during the GoAmazon experiment, held in 2014 and 2015. The novelty of this study lies in estimating PBLH using only surface-based meteorological observations. This approach is validated against remote sensing measurements (e.g., LIDAR, ceilometer, and wind profilers), which are seldom available in the Amazon region. The dataset includes various meteorological features, though substantial missing data for the latent heat flux (LE) and net radiation (Rn) measurements posed challenges. We addressed these gaps through different data-cleaning strategies, such as feature exclusion, row removal, and imputation techniques, assessing their impact on model performance using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and r² metrics. The best-performing strategy achieved an RMSE of 375.9 m. We also benchmarked the RF model against Linear Regression, Support Vector Regression, LightGBM, XGBoost, and a Deep Neural Network. While all models showed moderate correlation with observed PBLH, the RF model outperformed all others, with statistically significant differences confirmed by paired t-tests. SHAP (SHapley Additive exPlanations) values were used to enhance model interpretability, revealing hour of the day, air temperature, and relative humidity as the most influential predictors of PBLH and underscoring their critical role in atmospheric dynamics in Central Amazonia. Despite these optimizations, the model underestimates PBLH by an average of 197 m, particularly during the austral spring and early summer, when atmospheric conditions are more variable. These findings emphasize the importance of robust data preprocessing and highlight the potential of ML models for improving PBLH estimation in data-scarce tropical environments.

(This article belongs to the Special Issue Applications of Artificial Intelligence in Atmospheric Sciences)
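
The PBLH abstract above reports RMSE, MAE, r², a mean underestimate, and paired t-tests. The sketch below shows one common way to compute these quantities in plain Python. Two details are assumptions, since the abstract does not specify either choice:

- r² is taken as the squared Pearson correlation. Some studies instead report the coefficient of determination R².
- The paired t statistic is computed on per-sample differences, for example the absolute errors of two competing models on the same observations.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def regression_metrics(observed: Sequence[float], predicted: Sequence[float]) -> dict[str, float]:
    """RMSE, MAE, mean bias (predicted - observed), and squared Pearson correlation."""
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(mean(e * e for e in errors))
    mae = mean(abs(e) for e in errors)
    bias = mean(errors)  # a negative value indicates systematic underestimation
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r2 = cov * cov / (var_o * var_p)
    return {"rmse": rmse, "mae": mae, "bias": bias, "r2": r2}


def paired_t_statistic(differences: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d."""
    return mean(differences) / (stdev(differences) / math.sqrt(len(differences)))


metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
t_stat = paired_t_statistic([1.0, 2.0, 3.0])
```

Converting the t statistic into a p-value requires the t distribution with n - 1 degrees of freedom. A statistics library would normally supply that step, and it is omitted here.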

17 pages, 6884 KiB

Open Access Article

An Interpretable XGBoost Framework for Predicting Oxide Glass Density

by Pawel Stoch

Appl. Sci. 2025, 15(15), 8680; https://doi.org/10.3390/app15158680 - 5 Aug 2025

Abstract

Accurately predicting glass density is crucial for designing novel materials. This study aims to develop a robust predictive model for the density of oxide glasses and, more importantly, to investigate how physically informed feature engineering can create accurate and interpretable models that reveal underlying physical principles. Using a dataset of 76,593 oxide glasses from the SciGlass database, three machine learning (ML) models (ElasticNet, XGBoost, MLP) were trained and evaluated. Four distinct feature sets were constructed with increasing physical complexity, ranging from simple elemental composition to the advanced Magpie descriptors. The best model was further analyzed for interpretability using feature importance and SHapley Additive exPlanations (SHAP) analysis. A clear hierarchical improvement in predictive accuracy was observed with increasing feature sophistication across all models. The XGBoost model combined with the Magpie feature set provided the best performance, achieving a coefficient of determination (R²) of 0.97. Interpretability analysis revealed that the model’s predictions were overwhelmingly driven by physical attributes, with mean atomic weight being the most influential predictor. The model learns to approximate the fundamental density equation using mean atomic weight as a proxy for molar mass and electronic structure features to estimate molar volume. This demonstrates that a data-driven approach can function as a scientifically valid and interpretable tool, accelerating the discovery of new materials.
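
The claim that the model approximates the density equation, with mean atomic weight as a proxy for molar mass, can be made concrete with a standard identity. This is our illustration, not a formula quoted from the article. Consider a glass with oxide mole fractions x_i, oxide molar masses M_i, and n_i atoms per oxide formula unit. Let V_m be the molar volume and n̄ = Σ x_i n_i the mean number of atoms per formula unit. Define the mean atomic weight as Ā = M / n̄ and the molar volume per mole of atoms as v̄ = V_m / n̄. Then:

```latex
\rho = \frac{M}{V_m}, \qquad
M = \sum_i x_i M_i = \bar{n}\,\bar{A}, \qquad
V_m = \bar{n}\,\bar{v}
\quad\Longrightarrow\quad
\rho = \frac{\bar{A}}{\bar{v}}
```

Density therefore factors into a mass term set by Ā and a packing term v̄. A model that has learned Ā well only needs the remaining descriptors to estimate v̄.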

28 pages, 4243 KiB

Open Access Article

Electric Bus Battery Energy Consumption Estimation and Influencing Features Analysis Using a Two-Layer Stacking Framework with SHAP-Based Interpretation

by Runze Liu, Jianming Cai, Lipeng Hu, Benxiao Lou and Jinjun Tang

Sustainability 2025, 17(15), 7105; https://doi.org/10.3390/su17157105 - 5 Aug 2025

Abstract

The widespread adoption of electric buses represents a major step forward in sustainable transportation but also brings new operational challenges, particularly in improving efficiency and controlling costs. Battery energy consumption management is therefore a key approach for addressing these issues. Accurate prediction of energy consumption and interpretation of the influencing factors are essential for improving operational efficiency, optimizing energy use, and reducing operating costs. Although existing studies have made progress in battery energy consumption prediction, challenges remain in achieving high-precision modeling and conducting a comprehensive analysis of the influencing features. To address these gaps, this study proposes a two-layer stacking framework for estimating the energy consumption of electric buses. The first layer integrates the strengths of three nonlinear regression models to enhance the modeling capacity for complex feature relationships: RF (Random Forest), GBDT (Gradient Boosted Decision Trees), and CatBoost (Categorical Boosting). The second layer employs a Linear Regression model as a meta-learner to aggregate the predictions of the base models and improve overall predictive performance. The framework is trained on 2023 operational data from two electric bus routes (No. 355 and No. W188) in Changsha, China, using battery system parameters, driving characteristics, and environmental variables as independent variables. Comparative experiments with various ensemble models demonstrate that the proposed stacking framework exhibits superior performance in data fitting. Furthermore, XGBoost (Extreme Gradient Boosting) is introduced as a surrogate model to approximate the decision logic of the stacking framework, enabling SHAP (SHapley Additive exPlanations) analysis to quantify the contributions and marginal effects of influencing features.
The proposed stacked and surrogate models achieved superior battery energy consumption prediction accuracy (lowest MSE, RMSE, and MAE), significantly outperforming benchmark models on real-world datasets. SHAP analysis quantified the overall contributions of feature categories (battery operation parameters: 56.5%; driving characteristics: 42.3%; environmental data: 1.2%), further revealing the specific contributions and nonlinear influence mechanisms of individual features. These quantitative findings offer specific guidance for optimizing battery system control and driving behavior.

(This article belongs to the Section Sustainable Transportation)
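
The two-layer stacking idea can be sketched with scikit-learn: tree ensembles as base learners, a linear meta-learner, and a separate surrogate model for SHAP. This is an illustration on synthetic data with several substitutions. CatBoost is omitted, and scikit-learn's `GradientBoostingRegressor` stands in for both the GBDT base learner and the XGBoost surrogate. None of the settings reflect the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for operational records (illustrative only, not bus data).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + X[:, 2] * X[:, 3] + rng.normal(scale=0.1, size=600)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Layer 1: nonlinear tree ensembles. Layer 2: a linear meta-learner fitted on
# out-of-fold base-model predictions (cv=5).
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("gbdt", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
stack_r2 = r2_score(y_test, stack.predict(X_test))

# Surrogate: a single boosted model trained to mimic the stack's outputs, so that
# attribution tools can be applied to it instead of to the whole stack.
surrogate = GradientBoostingRegressor(random_state=0)
surrogate.fit(X_train, stack.predict(X_train))
fidelity_r2 = r2_score(stack.predict(X_test), surrogate.predict(X_test))
print(f"stack R2 on held-out data: {stack_r2:.3f}; surrogate fidelity R2: {fidelity_r2:.3f}")
```

The surrogate is fitted to the stack's predictions rather than to the observed target. That is what lets attributions computed on the surrogate describe the stack's behavior. A low fidelity score would make those attributions unreliable.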

9 pages, 1436 KiB

Open Access Proceeding Paper

Insights into Air Quality Index (AQI) Variability with Explainable Machine Learning Techniques

by Claudio Andenna and Roberta Valentina Gagliardi

Environ. Earth Sci. Proc. 2025, 34(1), 1; https://doi.org/10.3390/eesp2025034001 - 5 Aug 2025

Abstract

In this study, the Extreme Gradient Boosting (XGBoost) machine learning model is combined with SHapley Additive exPlanations (SHAP) to simulate the temporal pattern of the air quality index (AQI) and then explore the key factors affecting AQI variability. Applied to air pollutant and meteorological data acquired from two air quality monitoring stations in Rome (Italy) over the 2018–2022 period, the approach proves effective in elucidating the roles of the main factors driving AQI evolution and their interaction effects.

(This article belongs to the Proceedings of The 7th International Electronic Conference on Atmospheric Sciences (ECAS-7))

19 pages, 2795 KiB

Open Access Article

State Analysis of Grouped Smart Meters Driven by Interpretable Random Forest

by Zhongdong Wang, Zhengbo Zhang, Weijiang Wu, Zhen Zhang, Xiaolin Xu and Hongbin Li

Electronics 2025, 14(15), 3105; https://doi.org/10.3390/electronics14153105 - 4 Aug 2025

Abstract

Accurate evaluation of the operational status of smart meters, as the critical interface between the power grid and its users, is essential for ensuring fairness in power transactions. This highlights the importance of implementing rotation management practices based on meter status. However, the traditional expiration-based rotation method has become inadequate due to the extended service life of modern smart meters, necessitating a shift toward status-driven targeted management. Existing multifactor comprehensive assessment methods often face challenges in balancing accuracy and interpretability. To address these limitations, this study proposes a novel method for analyzing the status of smart meter groups using an interpretable random forest model. The approach incorporates an expert-knowledge-guided grouping assessment strategy, develops a multi-source heterogeneous feature set with strong correlations to meter status, and enhances the random forest model with the SHAP (SHapley Additive exPlanations) interpretability framework. Compared to conventional methods, the proposed approach demonstrates superior efficiency and reliability in predicting the failure rates of smart meter groups within distribution network areas, offering robust support for the maintenance and management of smart meters.

(This article belongs to the Special Issue Advances in Condition Monitoring, Diagnosis, and Prognostics for Power Equipment)

44 pages, 6212 KiB

Open Access Article

A Hybrid Deep Reinforcement Learning Architecture for Optimizing Concrete Mix Design Through Precision Strength Prediction

by Ali Mirzaei and Amir Aghsami

Math. Comput. Appl. 2025, 30(4), 83; https://doi.org/10.3390/mca30040083 - 3 Aug 2025

Abstract

Concrete mix design plays a pivotal role in ensuring the mechanical performance, durability, and sustainability of construction projects. However, the nonlinear interactions among the mix components challenge traditional approaches in predicting compressive strength and optimizing proportions. This study presents a two-stage hybrid framework that integrates deep learning with reinforcement learning to overcome these limitations. First, a Convolutional Neural Network–Long Short-Term Memory (CNN–LSTM) model was developed to capture spatial–temporal patterns from a dataset of 1030 historical concrete samples. The extracted features were enhanced using an eXtreme Gradient Boosting (XGBoost) meta-model to improve generalizability and noise resistance. Then, a Dueling Double Deep Q-Network (Dueling DDQN) agent was used to iteratively identify optimal mix ratios that maximize the predicted compressive strength. The proposed framework outperformed ten benchmark models, achieving an MAE of 2.97, RMSE of 4.08, and R² of 0.94. Feature attribution methods, including SHapley Additive exPlanations (SHAP), Elasticity-Based Feature Importance (EFI), and Permutation Feature Importance (PFI), highlighted the dominant influence of cement content and curing age and revealed non-intuitive effects such as the compensatory role of superplasticizers in low-water mixtures. These findings demonstrate the potential of the proposed approach to support intelligent concrete mix design and real-time optimization in smart construction environments.

(This article belongs to the Section Engineering)
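
The Dueling Double DQN agent mentioned above rests on two standard ingredients, sketched here in NumPy:

- The dueling head combines a state value V(s) with mean-centred advantages A(s, a) to produce Q-values.
- The Double DQN target lets the online network select the next action and the target network evaluate it.

The networks themselves, the state encoding of mix proportions, and the reward (predicted strength) are omitted. The function names and array shapes are assumptions for illustration.

```python
import numpy as np


def dueling_q_values(value: np.ndarray, advantage: np.ndarray) -> np.ndarray:
    """Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').

    value: shape (batch,); advantage: shape (batch, n_actions).
    """
    return value[:, None] + advantage - advantage.mean(axis=1, keepdims=True)


def double_dqn_targets(
    reward: np.ndarray,
    done: np.ndarray,
    q_online_next: np.ndarray,
    q_target_next: np.ndarray,
    gamma: float = 0.99,
) -> np.ndarray:
    """y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a))."""
    best_actions = q_online_next.argmax(axis=1)  # action selection: online network
    evaluated = q_target_next[np.arange(best_actions.size), best_actions]  # evaluation: target network
    return reward + gamma * (1.0 - done) * evaluated


q = dueling_q_values(np.array([1.0]), np.array([[1.0, 2.0, 3.0]]))
y = double_dqn_targets(
    reward=np.array([1.0, 1.0]),
    done=np.array([0.0, 1.0]),
    q_online_next=np.array([[0.1, 0.5, 0.2], [0.1, 0.5, 0.2]]),
    q_target_next=np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]),
    gamma=0.9,
)
```

Plain DQN both selects and evaluates the next action with the maximum over a single network's estimates. Decoupling the two roles reduces the overestimation bias that this maximum introduces.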

22 pages, 4943 KiB

Open Access Article

Predicting De-Handing Point in Bananas Using Crown Morphology and Interpretable Machine Learning

by Lei Zhao, Zhou Yang, Chunxia Wang, Mohui Jin and Jieli Duan

Agronomy 2025, 15(8), 1880; https://doi.org/10.3390/agronomy15081880 - 3 Aug 2025

Abstract

Banana de-handing is a critical yet labor-intensive step in postharvest processing, with current manual methods resulting in high costs and occupational risks. This study addresses the automation of de-handing point localization by integrating high-resolution 3D scanning and morphometric analysis of banana crowns with machine learning techniques. A total of 210 crown samples were analyzed to extract key morphological features, including inner arc length (L_i), inner arc radius (R_i), outer arc radius (R_o), and the distance between inner and outer arcs (D_oi), among others. Four machine learning algorithms, namely Multi-Layer Perceptron (MLP), Gradient Boosted Decision Trees (GBDT), Extreme Gradient Boosting (XGBoost), and Random Forest (RF), were developed to predict the target radius (R_t) and target distance (D_ti) of the de-handing point. The RF models achieved the best predictive performance on the testing set: for R_t, R² = 0.95, MAE = 1.50, and RMSE = 1.94; for D_ti, R² = 0.91, MAE = 1.33, and RMSE = 1.66. A Shapley Additive Explanations (SHAP) analysis revealed that L_i, R_i, and R_o were the most influential features for R_t, while D_oi was the most important for D_ti. Notably, feature threshold effects were observed, with limited gains in prediction accuracy beyond specific morphological values. These results provide a quantitative foundation for vision-guided automated de-handing systems, advancing intelligent and efficient banana postharvest management.

(This article belongs to the Section Precision and Digital Agriculture)

24 pages, 1964 KiB

Open Access Article

Data-Driven Symmetry and Asymmetry Investigation of Vehicle Emissions Using Machine Learning: A Case Study in Spain

by Fei Wu, Jinfu Zhu, Hufang Yang, Xiang He and Qiao Peng

Symmetry 2025, 17(8), 1223; https://doi.org/10.3390/sym17081223 - 2 Aug 2025

Abstract

Understanding vehicle emissions is essential for developing effective carbon reduction strategies in the transport sector. Conventional emission models often assume homogeneity and linearity, overlooking real-world asymmetries that arise from variations in vehicle design and powertrain configurations. This study explores how machine learning and explainable AI techniques can effectively capture both symmetric and asymmetric emission patterns across different vehicle types, thereby contributing to more sustainable transport planning. Addressing a key gap in the existing literature, the study poses the following question: how do structural and behavioral factors contribute to asymmetric emission responses in internal combustion engine vehicles compared to new energy vehicles? Utilizing a large-scale Spanish vehicle registration dataset, the analysis classifies vehicles by powertrain type and applies five supervised learning algorithms to predict CO₂ emissions. SHapley Additive exPlanations (SHAP) is employed to identify nonlinear and threshold-based relationships between emissions and vehicle characteristics such as fuel consumption, weight, and height. Among the models tested, the Random Forest algorithm achieves the highest predictive accuracy. The findings reveal critical asymmetries in emission behavior, particularly among hybrid vehicles, which challenge the assumption of uniform policy applicability. This study provides both methodological innovation and practical insights for symmetry-aware emission modeling, offering support for more targeted eco-design and policy decisions that align with long-term sustainability goals.

(This article belongs to the Section Engineering and Materials)

23 pages, 3427 KiB

Open Access Article

Visual Narratives and Digital Engagement: Decoding Seoul and Tokyo’s Tourism Identity Through Instagram Analytics

by Seung Chul Yoo and Seung Mi Kang

Tour. Hosp. 2025, 6(3), 149; https://doi.org/10.3390/tourhosp6030149 - 1 Aug 2025

Abstract

Social media platforms like Instagram significantly shape destination images and influence tourist behavior. Understanding how different cities are represented and perceived on these platforms is crucial for effective tourism marketing. This study provides a comparative analysis of Instagram content and engagement patterns in Seoul and Tokyo, two major Asian metropolises, to derive actionable marketing insights. We collected and analyzed 59,944 public Instagram posts geotagged or location-tagged within Seoul (n = 29,985) and Tokyo (n = 29,959). We employed a mixed-methods approach involving content categorization using a fine-tuned convolutional neural network (CNN) model, engagement metric analysis (likes, comments), Valence Aware Dictionary and sEntiment Reasoner (VADER) sentiment analysis and thematic classification of comments, geospatial analysis (Kernel Density Estimation [KDE], Moran’s I), and predictive modeling (Gradient Boosting with SHapley Additive exPlanations [SHAP] value analysis). A validation analysis using balanced samples (n = 2000 each) was conducted to address Tokyo’s lower geotagged data proportion. While both cities showed ‘Person’ as the dominant content category, notable differences emerged. Tokyo exhibited higher like-based engagement across categories, particularly for ‘Animal’ and ‘Food’ content, while Seoul generated slightly more comments, often expressing stronger sentiment. Qualitative comment analysis revealed Seoul comments focused more on emotional reactions, whereas Tokyo comments were often shorter, appreciative remarks. Geospatial analysis identified distinct hotspots. The validation analysis confirmed these spatial patterns despite Tokyo’s data limitations. Predictive modeling highlighted hashtag counts as the key engagement driver in Seoul and the presence of people in Tokyo. Seoul and Tokyo project distinct visual narratives and elicit different engagement patterns on Instagram. 
These findings offer practical implications for destination marketers, suggesting tailored content strategies and location-based campaigns targeting identified hotspots and specific content themes. This study underscores the value of integrating quantitative and qualitative analyses of social media data for nuanced destination marketing insights.

(This article belongs to the Special Issue Data-Driven Insights in Tourism and Hospitality: Smart Technologies and Data Science)
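
Global Moran's I, listed above among the geospatial methods, measures whether similar values cluster in space. It is defined as I = (n / W) · Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z_i are deviations from the mean and W is the sum of all spatial weights w_ij. Below is a minimal NumPy sketch. The weight matrix, a simple chain of four adjacent cells, is an assumption. Real analyses would build weights from the geotagged post locations.

```python
import numpy as np


def morans_i(values, weights) -> float:
    """Global Moran's I for values observed at n locations and an n x n weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    n = x.size
    # z @ w @ z equals the double sum of w_ij * z_i * z_j.
    return float((n / w.sum()) * (z @ w @ z) / (z @ z))


# Four cells in a row, each adjacent to its immediate neighbours (binary, symmetric weights).
chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
trend = morans_i([1, 2, 3, 4], chain)        # smooth gradient: positive autocorrelation
alternating = morans_i([1, 0, 1, 0], chain)  # checkerboard: negative autocorrelation
```

Under spatial randomness the expected value of I is -1/(n - 1). Values well above it indicate clustering of similar values, and values below it indicate a checkerboard-like pattern.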

22 pages, 2120 KiB

Open Access Article

Machine Learning Algorithms and Explainable Artificial Intelligence for Property Valuation

by Gabriella Maselli and Antonio Nesticò

Real Estate 2025, 2(3), 12; https://doi.org/10.3390/realestate2030012 - 1 Aug 2025

Abstract

The accurate estimation of urban property values is a key challenge for appraisers, market participants, financial institutions, and urban planners. In recent years, machine learning (ML) techniques have emerged as promising tools for price forecasting due to their ability to model complex relationships among variables. However, their application raises two critical issues: (i) the risk of overfitting, especially with small or noisy datasets; and (ii) the interpretive issues associated with the “black box” nature of many models. Within this framework, this paper proposes a methodological approach that addresses both issues by comparing the predictive performance of three ML algorithms applied to the housing market in the city of Salerno, Italy: k-Nearest Neighbors (kNN), Random Forest (RF), and an Artificial Neural Network (ANN). For each model, overfitting is first assessed to ensure predictive robustness. The results are then interpreted using explainability techniques such as SHapley Additive exPlanations (SHAP) and Permutation Feature Importance (PFI). This analysis reveals that Random Forest offers the best balance between predictive accuracy and transparency, with features such as area and proximity to the train station identified as the main drivers of property prices. kNN and the ANN are viable alternatives that are particularly robust in terms of generalization. The results demonstrate how the defined methodological framework balances predictive effectiveness and interpretability, supporting the informed and transparent use of ML in real estate valuation.

(This article belongs to the Topic Improving Nature-Smart Policies through Innovative Resilient Evaluations)
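
Permutation Feature Importance (PFI), used above alongside SHAP, scores a feature by how much a fitted model's error grows when that feature's column is randomly shuffled, breaking its link to the target. The NumPy sketch below assumes mean squared error as the score and uses a hypothetical `predict` callable. scikit-learn provides a maintained implementation as `sklearn.inspection.permutation_importance`.

```python
from typing import Callable

import numpy as np


def permutation_importance_mse(
    predict: Callable[[np.ndarray], np.ndarray],
    X: np.ndarray,
    y: np.ndarray,
    n_repeats: int = 10,
    seed: int = 0,
) -> np.ndarray:
    """Mean increase in MSE when each column of X is shuffled independently."""
    rng = np.random.default_rng(seed)
    baseline_mse = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        shuffled_mse = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j's link to y
            shuffled_mse.append(np.mean((predict(X_perm) - y) ** 2))
        importances[j] = np.mean(shuffled_mse) - baseline_mse
    return importances


# Toy "fitted model": feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))


def toy_predict(data: np.ndarray) -> np.ndarray:
    return 3.0 * data[:, 0] + 0.5 * data[:, 1]


y = toy_predict(X)
scores = permutation_importance_mse(toy_predict, X, y)
```

Because PFI only queries predictions, it applies equally to kNN, Random Forest, or a neural network. The importances should be computed on held-out data if they are meant to reflect generalization rather than fit.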

17 pages, 1584 KiB

Open Access Article

What Determines Carbon Emissions of Multimodal Travel? Insights from Interpretable Machine Learning on Mobility Trajectory Data

by Guo Wang, Shu Wang, Wenxiang Li and Hongtai Yang

Sustainability 2025, 17(15), 6983; https://doi.org/10.3390/su17156983 - 31 Jul 2025

Abstract

Understanding the carbon emissions of multimodal travel, comprising walking, metro, bus, cycling, and ride-hailing, is essential for promoting sustainable urban mobility. However, most existing studies focus on single-mode travel, and the underlying spatiotemporal and behavioral determinants remain insufficiently explored owing to the lack of fine-grained data and interpretable analytical frameworks. This study proposes a novel integration of high-frequency, real-world mobility trajectory data with interpretable machine learning to systematically identify the key drivers of carbon emissions at the individual trip level. First, multimodal travel chains are reconstructed from continuous GPS trajectory data collected in Beijing. Second, a model based on COPERT (COmputer Programme to calculate Emissions from Road Transport) is developed to quantify trip-level CO₂ emissions. Third, four interpretable gradient boosting models (XGBoost, GBDT, LightGBM, and CatBoost) are trained on transportation and built environment features to model the relationship between trip-level CO₂ emissions and these explanatory variables. Finally, Shapley Additive exPlanations (SHAP) and partial dependence plots (PDPs) are used to interpret the model outputs, revealing the key determinants and their non-linear interaction effects. The results show that transportation-related features account for 75.1% of the explained variance in emissions, with bus usage being the most influential single factor (contributing 22.6%), while built environment features explain the remaining 24.9%. The PDP analysis reveals that substantial emission reductions occur only when the shares of bus, metro, and cycling exceed threshold levels of approximately 40%, 40%, and 30%, respectively. In addition, travel carbon emissions are minimized when trip origins and destinations lie approximately 10 to 11 km from the central business district (CBD).
This study advances the field by establishing a scalable, interpretable, and behaviorally grounded framework to assess carbon emissions from multimodal travel, providing actionable insights for low-carbon transport planning and policy design.
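A partial dependence plot, the tool used above to locate the mode-share thresholds, averages a fitted model's predictions over the observed data while one feature is forced to each value on a grid. The following is a minimal sketch of that computation. It assumes a hypothetical emission model with a built-in kink at a 40% bus share, chosen purely for illustration. This is not the authors' gradient boosting model, the trips are invented, and the numbers it prints are not results from the study.

```python
from typing import Callable, List, Sequence

Row = List[float]

def partial_dependence(
    predict: Callable[[Row], float],
    X: List[Row],
    feature: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over all rows with column `feature` set to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[feature] = value  # force this feature; keep the others as observed
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve

# Hypothetical trips: [bus_share, metro_share, trip_km]
X = [[0.1, 0.5, 8.0], [0.3, 0.2, 12.0], [0.6, 0.1, 5.0], [0.0, 0.7, 20.0]]

def toy_emission_model(row: Row) -> float:
    # Hypothetical kg CO2 per trip: the reduction steepens only once bus share passes 0.4
    bus, metro, km = row
    reduction = 0.05 * bus + 0.8 * max(0.0, bus - 0.4)
    return km * 0.12 * (1.0 - reduction) * (1.0 - 0.3 * metro)

grid = [0.0, 0.2, 0.4, 0.6, 0.8]
for g, v in zip(grid, partial_dependence(toy_emission_model, X, 0, grid)):
    print(f"bus_share={g:.1f}  mean predicted CO2={v:.3f} kg")
```

The resulting curve is nearly flat up to a 0.4 bus share and then drops sharply. A threshold effect like the ones reported in the abstract would appear in a PDP in this form. With a tree ensemble, `predict` would simply wrap the fitted model.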

(This article belongs to the Special Issue Sustainable Transportation Systems and Travel Behaviors)

35 pages, 3218 KiB

Open Access Article

Integrated GBR–NSGA-II Optimization Framework for Sustainable Utilization of Steel Slag in Road Base Layers

by Merve Akbas

Appl. Sci. 2025, 15(15), 8516; https://doi.org/10.3390/app15158516 (registering DOI) - 31 Jul 2025

Abstract

This study proposes an integrated, machine learning-based multi-objective optimization framework to evaluate and optimize the use of steel slag in road base layers, addressing economic costs and environmental impacts simultaneously. A dataset of 482 scenarios was engineered from literature-informed parameters, encompassing transport distance, processing energy intensity, initial moisture content, gradation adjustments, and regional electricity emission factors. Four tree-based ensemble regression algorithms were evaluated: Random Forest Regressor (RFR), Extremely Randomized Trees (ERTs), Gradient Boosted Regressor (GBR), and Extreme Gradient Boosting Regressor (XGBR). Among these, GBR demonstrated superior predictive performance (R² > 0.95, RMSE < 7.5), capturing the complex nonlinear interactions inherent in slag processing and logistics operations. Feature importance analysis via SHapley Additive exPlanations (SHAP) showed that transport distance and energy intensity are the dominant factors affecting unit cost, while moisture content and the grid emission factor predominantly influence CO₂ emissions. The GBR model was then integrated into a Non-Dominated Sorting Genetic Algorithm II (NSGA-II) framework to explore optimal trade-offs between cost and emissions. The resulting Pareto front revealed a diverse solution space with significant nonlinear trade-offs between economic efficiency and environmental performance, including distinct inflection points. To support decision-making, the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) was applied, identifying a balanced solution characterized by a transport distance of 47 km, an energy intensity of 1.21 kWh/ton, a moisture content of 6.2%, a moderate gradation adjustment, and a grid CO₂ factor of 0.47 kg CO₂/kWh.
This scenario offered a substantial reduction (45%) in CO₂ emissions relative to cost-minimized solutions, with a moderate increase (33%) in total cost, presenting a realistic and balanced pathway for sustainable infrastructure practices. Overall, this study introduces a robust, scalable, and interpretable optimization framework, providing methodological advances for sustainable decision-making in infrastructure planning and circular economy initiatives.
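The final selection step described above, in which TOPSIS picks one compromise point from the NSGA-II Pareto front, can be sketched as follows. This is a minimal illustration under stated assumptions: a handful of hypothetical (cost, CO₂) candidates, equal criterion weights, and vector normalization. It covers only the non-dominated filter and the TOPSIS ranking. It does not include the full NSGA-II evolutionary loop or the author's GBR surrogate.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (unit cost, CO2 emissions); both are minimized

def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def topsis(alternatives: Sequence[Point], weights: Sequence[float]) -> List[float]:
    """Closeness coefficient in [0, 1] per alternative, treating every criterion as a cost."""
    n_crit = len(weights)
    norms = [math.sqrt(sum(a[j] ** 2 for a in alternatives)) for j in range(n_crit)]
    # Weighted, vector-normalized decision matrix
    v = [[weights[j] * a[j] / norms[j] for j in range(n_crit)] for a in alternatives]
    ideal = [min(row[j] for row in v) for j in range(n_crit)]  # lowest value is best
    anti_ideal = [max(row[j] for row in v) for j in range(n_crit)]
    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)        # distance to the ideal point
        d_minus = math.dist(row, anti_ideal)  # distance to the anti-ideal point
        scores.append(d_minus / (d_plus + d_minus))
    return scores

# Hypothetical candidate solutions: (cost per ton, kg CO2 per ton)
candidates = [(100.0, 80.0), (110.0, 60.0), (125.0, 45.0),
              (150.0, 40.0), (130.0, 70.0), (105.0, 85.0)]
front = pareto_front(candidates)
scores = topsis(front, weights=[0.5, 0.5])
best = front[scores.index(max(scores))]
print("Pareto front:", front)
print("TOPSIS choice:", best)
```

Equal weights encode no preference between cost and emissions. Shifting weight toward cost moves the selected point toward the cost-minimized end of the front, which is the kind of trade-off the abstract quantifies (45% less CO₂ for a 33% higher cost).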

(This article belongs to the Special Issue Advanced Technologies and Optimization for Sustainable Geotechnical Engineering)
