Search Results (1,069)

Search Parameters:
Keywords = random aggregate

21 pages, 6016 KB  
Article
Statistical Learning Improves Classification of Limestone Provenance
by Rok Brajkovič and Klemen Koselj
Heritage 2025, 8(11), 464; https://doi.org/10.3390/heritage8110464 - 6 Nov 2025
Abstract
Determining the lithostratigraphic provenance of limestone artefacts is challenging. We addressed the issue by analysing Roman stone artefacts, where previously traditional petrological methods failed to identify the provenance of 72% of the products due to the predominance of micrite limestone. We applied statistical classification methods to 15 artefacts using linear discriminant analysis, decision trees, random forest, and support vector machines. The latter achieved the highest accuracy, with 73% of the samples classified to the same stratigraphic member as determined by the expert. We improved classification reliability and evaluated it by aggregating the results of different classifiers for each stone product. Combining aggregated results with additional evidence from paleontological data or precise optical microscopy leads to successful provenance determination. After a few samples were reassigned in this procedure, a support vector machine correctly classified 87% of the samples. Strontium isotope ratios (87Sr/86Sr) proved particularly effective as provenance indicators. We successfully assigned all stone products to local sources across four lithostratigraphic members, thereby confirming local patterns of stone use by Romans. We provide guidance for future use of statistical learning in provenance determination. Our integrated approach, combining geological and statistical expertise, provides a robust framework for challenging provenance determination. Full article
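By way of illustration, the vote-aggregation step can be sketched as below: the four named classifier families are fit on a reference set of known members, and their per-artefact predictions are tallied so that split votes flag samples needing corroborating paleontological or microscopy evidence. The feature matrix, class count, and agreement report are synthetic placeholders, not the authors' measurements or code.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(120, 6))      # reference samples from known members
y_ref = rng.integers(0, 4, size=120)   # four lithostratigraphic members
X_art = rng.normal(size=(15, 6))       # 15 artefacts of unknown provenance

classifiers = {
    "lda": LinearDiscriminantAnalysis(),
    "tree": DecisionTreeClassifier(random_state=0),
    "rf": RandomForestClassifier(n_estimators=300, random_state=0),
    "svm": SVC(kernel="rbf", C=1.0),
}
votes = np.column_stack([
    clf.fit(X_ref, y_ref).predict(X_art) for clf in classifiers.values()
])

# Aggregate the per-artefact votes: a unanimous or near-unanimous assignment is
# treated as reliable; split votes flag samples that need extra evidence.
for i, row in enumerate(votes):
    members, counts = np.unique(row, return_counts=True)
    best = members[np.argmax(counts)]
    print(f"artefact {i}: member {best} ({counts.max()}/4 classifiers agree)")
```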

25 pages, 9622 KB  
Article
Prediction of Compressive Strength of Concrete Using Explainable Machine Learning Models
by Hainan Fu, Xiong Zhou, Pengfei Xu and Dandan Sun
Materials 2025, 18(21), 5009; https://doi.org/10.3390/ma18215009 - 3 Nov 2025
Viewed by 341
Abstract
Predicting the compressive strength of concrete is essential for engineering design and quality assurance. Traditional empirical formulas often fall short in capturing complex multi-factor interactions and nonlinear relationships. This study employs an interpretable machine learning framework using Gradient Boosting Trees, Random Forest, and Backpropagation Neural Networks to predict concrete compressive strength. Bayesian optimization was employed for hyperparameter tuning, and SHAP analysis was used to quantify feature contributions. Based on 223 sets of compression test data, this study systematically compared the predictive performance of five models. Results demonstrate that the CatBoost model achieved the best results, with an R2 of 0.9388, an RMSE of 2.7131 MPa, and a MAPE of 5.45%, outperforming the other models. SHAP analysis indicated that cement content had the greatest impact on strength, followed by water content, water reducer, fly ash, and aggregates, with notable interactive effects between factors. Compared to the empirical formula in the current industry standard Specification for Mix Proportion Design of Ordinary Concrete, the CatBoost model showed higher accuracy under specific raw material and curing conditions, with MAPE values of 2.94% and 5.96%, respectively. The optimized CatBoost model, combined with interpretability analysis, offers a data-driven tool for concrete mix optimization, balancing high precision with practical engineering applicability. Full article
(This article belongs to the Section Construction and Building Materials)
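The tune-then-explain loop reads, in outline, like the hedged sketch below; it pairs Optuna's TPE sampler (one common Bayesian optimizer) with CatBoost and SHAP. The dataset, column meanings, and search ranges are illustrative assumptions, not the study's setup.

```python
import numpy as np
import optuna
import shap
from catboost import CatBoostRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(size=(223, 5))   # e.g. cement, water, water reducer, fly ash, aggregate
y = X @ np.array([8.0, -3.0, 2.0, 1.5, 0.5]) + rng.normal(scale=0.5, size=223)

def objective(trial):
    # Each trial proposes one hyperparameter set; TPE steers the search.
    params = {
        "depth": trial.suggest_int("depth", 4, 8),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "iterations": trial.suggest_int("iterations", 200, 800),
    }
    model = CatBoostRegressor(**params, verbose=0)
    return cross_val_score(model, X, y, cv=5,
                           scoring="neg_root_mean_squared_error").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)

best = CatBoostRegressor(**study.best_params, verbose=0).fit(X, y)
shap_values = shap.TreeExplainer(best).shap_values(X)   # per-feature contributions
print(np.abs(shap_values).mean(axis=0))                 # global importance ranking
```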

21 pages, 16664 KB  
Article
Integrating UAV LiDAR and Multispectral Data for Aboveground Biomass Estimation in High-Andean Pastures of Northeastern Peru
by Angel J. Medina-Medina, Samuel Pizarro, Katerin M. Tuesta-Trauco, Jhon A. Zabaleta-Santisteban, Abner S. Rivera-Fernandez, Jhonsy O. Silva-López, Rolando Salas López, Renzo E. Terrones Murga, José A. Sánchez-Vega, Teodoro B. Silva-Melendez, Manuel Oliva-Cruz, Elgar Barboza and Alexander Cotrina-Sanchez
Sustainability 2025, 17(21), 9745; https://doi.org/10.3390/su17219745 - 31 Oct 2025
Viewed by 223
Abstract
Accurate estimation of aboveground biomass (AGB) is essential for monitoring forage availability and guiding sustainable management in high-altitude pastures, where grazing sustains livelihoods but also drives ecological degradation. Although remote sensing has advanced biomass modeling in rangelands, applications in Andean–Amazonian ecosystems remain limited, particularly using UAV-based structural and spectral data. This study evaluated the potential of UAV LiDAR and multispectral imagery to estimate fresh and dry AGB in ryegrass (Lolium multiflorum Lam.) pastures of Amazonas, Peru. Field data were collected from subplots within 13 plots across two sites (Atuen and Molinopampa) and modeled using Random Forest (RF), Support Vector Machines, and Elastic Net. AGB maps were generated at 0.2 m and 1 m resolutions. Results revealed clear site- and month-specific contrasts, with Atuen yielding higher AGB than Molinopampa, linked to differences in climate, topography, and grazing intensity. RF achieved the best accuracy, with chlorophyll-sensitive indices dominating fresh biomass estimation, while LiDAR-derived height metrics contributed more to dry biomass prediction. Predicted maps captured grazing-induced heterogeneity at fine scales, while aggregated products retained broader gradients. Overall, this study shows the feasibility of UAV-based multi-sensor integration for biomass monitoring and supports adaptive grazing strategies for sustainable management in Andean–Amazonian ecosystems. Full article
(This article belongs to the Section Environmental Sustainability and Applications)
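A minimal version of the regression comparison might look like the following, assuming per-subplot predictors that mix spectral indices with LiDAR height metrics; the feature names and synthetic values are placeholders for the field data, in which chlorophyll-sensitive indices dominated the fresh-biomass model and height metrics the dry-biomass model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
features = pd.DataFrame({
    "ndre": rng.uniform(0.1, 0.6, 200),     # chlorophyll-sensitive index
    "ndvi": rng.uniform(0.2, 0.9, 200),
    "h_mean": rng.uniform(0.05, 0.5, 200),  # LiDAR canopy height metrics (m)
    "h_p95": rng.uniform(0.1, 0.8, 200),
})
fresh_agb = 10 * features["ndre"] + 2 * features["h_p95"] + rng.normal(0, 0.5, 200)
dry_agb = 1 * features["ndre"] + 6 * features["h_p95"] + rng.normal(0, 0.3, 200)

# Fit one RF per target and compare which feature family drives each model.
for name, target in [("fresh", fresh_agb), ("dry", dry_agb)]:
    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(features, target)
    r2 = cross_val_score(rf, features, target, cv=5, scoring="r2").mean()
    ranking = sorted(zip(features.columns, rf.feature_importances_),
                     key=lambda pair: -pair[1])
    print(name, round(r2, 2), ranking)
```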

26 pages, 1644 KB  
Article
Improving Utility of Private Join Size Estimation via Shuffling
by Xin Liu, Yibin Mao, Meifan Zhang and Mohan Li
Mathematics 2025, 13(21), 3468; https://doi.org/10.3390/math13213468 - 30 Oct 2025
Viewed by 126
Abstract
Join size estimation plays a crucial role in query optimization, correlation computing, and dataset discovery. A recent study, LDPJoinSketch, has explored the application of local differential privacy (LDP) to protect the privacy of two data sources when estimating their join size. However, the utility of LDPJoinSketch remains unsatisfactory due to the significant noise introduced by perturbation under LDP. In contrast, the shuffle model of differential privacy (SDP) can offer higher utility than LDP, as it introduces randomness based on both shuffling and perturbation. Nevertheless, existing research on SDP primarily focuses on basic statistical tasks, such as frequency estimation and binary summation. There is a paucity of studies addressing queries that involve join aggregation of two private data sources. In this paper, we investigate the problem of private join size estimation in the context of the shuffle model. First, drawing inspiration from the success of sketches in summarizing data under LDP, we propose a sketch-based join size estimation algorithm, SDPJoinSketch, under SDP, which demonstrates greater utility than LDPJoinSketch. We present theoretical proofs of the privacy amplification and utility of our method. Second, we consider separating high- and low-frequency items to reduce the hash-collision error of the sketch and propose an enhanced method called SDPJoinSketch+. Unlike LDPJoinSketch, we utilize secure encryption techniques to preserve frequency properties rather than perturbing them, further enhancing utility. Extensive experiments on both real-world and synthetic datasets validate the superior utility of our methods. Full article
(This article belongs to the Topic Recent Advances in Security, Privacy, and Trust)
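To make the estimation core concrete, here is a toy, non-private Fast-AGMS-style sketch of join size estimation: both parties hash items into shared signed buckets, and the bucketwise inner product estimates the join size. The LDP/shuffle perturbation, the privacy amplification analysis, and the high/low-frequency separation of SDPJoinSketch+ are deliberately omitted.

```python
import numpy as np

ROWS, WIDTH = 5, 256
rng = np.random.default_rng(3)
seeds = [int(s) for s in rng.integers(0, 2**31, size=ROWS)]

def build_sketch(items):
    """One signed-count row per hash seed (Fast-AGMS / count-sketch style)."""
    S = np.zeros((ROWS, WIDTH))
    for r, seed in enumerate(seeds):
        for x in items:
            bucket = hash((seed, int(x))) % WIDTH
            sign = 1 if hash((seed, int(x), "sign")) % 2 else -1
            S[r, bucket] += sign
    return S

a = rng.integers(0, 50, size=2000)   # party A's private join column
b = rng.integers(0, 50, size=3000)   # party B's private join column

# Join size = sum over values of freq_A(v) * freq_B(v); the bucketwise inner
# product estimates it, and the median over rows adds robustness.
estimate = np.median((build_sketch(a) * build_sketch(b)).sum(axis=1))
truth = sum(int((a == v).sum()) * int((b == v).sum()) for v in np.unique(a))
print(f"estimated join size {estimate:.0f} vs true {truth}")
```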

28 pages, 1624 KB  
Article
Domain-Constrained Stacking Framework for Credit Default Prediction
by Ming-Liang Ding, Yu-Liang Ma and Fu-Qiang You
Mathematics 2025, 13(21), 3451; https://doi.org/10.3390/math13213451 - 29 Oct 2025
Viewed by 312
Abstract
Accurate and reliable credit risk classification is fundamental to the stability of financial systems and the efficient allocation of capital. However, with the rapid expansion of customer information in both volume and complexity, traditional rule-based or purely statistical approaches have become increasingly inadequate. Motivated by these challenges, this study introduces a domain-constrained stacking ensemble framework that systematically integrates business knowledge with advanced machine learning techniques. First, domain heuristics are embedded at multiple stages of the pipeline: threshold-based outlier removal improves data quality, target variable redefinition ensures consistency with industry practice, and feature discretization with monotonicity verification enhances interpretability. Then, each variable is transformed through Weight-of-Evidence (WOE) encoding and evaluated via Information Value (IV), which enables robust feature selection and effective dimensionality reduction. Next, on this transformed feature space, we train logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), and a two-layer stacking ensemble. Finally, the ensemble aggregates cross-validated out-of-fold predictions from LR, RF and XGBoost as meta-features, which are fused by a meta-level logistic regression, thereby capturing both linear and nonlinear relationships while mitigating overfitting. Experimental results across two credit datasets demonstrate that the proposed framework achieves superior predictive performance compared with single models, highlighting its potential as a practical solution for credit risk assessment in real-world financial applications. Full article
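The WOE/IV step admits a compact sketch, shown below under the assumption of a binary default label and an already-binned feature; the bin labels and data are illustrative (random data yields WOE near zero and negligible IV, which is exactly what the IV filter would screen out).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "age_bin": rng.choice(["<25", "25-40", "40-60", ">60"], size=5000),
    "default": rng.integers(0, 2, size=5000),
})

grouped = df.groupby("age_bin")["default"].agg(["sum", "count"])
bad = grouped["sum"]                        # defaulters per bin
good = grouped["count"] - grouped["sum"]    # non-defaulters per bin
dist_bad = bad / bad.sum()
dist_good = good / good.sum()

woe = np.log(dist_good / dist_bad)          # Weight of Evidence per bin
iv = ((dist_good - dist_bad) * woe).sum()   # Information Value of the feature
print(woe, f"IV = {iv:.4f}")

# The WOE values replace the raw category before fitting the base learners;
# features with negligible IV are dropped ahead of the stacking stage.
df["age_woe"] = df["age_bin"].map(woe)
```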

25 pages, 6415 KB  
Article
Microscopic Numerical Simulation of Compressive Performance of Steel-Recycled PET Hybrid Fiber Recycled Concrete
by Shaolong Guo, Qun Lu, Krzysztof Robert Czech and Julita Krassowska
Buildings 2025, 15(21), 3893; https://doi.org/10.3390/buildings15213893 - 28 Oct 2025
Viewed by 240
Abstract
Numerical simulations, unlike experimental studies, eliminate material and setup costs while significantly reducing testing time. In this study, a random distribution program for steel-recycled polyethylene terephthalate hybrid fiber recycled concrete (SRPRAC) was developed in Python (3.11), enabling direct generation in Abaqus. Mesoscopic simulation parameters were calibrated through debugging and sensitivity analysis. The simulations examined the compressive failure mode of SRPRAC and the influence of different factors. Results indicate that larger recycled coarse aggregate particle sizes intensify tensile and compressive damage in the interfacial transition zone between the coarse aggregate and mortar. Loading rate strongly affects outcomes, while smaller mesh sizes yield more stable results. Stronger boundary constraints at the top and bottom surfaces lead to higher peak stress, peak strain, and residual stress. Failure was mainly distributed within the specimen, forming a distinct X-shaped damage zone. Increasing fiber content reduced the equivalent plastic strain area above the compressive failure threshold, though the effect diminished beyond 1% total fiber volume. During initial loading, steel fibers carried higher tensile stresses, whereas recycled polyethylene terephthalate fibers (rPETF) contributed less. After peak load, tensile stress in rPETF increased significantly, complementing the gradual stress increase in steel fibers. The mesoscopic model effectively captured the stress–strain damage behavior of SRPRAC under compression. Full article
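A loose stand-in for the kind of random fiber placement program described (the authors' version, written in Python 3.11, feeds Abaqus directly) could look like this; the specimen geometry, fiber counts, and the coarse midpoint spacing test are placeholder choices, and a real generator would use a proper segment-to-segment distance check.

```python
import numpy as np

rng = np.random.default_rng(5)
BOX = 100.0                   # specimen edge length (mm)
STEEL_LEN, PET_LEN = 30.0, 12.0

def random_fiber(length):
    """Pick a random start point and orientation; keep the fiber inside the box."""
    while True:
        start = rng.uniform(0, BOX, size=3)
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)
        end = start + length * direction
        if np.all((end >= 0) & (end <= BOX)):
            return start, end

def midpoint_gap(seg_a, seg_b):
    # Coarse proxy for fiber spacing: distance between segment midpoints.
    return np.linalg.norm((seg_a[0] + seg_a[1]) / 2 - (seg_b[0] + seg_b[1]) / 2)

fibers = []
for length, n in [(STEEL_LEN, 40), (PET_LEN, 120)]:   # steel, then rPET fibers
    placed = 0
    while placed < n:
        f = random_fiber(length)
        if all(midpoint_gap(f, g) > 2.0 for g in fibers):
            fibers.append(f)
            placed += 1

# The endpoint coordinates would then be written out as an Abaqus keyword block
# or consumed by an Abaqus/Python script to embed the fibers in the mesoscale model.
print(len(fibers), "fibers placed")
```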

29 pages, 4329 KB  
Article
Using Machine Learning for the Discovery and Development of Multitarget Flavonoid-Based Functional Products in MASLD
by Maksim Kuznetsov, Evgeniya Klein, Daria Velina, Sherzodkhon Mutallibzoda, Olga Orlovtseva, Svetlana Tefikova, Dina Klyuchnikova and Igor Nikitin
Molecules 2025, 30(21), 4159; https://doi.org/10.3390/molecules30214159 - 22 Oct 2025
Viewed by 457
Abstract
Metabolic dysfunction-associated steatotic liver disease (MASLD) represents a multifactorial condition requiring multi-target therapeutic strategies beyond traditional single-marker approaches. In this work, we present a fully in silico nutraceutical screening pipeline that integrates molecular prediction, systemic aggregation, and technological design. A curated panel of ten MASLD-relevant targets, spanning nuclear receptors (FXR, PPAR-α/γ, THR-β), lipogenic and cholesterogenic enzymes (ACC1, FASN, DGAT2, HMGCR), and transport/regulatory proteins (LIPG, FABP4), was assembled from proteomic evidence. Bioactivity records were extracted from ChEMBL, structurally standardized, and converted into RDKit descriptors. Predictive modeling employed a stacked ensemble of Random Forest, XGBoost, and CatBoost with isotonic calibration, yielding robust performance (mean cross-validated ROC-AUC 0.834; independent test ROC-AUC 0.840). Calibrated probabilities were aggregated into total activity (TA) and weighted TA metrics, combined with structural clustering (six structural clusters, twelve MOA clusters) to ensure chemical diversity. We used physiologically based pharmacokinetic (PBPK) modeling to translate probabilistic profiles into minimum simulated doses (MSDs) and chrono-specific exposure (%T>IC50) for three prototype concepts: HepatoBlend (morning powder), LiverGuard Tea (evening aqueous form), and HDL-Chews (postprandial chew). Integration of physicochemical descriptors (MW, logP, TPSA) guided carrier and encapsulation choices, addressing stability and sensory constraints. The results demonstrate that a computationally integrated pipeline can rationally generate multi-target nutraceutical formulations, linking molecular predictions with systemic coverage and practical formulation specifications, and thus provides a transferable framework for MASLD and related metabolic conditions. Full article
(This article belongs to the Special Issue Analytical Technologies and Intelligent Applications in Future Food)
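One way to picture the calibrate-then-aggregate step is the sketch below: one isotonically calibrated classifier per protein target, with calibrated probabilities summed into a total-activity (TA) score per compound. The paper stacks RF/XGBoost/CatBoost; for brevity a single RF stands in per target, the target subset is arbitrary, and all data is synthetic.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
TARGETS = ["FXR", "PPARA", "FASN", "HMGCR"]    # subset of the ten-target panel
X_train = rng.normal(size=(400, 32))           # RDKit-style descriptors
X_cand = rng.normal(size=(25, 32))             # candidate flavonoids

ta = np.zeros(len(X_cand))
for target in TARGETS:
    y = rng.integers(0, 2, size=400)           # active/inactive vs this target
    clf = CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=200, random_state=0),
        method="isotonic", cv=5,
    ).fit(X_train, y)
    ta += clf.predict_proba(X_cand)[:, 1]      # calibrated P(active | target)

# Rank candidates by total activity across the panel; the weighted-TA variant
# would multiply each term by a target-importance weight before summing.
order = np.argsort(-ta)
print("top candidates:", order[:5], "TA:", ta[order[:5]].round(2))
```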

21 pages, 2017 KB  
Article
Uncovering CO2 Drivers with Machine Learning in High- and Upper-Middle-Income Countries
by Cosimo Magazzino, Umberto Monarca, Ernesto Cassetta, Alberto Costantiello and Tulia Gattone
Energies 2025, 18(21), 5552; https://doi.org/10.3390/en18215552 - 22 Oct 2025
Viewed by 369
Abstract
Rapid decarbonization relies on knowing which structural and energy factors affect national carbon dioxide emissions. Much of the literature leans on linear and additive assumptions, which may gloss over curvature and interactions in this energy–emissions link. Unlike previous studies, we take a different approach. Using a panel of 80 high- and upper-middle-income countries from 2011 to 2020, we model emissions as a function of fossil fuel energy consumption, methane, the food production index, renewable electricity output, gross domestic product (GDP), and trade measured as trade over GDP. Our contribution is twofold. First, we evaluate how different modeling strategies, from a traditional Generalized Linear Model to more flexible approaches such as Support Vector Machine regression and Random Forest (RF), influence the identification of emission drivers. Second, we use Double Machine Learning (DML) to estimate the incremental effect of fossil fuel consumption while controlling for other variables, offering a more careful interpretation of its likely causal role. Across models, a clear pattern emerges: GDP dominates; fossil fuel energy consumption and methane follow. Renewable electricity output and trade contribute, but to a moderate degree. The food production index adds little in this aggregate, cross-country setting. To probe the mechanism rather than the prediction, we estimate the incremental role of fossil fuel energy consumption using DML with RF nuisance functions. The partial effect remains positive after conditioning on the other covariates. Taken together, the results suggest that economic scale and the fuel mix are the primary levers for near-term emissions profiles, while renewables and trade matter, just less than is often assumed and in ways that may depend on context. Full article
(This article belongs to the Special Issue Policy and Economic Analysis of Energy Systems: 2nd Edition)
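The DML step can be sketched with the standard partialling-out estimator, assuming for simplicity that the panel is flattened to a cross-section: cross-fitted Random Forests residualize both emissions and fossil fuel consumption on the controls, and the residual-on-residual slope gives the partial effect. The data here is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n = 800
controls = rng.normal(size=(n, 5))   # GDP, methane, renewables, trade, food index
fossil = controls @ rng.normal(size=5) + rng.normal(size=n)
co2 = 0.4 * fossil + controls @ rng.normal(size=5) + rng.normal(size=n)

rf = lambda: RandomForestRegressor(n_estimators=300, random_state=0)
co2_hat = cross_val_predict(rf(), controls, co2, cv=5)        # nuisance E[Y|X]
fossil_hat = cross_val_predict(rf(), controls, fossil, cv=5)  # nuisance E[D|X]

# Residual-on-residual regression: the orthogonalized partial effect.
theta = LinearRegression().fit(
    (fossil - fossil_hat).reshape(-1, 1), co2 - co2_hat
).coef_[0]
print(f"estimated partial effect of fossil fuel consumption: {theta:.3f}")
```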

22 pages, 5538 KB  
Article
Evaluating Macroalgal Hyperspectral Reflectance Separability in Support of Kelp Mapping
by Gillian S. L. Rowan, Joanna N. Smart, Chris Roelfsema and Stuart R. Phinn
Remote Sens. 2025, 17(20), 3491; https://doi.org/10.3390/rs17203491 - 21 Oct 2025
Viewed by 437
Abstract
Satellite-based Earth Observation (EO) has been proposed as an efficient, replicable, and scalable method for monitoring kelp forests. Although kelps (Laminariales) have been mapped with multispectral EO, no evaluation of kelps’ separability across genera, and from other macroalgae, has been conducted with image-applicable methods. Since kelps and other macroalgae commonly co-occur, characterising their spectral separability is vital to defining appropriate use-cases, methods, and limitations of mapping them with EO. This work investigates the spectral reflectance separability of three kelps and twelve other macroalgae from three distinct regions of Australia and New Zealand. Separability was evaluated using hierarchical clustering, spectral angle, random forest classification, and linear discriminant classification algorithms. Random forest was most effective (average F1 score = 0.70) at classifying all macroalgae by genus, while the linear discriminant analysis was most effective at differentiating among kelp genera labelled by sampling region (average F1 score = 0.93). The observed intra-class geographic variability indicates that macroalgal spectral reflectance is regionally specific, thereby limiting reference spectrum transferability and large-spatial-extent classification accuracy. Of the four classification methods evaluated, the random forest was best suited to mapping large spatial extents (e.g., hundreds of km2). Using aggregated target classes is recommended if relying solely on spectral reflectance information. This work suggests hyperspectral EO could be a useful tool in monitoring ecologically and economically valuable kelp forests with moderate to high confidence. Full article
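The separability evaluation reduces, in outline, to cross-validated classification of labelled spectra. The sketch below assumes a spectral library with genus and region labels; the random spectra will of course score near chance, unlike the real field library that produced the reported F1 values.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n_samples, n_bands = 300, 150                  # hyperspectral reflectance bands
spectra = rng.uniform(0, 0.4, size=(n_samples, n_bands))
genus = rng.integers(0, 15, size=n_samples)    # 15 macroalgal genera
region = rng.integers(0, 3, size=n_samples)    # 3 sampling regions

# RF for the genus-level task, LDA for region-labelled kelp discrimination,
# both scored with macro-averaged F1 as in the comparison above.
rf_f1 = cross_val_score(RandomForestClassifier(n_estimators=500, random_state=0),
                        spectra, genus, cv=5, scoring="f1_macro").mean()
lda_f1 = cross_val_score(LinearDiscriminantAnalysis(),
                         spectra, region, cv=5, scoring="f1_macro").mean()
print(f"RF genus F1 = {rf_f1:.2f}, LDA region F1 = {lda_f1:.2f}")
```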

20 pages, 4170 KB  
Article
Optimized Gradient Boosting Framework for Data-Driven Prediction of Concrete Compressive Strength
by Dawei Sun, Ping Zheng, Jun Zhang and Liming Cheng
Buildings 2025, 15(20), 3761; https://doi.org/10.3390/buildings15203761 - 18 Oct 2025
Viewed by 302
Abstract
Given the significant impact of concrete’s compressive strength on structural service life, the development of accurate and efficient prediction methods is critically important. A hybrid machine learning modeling method based on the Whale Optimization Algorithm (WOA)-optimized XGBoost algorithm is proposed. Using 1030 sets of concrete mix proportion data covering eight key parameters—cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate, and curing age—the predictive performance of four models (linear regression, random forest, XGBoost, and WOA-XGBoost) was systematically compared. The results demonstrate that the WOA-XGBoost model achieved the highest goodness of fit (R2 = 0.9208, MSE = 4.5546), significantly outperforming the other models, and exhibited excellent generalization capability and robustness. Feature importance and SHAP analysis further revealed that curing age, cement content, and water content are the key variables affecting compressive strength, with blast furnace slag showing a significant marginal diminishing effect. This study provides a high-precision data-driven tool for optimizing mix proportions and predicting the strength of complex-component concrete, offering significant application value in promoting the resource utilization of industrial waste and advancing the development of green concrete. Full article
(This article belongs to the Section Building Materials, and Repair & Renovation)
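A compact Whale Optimization Algorithm loop wrapped around XGBoost conveys the hybrid scheme in spirit; population size, iteration count, search ranges, and the synthetic data below are illustrative, and the update follows the standard WOA encircling/spiral equations rather than any study-specific variant.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

rng = np.random.default_rng(9)
X = rng.uniform(size=(300, 8))   # cement, slag, fly ash, water, SP, CA, FA, age
y = X @ rng.uniform(1.0, 5.0, size=8) + rng.normal(scale=0.3, size=300)

LO = np.array([0.01, 3, 100, 0.5])   # learning_rate, max_depth, n_estimators, subsample
HI = np.array([0.30, 10, 600, 1.0])

def fitness(w):
    """Cross-validated MSE of an XGBoost model built from a whale's position."""
    model = XGBRegressor(learning_rate=w[0], max_depth=int(round(w[1])),
                         n_estimators=int(round(w[2])), subsample=w[3])
    return -cross_val_score(model, X, y, cv=3,
                            scoring="neg_mean_squared_error").mean()

POP, ITERS = 6, 10
whales = rng.uniform(LO, HI, size=(POP, 4))
scores = np.array([fitness(w) for w in whales])
best, best_score = whales[scores.argmin()].copy(), scores.min()

for t in range(ITERS):
    a = 2 - 2 * t / ITERS                  # exploration coefficient decays 2 -> 0
    for i in range(POP):
        r1, r2, p = rng.random(3)
        A, C = 2 * a * r1 - a, 2 * r2
        if p < 0.5:                        # encircle the best, or a random whale
            ref = best if abs(A) < 1 else whales[rng.integers(POP)]
            whales[i] = ref - A * np.abs(C * ref - whales[i])
        else:                              # logarithmic spiral ("bubble-net") move
            l = rng.uniform(-1, 1)
            whales[i] = np.abs(best - whales[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
        whales[i] = np.clip(whales[i], LO, HI)
    scores = np.array([fitness(w) for w in whales])
    if scores.min() < best_score:
        best, best_score = whales[scores.argmin()].copy(), scores.min()

print("best (lr, depth, n_estimators, subsample):", np.round(best, 3))
```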

17 pages, 414 KB  
Article
DQMAF—Data Quality Modeling and Assessment Framework
by Razan Al-Toq and Abdulaziz Almaslukh
Information 2025, 16(10), 911; https://doi.org/10.3390/info16100911 - 17 Oct 2025
Viewed by 549
Abstract
In today’s digital ecosystem, where millions of users interact with diverse online services and generate vast amounts of textual, transactional, and behavioral data, ensuring the trustworthiness of this information has become a critical challenge. Low-quality data—manifesting as incompleteness, inconsistency, duplication, or noise—not only undermines analytics and machine learning models but also exposes unsuspecting users to unreliable services, compromised authentication mechanisms, and biased decision-making processes. Traditional data quality assessment methods, largely based on manual inspection or rigid rule-based validation, cannot cope with the scale, heterogeneity, and velocity of modern data streams. To address this gap, we propose DQMAF (Data Quality Modeling and Assessment Framework), a generalized machine learning–driven approach that systematically profiles, evaluates, and classifies data quality to protect end-users and enhance the reliability of Internet services. DQMAF introduces an automated profiling mechanism that measures multiple dimensions of data quality—completeness, consistency, accuracy, and structural conformity—and aggregates them into interpretable quality scores. Records are then categorized into high, medium, and low quality, enabling downstream systems to filter or adapt their behavior accordingly. A distinctive strength of DQMAF lies in integrating profiling with supervised machine learning models, producing scalable and reusable quality assessments applicable across domains such as social media, healthcare, IoT, and e-commerce. The framework incorporates modular preprocessing, feature engineering, and classification components using Decision Trees, Random Forest, XGBoost, AdaBoost, and CatBoost to balance performance and interpretability. We validate DQMAF on a publicly available Airbnb dataset, showing its effectiveness in detecting and classifying data issues with high accuracy. The results highlight its scalability and adaptability for real-world big data pipelines, supporting user protection, document and text-based classification, and proactive data governance while improving trust in analytics and AI-driven applications. Full article
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)
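The profile-then-classify idea might be sketched as follows, assuming a small tabular dataset with a couple of declared validity rules; the rules, dimension weights, and tier thresholds are illustrative, not DQMAF's published ones.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(10)
df = pd.DataFrame({
    "price": rng.choice([50.0, 120.0, np.nan, -5.0], size=1000, p=[.6, .3, .07, .03]),
    "beds": rng.choice([1, 2, 3, 99], size=1000, p=[.4, .3, .28, .02]),
})

# Per-record dimension scores in [0, 1]: completeness from missingness,
# accuracy/conformity from simple range rules.
completeness = df.notna().mean(axis=1)
accuracy = ((df["price"].fillna(0) >= 0).astype(float)
            + (df["beds"] <= 10).astype(float)) / 2
quality = 0.5 * completeness + 0.5 * accuracy

tier = pd.cut(quality, bins=[-0.01, 0.5, 0.8, 1.0], labels=["low", "medium", "high"])

# The profiled scores become training labels for a reusable classifier that can
# tier new records without rerunning the full profiling pipeline.
features = df.fillna(-1)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(features, tier)
print(tier.value_counts())
```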

14 pages, 1149 KB  
Article
Modality Information Aggregation Graph Attention Network with Adversarial Training for Multi-Modal Knowledge Graph Completion
by Hankiz Yilahun, Elyar Aili, Seyyare Imam and Askar Hamdulla
Information 2025, 16(10), 907; https://doi.org/10.3390/info16100907 - 16 Oct 2025
Viewed by 292
Abstract
Multi-modal knowledge graph completion (MMKGC) aims to complete knowledge graphs by integrating structural information with multi-modal (e.g., visual, textual, and numerical) features and leveraging cross-modal reasoning within a unified semantic space to infer and supplement missing factual knowledge. Current MMKGC methods have advanced in terms of integrating multi-modal information but have overlooked the imbalance in modality importance for target entities. Treating all modalities equally dilutes critical semantics and amplifies irrelevant information, which in turn limits the semantic understanding and predictive performance of the model. To address these limitations, we propose a modality information aggregation graph attention network with adversarial training for multi-modal knowledge graph completion (MIAGAT-AT). MIAGAT-AT focuses on hierarchically modeling complex cross-modal interactions. By combining the multi-head attention mechanism with modality-specific projection methods, it precisely captures global semantic dependencies and dynamically adjusts the weight of modality embeddings according to the importance of each modality, thereby optimizing cross-modal information fusion capabilities. Moreover, through the use of random noise and multi-layer residual blocks, the adversarial training generates high-quality multi-modal feature representations, thereby effectively enhancing information from imbalanced modalities. Experimental results demonstrate that our approach significantly outperforms 18 existing baselines and establishes a strong performance baseline across three distinct datasets. Full article
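A bare-bones PyTorch sketch of importance-weighted modality fusion is given below: each modality is projected into a shared space, multi-head attention lets the modalities interact, and a learned gate re-weights them before pooling. The dimensions and the gating choice are assumptions, not the MIAGAT-AT specification, and the adversarial-training stage is omitted entirely.

```python
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    def __init__(self, dims, d=256):
        super().__init__()
        # One projection per modality into a shared d-dimensional space.
        self.proj = nn.ModuleDict({m: nn.Linear(k, d) for m, k in dims.items()})
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.gate = nn.Linear(d, 1)            # per-modality importance score

    def forward(self, feats):                  # feats: {modality: (batch, dim)}
        h = torch.stack([self.proj[m](x) for m, x in feats.items()], dim=1)
        h, _ = self.attn(h, h, h)              # cross-modal interaction
        w = torch.softmax(self.gate(h), dim=1) # (batch, n_modal, 1) weights
        return (w * h).sum(dim=1)              # importance-weighted entity embedding

fusion = ModalityFusion({"struct": 200, "visual": 4096, "text": 768})
batch = {"struct": torch.randn(8, 200), "visual": torch.randn(8, 4096),
         "text": torch.randn(8, 768)}
print(fusion(batch).shape)                     # torch.Size([8, 256])
```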

37 pages, 8931 KB  
Article
Predicting the Properties of Polypropylene Fiber Recycled Aggregate Concrete Using Response Surface Methodology and Machine Learning
by Hany A. Dahish and Mohammed K. Alkharisi
Buildings 2025, 15(20), 3709; https://doi.org/10.3390/buildings15203709 - 15 Oct 2025
Viewed by 318
Abstract
The use of recycled coarse aggregate (RCA) concrete and polypropylene fibers (PPFs) presents a sustainable alternative in concrete production. However, the non-linear and interactive effects of RCA and PPF on both fresh and hardened properties are not yet fully quantified. This study employs Response Surface Methodology (RSM) and the Random Forest (RF) algorithm with K-fold cross-validation to predict the combined effect of RCA, used as a partial replacement for natural coarse aggregate, and PPF on the engineering properties of RCA-PPF concrete, addressing the critical need for a robust, data-driven modeling framework. A dataset of 144 tested samples obtained from the literature was utilized to develop and validate the prediction models. Three input variables were considered in developing the proposed prediction models, namely, RCA, PPF, and curing age (Age). The examined responses were compressive strength (CS), tensile strength (TS), ultrasonic pulse velocity (UPV), and water absorption (WA). To assess the developed models, statistical metrics were calculated, and analysis of variance (ANOVA) was employed. Afterwards, the responses were optimized using the optimization module in RSM. The optimum, maximizing TS, CS, and UPV while minimizing WA, was achieved at a PPF of 3% by volume of concrete and an RCA of approximately 100% replacing natural coarse aggregate, highlighting optimal reuse of recycled aggregate, with an Age of 83.6 days. The RF model demonstrated superior performance, significantly outperforming the RSM model. Feature importance analysis via SHAP values was employed to identify the parameters with the strongest effect on the predictions. The results confirm that ML techniques provide a powerful and accurate tool for optimizing sustainable concrete mixes. Full article
(This article belongs to the Section Building Structures)
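The model comparison can be miniaturized as below, with a second-order polynomial regression standing in for the RSM response surface and both models scored under the same K-fold split. The inputs (RCA %, PPF %, Age) mirror the study's variables, but the data and coefficients are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(12)
X = np.column_stack([
    rng.uniform(0, 100, 144),      # RCA replacement (%)
    rng.uniform(0, 3, 144),        # PPF dosage (% by volume)
    rng.choice([7, 28, 90], 144),  # curing age (days)
])
cs = 30 + 0.1 * X[:, 2] - 0.05 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 1, 144)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
rsm = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
rf = RandomForestRegressor(n_estimators=400, random_state=0)

# Same folds for both models, so the comparison is apples to apples.
for name, model in [("RSM (quadratic)", rsm), ("Random Forest", rf)]:
    r2 = cross_val_score(model, X, cs, cv=cv, scoring="r2").mean()
    print(f"{name}: CV R^2 = {r2:.3f}")
```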

19 pages, 3358 KB  
Article
Iterative Genetic Algorithm to Improve Optimization of a Residential Virtual Power Plant
by Anas Abdullah Alvi, Luis Martínez-Caballero, Enrique Romero-Cadaval, Eva González-Romera and Mariusz Malinowski
Energies 2025, 18(20), 5377; https://doi.org/10.3390/en18205377 - 13 Oct 2025
Viewed by 343
Abstract
With the increasing penetration of renewable energy such as solar and wind power into the grid as well as the addition of modern types of versatile loads such as electric vehicles, the grid system is more prone to system failure and instability. One of the possible solutions to mitigate these conditions and increase the system efficiency is the integration of virtual power plants into the system. Virtual power plants can aggregate distributed energy resources such as renewable energy systems, electric vehicles, flexible loads, and energy storage, thus allowing for better coordination and optimization of these resources. This paper proposes a genetic algorithm-based optimization to coordinate the different elements of the energy management system of a virtual power plant, such as the energy storage system and charging/discharging of electric vehicles. It also deals with the random behavior of the genetic algorithm and its failure to meet certain constraints in the final solution. A novel method is proposed to mitigate these problems that combines a genetic algorithm in the first stage, followed by a gradient-based method in the second stage, consequently reducing the overall electricity bill by 50.2% and the simulation time by almost 95%. The performance is evaluated considering the reference set-points of operation from the obtained solution of the energy storage and electric vehicles by performing tests using a detailed model where power electronics converters and their local controllers are also taken into account. Full article
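The two-stage idea admits a schematic sketch: an evolutionary global search seeds a gradient-based local solver that enforces the constraints exactly, repairing the constraint violations a stochastic first stage can leave behind. SciPy's differential evolution stands in for the genetic algorithm here, and the toy cost models a daily bill over 24 hourly battery set-points; the tariff, power limit, and net-energy constraint are illustrative.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

rng = np.random.default_rng(13)
price = 0.15 + 0.10 * np.sin(np.linspace(0, 2 * np.pi, 24))  # hourly tariff
load = 1.0 + 0.5 * rng.random(24)                            # household demand (kW)
P_MAX = 2.0                                                  # battery power limit (kW)

def bill(p):
    """Daily cost; p > 0 discharges the battery and offsets grid import."""
    grid = np.maximum(load - p, 0)        # no compensation for export in this toy
    return float(price @ grid)

bounds = [(-P_MAX, P_MAX)] * 24

# Stage 1: evolutionary global search, energy balance handled as a soft penalty.
stage1 = differential_evolution(lambda p: bill(p) + 10 * abs(p.sum()),
                                bounds, seed=0, maxiter=200)

# Stage 2: gradient-based refinement that enforces the balance constraint exactly.
stage2 = minimize(bill, stage1.x, bounds=bounds, method="SLSQP",
                  constraints=[{"type": "eq", "fun": lambda p: p.sum()}])
print(f"stage-1 bill {bill(stage1.x):.3f} -> stage-2 bill {stage2.fun:.3f}")
```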

25 pages, 1453 KB  
Article
Application of Standard Machine Learning Models for Medicare Fraud Detection with Imbalanced Data
by Dorsa Farahmandazad, Kasra Danesh and Hossein Fazel Najaf Abadi
Risks 2025, 13(10), 198; https://doi.org/10.3390/risks13100198 - 13 Oct 2025
Viewed by 478
Abstract
Medicare fraud poses a substantial challenge to healthcare systems, resulting in significant financial losses and undermining the quality of care provided to legitimate beneficiaries. This study investigates the use of machine learning (ML) to enhance Medicare fraud detection, addressing key challenges such as class imbalance, high-dimensional data, and evolving fraud patterns. A dataset comprising inpatient claims, outpatient claims, and beneficiary details was used to train and evaluate five ML models: Random Forest, KNN, LDA, Decision Tree, and AdaBoost. Data preprocessing included the SMOTE resampling method to address class imbalance, feature selection for dimensionality reduction, and aggregation of diagnostic and procedural codes. Random Forest emerged as the best-performing model, achieving a training accuracy of 99.2%, a validation accuracy of 98.8%, and an F1-score of 98.4%. The Decision Tree also performed well, achieving a validation accuracy of 96.3%. KNN and AdaBoost demonstrated moderate performance, with validation accuracies of 79.2% and 81.1%, respectively, while LDA struggled with a validation accuracy of 63.3% and a low recall of 16.6%. The results highlight the importance of advanced resampling techniques, feature engineering, and adaptive learning in detecting Medicare fraud effectively. This study underscores the potential of machine learning in addressing the complexities of fraud detection. Future work should explore explainable AI and hybrid models to improve interpretability and performance, ensuring scalable and reliable fraud detection systems that protect healthcare resources and beneficiaries. Full article
(This article belongs to the Special Issue Artificial Intelligence Risk Management)
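The imbalance-aware training loop has a short standard form, sketched below with synthetic claim features; consistent with the study's preprocessing, SMOTE is applied to the training split only, so the test distribution stays untouched.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(14)
X = rng.normal(size=(5000, 20))              # engineered claim features
y = (rng.random(5000) < 0.05).astype(int)    # ~5% fraudulent providers

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # oversample minority

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```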
