MDPI - Publisher of Open Access Journals

25 pages, 1201 KB

Open Access Article

Gradient Boosting Framework with Weight of Evidence Encoding for Vehicle Credit Default Prediction Under Extreme Class Imbalance

by Zehra Keskin and Vildan Özkır

Mathematics 2026, 14(11), 1935; https://doi.org/10.3390/math14111935 - 2 Jun 2026

Viewed by 302

Abstract

Accurate prediction of loan defaults is essential for financial institutions seeking to minimize credit losses and maintain portfolio stability. In the vehicle financing segment of emerging markets, real-world datasets frequently exhibit extreme class imbalance ratios that far exceed those encountered in standard benchmark corpora, posing severe challenges for conventional machine learning pipelines. This study introduces a gradient boosting framework integrating Weight of Evidence (WoE) transformation, Bayesian hyperparameter optimization, and three complementary classifiers, namely Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), to predict vehicle loan default risk. The methodology is evaluated on a large-scale, fully anonymized Turkish vehicle loan dataset (N = 207,572) with an extreme imbalance ratio of 1:1133 (183 defaults versus 207,389 non-defaults). A strict three-way data partition (60% training, 20% validation, 20% test) is adopted to ensure leakage-free model selection and unbiased performance estimation. A multi-stage experimental pipeline is developed encompassing: (i) statistical feature selection via Mann–Whitney U and chi-square tests with adaptive thresholding, (ii) a comparative analysis of seven resampling strategies including Synthetic Minority Oversampling Technique (SMOTE) variants, Adaptive Synthetic Sampling (ADASYN), and focal loss weighting, (iii) a greedy forward selection ensemble procedure for heterogeneous model fusion, and (iv) a systematic training-set size sensitivity analysis across eight majority undersampling ratios. Under the leakage-free evaluation protocol, the highest-AUC individual model (LightGBM with SMOTE-ENN) achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.710 (95% bootstrap CI: 0.614–0.798), while CatBoost with cost-sensitive weighting exhibits superior operational metrics (KS = 0.389, PR-AUC = 0.011). The greedy ensemble procedure exhibits high selection instability with only 37 validation-set positives, providing a methodological finding on the minimum sample requirements for reliable ensemble construction under extreme scarcity. Ablation results confirm that WoE encoding contributes 3.1 percentage points to the overall AUC gain. Tree SHAP-based interpretability analysis identifies the financing-to-age ratio, WoE-encoded occupation group, and log financing amount as the primary predictive drivers, with cross-model stability confirmed via Spearman rank correlation. A decision support analysis provides precision–recall curves, a Brier score of 0.0082, reliability diagrams, and threshold-dependent performance at operationally plausible review rates. Fairness evaluation across gender and marital status subgroups demonstrates that threshold-dependent metrics such as Disparate Impact Ratio and Equalized Odds Gap are inherently compromised under extreme minority scarcity, whereas rank-based subgroup AUC analysis with bootstrap 95% confidence intervals preserves meaningful discriminative assessment. These findings provide an empirically validated framework for credit default prediction in highly imbalanced and data-scarce financial environments.

(This article belongs to the Special Issue Application of Machine Learning and Data Analysis in Personal Finance and Financial Services Industry)
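
The Weight of Evidence transformation named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `woe_table`, the additive `smoothing` constant, and the sign convention (log of the non-default share over the default share) are assumptions chosen for illustration, and the abstract does not say how sparse categories were handled.

```python
import math
from collections import Counter


def woe_table(categories, labels, smoothing=0.5):
    """Return {category: WoE} for a binary target (1 = default, 0 = non-default).

    WoE(category) = ln(share of all non-defaults in the category
                       / share of all defaults in the category).
    `smoothing` is an assumed additive constant that keeps the logarithm
    finite when a category contains no defaults (or no non-defaults).
    """
    events = Counter()      # defaults per category
    non_events = Counter()  # non-defaults per category
    for cat, y in zip(categories, labels):
        if y == 1:
            events[cat] += 1
        else:
            non_events[cat] += 1

    cats = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(cats)
    total_non_events = sum(non_events.values()) + smoothing * len(cats)

    table = {}
    for cat in cats:
        p_non_event = (non_events[cat] + smoothing) / total_non_events
        p_event = (events[cat] + smoothing) / total_events
        table[cat] = math.log(p_non_event / p_event)
    return table


# Toy data: category "a" has 1 default in 4 loans, category "b" has 3 in 4.
toy_categories = ["a"] * 4 + ["b"] * 4
toy_labels = [1, 0, 0, 0, 1, 1, 1, 0]
toy_woe = woe_table(toy_categories, toy_labels)
```

Under this convention a positive value marks a category with fewer defaults than average. To stay consistent with the leakage-free protocol the abstract describes, the table would be fitted on the training partition only and then applied unchanged to the validation and test partitions.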

18 pages, 2436 KB

Open Access Article

MechaForge: A Multi-Strategy Time-Series Synthesis Framework for Intelligent Fault Diagnosis

by Xiyang Zhang, Xia Liu, Feiyang Li, Yi Hu, Dong Yu and Yongze Ma

Appl. Sci. 2026, 16(9), 4566; https://doi.org/10.3390/app16094566 - 6 May 2026

Viewed by 326

Abstract

Intelligent fault diagnosis of rotating machinery is essential for manufacturing reliability and predictive maintenance, yet deployment of deep learning models is limited by data scarcity: fault samples are rare, costly, and hazardous to obtain. Conventional synthetic data methods such as Generative Adversarial Networks and Variational Autoencoders often exhibit mode collapse, spectral distortion, and limited physical interpretability. This work presents MechaForge, a multi-strategy framework that employs Large Language Models (LLMs) as physics-guided generators for bearing fault time-series data. The approach is grounded in bearing kinematics, Motor Current Signature Analysis (MCSA), and the interpretation of in-context learning as implicit Bayesian inference. Within MechaForge, four progressively constrained tracks are defined: a real-data baseline, few-shot LLM mimicry, multi-stage semantic reasoning, and physics-guided generation with constraints on root mean square, kurtosis, and fault-band spectral energy. For direct benchmarking, conventional VAE- and GAN-based augmentation baselines are additionally evaluated under the same dataset split, synthetic-data budget, downstream CNN architecture, and evaluation metrics. Experiments on the Paderborn bearing dataset show that the Basic LLM track achieves the strongest performance under the present protocol (0.7862 accuracy, 0.7648 macro-F1), exceeding the added VAE and GAN baselines (both 0.7428 accuracy; 0.7202 and 0.7257 macro-F1, respectively), while a control experiment confirms that synthetic data provides discriminative structure rather than labeled noise. These results indicate the promise of LLM-based diagnostic augmentation under data scarcity in the present Paderborn setting, rather than a definitive demonstration of broad transferability across fault-diagnosis scenarios.

(This article belongs to the Special Issue AI Applications in Modern Industrial Systems)

19 pages, 3597 KB

Open Access Article

Research and Application of an Intelligent Cable-Controlled Injection–Production Integration and Control System

by Jianhua Bai, Zheng Chen, Wei Zhang, Zhaochuan Zhou, Liu Wang, Yuande Xu, Shaojiu Jiang, Chengtao Zhu, Zhijun Liu, Le Zhang, Zechao Huang, Qiang Wang, Zhixiong Zhang, Chenwei Zou, Xiaodong Tang and Yukun Du

Processes 2026, 14(8), 1238; https://doi.org/10.3390/pr14081238 - 13 Apr 2026

Cited by 1 | Viewed by 525

Abstract

During offshore oilfield development, traditional injection–production processes commonly suffer from delayed regulation, low operational efficiency, and heavy reliance on manual intervention. Achieving real-time diagnosis of injection–production anomalies and dynamic optimization under complex geological conditions and harsh marine environments represents a core scientific challenge. This study presents the development and field deployment of an intelligent cable-controlled injection–production integrated management system. The work is positioned as an application- and system-oriented study, focusing on addressing practical challenges in offshore oilfield operations through the integration of established machine learning techniques into a cohesive operational platform. The system employs a cloud-native microservice architecture and integrates nine functional modules, enabling closed-loop management from data acquisition to intelligent decision making. Key methodological contributions include: (1) a weighted ensemble model combining Random Forest and SVM for blockage diagnosis, balancing global feature learning with boundary sample discrimination to achieve 92% diagnostic accuracy; (2) a Bayesian fusion framework that integrates static geological priors with dynamic sensitivity analysis for probabilistic quantification of injector–producer connectivity, achieving 85% identification accuracy with rigorous uncertainty propagation; and (3) a three-stage human–machine collaborative mechanism that substantially reduces anomaly response latency while ensuring field safety. Field application in Bohai oilfields demonstrates that the system shortens the injection–production response cycle by approximately 42%, reduces anomaly response time from over 72 h to less than 2 h (a 97% reduction), decreases water consumption per ton of oil by 27.6%, and increases injection–production uptime by 11.3 percentage points. This study provides an interpretable, extensible, and closed-loop technical solution for intelligent offshore oilfield development, with future directions including digital twin predictive simulation and reinforcement learning for real-time optimization.

(This article belongs to the Special Issue Applications of Intelligent Models in the Petroleum Industry)

21 pages, 1195 KB

Open Access Article

Interpretable Machine Learning to Predict Metformin-Induced Vitamin B12 Deficiency: Association with Glycemic Control and Neuropathic Symptoms

by Yasmine Salhi, Meriem Yazidi, Amine Dhraief, Elyes Kamoun, Melika Chihaoui, Tamim Alsuliman and Layth Sliman

Metabolites 2026, 16(4), 227; https://doi.org/10.3390/metabo16040227 - 30 Mar 2026

Viewed by 867

Abstract

Background/Objectives: Vitamin B12 deficiency is a common but often underdiagnosed complication in patients with type 2 diabetes (T2D) undergoing long-term metformin therapy. Accurate early prediction could enable targeted screening and timely intervention. This study aimed to develop and interpret a machine learning model for predicting vitamin B12 deficiency in metformin-treated patients with T2D, using eXtreme Gradient Boosting (XGBoost). Methods: A retrospective cross-sectional study was conducted at a single endocrinology centre (La Rabta University Hospital, Tunis, Tunisia). Patients with T2D treated with metformin for at least three years were included (n = 257); those with conditions independently affecting vitamin B12 metabolism were excluded. Vitamin B12 deficiency was defined as a serum B12 level below 150 pmol/L or a borderline level (150–221 pmol/L) with concurrent hyperhomocysteinemia (>15 μmol/L). XGBoost was selected after comparison with Logistic Regression (L2), Random Forest, and Support Vector Machine on the same 5-fold stratified cross-validated pipeline. Hyperparameters were optimized via Bayesian search (100 iterations × 5-fold stratified cross-validation), with the Matthews correlation coefficient (MCC) as the primary optimization metric to account for class imbalance. Model interpretability was achieved using SHapley Additive exPlanations (SHAP). Discrimination and calibration were assessed on an independent test set using bootstrap 95% confidence intervals (2000 resamples). Results: Of 257 patients, 95 (37.0%) presented with vitamin B12 deficiency. On the independent test set (n = 52), the optimized XGBoost model achieved an ROC-AUC of 0.671 [95% CI: 0.514–0.818], sensitivity of 0.737 [95% CI: 0.533–0.938], specificity of 0.545 [95% CI: 0.375–0.710], MCC of 0.273 [95% CI: 0.018–0.517], and a Brier Score of 0.259. SHAP analysis identified HbA1c, microalbuminuria, autonomic neuropathy, BMI, DN4 score, and fasting glucose as the most influential predictors. Nonlinear SHAP interaction plots revealed an increased predicted risk in patients with low HbA1c combined with a high cumulative metformin dose. Conclusions: The XGBoost–SHAP framework provided interpretable predictions of vitamin B12 deficiency in patients with T2D on metformin, identifying key clinical profiles for targeted screening. External multi-centre validation is required before clinical deployment.

(This article belongs to the Special Issue Metabolic Dysfunction in Diabetic Neuropathy)
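
As a worked illustration of the Matthews correlation coefficient used above as the optimization metric, the Python sketch below computes MCC from a confusion matrix. The four counts are an assumption: the abstract does not report them, and they are simply the integers that reproduce the stated sensitivity (0.737) and specificity (0.545) for a test set of 52 patients. The function name `mcc` is likewise hypothetical.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Assumed counts (not reported in the abstract): 19 deficient and
# 33 non-deficient patients in the 52-patient test set.
tp, fn = 14, 5    # deficient patients: flagged / missed
tn, fp = 18, 15   # non-deficient patients: cleared / flagged

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
mcc_value = mcc(tp, fp, tn, fn)
```

With these assumed counts the sensitivity and specificity round to the reported values, and the resulting MCC of about 0.27 agrees with the reported 0.273 to within rounding. Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred when classes are imbalanced.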

23 pages, 2689 KB

Open Access Article

Integrating Surveillance and Stakeholder Insights to Predict Influenza Epidemics: A Bayesian Network Study in Queensland, Australia

by Oz Sahin, Hai Phung, Andrea Standke, Mohana Rajmokan, Alex Raulli, Amy York and Patricia Lee

Int. J. Environ. Res. Public Health 2026, 23(1), 69; https://doi.org/10.3390/ijerph23010069 - 1 Jan 2026

Viewed by 1120

Abstract

Seasonal influenza continues to pose a substantial and recurrent public health challenge in Queensland, driven by annual variability in transmission and uncertainty in climatic, demographic, and behavioural determinants. Predictive modelling is constrained by data limitations and parameter uncertainty. In response, this study developed a Bayesian network (BN) model to estimate the probability of influenza epidemics in Queensland, Australia. The model integrated diverse inputs, including international and local influenza surveillance data, demographic health statistics, and expert and stakeholder insights to capture the complex multifactorial causal relationships underlying epidemic risk. Scenario-based simulations revealed that Southeast Asian viral origin, severe global influenza seasons, peak season timing, increasing international travel, absence of control measures, and low immunisation rates substantially elevate the likelihood of influenza epidemics. Southeast Queensland was identified as particularly vulnerable under high-risk conditions. Model evaluation demonstrated good discriminative performance (AUC = 0.6974, accuracy = 70%) with appropriate uncertainty quantification through credible intervals and sensitivity analysis. Its modular design and capacity for integrating various data sources make it a practical decision-making support tool for public health preparedness and responding to evolving climatic and epidemiological conditions.

18 pages, 3720 KB

Open Access Article

Double-Weighted Bayesian Model Combination for Metabolomics Data Description and Prediction

by Jacopo Troisi, Martina Lombardi, Alessio Trotta, Vera Abenante, Andrea Ingenito, Nicole Palmieri, Sean M. Richards, Steven J. K. Symes and Pierpaolo Cavallo

Metabolites 2025, 15(4), 214; https://doi.org/10.3390/metabo15040214 - 21 Mar 2025

Cited by 1 | Viewed by 1592

Abstract

Background/Objectives: This study presents a novel double-weighted Bayesian Ensemble Machine Learning (DW-EML) model aimed at improving the classification and prediction of metabolomics data. This discipline, which involves the comprehensive analysis of metabolites in a biological system, provides valuable insights into complex biological processes and disease states. As metabolomics assumes an increasingly prominent role in the diagnosis of human diseases and in precision medicine, there is a pressing need for more robust artificial intelligence tools that can offer enhanced reliability and accuracy in medical applications. The proposed DW-EML model addresses this by integrating multiple classifiers within a double-weighted voting scheme, which assigns weights based on the cross-validation accuracy and classification confidence, ensuring a more reliable prediction framework. Methods: The model was applied to publicly available datasets derived from studies on critical illness in children, chronic typhoid carriage, and early detection of ovarian cancer. Results: The results demonstrate that the DW-EML approach outperformed methods traditionally used in metabolomics, such as Partial Least Squares Discriminant Analysis, in terms of accuracy and predictive power. Conclusions: The DW-EML model is a promising tool for metabolomic data analysis, offering enhanced robustness and reliability for diagnostic and prognostic applications and potentially contributing to the advancement of personalized and precision medicine.

(This article belongs to the Section Bioinformatics and Data Analysis)
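
The double-weighted voting idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the DW-EML model itself: the abstract states only that weights depend on cross-validation accuracy and classification confidence, so combining the two by multiplication, the function name `double_weighted_vote`, and the toy numbers are all assumptions.

```python
def double_weighted_vote(predictions):
    """Combine classifier votes using two weights per vote.

    `predictions` is a list of (label, cv_accuracy, confidence) tuples,
    one per classifier. Each vote counts as cv_accuracy * confidence
    (an assumed combination rule). Returns the winning label and each
    label's share of the total weighted vote.
    """
    scores = {}
    for label, cv_accuracy, confidence in predictions:
        scores[label] = scores.get(label, 0.0) + cv_accuracy * confidence

    total = sum(scores.values())
    if total == 0:
        raise ValueError("all votes have zero weight")

    winner = max(scores, key=scores.get)
    shares = {label: score / total for label, score in scores.items()}
    return winner, shares


# Three hypothetical classifiers voting on one sample.
toy_votes = [
    ("case", 0.90, 0.60),
    ("control", 0.70, 0.90),
    ("case", 0.80, 0.70),
]
toy_winner, toy_shares = double_weighted_vote(toy_votes)
```

In this toy example the single "control" vote is the most confident one, yet the two "case" votes win once each vote is also scaled by its classifier's cross-validation accuracy.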

26 pages, 8409 KB

Open Access Article

Artificial Intelligence Analysis and Reverse Engineering of Molecular Subtypes of Diffuse Large B-Cell Lymphoma Using Gene Expression Data

by Joaquim Carreras, Yara Yukie Kikuti, Masashi Miyaoka, Saya Miyahara, Giovanna Roncador, Rifat Hamoudi and Naoya Nakamura

BioMedInformatics 2024, 4(1), 295-320; https://doi.org/10.3390/biomedinformatics4010017 - 26 Jan 2024

Cited by 6 | Viewed by 3603

Abstract

Diffuse large B-cell lymphoma is one of the most frequent mature B-cell hematological neoplasms and non-Hodgkin lymphomas. Despite advances in diagnosis and treatment, clinical evolution is unfavorable in a subset of patients. Using molecular techniques, several pathogenic models have been proposed, including cell-of-origin molecular classification; Hans’ classification and derivatives; and the Schmitz, Chapuy, Lacy, Reddy, and Sha models. This study introduced different machine learning techniques and their classification. Later, several machine learning techniques and artificial neural networks were used to predict the DLBCL subtypes with high accuracy (95–100%), including Germinal center B-cell like (GCB), Activated B-cell like (ABC), Molecular high-grade (MHG), and Unclassified (UNC), in the context of the data released by the REMoDL-B trial. In order of accuracy (MHG vs. others), the techniques were XGBoost tree (100%); random trees (99.9%); random forest (99.5%); and C5, Bayesian network, SVM, logistic regression, KNN algorithm, neural networks, LSVM, discriminant analysis, CHAID, C&R tree, tree-AS, Quest, and XGBoost linear (99.4–91.1%). The inputs (predictors) were all the genes of the array and a set of 28 genes related to DLBCL-Burkitt differential expression. In summary, artificial intelligence (AI) is a useful tool for predictive analytics using gene expression data.

(This article belongs to the Special Issue Deep Learning Methods and Application for Bioinformatics and Healthcare)

12 pages, 3053 KB

Open Access Article

Zero-Inflated Text Data Analysis using Generative Adversarial Networks and Statistical Modeling

by Sunghae Jun

Computers 2023, 12(12), 258; https://doi.org/10.3390/computers12120258 - 10 Dec 2023

Cited by 11 | Viewed by 2876

Abstract

In big data analysis, zero-inflation problems arise frequently. In particular, the problem of inflated zeros has a great influence on text big data analysis. In general, the preprocessed data from text documents form a matrix whose rows are the documents and whose columns are the terms. Each element of this matrix is the frequency with which a term occurs in a document. Most elements of the matrix are zeros, because the number of columns is much larger than the number of rows. This problem is a cause of decreasing model performance in text data analysis. To overcome this problem, we propose a method of zero-inflated text data analysis using generative adversarial networks (GAN) and statistical modeling. In this paper, we solve the zero-inflated problem using synthetic data generated from the original data with zero inflation. The main finding of our study is how to change zero values into very small numeric values with random noise through the GAN. The generator and discriminator of the GAN learned the zero-inflated text data together and built a model that generates synthetic data that can replace the zero-inflated data. We conducted experiments on real and simulated data sets to verify the improved performance of our proposed method. In our experiments, we used five quantitative measures (prediction sum of squares, R-squared, log-likelihood, Akaike information criterion, and Bayesian information criterion) to compare the model’s performance on the original and synthetic data sets. We found that our proposed method performs better than the traditional methods on all measures.

(This article belongs to the Special Issue Uncertainty-Aware Artificial Intelligence)

27 pages, 2752 KB

Open Access Review

Recent Progress of Machine Learning Algorithms for the Oil and Lubricant Industry

by Md Hafizur Rahman, Sadat Shahriar and Pradeep L. Menezes

Lubricants 2023, 11(7), 289; https://doi.org/10.3390/lubricants11070289 - 10 Jul 2023

Cited by 27 | Viewed by 8401

Abstract

Machine learning (ML) algorithms have brought about a revolution in many industries where otherwise operation time, cost, and safety would have been compromised. Likewise, in lubrication research, ML has been utilized on many occasions. This review provides an in-depth understanding of seven ML algorithms from a tribological perspective. More specifically, it presents a comprehensive overview of recent advancements in ML applied to lubrication research, organized into four distinct categories. The first category, experimental parameter prediction, highlights the significant contributions of artificial neural networks (ANNs) in accurately forecasting operating conditions related to friction and wear. These predictions offer valuable insights that aid in forensic preparation. Discriminant analysis, Bayesian modeling, and transfer learning approaches have also been used to predict experimental parameters. Second, to predict the lubrication film thickness and identify the lubrication regime, algorithms such as logistic regression and ANN were useful. Such predictions provide up to 99.25% accuracy. Third, to predict the friction and wear for a given experimental condition, support vector machine (SVM), polynomial regression, and ANN offered an accuracy above 93%. Finally, for condition monitoring for bearings, gearboxes, gear trains, and similar critical situations where regular in-person inspection is difficult, Naïve Bayes, SVM, decision trees, and ANN were utilized to predict the safe life of lubricants. This review highlighted these four aspects with state-of-the-art examples and discussed the current situation and projected future possibilities of lubricant design facilitated by ML techniques.

(This article belongs to the Special Issue Tribology and Machine Learning: New Perspectives and Challenges)

12 pages, 1606 KB

Open Access Article

Predicting Breast Cancer Events in Ductal Carcinoma In Situ (DCIS) Using Generative Adversarial Network Augmented Deep Learning Model

by Soumya Ghose, Sanghee Cho, Fiona Ginty, Elizabeth McDonough, Cynthia Davis, Zhanpan Zhang, Jhimli Mitra, Adrian L. Harris, Aye Aye Thike, Puay Hoon Tan, Yesim Gökmen-Polar and Sunil S. Badve

Cancers 2023, 15(7), 1922; https://doi.org/10.3390/cancers15071922 - 23 Mar 2023

Cited by 7 | Viewed by 3239

Abstract

Standard clinicopathological parameters (age, growth pattern, tumor size, margin status, and grade) have been shown to have limited value in predicting recurrence in ductal carcinoma in situ (DCIS) patients. Early and accurate recurrence prediction would facilitate a more aggressive treatment policy for high-risk patients (mastectomy or adjuvant radiation therapy), and simultaneously reduce over-treatment of low-risk patients. Generative adversarial networks (GAN) are a class of DL models in which two adversarial neural networks, generator and discriminator, compete with each other to generate high quality images. In this work, we have developed a deep learning (DL) classification network that predicts breast cancer events (BCEs) in DCIS patients using hematoxylin and eosin (H & E) images. The DL classification model was trained on 67 patients using image patches from the actual DCIS cores and GAN generated image patches to predict breast cancer events (BCEs). The hold-out validation dataset (n = 66) had an AUC of 0.82. Bayesian analysis further confirmed the independence of the model from classical clinicopathological parameters. DL models of H & E images may be used as a risk stratification strategy for DCIS patients to personalize therapy.

(This article belongs to the Collection Artificial Intelligence and Machine Learning in Cancer Research)

20 pages, 3786 KB

Open Access Article

Modified Self-Adaptive Bayesian Algorithm for Smart Heart Disease Prediction in IoT System

by Ahmad F. Subahi, Osamah Ibrahim Khalaf, Youseef Alotaibi, Rajesh Natarajan, Natesh Mahadev and Timmarasu Ramesh

Sustainability 2022, 14(21), 14208; https://doi.org/10.3390/su142114208 - 31 Oct 2022

Cited by 65 | Viewed by 6918

Abstract

Heart disease (HD) has surpassed all other causes of death in recent years. Estimating one’s risk of developing heart disease is difficult, since it takes both specialized knowledge and practical experience. The collection of sensor information for the diagnosis and prognosis of cardiac disease is a recent application of Internet of Things (IoT) technology in healthcare organizations. Despite the efforts of many scientists, the diagnostic results for HD remain unreliable. To solve this problem, we offer an IoT platform that uses a Modified Self-Adaptive Bayesian algorithm (MSABA) to provide more precise assessments of HD. When the patient wears the smartwatch and pulse sensor device, it records vital signs, including electrocardiogram (ECG) and blood pressure, and sends the data to a computer. The MSABA is used to determine whether the sensor data that has been obtained is normal or abnormal. Kernel discriminant analysis (KDA) is used to extract the features. We assess the system’s efficacy by comparing the suggested MSABA with existing models. Metrics such as accuracy, precision, recall, and F1 score show that the suggested MSABA-based prediction system outperforms competing approaches. The suggested method demonstrates that the MSABA achieves the highest rate of accuracy compared to the existing classifiers for the largest possible amount of data.

46 pages, 38427 KB

Open Access Article

Artificial Intelligence Predicted Overall Survival and Classified Mature B-Cell Neoplasms Based on Immuno-Oncology and Immune Checkpoint Panels

by Joaquim Carreras, Giovanna Roncador and Rifat Hamoudi

Cancers 2022, 14(21), 5318; https://doi.org/10.3390/cancers14215318 - 28 Oct 2022

Cited by 31 | Viewed by 7995

Abstract

Artificial intelligence (AI) can identify actionable oncology biomarkers. This research integrates our previous analyses of non-Hodgkin lymphoma. We used gene expression and immunohistochemical data, focusing on the immune checkpoint, and added a new analysis of macrophages, including 3D rendering. The AI comprised machine learning (C5, Bayesian network, C&R, CHAID, discriminant analysis, KNN, logistic regression, LSVM, Quest, random forest, random trees, SVM, tree-AS, and XGBoost linear and tree) and artificial neural networks (multilayer perceptron and radial basis function). The series included chronic lymphocytic leukemia, mantle cell lymphoma, follicular lymphoma, Burkitt, diffuse large B-cell lymphoma, marginal zone lymphoma, and multiple myeloma, as well as acute myeloid leukemia and pan-cancer series. AI classified lymphoma subtypes and predicted overall survival accurately. Oncogenes and tumor suppressor genes were highlighted (MYC, BCL2, and TP53), along with immune microenvironment markers of tumor-associated macrophages (M2-like TAMs), T-cells and regulatory T lymphocytes (Tregs) (CD68, CD163, MARCO, CSF1R, CSF1, PD-L1/CD274, SIRPA, CD85A/LILRB3, CD47, IL10, TNFRSF14/HVEM, TNFAIP8, IKAROS, STAT3, NFKB, MAPK, PD-1/PDCD1, BTLA, and FOXP3), apoptosis (BCL2, CASP3, CASP8, PARP, and pathway-related MDM2, E2F1, CDK6, MYB, and LMO2), and metabolism (ENO3, GGA3). In conclusion, AI with immuno-oncology markers is a powerful predictive tool. Additionally, a review of recent literature was made.

(This article belongs to the Topic Artificial Intelligence in Cancer Diagnosis and Therapy)

16 pages, 1841 KB

Open Access Article

Risk Factors for, and Prediction of, Shoulder Pain in Young Badminton Players: A Prospective Cohort Study

by Antonio Cejudo

Int. J. Environ. Res. Public Health 2022, 19(20), 13095; https://doi.org/10.3390/ijerph192013095 - 12 Oct 2022

Cited by 8 | Viewed by 5433

Abstract

Background: Shoulder pain (SP) caused by hitting the shuttlecock is common in young badminton players. The objectives of the present study were to predict the risk factors for SP in young badminton players, and to determine the optimal risk factor cut-off that best discriminates players at higher risk of suffering from SP. Methods: A prospective cohort study was conducted with 45 under-17 badminton players who participated in the Spanish Championship. Data were collected on anthropometric age, sports history, sagittal spinal curves, range of motion (ROM) and maximum isometric strength of the shoulder. After 12 months, players completed an SP history questionnaire. Bayesian Student’s t-analysis, binary logistic regression analysis and ROC analysis were performed. Results: Overall, 18 (47.4%) players reported at least one episode of SP. The shoulder internal rotation (SIR) ROM showed the strongest association (OR = 1.122; p = 0.035) with SP. The SIR ROM showed an excellent ability to discriminate players at increased risk for SP (p = 0.001). The optimal cut-off for SIR ROM, which predicts players with an 81% probability of developing SP, was set at 55° (sensitivity = 75.0%, specificity = 83.3%). Conclusions: Young badminton players with a shoulder internal rotation ROM of 55° or less have a higher risk of SP one year later.

(This article belongs to the Special Issue Lower Extremity Diseases, Injuries and Public Health)


16 pages, 4649 KB

Open AccessArticle

Rock Burst Intensity Classification Prediction Model Based on a Bayesian Hyperparameter Optimization Support Vector Machine

by Shaohong Yan, Yanbo Zhang, Xiangxin Liu and Runze Liu

Mathematics 2022, 10(18), 3276; https://doi.org/10.3390/math10183276 - 9 Sep 2022

Cited by 9 | Viewed by 2525

Abstract

Rock burst disasters occurring in underground high-stress rock mass mining and excavation engineering seriously threaten the safety of workers and hinder the progress of engineering construction. Rock burst classification prediction is the basis for reducing and even eliminating rock burst hazards. Currently, most mainstream discriminant models for rock burst grade prediction are based on small samples. Following a comprehensive selection from the literature, three indices are used to grade rock bursts: the ratio of the maximum tangential stress of the surrounding rock to the rock uniaxial compressive strength (stress state parameter), the ratio of the rock uniaxial compressive strength to the uniaxial tensile strength (brittleness modulus), and the elastic energy index. The training data comprise 114 groups of extensive sample data on rock burst instances collected from construction projects in different regions of the world. The representativeness and accuracy of the index selection were verified by indicator variance analysis and a Spearman correlation coefficient hypothesis test. The Intelligent Rock burst Identification System (IRIS), based on an optimizable SVM model, was established using the data set samples. After cross-validation training on the extensive data, the accuracy of the SVM discriminant analysis model reaches 95.6%, significantly better than the 71.9% prediction accuracy of the traditional SVM model. The model was used to classify and predict the rock burst intensity of 10 typical projects at home and abroad. The prediction results are consistent with the actual rock burst intensity and are better than those of the discriminant model based on small sample data and other existing prediction models. The application to engineering examples shows that the results of the rock burst intensity classification prediction model, based on extensive sample data processing analysis and the SVM discriminant method, are in good agreement with the actual rock burst intensity, and that the model can effectively provide a reference for predicting the rock burst intensity grade in a construction area.

(This article belongs to the Special Issue Engineering Calculation and Data Modeling)


18 pages, 4519 KB

Open AccessArticle

Artificial Intelligence Analysis of Celiac Disease Using an Autoimmune Discovery Transcriptomic Panel Highlighted Pathogenic Genes including BTLA

by Joaquim Carreras

Healthcare 2022, 10(8), 1550; https://doi.org/10.3390/healthcare10081550 - 16 Aug 2022

Cited by 20 | Viewed by 6028

Abstract

Celiac disease is a common immune-related inflammatory disease of the small intestine caused by gluten in genetically predisposed individuals. This research is a proof-of-concept exercise focused on using Artificial Intelligence (AI) and an autoimmune discovery gene panel to predict and model celiac disease. Conventional bioinformatics, gene set enrichment analysis (GSEA), and several machine learning and neural network techniques were used on a publicly available dataset (GSE164883). Machine learning and deep learning included C5, logistic regression, Bayesian network, discriminant analysis, KNN algorithm, LSVM, random trees, SVM, Tree-AS, XGBoost linear, XGBoost tree, CHAID, Quest, C&R tree, random forest, and neural network (multilayer perceptron). As a result, the gene panel predicted celiac disease with high accuracy (95–100%). Several pathogenic genes were identified, some of them belonging to the immune checkpoint and immuno-oncology pathways. These included CASP3, CD86, CTLA4, FASLG, GZMB, IFNG, IL15RA, ITGAX, LAG3, MMP3, MUC1, MYD88, PRDM1, and RGS1, among others. Among them, B and T lymphocyte associated (BTLA, CD272) was highlighted and validated at the protein level by immunohistochemistry in an independent series of cases. Celiac disease was characterized by high BTLA, expressed by inflammatory cells of the lamina propria. In conclusion, artificial intelligence predicted celiac disease using an autoimmune discovery gene panel.

(This article belongs to the Special Issue Artificial Intelligence Applications in Medicine)
