Search Results (141)

Search Parameters:
Keywords = Chi2 feature selection

27 pages, 582 KiB  
Article
An Empirical Evaluation of Ensemble Models for Python Code Smell Detection
by Rajwant Singh Rao, Seema Dewangan and Alok Mishra
Appl. Sci. 2025, 15(13), 7472; https://doi.org/10.3390/app15137472 - 3 Jul 2025
Viewed by 302
Abstract
Code smells, which represent poor design choices or suboptimal code implementations, reduce software quality and hinder the code maintenance process. Detecting code smells is, therefore, essential during software development. This study introduces a Python-based code smell dataset targeting two smell types: Large Class and Long Method. Five ensemble learning methods—Bagging, Gradient Boost, Max Voting, AdaBoost, and XGBoost—were employed to detect code smells within these datasets. The ten most significant features were selected using the Chi-square feature selection technique. To address the class imbalance, the SMOTE algorithm was applied. Experimental results yielded a best accuracy score of 0.96 and an MCC of 0.85 for the Large Class dataset using the Max Voting model. For the Long Method dataset, a best accuracy score of 0.98 and an MCC of 0.94 were achieved using the Gradient Boost model in conjunction with Chi-square feature selection. These results highlight the effectiveness of the proposed methodology and its potential to enhance code smell detection in Python significantly, reinforcing confidence in the approach’s thoroughness and applicability. Full article
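
The chi-square selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes non-negative, count-like features and scores each one by comparing the per-class sum of its values with the sum expected if the feature were independent of the class. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Return one chi-square score per feature column of X.

    X is a list of rows with non-negative values, y the class labels.
    Observed = per-class sum of the feature; expected = class prior * total.
    """
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    class_prob = {c: sum(1 for label in y if label == c) / n_samples
                  for c in classes}
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        observed = defaultdict(float)
        for row, label in zip(X, y):
            observed[label] += row[j]
        score = 0.0
        for c in classes:
            expected = class_prob[c] * total
            if expected > 0:  # skip empty cells to avoid dividing by zero
                score += (observed[c] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, y, k):
    """Return the (sorted) column indices of the k highest-scoring features."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 4 samples, 3 features, binary label.
X_toy = [[3, 1, 0], [4, 1, 0], [0, 1, 5], [0, 1, 6]]
y_toy = [1, 1, 0, 0]
print(select_top_k(X_toy, y_toy, 2))
```

In the toy data the middle feature is identical across classes, so its score is zero and it is dropped. The study keeps the ten highest-scoring features, which would correspond to `k=10` here.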

16 pages, 7509 KiB  
Article
Evaluating the Diagnostic Proficiency Among a Sample of Final Stage Dental Students in Some Orthodontic Cases: A Comprehensive Analysis of Clinical Competence
by Noor Nourie Abbass, Zainab Mousa Kadhom, Wurood Khairallah Al-Lehaibi and Mohammed Nahidh
Dent. J. 2025, 13(7), 300; https://doi.org/10.3390/dj13070300 - 2 Jul 2025
Viewed by 222
Abstract
Background/Objectives: This study evaluates the diagnostic and referral skills of final-year dental students at Al-Iraqia University using a questionnaire based on malocclusion cases ranging from mild to severe. Methods: The questionnaire, featuring photos and radiographs of five selected treated cases from two textbooks, was answered by 165 students who were asked to assess each case and determine whether orthodontic or surgical treatment was necessary, as well as to identify factors contributing to an unesthetic profile, such as irregular teeth. Frequency distribution and the Chi-square test were used for statistical analysis. Results: The results indicated good overall clinical competence. The unesthetic profile and irregular teeth were the main reasons for referring both Class II and III cases for surgery, with mandibular retrusion being the most common factor in aesthetic concerns. Maxillary protrusion was less frequently selected as a key factor in Class II malocclusion cases. Conclusions: The findings suggest that students demonstrated a high level of diagnostic accuracy in identifying treatment needs for various malocclusion cases. Full article
(This article belongs to the Special Issue Dental Education: Innovation and Challenge)

26 pages, 7645 KiB  
Article
Prediction of Rice Chlorophyll Index (CHI) Using Nighttime Multi-Source Spectral Data
by Cong Liu, Lin Wang, Xuetong Fu, Junzhe Zhang, Ran Wang, Xiaofeng Wang, Nan Chai, Longfeng Guan, Qingshan Chen and Zhongchen Zhang
Agriculture 2025, 15(13), 1425; https://doi.org/10.3390/agriculture15131425 - 1 Jul 2025
Viewed by 384
Abstract
The chlorophyll index (CHI) is a crucial indicator for assessing the photosynthetic capacity and nutritional status of crops. However, traditional methods for measuring CHI, such as chemical extraction and handheld instruments, fall short in meeting the requirements for efficient, non-destructive, and continuous monitoring at the canopy level. This study aimed to explore the feasibility of predicting rice canopy CHI using nighttime multi-source spectral data combined with machine learning models. In this study, ground truth CHI values were obtained using a SPAD-502 chlorophyll meter. Canopy spectral data were acquired under nighttime conditions using a high-throughput phenotyping platform (HTTP) equipped with active light sources in a greenhouse environment. Three types of sensors—multispectral (MS), visible light (RGB), and chlorophyll fluorescence (ChlF)—were employed to collect data across different growth stages of rice, ranging from tillering to maturity. PCA and LASSO regression were applied for dimensionality reduction and feature selection of multi-source spectral variables. Subsequently, CHI prediction models were developed using four machine learning algorithms: support vector regression (SVR), random forest (RF), back-propagation neural network (BPNN), and k-nearest neighbors (KNNs). The predictive performance of individual sensors (MS, RGB, and ChlF) and sensor fusion strategies was evaluated across multiple growth stages. The results demonstrated that sensor fusion models consistently outperformed single-sensor approaches. Notably, during tillering (TI), maturity (MT), and the full growth period (GP), fused models achieved high accuracy (R2 > 0.90, RMSE < 2.0). The fusion strategy also showed substantial advantages over single-sensor models during the jointing–heading (JH) and grain-filling (GF) stages. 
Among the individual sensor types, MS data achieved relatively high accuracy at certain stages, while models based on RGB and ChlF features exhibited weaker performance and lower prediction stability. Overall, the highest prediction accuracy was achieved during the full growth period (GP) using fused spectral data, with an R2 of 0.96 and an RMSE of 1.99. This study provides a valuable reference for developing CHI prediction models based on nighttime multi-source spectral data. Full article
(This article belongs to the Section Digital Agriculture)

30 pages, 2494 KiB  
Article
A Novel Framework for Mental Illness Detection Leveraging TOPSIS-ModCHI-Based Feature-Driven Randomized Neural Networks
by Santosh Kumar Behera and Rajashree Dash
Math. Comput. Appl. 2025, 30(4), 67; https://doi.org/10.3390/mca30040067 - 30 Jun 2025
Viewed by 265
Abstract
Mental illness has emerged as a significant global health crisis, inflicting immense suffering and causing a notable decrease in productivity. Identifying mental health disorders at an early stage allows healthcare professionals to implement more targeted and impactful interventions, leading to a significant improvement in the overall well-being of the patient. Recent advances in Artificial Intelligence (AI) have opened new avenues for analyzing the medical records and behavioral data of patients to assist mental health professionals in their decision-making processes. In this study, the performance of four Randomized Neural Networks (RandNNs), namely the Broad Learning System (BLS), Random Vector Functional Link Network (RVFLN), Kernelized RVFLN (KRVFLN), and Extreme Learning Machine (ELM), is explored for detecting the type of mental illness a user may have by analyzing random text the user has posted on social media. To improve the performance of the RandNNs when handling text documents with unbalanced class distributions, a hybrid feature selection (FS) technique named TOPSIS-ModCHI is suggested for the preprocessing stage of the classification framework. The effectiveness of the suggested FS with all four randomized networks is assessed on the publicly available Reddit Mental Health Dataset after experimenting on two benchmark multiclass unbalanced datasets. The experimental results indicate that detecting mental illness using BLS with TOPSIS-ModCHI produces the highest precision value of 0.92, recall value of 0.66, f-measure value of 0.77, and Hamming loss value of 0.06 as compared to ELM, RVFLN, and KRVFLN with a minimum feature size of 900. Overall, utilizing BLS for mental health analysis can offer a promising avenue toward improved interventions and a better understanding of mental health issues, aiding in decision-making processes. Full article
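
The precision, recall, and f-measure reported in this abstract are mutually consistent, which a short check makes visible. This is a sketch that assumes the f-measure is the usual harmonic mean of precision and recall (beta = 1); the abstract does not state the weighting, and the function name is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values taken from the abstract: precision 0.92, recall 0.66.
print(round(f_measure(0.92, 0.66), 2))
```

The result rounds to the reported 0.77, and it shows how the lower recall pulls the f-measure well below the precision.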

17 pages, 1348 KiB  
Article
Endo-Periodontal Lesions in Endodontically Treated Teeth with Periapical Pathology
by Mihaela Sălceanu, Anca Melian, Cristina Dascălu, Cristian Giuroiu, Corina Concita, Claudiu Topoliceanu, Diana Melian, Andreea Frumuzache, Sorina Mihaela Solomon and Maria-Alexandra Mârţu
Diagnostics 2025, 15(13), 1663; https://doi.org/10.3390/diagnostics15131663 - 30 Jun 2025
Viewed by 354
Abstract
Background/Objectives: The aim of this study was to identify and assess the independent risk factors and potential predictors for endo-periodontal lesions (EPLs) in endodontically treated teeth with periapical pathology. Methods: The study group included 90 patients (35 men, 55 women; mean age 47.96 ± 13.495 years) with 126 endodontically treated teeth. Following clinical examinations and radiologic evaluation, 50 patients were diagnosed with endo-periodontal lesions (EPLs) in 64 molars (test group); the control group included 62 endodontically treated teeth without EPLs diagnosed in 40 patients. The independent variables were assessed as risk factors for EPLs. The relationship between patients’ demographic and clinical features and endo-periodontal status was assessed using Chi-squared tests for categorical variables and Student’s t- or Mann–Whitney tests for continuous variables, depending on data distribution. The potential risk factors were characterized by calculating Odds Ratios (ORs) with 95% confidence intervals. The variables included in the multivariate logistic regression model were selected based on their clinical relevance and statistical significance in the univariate analysis. To evaluate the combined effect of the identified risk factors, a binary logistic regression model was constructed using the Enter method. Results: Out of the 126 endodontically treated molars with periapical pathology, 64 teeth (50.8%) were diagnosed with endo-periodontal lesions (EPLs). Patients aged ≥60 years were significantly more represented in the EPL group (32.8%) compared to the control group (12.9%) (p = 0.024). Probing pocket depth ≥ 4 mm was present in 85.9% of teeth with EPLs versus only 30.6% in teeth without EPLs (p < 0.001). Probing pocket depth (PPD) ≥ 4 mm was the strongest predictor (OR = 13.830) and remained significant after adjustment in multivariate analysis (OR = 6.585). 
PPD ≥ 3.625 mm showed a strong association in univariate analysis (OR = 12.587) and preserved significance in the multivariate model (OR = 6.163). Conclusions: This study highlights age ≥ 60 years and PPD ≥ 4 mm as the most significant independent risk factors for EPLs, emphasizing the need for early periodontal assessment in endodontically treated teeth with periapical pathology. While PPD greater than 3.625 mm is a strong indicator of the presence of EPLs, other factors such as MBL (marginal bone loss) and occlusal considerations appear to have indirect roles in EPL development in endodontically treated teeth with periapical lesions. Full article
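
The univariate odds ratio for PPD ≥ 4 mm can be approximately reproduced from the percentages in the abstract. The sketch below rests on two assumptions: the cell counts are reconstructed by rounding (85.9% of 64 teeth ≈ 55, 30.6% of 62 teeth ≈ 19), and the confidence interval uses the Wald method on the log odds ratio, which the abstract does not specify. The function name is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Reconstructed counts (assumption): 55 of 64 EPL teeth and
# 19 of 62 control teeth had PPD >= 4 mm.
or_value, lower, upper = odds_ratio_ci(55, 9, 19, 43)
print(round(or_value, 3), lower < or_value < upper)
```

With these reconstructed counts the unadjusted odds ratio comes out at about 13.83, in line with the reported OR = 13.830. The adjusted value (6.585) cannot be recovered this way because it depends on the other covariates in the logistic regression model.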
(This article belongs to the Section Pathology and Molecular Diagnostics)

24 pages, 3242 KiB  
Article
Integrating Clinical and Transcriptomic Profiles Associated with Vitamin D to Enhance Disease-Free Survival in Cervical Cancer Recurrence Using the CatBoost Algorithm
by Geeitha Senthilkumar, Renuka Pitchaimuthu, Seshathiri Dhanasekaran and Prabu Sankar Panneerselvam
Diagnostics 2025, 15(13), 1579; https://doi.org/10.3390/diagnostics15131579 - 21 Jun 2025
Viewed by 464
Abstract
Background/Objectives: Cervical cancer is a leading cancer-related cause of death among women, with recurrence being a serious clinical issue. Recent evidence demonstrates that long non-coding RNAs (lncRNAs) affect cancer recurrence. This research investigates vitamin D’s regulatory actions in the recurrence of cervical cancer, centering on the involvement of lncRNA. Clinical data on 738 patients shows that greater serum vitamin D levels are linked to reduced recurrence rates and enhanced disease-free survival (DFS). Methods: A transcriptomic analysis of CaSki cervical cancer cells using data from the GEO dataset GSE267715 identified that vitamin D controls genes that prevent cervical cancer recurrence. Machine learning predictors CatBoost, LightGBM, Extra Trees, and Logistic Regression and feature selection methods such as ANOVA F-test, mutual information, Chi-squared test, and Recursive Feature Elimination (RFE) are used to identify predictors of recurrence, evaluating model performance using accuracy, precision, recall, ROC AUC, confusion matrices, and ROC curves. Result: CatBoost performs the best overall, producing an accuracy of 95.27%. CatBoost provided an ROC AUC of 0.9930, a precision of 0.9296, and a recall of 0.9706, and this implies a significant trade-off between the ability to detect metastatic cases correctly. Conclusions: These data identify the therapeutic potential of vitamin D as a regulatory compound and lncRNA as a potential therapeutic target in the recurrence of cervical cancer. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

19 pages, 4132 KiB  
Article
Comparative Analysis of Deep Learning-Based Feature Extraction and Traditional Classification Approaches for Tomato Disease Detection
by Hakan Terzioğlu, Adem Gölcük, Adnan Mohammad Anwer Shakarji and Mateen Yilmaz Al-Bayati
Agronomy 2025, 15(7), 1509; https://doi.org/10.3390/agronomy15071509 - 21 Jun 2025
Viewed by 406
Abstract
In recent years, significant advancements in artificial intelligence, particularly in the field of deep learning, have increasingly been integrated into agricultural applications, including critical processes such as disease detection. Tomato, being one of the most widely consumed agricultural products globally and highly susceptible to a variety of fungal, bacterial, and viral pathogens, remains a prominent focus in disease detection research. In this study, we propose a deep learning-based approach for the detection of tomato diseases, a critical challenge in agriculture due to the crop’s vulnerability to fungal, bacterial, and viral pathogens. We constructed an original dataset comprising 6414 images captured under real production conditions, categorized into three image types: leaves, green tomatoes, and red tomatoes. The dataset includes five classes: healthy samples, late blight, early blight, gray mold, and bacterial cancer. Twenty-one deep learning models were evaluated, and the top five performers (EfficientNet-b0, NasNet-Large, ResNet-50, DenseNet-201, and Places365-GoogLeNet) were selected for feature extraction. From each model, 1000 deep features were extracted, and feature selection was conducted using MRMR, Chi-Square (Chi2), and ReliefF methods. The top 100 features from each selection technique were then used for reclassification with traditional machine learning classifiers under five-fold cross-validation. The highest test accuracy of 92.0% was achieved with EfficientNet-b0 features, Chi2 selection, and the Fine KNN classifier. EfficientNet-b0 consistently outperformed other models, while the combination of NasNet-Large and Wide Neural Network yielded the lowest performance. These results demonstrate the effectiveness of combining deep learning-based feature extraction with traditional classifiers and feature selection techniques for robust detection of tomato diseases in real-world agricultural environments. Full article
(This article belongs to the Section Pest and Disease Management)

33 pages, 4434 KiB  
Article
Developing Machine Learning Models for Optimal Design of Water Distribution Networks Using Graph Theory-Based Features
by Iman Bahrami Chegeni, Mohammad Mehdi Riyahi, Amin E. Bakhshipour, Mohamad Azizipour and Ali Haghighi
Water 2025, 17(11), 1654; https://doi.org/10.3390/w17111654 - 29 May 2025
Viewed by 776
Abstract
This study presents an innovative data-driven approach to optimally design water distribution networks (WDNs). The methodology comprises five key stages: (1) generation of 600 synthetic WDNs with diverse properties, optimized to determine optimal component diameters; (2) extraction of 80 topological and hydraulic features from the optimized WDNs using graph theory; (3) preprocessing and preparation of the extracted features using established data science methods; (4) application of six feature selection methods (Variance Threshold, k-best, chi-squared, Light Gradient-Boosting Machine, Permutation, and Extreme Gradient Boosting) to identify the most relevant features for describing optimal diameters; and (5) integration of the selected features with four machine learning models (Random Forest, Support Vector Machine, Bootstrap Aggregating, and Light Gradient-Boosting Machine), resulting in 24 ensemble models. The Extreme Gradient Boosting-Light Gradient-Boosting Machine (Xg-LGB) model emerged as the optimal choice, achieving R2, MAE, and RMSE values of 0.98, 0.017, and 0.02, respectively. When applied to a benchmark WDN, this model accurately predicted optimal diameters, with R2, MAE, and RMSE values of 0.94, 0.054, and 0.06, respectively. These results highlight the developed model’s potential for the accurate and efficient optimal design of WDNs. Full article
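
For readers unfamiliar with the three scores reported in this abstract, the sketch below computes R2, MAE, and RMSE from their standard definitions. The function name and the toy values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, MAE, RMSE) for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Hypothetical toy example: one prediction is off by 1.0.
r2, mae, rmse = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(r2, mae, rmse)
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are usually reported together.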
(This article belongs to the Special Issue Advances in Management and Optimization of Urban Water Networks)

19 pages, 630 KiB  
Article
Primary and Emergency Care Use: The Roles of Health Literacy, Patient Activation, and Sleep Quality in a Latent Profile Analysis
by Dietmar Ausserhofer, Verena Barbieri, Stefano Lombardo, Timon Gärtner, Klaus Eisendle, Giuliano Piccoliori, Adolf Engl and Christian J. Wiedermann
Behav. Sci. 2025, 15(6), 724; https://doi.org/10.3390/bs15060724 - 24 May 2025
Viewed by 374
Abstract
Background/Objectives: Healthcare utilization is a behavioral phenomenon influenced by psychosocial factors. This study took place in South Tyrol, a culturally diverse autonomous province in northern Italy, and aimed to identify latent profiles of primary healthcare users based on health literacy, patient activation, sleep quality, and service use, and to examine the sociodemographic and health-related predictors of profile membership. Methods: A cross-sectional survey was conducted with a representative adult sample (n = 2090). The participants completed the questionnaire in German or Italian. Latent profiles were identified via model-based clustering using Gaussian mixture modeling and four z-standardized indicators: total primary healthcare contacts (general practice and emergency room visits), HLS-EU-Q16 (health literacy), PAM-10 (patient activation), and B-PSQI (sleep quality). The optimal cluster solution was selected using the Bayesian Information Criterion (BIC). Kruskal–Wallis and chi-square tests were used for between-cluster comparisons of the data. Multinomial logistic regression was used to examine the predictors of cluster membership. Results: Among the 1645 respondents with complete data, a three-cluster solution showed a good model fit (BIC = 19,518; silhouette = 0.130). The identified profiles included ‘Balanced Self-Regulators’ (72.8%), ‘Struggling Navigators’ (25.8%), and ‘Hyper-Engaged Users’ (1.4%). Sleep quality could be used to differentiate between different levels of service use (p < 0.001), while low health literacy and patient activation were key features of the high-utilization groups. Poor sleep and inadequate health literacy were associated with increased healthcare contact. Conclusions: The latent profiling revealed distinct patterns in health care engagement. Behavioral segmentation can inform more tailored and culturally sensitive public health interventions in diverse settings such as South Tyrol. Full article
(This article belongs to the Special Issue The Impact of Psychosocial Factors on Health Behaviors)

20 pages, 3197 KiB  
Article
Research on Intrusion Detection Method Based on Transformer and CNN-BiLSTM in Internet of Things
by Chunhui Zhang, Jian Li, Naile Wang and Dejun Zhang
Sensors 2025, 25(9), 2725; https://doi.org/10.3390/s25092725 - 25 Apr 2025
Viewed by 1239
Abstract
With the widespread deployment of Internet of Things (IoT) devices, their complex network environments and open communication modes have made them prime targets for cyberattacks. Traditional Intrusion Detection Systems (IDS) face challenges in handling complex attack types, data imbalance, and feature extraction difficulties in IoT environments. Accurately detecting abnormal traffic in IoT has become increasingly critical. To address the limitation of single models in comprehensively capturing the diverse features of IoT traffic, this paper proposes a hybrid model based on CNN-BiLSTM-Transformer, which better handles complex features and long-sequence dependencies in intrusion detection. To address the issue of data class imbalance, the Borderline-SMOTE method is introduced to enhance the model’s ability to recognize minority class attack samples. To tackle the problem of redundant features in the original dataset, a comprehensive feature selection strategy combining XGBoost, Chi-square (Chi2), and Mutual Information is adopted to ensure the model focuses on the most discriminative features. Experimental validation demonstrates that the proposed method achieves 99.80% accuracy on the CIC-IDS 2017 dataset and 97.95% accuracy on the BoT-IoT dataset, significantly outperforming traditional intrusion detection methods, proving its efficiency and accuracy in detecting abnormal traffic in IoT environments. Full article
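
The abstract does not say how the XGBoost, Chi2, and Mutual Information selections are combined. One plausible way, shown here purely as an assumption, is majority voting: keep a feature when at least two of the three selectors include it in their shortlist. The function name and the feature names are hypothetical.

```python
from collections import Counter


def vote_features(selections, min_votes=2):
    """Keep features chosen by at least min_votes of the given selectors."""
    # set() ensures a selector cannot vote twice for the same feature.
    votes = Counter(f for selected in selections for f in set(selected))
    return sorted(f for f, count in votes.items() if count >= min_votes)


# Hypothetical shortlists from three selectors.
xgb_top = ["flow_duration", "pkt_len_mean", "syn_count"]
chi2_top = ["pkt_len_mean", "syn_count", "dst_port"]
mi_top = ["syn_count", "iat_std"]
print(vote_features([xgb_top, chi2_top, mi_top]))
```

Raising `min_votes` to 3 keeps only features that every selector agrees on, which trades coverage for a smaller, more conservative feature set.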
(This article belongs to the Section Internet of Things)

12 pages, 2609 KiB  
Article
VEGF-C and Lymphatic Vessel Density in Tumor Tissue of Gastric Cancer: Correlations with Pathoclinical Features and Prognosis
by Mariusz Szajewski, Maciej Ciesielski, Rafał Pęksa, Piotr Kurek, Michał Stańczak, Jakub Walczak, Jacek Zieliński and Wiesław Janusz Kruszewski
Cancers 2025, 17(9), 1406; https://doi.org/10.3390/cancers17091406 - 23 Apr 2025
Viewed by 524
Abstract
Objectives: The objective of this study was to assess the relationship of VEGF-C and LVD with pathoclinical factors of potential prognostic value and with the survival time of gastric cancer patients. Materials and methods: A total of 103 radically operated patients for gastric cancer who did not undergo neoadjuvant therapy were included in this study. The minimum follow-up period after surgery was 61 months. VEGF-C and lymphatic vessels were immunohistochemically determined using antibodies, including VEGF-C (c-20) sc 1881-Goat Polyclonal IgG (Santa Cruz Biotechnology) and Podoplanin D2-40 Mouse Monoclonal Antibody (ROCHE). The relationship between VEGF-C expression in gastric adenocarcinoma cells and the density of lymphatic vessels at the periphery of the primary tumor was assessed, along with the relationships of VEGF-C and LVD with selected pathoclinical parameters of gastric cancer and prognosis. Results: VEGF-C overexpression was associated with increased LVD (Mann–Whitney U test, p = 0.03) and the Lauren intestinal type of cancer (Pearson’s chi-square test, p < 0.001). Increased LVD was more often associated with cancers located beyond the cardia (Mann–Whitney U test, p = 0.04). We did not demonstrate an association of VEGF-C or LVD with OS or with prognostic features, such as pT, pN, or pTNM staging. However, in the Lauren intestinal type of cancer, VEGF-C overexpression correlated with shorter OS (log-rank, p = 0.01) and, at the level of p = 0.05 in multivariate analysis, it had an independent negative prognostic value. Conclusions: Peritumoral overexpression of VEGF-C in primary gastric cancer tumors is associated with increased LVD. The Lauren intestinal type of cancer is associated with VEGF-C overexpression. The overexpression of VEGF-C in intestinal-type gastric cancer is associated with worse prognosis. Full article
(This article belongs to the Special Issue Gastric Cancer Surgery: Gastrectomy, Risk, and Related Prognosis)

18 pages, 2629 KiB  
Article
Ensemble Machine Learning Models Utilizing a Hybrid Recursive Feature Elimination (RFE) Technique for Detecting GPS Spoofing Attacks Against Unmanned Aerial Vehicles
by Raghad Al-Syouf, Omar Y. Aljarrah, Raed Bani-Hani and Abdallah Alma’aitah
Sensors 2025, 25(8), 2388; https://doi.org/10.3390/s25082388 - 9 Apr 2025
Viewed by 620
Abstract
The dependency of Unmanned Aerial Vehicles (UAVs), also known as drones, on off-board data, such as control and position data, makes them highly susceptible to serious safety and security threats, including data interceptions, Global Positioning System (GPS) jamming, and spoofing attacks. This indeed necessitates the existence of an Intrusion Detection System (IDS) in place to detect potential security threats/intrusions promptly. Recently, machine-learning-based IDSs have gained popularity due to their high performance in detecting known as well as novel cyber-attacks. However, the time and computation efficiencies of ML-based IDSs still present a challenge in the UAV domain. Therefore, this paper proposes a hybrid Recursive Feature Elimination (RFE) technique based on feature importance ranking along with a Spearman Correlation Analysis (SCA). This technique is built on ensemble learning approaches, namely, bagging, boosting, stacking, and voting classifiers, to efficiently detect GPS spoofing attacks. Two benchmark datasets are employed: the GPS spoofing dataset and the UAV location GPS spoofing dataset. The results show that our proposed ensemble models achieved a notable balance between efficacy and efficiency, showing that the bagging classifier achieved the highest accuracy rate of 99.50%. At the same time, the Decision Tree (DT) and the bagging classifiers achieved the lowest processing time of 0.003 s and 0.029 s, respectively, using the GPS spoofing dataset. For the UAV location GPS spoofing dataset, the bagging classifier emerged as the top performer, achieving 99.16% accuracy and 0.002 s processing time compared to other well-known ML models. In addition, the experimental results show that our proposed methodology (RFE) outperformed other well-known ML models built on conventional feature selection techniques for detecting GPS spoofing attacks, such as mutual information gain, correlation matrices, and the chi-square test. Full article
(This article belongs to the Section Navigation and Positioning)

22 pages, 872 KiB  
Article
Effective ML-Based Android Malware Detection and Categorization
by Areej Alhogail and Rawan Abdulaziz Alharbi
Electronics 2025, 14(8), 1486; https://doi.org/10.3390/electronics14081486 - 8 Apr 2025
Cited by 2 | Viewed by 1435
Abstract
The rapid proliferation of malware poses a significant challenge regarding digital security, necessitating the development of advanced techniques for malware detection and categorization. In this study, we investigate Android malware detection and categorization using a two-step machine learning (ML) framework combined with feature engineering. The proposed framework first performs binary categorization to detect malware and then applies multi-class categorization to categorize malware into types, such as adware, banking Trojans, SMS malware, and riskware. Feature selection techniques such as chi-squared testing and select-from-model (SFM) were employed to reduce dimensionality and enhance model performance. Various ML classifiers were evaluated, and the proposed model achieved outstanding accuracy, at 97.82% for malware detection and 96.09% for malware categorization. The proposed framework outperforms existing approaches, demonstrating the effectiveness of feature engineering and random forest (RF) models in addressing computational efficiency. This research contributes a robust and interpretable framework for Android malware detection that is resource-efficient and practical for use in real-world applications. It also offers a scalable approach via which practitioners can deploy efficient malware detection systems. Future work will focus on real-time implementation and adaptive methodologies to address evolving malware threats. Full article
(This article belongs to the Special Issue Artificial Intelligence in Cyberspace Security)
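As an illustration of the chi-squared feature selection named in the abstract above, the following is a minimal sketch in plain Python. It assumes non-negative features (counts or 0/1 flags) and scores each feature by comparing its observed per-class total with the total expected if the feature were independent of the class. The function names `chi2_scores` and `select_k_best` and the toy data are hypothetical and are not taken from the article.

```python
def chi2_scores(X, y):
    """Return one chi-squared score per feature column of X.

    X is a list of rows with non-negative values; y holds the class labels.
    """
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    # Fraction of samples that belong to each class.
    class_prob = {c: sum(1 for label in y if label == c) / n_samples
                  for c in classes}
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            # Observed feature mass inside class c.
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            # Mass expected under independence of feature and class.
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Return the sorted indices of the k highest-scoring features."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): feature 0 tracks the label, feature 1 is constant.
X = [[1, 1, 0], [1, 1, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
y = [1, 1, 1, 0, 0, 0]
top_two = select_k_best(X, y, 2)
```

A constant feature scores zero here because its observed and expected per-class totals coincide, which is why a filter of this kind discards it first.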

20 pages, 1688 KiB  
Article
Evaluating Sparse Feature Selection Methods: A Theoretical and Empirical Perspective
by Monica Fira, Liviu Goras and Hariton-Nicolae Costin
Appl. Sci. 2025, 15(7), 3752; https://doi.org/10.3390/app15073752 - 29 Mar 2025
Cited by 2 | Viewed by 998
Abstract
This paper analyzes two main categories of feature selection: filter methods (such as minimum redundancy maximum relevance, CHI2, Kruskal–Wallis, and ANOVA) and embedded methods (such as alternating direction method of multipliers (BP_ADMM), least absolute shrinkage and selection operator, and orthogonal matching pursuit). The mathematical foundations of feature selection methods inspired by compressed detection are presented, highlighting how the principles of sparse signal recovery can be applied to identify the most relevant features. The results have been obtained using two biomedical databases. The used algorithms have, as their starting point, the notion of sparsity, but the version implemented and tested in this work is adapted for feature selection. The experimental results show that BP_ADMM achieves the highest classification accuracy (77% for arrhythmia_database and 100% for oncological_database), surpassing both the full feature set and the other methods tested in this study, which makes it the optimal feature selection option. The analysis shows that embedded methods strike a balance between accuracy and efficiency by selecting features during the model training, unlike filtering methods, which ignore feature interactions. Although more accurate, embedded methods are slower and depend on the chosen algorithm. Although less comprehensive than wrapper methods, they offer a strong trade-off between speed and performance when computational resources allow for it.
(This article belongs to the Section Applied Biosciences and Bioengineering)
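To make the idea of an embedded, sparsity-based selector concrete, here is a minimal sketch of the least absolute shrinkage and selection operator (LASSO) solved by coordinate descent, where features whose weights shrink to exactly zero are dropped. This is not the BP_ADMM variant evaluated in the article, and it is not the authors' implementation. The objective assumed is (1/2)||y - Xw||^2 + lam * ||w||_1, and the function names and toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimise 0.5 * ||y - Xw||^2 + lam * ||w||_1 one coordinate at a time."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    for _ in range(n_iters):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n_samples):
                pred_without_j = sum(X[i][k] * w[k]
                                     for k in range(n_features) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


def selected_features(w, tol=1e-12):
    """Indices of features that kept a non-zero weight."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


# Toy data (hypothetical): y depends only on feature 0.
X = [[1, 1], [2, -1], [3, 1], [4, -1]]
y = [3, 6, 9, 12]
weights = lasso_coordinate_descent(X, y, lam=1.0)
kept = selected_features(weights)
```

Selection happens as a side effect of fitting the model, which is the sense in which the abstract calls such methods embedded. A filter method, by contrast, scores each feature before any model is trained.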

20 pages, 3271 KiB  
Article
Fine-Tuned Machine Learning Classifiers for Diagnosing Parkinson’s Disease Using Vocal Characteristics: A Comparative Analysis
by Mehmet Meral, Ferdi Ozbilgin and Fatih Durmus
Diagnostics 2025, 15(5), 645; https://doi.org/10.3390/diagnostics15050645 - 6 Mar 2025
Viewed by 1288
Abstract
Background/Objectives: This paper is significant in highlighting the importance of early and precise diagnosis of Parkinson’s Disease (PD) that affects both motor and non-motor functions to achieve better disease control and patient outcomes. This study seeks to assess the effectiveness of machine learning algorithms optimized to classify PD based on vocal characteristics to serve as a non-invasive and easily accessible diagnostic tool. Methods: This study used a publicly available dataset of vocal samples from 188 people with PD and 64 controls. Acoustic features like baseline characteristics, time-frequency components, Mel Frequency Cepstral Coefficients (MFCCs), and wavelet transform-based metrics were extracted and analyzed. The Chi-Square test was used for feature selection to determine the most important attributes that enhanced the accuracy of the classification. Six different machine learning classifiers, namely SVM, k-NN, DT, NN, Ensemble and Stacking models, were developed and optimized via Bayesian Optimization (BO), Grid Search (GS) and Random Search (RS). Accuracy, precision, recall, F1-score and AUC-ROC were used for evaluation. Results: It has been found that Stacking models, especially those fine-tuned via Grid Search, yielded the best performance with 92.07% accuracy and an F1-score of 0.95. In addition to that, the choice of relevant vocal features, in conjunction with the Chi-Square feature selection method, greatly enhanced the computational efficiency and classification performance. Conclusions: This study highlights the potential of combining advanced feature selection techniques with hyperparameter optimization strategies to enhance machine learning-based PD diagnosis using vocal characteristics. Ensemble models proved particularly effective in handling complex datasets, demonstrating robust diagnostic performance. 
Future research may focus on deep learning approaches and temporal feature integration to further improve diagnostic accuracy and scalability for clinical applications.