MDPI - Publisher of Open Access Journals

18 pages, 4305 KiB

Open Access Article

Decoding Depression from Different Brain Regions Using Hybrid Machine Learning Methods

by Qi Sang, Chen Chen and Zeguo Shao

Bioengineering 2025, 12(5), 449; https://doi.org/10.3390/bioengineering12050449 - 24 Apr 2025

Viewed by 798

Depression has become one of the most common mental illnesses, causing severe physical and mental harm. To clarify the impact of brain region segmentation on the detection accuracy of moderate-to-severe major depressive disorder (MDD) and identify the optimal brain region for detecting MDD using electroencephalography (EEG), this study compared eight traditional single machine learning algorithms with a hybrid machine learning model based on a stacking ensemble technique. The hybrid model employed K-nearest neighbors (KNN), decision tree (DT), and Extreme Gradient Boosting (XGBoost) as base learners and used a DT as the meta-learner. Compared with traditional single methods, the hybrid approach significantly improved detection accuracy by leveraging the strengths of different algorithms. In addition, this study divided the brain regions into the left and right temporal lobes and extracted both linear and nonlinear features to comprehensively capture the complexity and dynamic behavior of EEG signals, enhancing the model’s ability to distinguish features across different brain regions. The experimental results showed that among the eight traditional machine learning methods, the KNN classifier achieved the highest detection accuracy of 96.97% in the left temporal lobe region. In contrast, the stacking hybrid learning model further increased the detection accuracy to 98.07%, significantly outperforming the single models. Moreover, the analysis of the brain region segmentation revealed that the left temporal lobe exhibited higher discriminative power in detecting MDD, highlighting its important role in the neurobiology of depression. This study provides a solid foundation for developing more efficient and portable methods for detecting depression, offering new perspectives and approaches for EEG-based MDD detection, and contributing to the improvement in objectivity and precision in depression diagnosis.

(This article belongs to the Special Issue Artificial Intelligence in Biomedical Imaging, Biosignals and Healthcare)

19 pages, 2634 KiB

Open Access Article

GDSMOTE: A Novel Synthetic Oversampling Method for High-Dimensional Imbalanced Financial Data

by Libin Hu and Yunfeng Zhang

Mathematics 2024, 12(24), 4036; https://doi.org/10.3390/math12244036 - 23 Dec 2024

Viewed by 776

Abstract

Synthetic oversampling methods for dealing with imbalanced classification problems have been widely studied. However, the current synthetic oversampling methods still cannot perform well when facing high-dimensional imbalanced financial data. The failure of distance measurement in high-dimensional space, error accumulation caused by noise samples, and the reduction of recognition accuracy of majority samples caused by the distribution of synthetic samples are the main reasons that limit the performance of current methods. Taking these factors into consideration, a novel synthetic oversampling method is proposed, namely the gradient distribution-based synthetic minority oversampling technique (GDSMOTE). Firstly, the concept of gradient contribution was used to assign the minority-class samples to different gradient intervals instead of relying on the spatial distance. Secondly, the root sample selection strategy of GDSMOTE avoids the error accumulation caused by noise samples, and a new concept of nearest neighbor was proposed to determine the auxiliary samples. Finally, a safety gradient distribution approximation strategy based on cosine similarity was designed to determine the number of samples to be synthesized in each safety gradient interval. Experiments on high-dimensional imbalanced financial datasets show that GDSMOTE can achieve higher F1-score and MCC values than baseline methods while achieving a higher recall score. This means that our method improves the recognition accuracy of minority-class samples without sacrificing the recognition accuracy of majority-class samples and adapts well to data decision-making tasks in the financial field.
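As a rough illustration of the interpolation step that SMOTE-family methods share, the sketch below implements classic SMOTE-style sampling in plain Python. It is not GDSMOTE: the gradient intervals, root sample selection, and cosine-similarity allocation described in the abstract are not reproduced. The function name `smote_sketch`, the parameter `k`, and the toy points are assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Classic SMOTE-style interpolation between minority samples (not GDSMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        root = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the root.
        others = [p for p in minority if p is not root]
        others.sort(key=lambda p: math.dist(root, p))
        # Pick one of the k nearest neighbours as the auxiliary sample.
        neighbour = rng.choice(others[:k])
        # Place the synthetic point at a random position on the segment.
        gap = rng.random()
        synthetic.append(tuple(r + gap * (n - r) for r, n in zip(root, neighbour)))
    return synthetic


# Hypothetical two-dimensional minority-class points (not data from the paper).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

Because the neighbour search relies on spatial distance, this baseline is exposed to the high-dimensional distance problem that the abstract names as a motivation for GDSMOTE.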

(This article belongs to the Special Issue Advancement of Mathematical Methods in Feature Representation Learning for Artificial Intelligence, Data Mining and Robotics, 2nd Edition)

40 pages, 44470 KiB

Open Access Article

A Decision Support System for Crop Recommendation Using Machine Learning Classification Algorithms

by Murali Krishna Senapaty, Abhishek Ray and Neelamadhab Padhy

Agriculture 2024, 14(8), 1256; https://doi.org/10.3390/agriculture14081256 - 30 Jul 2024

Cited by 31 | Viewed by 8015

Abstract

Today, crop suggestions and necessary guidance have become a regular need for a farmer. Farmers generally depend on their local agriculture officers regarding this, and it may be difficult to obtain the right guidance at the right time. Nowadays, crop datasets are available on different websites in the agriculture sector, and they play a crucial role in suggesting suitable crops. So, a decision support system that analyzes the crop dataset using machine learning techniques can assist farmers in making better choices regarding crop selections. The main objective of this research is to provide quick guidance to farmers with more accurate and effective crop recommendations by utilizing machine learning methods, global positioning system coordinates, and crop cloud data. Here, the recommendation can be more personalized, which enables the farmers to predict crops in their specific geographical context, taking into account factors like climate, soil composition, water availability, and local conditions. In this regard, an existing historical crop dataset that contains the state, district, year, area-wise production rate, crop name, and season was collected for 246,091 sample records from the Dataworld website, which holds data on 37 different crops from different areas of India. Also, for better analysis, a dataset was collected from the agriculture offices of the Rayagada, Koraput, and Gajapati districts in Odisha state, India. Both of these datasets were combined and stored using a Firebase cloud service. Thirteen different machine learning algorithms have been applied to the dataset to identify dependencies within the data. To facilitate this process, an Android application was developed using Android Studio (Electric Eel | 2023.1.1) Emulator (Version 32.1.14), Software Development Kit (SDK, Android SDK 33), and Tools. A model has been proposed that implements the SMOTE (Synthetic Minority Oversampling Technique) to balance the dataset, and then it allows for the implementation of 13 different classifiers, such as logistic regression, decision tree (DT), K-Nearest Neighbor (KNN), SVC (Support Vector Classifier), random forest (RF), Gradient Boost (GB), Bagged Tree, extreme gradient boosting (XGB classifier), Ada Boost Classifier, Cat Boost, HGB (Histogram-based Gradient Boosting), SGDC (Stochastic Gradient Descent), and MNB (Multinomial Naive Bayes) on the cloud dataset. It is observed that the performance of the SGDC method is 1.00 in accuracy, precision, recall, F1-score, and ROC AUC (Receiver Operating Characteristics–Area Under the Curve) and is 0.91 in sensitivity and 0.54 in specificity after applying the SMOTE. Overall, SGDC has a better performance compared to all other classifiers implemented in the predictions.

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

25 pages, 6247 KiB

Open Access Article

Comparison of Machine Learning Models to Predict Lake Area in an Arid Area

by Di Wang, Zailin Huo, Ping Miao and Xiaoqiang Tian

Remote Sens. 2023, 15(17), 4153; https://doi.org/10.3390/rs15174153 - 24 Aug 2023

Cited by 5 | Viewed by 2383

Abstract

Machine learning (ML)-based models are popular for complex physical system simulation and prediction. Lakes are important indicators in arid and semi-arid areas, and to achieve the proper management of the water resources in a lake basin, it is crucial to estimate and predict the lake dynamics based on hydro-meteorological variations and anthropogenic disturbances. This task is particularly challenging in arid and semi-arid regions, where water scarcity poses a significant threat to human life. In this study, a typical arid area of China was selected as the study area, and the performances of eight widely used ML models (i.e., Bayesian Ridge (BR), K-Nearest Neighbor (KNN), Gradient Boosting Decision Tree (GBDT), Extra Trees (ET), Random Forest (RF), Adaptive Boosting (AB), Bootstrap aggregating (Bagging), eXtreme Gradient Boosting (XGB)) were evaluated in predicting lake area. Monthly lake area was determined by meteorological (precipitation, air temperature, Standardised Precipitation Evapotranspiration Index (SPEI)) and anthropogenic factors (ET_c, NDVI, LUCC). Lake area determined by Landsat satellite image classification for 2000–2020 was analysed side-by-side with the SPEI on 9- and 12-month time scales. With the evaluation of six input variables and eight ML algorithms, it was found that the RF models performed best when using the SPEI-9 index, with R² = 0.88, RMSE = 1.37, LCCC = 0.95, and PRD = 1331.4 for the test samples. Furthermore, the performance of the ML model constructed with the 9-month time scale SPEI (SPEI-9) as an input variable (ML_SPEI-9) depended on seasonal variations, with average relative errors of up to 0.62 in spring and a minimum of 0.12 in summer. Overall, this study provides valuable insights into the effectiveness of different ML models for predicting lake area by demonstrating that the right inputs can lead to a remarkable increase in performance of up to 13.89%. These findings have important implications for future research on lake area prediction in arid zones and demonstrate the power of ML models in advancing scientific understanding of complex natural systems.
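For readers unfamiliar with the reported scores, the following sketch shows one common way to compute R², RMSE, and a concordance coefficient in plain Python. It reads LCCC as Lin's concordance correlation coefficient and R² as the coefficient of determination; both readings are assumptions, since the abstract does not define them. PRD is left out for the same reason. The numbers are made up for illustration and are not lake areas from the study.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def lin_ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r2_score(observed, predicted), rmse(observed, predicted), lin_ccc(observed, predicted))
```

Unlike R², the concordance coefficient also penalises a systematic offset between predictions and observations, which may be why the two are reported together.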

(This article belongs to the Special Issue Advances in Remote Sensing of Ecohydrology)

12 pages, 801 KiB

Open Access Article

Constructing the Schizophrenia Recognition Method Employing GLCM Features from Multiple Brain Regions and Machine Learning Techniques

by Şerife Gengeç Benli and Merve Andaç

Diagnostics 2023, 13(13), 2140; https://doi.org/10.3390/diagnostics13132140 - 22 Jun 2023

Cited by 7 | Viewed by 2404

Abstract

Accurately diagnosing schizophrenia, a complex psychiatric disorder, is crucial for effectively managing the treatment process and methods. Various types of magnetic resonance (MR) images have the potential to serve as biomarkers for schizophrenia. The aim of this study is to numerically analyze differences in the textural characteristics that may occur in the bilateral amygdala, caudate, pallidum, putamen, and thalamus regions of the brain between individuals with schizophrenia and healthy controls via structural MR images. Towards this aim, Gray Level Co-occurrence Matrix (GLCM) features obtained from five regions of the right, left, and bilateral brain were classified using machine learning methods. In addition, the study analyzed in which hemisphere these features were more distinctive and which method among Adaboost, Gradient Boost, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Linear Discriminant Analysis (LDA), and Naive Bayes had higher classification success. The results demonstrated that the GLCM features of these five regions in the left hemisphere distinguished individuals with schizophrenia from healthy individuals with higher performance. Using the LDA algorithm, classification success was achieved with a 100% AUC, 94.4% accuracy, 92.31% sensitivity, 100% specificity, and an F1 score of 91.9% in healthy and schizophrenic individuals. Thus, it has been revealed that the textural characteristics of the five predetermined regions, instead of the whole brain, are an important indicator in identifying schizophrenia.
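To make the GLCM idea concrete, here is a minimal plain-Python sketch that builds a normalised co-occurrence matrix for a tiny quantised image and derives two standard texture features from it. The horizontal offset, the four gray levels, the choice of contrast and homogeneity, and the toy image are all assumptions for illustration; the abstract does not say which offsets or features the study used.

```python
def glcm(image, levels, offset=(0, 1)):
    """Normalised gray-level co-occurrence matrix for one (row, col) offset."""
    d_row, d_col = offset
    counts = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            # Count the pair only if the offset pixel lies inside the image.
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def contrast(p):
    """Weights each pair by the squared gray-level difference."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


def homogeneity(p):
    """Rewards pairs whose gray levels are close together."""
    size = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(size) for j in range(size))


# Hypothetical 4x4 patch quantised to gray levels 0..3 (not MR data).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy_image, levels=4)
print(contrast(p), homogeneity(p))
```

In a pipeline like the one the abstract describes, feature values of this kind would be computed per brain region and passed to the classifiers.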

(This article belongs to the Special Issue Application of Deep Learning in the Diagnosis of Brain Diseases)

19 pages, 6257 KiB

Open Access Article

Application of Machine Learning Models to Bridge Afflux Estimation

by Reza Piraei, Majid Niazkar, Seied Hosein Afzali and Andrea Menapace

Water 2023, 15(12), 2187; https://doi.org/10.3390/w15122187 - 10 Jun 2023

Cited by 11 | Viewed by 6459

Abstract

Bridges are essential structures that connect riverbanks and facilitate transportation. However, bridge piers and abutments can disrupt the natural flow of rivers, causing a rise in water levels upstream of the bridge. The rise in water levels, known as bridge backwater or afflux, can threaten the stability or service of bridges and riverbanks. It is postulated that applications of estimation models with more precise afflux predictions can enhance the safety of bridges in flood-prone areas. In this study, eight machine learning (ML) models were developed to estimate bridge afflux utilizing 202 laboratory and 66 field data points. The ML models consist of Support Vector Regression (SVR), Decision Tree Regressor (DTR), Random Forest Regressor (RFR), AdaBoost Regressor (ABR), Gradient Boost Regressor (GBR), eXtreme Gradient Boosting (XGBoost) for Regression (XGBR), Gaussian Process Regression (GPR), and K-Nearest Neighbors (KNN). To the best of the authors’ knowledge, this is the first time that these ML models have been applied to estimate bridge afflux. The performance of ML-based models was compared with those of artificial neural networks (ANN), genetic programming (GP), and explicit equations adopted from previous studies. The results show that most of the ML models utilized in this study can significantly enhance the accuracy of bridge afflux estimations. Nevertheless, a few ML models, like SVR and ABR, did not show a good overall performance, suggesting that the right choice of an ML model is important.

(This article belongs to the Special Issue Applications of XGBoost to Water Resource Problems)

17 pages, 1653 KiB

Open Access Article

Analysis of the Performance of Machine Learning Models in Predicting the Severity Level of Large-Truck Crashes

by Jinli Liu, Yi Qi, Jueqiang Tao and Tao Tao

Future Transp. 2022, 2(4), 939-955; https://doi.org/10.3390/futuretransp2040052 - 16 Nov 2022

Cited by 3 | Viewed by 2083

Abstract

Large-truck crashes often result in substantial economic and social costs. Accurate prediction of the severity level of a reported truck crash can help rescue teams and emergency medical services take the right actions and provide proper medical care, thereby reducing its economic and social costs. This study aims to investigate the modeling issues in using machine learning methods for predicting the severity level of large-truck crashes. To this end, six representative machine learning (ML) methods, including four classification tree-based ML models, specifically the Extreme Gradient Boosting tree (XGBoost), the Adaptive Boosting tree (AdaBoost), Random Forest (RF), and the Gradient Boost Decision Tree (GBDT), and two non-tree-based ML models, specifically Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN), were selected for predicting the severity level of large-truck crashes. The accuracy levels of these six methods were compared, and the effects of data-balancing techniques on model prediction performance were also tested using three different resampling techniques: undersampling, oversampling, and mix sampling. The results indicated that better prediction performances were obtained using the dataset with a similar distribution to the original sample population instead of using the datasets with a balanced sample population. Regarding the prediction performance, the tree-based ML models outperformed the non-tree-based ML models, and the GBDT model performed best among all six models.
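The two simplest data-balancing techniques named above can be sketched in a few lines of plain Python. This is a generic random undersampling and oversampling example, not the authors' procedure; the helper names, the class labels, and the 8-to-2 toy split are assumptions. Mix sampling is omitted because the abstract does not specify how the two are combined.

```python
import random
from collections import Counter


def group_by_label(rows, labels):
    """Collect rows into one list per class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(rows, labels)
    target = min(len(group) for group in groups.values())
    return [(row, label)
            for label, group in groups.items()
            for row in rng.sample(group, target)]


def oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(rows, labels)
    target = max(len(group) for group in groups.values())
    balanced = []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((row, label) for row in group + extra)
    return balanced


# Hypothetical crash records: 8 minor and 2 severe (not data from the study).
rows = list(range(10))
labels = ["minor"] * 8 + ["severe"] * 2
under = undersample(rows, labels)
over = oversample(rows, labels)
print(Counter(label for _, label in under))
print(Counter(label for _, label in over))
```

Both functions change the class distribution seen during training, which is the property the study found could hurt prediction performance relative to keeping the original distribution.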

18 pages, 623 KiB

Open Access Article

Evaluation of Machine Learning Algorithm on Drinking Water Quality for Better Sustainability

by Sanaa Kaddoura

Sustainability 2022, 14(18), 11478; https://doi.org/10.3390/su141811478 - 13 Sep 2022

Cited by 49 | Viewed by 5528

Abstract

Water has become intricately linked to the United Nations’ sixteen sustainable development goals. Access to clean drinking water is crucial for health, a fundamental human right, and a component of successful health protection policies. Clean water is a significant health and development issue on a national, regional, and local level. Investments in water supply and sanitation have been shown to produce a net economic advantage in some areas because they reduce adverse health effects and medical expenses more than they cost to implement. However, numerous pollutants are affecting the quality of drinking water. This study evaluates the efficiency of using machine learning (ML) techniques to predict the quality of water. A machine learning classifier model is built to predict the quality of water using a real dataset. First, significant features are selected. In the case of the used dataset, all measured characteristics are chosen. Data are split into training and testing subsets. A set of existing ML algorithms is applied, and the results are compared in terms of precision, recall, F1 score, and ROC curve. The results show that support vector machine and k-nearest neighbor are better according to F1-score and ROC AUC values. However, the LASSO LARS and stochastic gradient descent are better based on recall values.

(This article belongs to the Section Environmental Sustainability and Applications)

12 pages, 2432 KiB

Open Access Article

Significance of Meteorological Feature Selection and Seasonal Variation on Performance and Calibration of a Low-Cost Particle Sensor

by Vikas Kumar, Vasudev Malyan and Manoranjan Sahu

Atmosphere 2022, 13(4), 587; https://doi.org/10.3390/atmos13040587 - 6 Apr 2022

Cited by 5 | Viewed by 3342

Abstract

Poor air quality is a major environmental concern worldwide, but people living in low- and middle-income countries are disproportionately affected. Measurement of PM_2.5 is essential for establishing regulatory standards and developing policy frameworks. Low-cost sensors (LCS) can construct a high spatiotemporal resolution PM_2.5 network, but the calibration dependencies of LCS and their susceptibility to biases from variable meteorological parameters limit their deployment for air-quality measurements. This study used data collected from June 2019 to April 2021 from a PurpleAir Monitor, with Met One Instruments’ Model BAM 1020 as a reference instrument, in Alberta, Canada. The objective of this study is to identify the relevant meteorological parameters for each season that significantly affect the performance of LCS. The meteorological features considered are relative humidity (RH), temperature (T), wind speed (WS) and wind direction (WD). This study applied Multiple Linear Regression (MLR), k-Nearest Neighbor (kNN), Random Forest (RF) and Gradient Boosting (GB) models with varying features in a stepwise manner across all the seasons, and only the best results are presented in this study. Improvement in the performance of calibration models is observed by incorporating different features for different seasons. The best performance is achieved when RF is applied but with different features for different seasons. The significant features are PM_{2.5_LCS} in Summer, PM_{2.5_LCS}, RH and T in Autumn, PM_{2.5_LCS}, T and WS in Winter and PM_{2.5_LCS}, RH, T and WS in Spring. The improvement in R² for each season (values in parentheses) is Summer (0.66–0.94), Autumn (0.73–0.96), Winter (0.70–0.95) and Spring (0.70–0.94). This study highlights the importance of selecting the right combination of models and features to attain the best results for LCS calibration.

(This article belongs to the Special Issue The Role of Low-Cost Air Pollution Sensors in Urban Air Quality, Source Apportionment, and Health Exposure)

Search Results (9)
