MDPI - Publisher of Open Access Journals

39 pages, 4295 KiB

Open AccessArticle

Evaluation of Smart Building Integration into a Smart City by Applying Machine Learning Techniques

by Mustafa Muthanna Najm Shahrabani and Rasa Apanaviciene

Buildings 2025, 15(12), 2031; https://doi.org/10.3390/buildings15122031 - 12 Jun 2025

Viewed by 608

Smart buildings play a crucial role in advancing smart cities’ performance in achieving environmental sustainability, resilience, and efficiency. Integration barriers persist because of misalignments in technology, infrastructure, and operations, and are compounded by inadequate assessment frameworks and classification systems. The existing literature on assessment methodologies reveals diverging evaluation frameworks for smart buildings and smart cities, non-uniform metrics and taxonomies that hinder scalability, and the low use of machine learning in predictive integration modelling. To fill these gaps, this paper introduces a novel machine learning model to predict the levels of smart building integration into a smart city and assess their impact on smart city performance by leveraging data from 147 smart buildings in 13 regions. Six optimised machine learning algorithms (K-Nearest Neighbours (KNN), Support Vector Regression (SVR), Random Forest, Adaptive Boosting (AdaBoost), Decision Tree (DT), and Extra Tree (ET)) were employed to train the model and perform feature engineering and permutation importance analysis. The SVR-trained model substantially outperformed other models, achieving an R-squared of 0.81, Root Mean Square Error (RMSE) of 0.33 and Mean Absolute Error (MAE) of 0.27, enabling precise integration prediction. Case studies revealed that low-integration buildings gain significant benefits from progressive target upgrades, whilst those buildings that have already implemented some integrated systems tend to experience diminishing marginal benefits with further, potentially disruptive upgrades. The study concludes that, by using the developed machine learning model, owners and policymakers can significantly improve the integration of smart buildings to build better, more sustainable, and resilient urban environments. Full article
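
The three metrics quoted in this abstract (R-squared, RMSE, and MAE) can be computed from paired observed and predicted values. The sketch below is a minimal standard-library illustration; the `observed` and `predicted` lists are hypothetical numbers, not data from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R-squared, RMSE and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical integration levels, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.0, 5.5]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are expressed in the units of the predicted quantity, whereas R-squared is unitless, which is why abstracts like this one usually report them together.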

(This article belongs to the Section Construction Management, and Computers & Digitization)

19 pages, 2079 KiB

Open AccessArticle

Evaluation of Feature Selection and Regression Models to Predict Biomass of Sweet Basil by Using Drone and Satellite Imagery

by Luana Centorame, Nicolò La Porta, Michela Papandrea, Adriano Mancini and Ester Foppa Pedretti

Appl. Sci. 2025, 15(11), 6227; https://doi.org/10.3390/app15116227 - 31 May 2025

Viewed by 927

Abstract

The integration of precision agriculture technologies, such as remote sensing through drones and satellites, has significantly enhanced real-time crop monitoring. This study is among the first to combine multispectral data from both a drone equipped with Altum-PT camera and PlanetScope satellite imagery to predict fresh biomass in sweet basil grown in an open field, demonstrating the added value of integrating different spatial scales. A dataset of 40 sampling points was built by combining remote sensing data with field measurements, and seven vegetation indices were calculated for each point. Feature selection was performed using three different methods (F-score, Recursive Feature Elimination, and model-based selection), and the most informative features were then processed through Principal Component Analysis. Eight regression models were trained and evaluated using leave-one-out cross-validation. The best-performing models were Random Forest (R² = 0.96 in training, R² = 0.65 in testing) and k-Nearest Neighbours (R² = 0.74 in training, R² = 0.94 in testing), with kNN demonstrating superior generalization capability on unseen data. These findings highlight the potential of combining drone and satellite imagery for modelling basil agronomic traits, offering valuable insights for optimizing crop management strategies. Full article
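
Leave-one-out cross-validation, as named in this abstract, holds out each sample in turn and predicts it from all the others. The following is a minimal sketch with a hand-written k-nearest-neighbour regressor; the one-feature toy data, the choice `k=2`, and all function names are assumptions for illustration and do not reproduce the study's pipeline.

```python
def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to the query."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return sum(train_y[i] for i in order[:k]) / k


def leave_one_out_predictions(xs, ys, k):
    """Predict every sample from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(knn_predict(train_x, train_y, xs[i], k))
    return predictions


def r_squared(y_true, y_pred):
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: one feature per sampling point, target = 2 * feature.
features = [[0.0], [1.0], [2.0], [3.0], [4.0]]
targets = [0.0, 2.0, 4.0, 6.0, 8.0]
loo_predictions = leave_one_out_predictions(features, targets, k=2)
loo_r2 = r_squared(targets, loo_predictions)
print(loo_predictions, loo_r2)
```

With only 40 sampling points, as in the study, leave-one-out uses nearly all the data for every fit, at the cost of one model fit per sample.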

(This article belongs to the Special Issue Applications of Image Processing Technology in Agriculture)

9 pages, 1886 KiB

Open AccessProceeding Paper

Modeling the Quantitative Structure–Activity Relationships of 1,2,4-Triazolo[1,5-a]pyrimidin-7-amine Analogs in the Inhibition of Plasmodium falciparum

by Inalegwu S. Apeh, Thecla O. Ayoka, Charles O. Nnadi and Wilfred O. Obonga

Eng. Proc. 2025, 87(1), 52; https://doi.org/10.3390/engproc2025087052 - 21 Apr 2025

Viewed by 695

Abstract

Triazolopyrimidine and its analogs represent an important scaffold in medicinal chemistry research. The heterocycle of 1,2,4-triazolo[1,5-a]pyrimidine (1,2,4-TAP) serves as a bioisostere candidate for purine scaffolds, N-acetylated lysine, and carboxylic acid. This study modeled the quantitative structure–activity relationship (QSAR) of 125 congeners of 1,2,4-TAP from the ChEMBL database in the inhibition of Plasmodium falciparum using six machine learning algorithms. The most significant features among 306 molecular descriptors, including one molecular outlier, were selected using recursive feature elimination. A ratio of 20% was used to split the x- and y-matrices into 99 training and 24 test compounds. The regression models were built using scikit-learn machine learning algorithms (multiple linear regression (MLR), k-nearest neighbours (kNN), support vector regressor (SVR), random forest regressor (RFR), ridge regression, and LASSO). Model performance was evaluated using the coefficient of determination (R²), mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), p-values, F-statistic, and variance inflation factor (VIF). Five significant variables were considered in constructing the model (p < 0.05) with the following regression equation: pIC₅₀ = 5.90 − 0.71npr1 − 1.52pmi3 + 0.88slogP − 0.57vsurf-CW2 + 1.11vsurf-W2. On five-fold cross-validation, three algorithms showed strong robustness, efficiency, and reliability in predicting the pIC₅₀ of 1,2,4-triazolo[1,5-a]pyrimidine: kNN (MSE = 0.46, R² = 0.54, MAE = 0.54, RMSE = 0.68), SVR (MSE = 0.33, R² = 0.67, MAE = 0.46, RMSE = 0.57), and RFR (MSE = 0.43, R² = 0.58, MAE = 0.51, RMSE = 0.66). The models provided useful data on the functionalities necessary for developing more potent 1,2,4-TAP analogs as anti-malarial agents. Full article
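
The regression equation in this abstract can be written as a small function, and the reported RMSE values can be checked against the reported MSE values, since RMSE is the square root of MSE. The sketch below takes the coefficients from the abstract; the descriptor values in `example_descriptors` are hypothetical, and whether the published coefficients apply to raw or scaled descriptors is not stated, so it is left as an open assumption.

```python
import math

# Coefficients as quoted in the abstract.
INTERCEPT = 5.90
COEFFICIENTS = {
    "npr1": -0.71,
    "pmi3": -1.52,
    "slogP": 0.88,
    "vsurf-CW2": -0.57,
    "vsurf-W2": 1.11,
}


def predict_pic50(descriptors):
    """Evaluate the linear QSAR equation for one compound's descriptor values."""
    return INTERCEPT + sum(
        coefficient * descriptors[name] for name, coefficient in COEFFICIENTS.items()
    )


# Hypothetical descriptor values (assumed to be on the scale the model expects).
example_descriptors = {"npr1": 1.0, "pmi3": 0.0, "slogP": 1.0, "vsurf-CW2": 0.0, "vsurf-W2": 0.0}
example_pic50 = predict_pic50(example_descriptors)

# Reported (MSE, RMSE) pairs from the five-fold cross-validation.
reported = {"kNN": (0.46, 0.68), "SVR": (0.33, 0.57), "RFR": (0.43, 0.66)}
rmse_consistent = {
    name: abs(math.sqrt(mse) - rmse) < 0.005 for name, (mse, rmse) in reported.items()
}
print(example_pic50, rmse_consistent)
```

All three reported RMSE values agree with the square roots of their MSE values to two decimal places.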

(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)

26 pages, 5018 KiB

Open AccessArticle

Data-Driven Pavement Performance: Machine Learning-Based Predictive Models

by Mohammad Fahad and Nurullah Bektas

Appl. Sci. 2025, 15(7), 3889; https://doi.org/10.3390/app15073889 - 2 Apr 2025

Cited by 2 | Viewed by 1229

Abstract

Traditional methods for predicting pavement performance rely on complex finite element modelling and empirical equations, which are computationally expensive and time-consuming. However, machine learning models offer a time-efficient solution for predicting pavement performance. This study utilizes a range of machine learning algorithms, including linear regression, decision tree, random forest, gradient boosting, K-nearest neighbour, Support Vector Regression, LightGBM and CatBoost, to analyse their effectiveness in predicting pavement performance. The input variables include axle load, truck load, traffic speed, lateral wander modes, asphalt layer thickness, traffic lane width and tire types, while the output variables consist of number of passes to fatigue damage, number of passes to rutting damage, fatigue life reduction in number of years and rut depth at 1.3 million passes. A k-fold cross-validation technique was employed to optimize hyperparameters. Results indicate that LightGBM and CatBoost outperform other models, achieving the lowest mean squared error and highest R² values. In contrast, linear regression and KNN demonstrated the lowest performance, with MSE values up to 188% higher than CatBoost. This study concludes that integrating machine learning with finite element analysis provides further improvements in pavement performance predictions. Full article
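
The k-fold cross-validation mentioned in this abstract splits the data into k folds, scores each candidate setting on every held-out fold, and keeps the setting with the lowest average error. The sketch below is a simplified illustration: the two "candidate settings" are just a mean and a median predictor, and the data are hypothetical. The models and hyperparameters tuned in the study are not specified in the abstract.

```python
import statistics


def k_fold_indices(n_samples, n_folds):
    """Split indices 0..n_samples-1 into contiguous (train, test) folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


def cross_validated_mse(ys, n_folds, predictor):
    """Average the held-out mean squared error over all folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(ys), n_folds):
        prediction = predictor([ys[i] for i in train_idx])
        fold_errors.append(
            sum((ys[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
        )
    return sum(fold_errors) / n_folds


# Hypothetical targets with one outlier; candidates stand in for hyperparameter settings.
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 60.0]
candidates = {"mean": statistics.mean, "median": statistics.median}
scores = {name: cross_validated_mse(targets, 3, fn) for name, fn in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In practice the folds are usually shuffled before splitting; contiguous folds are used here only to keep the example deterministic.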

(This article belongs to the Section Civil Engineering)

12 pages, 634 KiB

Open AccessArticle

Post-COVID-19 Condition Prediction in Hospitalised Cancer Patients: A Machine Learning-Based Approach

by Sara Mahvash Mohammadi, Mikhail Rumyantsev, Elina Abdeeva, Dina Baimukhambetova, Polina Bobkova, Yasmin El-Taravi, Maria Pikuza, Anastasia Trefilova, Aleksandr Zolotarev, Margarita Andreeva, Ekaterina Iakovleva, Nikolay Bulanov, Sergey Avdeev, Ekaterina Pazukhina, Alexey Zaikin, Valentina Kapustina, Victor Fomin, Andrey A. Svistunov, Peter Timashev, Nina Avdeenko, Yulia Ivanova, Lyudmila Fedorova, Elena Kondrikova, Irina Turina, Petr Glybochko, Denis Butnaru, Oleg Blyuss, Daniel Munblit and Sechenov StopCOVID Research Team

Cancers 2025, 17(4), 687; https://doi.org/10.3390/cancers17040687 - 18 Feb 2025

Viewed by 1171

Abstract

Background: The COVID-19 pandemic has led to widespread long-term complications, known as post-COVID conditions (PCC), particularly affecting vulnerable populations such as cancer patients. This study aims to predict the incidence of PCC in hospitalised cancer patients using the data from a longitudinal cohort study conducted in four major university hospitals in Moscow, Russia. Methods: Clinical data have been collected during the acute phase and follow-ups at 6 and 12 months post-discharge. A total of 49 clinical features were evaluated, and machine learning classifiers including logistic regression, random forest, support vector machine (SVM), k-nearest neighbours (KNN), and neural network were applied to predict PCC. Results: Model performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. KNN demonstrated the highest predictive performance, with an AUC of 0.80, sensitivity of 0.73, and specificity of 0.69. Severe COVID-19 and pre-existing comorbidities were significant predictors of PCC. Conclusions: Machine learning models, particularly KNN, showed some promise in predicting PCC in cancer patients, offering the potential for early intervention and personalised care. These findings emphasise the importance of long-term monitoring for cancer patients recovering from COVID-19 to mitigate PCC impact. Full article
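
Sensitivity, specificity, and AUC, the three measures reported in this abstract, can all be derived from true labels and classifier scores. The sketch below is a minimal illustration on hypothetical labels and scores, with a decision threshold of 0.5 chosen as an assumption. AUC is computed as the fraction of positive-negative pairs in which the positive case receives the higher score.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = condition present) and classifier scores.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sensitivity, specificity = sensitivity_specificity(labels, predicted)
auc = roc_auc(labels, scores)
print(sensitivity, specificity, auc)
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarises ranking quality across all thresholds.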

(This article belongs to the Collection The Impact of COVID-19 Infection in Cancer)

27 pages, 4232 KiB

Open AccessArticle

Data-Driven Machine-Learning-Based Seismic Response Prediction and Damage Classification for an Unreinforced Masonry Building

by Nagavinothini Ravichandran, Butsawan Bidorn, Oya Mercan and Balamurugan Paneerselvam

Appl. Sci. 2025, 15(4), 1686; https://doi.org/10.3390/app15041686 - 7 Feb 2025

Cited by 1 | Viewed by 2073

Abstract

Unreinforced masonry buildings are highly vulnerable to earthquake damage due to their limited ability to withstand lateral loads, compared to other structures. Therefore, a detailed assessment of the seismic response and resultant damage associated with such buildings becomes necessary. The present study employs machine learning models to effectively predict the seismic response and classify the damage level for a benchmark unreinforced masonry building. In this regard, eight regression-based models, namely, Linear Regression (LR), Stepwise Linear Regression (SLR), Ridge Regression (RR), Support Vector Machine (SVM), Gaussian Process Regression (GPR), Decision Tree (DT), Random Forest (RF), and Neural Networks (NN), were used to predict the building’s responses. Additionally, eight classification-based models, namely, Naïve Bayes (NB), Discriminant Analysis (DA), K-Nearest Neighbours (KNN), Adaptive Boosting (AB), DT, RF, SVM, and NN, were explored for the purpose of categorizing the damage states of the building. The material properties of the masonry and the earthquake intensity were considered as the input parameters. The results from the regression models indicate that the GPR model efficiently predicts the seismic response with larger coefficients of determination and smaller root mean square error values than other models. Among the classification-based models, the RF, AB, and NN models effectively classify the damage states with accuracy levels of 92.9%, 91.1%, and 92.6%, respectively. In conclusion, the overall performance of the non-parametric models, such as GPR, NN, and RF, was found to be better than that of the parametric models. Full article

(This article belongs to the Special Issue Structural Seismic Design and Evaluation)

36 pages, 12469 KiB

Open AccessArticle

Advancing Iron Ore Grade Estimation: A Comparative Study of Machine Learning and Ordinary Kriging

by Mujigela Maniteja, Gopinath Samanta, Angesom Gebretsadik, Ntshiri Batlile Tsae, Sheo Shankar Rai, Yewuhalashet Fissha, Natsuo Okada and Youhei Kawamura

Minerals 2025, 15(2), 131; https://doi.org/10.3390/min15020131 - 29 Jan 2025

Cited by 3 | Viewed by 2321

Abstract

Mineral grade estimation is a vital phase in mine planning and design, as well as in the mining project’s economic assessment. In mining, commonly accepted methods of ore grade estimation include geometrical approaches and geostatistical techniques such as kriging, which effectively capture the spatial grade variation within a deposit. The application of machine-learning (ML) techniques has been explored in the estimation of mineral resources, where complex correlations need to be captured. In this paper, the authors developed four machine-learning regression models, i.e., support vector regression (SVR), random forest regression (RFR), k-nearest neighbour (KNN) regression, and extreme gradient boost (XGBoost) regression, using a geological database to predict the grade in an Indian iron ore deposit. When compared with ordinary kriging (R² = 0.74; RMSE = 2.09), the RFR (R² = 0.74; RMSE = 2.06), XGBoost (R² = 0.73; RMSE = 2.12), and KNN (R² = 0.73; RMSE = 2.11) regression models produced similar results. The block model predictions generated using the RFR, XGBoost, and KNN models show comparable accuracy and spatial trends to those of ordinary kriging, whereas SVR was identified as less effective. When integrated with geological methods, these models demonstrate significant potential for enhancing and optimizing mine planning and design processes in similar iron ore deposits. Full article

(This article belongs to the Section Mineral Exploration Methods and Applications)

46 pages, 17123 KiB

Open AccessArticle

Predicting the Effect of RSW Parameters on the Shear Force and Nugget Diameter of Similar and Dissimilar Joints Using Machine Learning Algorithms and Multilayer Perceptron

by Marwan T. Mezher, Alejandro Pereira and Tomasz Trzepieciński

Materials 2024, 17(24), 6250; https://doi.org/10.3390/ma17246250 - 20 Dec 2024

Cited by 1 | Viewed by 1505

Abstract

Resistance spot-welded joints are crucial parts in contemporary manufacturing technology due to their ubiquitous use in the automobile industry. The necessity of improving manufacturing efficiency and quality at an affordable cost requires deep knowledge of the resistance spot welding (RSW) process and the development of artificial neural network (ANN)- and machine learning (ML)-based modelling techniques, apt for providing essential tools for design, planning, and incorporation in the welding process. Tensile shear force and nugget diameter are the most crucial outputs for evaluating the quality of a resistance spot-welded specimen. This study uses ML and ANN models to predict shear force and nugget diameter responses to RSW parameters. The RSW analysis was executed on similar and dissimilar AISI 304 and grade 2 titanium alloy joints with equal and unequal thicknesses. The input parameters included welding current, pressure, welding duration, squeezing time, holding time, pulse welding, and sheet thickness. Linear regression, Decision tree, Support vector machine (SVM), Random forest (RF), Gradient-boosting, CatBoost, K-Nearest Neighbour (KNN), Ridge, Lasso, and ElasticNet machine learning algorithms, along with two different structures of Multilayer Perceptron, were utilized for studying the impact of the RSW parameters on the shear force and nugget diameter. Different validation metrics were applied to assess each model’s quality. Two equations were developed to determine the shear force and nugget diameter based on the investigation parameters. The current research also presents a prediction of the Relative Importance (RI) of RSW factors. Shear force and nugget diameter predictions were examined using SHapley Additive exPlanations (SHAP) for the first time in the RSW field. Trainbr as the training function and Logsig as the transfer function delivered the best ANN model for predicting shear force in a one-output structure.
Trainrp with Tansig made the most accurate predictions for nugget diameter in a one-output structure and for shear force and diameter in a two-output structure. Depending on validation metrics, the Random forest model outperformed the other ML algorithms in predicting shear force or nugget diameter in a one-output model, while the Decision tree model gave the best prediction using a two-output structure. Linear regression made the worst ML predictions for shear force, while ElasticNet made the worst nugget diameter forecasts in a one-output model. However, in two-output models, Lasso made the worst predictions. Full article

(This article belongs to the Section Metals and Alloys)

13 pages, 1062 KiB

Open AccessArticle

Real-Time Computing Strategies for Automatic Detection of EEG Seizures in ICU

by Laura López-Viñas, Jose L. Ayala and Francisco Javier Pardo Moreno

Appl. Sci. 2024, 14(24), 11616; https://doi.org/10.3390/app142411616 - 12 Dec 2024

Viewed by 4515

Abstract

The development of interfaces for diagnosing seizures, which are often challenging to detect visually, is on the rise. However, their effectiveness is constrained by the need for diverse and extensive databases. This study aimed to create a seizure detection methodology that incorporates detailed information from each EEG channel and accounts for frequency band variations linked to the primary brain pathology leading to ICU admission, enhancing our ability to identify epilepsy onset. This study involved 460 video-electroencephalography recordings from 71 patients under monitoring. We applied signal preprocessing and conducted a numerical quantitative analysis in the frequency domain. Various machine learning algorithms were assessed for their efficacy. The k-nearest neighbours (KNN) model was the most effective in our overall sample, achieving an average F1 score of 0.76. For specific subgroups, different models showed superior performance: Decision Tree for ‘Epilepsy’ (average F1 score of 0.80) and ‘Craniencephalic Trauma’ (average F1 score of 0.84), Random Forest for ‘Cardiorespiratory Arrest’ (average F1 score of 0.89) and ‘Brain Haemorrhage’ (average F1 score of 0.84). In the categorisation of seizure types, Linear Discriminant Analysis was most effective for focal seizures (average F1 score of 0.87), KNN for generalised (average F1 score of 0.84) and convulsive seizures (average F1 score of 0.88), and logistic regression for non-convulsive seizures (average F1 score of 0.83). Our study demonstrates the potential of using classifier models based on quantified EEG data for diagnosing seizures in ICU patients. The performance of these models varies significantly depending on the underlying cause of the seizure, highlighting the importance of tailored approaches. The automation of these diagnostic tools could facilitate early seizure detection. Full article

(This article belongs to the Special Issue Advances in Machine Learning and Data Mining: Emerging Trends and Applications)

23 pages, 8533 KiB

Open AccessArticle

Integrating Hyperspectral, Thermal, and Ground Data with Machine Learning Algorithms Enhances the Prediction of Grapevine Yield and Berry Composition

by Shaikh Yassir Yousouf Jewan, Deepak Gautam, Debbie Sparkes, Ajit Singh, Lawal Billa, Alessia Cogato, Erik Murchie and Vinay Pagay

Remote Sens. 2024, 16(23), 4539; https://doi.org/10.3390/rs16234539 - 4 Dec 2024

Viewed by 1664

Abstract

Accurately predicting grapevine yield and quality is critical for optimising vineyard management and ensuring economic viability. Numerous studies have reported the complexity in modelling grapevine yield and quality due to variability in the canopy structure, challenges in incorporating soil and microclimatic factors, and management practices throughout the growing season. The use of multimodal data and machine learning (ML) algorithms could overcome these challenges. Our study aimed to assess the potential of multimodal data (hyperspectral vegetation indices (VIs), thermal indices, and canopy state variables) and ML algorithms to predict grapevine yield components and berry composition parameters. The study was conducted during the 2019/20 and 2020/21 grapevine growing seasons in two South Australian vineyards. Hyperspectral and thermal data of the canopy were collected at several growth stages. Simultaneously, grapevine canopy state variables, including the fractional intercepted photosynthetically active radiation (fiPAR), stem water potential (Ψ_stem), leaf chlorophyll content (LCC), and leaf gas exchange, were collected. Yield components were recorded at harvest. Berry composition parameters, such as total soluble solids (TSSs), titratable acidity (TA), pH, and the maturation index (IMAD), were measured at harvest. A total of 24 hyperspectral VIs and 3 thermal indices were derived from the proximal hyperspectral and thermal data. These data, together with the canopy state variable data, were then used as inputs for the modelling. Both linear and non-linear regression models, such as ridge (RR), Bayesian ridge (BRR), random forest (RF), gradient boosting (GB), K-Nearest Neighbour (KNN), and decision trees (DTs), were employed to model grape yield components and berry composition parameters. The results indicated that the GB model consistently outperformed the other models. 
The GB model had the best performance for the total number of clusters per vine (R² = 0.77; RMSE = 0.56), average cluster weight (R² = 0.93; RMSE = 0.00), average berry weight (R² = 0.95; RMSE = 0.00), cluster weight (R² = 0.95; RMSE = 0.13), and average berries per bunch (R² = 0.93; RMSE = 0.83). For the yield, the RF model performed the best (R² = 0.97; RMSE = 0.55). The GB model performed the best for the TSSs (R² = 0.83; RMSE = 0.34), pH (R² = 0.93; RMSE = 0.02), and IMAD (R² = 0.88; RMSE = 0.19). However, the RF model performed best for the TA (R² = 0.83; RMSE = 0.33). Our results also revealed the top 10 predictor variables for grapevine yield components and quality parameters, namely, the canopy temperature depression, LCC, fiPAR, normalised difference infrared index, Ψ_stem, stomatal conductance (g_s), net photosynthesis (P_n), modified triangular vegetation index, modified red-edge simple ratio, and ANT_gitelson index. These predictors significantly influence the grapevine growth, berry quality, and yield. The identification of these predictors of the grapevine yield and fruit composition can assist growers in improving vineyard management decisions and ultimately increase profitability. Full article

(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)

25 pages, 9546 KiB

Open AccessArticle

Fusion of UAV-Acquired Visible Images and Multispectral Data by Applying Machine-Learning Methods in Crop Classification

by Zuojun Zheng, Jianghao Yuan, Wei Yao, Paul Kwan, Hongxun Yao, Qingzhi Liu and Leifeng Guo

Agronomy 2024, 14(11), 2670; https://doi.org/10.3390/agronomy14112670 - 13 Nov 2024

Cited by 8 | Viewed by 2172

Abstract

The sustainable development of agriculture is closely related to the adoption of precision agriculture techniques, and accurate crop classification is a fundamental aspect of this approach. This study explores the application of machine learning techniques to crop classification by integrating RGB images and multispectral data acquired by UAVs. The study focused on five crops: rice, soybean, red bean, wheat, and corn. To improve classification accuracy, the researchers extracted three key feature sets: band values and vegetation indices, texture features extracted from a grey-level co-occurrence matrix, and shape features. These features were combined with five machine learning models: random forest (RF), support vector machine (SVM), k-nearest neighbour (KNN), classification and regression tree (CART), and artificial neural network (ANN). The results show that the Random Forest model consistently outperforms the other models, with an overall accuracy (OA) of over 97% and a significantly higher Kappa coefficient. Fusion of RGB images and multispectral data improved the accuracy by 1–4% compared to using a single data source. Our feature importance analysis showed that band values and vegetation indices had the greatest impact on classification results. This study provides a comprehensive analysis from feature extraction to model evaluation, identifying the optimal combination of features to improve crop classification and providing valuable insights for advancing precision agriculture through data fusion and machine learning techniques. Full article
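
Overall accuracy (OA) and the Kappa coefficient, the two summary measures quoted in this abstract, are both computed from a confusion matrix. The sketch below is a minimal illustration on a hypothetical two-class matrix; the study itself classified five crops, and its confusion matrices are not given in the abstract.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(row[j] for row in confusion) for j in range(n_classes)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical confusion matrix for two crop classes.
confusion_matrix = [
    [45, 5],
    [10, 40],
]
accuracy, kappa = overall_accuracy_and_kappa(confusion_matrix)
print(accuracy, kappa)
```

Kappa discounts the agreement that would occur by chance, so it is lower than overall accuracy whenever chance agreement is above zero.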

(This article belongs to the Section Precision and Digital Agriculture)

19 pages, 1774 KiB

Open AccessEditor’s ChoiceArticle

Effective Machine Learning Techniques for Dealing with Poor Credit Data

by Dumisani Selby Nkambule, Bhekisipho Twala and Jan Harm Christiaan Pretorius

Risks 2024, 12(11), 172; https://doi.org/10.3390/risks12110172 - 30 Oct 2024

Cited by 2 | Viewed by 1842

Abstract

Credit risk is a crucial component of daily financial services operations; it measures the likelihood that a borrower will default on a loan, incurring an economic loss. By analysing historical data for assessment of the creditworthiness of a borrower, lenders can reduce credit risk. Data are vital at the core of the credit decision-making processes. Decision-making depends heavily on accurate, complete data, and failure to harness high-quality data would impact credit lenders when assessing the loan applicants’ risk profiles. In this paper, an empirical comparison of the robustness of seven machine learning algorithms to credit risk, namely support vector machines (SVMs), naïve Bayes, decision trees (DT), random forest (RF), gradient boosting (GB), K-nearest neighbour (K-NN), and logistic regression (LR), is carried out using the Lending Club credit data from Kaggle. This task uses seven performance measures, including the F1 Score (recall, accuracy, and precision), ROC-AUC, and HL and MCC metrics. The use of generative adversarial network (GAN) simulation to enhance the robustness of the single machine learning classifiers for predicting credit risk is then proposed. The results show that when GAN imputation is incorporated, the decision tree is the best-performing classifier with an accuracy rate of 93.01%, followed by random forest (92.92%), gradient boosting (92.33%), support vector machine (90.83%), logistic regression (90.76%), and naïve Bayes (89.29%), respectively. The k-NN classifier is the worst-performing method, with an accuracy rate of 88.68%. Subsequently, when GANs are optimised, the accuracy rate of the naïve Bayes classifier improves significantly, to 90%. Additionally, the average error rate for these classifiers is over 9%, which implies that the estimates are not far from the actual values. In summary, most individual classifiers are more robust to missing data when GANs are used as an imputation technique.
The differences in performance of all seven machine learning algorithms are significant at the 95% level. Full article

(This article belongs to the Special Issue Financial Analysis, Corporate Finance and Risk Management)

19 pages, 4248 KiB

Open AccessEditor’s ChoiceArticle

Predicting Leukoplakia and Oral Squamous Cell Carcinoma Using Interpretable Machine Learning: A Retrospective Analysis

by Salem Shamsul Alam, Saif Ahmed, Taseef Hasan Farook and James Dudley

Oral 2024, 4(3), 386-404; https://doi.org/10.3390/oral4030032 - 13 Sep 2024

Viewed by 2332

Abstract

Purpose: The purpose of this study is to assess the effectiveness of the best performing interpretable machine learning models in diagnosing leukoplakia and oral squamous cell carcinoma (OSCC). Methods: A total of 237 patient cases were analysed that included information about patient demographics, lesion characteristics, and lifestyle factors, such as age, gender, tobacco use, and lesion size. The dataset was preprocessed and normalised, and then separated into training and testing sets. The following models were tested: K-Nearest Neighbours (KNN), Logistic Regression, Naive Bayes, Support Vector Machine (SVM), and Random Forest. The overall accuracy, Kappa score, class-specific precision, recall, and F1 score were used to assess performance. SHAP (SHapley Additive ExPlanations) was used to interpret the Random Forest model and determine the contribution of each feature to the predictions. Results: The Random Forest model had the best overall accuracy (93%) and Kappa score (0.90). For OSCC, it had a precision of 0.91, a recall of 1.00, and an F1 score of 0.95. The model had a precision of 1.00, recall of 0.78, and F1 score of 0.88 for leukoplakia without dysplasia. The precision for leukoplakia with dysplasia was 0.91, the recall was 1.00, and the F1 score was 0.95. The top three features influencing the prediction of leukoplakia with dysplasia are buccal mucosa localisation, age greater than 60 years, and larger lesions. For leukoplakia without dysplasia, the key features are gingival localisation, larger lesions, and tongue localisation. In the case of OSCC, gingival localisation, floor-of-mouth localisation, and buccal mucosa localisation are the most influential features. Conclusions: The Random Forest model outperformed the other machine learning models in diagnosing oral cancer and potentially malignant oral lesions with higher accuracy and interpretability. The machine learning models struggled to identify dysplastic changes. Using SHAP improves the understanding of the importance of features, facilitating early diagnosis and possibly reducing mortality rates. The model notably indicated that lesions on the floor of the mouth were highly unlikely to be dysplastic, instead showing one of the highest probabilities for being OSCC. Full article
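
The class-specific F1 scores quoted in this abstract can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch for that check only; the helper name `f1_score` and the `reported` dictionary are illustrative assumptions and are not taken from the article or its code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in the abstract above.
reported = {
    "OSCC": (0.91, 1.00, 0.95),
    "leukoplakia without dysplasia": (1.00, 0.78, 0.88),
    "leukoplakia with dysplasia": (0.91, 1.00, 0.95),
}

for label, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{label}: computed F1 = {f1:.2f}, reported F1 = {f1_reported:.2f}")
```

Under these inputs the computed values round to the reported ones, so the three metric triples in the abstract are internally consistent.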

18 pages, 343 KiB

Open AccessArticle

Credit Card Fraud: Analysis of Feature Extraction Techniques for Ensemble Hidden Markov Model Prediction Approach

by Olayinka Ogundile, Oluwaseyi Babalola, Afolakemi Ogunbanwo, Olabisi Ogundile and Vipin Balyan

Appl. Sci. 2024, 14(16), 7389; https://doi.org/10.3390/app14167389 - 21 Aug 2024

Cited by 2 | Viewed by 2437

Abstract

In the face of escalating credit card fraud due to the surge in e-commerce activities, effectively distinguishing between legitimate and fraudulent transactions has become increasingly challenging. To address this, various machine learning (ML) techniques have been employed to safeguard cardholders and financial institutions. This article explores the use of the Ensemble Hidden Markov Model (EHMM) combined with two distinct feature extraction methods: principal component analysis (PCA) and a proposed statistical feature set termed MRE, comprising Mean, Relative Amplitude, and Entropy. Both the PCA-EHMM and MRE-EHMM approaches were evaluated using a dataset of European cardholders and demonstrated comparable performance in terms of recall (sensitivity), specificity, precision, and F1-score. Notably, the MRE-EHMM method exhibited significantly reduced computational complexity, making it more suitable for real-time credit card fraud detection. Results also demonstrated that the PCA and MRE approaches perform significantly better when integrated with the EHMM in contrast to the conventional HMM approach. In addition, the proposed MRE-EHMM and PCA-EHMM techniques outperform other classic ML models, including random forest (RF), linear regression (LR), decision trees (DT) and K-nearest neighbour (KNN). Full article

(This article belongs to the Section Computing and Artificial Intelligence)

12 pages, 1846 KiB

Open AccessArticle

Machine Learning Analysis of Post-Operative Tumour Progression in Non-Functioning Pituitary Neuroendocrine Tumours: A Pilot Study

by Ziad Hussein, Robert W. Slack, Stephanie E. Baldeweg, Evangelos B. Mazomenos and Hani J. Marcus

Cancers 2024, 16(6), 1199; https://doi.org/10.3390/cancers16061199 - 19 Mar 2024

Cited by 1 | Viewed by 1601

Abstract

Post-operative tumour progression in patients with non-functioning pituitary neuroendocrine tumours is variable. The aim of this study was to use machine learning (ML) models to improve the prediction of post-operative outcomes in patients with NF PitNET. We studied data from 383 patients who underwent surgery with or without radiotherapy, with a follow-up period between 6 months and 15 years. ML models, including k-nearest neighbour (KNN), support vector machine (SVM), and decision tree, showed superior performance in predicting tumour progression when compared with parametric statistical modelling using logistic regression, with SVM achieving the highest performance. The strongest predictor of tumour progression was the extent of surgical resection, with patient age, tumour volume, and the use of radiotherapy also showing influence. No features showed an association with tumour recurrence following a complete resection. In conclusion, this study demonstrates the potential of ML models in predicting post-operative outcomes for patients with NF PitNET. Future work should look to include additional, more granular, multicentre data, including incorporating imaging and operative video data. Full article

(This article belongs to the Special Issue Advances for Sellar and Parasellar Tumours: Current Treatments and Future Directions)
