Search Results (37)

Search Parameters:
Keywords = bootstrapped regression trees

21 pages, 3252 KB  
Article
A Machine Learning-Based Calibration Framework for Low-Cost PM2.5 Sensors Integrating Meteorological Predictors
by Xuying Ma, Yuanyuan Fan, Yifan Wang, Xiaoqi Wang, Zelei Tan, Danyang Li, Jun Gao, Leshu Zhang, Yixin Xu, Xueyao Liu, Shuyan Cai, Yuxin Ma and Yongzhe Huang
Chemosensors 2025, 13(12), 425; https://doi.org/10.3390/chemosensors13120425 - 8 Dec 2025
Viewed by 475
Abstract
Low-cost sensors (LCSs) have rapidly expanded in urban air quality monitoring but still suffer from limited data accuracy and vulnerability to environmental interference compared with regulatory monitoring stations. To improve their reliability, we proposed a machine learning (ML)-based framework for LCS correction that integrates various meteorological factors at observation sites. Taking Tongshan District of Xuzhou City as an example, this study carried out continuous co-location data collection of hourly PM2.5 measurements by placing our LCS (American Temtop M10+ series) close to a regular fixed monitoring station. A mathematical model was developed to regress the PM2.5 deviations (PM2.5 concentrations at the fixed station minus PM2.5 concentrations at the LCS) on the most important predictor variables. The data calibration was carried out with six ML algorithms: random forest (RF), support vector regression (SVR), long short-term memory network (LSTM), decision tree regression (DTR), Gated Recurrent Unit (GRU), and Bidirectional LSTM (BiLSTM), and the best-performing one was selected as the final model. The performance of calibration was then evaluated on a testing dataset generated in a bootstrap fashion with ten repetitions. The results show that RF achieved the best overall accuracy, with R2 of 0.99 (training), 0.94 (validation), and 0.94 (testing), followed by DTR, BiLSTM, and GRU, which also showed strong predictive capabilities. In contrast, LSTM and SVR produced lower accuracy with larger errors under the limited data conditions. The results demonstrate that tree-based and advanced deep learning models can effectively capture the complex nonlinear relationships influencing LCS performance. The proposed framework exhibits high scalability and transferability, allowing its application to different LCS types and regions. This study advances the development of innovative techniques that enhance air quality assessment and support environmental research.

27 pages, 11265 KB  
Article
Using Machine Learning Methods to Predict Cognitive Age from Psychophysiological Tests
by Daria D. Tyurina, Sergey V. Stasenko, Konstantin V. Lushnikov and Maria V. Vedunova
Healthcare 2025, 13(24), 3193; https://doi.org/10.3390/healthcare13243193 - 5 Dec 2025
Viewed by 236
Abstract
Background/Objectives: This paper presents the results of predicting chronological age from psychophysiological tests using machine learning regressors. Methods: Subjects completed a series of psychological tests measuring various cognitive functions, including reaction time and cognitive conflict, short-term memory, verbal functions, and color and spatial perception. The sample included 99 subjects, 68 percent of whom were men and 32 percent were women. Based on the test results, 43 features were generated. To determine the optimal feature selection method, several approaches were tested alongside the regression models using MAE, R2, and CV_R2 metrics. SHAP and Permutation Importance (via Random Forest) delivered the best performance with 10 features. Features selected through Permutation Importance were used in subsequent analyses. To predict participants’ age from psychophysiological test results, we evaluated several regression models, including Random Forest, Extra Trees, Gradient Boosting, SVR, Linear Regression, LassoCV, RidgeCV, ElasticNetCV, AdaBoost, and Bagging. Model performance was compared using the determination coefficient (R2) and mean absolute error (MAE). Cross-validated performance (CV_R2) was estimated via 5-fold cross-validation. To assess metric stability and uncertainty, bootstrapping (1000 resamples) was applied to the test set, yielding distributions of MAE and RMSE from which mean values and 95% confidence intervals were derived. Results: The study identified RidgeCV with winsorization and standardization as the best model for predicting cognitive age, achieving a mean absolute error of 5.7 years and an R2 of 0.60. Feature importance was evaluated using SHAP values and permutation importance. SHAP analysis showed that stroop_time_color and stroop_var_attempt_time were the strongest predictors, followed by several task-timing features with moderate contributions. Permutation importance confirmed this ranking, with these two features causing the largest performance drop when permuted. Partial dependence plots further indicated clear positive relationships between these key features and predicted age. Correlation analysis stratified by sex revealed that most features were significantly associated with age, with stronger effects generally observed in men. Conclusions: Feature selection revealed Stroop timing measures and task-related metrics from math and campimetry tests as the strongest predictors, reflecting core cognitive processes linked to aging. The results underscore the value of careful outlier handling, feature selection, and interpretable regularized models for analyzing psychophysiological data. Future work should include longitudinal studies and integration with biological markers to further improve clinical relevance.
(This article belongs to the Special Issue AI-Driven Healthcare Insights)
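The test-set bootstrap described in this abstract (1000 resamples, mean and 95% confidence interval of the error metric) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the authors' code: the function name `bootstrap_mae_ci`, the percentile method for the interval, and the ten age/prediction pairs are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_mae_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap mean and confidence interval of the MAE.

    Test cases are resampled with replacement as (truth, prediction) pairs,
    so the model itself is never refitted.
    """
    rng = random.Random(seed)
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(abs_err)
    maes = []
    for _ in range(n_resamples):
        # One bootstrap replicate: n absolute errors drawn with replacement.
        sample = [abs_err[rng.randrange(n)] for _ in range(n)]
        maes.append(statistics.fmean(sample))
    maes.sort()
    lo = maes[int((alpha / 2) * n_resamples)]
    hi = maes[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(maes), (lo, hi)


# Hypothetical ages and model predictions (illustrative values only).
y_true = [23, 35, 41, 52, 60, 28, 47, 66, 39, 55]
y_pred = [29, 31, 45, 47, 68, 30, 40, 59, 44, 50]

mean_mae, (lo, hi) = bootstrap_mae_ci(y_true, y_pred)
print(f"MAE = {mean_mae:.2f} years, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The same loop would give an RMSE distribution by resampling squared errors and taking the square root of each replicate's mean.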

17 pages, 2226 KB  
Article
Multi-Aspect Sentiment Analysis of Arabic Café Reviews Using Machine and Deep Learning Approaches
by Hmood Al-Dossari and Munerah Altalasi
Mathematics 2025, 13(24), 3895; https://doi.org/10.3390/math13243895 - 5 Dec 2025
Viewed by 262
Abstract
Online reviews on platforms such as Google Maps strongly influence consumer decisions. However, aggregated ratings mask nuanced opinions about specific aspects such as food, drinks, service, lounge, and price. This study presents a multi-aspect sentiment analysis framework for Arabic café reviews. Specifically, we combine machine learning (Linear SVC, Naïve Bayes, Logistic Regression, Decision Tree, Random Forest) and a Convolutional Neural Network (CNN) to perform aspect identification and sentiment classification. A rigorous preprocessing and feature-engineering pipeline with TF-IDF and n-grams was implemented, and the results were statistically validated through bootstrap confidence intervals and Friedman–Nemenyi significance tests. Experimental results demonstrate that Linear SVC with optimized TF-IDF tri-grams achieved a macro-F1 of 0.89 for aspect identification and 0.71 for sentiment classification. Meanwhile, the CNN model yielded a comparable F1 of 0.89 for aspect identification and a higher 0.76 for sentiment classification. The findings highlight that effective feature representation and model selection can substantially improve Arabic opinion mining. The proposed framework provides a reliable foundation for analyzing Arabic user feedback on location-based platforms and supports more interpretable and data-driven business insights. These insights are essential to enhance personalized recommendations and business intelligence in the hospitality sector.
(This article belongs to the Special Issue Data Mining and Machine Learning with Applications, 2nd Edition)

16 pages, 609 KB  
Article
Enhancing Software Defect Prediction Using Ensemble Techniques and Diverse Machine Learning Paradigms
by Ayesha Siddika, Momotaz Begum, Fahmid Al Farid, Jia Uddin and Hezerul Abdul Karim
Eng 2025, 6(7), 161; https://doi.org/10.3390/eng6070161 - 15 Jul 2025
Cited by 1 | Viewed by 2711
Abstract
In today’s fast-paced world of software development, it is essential to ensure that programs run smoothly without any issues. When dealing with complex applications, the objective is to predict and resolve problems before they escalate. The prediction of software defects is a crucial element in maintaining the stability and reliability of software systems. This research addresses this need by combining advanced techniques (ensemble techniques) with seventeen machine learning algorithms for predicting software defects, categorised into three types: semi-supervised, self-supervised, and supervised. In supervised learning, we mainly experimented with several algorithms, including random forest, k-nearest neighbors, support vector machines, logistic regression, gradient boosting, AdaBoost classifier, quadratic discriminant analysis, Gaussian training, decision tree, passive aggressive, and ridge classifier. In semi-supervised learning, we tested autoencoders, semi-supervised support vector machines, and generative adversarial networks. For self-supervised learning, we utilized an autoencoder, a simple framework for contrastive learning of representations, and bootstrap your own latent. After comparing the performance of each machine learning algorithm, we identified the most effective one. Among these, the gradient boosting AdaBoost classifier demonstrated superior performance based on an accuracy of 90%, closely followed by the AdaBoost classifier at 89%. Finally, we applied ensemble methods to predict software defects, leveraging the collective strengths of these diverse approaches. This enables software developers to significantly enhance defect prediction accuracy, thereby improving overall system robustness and reliability.

30 pages, 3032 KB  
Article
A Bayesian Additive Regression Trees Framework for Individualized Causal Effect Estimation
by Lulu He, Lixia Cao, Tonghui Wang, Zhenqi Cao and Xin Shi
Mathematics 2025, 13(13), 2195; https://doi.org/10.3390/math13132195 - 4 Jul 2025
Viewed by 1821
Abstract
In causal inference research, accurate estimation of individualized treatment effects (ITEs) is at the core of effective intervention. This paper proposes a dual-structure ITE-estimation model based on Bayesian Additive Regression Trees (BART), which constructs independent BART sub-models for the treatment and control groups, estimates ITEs using the potential outcome framework and enhances posterior stability and estimation reliability through Markov Chain Monte Carlo (MCMC) sampling. Based on psychological stress questionnaire data from graduate students, the study first integrates BART with the Shapley value method to identify employment pressure as a key driving factor and reveals substantial heterogeneity in ITEs across subgroups. Furthermore, the study constructs an ITE model using a dual-structured BART framework (BART-ITE), where employment pressure is defined as the treatment variable. Experimental results show that the model performs well in terms of credible interval width and ranking ability, demonstrating superior heterogeneity detection and individual-level sorting. External validation using both the Bootstrap method and matching-based pseudo-ITE estimation confirms the robustness of the proposed model. Compared with mainstream meta-learning methods such as S-Learner, X-Learner and Bayesian Causal Forest, the dual-structure BART-ITE model achieves a favorable balance between root mean square error and bias. In summary, it offers clear advantages in capturing ITE heterogeneity and enhancing estimation reliability and individualized decision-making.
(This article belongs to the Special Issue Bayesian Learning and Its Advanced Applications)
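The dual-structure idea in this abstract, one outcome model per arm with the ITE taken as the difference of the two predicted potential outcomes, can be illustrated with a small sketch. Note the substitution: ordinary least squares stands in for the BART sub-models, so there is no MCMC sampling and no posterior here. The helper names and the simulated data are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with intercept (a stand-in for one BART sub-model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def two_model_ite(X, y, treated):
    """Fit separate outcome models per arm; ITE = mu1(x) - mu0(x)."""
    coef1 = fit_linear(X[treated], y[treated])     # treatment-group model
    coef0 = fit_linear(X[~treated], y[~treated])   # control-group model
    return predict_linear(coef1, X) - predict_linear(coef0, X)


# Simulated data with a known heterogeneous effect (assumption for the demo).
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
treated = rng.random(n) < 0.5
true_ite = 1.0 + 2.0 * X[:, 0]
y = X[:, 1] + treated * true_ite + rng.normal(scale=0.1, size=n)

ite_hat = two_model_ite(X, y, treated)
print("mean absolute ITE error:", float(np.mean(np.abs(ite_hat - true_ite))))
```

Replacing the two linear fits with any flexible regressor keeps the structure unchanged; the Bayesian version would additionally carry posterior draws through the subtraction to obtain credible intervals.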

12 pages, 2163 KB  
Article
Intra-Plant Variation in Leaf Dry Mass per Area (LMA): Effects of Leaf–Shoot Orientation and Vertical Position on Dry Mass and Area Scaling
by Xuchen Guo, Yiwen Zheng, Yuanmiao Chen, Zhidong Zhou and Jianhui Xue
Forests 2025, 16(5), 724; https://doi.org/10.3390/f16050724 - 24 Apr 2025
Viewed by 1297
Abstract
The intra-plant plasticity of leaves plays a vital role in enabling plants to adapt to changing climatic conditions. However, limited research has investigated the extent of intra-plant leaf trait variation and leaf biomass allocation strategies in herbaceous plants. To address this gap, we collected a total of 1746 leaves from 217 Lamium barbatum Siebold and Zucc. plants and measured their leaf dry mass (M) and leaf area (A). Leaves were categorized by vertical position (upper vs. lower canopy layer) and leaf–shoot orientation (east, south, west, north). ANOVA with Tukey’s HSD test was used to compare differences in M, A, and leaf dry mass per unit area (LMA). Reduced major axis regression was employed to evaluate the scaling relationship between M and A, and the bootstrap percentile method was used to determine differences in scaling exponents. The data indicated that: (i) M, A, LMA, and the scaling exponents of M versus A did not differ significantly among leaf–shoot orientations, and (ii) lower layer leaves exhibited significantly greater M, A, and LMA than upper layer leaves, but their scaling exponents were significantly smaller. These findings highlight that plant vertical growth brings significant intra-plant plasticity in leaf traits and their scaling relationships in herbaceous plants. This plasticity differs from that observed in trees, but is also critical for balancing weight load and optimizing light-use efficiency, potentially enhancing stress resilience in herbaceous plants.
(This article belongs to the Special Issue Forest Phenology Dynamics and Response to Climate Change)
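As a rough illustration of the two statistical steps named in this abstract, the sketch below fits a reduced major axis (RMA) slope on log-transformed mass and area, then uses a percentile bootstrap to ask whether two groups differ in scaling exponent. The exponents 1.10 and 0.95, the sample sizes, and all function names are made up for the demonstration and are not taken from the paper.

```python
import numpy as np


def rma_slope(x, y):
    """Reduced major axis slope: sign(r) * sd(y) / sd(x)."""
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)


def bootstrap_slopes(log_a, log_m, n_boot, rng):
    """RMA slopes from resampling (log A, log M) pairs with replacement."""
    n = len(log_a)
    idx = rng.integers(0, n, size=(n_boot, n))
    return np.array([rma_slope(log_a[i], log_m[i]) for i in idx])


def exponent_difference_ci(group1, group2, n_boot=2000, seed=0):
    """95% percentile interval for (exponent of group1 - exponent of group2)."""
    rng = np.random.default_rng(seed)
    diff = (bootstrap_slopes(*group1, n_boot, rng)
            - bootstrap_slopes(*group2, n_boot, rng))
    return np.percentile(diff, [2.5, 97.5])


# Synthetic leaves: log M = intercept + exponent * log A + noise (assumed values).
data_rng = np.random.default_rng(1)


def simulate(exponent, n=300):
    log_a = data_rng.uniform(1.0, 3.0, size=n)
    log_m = -2.0 + exponent * log_a + data_rng.normal(scale=0.05, size=n)
    return log_a, log_m


upper = simulate(1.10)
lower = simulate(0.95)
lo, hi = exponent_difference_ci(upper, lower)
print(f"95% CI for exponent difference: [{lo:.3f}, {hi:.3f}]")
```

Under this construction, an interval that excludes zero is read as a significant difference in scaling exponents between the two groups.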

13 pages, 2431 KB  
Article
Optimal Pair Matching Combined with Machine Learning Predicts a Significant Reduction in Myocardial Infarction Risk in African Americans Following Omega-3 Fatty Acid Supplementation
by Shudong Sun, Aki Hara, Laurel Johnstone, Brian Hallmark, Joseph C. Watkins, Cynthia A. Thomson, Susan M. Schembre, Susan Sergeant, Jason G. Umans, Guang Yao, Hao Helen Zhang and Floyd H. Chilton
Nutrients 2024, 16(17), 2933; https://doi.org/10.3390/nu16172933 - 2 Sep 2024
Viewed by 3009
Abstract
Conflicting clinical trial results on omega-3 highly unsaturated fatty acids (n-3 HUFA) have prompted uncertainty about their cardioprotective effects. While the VITAL trial found no overall cardiovascular benefit from n-3 HUFA supplementation, its substantial African American (AfAm) enrollment provided a unique opportunity to explore racial differences in response to n-3 HUFA supplementation. The current observational study aimed to simulate randomized clinical trial (RCT) conditions by matching 3766 AfAm and 15,553 non-Hispanic White (NHW) individuals from the VITAL trial utilizing propensity score matching to address the limitations related to differences in confounding variables between the two groups. Within matched groups (3766 AfAm and 3766 NHW), n-3 HUFA supplementation’s impact on myocardial infarction (MI), stroke, and cardiovascular disease (CVD) mortality was assessed. A weighted decision tree analysis revealed belonging to the n-3 supplementation group as the most significant predictor of MI among AfAm but not NHW. Further logistic regression using the LASSO method and bootstrap estimation of standard errors indicated n-3 supplementation significantly lowered MI risk in AfAm (OR 0.17, 95% CI [0.048, 0.60]), with no such effect in NHW. This study underscores the critical need for future RCTs to explore racial disparities in MI risk associated with n-3 HUFA supplementation and highlights potential causal differences between supplementation health outcomes in AfAm versus NHW populations.
(This article belongs to the Special Issue Precision Nutrition and Human Health)

14 pages, 2447 KB  
Article
Air Quality Prediction and Ranking Assessment Based on Bootstrap-XGBoost Algorithm and Ordinal Classification Models
by Jingnan Yang, Yuzhu Tian and Chun Ho Wu
Atmosphere 2024, 15(8), 925; https://doi.org/10.3390/atmos15080925 - 2 Aug 2024
Cited by 10 | Viewed by 2864
Abstract
Along with the rapid development of industries and the acceleration of urbanisation, the problem of air pollution is becoming more serious. Exploring the relevant factors affecting air quality and accurately predicting the air quality index are significant in improving the overall environmental quality and realising green economic development. Machine learning algorithms and statistical models have been widely used in air quality prediction and ranking assessment. In this paper, based on daily air quality data for the city of Xi’an, China, from 1 October 2022 to 30 September 2023, we construct support vector regression (SVR), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), random forests (RF), neural network (NN) and long short-term memory (LSTM) models to analyse the factors influencing the air quality index (AQI) of Xi’an and to conduct comparative tests. The predicted values and 95% prediction intervals of the AQI for the next 15 days for Xi’an, China, are given based on the Bootstrap-XGBoost algorithm. Further, the ordinal logit regression and ordinal probit regression models are constructed to evaluate and accurately predict the AQI ranks of the data from 1 October 2023 to 15 October 2023 for Xi’an. Finally, this paper proposes some suggestions and policy measures based on its findings.
(This article belongs to the Special Issue Atmospheric Pollutants: Monitoring and Observation)
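The abstract does not spell out how the bootstrap is combined with XGBoost. One common construction, sketched below under that caveat, refits the learner on each bootstrap resample, adds a resampled residual to each forecast, and reads the prediction interval off the percentiles of the resulting draws. A straight-line fit via `np.polyfit` stands in for XGBoost here, and the series is synthetic.

```python
import numpy as np


def bootstrap_prediction_interval(x, y, x_new, n_boot=1000, level=0.95, seed=0):
    """Point forecasts and percentile prediction intervals from refitted models."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty((n_boot, len(x_new)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # bootstrap resample of rows
        coef = np.polyfit(x[idx], y[idx], deg=1)    # stand-in learner, not XGBoost
        resid = y[idx] - np.polyval(coef, x[idx])   # in-sample residuals
        noise = rng.choice(resid, size=len(x_new), replace=True)
        draws[b] = np.polyval(coef, x_new) + noise  # model + noise uncertainty
    tail = (1 - level) / 2 * 100
    lower, upper = np.percentile(draws, [tail, 100 - tail], axis=0)
    return draws.mean(axis=0), lower, upper


# Synthetic daily index with a mild trend (illustrative values only).
rng = np.random.default_rng(42)
x = np.arange(100, dtype=float)
y = 50 + 0.3 * x + rng.normal(scale=5.0, size=100)
x_new = np.arange(100, 115, dtype=float)            # the next 15 days

point, lower, upper = bootstrap_prediction_interval(x, y, x_new)
print(np.round(np.column_stack([point, lower, upper])[:3], 1))
```

Swapping the two `np.polyfit`/`np.polyval` lines for a gradient-boosted model's fit and predict calls would keep the interval logic intact; for time series, a block bootstrap may be more appropriate than row resampling.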

21 pages, 2560 KB  
Article
A Network Intrusion Detection Method Based on Bagging Ensemble
by Zichen Zhang, Shanshan Kong, Tianyun Xiao and Aimin Yang
Symmetry 2024, 16(7), 850; https://doi.org/10.3390/sym16070850 - 5 Jul 2024
Cited by 10 | Viewed by 2720
Abstract
The problems of asymmetry in information features and redundant features in datasets, and the asymmetry of network traffic distribution in the field of network intrusion detection, have been identified as a cause of low accuracy and poor generalization of traditional machine learning detection methods in intrusion detection systems (IDSs). In response, a network intrusion detection method based on the integration of bootstrap aggregating (bagging) is proposed. The extreme random tree (ERT) algorithm was employed to calculate the weights of each feature and determine the feature subsets of different machine learning models; the training samples were then randomly sampled with the bootstrap sampling method, and classification and regression trees (CART), support vector machine (SVM), and k-nearest neighbor (KNN) were integrated as the base estimators of bagging. A comparison of integration methods revealed that the KNN-Bagging integration model exhibited optimal performance. Subsequently, the Bayesian optimization (BO) algorithm was employed for hyper-parameter tuning of the KNN base estimators. Finally, the base estimators were integrated through a hard voting approach. The proposed BO-KNN-Bagging model was evaluated on the NSL-KDD dataset, achieving an accuracy of 82.48%. This result was superior to those obtained by traditional machine learning algorithms and demonstrated enhanced performance compared with other methods.
(This article belongs to the Section Computer)
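The bagging scheme summarized here (bootstrap-sampled training sets, KNN base estimators, hard voting) can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions: it omits the ERT feature weighting and the Bayesian hyper-parameter search, uses two synthetic Gaussian clusters rather than NSL-KDD, and all function names are hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-nearest-neighbour classifier with majority vote (integer labels)."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])


def bagging_knn_predict(X_train, y_train, X_test, n_estimators=15, k=3, seed=0):
    """Train each KNN on a bootstrap sample, then combine by hard voting."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)      # sample rows with replacement
        all_preds.append(knn_predict(X_train[idx], y_train[idx], X_test, k))
    all_preds = np.stack(all_preds)           # shape (n_estimators, n_test)
    # Hard vote: most frequent label per test point (ties go to the lower label).
    return np.array([np.bincount(col).argmax() for col in all_preds.T])


# Two synthetic clusters standing in for "normal" and "attack" traffic.
rng = np.random.default_rng(0)


def blobs(n_per_class):
    X = np.vstack([rng.normal(0.0, 1.0, size=(n_per_class, 2)),
                   rng.normal(4.0, 1.0, size=(n_per_class, 2))])
    y = np.repeat([0, 1], n_per_class)
    return X, y


X_train, y_train = blobs(100)
X_test, y_test = blobs(50)
y_hat = bagging_knn_predict(X_train, y_train, X_test)
accuracy = float((y_hat == y_test).mean())
print("accuracy:", accuracy)
```

Using an odd number of estimators and an odd `k` avoids ties in the binary case, which is why the defaults are 15 and 3.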

23 pages, 6643 KB  
Article
Radiomics and Deep Features: Robust Classification of Brain Hemorrhages and Reproducibility Analysis Using a 3D Autoencoder Neural Network
by Salar Bijari, Sahar Sayfollahi, Shiwa Mardokh-Rouhani, Sahar Bijari, Sadegh Moradian, Ziba Zahiri and Seyed Masoud Rezaeijo
Bioengineering 2024, 11(7), 643; https://doi.org/10.3390/bioengineering11070643 - 24 Jun 2024
Cited by 55 | Viewed by 2557
Abstract
This study evaluates the reproducibility of machine learning models that integrate radiomics and deep features (features extracted from a 3D autoencoder neural network) to classify various brain hemorrhages effectively. Using a dataset of 720 patients, we extracted 215 radiomics features (RFs) and 15,680 deep features (DFs) from CT brain images. With rigorous screening based on Intraclass Correlation Coefficient thresholds (>0.75), we identified 135 RFs and 1054 DFs for analysis. Feature selection techniques such as Boruta, Recursive Feature Elimination (RFE), XGBoost, and ExtraTreesClassifier were utilized alongside 11 classifiers, including AdaBoost, CatBoost, Decision Trees, LightGBM, Logistic Regression, Naive Bayes, Neural Networks, Random Forest, Support Vector Machines (SVM), and k-Nearest Neighbors (k-NN). Evaluation metrics included Area Under the Curve (AUC), Accuracy (ACC), Sensitivity (SEN), and F1-score. The model evaluation involved hyperparameter optimization, a 70:30 train–test split, and bootstrapping, further validated with the Wilcoxon signed-rank test and q-values. Notably, DFs showed higher accuracy. In the case of RFs, the Boruta + SVM combination emerged as the optimal model for AUC, ACC, and SEN, while XGBoost + Random Forest excelled in F1-score. Specifically, RFs achieved AUC, ACC, SEN, and F1-scores of 0.89, 0.85, 0.82, and 0.80, respectively. Among DFs, the ExtraTreesClassifier + Naive Bayes combination demonstrated remarkable performance, attaining an AUC of 0.96, ACC of 0.93, SEN of 0.92, and an F1-score of 0.92. Distinguished models in the RF category included SVM with Boruta, Logistic Regression with XGBoost, SVM with ExtraTreesClassifier, CatBoost with XGBoost, and Random Forest with XGBoost, each yielding significant q-values of 42. In the DFs realm, ExtraTreesClassifier + Naive Bayes, ExtraTreesClassifier + Random Forest, and Boruta + k-NN exhibited robustness, with 43, 43, and 41 significant q-values, respectively. This investigation underscores the potential of synergizing DFs with machine learning models to serve as valuable screening tools, thereby enhancing the interpretation of head CT scans for patients with brain hemorrhages.
(This article belongs to the Section Biosignal Processing)

21 pages, 8324 KB  
Article
Short-Term Load Forecasting Based on Optimized Random Forest and Optimal Feature Selection
by Bianca Magalhães, Pedro Bento, José Pombo, Maria do Rosário Calado and Sílvio Mariano
Energies 2024, 17(8), 1926; https://doi.org/10.3390/en17081926 - 18 Apr 2024
Cited by 22 | Viewed by 5482
Abstract
Short-term load forecasting (STLF) plays a vital role in ensuring the safe, efficient, and economical operation of power systems. Accurate load forecasting provides numerous benefits for power suppliers, such as cost reduction, increased reliability, and informed decision-making. However, STLF is a complex task due to various factors, including non-linear trends, multiple seasonality, variable variance, and significant random interruptions in electricity demand time series. To address these challenges, advanced techniques and models are required. This study focuses on the development of an efficient short-term power load forecasting model using the random forest (RF) algorithm. RF combines regression trees through bagging and random subspace techniques to improve prediction accuracy and reduce model variability. The algorithm constructs a forest of trees using bootstrap samples and selects random feature subsets at each node to enhance diversity. Hyperparameters such as the number of trees, minimum sample leaf size, and maximum features for each split are tuned to optimize forecasting results. The proposed model was tested using historical hourly load data from four transformer substations supplying different campus areas of the University of Beira Interior, Portugal. The training data were from January 2018 to December 2021, while the data from 2022 were used for testing. The results demonstrate the effectiveness of the RF model in forecasting short-term hourly and one day ahead load and its potential to enhance decision-making processes in smart grid operations.
(This article belongs to the Topic Short-Term Load Forecasting)
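To make the mechanics in this abstract concrete, here is a small from-scratch sketch of a random forest regressor: each tree is grown on a bootstrap sample (bagging), each node searches only a random subset of features (random subspace), and predictions are averaged. The three hyperparameters the abstract names appear as `n_trees`, `min_samples_leaf`, and `max_features`. The synthetic "load" data, the depth cap, and the function names are assumptions for illustration, and no tuning is performed.

```python
import numpy as np


def build_tree(X, y, rng, max_features, min_samples_leaf, depth=0, max_depth=6):
    """Grow one regression tree; a leaf is stored as the mean target value."""
    if depth >= max_depth or len(y) < 2 * min_samples_leaf:
        return float(y.mean())
    best = None
    # Random subspace: only a random subset of features is tried at this node.
    for j in rng.choice(X.shape[1], size=max_features, replace=False):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            n_left = left.sum()
            if n_left < min_samples_leaf or len(y) - n_left < min_samples_leaf:
                continue
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t, left)
    if best is None:
        return float(y.mean())
    _, j, t, left = best
    args = (rng, max_features, min_samples_leaf, depth + 1, max_depth)
    return (j, t, build_tree(X[left], y[left], *args),
            build_tree(X[~left], y[~left], *args))


def predict_tree(node, x):
    while isinstance(node, tuple):            # internal nodes are tuples
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_features=2, min_samples_leaf=3, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))   # bootstrap sample
        forest.append(build_tree(X[idx], y[idx], rng, max_features, min_samples_leaf))
    return forest


def predict_forest(forest, X):
    return np.array([np.mean([predict_tree(t, x) for t in forest]) for x in X])


# Synthetic hourly load driven by hour of day and temperature (made-up model).
rng = np.random.default_rng(0)
n = 400
hour = rng.integers(0, 24, size=n)
temp = rng.normal(15.0, 5.0, size=n)
weekend = rng.integers(0, 2, size=n)
X = np.column_stack([hour, temp, weekend]).astype(float)
y = 100 + 30 * np.sin(2 * np.pi * hour / 24) + 2 * temp + rng.normal(scale=2.0, size=n)

forest = fit_forest(X[:300], y[:300])
pred = predict_forest(forest, X[300:])
r2 = 1 - ((y[300:] - pred) ** 2).sum() / ((y[300:] - y[300:].mean()) ** 2).sum()
print("held-out R2:", round(float(r2), 3))
```

In practice a library implementation would be used instead; the sketch only shows where the bootstrap and the per-node feature subset enter the algorithm.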

17 pages, 12515 KB  
Article
Prediction Model of Flavonoids Content in Ancient Tree Sun-Dried Green Tea under Abiotic Stress Based on LASSO-Cox
by Lei Li, Yamin Wu, Houqiao Wang, Junjie He, Qiaomei Wang, Jiayi Xu, Yuxin Xia, Wenxia Yuan, Shuyi Chen, Lin Tao, Xinghua Wang and Baijuan Wang
Agriculture 2024, 14(2), 296; https://doi.org/10.3390/agriculture14020296 - 12 Feb 2024
Cited by 2 | Viewed by 1921
Abstract
To investigate the variation in flavonoids content in ancient tree sun-dried green tea under abiotic stress environmental conditions, this study determined the flavonoids content in ancient tree sun-dried green tea and analyzed its correlation with corresponding factors such as the age, height, altitude, and soil composition of the tree. This study uses two machine-learning models, Least Absolute Shrinkage and Selection Operator (LASSO) regression and Cox regression, to build a predictive model based on the selection of effective variables. During the process, bootstrap was used to expand the dataset for single-factor and multi-factor comparative analyses, as well as for model validation, and the goodness-of-fit was assessed using the Akaike information criterion (AIC). The results showed that pH, total potassium, nitrate nitrogen, available phosphorus, hydrolytic nitrogen, and ammonium nitrogen have a high accuracy in predicting the flavonoids content of this model and have a synergistic effect on the production of flavonoids in the ancient tree tea. In this prediction model, when the flavonoids content was >6‰, the area under the curve of the training set and validation set were 0.8121 and 0.792 and, when the flavonoids content was >9‰, the area under the curve of the training set and validation set were 0.877 and 0.889, demonstrating good consistency. Compared to modeling with all significantly correlated factors (p < 0.05), the AIC decreased by 32.534%. Simultaneously, a visualization system for predicting flavonoids content in ancient tree sun-dried green tea was developed based on a nomogram model. The model was externally validated using actual measurement data and achieved an accuracy rate of 83.33%. Therefore, this study offers a scientific theoretical foundation for explaining the forecast and interference of the quality of ancient tree sun-dried green tea under abiotic stress.
(This article belongs to the Special Issue Application of Machine Learning and Data Analysis in Agriculture)

13 pages, 1851 KB  
Article
Exploration of Machine Learning Algorithms for pH and Moisture Estimation in Apples Using VIS-NIR Imaging
by Erhan Kavuncuoğlu, Necati Çetin, Bekir Yildirim, Mohammad Nadimi and Jitendra Paliwal
Appl. Sci. 2023, 13(14), 8391; https://doi.org/10.3390/app13148391 - 20 Jul 2023
Cited by 7 | Viewed by 2471
Abstract
Non-destructive assessment of fruits for grading and quality determination is essential to automate pre- and post-harvest handling. Near-infrared (NIR) hyperspectral imaging (HSI) has already established itself as a powerful tool for characterizing the quality parameters of various fruits, including apples. The adoption of HSI is expected to grow exponentially if inexpensive tools are made available to growers and traders at the grassroots levels. To this end, the present study aims to explore the feasibility of using a low-cost visible-near-infrared (VIS-NIR) HSI in the 386–1028 nm wavelength range to predict the moisture content (MC) and pH of Pink Lady apples harvested at three different maturity stages. Five different machine learning algorithms, viz. partial least squares regression (PLSR), multiple linear regression (MLR), k-nearest neighbor (kNN), decision tree (DT), and artificial neural network (ANN) were utilized to analyze HSI data cubes. In the case of ANN, PLSR, and MLR models, data analysis modeling was performed using 11 optimum features identified using a Bootstrap Random Forest feature selection approach. Among the tested algorithms, ANN provided the best performance, with correlation coefficient (R) and root mean squared error (RMSE) values of 0.868 and 0.756 for MC and 0.383 and 0.044 for pH prediction, respectively. The obtained results indicate that while the VIS-NIR HSI promises success in non-destructively measuring the MC of apples, its performance for pH prediction of the studied apple variety is poor. The present work contributes to the ongoing research in determining the full potential of VIS-NIR HSI technology in apple grading, maturity assessment, and shelf-life estimation.
(This article belongs to the Special Issue Applied Computer Vision in Industry and Agriculture)
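The abstract above reports model quality as a correlation coefficient (R) and a root mean squared error (RMSE). As a minimal sketch of how these two metrics are commonly computed, assuming Pearson correlation is the intended R (the abstract does not say which variant was used), the functions below take measured and predicted values. The function names and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between two equal-length sequences (assumed meaning of R)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the measured quantity."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


if __name__ == "__main__":
    # Hypothetical moisture-content values (%), for illustration only.
    measured = [84.1, 85.3, 86.0, 83.7, 85.9]
    predicted = [84.5, 85.0, 85.6, 84.2, 86.3]
    print("R =", round(pearson_r(measured, predicted), 3))
    print("RMSE =", round(rmse(measured, predicted), 3))
```

Because RMSE carries the units of the target, the reported values for MC and pH are not directly comparable with each other, whereas R is unitless.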

28 pages, 3597 KB  
Article
Predicting Dose-Dependent Carcinogenicity of Chemical Mixtures Using a Novel Hybrid Neural Network Framework and Mathematical Approach
by Sarita Limbu and Sivanesan Dakshanamurthy
Toxics 2023, 11(7), 605; https://doi.org/10.3390/toxics11070605 - 12 Jul 2023
Cited by 7 | Viewed by 2797
Abstract
This study addresses the challenge of assessing the carcinogenic potential of hazardous chemical mixtures, such as per- and polyfluorinated substances (PFASs), which are known to contribute significantly to cancer development. Here, we propose a novel framework called HNNMixCancer that utilizes a hybrid neural network (HNN) integrated into a machine-learning framework. This framework incorporates a mathematical model to simulate chemical mixtures, enabling the creation of classification models for binary (carcinogenic or noncarcinogenic) and multiclass classification (categorical carcinogenicity) and regression (carcinogenic potency). Through extensive experimentation, we demonstrate that our HNN model outperforms other methodologies, including random forest, bootstrap aggregating, adaptive boosting, support vector regressor, gradient boosting, kernel ridge, decision tree with AdaBoost, and KNeighbors, achieving a superior accuracy of 92.7% in binary classification. To address the limited availability of experimental data and enrich the training data, we generate an assumption-based virtual library of chemical mixtures using a known carcinogenic and noncarcinogenic single chemical for all the classification models. Remarkably, in this case, all methods achieve accuracies exceeding 98% for binary classification. In external validation tests, our HNN method achieves the highest accuracy of 80.5%. Furthermore, in multiclass classification, the HNN demonstrates an overall accuracy of 96.3%, outperforming RF, Bagging, and AdaBoost, which achieved 91.4%, 91.7%, and 80.2%, respectively. In regression models, HNN, RF, SVR, GB, KR, DT with AdaBoost, and KN achieved average R² values of 0.96, 0.90, 0.77, 0.94, 0.96, 0.96, and 0.97, respectively, showcasing their effectiveness in predicting the concentration at which a chemical mixture becomes carcinogenic.
Our method exhibits exceptional predictive power in prioritizing carcinogenic chemical mixtures, even when relying on assumption-based mixtures. This capability is particularly valuable for toxicology studies that lack experimental data on the carcinogenicity and toxicity of chemical mixtures. To our knowledge, this study introduces the first method for predicting the carcinogenic potential of chemical mixtures. The HNNMixCancer framework offers a novel alternative for dose-dependent carcinogen prediction. Ongoing efforts involve implementing the HNN method to predict mixture toxicity and expanding the application of HNNMixCancer to include multiple mixtures such as PFAS mixtures and co-occurring chemicals.
(This article belongs to the Special Issue The 10th Anniversary of Toxics)

84 pages, 26371 KB  
Article
A Study on ML-Based Software Defect Detection for Security Traceability in Smart Healthcare Applications
by Samuel Mcmurray and Ali Hassan Sodhro
Sensors 2023, 23(7), 3470; https://doi.org/10.3390/s23073470 - 26 Mar 2023
Cited by 26 | Viewed by 4941
Abstract
Software Defect Prediction (SDP) is an integral aspect of the Software Development Life-Cycle (SDLC). As the prevalence of software systems increases and becomes more integrated into our daily lives, so the complexity of these systems increases the risks of widespread defects. With reliance on these systems increasing, the ability to accurately identify a defective model using Machine Learning (ML) has been overlooked and less addressed. Thus, this article contributes an investigation of various ML techniques for SDP. An investigation, comparative analysis and recommendation of appropriate Feature Extraction (FE) techniques, Principal Component Analysis (PCA), Partial Least Squares Regression (PLS), Feature Selection (FS) techniques, Fisher score, Recursive Feature Elimination (RFE), and Elastic Net are presented. Validation of the following techniques, both separately and in combination with ML algorithms, is performed: Support Vector Machine (SVM), Logistic Regression (LR), Naïve Bayes (NB), K-Nearest Neighbour (KNN), Multilayer Perceptron (MLP), Decision Tree (DT), and ensemble learning methods Bootstrap Aggregation (Bagging), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Generalized Stacking (Stacking). An extensive experimental setup was built, and the results of the experiments revealed that FE and FS can both positively and negatively affect performance over the base model or Baseline. PLS, both separately and in combination with FS techniques, provides impressive, and the most consistent, improvements, while PCA, in combination with Elastic-Net, shows acceptable improvement.