Search Results (149)

Search Parameters:
Keywords = LASSO problem

22 pages, 2592 KB  
Article
Predicting Rice Quality in Indica Rice Using Multidimensional Data and Machine Learning Strategies
by Xiang Zhang, Yongqiang Liu, Junming Yu, Ni Cao, Wei Zhou, Jiaming Wu, Rumeng Zhao, Shaoqing Tang, Song Chen, Ying Chen, Fengli Zhao, Jiwai He and Gaoneng Shao
Agriculture 2026, 16(7), 807; https://doi.org/10.3390/agriculture16070807 - 4 Apr 2026
Viewed by 288
Abstract
Integrating agricultural remote sensing and phenomics for full-growth-period rice quality prediction is vital for early non-destructive screening and breeding; however, studies integrating genomic and multi-source phenotypic data across multiple environments remain limited. This study addressed this gap by integrating genomic SNP data, UAV-based spectral data, and individual multidimensional phenotypic data of 61 indica rice varieties (field and greenhouse environments). As a proof-of-concept study, feature selection methods (LASSO, MI, RFE, SPA) were used to mitigate overfitting and the “p >> n” problem, with further validation needed in larger populations. The results showed that amylose content is dominated by genetic factors, protein content is genetically determined but also influenced by gene-environment interactions, and chalkiness traits are determined by a combination of all three factors. For amylose content, SNP data under the Random Forest model at the population level (phenomics data from field UAV remote sensing of variety populations) achieved optimal performance (R2 = 0.92; MAE = 1.1; RMSE = 1.5), while the Stacking Ensemble method enhanced accuracy at the individual level (phenomics data from greenhouse single-plant phenotyping per variety). Chalky grain rate and chalkiness degree showed prediction accuracy comparable to that of SNP data, with Stacking significantly improving performance at the population level (R2 = 0.89 and 0.85, respectively). Protein content prediction remained relatively low (optimal R2 = 0.56) due to strong environmental sensitivity and complex interactions. This framework extends traditional single-environment/single-data-source approaches, providing an effective strategy for early, high-throughput, non-destructive rice quality screening. Further validation with larger datasets, more growing seasons, or independent populations is required for reliable application in breeding-related practices. Full article
(This article belongs to the Section Agricultural Product Quality and Safety)
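The LASSO feature selection mentioned for the "p >> n" regime can be sketched as follows. This is a minimal illustration with synthetic data, not the authors' pipeline; the sizes, signal values, and `alpha` are arbitrary choices:

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 60, 500                                   # p >> n regime
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]           # only 5 informative predictors
y = X @ beta + 0.1 * rng.standard_normal(n)

# L1 shrinkage drives most coefficients to exactly zero,
# keeping a small feature subset despite p >> n
selector = SelectFromModel(Lasso(alpha=0.2, max_iter=10000)).fit(X, y)
mask = selector.get_support()
print(f"{mask.sum()} of {p} features kept")
```

The retained mask can then feed any downstream predictive model.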

20 pages, 17077 KB  
Article
Comparative Analysis of Machine Learning Algorithms to Predict Municipal Solid Waste
by Pedro Aguilar-Encarnacion, Pedro Peñafiel-Arcos, Marcos Barahona Morales and Wilson Chango
Computation 2026, 14(3), 72; https://doi.org/10.3390/computation14030072 - 19 Mar 2026
Viewed by 272
Abstract
The management of municipal solid waste in intermediate cities exhibits high daily variability and source heterogeneity, which hinders operational sizing and material recovery. Reliable predictions are required from heterogeneous and often-scarce data. However, studies that compare multiple machine learning algorithms with temporal validation on short time series in intermediate cities are still limited. This study compares fourteen machine learning algorithms for predicting the daily generation of organic and inorganic waste in La Joya de los Sachas, Ecuador, formulating the task as a multi-output regression problem. An adapted CRISP-DM design was employed, using primary data from a waste characterization campaign, temporal feature engineering, variable encoding, and an expanding-window backtesting protocol against lag-7 persistence and ARIMA. Tree-based ensembles achieved the best performance: AdaBoost provided the best organic forecasts (R2 = 0.985, RMSE = 0.081, MAE = 0.061 in rate space), while Random Forest was best for inorganic waste (R2 = 0.965, RMSE = 0.049, MAE = 0.040). Linear models were stable but slightly inferior, and other approaches (SVR, KNN, MLP, Lasso, ElasticNet) showed lower generalization capacity. The study provides a multi-output regression protocol with temporal validation for municipal contexts with short time series, comparative evidence across fourteen algorithms, and a conversion from rates to kilograms for operational use. Full article
(This article belongs to the Section Computational Engineering)
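The multi-output formulation (one model jointly predicting organic and inorganic daily rates, validated on a held-out tail of the series) can be sketched like this. The features and targets below are synthetic stand-ins, and a single wrapped RandomForestRegressor replaces the fourteen-algorithm comparison:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(1)
# toy daily features (hypothetical stand-ins for calendar/lag features)
X = rng.random((200, 6))
Y = np.column_stack([X[:, 0] * 2 + X[:, 1],          # "organic" rate
                     X[:, 2] + 0.5 * X[:, 3]])       # "inorganic" rate
Y += 0.05 * rng.standard_normal(Y.shape)

# temporal split: train on the first 150 days, predict the remaining 50
split = 150
model = MultiOutputRegressor(RandomForestRegressor(n_estimators=100, random_state=0))
model.fit(X[:split], Y[:split])
pred = model.predict(X[split:])
print(pred.shape)   # one column per waste stream
```

A full expanding-window backtest would repeat this fit/predict step while growing the training window one day at a time.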

21 pages, 11307 KB  
Article
A Symmetry-Preserving Extrapolated Primal-Dual Hybrid Gradient Method for Saddle-Point Problems
by Xiayang Zhang, Wenzhuo Li, Bowen Chang, Wei Liu and Shiyu Zhang
Axioms 2026, 15(3), 219; https://doi.org/10.3390/axioms15030219 - 16 Mar 2026
Viewed by 257
Abstract
The primal-dual hybrid gradient (PDHG) method is widely used for convex–concave saddle-point problems, yet its extrapolated variants are typically asymmetric because only one side is extrapolated. We propose a symmetry-preserving refinement, E-PDHG, which performs dual-side extrapolation followed by an explicit correction step. Under standard step-size conditions, we establish global convergence for all η ∈ (−1, 1) and derive a pointwise (non-ergodic) O(1/t) rate for the last iterate. The method does not improve the asymptotic complexity order of PDHG; instead, it enlarges the practically stable parameter region while retaining the same per-iteration cost. Numerical experiments on image deblurring/inpainting and additional machine learning benchmarks (logistic regression and LASSO) demonstrate improved finite-iteration stability and efficiency. Full article
(This article belongs to the Section Mathematical Analysis)
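For reference, the baseline one-sided-extrapolation PDHG iteration on a LASSO instance looks like the sketch below. This is the standard Chambolle–Pock form, not the E-PDHG variant the paper proposes, and the step sizes and data are illustrative:

```python
import numpy as np

def pdhg_lasso(A, b, lam, n_iter=5000, theta=1.0):
    """Chambolle-Pock PDHG for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    Splitting: F(z) = 0.5*||z - b||^2 composed with z = Ax, G(x) = lam*||x||_1;
    the extrapolation (theta) is applied on the primal side only."""
    L = np.linalg.norm(A, 2)                    # operator norm of A
    tau = sigma = 0.99 / L                      # tau * sigma * L^2 < 1
    m, n = A.shape
    x = np.zeros(n); x_bar = x.copy(); y = np.zeros(m)
    for _ in range(n_iter):
        # dual step: prox_{sigma F*}(v) = (v - sigma*b) / (1 + sigma)
        v = y + sigma * (A @ x_bar)
        y = (v - sigma * b) / (1.0 + sigma)
        # primal step: soft-thresholding, the prox of lam*||.||_1
        x_new = x - tau * (A.T @ y)
        x_new = np.sign(x_new) * np.maximum(np.abs(x_new) - tau * lam, 0.0)
        x_bar = x_new + theta * (x_new - x)     # one-sided extrapolation
        x = x_new
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:4] = [2.0, -3.0, 1.5, 2.5]
b = A @ x_true + 0.01 * rng.standard_normal(40)
x_hat = pdhg_lasso(A, b, lam=0.5)
print("nonzeros:", int(np.count_nonzero(np.abs(x_hat) > 1e-6)))
```

E-PDHG, as described, would extrapolate on both sides and add a correction step while keeping the same per-iteration cost.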

30 pages, 2394 KB  
Article
Machine-Learning-Derived, Mechanistically Informed Transcriptomic Signature to Diagnose Active Tuberculosis and Guide Host-Directed Therapy
by Asif Hassan Syed, Nashwan Alromema, Hatem A. Almazarqi, Jasrah Irfan, Shakeel Ahmad, Altyeb A. Taha and Alhuseen Omar Alsayed
Diagnostics 2026, 16(5), 693; https://doi.org/10.3390/diagnostics16050693 - 26 Feb 2026
Viewed by 468
Abstract
Background/Objectives: An important diagnostic problem is to differentiate between active tuberculosis (TB) and latent TB infection (LTBI). Furthermore, current biomarkers offer minimal insight into disease pathogenesis to direct treatment. This motivated us to design a two-mode biomarker signature based on multicohort analysis using a transcriptomic and stringent machine learning pipeline. Methods: When analyzing active TB, latent TB, and healthy control samples, a rigorous filter (ANOVA, p < 0.001) was applied, followed by feature selection with Boruta-XGBoost and LASSO regression. This yielded a compact four-gene signature (TAP2, SORT1, WARS, and ANKRD22), which was selectively and highly upregulated in the active TB clinical state (p < 0.001). An ensemble stacking classifier based on this signature (Random Forest and XGBoost) achieved very high diagnostic performance (ROC-AUC = 0.991 (95% CI: 0.983–0.997)) in stratifying infection phases, which was strongly confirmed in an independent cohort (GSE19444). Results: Importantly, functional pathway analysis showed that all four genes map to core dysregulated host pathways in active TB: antigen presentation (TAP2), lipid trafficking (SORT1), interferon response (WARS), and inflammasome signaling (ANKRD22). The signature therefore offers a dual advantage: (1) a highly specific, non-sputum transcriptional diagnostic of active TB, and (2) a mechanistic map of key host pathways that identifies targets for intervention. Conclusions: Thus, the signature provides a two-fold contribution: a biomarker panel aligned with WHO performance targets for TB triage and a mechanistic therapeutic roadmap, offering a direct route from transcriptomic discovery to clinical action against TB. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
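A stacking ensemble over a four-feature signature can be sketched with scikit-learn as below. Synthetic data stands in for the expression values of the four genes, and GradientBoostingClassifier substitutes for XGBoost to keep the example self-contained:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# toy stand-in for a 4-gene signature (TAP2, SORT1, WARS, ANKRD22)
X, y = make_classification(n_samples=300, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],  # XGBoost stand-in
    final_estimator=LogisticRegression())     # meta-learner combines base predictions
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(f"held-out accuracy: {acc:.3f}")
```

The meta-learner is trained on cross-validated base-model predictions, which is what lets stacking outperform either base model alone.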

31 pages, 2629 KB  
Article
Using EEG to Explore Teachers’ Emotional Responses to Problem Behaviours in Learners with Autism Spectrum Disorder
by Zekai Alper Alp, Veysel Aksoy, Fatma Latifoğlu, Şerife Gengeç Benli and Avşar Ardıç
Appl. Sci. 2026, 16(4), 2153; https://doi.org/10.3390/app16042153 - 23 Feb 2026
Viewed by 634
Abstract
This study aimed to investigate the emotional changes in the brain activity of 34 special education teachers using electroencephalography (EEG) signals in response to common problem behaviours observed in students with Autism Spectrum Disorder (ASD), such as self-harm, aggression, tantrums, and stereotyped behaviours. Vignettes with Turkish narration and stimulus videos were used for each behaviour type to elicit emotions. EEG data were collected from the frontal, temporal, parietal, and occipital regions and subjected to pre-processing steps such as band-pass filtering (0.5–40 Hz) and Independent Component Analysis (ICA), and various spectral and statistical features were extracted. To improve classification performance, feature selection was performed using the Least Absolute Shrinkage and Selection Operator (LASSO) method, and Support Vector Machine (SVM), Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), and Random Forest (RF) algorithms were used for classification. The machine learning techniques achieved F1 scores of up to 97.66% in classifying teachers’ brain activity in response to the different behavioural patterns. Teachers showed strong negative emotional responses to self-harm, aggression, and tantrums, while showing weaker responses to the stereotyped behaviours. It is recommended that the study be replicated with different signal modalities and teacher samples. Full article
(This article belongs to the Special Issue Improving Healthcare with Artificial Intelligence)
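The LASSO-selection-then-classification pipeline described above can be sketched as follows. Synthetic data stands in for the EEG features; the `alpha` and RBF-SVM settings are arbitrary, and fitting Lasso against 0/1 labels is a common shortcut for selection rather than the authors' exact recipe:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# stand-in for spectral/statistical EEG features (dimensions hypothetical)
X, y = make_classification(n_samples=240, n_features=60, n_informative=8,
                           random_state=1)

clf = Pipeline([
    ("scale", StandardScaler()),
    # LASSO keeps a sparse feature subset before classification
    ("lasso_select", SelectFromModel(Lasso(alpha=0.01, max_iter=10000))),
    ("svm", SVC(kernel="rbf")),
])
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print(f"mean F1: {scores.mean():.3f}")
```

Putting selection inside the pipeline ensures it is re-fit within each cross-validation fold, avoiding selection leakage.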

19 pages, 376 KB  
Article
Multi-Platform Multivariate Regression with Group Sparsity for High-Dimensional Data Integration
by Shanshan Qin, Guanlin Zhang, Xin Gao and Yuehua Wu
Entropy 2026, 28(2), 135; https://doi.org/10.3390/e28020135 - 23 Jan 2026
Viewed by 342
Abstract
High-dimensional regression with multivariate responses poses significant challenges when data are collected across multiple platforms, each with potentially correlated outcomes. In this paper, we introduce a multi-platform multivariate high-dimensional linear regression (MM-HLR) model for simultaneously modeling within-platform correlation and cross-platform information fusion. Our approach incorporates a mixture of Lasso and group Lasso penalties to promote both individual predictor sparsity and cross-platform group sparsity, thereby enhancing interpretability and estimation stability. We develop an efficient computational algorithm based on iteratively reweighted least squares and block coordinate descent to solve the resulting regularized optimization problem. We establish theoretical guarantees for our estimator, including oracle bounds on prediction error, estimation accuracy, and support recovery under mild conditions. Our simulation studies confirm the method’s strong empirical performance, demonstrating low bias, small variance, and robustness across various dimensions. The analysis of real financial data further validates the performance gains achieved by incorporating multivariate responses and integrating data across multiple platforms. Full article
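The mixed Lasso plus group-Lasso penalty has a closed-form proximal operator (the sparse-group-lasso prox): elementwise soft-thresholding followed by group-wise norm shrinkage. A minimal sketch, with hand-picked values so the arithmetic is visible:

```python
import numpy as np

def prox_sparse_group(v, groups, lam1, lam2):
    """Proximal operator of lam1*||x||_1 + lam2 * sum_g ||x_g||_2.

    Closed form: soft-threshold each entry by lam1, then shrink each
    group's Euclidean norm by lam2 (zeroing groups with norm <= lam2)."""
    u = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)      # Lasso part
    out = np.empty_like(u)
    for g in groups:                                        # group-Lasso part
        norm = np.linalg.norm(u[g])
        out[g] = 0.0 if norm <= lam2 else (1.0 - lam2 / norm) * u[g]
    return out

v = np.array([3.0, -1.0, 0.5, -4.0])
groups = [np.array([0, 1]), np.array([2, 3])]
x = prox_sparse_group(v, groups, lam1=1.0, lam2=1.0)
print(x)   # group 1: [2, 0] scaled by 1 - 1/2; group 2: [0, -3] scaled by 2/3
```

This prox is the building block a block-coordinate-descent solver would apply to each platform's coefficient block.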
7 pages, 528 KB  
Article
Structural Results on the HMLasso
by Shin-ya Matsushita and Hiromu Sasaki
Axioms 2025, 14(11), 843; https://doi.org/10.3390/axioms14110843 - 17 Nov 2025
Viewed by 304
Abstract
HMLasso (Lasso with High Missing Rate) is a useful technique for sparse regression when a high-dimensional design matrix contains a high proportion of missing data. To solve HMLasso, an appropriate positive semidefinite symmetric matrix must first be obtained. In this paper, we present two structural results on the HMLasso problem. These results allow existing acceleration algorithms for strongly convex functions to be applied to the HMLasso problem. Full article
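The positive semidefinite matrix that HMLasso requires can be obtained by projecting an indefinite pairwise-complete covariance estimate onto the PSD cone. The sketch below uses the simple unweighted Frobenius projection (eigenvalue clipping); HMLasso itself additionally weights the fit by observed-pair rates:

```python
import numpy as np

def nearest_psd(S):
    """Project a symmetric matrix onto the PSD cone (Frobenius norm)
    by clipping negative eigenvalues to zero."""
    S = (S + S.T) / 2.0                       # symmetrize first
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.maximum(w, 0.0)) @ V.T

# a pairwise-complete covariance estimate can be indefinite
S = np.array([[ 1.0,  0.9, -0.9],
              [ 0.9,  1.0,  0.9],
              [-0.9,  0.9,  1.0]])           # has eigenvalue -0.8: not a valid covariance
S_psd = nearest_psd(S)
print(np.linalg.eigvalsh(S_psd).min())       # now >= 0
```

The repaired matrix can then be plugged into the quadratic form of the Lasso objective.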

16 pages, 399 KB  
Article
Identifying Important Factors for Depressive Symptom Dynamics in Chinese Middle-Aged and Older Adults Using a Multi-State Transition Model with Feature Selection
by Chuoxin Ma, Tianyi Lu, Yu Li and Shanquan Chen
Behav. Sci. 2025, 15(11), 1501; https://doi.org/10.3390/bs15111501 - 5 Nov 2025
Viewed by 1067
Abstract
Depressive symptoms are increasingly common in middle-aged and older adults and have become a major public health problem. People may experience transitions across different underlying states due to symptom variability over the course of many years, and risk factors may have different impacts on different symptom states. However, existing research rarely considers the identification of important factors related to symptom conversion. The purpose of this study was to examine the risk associated with transitioning between various stages of depressive symptoms and their influencing factors, utilizing a multi-state model with a simultaneous feature selection method. We used four waves of data from the China Health and Retirement Longitudinal Study (CHARLS); 3916 participants were selected after screening. Five states of depressive symptoms were defined: no symptom, new symptom episode, symptom persistence, remission, and relapse. We included 13 variables on demographic background, health status and functioning, and family and social connectivity, along with their interactions. Multi-state models were used to evaluate the risks of state transitions. The regularized (adaptive Lasso) partial likelihood approach was employed to simultaneously identify the important risk factors, estimate their impact on the state transition rates, and determine their statistical significance. There were 1392 new depressive episode events, 402 symptom persistence events, 639 remission events, and 118 relapse events. We identified nine significant risk factors for the new onset of depressive symptoms: urban–rural residence, sex, retirement status, income, body pain, difficulty with basic daily activities, social engagement, an education-by-income interaction, and a number-of-conditions-by-income interaction. The effects of the identified risk factors on new symptom episodes weakened as those symptoms became persistent or went into remission. For symptom relapse, a sex-by-age interaction was identified as a significant influencing factor. This study identified key factors and explored their effects on the various depressive symptom states among older Chinese adults. The findings could serve as a foundation for the development and implementation of targeted policies aimed at enhancing the mental well-being of China’s elderly population. Full article
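The adaptive Lasso used here for simultaneous selection can be sketched in its plain regression form: a pilot estimate supplies coefficient-specific weights, and the weighted L1 problem is solved by rescaling columns. This shows the generic technique, not the paper's partial-likelihood version; the data and tuning values are synthetic:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.2 * rng.standard_normal(n)

# Step 1: pilot estimate (ridge) gives data-driven weights w_j = 1/|b_j|,
# so coefficients with weak pilot estimates are penalized heavily
pilot = Ridge(alpha=1.0).fit(X, y).coef_
w = 1.0 / (np.abs(pilot) + 1e-8)

# Step 2: solve the weighted L1 problem by rescaling columns: X_j -> X_j / w_j
lasso = Lasso(alpha=0.05, max_iter=10000).fit(X / w, y)
beta_hat = lasso.coef_ / w                    # map back to the original scale
print("support:", np.flatnonzero(np.abs(beta_hat) > 1e-6))
```

The same reweighting idea carries over to a penalized partial likelihood, with the pilot fit replaced by an unpenalized transition-model fit.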

37 pages, 1891 KB  
Article
CLSP: Linear Algebra Foundations of a Modular Two-Step Convex Optimization-Based Estimator for Ill-Posed Problems
by Ilya Bolotov
Mathematics 2025, 13(21), 3476; https://doi.org/10.3390/math13213476 - 31 Oct 2025
Viewed by 732
Abstract
This paper develops the linear-algebraic foundations of the Convex Least Squares Programming (CLSP) estimator and constructs its modular two-step convex optimization framework, capable of addressing ill-posed and underdetermined problems. After reformulating a problem in its canonical form, A^(r) z^(r) = b, Step 1 yields an iterated (if r > 1) minimum-norm least-squares estimate ẑ^(r) = (AZ^(r))^† b on a constrained subspace defined by a symmetric idempotent Z (reducing to the Moore–Penrose pseudoinverse when Z = I). The optional Step 2 corrects ẑ^(r) by solving a convex program, which penalizes deviations using a Lasso/Ridge/Elastic-net-inspired scheme parameterized by α ∈ [0, 1] and yields ẑ*. The second step guarantees a unique solution for α ∈ (0, 1] and coincides with the Minimum-Norm BLUE (MNBLUE) when α = 1. This paper also proposes an analysis of numerical stability and CLSP-specific goodness-of-fit statistics, such as partial R2, normalized RMSE (NRMSE), Monte Carlo t-tests for the mean of NRMSE, and condition-number-based confidence bands. The three special CLSP problem cases are then tested in a 50,000-iteration Monte Carlo experiment and on simulated numerical examples. The estimator has a wide range of applications, including interpolating input–output tables and structural matrices. Full article
(This article belongs to the Section D: Statistics and Operational Research)
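The minimum-norm least-squares estimate of Step 1 (in the unconstrained case Z = I, where it reduces to the Moore–Penrose pseudoinverse) can be illustrated on an underdetermined system. The sketch checks that any other exact solution, obtained by adding a null-space component, has a larger norm:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 6))          # underdetermined: infinitely many solutions
b = rng.standard_normal(3)

# Moore-Penrose pseudoinverse gives the minimum-norm least-squares solution
z_hat = np.linalg.pinv(A) @ b

# any other solution differs by a null-space component and has a larger norm
null_vec = np.linalg.svd(A)[2][-1]       # a right-singular vector spanning the null space
z_other = z_hat + 0.7 * null_vec
print(np.linalg.norm(z_hat) < np.linalg.norm(z_other))   # True
```

Because z_hat is orthogonal to the null space of A, adding any null-space component can only increase the norm, which is exactly the minimum-norm property.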

29 pages, 11690 KB  
Article
Enhanced Breast Cancer Diagnosis Using Multimodal Feature Fusion with Radiomics and Transfer Learning
by Nazmul Ahasan Maruf, Abdullah Basuhail and Muhammad Umair Ramzan
Diagnostics 2025, 15(17), 2170; https://doi.org/10.3390/diagnostics15172170 - 28 Aug 2025
Cited by 2 | Viewed by 2516
Abstract
Background: Breast cancer remains a critical public health problem worldwide and is a leading cause of cancer-related mortality. Optimizing clinical outcomes is contingent upon the early and precise detection of malignancies. Advances in medical imaging and artificial intelligence (AI), particularly in the fields of radiomics and deep learning (DL), have contributed to improvements in early detection methodologies. Nonetheless, persistent challenges, including limited data availability, model overfitting, and restricted generalization, continue to hinder performance. Methods: This study aims to overcome existing challenges by improving model accuracy and robustness through enhanced data augmentation and the integration of radiomics and deep learning features from the CBIS-DDSM dataset. To mitigate overfitting and improve model generalization, data augmentation techniques were applied. The PyRadiomics library was used to extract radiomics features, while transfer learning models were employed to derive deep learning features from the augmented training dataset. For radiomics feature selection, we compared multiple supervised feature selection methods, including RFE with random forest and logistic regression, ANOVA F-test, LASSO, and mutual information. Embedded methods with XGBoost, LightGBM, and CatBoost for GPUs were also explored. Finally, we integrated radiomics and deep features to build a unified multimodal feature space for improved classification performance. Based on this integrated set of radiomics and deep learning features, 13 pre-trained transfer learning models were trained and evaluated, including various versions of ResNet (50, 50V2, 101, 101V2, 152, 152V2), DenseNet (121, 169, 201), InceptionV3, MobileNet, and VGG (16, 19). Results: Among the evaluated models, ResNet152 achieved the highest classification accuracy of 97%, demonstrating the potential of this approach to enhance diagnostic precision. 
Other models, including VGG19, ResNet101V2, and ResNet101, achieved 96% accuracy, emphasizing the importance of the selected feature set in achieving robust detection. Conclusions: Future research could build on this work by incorporating Vision Transformer (ViT) architectures and leveraging multimodal data (e.g., clinical data, genomic information, and patient history). This could improve predictive performance and make the model more robust and adaptable to diverse data types. Ultimately, this approach has the potential to transform breast cancer detection, making it more accurate and interpretable. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

24 pages, 8294 KB  
Article
Computing Two Heuristic Shrinkage Penalized Deep Neural Network Approach
by Mostafa Behzadi, Saharuddin Bin Mohamad, Mahdi Roozbeh, Rossita Mohamad Yunus and Nor Aishah Hamzah
Math. Comput. Appl. 2025, 30(4), 86; https://doi.org/10.3390/mca30040086 - 7 Aug 2025
Cited by 2 | Viewed by 1048
Abstract
Linear models are not always able to sufficiently capture the structure of a dataset. Sometimes, combining predictors in a non-parametric method, such as deep neural networks (DNNs), yields a more flexible modeling of the response variables in prediction. Furthermore, standard statistical classification or regression approaches are inefficient when dealing with greater complexity, such as high-dimensional problems, which usually suffer from multicollinearity. To confront these cases, penalized non-parametric methods are very useful. This paper proposes two heuristic approaches and implements new shrinkage-penalized cost functions in the DNN, based on the elastic-net penalty function concept. In other words, new methods developed from shrinkage-penalized DNNs, such as DNN_elastic-net and DNN_ridge&bridge, are established, which are strong rivals to DNN_Lasso and DNN_ridge. If there is any dataset grouping information in each layer of the DNN, it may be transferred using the derived elastic-net penalty function; other penalized DNNs cannot provide this functionality. Regarding the outcomes in the tables, in the developed DNNs, not only are there slight increases in the classification results, but some nodes are also nullified while a shrinkage property acts simultaneously in the structure of each layer. A simulated dataset was generated with binary response variables, and the classic and heuristic shrinkage-penalized DNN models were fitted and tested. For comparison purposes, the DNN models were also compared to the classification tree using GUIDE and applied to a real microbiome dataset. Full article

14 pages, 403 KB  
Article
An Inexact Nonsmooth Quadratic Regularization Algorithm
by Anliang Wang, Xiangmei Wang and Chunfang Liao
Axioms 2025, 14(8), 604; https://doi.org/10.3390/axioms14080604 - 4 Aug 2025
Viewed by 784
Abstract
The quadratic regularization technique is widely used in the literature for constructing efficient algorithms, particularly for solving nonsmooth optimization problems. We propose an inexact nonsmooth quadratic regularization algorithm for solving large-scale optimization problems that involve a large-scale smooth separable term and a nonsmooth one. The main difference between our algorithm and the (exact) quadratic regularization algorithm is that it employs inexact gradients instead of the full gradients of the smooth term. Also, a slightly different update rule for the regularization parameters is adopted for easier implementation. Under certain assumptions, it is proved that the algorithm reaches a first-order approximate critical point of the problem with an iteration complexity of O(ε^(-2)). Finally, we apply the algorithm to LASSO problems. The numerical results show that the inexact algorithm is more efficient than the corresponding exact one in large-scale cases. Full article
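The idea of replacing full gradients with inexact ones can be sketched with a proximal-gradient (ISTA) loop on a LASSO instance, where the smooth term's gradient is estimated from a random subset of rows. This illustrates the principle only; it is not the paper's algorithm or its parameter update rule:

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_grad_lasso(A, b, lam, n_iter=500, batch=None, seed=0):
    """Proximal gradient (ISTA) for min 0.5/m * ||Ax - b||^2 + lam*||x||_1.

    With batch=None the full gradient is used; otherwise the smooth term's
    gradient is approximated from a random subset of rows (inexact gradient)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    step = m / (np.linalg.norm(A, 2) ** 2)      # 1/L for the averaged smooth term
    x = np.zeros(n)
    for _ in range(n_iter):
        rows = slice(None) if batch is None else rng.choice(m, batch, replace=False)
        Ai, bi = A[rows], b[rows]
        grad = Ai.T @ (Ai @ x - bi) / Ai.shape[0]   # inexact when batch < m
        x = soft(x - step * grad, step * lam)       # prox of lam*||.||_1
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((400, 50))
x_true = np.zeros(50); x_true[:3] = [1.5, -2.0, 1.0]
b = A @ x_true + 0.05 * rng.standard_normal(400)

x_full = prox_grad_lasso(A, b, lam=0.05)
x_inexact = prox_grad_lasso(A, b, lam=0.05, batch=100)

def obj(x):
    return 0.5 * np.mean((A @ x - b) ** 2) + 0.05 * np.abs(x).sum()
print(obj(x_full), obj(x_inexact))
```

With one quarter of the rows per gradient, each iteration is roughly four times cheaper, while the objective lands close to the exact-gradient result.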

19 pages, 575 KB  
Article
Accelerated Gradient-CQ Algorithms for Split Feasibility Problems
by Yu Zhang and Xiaojun Ma
Symmetry 2025, 17(7), 1121; https://doi.org/10.3390/sym17071121 - 12 Jul 2025
Viewed by 594
Abstract
This work focuses on split feasibility problems in Hilbert spaces. To accelerate the convergence rate of gradient-CQ algorithms, we introduce an inertial term. Additionally, non-monotone stepsizes are employed to adjust the relaxation parameter applied to the original stepsizes, ensuring that the original stepsizes maintain a positive lower bound; thereby, the efficiency of the algorithms is improved. Moreover, the weak and strong convergence of the proposed algorithms are established through proofs that exhibit a similar symmetry structure and do not require Lipschitz continuity of the gradient mappings. Finally, the LASSO problem is presented to illustrate and compare the performance of the algorithms. Full article
(This article belongs to the Section Mathematics)
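The underlying (non-inertial) CQ iteration for a split feasibility problem, find x in C with Ax in Q, can be sketched as below. C and Q are boxes here, so both projections are coordinatewise clips, and the instance is constructed to be feasible; the fixed stepsize stands in for the paper's non-monotone stepsize rule:

```python
import numpy as np

def cq_algorithm(A, proj_C, proj_Q, x0, n_iter=2000):
    """Byrne's CQ iteration for the split feasibility problem:
    x_{k+1} = P_C(x_k - gamma * A^T (A x_k - P_Q(A x_k)))."""
    gamma = 1.0 / (np.linalg.norm(A, 2) ** 2)   # within (0, 2/||A||^2)
    x = x0.copy()
    for _ in range(n_iter):
        Ax = A @ x
        x = proj_C(x - gamma * (A.T @ (Ax - proj_Q(Ax))))
    return x

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 8))
proj_C = lambda v: np.clip(v, -1.0, 1.0)         # C = [-1, 1]^8

# build a feasible instance: Q is a box around A @ x_star for some x_star in C
x_star = rng.uniform(-0.5, 0.5, 8)
z_lo, z_hi = A @ x_star - 0.5, A @ x_star + 0.5
proj_Q = lambda z: np.clip(z, z_lo, z_hi)

x = cq_algorithm(A, proj_C, proj_Q, x0=np.ones(8))
gap = np.linalg.norm(A @ x - proj_Q(A @ x))
print(f"distance of Ax from Q: {gap:.2e}")
```

An inertial variant would add a momentum term x_k + alpha_k (x_k - x_{k-1}) before each projection step.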

25 pages, 875 KB  
Article
Filter Learning-Based Partial Least Squares Regression and Its Application in Infrared Spectral Analysis
by Yi Mou, Long Zhou, Weizhen Chen, Jianguo Liu and Teng Li
Algorithms 2025, 18(7), 424; https://doi.org/10.3390/a18070424 - 9 Jul 2025
Cited by 2 | Viewed by 1470
Abstract
Partial Least Squares (PLS) regression has been widely used to model the relationship between predictors and responses. However, PLS may be limited in its capacity to handle complex spectral data contaminated with significant noise and interferences. In this paper, we propose a novel filter learning-based PLS (FPLS) model that integrates an adaptive filter into the PLS framework. The FPLS model is designed to maximize the covariance between the filtered spectral data and the response. This modification enables FPLS to dynamically adapt to the characteristics of the data, thereby enhancing its feature extraction and noise suppression capabilities. We have developed an efficient algorithm to solve the FPLS optimization problem and provided theoretical analyses regarding the convergence of the model, the prediction variance, and the relationships among the objective functions of FPLS, PLS, and the filter length. Furthermore, we have derived bounds for the Root Mean Squared Error of Prediction (RMSEP) and the Cosine Similarity (CS) to evaluate model performance. Experimental results using spectral datasets from Corn, Octane, Mango, and Soil Nitrogen show that the FPLS model outperforms PLS, OSCPLS, VCPLS, PoPLS, LoPLS, DOSC, OPLS, MSC, SNV, SGFilter, and Lasso in terms of prediction accuracy. The theoretical analyses align with the experimental results, emphasizing the effectiveness and robustness of the FPLS model in managing complex spectral data. Full article

18 pages, 487 KB  
Article
Variational Bayesian Variable Selection in Logistic Regression Based on Spike-and-Slab Lasso
by Juanjuan Zhang, Weixian Wang, Mingming Yang and Maozai Tian
Mathematics 2025, 13(13), 2205; https://doi.org/10.3390/math13132205 - 6 Jul 2025
Viewed by 1685
Abstract
Logistic regression is often used to solve classification problems. This article combines the advantages of Bayesian methods and the spike-and-slab Lasso to select variables in high-dimensional logistic regression. Introducing a new hidden variable or approximating the lower bound resolves the lack of a conjugate prior for the logistic function. The Laplace distribution in the spike-and-slab Lasso is expressed as a hierarchical form of normal and exponential distributions, so that all parameters in the model have posterior distributions that are easy to handle. Considering the high time cost of parameter estimation and variable selection in high-dimensional models, we use a variational Bayesian algorithm to perform posterior inference on the model parameters. Simulation results show that the adaptive prior performs parameter estimation and variable selection well in high-dimensional logistic regression. In terms of running time, the proposed method is also computationally efficient in many cases. Full article
(This article belongs to the Section D: Statistics and Operational Research)
