Statistical Learning for High-Dimensional Data

A special issue of Stats (ISSN 2571-905X).

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 8529

Special Issue Editor


Prof. Dr. Paulo Canas Rodrigues
Guest Editor
Department of Statistics, Federal University of Bahia, Salvador 40170-110, Brazil
Interests: statistical learning; time series forecasting; robust statistics; data science; applied statistics

Special Issue Information

Dear Colleagues,

I am pleased to announce this new Special Issue on theoretical developments and applications related to statistical learning and high-dimensional data. In this Special Issue, we will consider topics such as machine and statistical learning methods, both supervised and unsupervised, deep learning, and other multivariate methods, with applications to independent and dependent multivariate data across all areas of application. More generally, this Special Issue aims to gather recent developments and applications of statistical learning methods for multivariate data. Manuscripts introducing new methodologies that can be helpful to practitioners are highly appreciated.

I look forward to receiving your submissions.

Sincerely,

Prof. Dr. Paulo Canas Rodrigues
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • statistical learning
  • high-dimensional data
  • data science
  • supervised learning
  • unsupervised learning
  • principal component analysis
  • cluster analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)

Research

11 pages, 262 KiB  
Article
On the (Apparently) Paradoxical Role of Noise in the Recognition of Signal Character of Minor Principal Components
by Alessandro Giuliani and Alessandro Vici
Stats 2024, 7(1), 54-64; https://doi.org/10.3390/stats7010004 - 11 Jan 2024
Viewed by 2281
Abstract
The usual method of separating signal and noise principal components on the sole basis of their eigenvalues has evident drawbacks when semantically relevant information ‘hides’ in minor components that explain a very small part of the total variance. This situation is common in biomedical experimentation when PCA is used for hypothesis generation: the multi-scale character of biological regulation typically generates a main mode explaining the major part of the variance (size component), squashing potentially interesting (shape) components into the noise floor. These minor components would be erroneously discarded as noisy by the usual selection methods. Here, we propose a computational method, based on the chemical concept of ‘titration’, that allows for the unsupervised recognition of the potential signal character of minor components by testing for a negative linear relation between the amount of added noise and component invariance.
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
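
For readers who want to experiment with the titration idea, here is a minimal sketch in Python with NumPy. The simulated data, function names, and the fixed noise grid are illustrative assumptions, not the authors' procedure: it adds increasing amounts of noise to a data matrix, recomputes the PCA loadings, and flags components whose congruence with the original loadings declines roughly linearly with the noise level.

```python
# Minimal sketch of the noise-titration idea: components whose congruence with
# the original loadings decays roughly linearly as noise is added are flagged
# as signal-like. Illustration only, not the authors' exact procedure.
import numpy as np
from numpy.linalg import svd

rng = np.random.default_rng(0)

def pca_loadings(X):
    """Return PCA loadings (columns = components) of a centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = svd(Xc, full_matrices=False)
    return Vt.T

def titration_slopes(X, noise_levels=np.linspace(0.05, 0.5, 10), n_rep=20):
    """Regress component congruence on added-noise level; return one slope per component."""
    V0 = pca_loadings(X)
    scale = X.std(axis=0, ddof=1)
    congruence = np.zeros((len(noise_levels), V0.shape[1]))
    for i, eps in enumerate(noise_levels):
        vals = []
        for _ in range(n_rep):
            Xn = X + rng.normal(0.0, eps * scale, size=X.shape)
            Vn = pca_loadings(Xn)
            # |cos| between original and perturbed loadings of the same index
            vals.append(np.abs(np.sum(V0 * Vn, axis=0)))
        congruence[i] = np.mean(vals, axis=0)
    # Slope of congruence vs. noise level, one value per component
    return np.polyfit(noise_levels, congruence, deg=1)[0]

# Toy example: mostly noise columns plus one weak (minor) but real signal
X = rng.normal(size=(200, 6))
X[:, 1] += 0.4 * X[:, 0]
slopes = titration_slopes(X)
print(np.round(slopes, 3))   # markedly negative slopes suggest signal character
```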

19 pages, 4728 KiB  
Article
Self-Organizing Topological Multilayer Perceptron: A Hybrid Method to Improve the Forecasting of Extreme Pollution Values
by Javier Linkolk López-Gonzales, Ana María Gómez Lamus, Romina Torres, Paulo Canas Rodrigues and Rodrigo Salas
Stats 2023, 6(4), 1241-1259; https://doi.org/10.3390/stats6040077 - 11 Nov 2023
Cited by 1 | Viewed by 2246
Abstract
Forecasting air pollutant levels is essential in regulatory plans focused on controlling and mitigating air pollutants, such as particulate matter. Focusing the forecast on air pollution peaks is challenging and complex since the pollutant time series behavior is not regular and is affected by several environmental and urban factors. In this study, we propose a new hybrid method based on artificial neural networks to forecast daily extreme events of PM2.5 pollution concentration. The hybrid method combines self-organizing maps, which identify temporal patterns of excessive daily pollution found at different monitoring stations, with a set of multilayer perceptrons that forecast extreme values of PM2.5 for each cluster. The proposed model was applied to analyze five-year pollution data obtained from nine weather stations in the metropolitan area of Santiago, Chile. Simulation results show that the hybrid method improves performance metrics when forecasting daily extreme values of PM2.5.
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
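
The sketch below illustrates the cluster-then-forecast structure of such a hybrid on simulated data. It uses scikit-learn, with KMeans standing in for the self-organizing map and one MLPRegressor per cluster, so it is a rough analogue of the approach rather than the authors' implementation; all data and names are hypothetical.

```python
# Sketch of the cluster-then-forecast idea behind the hybrid method: group stations
# by their pollution profiles, then fit one neural network per group. KMeans stands
# in for the self-organizing map; the data are simulated.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Simulated daily PM2.5 series for a few hypothetical monitoring stations
n_days, n_stations = 1500, 9
pm25 = np.exp(rng.normal(3.0, 0.5, size=(n_days, n_stations)))  # skewed, peak-prone

def lagged_design(series, n_lags=7):
    """Build (X, y) pairs where X holds the previous n_lags days and y the next day."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

# Stage 1: cluster stations by their daily pollution profiles (SOM stand-in)
station_profiles = pm25.T                      # one row per station
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(station_profiles)

# Stage 2: one MLP per cluster, trained on the pooled lagged data of its stations
models = {}
for c in np.unique(labels):
    Xs, ys = zip(*(lagged_design(pm25[:, s]) for s in np.where(labels == c)[0]))
    X, y = np.vstack(Xs), np.concatenate(ys)
    models[c] = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=500,
                             random_state=0).fit(X, y)

# One-step-ahead forecast for station 0 using its cluster's model
X0, _ = lagged_design(pm25[:, 0])
print(models[labels[0]].predict(X0[-1:]))
```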

24 pages, 732 KiB  
Article
Causal Inference in Threshold Regression and the Neural Network Extension (TRNN)
by Yiming Chen, Paul J. Smith and Mei-Ling Ting Lee
Stats 2023, 6(2), 552-575; https://doi.org/10.3390/stats6020036 - 28 Apr 2023
Viewed by 2393
Abstract
The first-hitting-time-based model conceptualizes a random process for subjects’ latent health status. The time-to-event outcome is modeled as the first hitting time of the random process to a pre-specified threshold. Threshold regression with linear predictors has numerous benefits in causal survival analysis, such as the estimators’ collapsibility. We propose a neural network extension of the first-hitting-time-based threshold regression model. With the flexibility of neural networks, the extended threshold regression model can efficiently capture complex relationships among predictors and underlying health processes while providing clinically meaningful interpretations, and it can also tackle the challenge of high-dimensional inputs. The proposed neural-network-extended threshold regression model can further be applied in causal survival analysis, for example by serving as the Q-model in G-computation. More efficient causal estimates are expected given the algorithm’s robustness. Simulations were conducted to validate estimator collapsibility and threshold regression G-computation. The performance of the neural-network-extended threshold regression model is also illustrated using simulated and real high-dimensional data from an observational study.
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
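
A minimal sketch of the modelling idea, assuming PyTorch is available: a small network maps covariates to the initial level and drift of a latent Wiener process, and the model is fitted by maximizing the first-hitting-time (inverse Gaussian) likelihood. Censoring and the G-computation step are omitted, the data are simulated, and nothing here reproduces the authors' TRNN implementation.

```python
# Sketch of a neural first-hitting-time (threshold regression) model: a small network
# maps covariates to the initial level y0 and drift mu of a latent Wiener process, and
# the event time is its first passage to zero. Censoring is ignored for brevity.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

class FHTNet(nn.Module):
    """Map covariates x to (log y0, mu) of the latent Wiener health process."""
    def __init__(self, n_features, hidden=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))

    def forward(self, x):
        out = self.body(x)
        y0 = torch.exp(out[:, 0])          # initial health level, kept positive
        mu = out[:, 1]                     # drift (negative drift => decline)
        return y0, mu

def fht_neg_loglik(t, y0, mu):
    """Negative log-density of the first hitting time of 0 (sigma^2 = 1)."""
    log_f = (torch.log(y0) - 0.5 * math.log(2 * math.pi) - 1.5 * torch.log(t)
             - (y0 + mu * t) ** 2 / (2 * t))
    return -log_f.mean()

# Simulated covariates and (uncensored) event times, for illustration only
n, p = 500, 10
x = torch.randn(n, p)
t = torch.rand(n) * 5 + 0.1

model = FHTNet(p)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    y0, mu = model(x)
    loss = fht_neg_loglik(t, y0, mu)
    loss.backward()
    opt.step()
```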

Other

14 pages, 2280 KiB  
Case Report
Estimator Comparison for the Prediction of Election Results
by Miltiadis S. Chalikias, Georgios X. Papageorgiou and Dimitrios P. Zarogiannis
Stats 2024, 7(3), 671-684; https://doi.org/10.3390/stats7030040 - 1 Jul 2024
Viewed by 1034
Abstract
Cluster randomized experiments and estimator comparisons are well-documented topics. In this paper, using the datasets of the popular vote in the presidential elections of the United States of America (2012, 2016, 2020), we evaluate the properties (SE, MSE) of three cluster sampling estimators: the ratio estimator, the Horvitz–Thompson estimator, and the linear regression estimator. While both the ratio and Horvitz–Thompson estimators are widely used in cluster analysis, we propose a linear regression estimator defined for unequal cluster sizes, which, in many scenarios, performs better than the other two. The objective of this paper is twofold. Firstly, to indicate which estimator is best suited for predicting the outcome of the popular vote in the United States of America; we do so by applying the single-stage cluster sampling technique to our data. In the first partition, we use the 50 states plus the District of Columbia as primary sampling units, whereas in the second, we use 3112 counties instead. Secondly, based on the results of the aforementioned procedure, we estimate the number of clusters in a sample for a set standard error while also considering the diminishing returns from increasing the number of clusters in the sample. The linear regression estimator is best in the majority of the examined cases. This type of comparison can also be used to estimate any other country’s election results if prior voting results are available.
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
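
The following sketch, on simulated vote data, illustrates these families of estimators under single-stage cluster sampling with simple random sampling of clusters. The regression estimator shown is the textbook difference-type form and is not necessarily the unequal-cluster-size estimator proposed in the paper; all figures are made up.

```python
# Sketch of single-stage cluster sampling estimators of a population total (e.g., votes
# for a candidate), with clusters sampled by simple random sampling. Simulated data.
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population: N clusters (e.g., counties) with sizes m_i and vote totals y_i
N = 3112
m = rng.integers(1_000, 500_000, size=N).astype(float)   # cluster sizes
y = m * rng.beta(20, 20, size=N)                          # votes for candidate A
M, Y_true = m.sum(), y.sum()

# Simple random sample of n clusters (inclusion probability pi_i = n / N)
n = 100
idx = rng.choice(N, size=n, replace=False)
ms, ys = m[idx], y[idx]

# Horvitz-Thompson estimator of the total: sum of y_i / pi_i
Y_ht = (N / n) * ys.sum()

# Ratio estimator: estimated vote share times the known population size M
Y_ratio = M * ys.sum() / ms.sum()

# Regression estimator (textbook form): HT estimate corrected by the slope of y on m
b = np.polyfit(ms, ys, 1)[0]
Y_reg = Y_ht + b * (M - (N / n) * ms.sum())

for name, est in [("HT", Y_ht), ("ratio", Y_ratio), ("regression", Y_reg)]:
    print(f"{name:>10}: {est:,.0f}  (truth {Y_true:,.0f})")
```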
