Statistical Learning for High-Dimensional Data

A special issue of Stats (ISSN 2571-905X).

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 8529

Special Issue Editor


Prof. Dr. Paulo Canas Rodrigues
Guest Editor
Department of Statistics, Federal University of Bahia, Salvador 40170-110, Brazil
Interests: statistical learning; time series forecasting; robust statistics; data science; applied statistics

Special Issue Information

Dear Colleagues,

I am pleased to announce this new Special Issue on theoretical developments and applications related to statistical learning and high-dimensional data. In this Special Issue, we will consider topics such as machine and statistical learning methods, both supervised and unsupervised, deep learning, and other multivariate methods, with applications to independent and dependent multivariate data across all areas of application. More generally, this Special Issue aims to gather recent developments and applications of statistical learning methods for multivariate data. Manuscripts introducing new methodologies that can be helpful to practitioners are highly appreciated.

I look forward to receiving your submissions.

Sincerely,

Prof. Dr. Paulo Canas Rodrigues
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • statistical learning
  • high-dimensional data
  • data science
  • supervised learning
  • unsupervised learning
  • principal component analysis
  • cluster analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)

Research

11 pages, 262 KiB  
Article
On the (Apparently) Paradoxical Role of Noise in the Recognition of Signal Character of Minor Principal Components
by Alessandro Giuliani and Alessandro Vici
Stats 2024, 7(1), 54-64; https://doi.org/10.3390/stats7010004 - 11 Jan 2024
Viewed by 2281
Abstract
The usual method of separating signal and noise principal components on the sole basis of their eigenvalues has evident drawbacks when semantically relevant information ‘hides’ in minor components that explain a very small part of the total variance. This situation is common in biomedical experimentation when PCA is used for hypothesis generation: the multi-scale character of biological regulation typically generates a main mode explaining the major part of the variance (size component), squashing potentially interesting (shape) components into the noise floor. These minor components would be erroneously discarded as noisy by the usual selection methods. Here, we propose a computational method, based on the chemical concept of ‘titration’, that allows for the unsupervised recognition of the potential signal character of minor components by testing for a negative linear relation between the amount of added noise and component invariance.
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
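
For readers who want to experiment with the titration idea, here is a minimal sketch in Python with NumPy. The simulated data, function names, and the fixed noise grid are illustrative assumptions, not the authors' procedure: it adds increasing amounts of noise to a data matrix, recomputes the PCA loadings, and flags components whose congruence with the original loadings declines roughly linearly with the noise level.

```python
# Minimal sketch of the noise-titration idea: components whose congruence with
# the original loadings decays roughly linearly as noise is added are flagged
# as signal-like. Illustration only, not the authors' exact procedure.
import numpy as np
from numpy.linalg import svd

rng = np.random.default_rng(0)

def pca_loadings(X):
    """Return PCA loadings (columns = components) of a centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = svd(Xc, full_matrices=False)
    return Vt.T

def titration_slopes(X, noise_levels=np.linspace(0.05, 0.5, 10), n_rep=20):
    """Regress component congruence on added-noise level; return one slope per component."""
    V0 = pca_loadings(X)
    scale = X.std(axis=0, ddof=1)
    congruence = np.zeros((len(noise_levels), V0.shape[1]))
    for i, eps in enumerate(noise_levels):
        vals = []
        for _ in range(n_rep):
            Xn = X + rng.normal(0.0, eps * scale, size=X.shape)
            Vn = pca_loadings(Xn)
            # |cos| between original and perturbed loadings of the same index
            vals.append(np.abs(np.sum(V0 * Vn, axis=0)))
        congruence[i] = np.mean(vals, axis=0)
    # Slope of congruence vs. noise level, one value per component
    return np.polyfit(noise_levels, congruence, deg=1)[0]

# Toy example: mostly noise columns plus one weak (minor) but real signal
X = rng.normal(size=(200, 6))
X[:, 1] += 0.4 * X[:, 0]
slopes = titration_slopes(X)
print(np.round(slopes, 3))   # markedly negative slopes suggest signal character
```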

19 pages, 4728 KiB  
Article
Self-Organizing Topological Multilayer Perceptron: A Hybrid Method to Improve the Forecasting of Extreme Pollution Values
by Javier Linkolk López-Gonzales, Ana María Gómez Lamus, Romina Torres, Paulo Canas Rodrigues and Rodrigo Salas
Stats 2023, 6(4), 1241-1259; https://doi.org/10.3390/stats6040077 - 11 Nov 2023
Cited by 1 | Viewed by 2246
Abstract
Forecasting air pollutant levels is essential in regulatory plans focused on controlling and mitigating air pollutants, such as particulate matter. Focusing the forecast on air pollution peaks is challenging and complex since the pollutant time series behavior is not regular and is affected by several environmental and urban factors. In this study, we propose a new hybrid method based on artificial neural networks to forecast daily extreme events of PM2.5 pollution concentration. The hybrid method combines self-organizing maps, which identify temporal patterns of excessive daily pollution found at different monitoring stations, with a set of multilayer perceptrons that forecast extreme values of PM2.5 for each cluster. The proposed model was applied to analyze five-year pollution data obtained from nine weather stations in the metropolitan area of Santiago, Chile. Simulation results show that the hybrid method improves performance metrics when forecasting daily extreme values of PM2.5.
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
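
The sketch below illustrates the cluster-then-forecast structure of such a hybrid on simulated data. It uses scikit-learn, with KMeans standing in for the self-organizing map and one MLPRegressor per cluster, so it is a rough analogue of the approach rather than the authors' implementation; all data and names are hypothetical.

```python
# Sketch of the cluster-then-forecast idea behind the hybrid method: group stations
# by their pollution profiles, then fit one neural network per group. KMeans stands
# in for the self-organizing map; the data are simulated.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Simulated daily PM2.5 series for a few hypothetical monitoring stations
n_days, n_stations = 1500, 9
pm25 = np.exp(rng.normal(3.0, 0.5, size=(n_days, n_stations)))  # skewed, peak-prone

def lagged_design(series, n_lags=7):
    """Build (X, y) pairs where X holds the previous n_lags days and y the next day."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

# Stage 1: cluster stations by their daily pollution profiles (SOM stand-in)
station_profiles = pm25.T                      # one row per station
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(station_profiles)

# Stage 2: one MLP per cluster, trained on the pooled lagged data of its stations
models = {}
for c in np.unique(labels):
    Xs, ys = zip(*(lagged_design(pm25[:, s]) for s in np.where(labels == c)[0]))
    X, y = np.vstack(Xs), np.concatenate(ys)
    models[c] = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=500,
                             random_state=0).fit(X, y)

# One-step-ahead forecast for station 0 using its cluster's model
X0, _ = lagged_design(pm25[:, 0])
print(models[labels[0]].predict(X0[-1:]))
```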

24 pages, 732 KiB  
Article
Causal Inference in Threshold Regression and the Neural Network Extension (TRNN)
by Yiming Chen, Paul J. Smith and Mei-Ling Ting Lee
Stats 2023, 6(2), 552-575; https://doi.org/10.3390/stats6020036 - 28 Apr 2023
Viewed by 2393
Abstract
The first-hitting-time-based model conceptualizes a random process for subjects’ latent health status. The time-to-event outcome is modeled as the first hitting time of the random process to a pre-specified threshold. Threshold regression with linear predictors has numerous benefits in causal survival analysis, such as the estimators’ collapsibility. We propose a neural network extension of the first-hitting-time-based threshold regression model. With the flexibility of neural networks, the extended threshold regression model can efficiently capture complex relationships among predictors and underlying health processes while providing clinically meaningful interpretations, and it can also tackle the challenge of high-dimensional inputs. The proposed neural-network-extended threshold regression model can further be applied in causal survival analysis, for example by serving as the Q-model in G-computation. More efficient causal estimates are expected given the algorithm’s robustness. Simulations were conducted to validate estimator collapsibility and threshold regression G-computation. The performance of the neural-network-extended threshold regression model is also illustrated using simulated and real high-dimensional data from an observational study.
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
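
A minimal sketch of the modelling idea, assuming PyTorch is available: a small network maps covariates to the initial level and drift of a latent Wiener process, and the model is fitted by maximizing the first-hitting-time (inverse Gaussian) likelihood. Censoring and the G-computation step are omitted, the data are simulated, and nothing here reproduces the authors' TRNN implementation.

```python
# Sketch of a neural first-hitting-time (threshold regression) model: a small network
# maps covariates to the initial level y0 and drift mu of a latent Wiener process, and
# the event time is its first passage to zero. Censoring is ignored for brevity.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

class FHTNet(nn.Module):
    """Map covariates x to (log y0, mu) of the latent Wiener health process."""
    def __init__(self, n_features, hidden=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))

    def forward(self, x):
        out = self.body(x)
        y0 = torch.exp(out[:, 0])          # initial health level, kept positive
        mu = out[:, 1]                     # drift (negative drift => decline)
        return y0, mu

def fht_neg_loglik(t, y0, mu):
    """Negative log-density of the first hitting time of 0 (sigma^2 = 1)."""
    log_f = (torch.log(y0) - 0.5 * math.log(2 * math.pi) - 1.5 * torch.log(t)
             - (y0 + mu * t) ** 2 / (2 * t))
    return -log_f.mean()

# Simulated covariates and (uncensored) event times, for illustration only
n, p = 500, 10
x = torch.randn(n, p)
t = torch.rand(n) * 5 + 0.1

model = FHTNet(p)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    y0, mu = model(x)
    loss = fht_neg_loglik(t, y0, mu)
    loss.backward()
    opt.step()
```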

Other

14 pages, 2280 KiB  
Case Report
Estimator Comparison for the Prediction of Election Results
by Miltiadis S. Chalikias, Georgios X. Papageorgiou and Dimitrios P. Zarogiannis
Stats 2024, 7(3), 671-684; https://doi.org/10.3390/stats7030040 - 1 Jul 2024
Viewed by 1034
Abstract
Cluster randomized experiments and estimator comparisons are well-documented topics. In this paper, using the datasets of the popular vote in the presidential elections of the United States of America (2012, 2016, 2020), we evaluate the properties (SE, MSE) of three cluster sampling estimators: the ratio estimator, the Horvitz–Thompson estimator, and the linear regression estimator. While both the ratio and Horvitz–Thompson estimators are widely used in cluster analysis, we propose a linear regression estimator defined for unequal cluster sizes, which, in many scenarios, performs better than the other two. The objective of this paper is twofold. Firstly, to indicate which estimator is best suited for predicting the outcome of the popular vote in the United States of America; we do so by applying the single-stage cluster sampling technique to our data. In the first partition, we use the 50 states plus the District of Columbia as primary sampling units, whereas in the second, we use 3112 counties instead. Secondly, based on the results of the aforementioned procedure, we estimate the number of clusters in a sample for a set standard error while also considering the diminishing returns from increasing the number of clusters in the sample. The linear regression estimator is best in the majority of the examined cases. This type of comparison can also be used to estimate any other country’s election results if prior voting results are available.
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
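
The following sketch, on simulated vote data, illustrates these families of estimators under single-stage cluster sampling with simple random sampling of clusters. The regression estimator shown is the textbook difference-type form and is not necessarily the unequal-cluster-size estimator proposed in the paper; all figures are made up.

```python
# Sketch of single-stage cluster sampling estimators of a population total (e.g., votes
# for a candidate), with clusters sampled by simple random sampling. Simulated data.
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population: N clusters (e.g., counties) with sizes m_i and vote totals y_i
N = 3112
m = rng.integers(1_000, 500_000, size=N).astype(float)   # cluster sizes
y = m * rng.beta(20, 20, size=N)                          # votes for candidate A
M, Y_true = m.sum(), y.sum()

# Simple random sample of n clusters (inclusion probability pi_i = n / N)
n = 100
idx = rng.choice(N, size=n, replace=False)
ms, ys = m[idx], y[idx]

# Horvitz-Thompson estimator of the total: sum of y_i / pi_i
Y_ht = (N / n) * ys.sum()

# Ratio estimator: estimated vote share times the known population size M
Y_ratio = M * ys.sum() / ms.sum()

# Regression estimator (textbook form): HT estimate corrected by the slope of y on m
b = np.polyfit(ms, ys, 1)[0]
Y_reg = Y_ht + b * (M - (N / n) * ms.sum())

for name, est in [("HT", Y_ht), ("ratio", Y_ratio), ("regression", Y_reg)]:
    print(f"{name:>10}: {est:,.0f}  (truth {Y_true:,.0f})")
```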
