Robust Statistics in Action

A special issue of Stats (ISSN 2571-905X).

Deadline for manuscript submissions: closed (30 April 2021) | Viewed by 39199

Special Issue Editor


Prof. Dr. Marco Riani
Guest Editor
Department of Economics and Management and Interdepartmental Centre for Robust Statistics, University of Parma, Parma, Italy
Interests: all aspects of robust statistics (regression, multivariate analysis, classification and time series)

Special Issue Information

Dear Colleagues,

I am pleased to announce a Special Issue on the practical use of robust statistics. I am soliciting manuscripts that show how the application of robust methods can help to address and solve real complex data problems in a way that is not possible using traditional methods. Suitable manuscripts could include, but are not limited to, the robust assessment of public health issues, fraud detection, chemometrics and geochemistry, medical problems, all varieties of classification problems, predictive maintenance, and marketing applications in banks or firms. More generally, the purpose of the Special Issue is to show how robust statistics can be successfully applied to analyze multivariate complex data affected by different sources of heterogeneity, multiple outliers, and masking and swamping problems. Manuscripts applying robust statistics concepts to the modeling of the COVID-19 epidemic are especially welcome. Similarly, manuscripts introducing specific robust models which can be useful to practitioners are highly appreciated.

I look forward to receiving your submissions.

Sincerely,

Prof. Dr. Marco Riani
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • regression
  • multivariate analysis
  • time series
  • public health
  • classification (supervised and unsupervised)
  • neural networks
  • factor analysis
  • fraud detection
  • predictive maintenance

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (12 papers)

Research

42 pages, 726 KiB  
Article
Comparing the Robustness of the Structural after Measurement (SAM) Approach to Structural Equation Modeling (SEM) against Local Model Misspecifications with Alternative Estimation Approaches
by Alexander Robitzsch
Stats 2022, 5(3), 631-672; https://doi.org/10.3390/stats5030039 - 22 Jul 2022
Cited by 12 | Viewed by 4055
Abstract
Structural equation models (SEM), or confirmatory factor analysis as a special case, contain model parameters at the measurement part and the structural part. In most social-science SEM applications, all parameters are simultaneously estimated in a one-step approach (e.g., with maximum likelihood estimation). In a recent article, Rosseel and Loh (2022, Psychol. Methods) proposed a two-step structural after measurement (SAM) approach to SEM that estimates the parameters of the measurement model in the first step and the parameters of the structural model in the second step. Rosseel and Loh claimed that SAM is more robust to local model misspecifications (i.e., cross loadings and residual correlations) than one-step maximum likelihood estimation. In this article, it is demonstrated with analytical derivations and simulation studies that SAM is generally not more robust to misspecifications than one-step estimation approaches. Alternative estimation methods are proposed that provide more robustness to misspecifications. SAM suffers from finite-sample bias that depends on the size of factor reliability and factor correlations. A bootstrap-bias-corrected LSAM estimate provides less biased estimates in finite samples. In the discussion section, we nevertheless argue that applied researchers should adopt SAM because robustness to local misspecifications is an irrelevant property when applying SAM. Parameter estimates in a structural model are of interest because intentionally misspecified SEMs frequently offer clearly interpretable factors. In contrast, SEMs with some empirically driven model modifications will result in biased estimates of the structural parameters because the meaning of factors is unintentionally changed. Full article
(This article belongs to the Special Issue Robust Statistics in Action)

17 pages, 450 KiB  
Article
Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression
by Luca Insolia, Ana Kenney, Martina Calovi and Francesca Chiaromonte
Stats 2021, 4(3), 665-681; https://doi.org/10.3390/stats4030040 - 31 Aug 2021
Cited by 3 | Viewed by 4290
Abstract
High-dimensional classification studies have become widespread across various domains. The large dimensionality, coupled with the possible presence of data contamination, motivates the use of robust, sparse estimation methods to improve model interpretability and ensure the majority of observations agree with the underlying parametric model. In this study, we propose a robust and sparse estimator for logistic regression models, which simultaneously tackles the presence of outliers and/or irrelevant features. Specifically, we propose the use of L0-constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem in a framework that allows one to pursue optimality guarantees. We use our proposal to investigate the main drivers of honey bee (Apis mellifera) loss through the annual winter loss survey data collected by the Pennsylvania State Beekeepers Association. Previous studies mainly focused on predictive performance; our approach, however, produces a more interpretable classification model and provides evidence for several outlying observations within the survey data. We compare our proposal with existing heuristic methods and non-robust procedures, demonstrating its effectiveness. In addition to the application to honey bee loss, we present a simulation study where our proposal outperforms other methods across most performance measures and settings. Full article
(This article belongs to the Special Issue Robust Statistics in Action)
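The estimator in this paper relies on L0-constraints solved by mixed-integer conic programming, which is beyond a short snippet. As a hedged, minimal sketch of the same goal — sparsity plus outlier resistance in logistic regression — the code below substitutes an L1 penalty (a convex stand-in for the L0 constraint) combined with a simple deviance-based trimming step. All tuning constants and function names are illustrative, not the authors' implementation.

```python
import numpy as np

def fit_logistic_l1(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """L1-penalised logistic regression via proximal gradient descent."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        beta = beta - lr * (X.T @ (mu - y) / n)        # gradient step
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)  # soft threshold
    return beta

def robust_sparse_logistic(X, y, lam=0.05, trim=0.1):
    """Heuristic robustification: fit, drop the `trim` fraction of
    observations with the largest deviance residuals, then refit."""
    beta = fit_logistic_l1(X, y, lam)
    mu = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    dev = -2 * (y * np.log(mu) + (1 - y) * np.log(1 - mu))
    keep = dev <= np.quantile(dev, 1.0 - trim)
    return fit_logistic_l1(X[keep], y[keep], lam)

rng = np.random.default_rng(0)
n, p = 300, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -2.0, 0.0, 0.0, 0.0])       # sparse truth
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
y[:15] = 1.0 - y[:15]                                  # contaminate 5% of the labels
beta_hat = robust_sparse_logistic(X, y)
print(np.round(beta_hat, 2))
```

Unlike the paper's exact L0 formulation, this heuristic carries no optimality guarantee; it only illustrates why trimming high-deviance observations protects the sparse fit from label contamination.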

14 pages, 12211 KiB  
Article
Parameter Choice, Stability and Validity for Robust Cluster Weighted Modeling
by Andrea Cappozzo, Luis Angel García Escudero, Francesca Greselin and Agustín Mayo-Iscar
Stats 2021, 4(3), 602-615; https://doi.org/10.3390/stats4030036 - 6 Jul 2021
Cited by 6 | Viewed by 2681
Abstract
Statistical inference based on the cluster weighted model often requires some subjective judgment from the modeler. Many features influence the final solution, such as the number of mixture components, the shape of the clusters in the explanatory variables, and the degree of heteroscedasticity of the errors around the regression lines. Moreover, to deal with outliers and contamination that may appear in the data, hyper-parameter values ensuring robust estimation are also needed. In principle, this freedom gives rise to a variety of “legitimate” solutions, each derived by a specific set of choices and their implications in modeling. Here we introduce a method for identifying a “set of good models” to cluster a dataset, considering the whole panorama of choices. In this way, we enable the practitioner, or the scientist who needs to cluster the data, to make an educated choice. They will be able to identify the most appropriate solutions for the purposes of their own analysis, in light of their stability and validity. Full article
(This article belongs to the Special Issue Robust Statistics in Action)

21 pages, 311 KiB  
Article
Robust Causal Estimation from Observational Studies Using Penalized Spline of Propensity Score for Treatment Comparison
by Tingting Zhou, Michael R. Elliott and Roderick J. A. Little
Stats 2021, 4(2), 529-549; https://doi.org/10.3390/stats4020032 - 10 Jun 2021
Cited by 4 | Viewed by 2740
Abstract
Without randomization of treatments, valid inference of treatment effects from observational studies requires controlling for all confounders because the treated subjects generally differ systematically from the control subjects. Confounding control is commonly achieved using the propensity score, defined as the conditional probability of assignment to a treatment given the observed covariates. The propensity score collapses all the observed covariates into a single measure and serves as a balancing score such that the treated and control subjects with similar propensity scores can be directly compared. Common propensity score-based methods include regression adjustment and inverse probability of treatment weighting using the propensity score. We recently proposed a robust multiple imputation-based method, penalized spline of propensity for treatment comparisons (PENCOMP), that includes a penalized spline of the assignment propensity as a predictor. Under the Rubin causal model assumptions that there is no interference across units, that each unit has a non-zero probability of being assigned to either treatment group, and there are no unmeasured confounders, PENCOMP has a double robustness property for estimating treatment effects. In this study, we examine the impact of using variable selection techniques that restrict predictors in the propensity score model to true confounders of the treatment-outcome relationship on PENCOMP. We also propose a variant of PENCOMP and compare alternative approaches to standard error estimation for PENCOMP. Compared to the weighted estimators, PENCOMP is less affected by inclusion of non-confounding variables in the propensity score model. We illustrate the use of PENCOMP and competing methods in estimating the impact of antiretroviral treatments on CD4 counts in HIV+ patients. Full article
(This article belongs to the Special Issue Robust Statistics in Action)

14 pages, 2134 KiB  
Article
Smart Visualization of Mixed Data
by Aurea Grané, Giancarlo Manzi and Silvia Salini
Stats 2021, 4(2), 472-485; https://doi.org/10.3390/stats4020029 - 1 Jun 2021
Cited by 6 | Viewed by 3535
Abstract
In this work, we propose a new protocol that integrates robust classification and visualization techniques to analyze mixed data. This protocol is based on the combination of the Forward Search Distance-Based (FS-DB) algorithm (Grané, Salini, and Verdolini 2020) and robust clustering. The resulting groups are visualized via MDS maps and characterized through an analysis of several graphical outputs. The methodology is illustrated on a real dataset related to European COVID-19 numerical health data, as well as the policy and restriction measures adopted across the EU Member States during the 2020–2021 COVID-19 pandemic. The results show similarities among countries in terms of incidence and the management of the emergency across several waves of the disease. With the proposed methodology, new smart visualization tools for analyzing mixed data are provided. Full article
(This article belongs to the Special Issue Robust Statistics in Action)
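The visualization step described here — a dissimilarity for mixed numeric/categorical data followed by an MDS map — can be sketched in plain NumPy. This is a generic Gower-style distance plus classical (Torgerson) MDS, not the authors' FS-DB algorithm or robust clustering; the two "country" groups and the policy labels are invented for illustration.

```python
import numpy as np

def gower_distance(num, cat):
    """Gower-style dissimilarity for mixed data: range-scaled absolute
    differences for numeric columns, 0/1 mismatch for categorical ones."""
    rng_ = num.max(axis=0) - num.min(axis=0)
    rng_[rng_ == 0] = 1.0
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / rng_
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    return np.concatenate([d_num, d_cat], axis=2).mean(axis=2)

def classical_mds(D, k=2):
    """Torgerson MDS: double-centre the squared distances and embed
    using the top-k eigenvectors."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(1)
# two synthetic groups of "countries": numeric health indicators + a policy label
num = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])
cat = np.array([["strict"]] * 10 + [["lenient"]] * 10)
coords = classical_mds(gower_distance(num, cat), k=2)
print(coords.shape)   # (20, 2)
```

On such data the first MDS coordinate separates the two groups, which is the kind of map the protocol then characterizes with further graphical outputs.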

18 pages, 1000 KiB  
Article
Robust Fitting of a Wrapped Normal Model to Multivariate Circular Data and Outlier Detection
by Luca Greco, Giovanni Saraceno and Claudio Agostinelli
Stats 2021, 4(2), 454-471; https://doi.org/10.3390/stats4020028 - 1 Jun 2021
Cited by 4 | Viewed by 3562
Abstract
In this work, we deal with the robust fitting of a wrapped normal model to multivariate circular data. Robust estimation is meant to mitigate the adverse effects of outliers on inference. Furthermore, the use of a proper robust method leads to the definition of effective outlier detection rules. Robust fitting is achieved by a suitable modification of a classification-expectation-maximization algorithm that has been developed to perform maximum likelihood estimation of the parameters of a multivariate wrapped normal distribution. The modification concerns the use of complete-data estimating equations that involve data-dependent weights aimed at downweighting the effect of possible outliers. Several robust techniques are considered to define the weights. The finite-sample behavior of the resulting methods is investigated in numerical studies and real data examples. Full article
(This article belongs to the Special Issue Robust Statistics in Action)
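The downweighting idea behind this paper can be illustrated in one dimension. The sketch below is not the authors' multivariate classification-EM algorithm: it is a toy iteratively reweighted circular mean in which observations far from the current centre (in circular distance, measured in robust-scale units) receive weight zero. The rejection constant and the MAD-type scale are illustrative choices.

```python
import numpy as np

def circ_dist(a, b):
    """Shortest signed angular distance, in (-pi, pi]."""
    return np.angle(np.exp(1j * (a - b)))

def robust_circular_mean(theta, c=2.0, n_iter=20):
    """Iteratively reweighted circular mean: observations whose circular
    distance from the current centre exceeds c robust-scale units get
    weight zero (hard rejection), mimicking a downweighting step."""
    mu = np.angle(np.mean(np.exp(1j * theta)))        # classical start
    w = np.ones_like(theta)
    for _ in range(n_iter):
        d = np.abs(circ_dist(theta, mu))
        s = np.median(d) / 0.6745 + 1e-9              # MAD-type robust scale
        w = (d / s <= c).astype(float)
        mu = np.angle(np.sum(w * np.exp(1j * theta)))
    return mu, w

rng = np.random.default_rng(2)
theta = rng.normal(0.5, 0.2, 100)                     # concentrated near 0.5 rad
theta[:10] = rng.normal(0.5 + np.pi, 0.1, 10)         # 10% antipodal outliers
mu_hat, w = robust_circular_mean(np.angle(np.exp(1j * theta)))
print(round(mu_hat, 2), int(w.sum()))
```

The zero-weight observations double as a detected-outlier set, which is the outlier detection rule the abstract alludes to.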

19 pages, 1357 KiB  
Article
Analysis of ‘Pre-Fit’ Datasets of gLAB by Robust Statistical Techniques
by Maria Teresa Alonso, Carlo Ferigato, Deimos Ibanez Segura, Domenico Perrotta, Adria Rovira-Garcia and Emmanuele Sordini
Stats 2021, 4(2), 400-418; https://doi.org/10.3390/stats4020026 - 24 May 2021
Viewed by 2410
Abstract
The GNSS LABoratory tool (gLAB) is an interactive educational suite of applications for processing data from the Global Navigation Satellite System (GNSS). gLAB is composed of several data analysis modules that compute the solution of the problem of determining a position by means of GNSS measurements. The present work aimed to improve the pre-fit outlier detection function of gLAB since outliers, if undetected, deteriorate the obtained position coordinates. The methodology exploits robust statistical tools for regression provided by the Flexible Statistics and Data Analysis (FSDA) toolbox, an extension of MATLAB for the analysis of complex datasets. Our results show how the robust analysis FSDA technique improves the capability of detecting actual outliers in GNSS measurements, with respect to the present gLAB pre-fit outlier detection function. This study concludes that robust statistical analysis techniques, when applied to the pre-fit layer of gLAB, improve the overall reliability and accuracy of the positioning solution. Full article
(This article belongs to the Special Issue Robust Statistics in Action)

21 pages, 2366 KiB  
Article
fsdaSAS: A Package for Robust Regression for Very Large Datasets Including the Batch Forward Search
by Francesca Torti, Aldo Corbellini and Anthony C. Atkinson
Stats 2021, 4(2), 327-347; https://doi.org/10.3390/stats4020022 - 18 Apr 2021
Cited by 5 | Viewed by 2932
Abstract
The forward search (FS) is a general method of robust data fitting that moves smoothly from very robust to maximum likelihood estimation. The regression procedures are included in the MATLAB toolbox FSDA. The work on a SAS version of the FS originates from the need for the analysis of large datasets expressed by law enforcement services operating in the European Union that use our SAS software for detecting data anomalies that may point to fraudulent customs returns. Specific to our SAS implementation, the fsdaSAS package, we describe the approximation used to provide fast analyses of large datasets using an FS which progresses through the inclusion of batches of observations, rather than progressing one observation at a time. We do, however, test for outliers one observation at a time. We demonstrate that our SAS implementation becomes appreciably faster than the MATLAB version as the sample size increases and is also able to analyse larger datasets. The series of fits provided by the FS leads to the adaptive data-dependent choice of maximally efficient robust estimates. This also allows the monitoring of residuals and parameter estimates for fits of differing robustness levels. We mention that our fsdaSAS also applies the idea of monitoring to several robust estimators for regression for a range of values of breakdown point or nominal efficiency, leading to adaptive values for these parameters. We have also provided a variety of plots linked through brushing. Further programmed analyses include the robust transformations of the response in regression. Our package also provides the SAS community with methods of monitoring robust estimators for multivariate data, including multivariate data transformations. Full article
(This article belongs to the Special Issue Robust Statistics in Action)
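The batch forward search can be sketched compactly: start from a small outlier-free subset chosen by a least-median-of-squares criterion, then repeatedly refit and let the subset grow to the observations closest to the current fit, a batch at a time. The toy code below is not the fsdaSAS (or FSDA) implementation — it omits the one-at-a-time outlier tests and all monitoring plots — but it shows the defining behaviour: outliers enter the subset only at the very end of the search.

```python
import numpy as np

def batch_forward_search(X, y, batch=10, n_trials=200, seed=0):
    """Toy batch forward search for linear regression.  Returns the step
    at which each observation first enters the fitting subset."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    best_idx, best_crit = None, np.inf
    for _ in range(n_trials):                          # robust start: best of
        idx = rng.choice(n, p + 1, replace=False)      # random elemental fits
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        crit = np.median((y - X @ beta) ** 2)          # LMS criterion
        if crit < best_crit:
            best_crit, best_idx = crit, idx
    subset, step = best_idx, 0
    entry_step = np.full(n, -1)
    entry_step[subset] = 0
    while len(subset) < n:
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        res = np.abs(y - X @ beta)
        subset = np.argsort(res)[: min(len(subset) + batch, n)]  # grow by a batch
        step += 1
        newly = np.zeros(n, dtype=bool)
        newly[subset] = True
        entry_step[(entry_step == -1) & newly] = step
    return entry_step

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(0, 0.5, n)
y[:20] += 10.0                                         # 10% shifted outliers
entry = batch_forward_search(X, y)
print(entry[:20].min())                                # outliers enter only at the end
```

Monitoring quantities such as the minimum deletion residual along these steps is what turns the search into an adaptive, data-dependent choice of a robust fit.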

18 pages, 343 KiB  
Article
Measuring Bayesian Robustness Using Rényi Divergence
by Luai Al-Labadi, Forough Fazeli Asl and Ce Wang
Stats 2021, 4(2), 251-268; https://doi.org/10.3390/stats4020018 - 29 Mar 2021
Cited by 6 | Viewed by 2447
Abstract
This paper deals with measuring the Bayesian robustness of classes of contaminated priors. Two different classes of priors in the neighborhood of the elicited prior are considered. The first one is the well-known ϵ-contaminated class, while the second one is the geometric mixing class. The proposed measure of robustness is based on computing the curvature of Rényi divergence between posterior distributions. Examples are used to illustrate the results by using simulated and real data sets. Full article
(This article belongs to the Special Issue Robust Statistics in Action)
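For the geometric mixing class, the idea can be made concrete in a conjugate normal model, where everything is available in closed form: a geometric mixture of two normal priors is again normal, the posterior is normal, and the Rényi divergence between two normals has a known expression. The sketch below computes the curvature of the divergence in the contamination fraction by finite differences; the specific priors, data summary, and order alpha = 2 are illustrative choices, not taken from the paper.

```python
import numpy as np

def renyi_normal(alpha, m1, s1, m2, s2):
    """Closed-form Renyi divergence D_alpha(N(m1, s1^2) || N(m2, s2^2)),
    valid when alpha*s2^2 + (1 - alpha)*s1^2 > 0."""
    sa2 = alpha * s2 ** 2 + (1 - alpha) * s1 ** 2
    return (np.log(s2 / s1)
            + np.log(s2 ** 2 / sa2) / (2 * (alpha - 1))
            + alpha * (m1 - m2) ** 2 / (2 * sa2))

def posterior(prior_mean, prior_sd, xbar, n, sigma=1.0):
    """Normal-normal conjugate update with known sigma."""
    prec = 1.0 / prior_sd ** 2 + n / sigma ** 2
    mean = (prior_mean / prior_sd ** 2 + n * xbar / sigma ** 2) / prec
    return mean, np.sqrt(1.0 / prec)

def geometric_mix(eps, m0, t0, m1, t1):
    """The geometric mixture of two normal priors is again normal."""
    prec = (1 - eps) / t0 ** 2 + eps / t1 ** 2
    mean = ((1 - eps) * m0 / t0 ** 2 + eps * m1 / t1 ** 2) / prec
    return mean, np.sqrt(1.0 / prec)

# elicited prior N(0,1), contaminant N(3,1); data summary: xbar = 1, n = 10
base_post = posterior(0.0, 1.0, 1.0, 10)

def div(eps, alpha=2.0):
    m, t = geometric_mix(eps, 0.0, 1.0, 3.0, 1.0)
    return renyi_normal(alpha, *posterior(m, t, 1.0, 10), *base_post)

h = 1e-3
curvature = (div(h) - 2.0 * div(0.0) + div(-h)) / h ** 2   # d2 D / d eps2 at eps = 0
print(round(curvature, 3))
```

Since the divergence is zero at eps = 0 and nonnegative elsewhere, its curvature at zero summarises how quickly the posterior moves away from the elicited-prior posterior, which is the robustness measure the paper builds on.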
8 pages, 259 KiB  
Article
Cumulative Median Estimation for Sufficient Dimension Reduction
by Stephen Babos and Andreas Artemiou
Stats 2021, 4(1), 138-145; https://doi.org/10.3390/stats4010011 - 20 Feb 2021
Viewed by 2049
Abstract
In this paper, we present the Cumulative Median Estimation (CUMed) algorithm for robust sufficient dimension reduction. Compared with non-robust competitors, this algorithm performs better when there are outliers present in the data and comparably when outliers are not present. This is demonstrated in simulated and real data experiments. Full article
(This article belongs to the Special Issue Robust Statistics in Action)

20 pages, 466 KiB  
Article
Improving the Efficiency of Robust Estimators for the Generalized Linear Model
by Alfio Marazzi
Stats 2021, 4(1), 88-107; https://doi.org/10.3390/stats4010008 - 4 Feb 2021
Cited by 5 | Viewed by 2630
Abstract
The distance constrained maximum likelihood procedure (DCML) optimally combines a robust estimator with the maximum likelihood estimator with the purpose of improving its small sample efficiency while preserving a good robustness level. It has been published for the linear model and is now extended to the GLM. Monte Carlo experiments are used to explore the performance of this extension in the Poisson regression case. Several published robust candidates for the DCML are compared; the modified conditional maximum likelihood estimator starting with a very robust minimum density power divergence estimator is selected as the best candidate. It is shown empirically that the DCML remarkably improves its small sample efficiency without loss of robustness. An example using real hospital length of stay data fitted by the negative binomial regression model is discussed. Full article
(This article belongs to the Special Issue Robust Statistics in Action)

13 pages, 312 KiB  
Article
Nonparametric Limits of Agreement for Small to Moderate Sample Sizes: A Simulation Study
by Maria E. Frey, Hans C. Petersen and Oke Gerke
Stats 2020, 3(3), 343-355; https://doi.org/10.3390/stats3030022 - 28 Aug 2020
Cited by 13 | Viewed by 4399
Abstract
The assessment of agreement in method comparison and observer variability analysis of quantitative measurements is usually done by the Bland–Altman Limits of Agreement, where the paired differences are implicitly assumed to follow a normal distribution. Whenever this assumption does not hold, the 2.5% and 97.5% percentiles are obtained by quantile estimation. In the literature, empirical quantiles have been used for this purpose. In this simulation study, we applied sample, subsampling, and kernel quantile estimators, as well as other methods for quantile estimation, to sample sizes between 30 and 150 and different distributions of the paired differences. The performance of 15 estimators in generating prediction intervals was measured by their respective coverage probability for one newly generated observation. Our results indicated that sample quantile estimators based on one or two order statistics outperformed all of the other estimators, and they can be used for deriving nonparametric Limits of Agreement. For sample sizes exceeding 80 observations, more advanced quantile estimators, such as the Harrell–Davis and estimators of Sfakianakis–Verginis type, which use all of the observed differences, performed similarly well, but may be considered intuitively more appealing than simple sample quantile estimators that are based on only two observations per quantile. Full article
(This article belongs to the Special Issue Robust Statistics in Action)
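The order-statistic-based limits recommended here are simple to compute. The sketch below contrasts them with the normal-theory mean ± 1.96 sd limits on deliberately skewed differences; the skewed lognormal example is invented for illustration, and the `method=` keyword of `np.quantile` assumes a reasonably recent NumPy (older versions call it `interpolation=`).

```python
import numpy as np

def nonparametric_loa(d):
    """Nonparametric Limits of Agreement: 2.5% and 97.5% sample quantiles
    of the paired differences, taken from order statistics instead of the
    normal-theory mean +/- 1.96 sd."""
    lower = np.quantile(d, 0.025, method="lower")    # a single order statistic
    upper = np.quantile(d, 0.975, method="higher")
    return lower, upper

rng = np.random.default_rng(4)
d = rng.lognormal(0.0, 0.5, 100) - 1.0               # skewed paired differences
lo, hi = nonparametric_loa(d)
lo_norm = d.mean() - 1.96 * d.std(ddof=1)            # normal-theory limits, for contrast
hi_norm = d.mean() + 1.96 * d.std(ddof=1)
print(round(lo, 2), round(hi, 2), "|", round(lo_norm, 2), round(hi_norm, 2))
```

With skewed differences the normal-theory limits are symmetric around the mean and miss the long right tail, while the order-statistic limits adapt to the observed distribution.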
