Special Issue "Nonparametric Statistical Inference with an Emphasis on Information-Theoretic Methods"

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (15 June 2021)

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editor

Prof. Jan Mielniczuk
Guest Editor
Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland
Interests: model selection; time series analysis; nonparametric methods of mathematical statistics, in particular curve estimation (both asymptotic and small-sample performance of estimates); information theory; machine learning; computer-intensive methods; dependence analysis; statistical consulting

Special Issue Information

Dear Colleagues,

In recent years, there has been increased interest in the statistical analysis of structured data and high-dimensional problems. This has created a challenge for classical statistical inference, which frequently does not cover such cases. A huge number of studies have been devoted to proposing new solutions, or modifying existing ones, in order to account for the specificity of such data. Frequently, however, these methods work well for specific parametric models and fail when misspecification occurs. Thus, there is a growing need to develop nonparametric and robust procedures in this context which will meet contemporary needs in, among other areas, dependence analysis, supervised and unsupervised classification and regression, feature selection, and prediction analysis. In particular, nonparametric methods based on an information-theoretic approach provide interesting and not yet sufficiently explored methodologies for this challenge.

We encourage submissions in, but not limited to, the following areas:

  • Nonparametric feature-selection methods for high-dimensional regression problems, in particular those based on an information-theoretic approach;
  • Nonparametric dependence analysis, in particular Markov blanket discovery;
  • Nonparametric interaction detection;
  • Learning to rank methods;
  • Nonparametric density and regression estimation and related problems;
  • Effect of misspecification on behavior of parametric procedures and their robustness;
  • Nonparametric estimation of information-theoretic and dependence indices;
  • Principal component analysis and related methods in high dimensions.

Prof. Jan Mielniczuk
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (9 papers)


Editorial


Editorial
Nonparametric Statistical Inference with an Emphasis on Information-Theoretic Methods
Entropy 2022, 24(4), 553; https://doi.org/10.3390/e24040553 - 15 Apr 2022
Abstract
The presented volume addresses some vital problems in contemporary statistical reasoning [...]

Research


Article
Right-Censored Time Series Modeling by Modified Semi-Parametric A-Spline Estimator
Entropy 2021, 23(12), 1586; https://doi.org/10.3390/e23121586 - 27 Nov 2021
Cited by 1
Abstract
This paper focuses on adaptive spline (A-spline) fitting of the semiparametric regression model to time series data with right-censored observations. Typically, two main problems need to be solved in such a case: dealing with the censored data and obtaining a proper A-spline estimator for the components of the semiparametric model. The first problem is traditionally solved by the synthetic-data approach based on the Kaplan–Meier estimator. In practice, although the synthetic-data technique is one of the most widely used solutions for right-censored observations, the structure of the transformed data is distorted, especially for heavily censored datasets, due to the nature of the approach. In this paper, we introduce a modified semiparametric estimator based on the A-spline approach to overcome data irregularity with minimum information loss and to resolve the second problem described above. In addition, the semiparametric B-spline estimator is used as a benchmark to gauge the success of the A-spline estimator. To this end, a detailed Monte Carlo simulation study and a real-data example were carried out to evaluate the performance of the proposed estimator and to make a practical comparison.
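The synthetic-data step mentioned above can be illustrated concretely. The sketch below is a simplified illustration, not the authors' implementation: it uses the classical Koul–Susarla–Van Ryzin transformation, in which each observed time Z_i = min(Y_i, C_i) is inflated by the Kaplan–Meier estimate of the censoring survival function, so that the transformed responses have approximately the same conditional mean as the uncensored ones.

```python
import numpy as np

def synthetic_response(z, delta):
    """Koul-Susarla-Van Ryzin synthetic data for right-censored responses.

    z     : observed times Z_i = min(Y_i, C_i)
    delta : 1 if Y_i was observed (uncensored), 0 if censored
    Returns Y*_i = delta_i * Z_i / G(Z_i-), where G is the Kaplan-Meier
    estimate of the censoring survival function P(C > t).
    """
    z, delta = np.asarray(z, float), np.asarray(delta)
    n = len(z)
    order = np.argsort(z, kind="stable")
    cens = delta[order] == 0                 # censoring acts as the "event" for G
    at_risk = n - np.arange(n)               # subjects still at risk at each sorted time
    factors = np.where(cens, (at_risk - 1) / at_risk, 1.0)
    g = np.cumprod(factors)                  # G evaluated at each sorted z
    g_left = np.concatenate(([1.0], g[:-1]))  # left limit G(z-) keeps denominators > 0
    y_star = np.zeros(n)
    y_star[order] = delta[order] * z[order] / np.maximum(g_left, 1e-12)
    return y_star
```

The distortion the abstract refers to is visible directly: censored observations are mapped to zero, while large uncensored observations are inflated by small Kaplan–Meier weights, which is especially severe for heavily censored samples.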

Article
Robust Multiple Regression
Entropy 2021, 23(1), 88; https://doi.org/10.3390/e23010088 - 09 Jan 2021
Cited by 1
Abstract
As modern data analysis pushes the boundaries of classical statistics, it is timely to reexamine alternative approaches to dealing with outliers in multiple regression. As sample sizes and the number of predictors increase, interactive methodology becomes less effective. Likewise, with limited understanding of the underlying contamination process, diagnostics are likely to fail as well. In this article, we advocate a non-likelihood procedure that attempts to quantify the fraction of bad data as part of the estimation step. These ideas also allow for the selection of important predictors under some assumptions. As many robust algorithms are available, running several and looking for interesting differences is a sensible strategy for understanding the nature of the outliers.
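The advice to run several robust fits and compare them can be sketched with a self-contained example. The Huber M-estimator below, fit by iteratively reweighted least squares with a MAD scale estimate, is a standard textbook choice, not the specific non-likelihood procedure this article proposes; observations ending up with small weights are candidate outliers, and the fraction of such points gives a crude estimate of the contamination fraction.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50):
    """Huber M-estimator for multiple regression via IRLS.

    Returns (beta, w): coefficients (intercept first) and the final
    observation weights; weights near 0 flag likely outliers.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]       # start from OLS
    w = np.ones(len(y))
    for _ in range(n_iter):
        r = y - X1 @ beta
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-8)  # MAD scale
        w = np.minimum(1.0, delta * s / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
    return beta, w

# contaminated example: true model y = 1 + 2*x1 - x2, with 10% gross outliers
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1]
y[::10] += 100.0                                       # 5 gross outliers
beta, w = huber_irls(X, y)
```

Running this alongside, say, least squares and a least-trimmed-squares fit and comparing the flagged points is the kind of cross-check the abstract recommends.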

Article
Analysis of Information-Based Nonparametric Variable Selection Criteria
Entropy 2020, 22(9), 974; https://doi.org/10.3390/e22090974 - 31 Aug 2020
Cited by 2
Abstract
We consider a nonparametric Generative Tree Model and discuss the problem of selecting active predictors for the response in such a scenario. We investigate two popular information-based selection criteria, Conditional Infomax Feature Extraction (CIFE) and Joint Mutual Information (JMI), which are both derived as approximations of the Conditional Mutual Information (CMI) criterion. We show that both CIFE and JMI may exhibit behavior different from that of CMI, resulting in different orders in which predictors are chosen in the variable selection process. Explicit formulae for CMI and its two approximations in the generative tree model are obtained. As a byproduct, we establish expressions for the entropy of a multivariate Gaussian mixture and its mutual information with the mixing distribution.
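For readers unfamiliar with these criteria, a minimal plug-in implementation for discrete data is sketched below (an illustration, not the paper's code). Both criteria score a candidate X_k by its relevance I(X_k; Y) minus a redundancy term summed over the already-selected set S, namely I(X_k; X_j) − I(X_k; X_j | Y); CIFE uses the full sum, while, in one standard formulation, JMI averages it over S.

```python
import numpy as np

def joint_entropy(*cols):
    """Plug-in entropy (in nats) of the joint distribution of discrete columns."""
    _, counts = np.unique(np.column_stack(cols), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mi(x, y):
    """Empirical mutual information I(X; Y)."""
    return joint_entropy(x) + joint_entropy(y) - joint_entropy(x, y)

def cmi(x, y, z):
    """Empirical conditional mutual information I(X; Y | Z)."""
    return (joint_entropy(x, z) + joint_entropy(y, z)
            - joint_entropy(x, y, z) - joint_entropy(z))

def greedy_select(X, y, k, criterion="cife"):
    """Greedy forward selection with the CIFE or JMI score."""
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            red = sum(mi(X[:, j], X[:, s]) - cmi(X[:, j], X[:, s], y)
                      for s in selected)
            if criterion == "jmi" and selected:
                red /= len(selected)          # JMI averages the redundancy term
            score = mi(X[:, j], y) - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

On a toy dataset where one column duplicates an already-selected predictor, the redundancy term cancels the relevance exactly, so the duplicate scores zero, which illustrates how the two approximations can reorder predictors relative to CMI.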

Article
Multivariate Tail Coefficients: Properties and Estimation
Entropy 2020, 22(7), 728; https://doi.org/10.3390/e22070728 - 30 Jun 2020
Cited by 5
Abstract
Multivariate tail coefficients are an important tool when investigating dependencies between extreme events for different components of a random vector. Although bivariate tail coefficients are well studied, this is, to a lesser extent, the case for multivariate tail coefficients. This paper contributes to this research area by (i) providing a thorough study of the properties of existing multivariate tail coefficients in the light of a set of desirable properties; (ii) proposing some new multivariate tail measurements; (iii) dealing with estimation of the discussed coefficients and establishing asymptotic consistency; and (iv) studying the behavior of tail measurements with increasing dimension of the random vector. A set of illustrative examples is given, and practical use of the tail measurements is demonstrated in a data analysis with a focus on dependencies between stocks that are part of the EURO STOXX 50 market index.
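As a concrete illustration of the estimation task (not the authors' estimators), one common multivariate extension of the upper tail-dependence coefficient, λ_U(u) = P(U_1 > u, …, U_d > u)/(1 − u) at a threshold u close to 1, has a simple empirical version based on rank pseudo-observations; the sketch below assumes continuous margins (no ties).

```python
import numpy as np

def upper_tail_coefficient(X, u=0.95):
    """Empirical multivariate upper tail coefficient at threshold u.

    X : (n, d) sample; margins are replaced by rank-based
        pseudo-observations U in (0, 1), assuming no ties.
    Returns P_hat(all U_j > u) / (1 - u).
    """
    n = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    U = ranks / (n + 1.0)
    return np.mean(np.all(U > u, axis=1)) / (1.0 - u)

# comonotone data: perfect tail dependence, coefficient close to 1
t = np.linspace(0.0, 1.0, 1000)
X_dep = np.column_stack([t, 2.0 * t + 1.0])    # second column strictly increasing in t
lam_dep = upper_tail_coefficient(X_dep, u=0.9)

# independent columns: coefficient close to (1 - u)^(d - 1), i.e. near 0
rng = np.random.default_rng(1)
X_ind = np.column_stack([t, rng.permutation(t)])
lam_ind = upper_tail_coefficient(X_ind, u=0.9)
```

In practice the estimate is computed along a grid of thresholds u → 1 to judge stability, which is where the consistency results of the paper become relevant.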

Article
Prediction and Variable Selection in High-Dimensional Misspecified Binary Classification
Entropy 2020, 22(5), 543; https://doi.org/10.3390/e22050543 - 13 May 2020
Cited by 4
Abstract
In this paper, we consider prediction and variable selection in misspecified binary classification models under the high-dimensional scenario. We focus on two approaches to classification which are computationally efficient but lead to model misspecification. The first is to apply penalized logistic regression to classification data which possibly do not follow the logistic model. The second method is even more radical: we simply treat the class labels of objects as if they were numbers and apply penalized linear regression. We investigate these two approaches thoroughly and provide conditions which guarantee that they are successful in prediction and variable selection. Our results hold even if the number of predictors is much larger than the sample size. The paper concludes with experimental results.
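The "radical" second approach is easy to make concrete. The sketch below is an illustration on assumed data, using a plain ISTA solver rather than whatever optimizer the authors used: it treats the 0/1 labels as numbers and runs ℓ1-penalized least squares, and even under a non-logistic response the fitted direction can still recover the true support.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_on_labels(X, y, lam, n_iter=500):
    """l1-penalized least squares on (centered) 0/1 labels via ISTA."""
    n, p = X.shape
    yc = y - y.mean()                                   # crude stand-in for an intercept
    step = 1.0 / (np.linalg.norm(X, ord=2) ** 2 / n)    # 1 / Lipschitz constant
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - yc) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# misspecified binary model: y = 1{x_1 > x_2}, not logistic in x
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 10))
y = (X[:, 0] > X[:, 1]).astype(float)
beta = lasso_on_labels(X, y, lam=0.05)
```

With Gaussian predictors this works because the linear-regression fit is proportional to the true direction of the single-index model, which is the kind of collinearity phenomenon the results in this issue make precise.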
Article
Towards a Unified Theory of Learning and Information
Entropy 2020, 22(4), 438; https://doi.org/10.3390/e22040438 - 13 Apr 2020
Cited by 2
Abstract
In this paper, we introduce the notion of "learning capacity" for algorithms that learn from data, which is analogous to the Shannon channel capacity for communication systems. We show how learning capacity bridges the gap between statistical learning theory and information theory, and we use it to derive generalization bounds for finite hypothesis spaces, differential privacy, and countable domains, among others. Moreover, we prove that, under the Axiom of Choice, the existence of an empirical risk minimization (ERM) rule with vanishing learning capacity is equivalent to the assertion that the hypothesis space has a finite Vapnik–Chervonenkis (VC) dimension, thus establishing an equivalence between two of the most fundamental concepts in statistical learning theory and information theory. In addition, we show how the learning capacity of an algorithm provides important qualitative results, such as the relation between generalization and algorithmic stability, information leakage, and data processing. Finally, we conclude by listing some open problems and suggesting future directions of research.

Article
Selection Consistency of Lasso-Based Procedures for Misspecified High-Dimensional Binary Model and Random Regressors
Entropy 2020, 22(2), 153; https://doi.org/10.3390/e22020153 - 28 Jan 2020
Cited by 3
Abstract
We consider selection of random predictors for a high-dimensional regression problem with a binary response and a general loss function. An important special case is when the binary model is semi-parametric and the response function is misspecified under a parametric model fit. When the true response coincides with a postulated parametric response for a certain value of the parameter, we obtain a common framework for parametric inference; both correct specification and misspecification are covered in this contribution. Variable selection in such a scenario aims at recovering the support of the minimizer of the associated risk with large probability. We propose a two-step Screening–Selection (SS) procedure, which consists of screening and ordering predictors by the Lasso and then selecting the subset of predictors that minimizes the Generalized Information Criterion over the corresponding nested family of models. We prove consistency of the proposed selection method under conditions that allow for a much larger number of predictors than observations. For the semi-parametric case, when the distribution of the random predictors satisfies the linear regression condition, the true and the estimated parameters are collinear and their common support can be consistently identified. This partly explains the robustness of selection procedures to misspecification of the response function.
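A toy version of the two-step SS idea can be sketched as follows, with squared loss, an ISTA Lasso solver, and one common GIC penalty choice; these are assumptions for illustration, not necessarily the paper's exact loss, solver, or criterion. Step one screens and orders predictors by the magnitude of their Lasso coefficients; step two minimizes a GIC over the induced nested family.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso(X, y, lam, n_iter=500):
    """Plain ISTA for l1-penalized least squares."""
    n, p = X.shape
    step = 1.0 / (np.linalg.norm(X, ord=2) ** 2 / n)
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * (X.T @ (X @ beta - y) / n),
                              step * lam)
    return beta

def screening_selection(X, y, lam=0.05):
    """Two-step SS sketch: Lasso ordering, then GIC over nested models."""
    n, p = X.shape
    yc = y - y.mean()
    order = np.argsort(-np.abs(lasso(X, yc, lam)))     # screening + ordering
    best_k, best_gic = 0, np.inf
    for k in range(p + 1):
        S = order[:k]
        if k == 0:
            resid = yc
        else:
            coef = np.linalg.lstsq(X[:, S], yc, rcond=None)[0]
            resid = yc - X[:, S] @ coef
        rss = float(resid @ resid)
        # heavier-than-BIC penalty, one common GIC choice for large p
        gic = n * np.log(rss / n) + k * np.log(p) * np.log(n)
        if gic < best_gic:
            best_k, best_gic = k, gic
    return sorted(order[:best_k].tolist())

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 8))
y = (X[:, 0] - X[:, 1] + 0.3 * rng.standard_normal(300) > 0).astype(float)
selected = screening_selection(X, y)
```

The nested family makes the second step cheap: only p + 1 candidate models are scored instead of 2^p subsets.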

Article
Estimation of Dynamic Networks for High-Dimensional Nonstationary Time Series
Entropy 2020, 22(1), 55; https://doi.org/10.3390/e22010055 - 31 Dec 2019
Cited by 2
Abstract
This paper is concerned with the estimation of time-varying networks for high-dimensional nonstationary time series. Two types of dynamic behavior are considered: structural breaks (i.e., abrupt change points) and smooth changes. To handle these two types of time-varying features simultaneously, a two-step approach is proposed: multiple change-point locations are first identified by comparing the differences between localized averages of sample covariance matrices, and graph supports are then recovered by a kernelized time-varying constrained L1-minimization for inverse matrix estimation (CLIME) estimator on each segment. We derive the rates of convergence for estimating the change points and precision matrices under mild moment and dependence conditions. In particular, we show that this two-step approach is consistent in estimating the change points and the piecewise smooth precision matrix function under a certain high-dimensional scaling limit. The method is applied to the analysis of the network structure of the S&P 500 index between 2003 and 2008.
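The first step of the two-step approach, comparing localized averages of sample covariance matrices, can be sketched as a simple single-bandwidth scan on mean-zero data; the paper's actual statistic, bandwidths, and thresholding rules are more refined than this illustration.

```python
import numpy as np

def cov_change_scan(X, w):
    """Frobenius norm of the difference between sample covariance matrices
    averaged over the w observations before and after each time t.

    Peaks in the returned curve indicate candidate change points.
    Assumes the series is (approximately) mean zero.
    """
    n, _ = X.shape
    stat = np.full(n, np.nan)
    for t in range(w, n - w):
        s_left = X[t - w:t].T @ X[t - w:t] / w
        s_right = X[t:t + w].T @ X[t:t + w] / w
        stat[t] = np.linalg.norm(s_left - s_right, ord="fro")
    return stat

# piecewise-stationary example: the variance of the first coordinate
# jumps from 1 to 9 at t = 100
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
X[100:, 0] *= 3.0
stat = cov_change_scan(X, w=40)
t_hat = int(np.nanargmax(stat))
```

Once the segments are cut at the detected points, a precision-matrix estimator such as CLIME can be applied within each segment, which is the second step described in the abstract.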
