MDPI - Publisher of Open Access Journals

18 pages, 366 KB

Open Access Article

Nonparametric Transformation Models for Double-Censored Data with Crossed Survival Curves: A Bayesian Approach

by Ping Xu, Ruichen Ni, Shouzheng Chen, Zhihua Ma and Chong Zhong

Mathematics 2025, 13(15), 2461; https://doi.org/10.3390/math13152461 - 30 Jul 2025

Viewed by 839

Abstract

Double-censored data are frequently encountered in pharmacological and epidemiological studies, where the failure time can only be observed within a certain range and is otherwise either left- or right-censored. In this paper, we present a Bayesian approach for analyzing double-censored survival data with crossed survival curves. We introduce a novel pseudo-quantile I-splines prior to model monotone transformations under both random and fixed censoring schemes. Additionally, we incorporate categorical heteroscedasticity using the dependent Dirichlet process (DDP), enabling the estimation of crossed survival curves. Comprehensive simulations validate the robustness and accuracy of the method, particularly under the fixed censoring scheme, where traditional approaches may not be applicable. In an application to a randomized AIDS clinical trial, incorporating the categorical heteroscedasticity leads to a new finding: the effect of baseline log RNA levels is significant. The proposed framework provides a flexible and reliable tool for survival analysis, offering an alternative to parametric and semiparametric models.

19 pages, 881 KB

Open Access Article

Exploring Flexible Penalization of Bayesian Survival Analysis Using Beta Process Prior for Baseline Hazard

by Kazeem A. Dauda, Ebenezer J. Adeniyi, Rasheed K. Lamidi and Olalekan T. Wahab

Computation 2025, 13(2), 21; https://doi.org/10.3390/computation13020021 - 21 Jan 2025

Cited by 2 | Viewed by 1491

Abstract

High-dimensional data have attracted considerable interest from researchers, especially in the area of variable selection. However, for time-to-event data in survival analysis, where censoring is a key consideration, progress on this complex problem has remained limited. Moreover, in microarray research, genes involved in the same biological pathways are commonly grouped together, and these groups often act as a unit. This study therefore adopts the penalized semiparametric Bayesian Cox (PSBC) model with elastic-net and group lasso penalty functions (PSBC-EN and PSBC-GL) to incorporate the grouping structure of the covariates (genes) and perform variable selection. The proposed methods assign a beta process prior to the cumulative baseline hazard function (PSBC-EN-B and PSBC-GL-B), instead of the gamma process prior used in existing methods (PSBC-EN-G and PSBC-GL-G). Three real-life datasets and several simulation scenarios were used to compare the modified methods with existing techniques using the Bayesian information criterion (BIC). The simulation results provide empirical evidence that the proposed methods outperform the existing methods across a wide range of data scenarios. Similarly, on the real-life datasets the proposed methods showed a substantial improvement over the existing techniques in feature selection and grouping behavior.
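As a point of reference for the two penalties named above, here is a minimal Python sketch of the elastic-net and group lasso penalty functions in their penalized-likelihood form. The PSBC models express these penalties through hierarchical priors instead. The parameterization used here (`lam1`, `lam2`, and a square-root-of-group-size weight) is one common convention and is an assumption, not taken from the article. The coefficients and gene groups are hypothetical.

```python
import numpy as np

def elastic_net_penalty(beta, lam1, lam2):
    """lam1 * ||beta||_1 + lam2 * ||beta||_2^2 (one common parameterization)."""
    beta = np.asarray(beta, dtype=float)
    return lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta ** 2)

def group_lasso_penalty(beta, groups, lam):
    """lam * sum over groups g of sqrt(p_g) * ||beta_g||_2.

    Each group's coefficients are penalized jointly through their Euclidean
    norm, so a group (e.g. genes in one pathway) tends to be kept or dropped
    as a unit when this penalty is minimized.
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        beta_g = beta[groups == g]
        total += np.sqrt(beta_g.size) * np.linalg.norm(beta_g)
    return lam * total

# Hypothetical coefficients for five genes in three pathways.
beta = [3.0, 4.0, 0.0, 0.0, 1.0]
groups = [0, 0, 1, 1, 2]
en = elastic_net_penalty(beta, lam1=1.0, lam2=0.5)  # 8 + 0.5 * 26 = 21
gl = group_lasso_penalty(beta, groups, lam=1.0)     # sqrt(2) * 5 + 0 + 1
print(en, gl)
```

The all-zero second group contributes nothing to the group lasso penalty, while the elastic net treats every coefficient separately.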

20 pages, 5033 KB

Open Access Feature Paper Article

Multi-Output Bayesian Support Vector Regression Considering Dependent Outputs

by Yanlin Wang, Zhijun Cheng and Zichen Wang

Mathematics 2024, 12(18), 2923; https://doi.org/10.3390/math12182923 - 20 Sep 2024

Cited by 2 | Viewed by 2223

Abstract

Multi-output regression exploits the correlation between outputs to transfer information between dependent outputs, thereby improving the accuracy of predictive models. Bayesian support vector regression (BSVR) provides both the predictive mean and the predictive variance for the data to be labeled, which gives it considerable potential for application. However, its standard form cannot handle multiple outputs simultaneously. To solve this problem, this paper proposes a multi-output Bayesian support vector regression model (MBSVR). MBSVR introduces a semiparametric latent factor model (SLFM) into BSVR so that a covariance matrix simultaneously describes the output-output and output-input relationships, enabling knowledge transfer between outputs and improving model accuracy. MBSVR integrates and jointly optimizes the BSVR and SLFM parameters through Bayesian derivation, handling the multi-output problem while retaining the advantages of BSVR. The effectiveness of the method is verified on two function cases and four high-dimensional real-world multi-output datasets.

(This article belongs to the Section E1: Mathematics and Computer Science)

17 pages, 1714 KB

Open Access Article

Bayesian Estimation of the Semiparametric Spatial Lag Model

by Kunming Li and Liting Fang

Mathematics 2024, 12(14), 2289; https://doi.org/10.3390/math12142289 - 22 Jul 2024

Viewed by 1485

Abstract

This paper proposes a semiparametric spatial lag model and develops a Bayesian estimation method for it. To sample all the parameters, the estimation combines the reversible jump Markov chain Monte Carlo (RJMCMC) algorithm, a random-walk Metropolis sampler, and Gibbs sampling. Numerical simulations are used to validate the proposed Bayesian estimation method. The simulation results show satisfactory estimation of the parametric part and a good fit of the nonparametric function under different spatial weight matrix settings. Furthermore, the paper applies the model and its estimation method to an empirical study of the relationship between economic growth and carbon emissions in China, illustrating the practical value of the theoretical results.

(This article belongs to the Section D1: Probability and Statistics)

17 pages, 468 KB

Open Access Article

A Semiparametric Bayesian Approach to Heterogeneous Spatial Autoregressive Models

by Ting Liu, Dengke Xu and Shiqi Ke

Entropy 2024, 26(6), 498; https://doi.org/10.3390/e26060498 - 7 Jun 2024

Cited by 1 | Viewed by 1823

Abstract

Many semiparametric spatial autoregressive (SSAR) models have been used to analyze spatial data in a variety of applications, yet heteroscedasticity is common in spatial data. This paper therefore allows the variance parameters of SSAR models to depend on the explanatory variables. The resulting models are called heterogeneous semiparametric spatial autoregressive models. To estimate the model parameters, a Bayesian estimation method is proposed for heterogeneous SSAR models based on B-spline approximations of the nonparametric function. We then develop an efficient Markov chain Monte Carlo sampling algorithm that combines the Gibbs sampler and the Metropolis–Hastings algorithm to draw samples from the posterior distributions and perform posterior inference. Finally, simulation studies and an analysis of the Boston housing data demonstrate the excellent performance of the proposed Bayesian method.

(This article belongs to the Special Issue Developments and Applications of Markov Chain Monte Carlo in Bayesian Inference)

23 pages, 5242 KB

Open Access Article

Multivariate Bayesian Semiparametric Regression Model for Forecasting and Mapping HIV and TB Risks in West Java, Indonesia

by I. Gede Nyoman Mindra Jaya, Budhi Handoko, Yudhie Andriyana, Anna Chadidjah, Farah Kristiani and Mila Antikasari

Mathematics 2023, 11(17), 3641; https://doi.org/10.3390/math11173641 - 23 Aug 2023

Cited by 3 | Viewed by 2450

Abstract

Multivariate Bayesian regression via a shared component model has gained popularity in recent years, particularly for modeling and mapping the risks associated with multiple diseases. This method integrates joint outcomes, fixed effects of covariates, and random effects involving spatial and temporal components and their interactions. A shared spatial–temporal component accounts for correlations between the joint outcomes. Because of spatial–temporal variation, certain covariates may exhibit nonlinear effects, which calls for semiparametric regression models. Choropleth maps based on data aggregated by administrative region may not adequately depict infectious disease transmission. To counteract this, we combine the area-to-point geostatistical model with inverse distance weighted (IDW) interpolation for high-resolution mapping based on areal data. Additionally, an effective and efficient early warning system for controlling disease transmission requires forecasts of disease risk for future periods. Our study develops a novel multivariate Bayesian semiparametric regression model for forecasting and mapping HIV and TB risk in West Java, Indonesia, at fine-scale resolution. This approach combines multivariate Bayesian semiparametric regression with geostatistical interpolation, using population density and the Human Development Index (HDI) as risk factors. Annual data from 2017 to 2021 show that HIV and TB consistently exhibit recognizable spatial patterns, which supports the use of multivariate modeling. The multivariate Bayesian semiparametric model indicates that higher population density has a significant linear effect in raising HIV and TB risks, whereas the effect of the HDI varies over time and space. Isopleth maps of HIV and TB risk in 2022 show a clear transmission pattern in West Java, Indonesia.

(This article belongs to the Section E3: Mathematical Biology)

37 pages, 7148 KB

Open Access Article

Longitudinal Data Analysis Based on Bayesian Semiparametric Method

by Guimei Jiao, Jiajuan Liang, Fanjuan Wang, Xiaoli Chen, Shaokang Chen, Hao Li, Jing Jin, Jiali Cai and Fangjie Zhang

Axioms 2023, 12(5), 431; https://doi.org/10.3390/axioms12050431 - 27 Apr 2023

Cited by 4 | Viewed by 2595

Abstract

A Bayesian semiparametric model framework is proposed to analyze multivariate longitudinal data. The framework yields simple, explicit posterior distributions of the model parameters, which makes the MCMC algorithm for parameter estimation easy to implement and fast to converge. The framework and its MCMC algorithm are validated under four covariance structures and on a real-life dataset. A simple Monte Carlo study of the model under the four covariance structures and an analysis of the real dataset show that the new framework performs fairly well, together with its Bayesian posterior inference via MCMC. It is easy to implement, converges quickly, and has smaller root mean square errors than the same model without the specified autoregressive structure.

(This article belongs to the Special Issue Computational Statistics & Data Analysis)

21 pages, 738 KB

Open Access Article

A Semiparametric Bayesian Joint Modelling of Skewed Longitudinal and Competing Risks Failure Time Data: With Application to Chronic Kidney Disease

by Melkamu Molla Ferede, Samuel Mwalili, Getachew Dagne, Simon Karanja, Workagegnehu Hailu, Mahmoud El-Morshedy and Afrah Al-Bossly

Mathematics 2022, 10(24), 4816; https://doi.org/10.3390/math10244816 - 18 Dec 2022

Cited by 6 | Viewed by 2946

Abstract

In clinical and epidemiological studies, when time-to-event and longitudinal outcomes are associated, modelling them separately may give biased estimates. A joint modelling approach is required to obtain unbiased results and to evaluate their association. In the joint model, a subject may be exposed to more than one type of failure event (competing risks). Treating the competing event as independent censoring of the time-to-event process may underestimate the true survival probability and give biased results. Within the joint model, longitudinal outcomes may have nonlinear (irregular) trajectories over time and exhibit skewness with heavy tails. Fully parametric mixed-effects models may therefore not be flexible enough for this type of complex longitudinal data. In addition, assuming a Gaussian distribution for model errors may be too restrictive to represent within-individual variation adequately, and it may lack robustness against departures from distributional assumptions. To overcome these issues simultaneously, we present semiparametric joint models for competing risks failure time and skewed longitudinal data using a smoothing spline approach and a multivariate skew-t distribution. We also consider different parameterization approaches in formulating the joint models and use a Bayesian approach for statistical inference. We illustrate the proposed methods by analyzing real data on chronic kidney disease, and we evaluate their performance through simulation studies. The results of both the application and the simulation studies show that the proposed joint modelling approach performs well when the semiparametric, random-effects parameterization, and skew-t distribution specifications are taken into account.

(This article belongs to the Special Issue Current Developments in Theoretical and Applied Statistics)

15 pages, 1377 KB

Open Access Article

A Bayesian Sample Size Estimation Procedure Based on a B-Splines Semiparametric Elicitation Method

by Danila Azzolina, Paola Berchialla, Silvia Bressan, Liviana Da Dalt, Dario Gregori and Ileana Baldi

Int. J. Environ. Res. Public Health 2022, 19(21), 14245; https://doi.org/10.3390/ijerph192114245 - 31 Oct 2022

Cited by 4 | Viewed by 2505

Abstract

Sample size estimation is a fundamental element of a clinical trial, and a binomial experiment is the most common situation faced in clinical trial design. Bayesian sample size determination is an alternative to frequentist design, especially for studies with small sample sizes. The Bayesian approach translates the available knowledge into a prior distribution, rather than a point estimate, and uses it to perform the final inference. This procedure fully accounts for the uncertainty in data prediction. When objective data, historical information, and literature data are not available, it may be necessary to derive the prior distribution from expert opinion through an elicitation process. Expert elicitation is the process of translating expert opinion into a prior probability distribution. We investigated binomial sample size estimation using generalized versions of the average length criterion, the average coverage criterion, and the worst outcome criterion. The original method was proposed by Joseph and is defined in a parametric framework based on a Beta-Binomial model. In this setting, we propose a more flexible approach to sample size estimation for binary data that considers both parametric priors (Beta priors) and semiparametric priors based on B-splines.
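The following Python sketch illustrates the average length criterion for a Beta(a, b) prior. For each candidate n it averages the posterior credible interval length over the Beta-Binomial prior predictive distribution, then returns the smallest n for which that average falls below a target. It covers only the parametric Beta-prior case, not the B-spline semiparametric priors proposed in the article. For simplicity it uses equal-tailed intervals in place of highest posterior density intervals. The prior parameters and target length in the usage line are illustrative assumptions.

```python
from scipy.stats import beta, betabinom

def average_interval_length(n, a, b, coverage=0.95):
    """Expected posterior interval length under the Beta-Binomial prior predictive.

    Equal-tailed credible intervals are used as a simplification of HPD intervals.
    """
    tail = (1.0 - coverage) / 2.0
    total = 0.0
    for x in range(n + 1):
        # After x successes in n trials the posterior is Beta(a + x, b + n - x).
        lo = beta.ppf(tail, a + x, b + n - x)
        hi = beta.ppf(1.0 - tail, a + x, b + n - x)
        # Weight the interval length by the prior predictive probability of x.
        total += betabinom.pmf(x, n, a, b) * (hi - lo)
    return total

def alc_sample_size(a, b, max_length, coverage=0.95, n_max=5000):
    """Smallest n whose average interval length is at most max_length."""
    for n in range(1, n_max + 1):
        if average_interval_length(n, a, b, coverage) <= max_length:
            return n
    raise ValueError("no n <= n_max meets the criterion")

# Illustrative prior Beta(2, 8) (prior mean 0.2) and target average length 0.2.
n = alc_sample_size(a=2.0, b=8.0, max_length=0.2)
print(n, average_interval_length(n, 2.0, 8.0))
```

A linear scan is used for clarity. For large targets, a bisection over n would be faster, provided the average length decreases with n.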

12 pages, 531 KB

Open Access Article

A New Semiparametric Regression Framework for Analyzing Non-Linear Data

by Wesley Bertoli, Ricardo P. Oliveira and Jorge A. Achcar

Analytics 2022, 1(1), 15-26; https://doi.org/10.3390/analytics1010002 - 16 Jun 2022

Cited by 3 | Viewed by 2962

Abstract

This work introduces a straightforward framework for semiparametric non-linear models as an alternative to existing non-linear parametric models. The interpretation of those parametric models depends primarily on biological or physical considerations that are not available in every practical situation. The proposed methodology does not require intensive numerical methods to obtain estimates in non-linear contexts. This is attractive because the convergence of such algorithms depends strongly on good initial values. Moreover, the proposed structure can be compared with the standard polynomial approximations often used to explain non-linear data behavior. Approximate posterior inferences for the semiparametric model parameters were obtained from a fully Bayesian approach based on the Metropolis-within-Gibbs algorithm. The proposed structures were used to analyze artificial and real datasets. Our results indicate that the semiparametric models outperform linear polynomial regression approximations in predicting the behavior of response variables in non-linear settings.

14 pages, 1248 KB

Open Access Article

Forecasting Frequent Alcohol Use among Adolescents in HBSC Countries: A Bayesian Framework for Making Predictions

by Lorena Charrier, Michela Bersia, Alessio Vieno, Rosanna Irene Comoretto, Mindaugas Štelemėkas, Paola Nardone, Tibor Baška, Paola Dalmasso and Paola Berchialla

Int. J. Environ. Res. Public Health 2022, 19(5), 2737; https://doi.org/10.3390/ijerph19052737 - 26 Feb 2022

Cited by 2 | Viewed by 3484

Abstract

(1) Aim: To summarize alcohol trends over the last 30 years (1985/6–2017/8) among 15-year-olds in Health Behaviour in School-aged Children (HBSC) countries (overall sample size: 413,399 adolescents; 51.55% girls) and to forecast their potential evolution in the upcoming 2021/22 HBSC survey. (2) Methods: Using 1986–2018 prevalence data on weekly alcohol consumption among 15-year-olds from 40 HBSC countries/regions, a Bayesian semiparametric hierarchical model was adopted to estimate trends while clustering the countries and to provide estimates for the 2022 HBSC survey. (3) Results: An overall declining trend in alcohol consumption was observed over time in almost all the countries. However, some countries showed a renewed increase in 2018 compared with 2014, and the 2021/22 estimates forecast a slight increase in the majority of countries, pointing to a potential rebound in frequent drinking after a period of decline. (4) Conclusions: The clustering suggested a homogenization of drinking habits among HBSC countries. Comparing the observed and expected 2022 data could help investigate the effect of risk behaviour determinants occurring between the last two waves of the survey, including the impact of the pandemic.

(This article belongs to the Section Children's Health)

9 pages, 461 KB

Open Access Article

Adjustment for Baseline Covariates to Increase Efficiency in RCTs with Binary Endpoint: A Comparison of Bayesian and Frequentist Approaches

by Paola Berchialla, Veronica Sciannameo, Sara Urru, Corrado Lanera, Danila Azzolina, Dario Gregori and Ileana Baldi

Int. J. Environ. Res. Public Health 2021, 18(15), 7758; https://doi.org/10.3390/ijerph18157758 - 22 Jul 2021

Cited by 1 | Viewed by 3655

Abstract

Background: In a randomized controlled trial (RCT) with a binary outcome, adjustment for prognostic baseline covariates can bias the estimate of the marginal treatment effect. Methods that target the marginal odds ratio, allowing for improved precision and power, have been developed. Methods: A simulation study under different scenarios assessed and compared the performance of several treatment effect estimators. The frequentist estimators were the targeted maximum likelihood estimator, inverse-probability-of-treatment weighting, parametric G-computation, and the semiparametric locally efficient estimator. The Bayesian frameworks were model averaging, adjustment for confounding, and generalized Bayesian causal effect estimation. The use of these estimators is illustrated on an RCT in type II diabetes. Results: Model mis-specification does not increase the bias. The approaches that are not doubly robust have an increased standard error (SE) when the treatment model is mis-specified. The Bayesian estimators showed a higher type II error than the frequentist estimators when noisy covariates were included in the treatment model. Conclusions: Adjusting for prognostic baseline covariates in the analysis of RCTs can yield more power than intention-to-treat based tests. However, for some classes of models, mis-specification of the regression model may inflate the type I error and bias the treatment effect estimate.
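Parametric G-computation, one of the frequentist estimators compared above, can be sketched in a few lines of Python. First fit a logistic outcome model with treatment and baseline covariates. Then predict each subject's risk with treatment set to 1 and to 0, average the predictions, and form the marginal odds ratio. Because the odds ratio is non-collapsible, this marginal estimate generally differs from the conditional odds ratio given by the treatment coefficient and lies closer to 1. The simulated trial and its effect sizes are illustrative assumptions. The logistic model is fitted by plain Newton-Raphson rather than by any particular package.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)
        # Newton step: (X' W X)^{-1} X' (y - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

def gcomp_marginal_or(treat, covariates, y):
    """Marginal odds ratio by parametric G-computation; also returns the conditional OR."""
    n = len(y)
    ones = np.ones(n)
    beta = fit_logistic(np.column_stack([ones, treat, covariates]), y)
    # Counterfactual predictions: everyone treated vs. everyone untreated.
    p1 = expit(np.column_stack([ones, np.ones(n), covariates]) @ beta).mean()
    p0 = expit(np.column_stack([ones, np.zeros(n), covariates]) @ beta).mean()
    marginal_or = (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
    conditional_or = np.exp(beta[1])
    return marginal_or, conditional_or

# Illustrative simulated RCT: randomized treatment, one prognostic covariate.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
treat = rng.integers(0, 2, size=n).astype(float)
y = (rng.random(n) < expit(-0.5 + 1.0 * treat + 1.5 * z)).astype(float)
marginal_or, conditional_or = gcomp_marginal_or(treat, z, y)
print(marginal_or, conditional_or)
```

Under a logistic model with a positive treatment coefficient, averaging over the covariates always places the marginal odds ratio between 1 and the conditional one. Adjusting for covariates therefore changes the estimand unless the marginal effect is targeted explicitly.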

(This article belongs to the Special Issue Bayesian Design in Clinical Trials)

22 pages, 509 KB

Open Access Article

Bayesian Bandwidths in Semiparametric Modelling for Nonnegative Orthant Data with Diagnostics

by Célestin C. Kokonendji and Sobom M. Somé

Stats 2021, 4(1), 162-183; https://doi.org/10.3390/stats4010013 - 4 Mar 2021

Cited by 14 | Viewed by 3899

Abstract

Multivariate nonnegative orthant data are real vectors bounded to the left by the null vector, and they can be continuous, discrete or mixed. We first review recent relative variability indexes for multivariate nonnegative continuous and count distributions. As a prelude, two comparable distributions having the same mean vector are classified as under-, equi- or over-variable with respect to the reference distribution. Multivariate associated kernel estimators are then reviewed, together with new proposals that can accommodate any nonnegative orthant dataset. We focus on bandwidth matrix selection by adaptive and local Bayesian methods for semicontinuous and count supports, respectively. We finally introduce a flexible semiparametric approach for estimating all these distributions on nonnegative supports. The corresponding estimator combines a given parametric part with a nonparametric part, which is a weight function estimated through multivariate associated kernels. A diagnostic model is also discussed for choosing appropriately between the parametric, semiparametric and nonparametric approaches. If the purely nonparametric approach is retained, the parametric part used in the modeling is inadequate. Multivariate real-data examples in a semicontinuous setting, such as reliability, are considered to illustrate the proposed approach. Concluding remarks address extensions to other multiple functions.

(This article belongs to the Special Issue Directions in Statistical Modelling)

14 pages, 20591 KB

Open Access Article

Facing Missing Observations in Data—A New Approach for Estimating Strength of Earthquakes on the Pacific Coast of Southern Mexico Using Random Censoring

by Alejandro Ivan Aguirre-Salado, Humberto Vaquera-Huerta, Carlos Arturo Aguirre-Salado, José del Carmen Jiménez-Hernández, Franco Barragán and María Guzmán-Martínez

Appl. Sci. 2019, 9(14), 2863; https://doi.org/10.3390/app9142863 - 18 Jul 2019

Cited by 3 | Viewed by 2933

Abstract

We introduced a novel spatial model based on the generalized extreme value (GEV) distribution to analyze the maximum intensity levels of earthquakes on the Pacific coast of southern Mexico with incomplete (randomly censored) data, using a random censorship approach. Spatiotemporal trends were modeled through a non-stationary GEV model, using a multivariate smoothing function as a linear predictor of the GEV parameters to approximate nonlinear trends. The model was fitted with a flexible semiparametric Bayesian approach, and the parameters were estimated via Markov chain Monte Carlo (MCMC). A rigorous simulation study showed the robustness of both the model and the estimation method. Maps of the location parameter over the study area for different periods show local variations in the extreme values of seismicity. The results indicate strong evidence of an increase in the magnitude of earthquakes over time. A spatial risk map of the maximum earthquake intensity over a 25-year period was produced.
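The censored-data part of such a model rests on a standard likelihood. Uncensored maxima contribute the GEV log-density, and censored ones contribute the log-survival function. The Python sketch below fits a single stationary GEV by maximum likelihood under that likelihood. It assumes right censoring, where the recorded value is only a lower bound, and uses simulated data. It omits the spatial smoothing and the Bayesian MCMC fit described in the article. Note that SciPy's `genextreme` shape parameter `c` equals minus the usual GEV shape ξ.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def censored_gev_negloglik(params, y, censored):
    """Negative log-likelihood of a GEV with right-censored observations (assumed)."""
    mu, log_sigma, xi = params
    sigma = np.exp(log_sigma)  # log scale keeps sigma positive
    c = -xi                    # SciPy's shape c has the opposite sign of xi
    ll = genextreme.logpdf(y[~censored], c, loc=mu, scale=sigma).sum()
    ll += genextreme.logsf(y[censored], c, loc=mu, scale=sigma).sum()
    # Parameters that put data outside the GEV support give -inf; reject them.
    return -ll if np.isfinite(ll) else np.inf

# Simulated maxima (xi = 0.1), each observed only up to a random censoring level.
rng = np.random.default_rng(42)
true = genextreme.rvs(-0.1, loc=6.0, scale=0.5, size=500, random_state=rng)
limit = rng.uniform(5.5, 9.0, size=500)
censored = true > limit
y = np.minimum(true, limit)

start = [np.mean(y), np.log(np.std(y)), 0.1]
fit = minimize(censored_gev_negloglik, start, args=(y, censored),
               method="Nelder-Mead", options={"maxiter": 2000})
mu_hat, sigma_hat, xi_hat = fit.x[0], np.exp(fit.x[1]), fit.x[2]
print(mu_hat, sigma_hat, xi_hat, censored.mean())
```

In a Bayesian version, the same log-likelihood would be combined with priors and explored by MCMC instead of being maximized.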

(This article belongs to the Special Issue Mapping and Monitoring of Geohazards)

20 pages, 1084 KB

Open Access Article

Asymptotic Properties for Methods Combining the Minimum Hellinger Distance Estimate and the Bayesian Nonparametric Density Estimate

by Yuefeng Wu and Giles Hooker

Entropy 2018, 20(12), 955; https://doi.org/10.3390/e20120955 - 11 Dec 2018

Cited by 1 | Viewed by 3870

Abstract

In frequentist inference, minimizing the Hellinger distance between a kernel density estimate and a parametric family produces estimators that are both robust to outliers and statistically efficient when the parametric family contains the data-generating distribution. This paper seeks to extend these results to the use of nonparametric Bayesian density estimators within disparity methods. We propose two estimators. The first replaces the kernel density estimator with the expected posterior density under a random histogram prior. The second transforms the posterior over densities into a posterior over parameters by minimizing the Hellinger distance for each density. We show that the mathematical machinery of efficient influence functions from semiparametric models can be adapted to demonstrate that both estimators are efficient, in the sense of achieving the Cramér-Rao lower bound. We further establish a Bernstein–von Mises result for the second estimator, indicating that its posterior is asymptotically Gaussian. In addition, the robustness properties of classical minimum Hellinger distance estimators continue to hold.
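The classical frequentist estimator that the article starts from can be sketched in Python. First estimate the density with a Gaussian kernel density estimator. Then choose the normal location and scale that minimize the squared Hellinger distance 1 - ∫√(f_θ ĝ) dx between the model and that estimate. This is not either of the article's Bayesian estimators, which replace the kernel estimate with a random-histogram posterior. The normal family, the integration grid, and the contaminated sample are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gaussian_kde, norm

def mhd_normal(data, grid_size=4001):
    """Minimum Hellinger distance fit of N(mu, sigma^2) to a kernel density estimate."""
    kde = gaussian_kde(data)  # Scott's rule bandwidth by default
    lo = data.min() - 3 * data.std()
    hi = data.max() + 3 * data.std()
    grid = np.linspace(lo, hi, grid_size)
    dx = grid[1] - grid[0]
    sqrt_g = np.sqrt(kde(grid))

    def hellinger_sq(params):
        mu, log_sigma = params
        f = norm.pdf(grid, loc=mu, scale=np.exp(log_sigma))
        # H^2 = 1 - integral of sqrt(f * g), approximated by a Riemann sum.
        return 1.0 - np.sum(np.sqrt(f) * sqrt_g) * dx

    # Robust starting values: median and scaled median absolute deviation.
    med = np.median(data)
    mad = 1.4826 * np.median(np.abs(data - med))
    res = minimize(hellinger_sq, x0=[med, np.log(mad)], method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])

# 200 standard normal draws contaminated by 10 gross outliers at 10.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, size=200), np.full(10, 10.0)])
mu_hat, sigma_hat = mhd_normal(x)
print(mu_hat, sigma_hat, x.mean(), x.std())
```

The sample mean is pulled toward the outliers, whereas the Hellinger fit essentially ignores them. The fitted scale is somewhat larger than 1 because the kernel estimate is itself smoothed.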

(This article belongs to the Special Issue New Developments in Statistical Information Theory Based on Entropy and Divergence Measures)

Search Results (16)