MDPI - Publisher of Open Access Journals

20 pages, 386 KB

Open Access Article

A High Dimensional Omnibus Regression Test

by Ahlam M. Abid, Paul A. Quaye and David J. Olive

Stats 2025, 8(4), 107; https://doi.org/10.3390/stats8040107 - 5 Nov 2025

Cited by 2 | Viewed by 699

Consider regression models where the response variable $Y$ only depends on the $p \times 1$ vector of predictors $x = (x_{1}, \dots, x_{p})^{T}$ through the sufficient predictor $SP = \alpha + x^{T} \beta$. Let the covariance vector $\mathrm{Cov}(x, Y) = \Sigma_{xY}$. Assume the cases $(x_{i}^{T}, Y_{i})^{T}$ are independent and identically distributed random vectors for $i = 1, \dots, n$. Then for many such regression models, $\beta = 0$ if and only if $\Sigma_{xY} = 0$, where $0$ is the $p \times 1$ vector of zeroes. The test of $H_{0} : \Sigma_{xY} = 0$ versus $H_{1} : \Sigma_{xY} \neq 0$ is equivalent to the high dimensional one sample test $H_{0} : \mu = 0$ versus $H_{A} : \mu \neq 0$ applied to $w_{1}, \dots, w_{n}$, where $w_{i} = (x_{i} - \mu_{x})(Y_{i} - \mu_{Y})$ and the expected values $E(x) = \mu_{x}$ and $E(Y) = \mu_{Y}$. Since $\mu_{x}$ and $\mu_{Y}$ are unknown, the test of $H_{0} : \beta = 0$ versus $H_{1} : \beta \neq 0$ is implemented by applying the one sample test to $v_{i} = (x_{i} - \bar{x})(Y_{i} - \bar{Y})$ for $i = 1, \dots, n$. This test has milder regularity conditions than its few competitors. For the multiple linear regression one component partial least squares and marginal maximum likelihood estimators, the test can be adapted to test $H_{0} : (\beta_{i_{1}}, \dots, \beta_{i_{k}})^{T} = 0$ versus $H_{1} : (\beta_{i_{1}}, \dots, \beta_{i_{k}})^{T} \neq 0$ where $1 \leq k \leq p$.


(This article belongs to the Section Regression Models)
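
Illustrative note on the omnibus test abstract above: the construction of the vectors v_i = (x_i - x̄)(Y_i - Ȳ) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions. The function names and the toy data are hypothetical, and the abstract does not say which high dimensional one sample test statistic is applied to the v_i, so the sketch stops at building them. It also checks the standard identity that the sample mean of the v_i equals (n - 1)/n times the usual sample covariance vector of x and Y.

```python
from statistics import fmean


def centered_products(xs, ys):
    """Return v_i = (x_i - xbar) * (Y_i - Ybar) for each case i.

    xs: list of n predictor vectors, each a list of p floats.
    ys: list of n responses.
    (Hypothetical helper, not taken from the article.)
    """
    p = len(xs[0])
    xbar = [fmean(row[j] for row in xs) for j in range(p)]
    ybar = fmean(ys)
    return [[(row[j] - xbar[j]) * (y - ybar) for j in range(p)]
            for row, y in zip(xs, ys)]


def column_means(vs):
    """Componentwise mean of a list of equal-length vectors."""
    p = len(vs[0])
    return [fmean(v[j] for v in vs) for j in range(p)]


# Toy data (made up for illustration): n = 4 cases, p = 2 predictors.
xs = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 5.0]]
ys = [1.0, 3.0, 2.0, 6.0]

vs = centered_products(xs, ys)
vbar = column_means(vs)

# Usual sample covariance vector (denominator n - 1) for comparison.
n = len(ys)
sample_cov = [sum(v[j] for v in vs) / (n - 1) for j in range(2)]
```

A one sample test of H_0 : μ = 0 would then take `vs` as its input sample; which statistic to use is left open here because the abstract does not specify it.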

20 pages, 778 KB

Open Access Article

Determinants of Blank and Null Votes in the Brazilian Presidential Elections

by Renata Rojas Guerra, Kerolene De Souza Moraes, Fernando De Jesus Moreira Junior, Fernando A. Peña-Ramírez and Ryan Novaes Pereira

Stats 2025, 8(2), 38; https://doi.org/10.3390/stats8020038 - 13 May 2025

Viewed by 2630

Abstract

This study analyzes the factors influencing the proportions of blank and null votes in Brazilian municipalities during the 2018 presidential elections. The behavior of the variable of interest is examined using unit regression models within the Generalized Additive Models for Location, Scale, and Shape (GAMLSS) framework. Specifically, five different unit regression models are explored (beta, simplex, Kumaraswamy, unit Weibull, and reflected unit Burr XII regressions), each incorporating submodels for both indexed distribution parameters. The beta regression model emerges as the best fit through rigorous model selection and diagnostic procedures. The findings reveal that the disaggregated municipal human development index (MHDI), particularly its income, longevity, and education dimensions, along with the municipality’s geographic region, significantly affect voting behavior. Notably, higher income and longevity values are linked to greater proportions of blank and null votes, whereas the educational level exhibits a negative relationship with the variable of interest. Additionally, municipalities in the Southeast region tend to have higher average proportions of blank and null votes. In terms of variability, the ability of a municipality’s population to acquire goods and services is shown to negatively influence the dispersion of vote proportions, while municipalities in the Northeast, North, and Southeast regions exhibit distinct patterns of variation compared to other regions. These results provide valuable insights into electoral participation’s socioeconomic and regional determinants, contributing to broader discussions on political engagement and democratic representation in Brazil.

(This article belongs to the Section Regression Models)

30 pages, 6909 KB

Open Access Article

The Use of Modern Robust Regression Analysis with Graphics: An Example from Marketing

by Marco Riani, Anthony C. Atkinson, Gianluca Morelli and Aldo Corbellini

Stats 2025, 8(1), 6; https://doi.org/10.3390/stats8010006 - 8 Jan 2025

Cited by 1 | Viewed by 2706

Abstract

Routine least squares regression analyses may sometimes miss important aspects of data. To exemplify this point we analyse a set of 1171 observations from a questionnaire intended to illuminate the relationship between customer loyalty and perceptions of such factors as price and community outreach. Our analysis makes much use of graphics and data monitoring to provide a paradigmatic example of the use of modern robust statistical tools based on graphical interaction with data. We start with regression. We perform such an analysis and find significant regression on all factors. However, a variety of plots show that there are some unexplained features, which are not eliminated by response transformation. Accordingly, we turn to robust analyses, intended to give answers unaffected by the presence of data contamination. A robust analysis using a non-parametric model leads to the increased significance of transformations of the explanatory variables. These transformations provide improved insight into consumer behaviour. We provide suggestions for a structured approach to modern robust regression and give links to the software used for our data analyses.

(This article belongs to the Section Regression Models)

15 pages, 1311 KB

Open Access Article

Cross-Country Assessment of Socio-Ecological Drivers of COVID-19 Dynamics in Africa: A Spatial Modelling Approach

by Kolawole Valère Salako, Akoeugnigan Idelphonse Sode, Aliou Dicko, Eustache Ayédèguè Alaye, Martin Wolkewitz and Romain Glèlè Kakaï

Stats 2024, 7(4), 1084-1098; https://doi.org/10.3390/stats7040064 - 11 Oct 2024

Viewed by 1918

Abstract

Understanding how countries’ socio-economic, environmental, health status, and climate factors have influenced the dynamics of COVID-19 is essential for public health, particularly in Africa. This study explored the relationships between African countries’ COVID-19 cases and deaths and their socio-economic, environmental, health, clinical, and climate variables. It compared the performance of Ordinary Least Squares (OLS) regression, the spatial lag model (SLM), the spatial error model (SEM), and the conditional autoregressive model (CAR) using statistics such as the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Root Mean Square Error (RMSE), and coefficient of determination ($R^{2}$). Results showed that the SEM with the 10-nearest neighbours matrix weights performed better for the number of cases, while the SEM with the maximum distance matrix weights performed better for the number of deaths. For the cases, the number of tests followed by the adjusted savings, Gross Domestic Product (GDP) per capita, dependence ratio, and annual temperature were the strongest covariates. For deaths, the number of tests followed by malaria prevalence, prevalence of communicable diseases, adjusted savings, GDP, dependence ratio, Human Immunodeficiency Virus (HIV) prevalence, and moisture index of the moistest quarter play a critical role in explaining disparities across countries. This study illustrates the importance of accounting for spatial autocorrelation in modelling the dynamics of the disease while highlighting the role of countries’ specific factors in driving its dynamics.

(This article belongs to the Section Regression Models)

19 pages, 2991 KB

Open Access Case Report

Integrating Proteomic Analysis and Machine Learning to Predict Prostate Cancer Aggressiveness

by Sheila M. Valle Cortés, Jaileene Pérez Morales, Mariely Nieves Plaza, Darielys Maldonado, Swizel M. Tevenal Baez, Marc A. Negrón Blas, Cayetana Lazcano Etchebarne, José Feliciano, Gilberto Ruiz Deyá, Juan C. Santa Rosario and Pedro Santiago Cardona

Stats 2024, 7(3), 875-893; https://doi.org/10.3390/stats7030053 - 21 Aug 2024

Viewed by 1725

Abstract

Prostate cancer (PCa) poses a significant challenge because of the difficulty in identifying aggressive tumors, leading to overtreatment and missed personalized therapies. Although only 8% of cases progress beyond the prostate, the accurate prediction of aggressiveness remains crucial. Thus, this study focused on retinoblastoma phosphorylated at Serine 249 (Phospho-Rb S249), N-cadherin, β-catenin, and E-cadherin as biomarkers for identifying aggressive PCa using a logistic regression model and a classification and regression tree (CART). Using immunohistochemistry (IHC), we targeted the expression of these biomarkers in PCa tissues and correlated their expression with clinicopathological data of the tumor. The results showed a negative correlation between E-cadherin and β-catenin with aggressive tumor behavior, whereas Phospho-Rb S249 and N-cadherin positively correlated with increased tumor aggressiveness. Furthermore, patients were stratified based on Gleason scores and E-cadherin staining patterns to evaluate their capability for early identification of aggressive PCa. Our findings suggest that the classification tree is the most effective method for measuring the utility of these biomarkers in clinical practice, incorporating β-catenin, tumor grade, and Gleason grade as relevant determinants for identifying patients with Gleason scores ≥ 4 + 3. This study could potentially benefit patients with aggressive PCa by enabling early disease detection and closer monitoring.

(This article belongs to the Section Regression Models)

17 pages, 458 KB

Open Access Article

A New Extended Weibull Distribution with Application to Influenza and Hepatitis Data

by Gauss M. Cordeiro, Elisângela C. Biazatti and Luís H. de Santana

Stats 2023, 6(2), 657-673; https://doi.org/10.3390/stats6020042 - 19 May 2023

Cited by 6 | Viewed by 3192

Abstract

The Weibull is a popular distribution that models monotonic failure rate data. In this work, we introduce the four-parameter Weibull extended Weibull distribution that presents greater flexibility, thus modeling data with bathtub-shaped and unimodal failure rates. Some of its mathematical properties such as quantile function, linear representation and moments are provided. The maximum likelihood estimation is adopted to estimate its parameters, and the log-Weibull extended Weibull regression model is presented. In addition, some simulations are carried out to show the consistency of the estimators. We prove the greater flexibility and performance of this distribution and the regression model through applications to influenza and hepatitis data. The new models perform much better than some of their competitors.

(This article belongs to the Section Regression Models)
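
Illustrative note on the Weibull abstract above: the standard two-parameter Weibull hazard (failure rate) function is h(t) = (k/λ)(t/λ)^(k-1), which is increasing for shape k > 1, decreasing for k < 1, and constant for k = 1, so it cannot take a bathtub or unimodal shape. The short Python sketch below evaluates this textbook formula only. It does not implement the four-parameter Weibull extended Weibull distribution from the article, and the function name and grid of time points are assumptions made for illustration.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Hazard of the standard two-parameter Weibull distribution.

    h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0.
    """
    return (shape / scale) * (t / scale) ** (shape - 1.0)


# Hypothetical grid of time points.
ts = [0.5, 1.0, 1.5, 2.0]

increasing = [weibull_hazard(t, shape=2.0) for t in ts]  # shape > 1
decreasing = [weibull_hazard(t, shape=0.5) for t in ts]  # shape < 1
constant = [weibull_hazard(t, shape=1.0) for t in ts]    # shape = 1 (exponential)
```

In each case the hazard moves in a single direction over time, which is the limitation that more flexible families aim to remove.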

18 pages, 571 KB

Open Access Article

A Phylogenetic Regression Model for Studying Trait Evolution on Network

by Dwueng-Chwuan Jhwueng

Stats 2023, 6(1), 450-467; https://doi.org/10.3390/stats6010028 - 18 Mar 2023

Viewed by 2818

Abstract

A phylogenetic regression model that incorporates the network structure, allowing for reticulation events, is proposed to study trait evolution. The parameter estimation is achieved through the maximum likelihood approach, where an algorithm is developed by taking a phylogenetic network in eNewick format as the input to build up the variance–covariance matrix. The model is applied to study the common sunflower, Helianthus annuus, by investigating its traits used to respond to drought conditions. Results show that our model provides acceptable estimates of the parameters, where most of the traits analyzed were found to have a significant correlation with drought tolerance.

(This article belongs to the Section Regression Models)

15 pages, 1014 KB

Open Access Article

A Weibull-Beta Prime Distribution to Model COVID-19 Data with the Presence of Covariates and Censored Data

by Elisângela C. Biazatti, Gauss M. Cordeiro, Gabriela M. Rodrigues, Edwin M. M. Ortega and Luís H. de Santana

Stats 2022, 5(4), 1159-1173; https://doi.org/10.3390/stats5040069 - 17 Nov 2022

Cited by 6 | Viewed by 2754

Abstract

Motivated by the recent popularization of the beta prime distribution, a more flexible generalization is presented to fit symmetrical or asymmetrical and bimodal data, and a non-monotonic failure rate. Thus, the Weibull-beta prime distribution is defined, and some of its structural properties are obtained. The parameters are estimated by maximum likelihood, and a new regression model is proposed. Some simulations reveal that the estimators are consistent, and applications to censored COVID-19 data show the adequacy of the models.

(This article belongs to the Section Regression Models)

43 pages, 844 KB

Open Access Article

The Stacy-G Class: A New Family of Distributions with Regression Modeling and Applications to Survival Real Data

by Lucas D. Ribeiro Reis, Gauss M. Cordeiro and Maria do Carmo S. Lima

Stats 2022, 5(1), 215-257; https://doi.org/10.3390/stats5010015 - 4 Mar 2022

Cited by 3 | Viewed by 3285

Abstract

We study the Stacy-G family, which extends the gamma-G class and provides four of the most well-known forms of the hazard rate function: increasing, decreasing, bathtub, and inverted bathtub. We provide some of its structural properties. We estimate the parameters by maximum likelihood, and perform a simulation study to verify the asymptotic properties of the estimators for the Burr-XII baseline. We construct the log-Stacy-Burr XII regression for censored data. The usefulness of the new models is shown through applications to uncensored and censored real data.

(This article belongs to the Section Regression Models)

22 pages, 422 KB

Open Access Article

Inverted Weibull Regression Models and Their Applications

by Sarah R. Al-Dawsari and Khalaf S. Sultan

Stats 2021, 4(2), 269-290; https://doi.org/10.3390/stats4020019 - 1 Apr 2021

Cited by 2 | Viewed by 3161

Abstract

In this paper, we propose classical and Bayesian regression models for use in conjunction with the inverted Weibull (IW) distribution; these are the inverted Weibull regression model (IW-Reg) and the inverted Weibull Bayesian regression model (IW-BReg). In the proposed models, we suggest the logarithm and identity link functions, while in the Bayesian approach, we use a gamma prior and two loss functions, namely zero-one and modified general entropy (MGE) loss functions. To deal with the outliers in the proposed models, we apply Huber and Tukey’s bisquare (biweight) functions. In addition, we use the iteratively reweighted least squares (IRLS) algorithm to estimate Bayesian regression coefficients. Further, we compare IW-Reg and IW-BReg using some performance criteria, such as Akaike’s information criterion (AIC), deviance (D), and mean squared error (MSE). Finally, we apply the theoretical findings to some real datasets collected from Saudi Arabia with the corresponding explanatory variables. The Bayesian approach shows better performance compared to the classical approach in terms of the considered performance criteria.

(This article belongs to the Section Regression Models)
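
Illustrative note on the inverted Weibull abstract above: the Huber and Tukey bisquare functions it mentions are commonly used inside IRLS as weight functions applied to scaled residuals. The Python sketch below shows the textbook forms of these two weights. The function names are hypothetical, the tuning constants 1.345 and 4.685 are conventional defaults rather than values taken from the article, and the residual `r` is assumed to be already divided by a robust scale estimate.

```python
def huber_weight(r, c=1.345):
    """Huber weight: 1 inside [-c, c], then decaying as c / |r|."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def tukey_bisquare_weight(r, c=4.685):
    """Tukey bisquare (biweight) weight: (1 - (r/c)^2)^2 for |r| < c, else 0."""
    if abs(r) >= c:
        return 0.0
    u = r / c
    return (1.0 - u * u) ** 2


# Hypothetical scaled residuals: small, moderate, and gross outlier.
residuals = [0.2, 2.0, 10.0]
huber = [huber_weight(r) for r in residuals]
bisquare = [tukey_bisquare_weight(r) for r in residuals]
```

The design difference is visible on the gross outlier: the Huber weight shrinks it but keeps it in the fit, while the bisquare weight removes it entirely.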

14 pages, 597 KB

Open Access Article

Predictor Analysis in Group Decision Making

by Stan Lipovetsky

Stats 2021, 4(1), 108-121; https://doi.org/10.3390/stats4010009 - 9 Feb 2021

Cited by 5 | Viewed by 2583

Abstract

Priority vectors in the Analytic Hierarchy Process (AHP) are commonly estimated as constant values calculated by the pairwise comparison ratios elicited from an expert. For multiple experts, or panel data, or other data with varied characteristics of measurements, the priority vectors can be built as functions of the auxiliary predictors. For example, in multi-person decision making, the priorities can be obtained in regression modeling by the demographic and socio-economic properties. Then the priorities can be predicted for individual respondents, profiled by each predictor, forecasted in time, studied by the predictor importance, and estimated by the characteristics of significance, fit and quality well-known in regression modeling. Numerical results show that the suggested approaches reveal useful features of priority behavior, which can noticeably extend the AHP abilities and applications for numerous multiple-criteria decision making problems. The considered methods are useful for segmentation of the respondents and finding optimum managerial solutions specific for each segment. They can help decision makers to focus on the respondents’ individual features and to increase customer satisfaction, retention and loyalty to the promoted brands or products.

(This article belongs to the Section Regression Models)

17 pages, 967 KB

Open Access Article

An Upper Bound of the Bias of Nadaraya-Watson Kernel Regression under Lipschitz Assumptions

by Samuele Tosatto, Riad Akrour and Jan Peters

Stats 2021, 4(1), 1-17; https://doi.org/10.3390/stats4010001 - 30 Dec 2020

Cited by 3 | Viewed by 6639

Abstract

The Nadaraya-Watson kernel estimator is among the most popular nonparametric regression techniques thanks to its simplicity. Its asymptotic bias was studied by Rosenblatt in 1969 and has been reported in the related literature. However, given its asymptotic nature, it gives no access to a hard bound. The increasing popularity of predictive tools for automated decision-making heightens the need for hard (non-probabilistic) guarantees. To alleviate this issue, we propose an upper bound of the bias which holds for finite bandwidths using Lipschitz assumptions and mitigating some of the prerequisites of Rosenblatt’s analysis. Our bound has potential applications in fields like surgical robots or self-driving cars, where some hard guarantees on the prediction-error are needed.

(This article belongs to the Section Regression Models)
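
Illustrative note on the Nadaraya-Watson abstract above: the estimator itself is a kernel-weighted average of the observed responses, m̂(x) = Σ K((x - x_i)/h) y_i / Σ K((x - x_i)/h). The Python sketch below shows this textbook form for one-dimensional inputs with a Gaussian kernel. The function names and toy data are hypothetical, the kernel choice is an assumption, and the sketch does not implement the bias bound proposed in the article.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys, with weights centered at x0.

    Assumes bandwidth > 0 and that x0 is close enough to the data
    for the weights not to underflow to zero.
    """
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy data (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 1.0, 3.0]

estimate = nadaraya_watson(1.5, xs, ys, bandwidth=0.5)
```

Because the prediction is a convex combination of the observed responses, it always lies between the smallest and largest value in `ys`; the bandwidth controls how local that average is.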

Search Results (12)
