Stats, Volume 7, Issue 1 (March 2024) – 20 articles

Cover Story: Social and behavioral scientists use structural equation models (SEMs) as mathematical representations of complex theories involving multivariate data. Sophisticated sampling designs can yield dependencies that complicate standard SEMs, requiring more advanced estimation methods. Round-robin designs have a social-network structure, in which every member of a group is potentially linked to every other member of the same group. This yields dyadic data, in which the same variable is measured for each member of a pair in response to (or about) the other member of the pair. Every person is a member of several pairs within the network. This paper demonstrates a method to account for this complexity in the first step to estimate SEM parameters.
17 pages, 2272 KiB  
Article
A Note on Simultaneous Confidence Intervals for Direct, Indirect and Synthetic Estimators
by Christophe Quentin Valvason and Stefan Sperlich
Stats 2024, 7(1), 333-349; https://doi.org/10.3390/stats7010020 - 20 Mar 2024
Cited by 1 | Viewed by 1217
Abstract
Direct, indirect and synthetic estimators have a long history in official statistics. While model-based or model-assisted approaches have become very popular, direct and indirect estimators remain the predominant standard and are therefore important tools in practice. This is mainly due to their simplicity, including low data requirements, assumptions and straightforward inference. With the increasing use of domain estimates in policy, the demands on these tools have also increased. Today, they are frequently used for comparative statistics. This requires appropriate tools for simultaneous inference. We study devices for constructing simultaneous confidence intervals and show that simple tools like the Bonferroni correction can easily fail. In contrast, uniform inference based on max-type statistics in combination with bootstrap methods, appropriate for finite populations, works reasonably well. We illustrate our methods with frequently applied estimators of totals and means. Full article
(This article belongs to the Section Statistical Methods)
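As a rough illustration of the max-type idea, the following sketch (not the authors' code) builds simultaneous 95% intervals for several domain means from a bootstrap of the maximum studentized deviation and compares them with Bonferroni intervals. The data are synthetic i.i.d. draws; the paper's finite-population bootstrap and its direct/indirect/synthetic estimators are not reproduced.

```python
# A minimal sketch (not the authors' code): simultaneous 95% confidence intervals for
# D domain means via a bootstrapped max-type critical value, compared with Bonferroni.
# Synthetic i.i.d. data; the paper's finite-population bootstrap is not reproduced.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
D, n = 6, 80
domains = [rng.normal(mu, 1.0, size=n) for mu in np.linspace(0.0, 2.0, D)]
means = np.array([x.mean() for x in domains])
ses = np.array([x.std(ddof=1) / np.sqrt(n) for x in domains])

B = 2000
max_stat = np.empty(B)
for b in range(B):
    t = []
    for x in domains:
        xb = rng.choice(x, size=n, replace=True)
        t.append(abs(xb.mean() - x.mean()) / (xb.std(ddof=1) / np.sqrt(n)))
    max_stat[b] = max(t)

c_max = np.quantile(max_stat, 0.95)       # max-type simultaneous critical value
c_bonf = norm.ppf(1 - 0.05 / (2 * D))     # Bonferroni critical value
print("max-type SCIs:\n", np.c_[means - c_max * ses, means + c_max * ses].round(2))
print("Bonferroni SCIs:\n", np.c_[means - c_bonf * ses, means + c_bonf * ses].round(2))
```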
16 pages, 476 KiB  
Article
The Flexible Gumbel Distribution: A New Model for Inference about the Mode
by Qingyang Liu, Xianzheng Huang and Haiming Zhou
Stats 2024, 7(1), 317-332; https://doi.org/10.3390/stats7010019 - 13 Mar 2024
Cited by 2 | Viewed by 1617
Abstract
A new unimodal distribution family indexed via the mode and three other parameters is derived from a mixture of a Gumbel distribution for the maximum and a Gumbel distribution for the minimum. Properties of the proposed distribution are explored, including model identifiability and flexibility in capturing heavy-tailed data that exhibit different directions of skewness over a wide range. Both frequentist and Bayesian methods are developed to infer parameters in the new distribution. Simulation studies are conducted to demonstrate satisfactory performance of both methods. By fitting the proposed model to simulated data and data from an application in hydrology, it is shown that the proposed flexible distribution is especially suitable for data containing extreme values in either direction, with the mode being a location parameter of interest. Using the proposed unimodal distribution, one can easily formulate a regression model concerning the mode of a response given covariates. We apply this model to data from an application in criminology to reveal interesting data features that are obscured by outliers. Full article
(This article belongs to the Special Issue Bayes and Empirical Bayes Inference)
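The family arises from mixing a Gumbel distribution for the maximum with one for the minimum. The sketch below evaluates and samples a generic two-component mixture of this kind with scipy's gumbel_r and gumbel_l; the weight and location/scale parameters are generic placeholders, not the paper's mode-indexed parameterization.

```python
# A hedged sketch: two-component mixture of a max-Gumbel (gumbel_r) and a min-Gumbel
# (gumbel_l), in the spirit of the flexible Gumbel family.  The parameters (w, loc,
# scale) are generic placeholders, not the paper's mode-indexed parameterization.
import numpy as np
from scipy.stats import gumbel_r, gumbel_l

def fg_pdf(x, w, loc1, scale1, loc2, scale2):
    """Mixture density: w * Gumbel-max + (1 - w) * Gumbel-min."""
    return w * gumbel_r.pdf(x, loc1, scale1) + (1 - w) * gumbel_l.pdf(x, loc2, scale2)

def fg_rvs(size, w, loc1, scale1, loc2, scale2, rng=None):
    """Sample from the mixture by randomizing the component label."""
    rng = rng or np.random.default_rng()
    z = rng.random(size) < w
    return np.where(z,
                    gumbel_r.rvs(loc1, scale1, size=size, random_state=rng),
                    gumbel_l.rvs(loc2, scale2, size=size, random_state=rng))

x = np.linspace(-10.0, 10.0, 5)
print(fg_pdf(x, w=0.6, loc1=0.0, scale1=1.0, loc2=0.0, scale2=2.0).round(4))
print(fg_rvs(5, 0.6, 0.0, 1.0, 0.0, 2.0).round(2))
```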
16 pages, 788 KiB  
Article
Wilcoxon-Type Control Charts Based on Multiple Scans
by Ioannis S. Triantafyllou
Stats 2024, 7(1), 301-316; https://doi.org/10.3390/stats7010018 - 7 Mar 2024
Cited by 2 | Viewed by 1259
Abstract
In this article, we establish new distribution-free Shewhart-type control charts based on rank sum statistics with signaling multiple scans-type rules. More precisely, two Wilcoxon-type chart statistics are considered in order to formulate the decision rule of the proposed monitoring scheme. In order to enhance the performance of the new nonparametric control charts, multiple scans-type rules are activated, which make the proposed chart more sensitive in detecting possible shifts of the underlying distribution. The appraisal of the proposed monitoring scheme is accomplished with the aid of the corresponding run length distribution under both in- and out-of-control cases. From this, exact formulae for the variance of the run length distribution and the average run length (ARL) of the proposed monitoring schemes are derived. A numerical investigation is carried out and shows that the proposed schemes outperform their competitors. Full article
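To make the rank-sum idea concrete, the following sketch simulates the run length of a simple Shewhart-type chart whose plotting statistic is the Wilcoxon rank sum of each test sample against a fixed reference sample. The control limits are illustrative, and the paper's multiple scans-type signaling rules and exact ARL formulae are not reproduced.

```python
# A hedged sketch: Monte Carlo run-length behaviour of a simple Shewhart-type chart
# whose plotting statistic is the Wilcoxon rank sum of each test sample against a
# fixed reference sample.  Control limits are illustrative; the paper's multiple
# scans-type rules and exact ARL formulae are not reproduced.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(7)
m, n = 100, 5              # reference and test sample sizes
L, U = 85, 445             # illustrative limits (about +/- 2.7 SD of the rank sum)

def rank_sum(reference, test):
    """Wilcoxon rank sum of the test sample within the pooled sample."""
    ranks = rankdata(np.concatenate([reference, test]))
    return ranks[m:].sum()

def run_length(reference, shift=0.0):
    t = 0
    while True:
        t += 1
        w = rank_sum(reference, rng.normal(shift, 1.0, size=n))
        if w < L or w > U:
            return t

reference = rng.normal(0.0, 1.0, size=m)
arl0 = np.mean([run_length(reference) for _ in range(200)])
arl1 = np.mean([run_length(reference, shift=1.0) for _ in range(200)])
print(f"estimated in-control ARL ~ {arl0:.0f}, out-of-control ARL ~ {arl1:.1f}")
```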
17 pages, 844 KiB  
Article
Cumulative Histograms under Uncertainty: An Application to Dose–Volume Histograms in Radiotherapy Treatment Planning
by Flavia Gesualdi and Niklas Wahl
Stats 2024, 7(1), 284-300; https://doi.org/10.3390/stats7010017 - 6 Mar 2024
Viewed by 1454
Abstract
In radiotherapy treatment planning, the absorbed doses are subject to executional and preparational errors, which propagate to plan quality metrics. Accurately quantifying these uncertainties is imperative for improved treatment outcomes. One approach, analytical probabilistic modeling (APM), presents a highly computationally efficient method. This study evaluates the empirical distribution of dose–volume histogram points (a typical plan metric) derived from Monte Carlo sampling to quantify the accuracy of modeling uncertainties under different distribution assumptions, including Gaussian, log-normal, four-parameter beta, gamma, and Gumbel distributions. Since APM necessitates the bivariate cumulative distribution functions, this investigation also delves into approximations using a Gaussian or an Ali–Mikhail–Haq Copula. The evaluations are performed in a one-dimensional simulated geometry and on patient data for a lung case. Our findings suggest that employing a beta distribution offers improved modeling accuracy compared to a normal distribution. Moreover, the multivariate Gaussian model outperforms the Copula models in patient data. This investigation highlights the significance of appropriate statistical distribution selection in advancing the accuracy of uncertainty modeling in radiotherapy treatment planning, extending an understanding of the analytical probabilistic modeling capacities in this crucial medical domain. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
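The distribution-comparison step can be mimicked as follows: fit each candidate family to Monte Carlo samples of a single DVH point and score the fits by log-likelihood and AIC. The volume fractions below are synthetic, and neither APM nor the copula approximations are implemented.

```python
# A hedged sketch of the distribution-comparison step only: fit candidate families to
# Monte Carlo samples of a single DVH point (synthetic volume fractions) and score the
# fits by log-likelihood/AIC.  APM and the copula approximations are not implemented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
v = np.clip(rng.beta(8, 3, size=2000) + rng.normal(0, 0.01, size=2000), 1e-4, 1 - 1e-4)

candidates = {
    "normal":    stats.norm,
    "lognormal": stats.lognorm,
    "beta":      stats.beta,      # scipy fits loc/scale too, i.e. a 4-parameter beta
    "gamma":     stats.gamma,
    "gumbel":    stats.gumbel_r,
}

for name, dist in candidates.items():
    params = dist.fit(v)                      # maximum-likelihood fit
    ll = dist.logpdf(v, *params).sum()        # in-sample log-likelihood
    aic = 2 * len(params) - 2 * ll
    print(f"{name:10s}  log-lik = {ll:9.1f}   AIC = {aic:9.1f}")
```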
15 pages, 3284 KiB  
Article
Comments on the Bernoulli Distribution and Hilbe’s Implicit Extra-Dispersion
by Daniel A. Griffith
Stats 2024, 7(1), 269-283; https://doi.org/10.3390/stats7010016 - 5 Mar 2024
Cited by 2 | Viewed by 1717
Abstract
For decades, conventional wisdom maintained that binary 0–1 Bernoulli random variables cannot contain extra-binomial variation. Taking an unorthodox stance, Hilbe actively disagreed, especially for correlated observation instances, arguing that the universally adopted diagnostic Pearson or deviance dispersion statistics are insensitive to a variance anomaly in a binary context, and hence simply fail to detect it. However, having the intuition and insight to sense the existence of this departure from standard mathematical statistical theory, but being unable to effectively isolate it, he classified this particular over-/under-dispersion phenomenon as implicit. This paper explicitly exposes his hidden quantity by demonstrating that the variance in/deflation it represents occurs in an underlying predicted beta random variable whose real number values are rounded to their nearest integers to convert to a Bernoulli random variable, with this discretization masking any materialized extra-Bernoulli variation. In doing so, asymptotics linking the beta-binomial and Bernoulli distributions show another conventional wisdom misconception, namely a mislabeling substitution involving the quasi-Bernoulli random variable; this undeniably is not a quasi-likelihood situation. A public bell pepper disease dataset exhibiting conspicuous spatial autocorrelation furnishes empirical examples illustrating various features of this advocated proposition. Full article
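A small simulation in the spirit of the rounding argument (one possible reading, not the paper's spatially autocorrelated bell-pepper analysis): a latent beta probability is rounded to 0/1, and the Pearson dispersion statistic computed on the resulting binary data stays near 1 no matter how much the latent variance changes, which is why the usual diagnostic cannot detect the hidden dispersion.

```python
# A hedged simulation in the spirit of the rounding argument (one possible reading,
# not the paper's spatially autocorrelated bell-pepper analysis): rounding a latent
# beta probability to 0/1 yields binary data whose Pearson dispersion stays near 1,
# however much the latent beta variance changes -- the diagnostic cannot see it.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
for a, b in [(2.0, 2.0), (0.3, 0.3)]:          # mild vs strong latent dispersion
    p_latent = rng.beta(a, b, size=n)          # latent predicted probabilities
    y = np.rint(p_latent)                      # rounding converts beta -> Bernoulli
    pi_hat = y.mean()
    pearson = np.sum((y - pi_hat) ** 2 / (pi_hat * (1 - pi_hat))) / (n - 1)
    print(f"Beta({a},{b}): latent var = {p_latent.var():.3f}, "
          f"Bernoulli-implied var = {pi_hat * (1 - pi_hat):.3f}, "
          f"Pearson dispersion of the 0/1 data = {pearson:.3f}")
```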
34 pages, 659 KiB  
Article
Two-Stage Limited-Information Estimation for Structural Equation Models of Round-Robin Variables
by Terrence D. Jorgensen, Aditi M. Bhangale and Yves Rosseel
Stats 2024, 7(1), 235-268; https://doi.org/10.3390/stats7010015 - 28 Feb 2024
Viewed by 1737
Abstract
We propose and demonstrate a new two-stage maximum likelihood estimator for parameters of a social relations structural equation model (SR-SEM) using estimated summary statistics (Σ̂) as data, as well as uncertainty about Σ̂ to obtain robust inferential statistics. The SR-SEM is a generalization of a traditional SEM for round-robin data, which have a dyadic network structure (i.e., each group member responds to or interacts with each other member). Our two-stage estimator is developed using similar logic to previous two-stage estimators for SEM, developed for application to multilevel data and multiple imputations of missing data. We demonstrate our estimator on a publicly available data set from a 2018 publication about social mimicry. We employ Markov chain Monte Carlo estimation of Σ̂ in Stage 1, implemented using the R package rstan. In Stage 2, the posterior mean estimates of Σ̂ are used as input data to estimate SEM parameters with the R package lavaan. The posterior covariance matrix of the estimated Σ̂ is also calculated so that lavaan can use it to calculate robust standard errors and test statistics. Results are compared to full-information maximum likelihood (FIML) estimation of SR-SEM parameters using the R package srm. We discuss how differences between estimators highlight the need for future research to establish best practices under realistic conditions (e.g., how to specify empirical Bayes priors in Stage 1), as well as extensions that would make two-stage estimation particularly advantageous over single-stage FIML. Full article
(This article belongs to the Section Statistical Methods)
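For orientation, a generic statement of two-stage ("summary statistics as data") maximum likelihood consistent with the workflow described above, though not necessarily the exact expressions used in the paper: Stage 2 minimizes the usual ML discrepancy with Σ̂ in place of a sample covariance matrix, and inference is corrected with a sandwich that propagates the Stage-1 posterior uncertainty Γ̂ about vech(Σ̂).

```latex
% Stage 2 fits the SEM to the Stage-1 estimate \hat{\Sigma}; inference is corrected
% with a sandwich that propagates the Stage-1 (posterior) uncertainty \hat{\Gamma}
% about vech(\hat{\Sigma}).
F_{\mathrm{ML}}(\theta) = \log\lvert\Sigma(\theta)\rvert
  + \operatorname{tr}\!\bigl[\hat{\Sigma}\,\Sigma(\theta)^{-1}\bigr]
  - \log\lvert\hat{\Sigma}\rvert - p,
\qquad
\widehat{\operatorname{acov}}(\hat{\theta}) \approx
  (\Delta^{\top} W \Delta)^{-1}\, \Delta^{\top} W \hat{\Gamma} W \Delta\,
  (\Delta^{\top} W \Delta)^{-1}
```

where p is the number of observed variables, Δ is the Jacobian of vech Σ(θ) with respect to θ, and W is the normal-theory weight matrix evaluated at the Stage-2 solution.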
15 pages, 2708 KiB  
Article
Generation of Scale-Free Assortative Networks via Newman Rewiring for Simulation of Diffusion Phenomena
by Laura Di Lucchio and Giovanni Modanese
Stats 2024, 7(1), 220-234; https://doi.org/10.3390/stats7010014 - 24 Feb 2024
Cited by 1 | Viewed by 1353
Abstract
By collecting and expanding several numerical recipes developed in previous work, we implement an object-oriented Python code, based on the networkX library, for the realization of the configuration model and Newman rewiring. The software can be applied to any kind of network and “target” correlations, but it is tested with a focus on scale-free networks and assortative correlations. In order to generate the degree sequence, we use the method of “random hubs”, which gives networks with minimal fluctuations. For the assortative rewiring, we use the simple Vazquez-Weigt matrix as a test in the case of random networks; since it does not appear to be effective in the case of scale-free networks, we subsequently turn to another recipe which generates matrices with decreasing off-diagonal elements. The rewiring procedure is also important at the theoretical level, in order to test which types of statistically acceptable correlations can actually be realized in concrete networks. From the point of view of applications, its main use is in the construction of correlated networks for the solution of dynamical or diffusion processes through an analysis of the evolution of single nodes, i.e., beyond the Heterogeneous Mean Field approximation. As an example, we report on an application to the Bass diffusion model, with calculations of the time t_max of the diffusion peak. The same networks can additionally be exported to environments for agent-based simulations like NetLogo. Full article
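A lightweight analogue (not the authors' code) of the configuration-model-plus-rewiring pipeline: networkx builds a graph from a heavy-tailed degree sequence, and degree-preserving edge swaps that reconnect high-degree nodes to high-degree nodes (in the style of Xulvi-Brunet and Sokolov) stand in for Newman rewiring toward an assortative target correlation matrix.

```python
# A lightweight analogue (not the authors' code): configuration model from a
# heavy-tailed degree sequence with networkx, followed by degree-preserving swaps that
# reconnect high-degree nodes to high-degree nodes, used here as a simple stand-in for
# Newman rewiring toward assortative target correlations.
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
N = 500
deg = np.minimum(np.floor(rng.pareto(1.7, size=N) + 1).astype(int), N - 1)
if deg.sum() % 2:                       # the configuration model needs an even sum
    deg[0] += 1

G = nx.Graph(nx.configuration_model(deg.tolist(), seed=1))   # collapse multi-edges
G.remove_edges_from(nx.selfloop_edges(G))
print("initial assortativity:", round(nx.degree_assortativity_coefficient(G), 3))

def assortative_rewire(G, steps, rng):
    """Degree-preserving swaps applied only in the assortative direction."""
    for _ in range(steps):
        edges = list(G.edges())
        i, j = rng.choice(len(edges), size=2, replace=False)
        (a, b), (c, d) = edges[i], edges[j]
        nodes = {a, b, c, d}
        if len(nodes) < 4:
            continue
        lo1, lo2, hi1, hi2 = sorted(nodes, key=G.degree)
        if G.has_edge(lo1, lo2) or G.has_edge(hi1, hi2):
            continue                     # avoid creating multi-edges
        G.remove_edge(a, b)
        G.remove_edge(c, d)
        G.add_edge(lo1, lo2)             # pair low-degree with low-degree ...
        G.add_edge(hi1, hi2)             # ... and high-degree with high-degree

assortative_rewire(G, steps=5000, rng=rng)
print("after rewiring:       ", round(nx.degree_assortativity_coefficient(G), 3))
```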
17 pages, 1866 KiB  
Article
New Vessel Extraction Method by Using Skew Normal Distribution for MRA Images
by Tohid Bahrami, Hossein Jabbari Khamnei, Mehrdad Lakestani and B. M. Golam Kibria
Stats 2024, 7(1), 203-219; https://doi.org/10.3390/stats7010013 - 23 Feb 2024
Viewed by 1439
Abstract
Vascular-related diseases pose significant public health challenges and are a leading cause of mortality and disability. Understanding the complex structure of the vascular system and its processes is crucial for addressing these issues. Recent advancements in medical imaging technology have enabled the generation of high-resolution 3D images of vascular structures, leading to a diverse array of methods for vascular extraction. While previous research has often assumed a normal distribution of image data, this paper introduces a novel vessel extraction method that utilizes the skew normal distribution for more accurate probability distribution modeling. The proposed method begins with a preprocessing step to enhance vessel structures and reduce noise in Magnetic Resonance Angiography (MRA) images. The skew normal distribution, known for its ability to model skewed data, is then employed to characterize the intensity distribution of vessels. By estimating the parameters of the skew normal distribution using the Expectation-Maximization (EM) algorithm, the method effectively separates vessel pixels from the background and non-vessel regions. To extract vessels, a thresholding technique is applied based on the estimated skew normal distribution parameters. This segmentation process enables accurate vessel extraction, particularly in detecting thin vessels and enhancing the delineation of vascular edges with low contrast. Experimental evaluations on a diverse set of MRA images demonstrate the superior performance of the proposed method compared to previous approaches in terms of accuracy and computational efficiency. The presented vessel extraction method holds promise for improving the diagnosis and treatment of vascular-related diseases. By leveraging the skew normal distribution, it provides accurate and efficient vessel segmentation, contributing to the advancement of vascular imaging in the field of medical image analysis. Full article
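A drastically simplified sketch of the distribution-modeling step: fit a skew normal to (synthetic) intensity values with scipy and flag voxels above a high quantile of the fitted background model. The paper's preprocessing, EM-estimated mixture, and thresholding rules are not reproduced, and all numbers below are made up.

```python
# A drastically simplified sketch: fit a skew normal to (synthetic) MRA-like intensity
# values with scipy and flag voxels above a high quantile of the fitted background
# model as vessel candidates.  The paper's preprocessing and EM-estimated mixture are
# not reproduced; all numbers here are made up.
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(42)
background = skewnorm.rvs(a=4.0, loc=100.0, scale=30.0, size=19_000, random_state=rng)
vessels = rng.normal(300.0, 25.0, size=1_000)        # bright, sparse vessel voxels
intensity = np.concatenate([background, vessels])

# The background dominates, so a single skew-normal fit tracks the background mode.
a_hat, loc_hat, scale_hat = skewnorm.fit(intensity)
threshold = skewnorm.ppf(0.995, a_hat, loc=loc_hat, scale=scale_hat)
vessel_mask = intensity > threshold

print(f"fitted (a, loc, scale) = ({a_hat:.2f}, {loc_hat:.1f}, {scale_hat:.1f})")
print(f"threshold = {threshold:.1f}, flagged voxels = {vessel_mask.mean():.3%}")
```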
18 pages, 1520 KiB  
Article
Utility in Time Description in Priority Best–Worst Discrete Choice Models: An Empirical Evaluation Using Flynn’s Data
by Sasanka Adikari and Norou Diawara
Stats 2024, 7(1), 185-202; https://doi.org/10.3390/stats7010012 - 19 Feb 2024
Cited by 1 | Viewed by 1608
Abstract
Discrete choice models (DCMs) are applied in many fields and in the statistical modelling of consumer behavior. This paper focuses on a form of choice experiment, best–worst scaling in discrete choice experiments (DCEs), and the transition probability of a choice of a consumer over time. The analysis was conducted by using simulated data (choice pairs) based on data from Flynn’s (2007) ‘Quality of Life Experiment’. Most of the traditional approaches assume the choice alternatives are mutually exclusive over time, which is a questionable assumption. We introduced a new copula-based model (CO-CUB) for the transition probability, which can handle the dependent structure of best–worst choices while applying a very practical constraint. We used a conditional logit model to calculate the utility at consecutive time points and spread it to future time points under dynamic programming. We suggest that the CO-CUB transition probability algorithm is a novel way to analyze and predict choices at future time points by expressing human choice behavior. The numerical results inform decision making and help formulate strategies and learning algorithms under dynamic utility over time for best–worst DCEs. Full article
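The conditional-logit building block for best-worst pairs can be written down directly; the sketch below computes the standard maxdiff probabilities P(best = i, worst = j) proportional to exp(v_i - v_j) for a small set of utilities. The CO-CUB copula construction for transitions over time is not reproduced.

```python
# A hedged sketch of the conditional-logit building block only: standard best-worst
# ("maxdiff") pair probabilities, P(best = i, worst = j) proportional to exp(v_i - v_j).
# The CO-CUB copula model for transitions over time is not reproduced here.
import numpy as np

def best_worst_probs(v):
    """Matrix P with P[i, j] = P(item i chosen best, item j chosen worst), i != j."""
    v = np.asarray(v, dtype=float)
    score = np.exp(v[:, None] - v[None, :])    # exp(v_i - v_j)
    np.fill_diagonal(score, 0.0)               # best and worst must differ
    return score / score.sum()

P = best_worst_probs([0.8, 0.1, -0.5, -0.9])   # illustrative utilities for 4 levels
print(P.round(3))
print("P(chosen best), by item:", P.sum(axis=1).round(3))
```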
13 pages, 533 KiB  
Article
Importance and Uncertainty of λ-Estimation for Box–Cox Transformations to Compute and Verify Reference Intervals in Laboratory Medicine
by Frank Klawonn, Neele Riekeberg and Georg Hoffmann
Stats 2024, 7(1), 172-184; https://doi.org/10.3390/stats7010011 - 9 Feb 2024
Cited by 2 | Viewed by 1657
Abstract
Reference intervals play an important role in medicine, for instance, for the interpretation of blood test results. They are defined as the central 95% values of a healthy population and are often stratified by sex and age. In recent years, so-called indirect methods for the computation and validation of reference intervals have gained importance. Indirect methods use all values from a laboratory, including the pathological cases, and try to identify the healthy sub-population in the mixture of values. This is only possible under certain model assumptions, i.e., that the majority of the values represent non-pathological values and that the non-pathological values follow a normal distribution after a suitable transformation, commonly a Box–Cox transformation, rendering the parameter λ of the Box–Cox transformation as a nuisance parameter for the estimation of the reference interval. Although indirect methods put high effort into the estimation of λ, they arrive at very different estimates for λ, even though the estimated reference intervals are quite coherent. Our theoretical considerations and Monte-Carlo simulations show that overestimating λ can lead to intolerable deviations of the reference interval estimates, whereas λ=0 usually produces acceptable estimates. For λ close to 1, its estimate has limited influence on the estimate for the reference interval, and with reasonable sample sizes, the uncertainty for the λ-estimate remains quite high. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
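The λ-sensitivity can be illustrated in a few lines: estimate λ by maximum likelihood with scipy, compute a parametric 95% reference interval on the transformed scale, back-transform, and compare with forcing λ = 0 (a log transform). The data below are synthetic "healthy" values only; the indirect-method estimation from mixed laboratory data is not shown.

```python
# A hedged sketch of the lambda sensitivity: ML-estimated Box-Cox lambda vs a forced
# lambda = 0 (log transform) for a parametric 95% reference interval.  Synthetic
# "healthy" values only; the indirect-method estimation from mixed data is not shown.
import numpy as np
from scipy import stats, special

rng = np.random.default_rng(5)
x = rng.lognormal(mean=3.0, sigma=0.35, size=2000)     # synthetic healthy values

def reference_interval(x, lmbda=None):
    """Central 95% interval assuming normality after a Box-Cox transform."""
    if lmbda is None:
        z, lmbda = stats.boxcox(x)                     # ML estimate of lambda
    else:
        z = stats.boxcox(x, lmbda=lmbda)
    lo, hi = np.mean(z) + np.array([-1.96, 1.96]) * np.std(z, ddof=1)
    return special.inv_boxcox(np.array([lo, hi]), lmbda), lmbda

ri_ml, lam_ml = reference_interval(x)
ri_log, _ = reference_interval(x, lmbda=0.0)
print(f"lambda_ML = {lam_ml:.2f}, reference interval = {ri_ml.round(1)}")
print(f"lambda = 0.00 (log),  reference interval = {ri_log.round(1)}")
```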
12 pages, 4388 KiB  
Article
Sensitivity Analysis of Start Point of Extreme Daily Rainfall Using CRHUDA and Stochastic Models
by Martin Muñoz-Mandujano, Alfonso Gutierrez-Lopez, Jose Alfredo Acuña-Garcia, Mauricio Arturo Ibarra-Corona, Isaac Carpintero Aguilar and José Alejandro Vargas-Diaz
Stats 2024, 7(1), 160-171; https://doi.org/10.3390/stats7010010 - 8 Feb 2024
Viewed by 1518
Abstract
Forecasting extreme precipitation is one of the basic actions of warning systems in Latin America and the Caribbean (LAC). With thousands of economic losses and severe damage caused by floods in urban areas, hydrometeorological monitoring is a priority in most countries in the LAC region. The monitoring of convective precipitation, cold fronts, and hurricane tracks are the most demanded technological developments for early warning systems in the region. However, predicting and forecasting the onset time of extreme precipitation is a subject of life-saving scientific research. Developed in 2019, the CRHUDA (Crossing HUmidity, Dew point, and Atmospheric pressure) model provides insight into the onset of precipitation from the Clausius–Clapeyron relationship. With access to a historical database of more than 600 storms, the CRHUDA model provides a prediction with a precision of six to eight hours in advance of storm onset. However, the calibration is complex given the addition of ARMA(p,q)-type models for real-time forecasting. This paper presents the calibration of the joint CRHUDA+ARMA(p,q) model. It is concluded that CRHUDA is significantly more suitable and relevant for the forecast of precipitation and a possible future development for an early warning system (EWS). Full article
(This article belongs to the Section Applied Stochastic Models)
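The ARMA(p,q) component of the joint model is standard and can be sketched with statsmodels as below, here on a synthetic stationary series; the CRHUDA humidity/dew-point/pressure model itself is not reproduced.

```python
# A hedged sketch of the ARMA(p, q) forecasting component only, fitted to a synthetic
# stationary series with statsmodels.  The CRHUDA humidity/dew-point/pressure model
# itself is not reproduced here.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(11)
n, phi, theta = 500, 0.6, 0.3
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):                       # simulate an ARMA(1, 1) process
    y[t] = phi * y[t - 1] + e[t] + theta * e[t - 1]

model = ARIMA(y, order=(1, 0, 1)).fit()     # ARMA(p, q) fitted as ARIMA(p, 0, q)
print(model.params.round(3))
print("6-step-ahead forecast:", model.forecast(steps=6).round(3))
```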
22 pages, 442 KiB  
Article
On Estimation of Shannon’s Entropy of Maxwell Distribution Based on Progressively First-Failure Censored Data
by Kapil Kumar, Indrajeet Kumar and Hon Keung Tony Ng
Stats 2024, 7(1), 138-159; https://doi.org/10.3390/stats7010009 - 8 Feb 2024
Cited by 2 | Viewed by 1715
Abstract
Shannon’s entropy is a fundamental concept in information theory that quantifies the uncertainty or information in a random variable or data set. This article addresses the estimation of Shannon’s entropy for the Maxwell lifetime model based on progressively first-failure-censored data from both classical and Bayesian points of view. In the classical perspective, the entropy is estimated using maximum likelihood estimation and bootstrap methods. For Bayesian estimation, two approximation techniques, including the Tierney-Kadane (T-K) approximation and the Markov Chain Monte Carlo (MCMC) method, are used to compute the Bayes estimate of Shannon’s entropy under the linear exponential (LINEX) loss function. We also obtained the highest posterior density (HPD) credible interval of Shannon’s entropy using the MCMC technique. A Monte Carlo simulation study is performed to investigate the performance of the estimation procedures and methodologies studied in this manuscript. A numerical example is used to illustrate the methodologies. This paper aims to provide practical value in applied statistics, especially in the areas of reliability and lifetime data analysis. Full article
(This article belongs to the Section Reliability Engineering)
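For a feel of the plug-in idea, the sketch below estimates Shannon's differential entropy of a Maxwell model from a complete (uncensored) sample: ML fit of the scale with scipy plus a simple percentile bootstrap. The progressively first-failure-censored likelihood and the Bayesian (T-K, MCMC) estimators of the paper are not reproduced.

```python
# A hedged, complete-sample-only sketch: plug-in estimation of Shannon's differential
# entropy for a Maxwell model (ML fit of the scale with scipy) plus a percentile
# bootstrap interval.  The progressively first-failure-censored likelihood and the
# Bayesian (T-K, MCMC) estimators of the paper are not reproduced.
import numpy as np
from scipy.stats import maxwell

rng = np.random.default_rng(2024)
data = maxwell.rvs(scale=2.0, size=200, random_state=rng)

def entropy_mle(x):
    loc, scale = maxwell.fit(x, floc=0)          # ML fit with location fixed at 0
    return float(maxwell.entropy(loc=loc, scale=scale))

est = entropy_mle(data)
boot = [entropy_mle(rng.choice(data, size=data.size, replace=True)) for _ in range(1000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"entropy estimate = {est:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
print(f"true entropy     = {float(maxwell.entropy(scale=2.0)):.3f}")
```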
28 pages, 1032 KiB  
Article
Active Learning for Stacking and AdaBoost-Related Models
by Qun Sui and Sujit K. Ghosh
Stats 2024, 7(1), 110-137; https://doi.org/10.3390/stats7010008 - 24 Jan 2024
Cited by 2 | Viewed by 2034
Abstract
Ensemble learning (EL) has become an essential technique in machine learning that can significantly enhance the predictive performance of basic models, but it also comes with an increased cost of computation. The primary goal of the proposed approach is to present a general integrative framework that allows for applying active learning (AL), which makes use of only a limited budget by selecting optimal instances to achieve comparable predictive performance within the context of ensemble learning. The proposed framework is based on two distinct approaches: (i) AL is implemented following a full-scale EL, which we call ensemble learning on top of active learning (ELTAL), and (ii) AL is applied while using EL, which we call active learning during ensemble learning (ALDEL). Various algorithms for ELTAL and ALDEL are presented using Stacking and Boosting with various algorithm-specific query strategies. The proposed active learning algorithms are numerically illustrated with the Support Vector Machine (SVM) model using simulated data and two real-world applications, evaluating their accuracy when only a small number of instances are selected as compared to using the full data. Our findings demonstrate that: (i) the accuracy of a boosting or stacking model, using the same uncertainty sampling, is higher than that of the SVM model, highlighting the strength of EL; (ii) AL can enable the stacking model to achieve comparable accuracy to the SVM model using the full dataset, with only a small fraction of carefully selected instances, illustrating the strength of active learning. Full article
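A generic uncertainty-sampling loop wrapped around a scikit-learn StackingClassifier conveys the flavor of combining AL with EL; the paper's specific ELTAL/ALDEL algorithms and query strategies are not reproduced, and the data, seed size, and budget below are arbitrary.

```python
# A generic uncertainty-sampling loop around a scikit-learn StackingClassifier, in the
# spirit of combining AL with EL.  The paper's ELTAL/ALDEL algorithms and query
# strategies are not reproduced; data, seed size, and budget are arbitrary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_pool, y_pool, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

def make_stack():
    return StackingClassifier(
        estimators=[("svm", SVC(probability=True)), ("rf", RandomForestClassifier())],
        final_estimator=LogisticRegression(),
    )

rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X_pool), size=50, replace=False))   # small seed set
for round_ in range(5):                        # query 25 instances per round
    clf = make_stack().fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool)
    uncertainty = 1.0 - proba.max(axis=1)      # least-confident sampling
    uncertainty[labeled] = -np.inf             # never re-query labeled points
    labeled += list(np.argsort(uncertainty)[-25:])
    print(f"round {round_}: n_labeled = {len(labeled)}, "
          f"test accuracy = {clf.score(X_test, y_test):.3f}")
```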
15 pages, 400 KiB  
Brief Report
Statistical Framework: Estimating the Cumulative Shares of Nobel Prizes from 1901 to 2022
by Xu Zhang, Bruce Golden and Edward Wasil
Stats 2024, 7(1), 95-109; https://doi.org/10.3390/stats7010007 - 19 Jan 2024
Viewed by 1909
Abstract
Studying trends in the geographical distribution of the Nobel Prize is an interesting topic that has been examined in the academic literature. To track the trends, we develop a stochastic estimate for the cumulative shares of Nobel Prizes awarded to recipients in four geographical groups: North America, Europe, Asia, Other. Specifically, we propose two models to estimate how cumulative shares change over time in the four groups. We estimate parameters, develop a prediction interval for each model, and validate our models. Finally, we apply our approach to estimate the distribution of the cumulative shares of Nobel Prizes for the four groups from 1901 to 2022. Full article
16 pages, 5148 KiB  
Case Report
Ecosystem Degradation in Romania: Exploring the Core Drivers
by Alexandra-Nicoleta Ciucu-Durnoi and Camelia Delcea
Stats 2024, 7(1), 79-94; https://doi.org/10.3390/stats7010006 - 18 Jan 2024
Viewed by 1550
Abstract
The concept of sustainable development appeared as a response to the attempt to improve the quality of human life, simultaneously with the preservation of the environment. For this reason, two of the 17 Sustainable Development Goals are dedicated to life below water (SDG14) and on land (SDG15). In the course of this research, comprehensive information on the extent of degradation in Romania’s primary ecosystems was furnished, along with an exploration of the key factors precipitating this phenomenon. This investigation delves into the perspectives of 42 counties, scrutinizing the level of degradation in forest ecosystems, grasslands, lakes and rivers. The analysis commences with a presentation of descriptive statistics pertaining to each scrutinized system, followed by an elucidation of the primary causes contributing to its degradation. Subsequently, a cluster analysis is conducted on the counties of the country. One of these causes is the presence of intense industrial activity in certain areas, so it is even more important to accelerate the transition to a green economy in order to help the environment regenerate. Full article
14 pages, 1536 KiB  
Article
Directional Differences in Thematic Maps of Soil Chemical Attributes with Geometric Anisotropy
by Dyogo Lesniewski Ribeiro, Tamara Cantú Maltauro, Luciana Pagliosa Carvalho Guedes, Miguel Angel Uribe-Opazo and Gustavo Henrique Dalposso
Stats 2024, 7(1), 65-78; https://doi.org/10.3390/stats7010005 - 16 Jan 2024
Viewed by 1575
Abstract
In the study of the spatial variability of soil chemical attributes, the process is considered anisotropic when the spatial dependence structure differs in relation to the direction. Anisotropy is a characteristic that influences the accuracy of the thematic maps that represent the spatial variability of the phenomenon. Therefore, the linear anisotropic Gaussian spatial model is important for spatial data that present anisotropy, and incorporating this as an intrinsic characteristic of the process that describes the spatial dependence structure improves the accuracy of the spatial estimation of the values of a georeferenced variable in unsampled locations. This work aimed to quantify the directional differences existing in the thematic map of georeferenced variables when incorporating or not incorporating anisotropy into the spatial dependence structure through directional spatial autocorrelation. For simulated data and soil chemical properties (carbon, calcium and potassium), the Moran directional index was calculated, considering the predicted values at unsampled locations, and taking into account estimated isotropic and anisotropic geostatistical models. The directional spatial autocorrelation was effective in revealing the directional difference between thematic maps produced with estimated isotropic and anisotropic geostatistical models. This measure showed that the subregions in the thematic maps take an elliptical shape along the direction of anisotropy, indicating greater spatial continuity over larger distances between pairs of points. Full article
(This article belongs to the Section Statistical Methods)
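A directional Moran's I can be assembled from distance-based weights restricted to pairs whose connecting vector lies within a tolerance of a chosen direction, as in the generic sketch below on synthetic anisotropic data; this is one common construction, not necessarily the exact index implementation used in the paper.

```python
# A generic directional Moran's I: distance-based weights restricted to point pairs
# whose connecting vector lies within +/- 22.5 degrees of a chosen direction, applied
# to synthetic anisotropic data.  One common construction, not necessarily the exact
# implementation used in the paper.
import numpy as np

rng = np.random.default_rng(9)
n = 400
xy = rng.uniform(0, 100, size=(n, 2))
# Anisotropic surface: smooth variation along x (east-west), nearly constant along y.
z = np.sin(xy[:, 0] / 15.0) + 0.1 * rng.normal(size=n)

def directional_moran(xy, z, angle_deg, tol_deg=22.5, max_dist=20.0):
    dx = xy[:, 0][:, None] - xy[:, 0][None, :]
    dy = xy[:, 1][:, None] - xy[:, 1][None, :]
    dist = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0               # undirected pair angle
    diff = np.minimum(np.abs(ang - angle_deg), 180.0 - np.abs(ang - angle_deg))
    W = ((dist > 0) & (dist <= max_dist) & (diff <= tol_deg)).astype(float)
    zc = z - z.mean()
    return (len(z) / W.sum()) * (zc @ W @ zc) / (zc @ zc)

for a in (0, 45, 90, 135):
    print(f"Moran's I along {a:3d} degrees: {directional_moran(xy, z, a):.3f}")
```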
11 pages, 262 KiB  
Article
On the (Apparently) Paradoxical Role of Noise in the Recognition of Signal Character of Minor Principal Components
by Alessandro Giuliani and Alessandro Vici
Stats 2024, 7(1), 54-64; https://doi.org/10.3390/stats7010004 - 11 Jan 2024
Viewed by 1850
Abstract
The usual method of separating signal and noise principal components on the sole basis of their eigenvalues has evident drawbacks when semantically relevant information ‘hides’ in minor components, explaining a very small part of the total variance. This situation is common in biomedical experimentation when PCA is used for hypothesis generation: the multi-scale character of biological regulation typically generates a main mode explaining the major part of variance (size component), squashing potentially interesting (shape) components into the noise floor. These minor components would be erroneously discarded as noisy by the usual selection methods. Here, we propose a computational method, inspired by the chemical concept of ‘titration’, allowing for the unsupervised recognition of the potential signal character of minor components by analyzing the presence of a negative linear relation between added noise and component invariance. Full article
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
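One plausible reading of the procedure (not necessarily the authors' exact protocol) is sketched below: the data contain a dominant size component, a small structured shape component, and pure noise; Gaussian noise of increasing amplitude is added, component invariance is measured as the absolute congruence between original and perturbed loadings, and a clearly negative noise-invariance slope flags the minor component as signal.

```python
# One plausible reading of the "noise titration" idea (not necessarily the authors'
# protocol): data with a dominant size component, a small structured shape component,
# and pure noise.  Noise of increasing amplitude is added; invariance of each loading
# vector is its absolute congruence with the original.  A clearly negative
# noise-invariance slope marks the minor (shape) component as signal.
import numpy as np
from numpy.linalg import svd

rng = np.random.default_rng(1)
n, p = 500, 10
pattern = np.array([1, -1] * 5, dtype=float)          # structured "shape" contrast
size = rng.normal(0, 3, size=(n, 1))
shape = rng.normal(0, 1, size=(n, 1))
X = size @ np.ones((1, p)) + 0.5 * shape @ pattern[None, :] + 0.7 * rng.normal(size=(n, p))

def loadings(A, k=3):
    A = A - A.mean(axis=0)
    return svd(A, full_matrices=False)[2][:k]         # rows are PC loading vectors

L0 = loadings(X)
noise_levels = np.linspace(0.1, 2.0, 8)
invariance = np.empty((len(noise_levels), 3))
for i, s in enumerate(noise_levels):
    Lp = loadings(X + rng.normal(0, s, size=X.shape))
    invariance[i] = [abs(L0[k] @ Lp[k]) for k in range(3)]

for k in range(3):
    slope = np.polyfit(noise_levels, invariance[:, k], 1)[0]
    print(f"PC{k + 1}: mean invariance = {invariance[:, k].mean():.2f}, "
          f"noise-invariance slope = {slope:+.2f}")
```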
20 pages, 1233 KiB  
Article
Precise Tensor Product Smoothing via Spectral Splines
by Nathaniel E. Helwig
Stats 2024, 7(1), 34-53; https://doi.org/10.3390/stats7010003 - 10 Jan 2024
Cited by 1 | Viewed by 1573
Abstract
Tensor product smoothers are frequently used to include interaction effects in multiple nonparametric regression models. Current implementations of tensor product smoothers either require using approximate penalties, such as those typically used in generalized additive models, or costly parameterizations, such as those used in smoothing spline analysis of variance models. In this paper, I propose a computationally efficient and theoretically precise approach for tensor product smoothing. Specifically, I propose a spectral representation of a univariate smoothing spline basis, and I develop an efficient approach for building tensor product smooths from marginal spectral spline representations. The developed theory suggests that current tensor product smoothing methods could be improved by incorporating the proposed tensor product spectral smoothers. Simulation results demonstrate that the proposed approach can outperform popular tensor product smoothing implementations, which supports the theoretical results developed in the paper. Full article
(This article belongs to the Special Issue Novel Semiparametric Methods)
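The tensor-product construction from marginal bases can be sketched with a row-wise Kronecker (face-splitting) product, as below with simple truncated-power cubic bases; the spectral reparameterization of marginal smoothing-spline bases proposed in the paper is not implemented here.

```python
# A hedged sketch of the tensor-product construction only: two marginal spline bases
# combined by a row-wise Kronecker (face-splitting) product.  Simple truncated-power
# cubic bases are used for brevity; the paper's spectral reparameterization of
# marginal smoothing-spline bases is not implemented here.
import numpy as np

def tp_basis(x, knots):
    """Cubic truncated-power spline basis: 1, x, x^2, x^3, (x - k)_+^3."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def row_kron(A, B):
    """Row-wise Kronecker product: the tensor product design matrix."""
    return (A[:, :, None] * B[:, None, :]).reshape(A.shape[0], -1)

rng = np.random.default_rng(0)
n = 300
x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x1) * np.cos(np.pi * x2) + 0.1 * rng.normal(size=n)

B1 = tp_basis(x1, knots=np.linspace(0.1, 0.9, 5))
B2 = tp_basis(x2, knots=np.linspace(0.1, 0.9, 5))
B = row_kron(B1, B2)                                # n x (9 * 9) tensor product basis
beta, *_ = np.linalg.lstsq(B, y, rcond=None)        # unpenalized fit, for illustration
print("design shape:", B.shape,
      " in-sample R^2:", round(1 - np.var(y - B @ beta) / np.var(y), 3))
```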
11 pages, 360 KiB  
Article
Predicting Random Walks and a Data-Splitting Prediction Region
by Mulubrhan G. Haile, Lingling Zhang and David J. Olive
Stats 2024, 7(1), 23-33; https://doi.org/10.3390/stats7010002 - 8 Jan 2024
Cited by 1 | Viewed by 1558
Abstract
Perhaps the first nonparametric, asymptotically optimal prediction intervals are provided for univariate random walks, with applications to renewal processes. Perhaps the first nonparametric prediction regions are introduced for vector-valued random walks. This paper further derives nonparametric data-splitting prediction regions, which are underpinned by very simple theory. Some of the prediction regions can be used when the data distribution does not have first moments, and some can be used for high-dimensional data, where the number of predictors is larger than the sample size. The prediction regions can make use of many estimators of multivariate location and dispersion. Full article
(This article belongs to the Section Statistical Methods)
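A simple nonparametric interval in the spirit of the paper (not necessarily the authors' exact construction): resample the observed increments, add their h-step sums to the last observation, and take quantiles.

```python
# A hedged sketch: a nonparametric 95% prediction interval for a univariate random
# walk h steps ahead, from quantiles of Y_n plus h-step sums of resampled increments.
# A generic construction in the spirit of the paper, not the authors' exact interval.
import numpy as np

rng = np.random.default_rng(8)
n, h = 400, 10
y = np.cumsum(rng.standard_t(df=4, size=n))        # heavy-tailed i.i.d. steps

d = np.diff(y)                                     # observed increments
B = 5000
future = y[-1] + rng.choice(d, size=(B, h), replace=True).sum(axis=1)
lo, hi = np.quantile(future, [0.025, 0.975])
print(f"last value = {y[-1]:.2f}; 95% PI for Y(n+{h}) = ({lo:.2f}, {hi:.2f})")
```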
22 pages, 1709 KiB  
Article
The Mediating Impact of Innovation Types in the Relationship between Innovation Use Theory and Market Performance
by Shieh-Liang Chen and Kuo-Liang Chen
Stats 2024, 7(1), 1-22; https://doi.org/10.3390/stats7010001 - 30 Dec 2023
Viewed by 1870
Abstract
The ultimate goal of innovation is to improve performance. But if people’s needs and uses are ignored, innovation will only be a formality. In the past, research on innovation mostly focused on technology, processes, business models, services, and organizations. The measurement of innovation focuses on capabilities, processes, results, and methods, but there has always been a lack of pre-innovation measurements and tools. This study is the first to use the innovation use theory proposed by Christensen et al. in combination with innovation types, and it uses measurements focused on the early stage of innovation to predict post-innovation performance. This study collected 590 valid samples and used SPSS and the four-step BK (Baron–Kenny) method to conduct regression analysis and mediation tests. The empirical results show the following: (1) a confirmed model and scale of the innovation use theory; (2) that three constructs of the innovation use theory have an impact on market performance; and (3) that innovation types acting as mediators will improve market performance. This study establishes an academic model of the innovation use theory to provide a clear scale tool for subsequent research. In practice, it can first measure the direction of innovation and performance prediction, providing managers with a reference when developing new products and applying market strategies. Full article
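The four-step BK (Baron-Kenny) mediation logic can be sketched with statsmodels on synthetic data, as below; the study's survey constructs, scales, and SPSS workflow are not reproduced, and the variable names are placeholders.

```python
# A hedged sketch of the four-step BK (Baron-Kenny) mediation regressions with
# statsmodels on synthetic data (X -> M -> Y).  The study's survey constructs and SPSS
# workflow are not reproduced; variable names are placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(590)
n = 590
x = rng.normal(size=n)                        # e.g., an innovation-use construct
m = 0.6 * x + rng.normal(size=n)              # mediator, e.g., an innovation type
y = 0.5 * m + 0.1 * x + rng.normal(size=n)    # outcome, e.g., market performance

step1 = sm.OLS(y, sm.add_constant(x)).fit()                        # X -> Y (total)
step2 = sm.OLS(m, sm.add_constant(x)).fit()                        # X -> M
step3 = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()  # X + M -> Y

print("total effect of X on Y:     ", round(step1.params[1], 3))
print("effect of X on M:           ", round(step2.params[1], 3))
print("direct effect of X given M: ", round(step3.params[1], 3))
print("effect of M on Y:           ", round(step3.params[2], 3))
```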