Stats, Volume 3, Issue 3 (September 2020) – 13 articles

Cover Story: Agreement assessment of quantitative measurements in method comparison and observer variability studies is usually achieved by the Bland–Altman Limits of Agreement, where the paired differences are implicitly assumed to follow a normal distribution. Whenever this assumption does not hold and transformations fail to normalize the data, the 2.5% and 97.5% percentiles are obtained by quantile estimation. We applied 15 nonparametric quantile estimators at sample sizes between 30 and 150 for different distributions. The performance of the estimators in generating prediction intervals was measured by the coverage probability for one newly generated observation. Simple sample quantile estimators based on one or two order statistics outperformed all of the other estimators. For sample sizes exceeding 80 observations, the Harrell–Davis and Sfakianakis–Verginis estimators performed equally well.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF form, click on the "PDF Full-text" link and open it with the free Adobe Reader.
Article
On the Use of the Cumulative Distribution Function for Large-Scale Tolerance Analyses Applied to Electric Machine Design
Stats 2020, 3(3), 412-426; https://doi.org/10.3390/stats3030026 - 22 Sep 2020
Cited by 2 | Viewed by 989
Abstract
In the field of electrical machine design, excellent performance for multiple objectives, such as efficiency or torque density, can be reached by using contemporary optimization techniques. Unfortunately, highly optimized designs tend to be rather sensitive to uncertainties in the design parameters. This paper introduces an approach to rate the sensitivity of designs with a large number of tolerance-affected parameters using cumulative distribution functions (CDFs) based on finite element analysis results. The accuracy of the CDFs is estimated using the Dvoretzky–Kiefer–Wolfowitz inequality, as well as the bootstrapping method. The advantage of the presented technique is that computational time can be kept low, even for complex problems. As a demanding test case, the effect of imperfect permanent magnets on the cogging torque of a Vernier machine with 192 tolerance-affected parameters is investigated. Results reveal that, for this problem, a reliable statement about robustness can already be made with 1000 finite element calculations.
(This article belongs to the Special Issue Applied Statistics in Engineering)
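The Dvoretzky–Kiefer–Wolfowitz bound used in this article is easy to illustrate. The sketch below (our own minimal Python, not the authors' finite-element pipeline; the name `dkw_band` is hypothetical) computes an empirical CDF together with its simultaneous (1 − α) confidence band:

```python
import math
import numpy as np

def dkw_band(samples, alpha=0.05):
    """Empirical CDF of `samples` with a (1 - alpha) DKW confidence band.

    The Dvoretzky-Kiefer-Wolfowitz inequality guarantees that the true CDF
    lies within +/- eps of the empirical CDF everywhere, simultaneously,
    with probability at least 1 - alpha.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    ecdf = np.arange(1, n + 1) / n                    # F_n at the order statistics
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    lower = np.clip(ecdf - eps, 0.0, 1.0)
    upper = np.clip(ecdf + eps, 0.0, 1.0)
    return x, ecdf, lower, upper
```

With n = 1000 evaluations, as in the paper's test case, the band half-width is sqrt(ln(2/0.05)/2000) ≈ 0.043, which suggests why that many finite element runs can already support a statement about robustness.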
Article
Neural Legal Outcome Prediction with Partial Least Squares Compression
Stats 2020, 3(3), 396-411; https://doi.org/10.3390/stats3030025 - 18 Sep 2020
Cited by 1 | Viewed by 924
Abstract
Predicting the outcome of a case from a set of factual data is a common goal in legal knowledge discovery. In practice, solving this task is often difficult due to the scarcity of labeled datasets. Additionally, processing long documents often leads to sparse data, which adds another layer of complexity. This paper presents a study of the French-language decisions of the European Court of Human Rights (ECtHR), for which we build various classification tasks. The first of these tasks is predicting the potential violation of an article of the Convention from the extracted facts. A multiclass problem is also created, with the objective of determining whether an article is relevant to plead given some circumstances. We solve these tasks by comparing simple linear models to an attention-based neural network. We also take advantage of a modified partial least squares algorithm, integrated into the aforementioned models, that deals effectively with classification problems and scales to the sparse inputs arising from natural language tasks.
(This article belongs to the Special Issue Interdisciplinary Research on Predictive Justice)
Article
Bottleneck or Crossroad? Problems of Legal Sources Annotation and Some Theoretical Thoughts
Stats 2020, 3(3), 376-395; https://doi.org/10.3390/stats3030024 - 09 Sep 2020
Viewed by 893
Abstract
So far, in the application of legal analytics to legal sources, the substantive legal knowledge employed by computational models has had to be extracted manually from legal sources. This is the bottleneck described in the literature. The paper is an exploration of this obstacle, with a focus on quantitative legal prediction. The authors review the most important studies on quantitative legal prediction published in recent years and systematize the issue by dividing them into text-based, metadata-based, and mixed approaches to prediction. They then focus on the main theoretical issues, such as the relationship between legal prediction and certainty of law, isomorphism, and the interaction between textual sources, information, representation, and models. The metaphor of a crossroad shows a descriptive utility both for the aspects inside the bottleneck and, surprisingly, for the wider scenario. In order to have an impact on the legal profession, the test bench for quantitative legal prediction is the analysis of case law from the lower courts. Finally, the authors outline a possible development of Artificial Intelligence (henceforth AI) applied to ordinary judicial activity, in general and especially in Italy, stressing the opportunity offered by the huge amount of data accumulated before the lower courts through online trials.
(This article belongs to the Special Issue Interdisciplinary Research on Predictive Justice)
Article
Improving Access to Justice with Legal Chatbots
Stats 2020, 3(3), 356-375; https://doi.org/10.3390/stats3030023 - 04 Sep 2020
Cited by 2 | Viewed by 1746
Abstract
On average, one in three Canadians will be affected by a legal problem over a three-year period. Unfortunately, whether for legal representation or legal advice, the very high cost of these services excludes the disadvantaged and most vulnerable, forcing them to represent themselves. For these people, accessing legal information is therefore critical. In this work, we attempt to tackle this problem by embedding legal data in a conversational interface. We introduce two dialog systems (chatbots) created to provide legal information. The first one, based on data from the Government of Canada, deals with immigration issues, while the second one informs bank employees about legal issues related to their job tasks. Both chatbots rely on various representations and classification algorithms, from mature techniques to novel advances in the field. The chatbot dedicated to immigration issues is shared with the research community as an open-resource project.
(This article belongs to the Special Issue Interdisciplinary Research on Predictive Justice)
Article
Nonparametric Limits of Agreement for Small to Moderate Sample Sizes: A Simulation Study
Stats 2020, 3(3), 343-355; https://doi.org/10.3390/stats3030022 - 28 Aug 2020
Cited by 7 | Viewed by 1454
Abstract
The assessment of agreement in method comparison and observer variability analysis of quantitative measurements is usually done by the Bland–Altman Limits of Agreement, where the paired differences are implicitly assumed to follow a normal distribution. Whenever this assumption does not hold, the 2.5% and 97.5% percentiles are obtained by quantile estimation. In the literature, empirical quantiles have been used for this purpose. In this simulation study, we applied sample, subsampling, and kernel quantile estimators, as well as other methods for quantile estimation, to sample sizes between 30 and 150 and different distributions of the paired differences. The performance of 15 estimators in generating prediction intervals was measured by their respective coverage probability for one newly generated observation. Our results indicate that sample quantile estimators based on one or two order statistics outperformed all of the other estimators and can be used for deriving nonparametric Limits of Agreement. For sample sizes exceeding 80 observations, more advanced quantile estimators, such as the Harrell–Davis estimator and estimators of the Sfakianakis–Verginis type, which use all of the observed differences, performed equally well and may be considered intuitively more appealing than simple sample quantile estimators based on only two observations per quantile.
(This article belongs to the Special Issue Robust Statistics in Action)
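A minimal sketch of the nonparametric Limits of Agreement described above, assuming NumPy only: the default `np.quantile` interpolation uses at most two order statistics, so it belongs to the family of simple sample quantile estimators the study examined (the specific estimators compared in the paper differ in their tie-breaking and weighting):

```python
import numpy as np

def nonparametric_loa(differences, coverage=0.95):
    """Nonparametric Limits of Agreement from the paired differences.

    Instead of mean +/- 1.96 * SD (which assumes normality), take the
    empirical 2.5% and 97.5% quantiles.  NumPy's default 'linear' method
    interpolates between at most two order statistics per quantile.
    """
    d = np.asarray(differences, dtype=float)
    tail = (1.0 - coverage) / 2.0
    lower = np.quantile(d, tail)          # 2.5% percentile for coverage=0.95
    upper = np.quantile(d, 1.0 - tail)    # 97.5% percentile
    return lower, upper
```

For skewed differences (e.g. exponentially distributed), these limits are asymmetric around the median, unlike the normal-theory Bland–Altman limits.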
Brief Report
Chi-Square and Student Bridge Distributions and the Behrens–Fisher Statistic
Stats 2020, 3(3), 330-342; https://doi.org/10.3390/stats3030021 - 25 Aug 2020
Cited by 1 | Viewed by 981
Abstract
We prove that the Behrens–Fisher statistic follows a Student bridge distribution, the mixing coefficient of which depends on the two sample variances only through their ratio. To this end, it is first shown that a weighted sum of two independent normalized chi-square distributed random variables is chi-square bridge distributed, and secondly that the Behrens–Fisher statistic is based on such a variable together with an independent standard normally distributed one. In the case of a known variance ratio, exact standard statistical testing and confidence estimation methods apply without the need for any additional approximations. In addition, a three-pillar bridge explanation is given for the choice of degrees of freedom in Welch's approximation to the exact distribution of the Behrens–Fisher statistic.
(This article belongs to the Section Statistical Methods)
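Welch's approximation, referenced in the closing sentence, can be stated as a one-line computation (this is the standard Welch–Satterthwaite formula, not the paper's bridge-distribution derivation):

```python
def welch_df(s1_sq, n1, s2_sq, n2):
    """Welch-Satterthwaite degrees of freedom for the Behrens-Fisher problem.

    nu = (v1 + v2)^2 / (v1^2/(n1-1) + v2^2/(n2-1)),  with  v_i = s_i^2 / n_i.
    Note that nu depends on the sample variances only through their ratio:
    scaling both s1_sq and s2_sq by a constant leaves nu unchanged, the same
    invariance the Student bridge mixing coefficient exhibits.
    """
    v1 = s1_sq / n1
    v2 = s2_sq / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
```

The degrees of freedom always lie between min(n1, n2) − 1 and n1 + n2 − 2, reaching the upper bound when the per-sample variance terms are equal.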
Article
Recovering Yield Curves from Dynamic Term Structure Models with Time-Varying Factors
Stats 2020, 3(3), 284-329; https://doi.org/10.3390/stats3030020 - 22 Aug 2020
Viewed by 823
Abstract
A dynamic version of the Nelson–Siegel–Svensson term structure model with time-varying factors is considered for predicting out-of-sample maturity yields. Simple linear interpolation cannot be applied to recover yields at the very short and long ends of the term structure, where data are often missing. This motivates the use of dynamic parametric term structure models that exploit both time-series and cross-sectional variation in yield data to predict missing data at the extreme ends of the term structure. Although the dynamic Nelson–Siegel–Svensson model is weakly identified when the two decay factors become close to each other, its predictions may be more accurate than those from more restricted models, depending on the data and maturity.
(This article belongs to the Special Issue Time Series Analysis and Forecasting)
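For orientation, a static Nelson–Siegel–Svensson curve can be evaluated as below. This uses one common parameterization with decay factors λ1, λ2 (conventions vary across papers), and the article's dynamic version additionally lets the β factors evolve over time:

```python
import numpy as np

def nss_yield(tau, beta0, beta1, beta2, beta3, lam1, lam2):
    """Static Nelson-Siegel-Svensson zero-coupon yield at maturity tau."""
    tau = np.asarray(tau, dtype=float)
    x1 = tau / lam1
    x2 = tau / lam2
    slope = (1.0 - np.exp(-x1)) / x1                   # loads the short end
    curve1 = slope - np.exp(-x1)                       # medium-term hump
    curve2 = (1.0 - np.exp(-x2)) / x2 - np.exp(-x2)    # second hump (Svensson term)
    return beta0 + beta1 * slope + beta2 * curve1 + beta3 * curve2
```

As tau → 0 the yield tends to beta0 + beta1, and as tau → ∞ it tends to beta0, which is how the model extrapolates to the missing short and long ends. When lam1 ≈ lam2 the two curvature regressors become nearly collinear, which is the weak-identification issue the abstract mentions.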
Article
Lp Loss Functions in Invariance Alignment and Haberman Linking with Few or Many Groups
Stats 2020, 3(3), 246-283; https://doi.org/10.3390/stats3030019 - 05 Aug 2020
Cited by 10 | Viewed by 1429
Abstract
The comparison of group means in latent variable models plays a vital role in empirical research in the social sciences. The present article discusses an extension of invariance alignment and Haberman linking by choosing the robust power loss function ρ(x) = |x|^p (p > 0). This power loss function with power values p smaller than one is particularly suited for item responses that are generated under partial invariance. For a general class of linking functions, asymptotic normality of the estimates is shown. Moreover, the theory of M-estimation is applied to obtain linking errors (i.e., inference with respect to a population of items) for this class of linking functions. In a simulation study, it is shown that invariance alignment and Haberman linking have comparable performance and, in some conditions, the newly proposed robust Haberman linking outperforms invariance alignment. In three examples, the influence of the choice of a particular linking function on the estimation of group means is demonstrated. It is concluded that the choice of the loss function in linking is related to structural assumptions about the pattern of noninvariance in the item parameters.
(This article belongs to the Section Multivariate Analysis)
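To see why power values p < 1 are robust to a subset of grossly deviating (noninvariant) parameters, consider a toy location problem under the loss ρ(x) = |x|^p. This is our own illustration, not the alignment or linking algorithm itself; since the objective is non-convex for p < 1, the sketch uses a brute-force grid search rather than a gradient method:

```python
import numpy as np

def power_loss_location(x, p, grid_size=20001):
    """Location estimate minimizing the power loss  sum_i |x_i - m|^p.

    p = 2 gives the mean and p = 1 the median; p < 1 is more robust still,
    essentially ignoring a minority of extreme values.
    """
    x = np.asarray(x, dtype=float)
    grid = np.linspace(x.min(), x.max(), grid_size)
    loss = np.abs(x[:, None] - grid[None, :]) ** p  # |x_i - m| ** p for each grid m
    return grid[np.argmin(loss.sum(axis=0))]
```

With four observations at 0 and one outlier at 10, p = 2 is pulled to 2 (the mean), while p = 1 and p = 0.5 stay at 0, the value supported by the majority.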
Editorial
Learning from Data to Optimize Control in Precision Farming
Stats 2020, 3(3), 239-245; https://doi.org/10.3390/stats3030018 - 24 Jul 2020
Cited by 1 | Viewed by 1097
Abstract
Precision farming is one of many ways to meet a 55 percent increase in the global demand for agricultural products on current agricultural land by 2050, with a reduced need for fertilizers and efficient use of water resources. The catalyst for the emergence of precision farming has been satellite positioning and navigation, followed by the Internet of Things, generating vast amounts of information that can be used to optimize farming processes in real time. Statistical tools from data mining, predictive modeling, and machine learning analyze patterns in historical data to make predictions about future events and to inform intelligent actions. This Special Issue presents the latest developments in statistical inference, machine learning, and optimum control for precision farming.
(This article belongs to the Special Issue Statistical Tools in Precision Farming)
Article
A Bayesian Adaptive Design in Cancer Phase I Trials Using Dose Combinations with Ordinal Toxicity Grades
Stats 2020, 3(3), 221-238; https://doi.org/10.3390/stats3030017 - 17 Jul 2020
Viewed by 1027
Abstract
We propose a Bayesian adaptive design for early-phase drug combination cancer trials incorporating ordinal grades of toxicity. Parametric models are used to describe the relationship between the dose combinations and the probabilities of the ordinal toxicities under the proportional odds assumption. The trial design proceeds by treating cohorts of two patients simultaneously receiving different dose combinations. Specifically, at each stage of the trial, we seek the dose of one agent by minimizing the Bayes risk with respect to a loss function, given the current dose of the other agent. We consider two types of loss functions, corresponding to the Continual Reassessment Method (CRM) and Escalation with Overdose Control (EWOC). At the end of the trial, we estimate the maximum tolerated dose (MTD) curve as a function of the Bayes estimates of the model parameters. We evaluate the design's operating characteristics in terms of trial safety and the percentage of dose recommendations in dose combination neighborhoods around the true MTD, comparing this design to one that uses a binary indicator of dose-limiting toxicity (DLT). The methodology is further adapted to the case of a pre-specified discrete set of dose combinations.
(This article belongs to the Section Biostatistics)
Article
Multivariate Mixed Response Model with Pairwise Composite-Likelihood Method
Stats 2020, 3(3), 203-220; https://doi.org/10.3390/stats3030016 - 15 Jul 2020
Cited by 3 | Viewed by 1482
Abstract
In clinical research, study outcomes usually consist of various items of patient information corresponding to the treatment. To better understand the effects of different treatments, one often needs to analyze multiple clinical outcomes simultaneously, while the data are usually a mix of continuous and discrete variables. We propose the multivariate mixed response model to implement statistical inference based on the conditional grouped continuous model through a pairwise composite-likelihood approach. It simplifies the multivariate model by reducing it to three types of bivariate models and incorporates the asymptotic properties of the composite likelihood via the Godambe information. We demonstrate the validity and the statistical power of the multivariate mixed response model through simulation studies and clinical applications. This composite-likelihood method is advantageous for statistical inference on correlated multivariate mixed outcomes.
Article
Non-Negativity of a Quadratic form with Applications to Panel Data Estimation, Forecasting and Optimization
Stats 2020, 3(3), 185-202; https://doi.org/10.3390/stats3030015 - 06 Jul 2020
Viewed by 961
Abstract
For a symmetric matrix B, we determine the class of matrices Q such that Q^T B Q is non-negative definite and apply the result to panel data estimation and forecasting, in particular the Hausman test for the endogeneity of the random effects in panel data models. We show that the test can be performed if the estimated error variances in the fixed and random effects models satisfy a specific inequality. If the inequality fails, we discuss the restrictions under which the test can be performed. We show that estimators satisfying the inequality exist. Furthermore, we discuss an application to a constrained quadratic minimization problem with an indefinite objective function.
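The paper characterizes the admissible Q analytically; for a given pair (B, Q), non-negative definiteness of Q^T B Q can also be checked numerically by an eigenvalue test (a generic sketch, not the authors' characterization):

```python
import numpy as np

def is_nonneg_definite(B, Q, tol=1e-10):
    """Numerically check whether Q^T B Q is non-negative definite.

    Q^T B Q is symmetric whenever B is, so all its eigenvalues are real;
    non-negative definiteness means none of them is (materially) negative.
    """
    M = Q.T @ B @ Q
    M = (M + M.T) / 2.0                 # symmetrize against round-off
    return bool(np.linalg.eigvalsh(M).min() >= -tol)
```

Even for an indefinite B (e.g. diag(1, −1)), a Q whose columns span only the positive subspace yields a non-negative definite quadratic form, which is the phenomenon the paper exploits.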
Article
A Family of Correlated Observations: From Independent to Strongly Interrelated Ones
Stats 2020, 3(3), 166-184; https://doi.org/10.3390/stats3030014 - 30 Jun 2020
Cited by 5 | Viewed by 1182
Abstract
This paper proposes a new classification of correlated data types based upon the relative number of direct connections among observations, producing a family of correlated observations embracing seven categories, one of which has no currently known empirical counterpart, ranging from independent (i.e., no links) to approaching near-complete linkage (i.e., n(n − 1)/2 links). Analysis of specimen datasets from publicly available data sources furnishes empirical illustrations of these various categories. Their descriptions also include their historical context and the calculation of their effective sample sizes (i.e., the equivalent number of independent observations). Concluding comments outline some state-of-the-art future research topics.
(This article belongs to the Section Statistical Methods)
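Effective sample sizes of the kind computed in the article can be illustrated with the standard equicorrelation approximation (one common formula, not the paper's category-specific calculations):

```python
def effective_sample_size(n, rho_bar):
    """Effective number of independent observations among n equicorrelated ones.

    A standard approximation: n_eff = n / (1 + (n - 1) * rho_bar).
    rho_bar = 0 (no links) recovers n; as rho_bar -> 1 (approaching the
    near-complete linkage of n(n-1)/2 links), the n observations carry
    roughly the information of a single one.
    """
    return n / (1.0 + (n - 1) * rho_bar)
```

For example, 100 observations with an average pairwise correlation of 0.1 are worth only about nine independent observations, which is why ignoring the linkage structure badly overstates precision.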