
Stats, Volume 3, Issue 4

December 2020 - 7 articles

Cover Story: The analysis of massive databases is a key issue for most applications today, and parallel computing techniques are one suitable approach to it. One way to perform statistical analyses over massive databases is to combine tools via the sparklyr package, which allows an R application to use Apache Spark as a framework. This paper presents an analysis of Brazilian public data from the Bolsa Família Programme (BFP, a conditional cash transfer programme), based on local processing of a large data set of 1.26 billion observations totalling more than 100 GB. Our goal was to understand how this social programme acts in different cities, as well as to identify potentially important variables for the BFP utilization rate. The analysis was performed with random forests (RF) and indicated the high importance of variables such as family income, education, occupation, and density of people in the homes.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for email alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.

Articles (7)

  • Article
  • Open Access
53 Citations
4,921 Views
16 Pages

6 November 2020

In a multiple linear regression model, the ordinary least squares estimator is inefficient when the multicollinearity problem exists. Many authors have proposed different estimators to overcome the multicollinearity problem for linear regression mode...

  • Article
  • Open Access
2,376 Views
16 Pages

5 November 2020

In a functional linear model (FLM) with scalar response, the parameter curve quantifies the relationship between a functional explanatory variable and a scalar response. While these models can be ill-posed, a penalized regression spline approach may...

  • Article
  • Open Access
2 Citations
3,537 Views
26 Pages

Model Free Inference on Multivariate Time Series with Conditional Correlations

  • Dimitrios Thomakos,
  • Johannes Klepsch and
  • Dimitris N. Politis

3 November 2020

New results on volatility modeling and forecasting are presented based on the NoVaS transformation approach. Our main contribution is that we extend the NoVaS methodology to modeling and forecasting conditional correlation, thus allowing NoVaS to wor...

  • Article
  • Open Access
4 Citations
2,310 Views
9 Pages

31 October 2020

The purpose of this note is to introduce and investigate the nonparametric estimation of the conditional mode using wavelet methods. We propose a new linear wavelet estimator for this problem. The estimator is constructed by combining a specific rati...

  • Article
  • Open Access
17 Citations
5,803 Views
10 Pages

29 October 2020

The first purpose of this study was to examine the factor structure of the Adult Self-Report (ASR) via traditional confirmatory factor analysis (CFA) and contemporary exploratory structural equation modeling (ESEM). The second purpose was to examine...

  • Article
  • Open Access
5 Citations
5,098 Views
21 Pages

Local Processing of Massive Databases with R: A National Analysis of a Brazilian Social Programme

  • Hellen Paz,
  • Mateus Maia,
  • Fernando Moraes,
  • Ricardo Lustosa,
  • Lilia Costa,
  • Samuel Macêdo,
  • Marcos E. Barreto and
  • Anderson Ara

19 October 2020

The analysis of massive databases is a key issue for most applications today, and the use of parallel computing techniques is one of the suitable approaches for that. Apache Spark is a widely employed tool within this context, aiming at processing lar...

  • Article
  • Open Access
3 Citations
3,435 Views
17 Pages

Identification of Judicial Outcomes in Judgments: A Generalized Gini-PLS Approach

  • Gildas Tagny-Ngompé,
  • Stéphane Mussard,
  • Guillaume Zambrano,
  • Sébastien Harispe and
  • Jacky Montmain

27 September 2020

This paper presents and compares several text classification models that can be used to extract the outcome of a judgment from justice decisions, i.e., legal documents summarizing the different rulings made by a judge. Such models can be used to gath...

Stats - ISSN 2571-905X