Stats, Volume 3, Issue 4

2020 December - 7 articles

Cover Story: The analysis of massive databases is a key issue for most applications today, and parallel computing is one suitable approach to it. One way to perform statistical analyses over massive databases is to combine tools via the sparklyr package, which allows an R application to use Apache Spark as a framework. This paper presents an analysis of Brazilian public data from the Bolsa Família Programme (BFP, a conditional cash transfer programme), comprising local processing of a large data set with 1.26 billion observations totalling more than 100 GB. Our goal was to understand how this social program acts in different cities and to identify potentially important variables for the BFP utilization rate. The analysis was performed with random forest (RF) and indicated the high importance of variables such as family income, education, occupation, and density of people in the homes.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for email alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.

Stats - ISSN 2571-905X