Article

FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems

1 National Scientific and Technical Research Council, CONICET, La Plata 1900, Argentina
2 Institute of Research in Computer Science LIDI, III-LIDI, Scientific Research Commission, Province of Buenos Aires, CIC-PBA, School of Computer Science, National University of La Plata, UNLP, La Plata 1900, Argentina
3 Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071 Granada, Spain
* Author to whom correspondence should be addressed.
Academic Editor: Rashid Mehmood
Electronics 2021, 10(15), 1757; https://doi.org/10.3390/electronics10151757
Received: 23 June 2021 / Revised: 14 July 2021 / Accepted: 18 July 2021 / Published: 22 July 2021
(This article belongs to the Section Artificial Intelligence)
This paper presents FDR2-BD, a methodological data condensation approach for reducing tabular big datasets in classification problems. The key to our proposal is to analyze the data in a dual way (vertically and horizontally), combining feature selection, which generates dense clusters of data, with uniform sampling reduction, which keeps only a few representative samples from each problem area. Its main advantage is that the model’s predictive quality is kept within a range determined by a user-defined threshold. Its robustness is built on a hyper-parametrization process in which all data are taken into consideration by following a k-fold procedure. It is also fast and scalable, relying on fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, outperforming state-of-the-art solutions such as FCNN_MR, which barely reach 70%. The most promising outcome is that the representativeness of the original data is maintained, with predictive quality values within around 1% of the baseline.
Keywords: big data; data reduction; classification; preprocessing techniques; Apache Spark
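
The threshold-guided reduction idea in the abstract can be illustrated with a small single-machine sketch. It is not the FDR2-BD implementation: it covers only the horizontal (sampling) side, uses plain Python instead of Apache Spark, and substitutes a nearest-centroid classifier for whatever model the authors evaluate. All names (`stratified_sample`, `centroid_accuracy`, `kfold_quality`, `recommend_fraction`) and the candidate fractions are hypothetical, chosen for illustration only.

```python
import random
from collections import defaultdict

def stratified_sample(rows, fraction, rng):
    """Uniformly sample `fraction` of the rows of each class (at least one per class)."""
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append((features, label))
    sample = []
    for label in sorted(by_class):
        members = by_class[label]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def centroid_accuracy(train, test):
    """Accuracy of a nearest-centroid classifier (assumed stand-in for a real model)."""
    sums, counts = {}, defaultdict(int)
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Pick the class whose centroid has the smallest squared distance to x.
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return sum(predict(x) == y for x, y in test) / len(test)

def kfold_quality(rows, fraction, k, seed):
    """Mean test accuracy over k folds, reducing each training fold to `fraction`."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    scores = []
    for i in range(k):
        test = shuffled[i::k]
        train = [r for j, r in enumerate(shuffled) if j % k != i]
        reduced = stratified_sample(train, fraction, rng)
        scores.append(centroid_accuracy(reduced, test))
    return sum(scores) / k

def recommend_fraction(rows, fractions, threshold, k=5, seed=0):
    """Return the smallest candidate fraction whose quality loss stays within threshold."""
    baseline = kfold_quality(rows, 1.0, k, seed)
    best = 1.0  # fall back to "no reduction" if no candidate qualifies
    for fraction in sorted(fractions, reverse=True):
        if baseline - kfold_quality(rows, fraction, k, seed) <= threshold:
            best = fraction
    return best, baseline
```

Calling `recommend_fraction(rows, [0.5, 0.1, 0.05], threshold=0.01)` on a list of `(features, label)` pairs returns the recommended fraction together with the unreduced baseline accuracy. The k-fold loop mirrors the abstract's point that every row takes part in the evaluation, either as training or as test data.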

MDPI and ACS Style

Basgall, M.J.; Naiouf, M.; Fernández, A. FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems. Electronics 2021, 10, 1757. https://doi.org/10.3390/electronics10151757

AMA Style

Basgall MJ, Naiouf M, Fernández A. FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems. Electronics. 2021; 10(15):1757. https://doi.org/10.3390/electronics10151757

Chicago/Turabian Style

Basgall, María José, Marcelo Naiouf, and Alberto Fernández. 2021. "FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems" Electronics 10, no. 15: 1757. https://doi.org/10.3390/electronics10151757
