Open Access, Feature Paper, Article
Symmetry 2018, 10(5), 162

Sampling Based Histogram PCA and Its Mapreduce Parallel Implementation on Multicore

School of Economics and Management, Beihang University, Beijing 100191, China
Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, Beijing 100191, China
School of Statistics and Mathematics, Central University of Finance and Economics, Beijing 100081, China
CEREMADE, Paris-Dauphine University, 75775 Paris, France
MAPMO, University of Orleans, 45067 Orleans, France
Author to whom correspondence should be addressed.
Received: 7 April 2018 / Revised: 28 April 2018 / Accepted: 11 May 2018 / Published: 15 May 2018
In existing principal component analysis (PCA) methods for histogram-valued symbolic data, projection results are approximated using Moore’s algebra and fail to reflect the data’s true structure, mainly because there is no precise, unified method for calculating linear combinations of histogram data. In this paper, we propose a new PCA method for histogram data that differs from well-established methods in that it projects observations onto the space spanned by the principal components more accurately and rapidly, by sampling within a MapReduce framework. The new histogram PCA method is implemented under the same assumption as the existing literature, namely “orthogonal dimensions for every observation”. To project observations, the method first samples from the original histogram variables to obtain single-valued data, on which linear combination operations can be performed. The projection of each observation is then given by the linear combination of the loading vectors and the single-valued samples, which closely approximates the exact projection. Finally, the projections are summarized as histogram data. These procedures involve complex algorithms and large-scale data, which makes the new method time-consuming. To speed it up, we implement the new method in parallel in a multicore MapReduce framework. A simulation study and an empirical study confirm that the new method is effective and time-saving.
Keywords: histogram-valued symbolic data; principal component analysis; sampling; MapReduce; parallel
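
The sample, project, and summarize steps described in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. Each histogram is assumed to be a pair of bin edges and bin weights, values are assumed uniform within a bin, and the variables of one observation are sampled independently. The loading matrix and centering vector are taken as given inputs, since the abstract does not say how they are estimated. All function and parameter names (`sample_histogram`, `project_observation`, `histogram_pca_project`, `mapper`) are hypothetical.

```python
import numpy as np


def sample_histogram(edges, weights, n, rng):
    """Draw n values: pick a bin by its weight, then a uniform point inside it."""
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bins = rng.choice(len(weights), size=n, p=weights / weights.sum())
    return rng.uniform(edges[bins], edges[bins + 1])


def project_observation(task):
    """Map step (assumed design): sample one observation, project, summarize."""
    histograms, loadings, center, n_samples, n_bins, seed = task
    rng = np.random.default_rng(seed)
    # Single-valued samples, one column per histogram variable,
    # drawn independently per variable (an assumption of this sketch).
    samples = np.column_stack(
        [sample_histogram(e, w, n_samples, rng) for e, w in histograms]
    )
    # Linear combination of loading vectors and samples.
    scores = (samples - center) @ loadings
    # Summarize each projected coordinate back into a histogram.
    summary = []
    for k in range(scores.shape[1]):
        counts, edges = np.histogram(scores[:, k], bins=n_bins)
        summary.append((edges, counts / counts.sum()))
    return summary


def histogram_pca_project(observations, loadings, center,
                          n_samples=1000, n_bins=10, mapper=map):
    """Project every observation; `mapper` stands in for the parallel map."""
    tasks = [
        (obs, loadings, center, n_samples, n_bins, seed)
        for seed, obs in enumerate(observations)
    ]
    return list(mapper(project_observation, tasks))
```

Because each observation is processed independently, the built-in `map` used by default could be swapped for a process-based one, for example the `map` method of a `multiprocessing.Pool`, to spread observations across cores. That substitution is only a stand-in for the multicore MapReduce framework named in the abstract.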

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Wang, C.; Wang, H.; Wang, S.; Diday, E.; Emilion, R. Sampling Based Histogram PCA and Its Mapreduce Parallel Implementation on Multicore. Symmetry 2018, 10, 162.

Symmetry, EISSN 2073-8994, published by MDPI AG, Basel, Switzerland