Open Access Article

When Considering More Elements: Attribute Correlation in Unsupervised Data Cleaning under Blocking

Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China
Symmetry 2019, 11(4), 575; https://doi.org/10.3390/sym11040575
Received: 24 February 2019 / Revised: 30 March 2019 / Accepted: 11 April 2019 / Published: 19 April 2019

Abstract

In banks, governments, and internet companies, the growing demand for data in information systems and the continual shortening of data collection and update cycles can give rise to a variety of data quality issues in a database. As data scales expand, methods such as pre-specifying business rules or introducing expert experience into the repair process are no longer applicable to information systems that require rapid responses. In this paper, we divided data cleaning into supervised and unsupervised forms according to whether the repair process involves intervention, and we put forward a new dimension suitable for unsupervised cleaning. For weak logic errors in unsupervised data cleaning, we proposed an attribute correlation-based (ACB) framework under blocking and designed three different data blocking methods to reduce the time complexity and to test the impact of clustering accuracy on data cleaning. The experiments showed that the blocking methods could effectively reduce the repair time while maintaining repair validity. Moreover, we concluded that blocking methods with too high a clustering accuracy tended to put tuples with the same elements into one data block, which reduced the cleaning ability. In summary, the ACB-Framework with blocking can reduce the time cost and needs neither the guidance of domain knowledge nor intervention during repair, so it can be applied in information systems requiring rapid responses, such as internet web pages, network servers, and sensor information acquisition.
Keywords: data quality; unsupervised data cleaning; attribute correlation; data blocking; machine learning
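
To make the idea of repairing under blocking more concrete, the following is a minimal sketch and not the ACB-Framework itself, whose details are not given in the abstract. It assumes a simple exact-match blocking key and a majority-vote repair between two correlated attributes; the function names `block_by` and `repair_in_blocks`, the `min_support` parameter, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict


def block_by(rows, key):
    """Group row indices by the value of one attribute (a simple blocking key)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[row[key]].append(i)
    return blocks


def repair_in_blocks(rows, key, determinant, target, min_support=2):
    """Illustrative repair: inside each block, rows that share a `determinant`
    value are set to the strict-majority `target` value, if one exists.

    This is an assumed stand-in for correlation-based repair, not the
    method proposed in the paper.
    """
    repaired = [dict(r) for r in rows]  # copy so the input stays untouched
    for indices in block_by(rows, key).values():
        # Only rows in the same block are compared, which bounds the work.
        groups = defaultdict(list)
        for i in indices:
            groups[rows[i][determinant]].append(i)
        for members in groups.values():
            counts = Counter(rows[i][target] for i in members)
            value, support = counts.most_common(1)[0]
            # Repair only when the dominant value is well supported.
            if support >= min_support and support * 2 > len(members):
                for i in members:
                    repaired[i][target] = value
    return repaired


# Hypothetical sample data with one misspelled city name.
rows = [
    {"state": "CA", "zip": "94105", "city": "San Francisco"},
    {"state": "CA", "zip": "94105", "city": "San Francisco"},
    {"state": "CA", "zip": "94105", "city": "San Fransisco"},
    {"state": "NY", "zip": "10001", "city": "New York"},
]
cleaned = repair_in_blocks(rows, key="state", determinant="zip", target="city")
```

In this sketch the misspelled value in the third record is replaced by the majority value found within its block, while the single New York record is left alone because it lacks enough support. No business rules or expert input are used, which mirrors the unsupervised setting described in the abstract.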
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
Li, P.; Dai, C.; Wang, W. When Considering More Elements: Attribute Correlation in Unsupervised Data Cleaning under Blocking. Symmetry 2019, 11, 575.
