Open Access Article
Appl. Sci. 2017, 7(9), 951; doi:10.3390/app7090951

Practical Challenge of Shredded Documents: Clustering of Chinese Homologous Pieces

1
School of Physics and Optoelectronic Engineering, Xidian University, Xi’an 710071, China
2
School of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China
*
Author to whom correspondence should be addressed.
Received: 20 July 2017 / Revised: 7 September 2017 / Accepted: 12 September 2017 / Published: 15 September 2017
(This article belongs to the Section Computer Science and Electrical Engineering)

Abstract

When recovering a shredded document from a large number of mixed pieces, the difficulty of recovery can be reduced by clustering, that is, by grouping the pieces that originally belonged to the same page. Restoring homologous shredded documents (pieces from different pages of the same file) is a frequent problem, and because these pieces have nearly indistinguishable visual characteristics, grouping them is extremely difficult. Because homologous pieces are ubiquitous, research on clustering them has important practical significance for document recovery. Given the wide use of Chinese and the large demand for recovering shredded Chinese documents, our research focuses on Chinese homologous pieces. In this paper, we propose a method for completely clustering Chinese homologous pieces, in which the distribution features of the characters in the pieces and the document layout are used to correlate adjacent pieces and cluster them within different areas of a document. Experimental results show that the proposed method clusters real pieces well: for a dataset of 10 document pages (462 pieces in total), its average accuracy is 97.19%.
Keywords: shredded documents; homologous pieces; document layout; subarea clustering; digital forensics
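
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of using character distribution to correlate pieces, not the authors' method. It assumes each piece is a small binarized image (1 = ink, 0 = paper) and that the pieces compared come from the same horizontal band of the page. Under those assumptions, the rows where text lines begin act as a simple layout signature, and pieces with matching signatures are grouped greedily. All function names and the `tol` parameter are hypothetical.

```python
from typing import List, Sequence

Piece = Sequence[Sequence[int]]  # binarized piece: rows of 0/1 pixels (assumed input format)


def row_profile(piece: Piece) -> List[int]:
    """Count ink pixels (value 1) in each pixel row of the piece."""
    return [sum(row) for row in piece]


def text_line_rows(profile: Sequence[int], min_ink: int = 1) -> List[int]:
    """Return the first row index of every run of rows that contain ink."""
    starts: List[int] = []
    in_line = False
    for y, ink in enumerate(profile):
        if ink >= min_ink and not in_line:
            starts.append(y)  # a new text line begins at this row
            in_line = True
        elif ink < min_ink:
            in_line = False   # blank row: the current text line has ended
    return starts


def same_page_candidate(a: Sequence[int], b: Sequence[int], tol: int = 1) -> bool:
    """Heuristic: signatures match if they have equally many lines at nearly equal rows."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def cluster_pieces(pieces: Sequence[Piece], tol: int = 1) -> List[List[int]]:
    """Greedy grouping: each piece joins the first cluster whose signature matches."""
    clusters = []  # list of (representative signature, member indices)
    for idx, piece in enumerate(pieces):
        signature = text_line_rows(row_profile(piece))
        for representative, members in clusters:
            if same_page_candidate(representative, signature, tol):
                members.append(idx)
                break
        else:
            clusters.append((signature, [idx]))
    return [members for _, members in clusters]
```

A real system would need far more than this (skew correction, character-level features, handling of blank pieces), and a greedy pass depends on the order in which pieces arrive. The sketch only shows why text-line positions can separate pages whose pieces otherwise look alike.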
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Xing, N.; Zhang, J.; Cao, F.; Liu, P. Practical Challenge of Shredded Documents: Clustering of Chinese Homologous Pieces. Appl. Sci. 2017, 7, 951.

Appl. Sci. EISSN 2076-3417. Published by MDPI AG, Basel, Switzerland.