Self-Writer: Clusterable Embedding Based Self-Supervised Writer Recognition from Unlabeled Data
Abstract
1. Introduction
- We introduce Self-Writer, a self-supervised writer-recognition strategy based on generating clusterable embeddings. The training procedure learns directly from unlabeled data.
- To train the Siamese architecture, we use hypothesis-based pairwise constraints and nongenerative augmentation. The AutoEmbedder framework and nongenerative augmentation concentrate on the actual feature relationships rather than solely on the hypothetical constraints (a minimal sketch of this pairwise setup follows this list).
- Two intercluster-based strategies, the triplet and pairwise architectures, evaluate the proposed policy and lead to the conclusion that a DL architecture can distinguish writers from pseudolabels based on feature similarity.
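As a concrete illustration of this pairwise setup, the sketch below builds a weight-shared Siamese embedding network in the style of the AutoEmbedder framework [19]: the model outputs the Euclidean distance between the two embeddings, clipped at a distance hyperparameter, and is trained with mean squared error so that can-link pairs are pulled toward 0 and cannot-link pairs toward the hyperparameter. The encoder backbone, the input size, and the value `ALPHA = 100` are illustrative assumptions, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

ALPHA = 100.0  # assumed distance hyperparameter: can-link target 0, cannot-link target ALPHA


def build_encoder(input_shape=(64, 64, 1), embed_dim=32):
    """Small illustrative CNN that maps a text block to an embedding vector."""
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPool2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPool2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(embed_dim)(x)
    return Model(inp, x, name="encoder")


def build_pairwise_model(encoder):
    """Weight-shared Siamese model whose output is the Euclidean distance between
    the two embeddings, clipped at ALPHA (AutoEmbedder-style pairwise head)."""
    in_a = layers.Input(shape=encoder.input_shape[1:])
    in_b = layers.Input(shape=encoder.input_shape[1:])
    emb_a, emb_b = encoder(in_a), encoder(in_b)
    dist = layers.Lambda(
        lambda t: tf.clip_by_value(
            tf.norm(t[0] - t[1], axis=-1, keepdims=True), 0.0, ALPHA
        )
    )([emb_a, emb_b])
    return Model([in_a, in_b], dist, name="pairwise_model")


encoder = build_encoder()
pair_model = build_pairwise_model(encoder)
pair_model.compile(optimizer="adam", loss="mse")  # regress distances toward 0 or ALPHA
```

Clipping the distance at the hyperparameter keeps cannot-link pairs from being pushed arbitrarily far apart, which is what makes the resulting embeddings clusterable rather than merely separable.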
2. Related Work
3. Methodology
3.1. Data Preprocessing
3.2. Self-Supervision Task
3.3. Paper’s Assumptions
3.4. Pairwise Constraints
3.5. Uncertainty of Pairwise Constraints
- Error in cannot-link constraints: Consider an input pair of text blocks $(x_i, x_j)$ assigned to two different clusters, $x_i \in C_p$ and $x_j \in C_q$ with $p \neq q$. Because the cluster assignment is based on manuscripts, and the number of manuscripts outnumbers the actual number of writers, the hypothesis may contradict the ground truth: the two text blocks could in fact belong to the same writer.
- Impurity in can-link constraints: The underlying assumption of the dataset is that a handwritten manuscript comprises only one person's writing. In practice, a manuscript may be misattributed and contain the writing of several individuals. Let the input pair $(x_i, x_j)$ belong to the same manuscript, so that $x_i, x_j \in C_p$. If the manuscript is impure, the cluster assignment may be wrong, and the pair may belong to different writers. A minimal sketch of generating such hypothesis-based constraints follows this list.
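The sketch below shows how such hypothesis-based pairwise targets could be generated from manuscript identifiers alone, using the convention of the earlier sketch (can-link target 0, cannot-link target `ALPHA`). The function names, the random sampling scheme, and the value of `ALPHA` are illustrative assumptions.

```python
import random

ALPHA = 100.0  # assumed cannot-link target distance; the can-link target is 0


def pairwise_target(manuscript_a: str, manuscript_b: str) -> float:
    """Hypothesis-based pseudolabel: text blocks from the same manuscript are
    can-link (target 0), blocks from different manuscripts are cannot-link
    (target ALPHA). Both constraints can be wrong, as discussed above."""
    return 0.0 if manuscript_a == manuscript_b else ALPHA


def sample_pairs(blocks, n_pairs, rng=random):
    """Sample (block_a, block_b, target) triples from a list of
    (image, manuscript_id) tuples."""
    triples = []
    for _ in range(n_pairs):
        (img_a, ms_a), (img_b, ms_b) = rng.sample(blocks, 2)
        triples.append((img_a, img_b, pairwise_target(ms_a, ms_b)))
    return triples
```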
3.6. AutoEmbedder Architecture
3.7. Augmenting Training Data
Algorithm 1: Self-Writer training algorithm
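The following is a hedged sketch of a Self-Writer-style training loop that composes the illustrative pieces sketched earlier (hypothesis-based pair sampling, nongenerative augmentation, and the Siamese distance model). It is not the authors' Algorithm 1; `sample_pairs`, `augment`, and `pair_model` are assumed callables from those sketches, and `augment` is any function returning an augmented view of a text block. After training, the encoder's embeddings can be clustered, for example with k-means, to obtain writer assignments.

```python
import numpy as np


def train_self_writer(blocks, pair_model, sample_pairs, augment,
                      epochs=50, pairs_per_epoch=10_000, batch_size=128):
    """Illustrative loop: sample hypothesis-based pairs from (image, manuscript_id)
    tuples, augment both views, and regress the Siamese distance toward the
    pairwise targets (0 for can-link, ALPHA for cannot-link)."""
    for epoch in range(epochs):
        pairs = sample_pairs(blocks, pairs_per_epoch)
        x_a = np.stack([augment(a) for a, _, _ in pairs])
        x_b = np.stack([augment(b) for _, b, _ in pairs])
        y = np.array([t for _, _, t in pairs], dtype="float32")
        hist = pair_model.fit([x_a, x_b], y, batch_size=batch_size,
                              epochs=1, verbose=0)
        print(f"epoch {epoch + 1:3d}  mse = {hist.history['loss'][0]:.4f}")
    return pair_model
```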
4. Results
4.1. Evaluation Metrics
- Normalized Mutual Information: The normalized mutual information between the ground-truth writer labels $Y$ and the predicted cluster assignments $C$ can be mathematically defined as $\mathrm{NMI}(Y, C) = \frac{2\, I(Y; C)}{H(Y) + H(C)}$, where $I(\cdot\,;\cdot)$ denotes mutual information and $H(\cdot)$ denotes entropy.
- Accuracy: Accuracy refers to the unsupervised clustering accuracy, expressed as $\mathrm{ACC} = \max_{m \in \mathcal{M}} \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{y_i = m(c_i)\}$, where $y_i$ is the ground-truth label of sample $i$, $c_i$ is its cluster assignment, $n$ is the number of samples, and $\mathcal{M}$ is the set of one-to-one mappings from clusters to labels.
- Adjusted Rand Index: The adjusted Rand index is calculated from the contingency table [56] and can be expressed as $\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \big[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\big] / \binom{n}{2}}{\frac{1}{2}\big[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\big] - \big[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\big] / \binom{n}{2}}$. Here, $n_{ij}$, $a_i$, and $b_j$ are the values of the contingency table produced by the Self-Writer, with $a_i$ and $b_j$ the row and column sums and $n$ the total number of samples. All three metrics can be computed with standard libraries, as sketched after this list.
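The sketch below assumes scikit-learn and SciPy and uses a toy pair of label arrays; the Hungarian-matching accuracy follows the definition given above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy: find the best one-to-one mapping between
    cluster indices and ground-truth labels with the Hungarian algorithm."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    counts = np.zeros((n, n), dtype=np.int64)      # counts[cluster, label]
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size


# Toy example: three writers, cluster indices permuted relative to the labels.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])
print("NMI:", normalized_mutual_info_score(y_true, y_pred))
print("ACC:", clustering_accuracy(y_true, y_pred))
print("ARI:", adjusted_rand_score(y_true, y_pred))
```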
4.2. Datasets
4.2.1. IAM
4.2.2. CVL
4.3. Results and Comparison
5. Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Khan, F.A.; Khelifi, F.; Tahir, M.A.; Bouridane, A. Dissimilarity Gaussian mixture models for efficient offline handwritten text-independent identification using SIFT and RootSIFT descriptors. IEEE Trans. Inf. Forensics Secur. 2018, 14, 289–303.
- Tapiador, M.; Gómez, J.; Sigüenza, J.A. Writer identification forensic system based on support vector machines with connected components. In Proceedings of the International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Berlin/Heidelberg, Germany, 17 May 2004; pp. 625–632.
- Fornés, A.; Lladós, J.; Sánchez, G.; Bunke, H. Writer identification in old handwritten music scores. In Proceedings of the 2008 Eighth IAPR International Workshop on Document Analysis Systems, Nara, Japan, 16–19 September 2008; pp. 347–353.
- Fornés, A.; Lladós, J.; Sánchez, G.; Bunke, H. On the use of textural features for writer identification in old handwritten music scores. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Catalonia, Spain, 26–29 July 2009; pp. 996–1000.
- Ballard, L.; Lopresti, D.; Monrose, F. Evaluating the security of handwriting biometrics. In Proceedings of the Tenth International Workshop on Frontiers in Handwriting Recognition, La Baule, France, 23–26 October 2006.
- Xing, L.; Qiao, Y. Deepwriter: A multi-stream deep CNN for text-independent writer identification. In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 23–26 October 2016; pp. 584–589.
- Sulaiman, A.; Omar, K.; Nasrudin, M.F.; Arram, A. Length independent writer identification based on the fusion of deep and hand-crafted descriptors. IEEE Access 2019, 7, 91772–91784.
- Doersch, C.; Zisserman, A. Multi-task self-supervised visual learning. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2051–2060.
- Zhai, X.; Oliver, A.; Kolesnikov, A.; Beyer, L. S4l: Self-supervised semi-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 1476–1485.
- Doersch, C.; Gupta, A.; Efros, A.A. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1422–1430.
- Gidaris, S.; Bursuc, A.; Komodakis, N.; Pérez, P.; Cord, M. Boosting few-shot visual learning with self-supervision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 8059–8068.
- Baevski, A.; Zhou, H.; Mohamed, A.; Auli, M. wav2vec 2.0: A framework for self-supervised learning of speech representations. arXiv 2020, arXiv:2006.11477.
- Collobert, R.; Weston, J. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 160–167.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692.
- Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
- Ohi, A.Q.; Mridha, M.F.; Safir, F.B.; Hamid, M.A.; Monowar, M.M. Autoembedder: A semi-supervised DNN embedding system for clustering. Knowl.-Based Syst. 2020, 204, 106190.
- Janiesch, C.; Zschech, P.; Heinrich, K. Machine learning and deep learning. Electron. Mark. 2021, 31, 685–695.
- He, Z.; Fang, B.; Du, J.; Tang, Y.Y.; You, X. A novel method for offline handwriting-based writer identification. In Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR’05), Seoul, Korea, 29 August–1 September 2005; pp. 242–246.
- Helli, B.; Moghaddam, M.E. A text-independent Persian writer identification based on feature relation graph (FRG). Pattern Recognit. 2010, 43, 2199–2209.
- He, Z.; Tang, Y. Chinese handwriting-based writer identification by texture analysis. In Proceedings of the 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 04EX826), Shanghai, China, 26–29 August 2004; Volume 6, pp. 3488–3491.
- Zhu, Y.; Tan, T.; Wang, Y. Biometric personal identification based on handwriting. In Proceedings of the 15th International Conference on Pattern Recognition, ICPR-2000, Barcelona, Spain, 3–8 September 2000; Volume 2, pp. 797–800.
- Schlapbach, A.; Bunke, H. A writer identification and verification system using HMM based recognizers. Pattern Anal. Appl. 2007, 10, 33–43.
- Anwar, W.; Bajwa, I.S.; Ramzan, S. Design and implementation of a machine learning-based authorship identification model. Sci. Program. 2019, 2019, 9431073.
- Zheng, W.; Liu, X.; Ni, X.; Yin, L.; Yang, B. Improving visual reasoning through semantic representation. IEEE Access 2021, 9, 91476–91486.
- Zheng, W.; Yin, L.; Chen, X.; Ma, Z.; Liu, S.; Yang, B. Knowledge base graph embedding module design for Visual question answering model. Pattern Recognit. 2021, 120, 108153.
- Christlein, V.; Bernecker, D.; Maier, A.; Angelopoulou, E. Offline writer identification using convolutional neural network activation features. In Proceedings of the German Conference on Pattern Recognition, Hannover, Germany, 12–15 September 2015; pp. 540–552.
- Zhang, X.Y.; Xie, G.S.; Liu, C.L.; Bengio, Y. End-to-end online writer identification with recurrent neural network. IEEE Trans. Hum.-Mach. Syst. 2016, 47, 285–292.
- Semma, A.; Hannad, Y.; Siddiqi, I.; Djeddi, C.; El Kettani, M.E.Y. Writer identification using deep learning with fast keypoints and harris corner detector. Expert Syst. Appl. 2021, 184, 115473.
- Fiel, S.; Sablatnig, R. Writer identification and retrieval using a convolutional neural network. In Proceedings of the International Conference on Computer Analysis of Images and Patterns, Valletta, Malta, 2–4 September 2015; pp. 26–37.
- He, S.; Schomaker, L. Deep adaptive learning for writer identification based on single handwritten word images. Pattern Recognit. 2019, 88, 64–74.
- He, S.; Schomaker, L. Fragnet: Writer identification using deep fragment networks. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3013–3022.
- Zheng, W.; Tian, X.; Yang, B.; Liu, S.; Ding, Y.; Tian, J.; Yin, L. A few shot classification methods based on multiscale relational networks. Appl. Sci. 2022, 12, 4059.
- Zhang, P. RSTC: A New Residual Swin Transformer For Offline Word-Level Writer Identification. IEEE Access 2022, 10, 57452–57460.
- Chen, Z.; Yu, H.X.; Wu, A.; Zheng, W.S. Level online writer identification. Int. J. Comput. Vis. 2021, 129, 1394–1409.
- Christlein, V.; Gropp, M.; Fiel, S.; Maier, A. Unsupervised feature learning for writer identification and writer retrieval. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; Volume 1, pp. 991–997.
- Chen, S.; Wang, Y.; Lin, C.T.; Ding, W.; Cao, Z. Semi-supervised feature learning for improving writer identification. Inf. Sci. 2019, 482, 156–170.
- Zhang, R.; Isola, P.; Efros, A.A. Colorful image colorization. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 649–666.
- Walker, J.; Gupta, A.; Hebert, M. Dense optical flow prediction from a static image. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2443–2451.
- Walker, J.; Doersch, C.; Gupta, A.; Hebert, M. An uncertain future: Forecasting from static images using variational autoencoders. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 835–851.
- Guo, Q.; Feng, W.; Zhou, C.; Huang, R.; Wan, L.; Wang, S. Learning dynamic siamese network for visual object tracking. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1763–1771.
- Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544.
- Larsson, G.; Maire, M.; Shakhnarovich, G. Learning representations for automatic colorization. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 577–593.
- Dosovitskiy, A.; Springenberg, J.T.; Riedmiller, M.; Brox, T. Discriminative unsupervised feature learning with convolutional neural networks. Adv. Neural Inf. Process. Syst. 2014, 27, 766–774.
- Shi, Y.; Xu, X.; Xi, J.; Hu, X.; Hu, D.; Xu, K. Learning to detect 3D symmetry from single-view RGB-D images with weak supervision. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 1–15.
- Noroozi, M.; Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 69–84.
- Li, Y.; Zheng, Y.; Doermann, D.; Jaeger, S. Script-independent text line segmentation in freestyle handwritten documents. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1313–1329.
- Malik, S.; Sajid, A.; Ahmad, A.; Almogren, A.; Hayat, B.; Awais, M.; Kim, K.H. An efficient skewed line segmentation technique for cursive script OCR. Sci. Program. 2020, 2020, 8866041.
- Zheng, W.; Liu, X.; Yin, L. Sentence representation method based on multi-layer semantic network. Appl. Sci. 2021, 11, 1316.
- Marti, U.V.; Bunke, H. The IAM-database: An English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recognit. 2002, 5, 39–46.
- Kleber, F.; Fiel, S.; Diem, M.; Sablatnig, R. Cvl-database: An off-line database for writer retrieval, writer identification and word spotting. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013; pp. 560–564.
- Mridha, M.F.; Ohi, A.Q.; Ali, M.A.; Emon, M.I.; Kabir, M.M. BanglaWriting: A multi-purpose offline Bangla handwriting dataset. Data Brief 2021, 34, 106633.
- Jaiswal, A.; Babu, A.R.; Zadeh, M.Z.; Banerjee, D.; Makedon, F. A survey on contrastive self-supervised learning. Technologies 2020, 9, 2.
- Santos, J.M.; Embrechts, M. On the use of the adjusted rand index as a metric for evaluating supervised classification. In Proceedings of the International Conference on Artificial Neural Networks, Limassol, Cyprus, 14–17 September 2009; pp. 175–184.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Hermans, A.; Beyer, L.; Leibe, B. In defense of the triplet loss for person re-identification. arXiv 2017, arXiv:1703.07737.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Notation | Description
---|---
D | A set of manuscripts of handwritten text. We assume that most manuscripts contain the handwriting of a single individual.
X | A single handwritten manuscript, X ∈ D.
 | A text block, generated by taking a non-overlapping sliding window over the line-segmented images of a manuscript; each text block belongs to exactly one manuscript (a sliding-window sketch follows the table).
M | The number of possible text blocks in a manuscript.
C | A set of clusters, constructed by utilizing the hypothetical cluster network. Because cluster correlation is established on manuscript relations, the number of clusters can be taken to be equal to the number of manuscripts.
 | A subset of the entire set of clusters; it denotes a cluster constructed from the interrelationship of the text blocks of a single manuscript.
N | The number of writers in the dataset, considering the ground truth.
 | The pairwise [19] architecture's distance hyperparameter. In other architectures, it may denote the state of connectivity between any two cluster nodes.
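As an illustration of the text-block definition above, a non-overlapping sliding window over a line-segmented image could be implemented as follows; the grayscale input, the block width of 64 pixels, and the white-padding policy for a trailing partial block are assumptions.

```python
import numpy as np


def extract_text_blocks(line_image: np.ndarray, block_width: int = 64):
    """Split a line-segmented grayscale image (H x W) into non-overlapping blocks
    of shape (H x block_width); a trailing partial block is right-padded with
    white pixels (value 255)."""
    h, w = line_image.shape[:2]
    blocks = []
    for start in range(0, w, block_width):
        block = line_image[:, start:start + block_width]
        if block.shape[1] < block_width:                  # pad the last block
            pad = block_width - block.shape[1]
            block = np.pad(block, ((0, 0), (0, pad)), constant_values=255)
        blocks.append(block)
    return blocks
```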
OneOf Block | Transformation | Description | Probability of Transformation | Probability of OneOf Block | Augmentation Probability
---|---|---|---|---|---
OneOf | Flip | Flip the input either horizontally or vertically. | 0.5 | 0.5 | 0.5
 | Crop and Pad | Randomly crop the input image and pad it based on image-size fractions. | 0.5 | |
OneOf | Downscale | Decrease image quality by downscaling and then upscaling back. | 0.3 | 0.5 |
 | Gaussian Blur | Apply a Gaussian filter with a random kernel size to blur the input image. | 0.3 | |
 | Motion Blur | Apply motion blur to the input image using a random-sized kernel. | 0.3 | |
OneOf | Multiplicative Noise | Multiply the image by a random number or an array of numbers. | 0.3 | 0.5 |
 | Random Brightness Contrast | Randomly change the brightness and contrast of the input image. | 0.3 | |
 | Gaussian Noise | Apply Gaussian noise to the input image. | 0.3 | |
OneOf | Pixel Dropout | Set pixels to 0 with some probability. | 0.5 | 0.5 |
 | Coarse Dropout | Coarsely drop out rectangular regions of the image. | 0.5 | |
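The transformation names and OneOf grouping in the table closely match the Albumentations library. A minimal sketch of such a pipeline, assuming Albumentations, is given below; the probabilities follow the table, while the transform-specific parameters are illustrative assumptions.

```python
import albumentations as A

# Illustrative reconstruction of the augmentation table above.
augment = A.Compose([
    A.OneOf([
        A.Flip(p=0.5),                          # horizontal or vertical flip
        A.CropAndPad(percent=(-0.1, 0.1), p=0.5),
    ], p=0.5),
    A.OneOf([
        A.Downscale(p=0.3),
        A.GaussianBlur(p=0.3),
        A.MotionBlur(p=0.3),
    ], p=0.5),
    A.OneOf([
        A.MultiplicativeNoise(p=0.3),
        A.RandomBrightnessContrast(p=0.3),
        A.GaussNoise(p=0.3),
    ], p=0.5),
    A.OneOf([
        A.PixelDropout(p=0.5),
        A.CoarseDropout(p=0.5),
    ], p=0.5),
], p=0.5)

# Usage: augmented_block = augment(image=text_block)["image"]
```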
Writers | Task | NMI (Impurity = 0) | ACC (Impurity = 0) | ARI (Impurity = 0) | NMI (Impurity = 0.05) | ACC (Impurity = 0.05) | ARI (Impurity = 0.05) | NMI (Impurity = 0.1) | ACC (Impurity = 0.1) | ARI (Impurity = 0.1)
---|---|---|---|---|---|---|---|---|---|---
25 writers | Pretext task | 0.801 | 0.463 | 0.334 | 0.791 | 0.422 | 0.352 | 0.778 | 0.430 | 0.372
25 writers | Ground task | 0.956 | 0.948 | 0.912 | 0.898 | 0.854 | 0.848 | 0.856 | 0.807 | 0.792
50 writers | Pretext task | 0.849 | 0.452 | 0.351 | 0.845 | 0.432 | 0.348 | 0.834 | 0.424 | 0.345
50 writers | Ground task | 0.988 | 0.969 | 0.934 | 0.958 | 0.943 | 0.897 | 0.903 | 0.861 | 0.801
100 writers | Pretext task | 0.841 | 0.419 | 0.309 | 0.7895 | 0.394 | 0.310 | 0.731 | 0.398 | 0.312
100 writers | Ground task | 0.901 | 0.841 | 0.813 | 0.876 | 0.823 | 0.801 | 0.851 | 0.816 | 0.794
127 writers | Pretext task | 0.836 | 0.404 | 0.301 | 0.779 | 0.396 | 0.294 | 0.711 | 0.382 | 0.299
127 writers | Ground task | 0.898 | 0.817 | 0.798 | 0.847 | 0.794 | 0.776 | 0.816 | 0.787 | 0.741
Writers | Task | NMI (Impurity = 0) | ACC (Impurity = 0) | ARI (Impurity = 0) | NMI (Impurity = 0.05) | ACC (Impurity = 0.05) | ARI (Impurity = 0.05) | NMI (Impurity = 0.1) | ACC (Impurity = 0.1) | ARI (Impurity = 0.1)
---|---|---|---|---|---|---|---|---|---|---
25 writers | Pretext task | 0.786 | 0.372 | 0.228 | 0.811 | 0.384 | 0.246 | 0.771 | 0.368 | 0.250
25 writers | Ground task | 0.943 | 0.910 | 0.908 | 0.899 | 0.862 | 0.857 | 0.907 | 0.850 | 0.810
50 writers | Pretext task | 0.800 | 0.368 | 0.231 | 0.800 | 0.362 | 0.246 | 0.800 | 0.374 | 0.268
50 writers | Ground task | 0.974 | 0.941 | 0.919 | 0.930 | 0.901 | 0.845 | 0.914 | 0.867 | 0.816
100 writers | Pretext task | 0.786 | 0.352 | 0.228 | 0.764 | 0.340 | 0.219 | 0.744 | 0.336 | 0.214
100 writers | Ground task | 0.908 | 0.871 | 0.819 | 0.894 | 0.846 | 0.770 | 0.861 | 0.811 | 0.784
150 writers | Pretext task | 0.753 | 0.337 | 0.216 | 0.727 | 0.312 | 0.178 | 0.703 | 0.297 | 0.154
150 writers | Ground task | 0.846 | 0.793 | 0.764 | 0.824 | 0.781 | 0.743 | 0.816 | 0.775 | 0.725