PBNet: Combining Transformer and CNN in Passport Background Texture Printing Image Classification
Abstract
1. Introduction
1.1. Images with Different Local Detail Features and Decentralized Global Features
1.2. Existing Models Are Difficult to Deploy on Mobile Devices
1.3. Sample Imbalance
- Passport Background Texture Dataset: We constructed a passport background texture image dataset that includes four types of passport background textures from various countries.
- Parallel Hybrid Architecture for Passport Background Texture Image Classification: We introduce a new parallel hybrid architecture that combines a CNN and a Transformer. The CNN supplies comprehensive local features to the Transformer through its bottom–up extraction process, while the Transformer enhances the CNN's feature extraction with its top–down attention mechanism. This two-way interaction strengthens the collaboration between the branches and allows the CNN, which excels at extracting local features, to better exploit the global features the Transformer extracts, improving overall performance (see the illustrative sketch after this list).
- Cross-Branch Interaction via Feature Enhancement Module: To improve the model's nonlinear representation and feature representation capabilities in image classification tasks, we introduce a feature enhancement module that facilitates cross-branch communication and sharpens the model's ability to distinguish between image classes, thereby improving classification accuracy.
- Comprehensive Evaluation and Performance: We evaluated the proposed framework, PBNet, on a self-constructed dataset and compared it with other models. The experimental results show that PBNet achieves superior performance, demonstrating its effectiveness for passport background texture image classification.
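To make the described interaction concrete, the following is a minimal, hypothetical PyTorch sketch of one parallel stage, in which a convolutional branch passes its local features to a Transformer encoder (bottom–up) and the encoder's globally attended features gate the convolutional output (top–down). The module names, fusion scheme, and dimensions are illustrative assumptions, not the authors' exact PBNet implementation.

```python
# Hypothetical sketch of one parallel CNN/Transformer stage (NOT the
# authors' exact PBNet code): the CNN branch passes local features to
# the Transformer branch, and the Transformer's globally attended
# features are fed back to re-weight the CNN branch.
import torch
import torch.nn as nn

class ParallelHybridStage(nn.Module):
    def __init__(self, channels=64, num_heads=4):
        super().__init__()
        self.cnn = nn.Sequential(                      # local-feature branch
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.BatchNorm2d(channels),
            nn.ReLU6(inplace=True),
        )
        self.encoder = nn.TransformerEncoderLayer(     # global-feature branch
            d_model=channels, nhead=num_heads, batch_first=True)
        self.gate = nn.Sequential(                     # top-down feedback gate
            nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.cnn(x)
        tokens = local.flatten(2).transpose(1, 2)      # bottom-up: CNN -> tokens (B, HW, C)
        glob = self.encoder(tokens)                    # global self-attention
        weights = self.gate(glob.mean(dim=1))          # (B, C) channel weights
        fused = local * weights.view(b, c, 1, 1)       # top-down: reweight CNN features
        return fused + x                               # residual connection

x = torch.randn(2, 64, 14, 14)
print(ParallelHybridStage()(x).shape)                  # torch.Size([2, 64, 14, 14])
```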
2. Related Work
2.1. CNN-Based Image Classification Method
2.2. Transformer-Based Image Classification Method
2.3. Deep Learning-Based Passport Detection
3. Methodology
3.1. Research Process
3.2. The Details of the Proposed Model
3.2.1. Inverted Residuals Block
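For reference, the block this section builds on is the standard MobileNet v2 inverted residual: a 1 × 1 expansion, a 3 × 3 depthwise convolution, and a 1 × 1 linear projection, with a skip connection when the stride is 1 and the channel counts match. A minimal PyTorch sketch (hyperparameters illustrative):

```python
# Minimal sketch of the standard MobileNet v2 inverted residual block:
# 1x1 expansion -> 3x3 depthwise convolution -> 1x1 linear projection,
# with a skip connection when shapes allow it.
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),           # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride,        # depthwise
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),          # linear projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out
```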
3.2.2. Feature Enhancement Module
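The paper's exact enhancement design is not reproduced here. As an illustration only, the sketch below implements a generic CBAM-style channel attention step (Woo et al., 2018), one common way to re-weight channels and sharpen class-discriminative features; it should not be read as PBNet's actual module.

```python
# Illustrative only: a generic CBAM-style channel attention step,
# NOT the paper's exact feature enhancement module.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                         # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))        # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))         # global max pooling
        scale = torch.sigmoid(avg + mx)           # per-channel weights
        return x * scale[:, :, None, None]
```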
3.2.3. Focal Loss
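For reference, focal loss (Lin et al.) down-weights well-classified examples so that training focuses on hard, minority-class samples: FL(p_t) = -α(1 - p_t)^γ log(p_t). A minimal multi-class version is sketched below; the α and γ values are illustrative defaults, not necessarily the paper's settings.

```python
# Minimal multi-class focal loss following Lin et al.:
# FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t).
# alpha and gamma here are illustrative defaults, not the paper's settings.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log(p_t)
    pt = torch.exp(-ce)                                      # p_t
    return (alpha * (1 - pt) ** gamma * ce).mean()

logits = torch.randn(8, 4)                 # 8 samples, 4 texture classes
targets = torch.randint(0, 4, (8,))
print(focal_loss(logits, targets))
```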
4. Experiments
4.1. Datasets
- Images must have a resolution exceeding 200 by 200 pixels.
- Photos should be taken using natural lighting.
- The subject of the image must be one of the four background texture classes: transformed design, lithographic printing, laser printing, or inkjet printing.
4.2. Experiment Platform
4.3. Evaluation Criteria
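Assuming the standard definitions of accuracy, precision, recall, and F1 score, the reported metrics can be computed directly from model predictions. The snippet below uses scikit-learn; macro averaging is an illustrative assumption, as the paper's averaging scheme is not restated here.

```python
# Illustrative computation of the four reported metrics with
# scikit-learn; macro averaging is an assumption, not confirmed
# by the paper.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 3, 0, 1]   # ground-truth texture classes
y_pred = [0, 1, 2, 2, 0, 1]   # model predictions

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"Accuracy={acc:.4f} Precision={prec:.4f} Recall={rec:.4f} F1={f1:.4f}")
```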
4.4. Ablation Study
- Model 1: MobileNet v2 is the experimental model.
- Model 2: Transformer is the experimental model.
- Model 3: A parallel structure consisting of a CNN branch and a Transformer branch, where the CNN branch uses the inverted residual blocks of MobileNet v2 and the Transformer branch uses the standard Transformer encoder.
- Model 4: Based on model 3, we replaced the loss function with focal loss.
- Model 5: Based on model 4, we added the feature enhancement module.
4.5. Training Results of Model
4.6. Compare with State-of-the-Art Models
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Ablation study results (Section 4.4):

| Model | CNN | Transformer | Focal Loss | Feature Enhancement Module | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|---|---|
| 1 | ✓ |  |  |  | 0.8230 | 0.6671 | 0.6608 | 0.8201 |
| 2 |  | ✓ |  |  | 0.7210 | 0.7165 | 0.7160 | 0.7185 |
| 3 | ✓ | ✓ |  |  | 0.8540 | 0.8517 | 0.8500 | 0.8490 |
| 4 | ✓ | ✓ | ✓ |  | 0.8490 | 0.8450 | 0.8370 | 0.8380 |
| 5 | ✓ | ✓ | ✓ | ✓ | 0.8810 | 0.8829 | 0.8790 | 0.8799 |
Comparison with state-of-the-art models (Section 4.6):

| Model | Train Acc (%) | Val Acc (%) | Parameters | Inference Time (s) | FLOPs |
|---|---|---|---|---|---|
| VGG | 86.75 | 54.60 | 138.35 M | 0.0157 | 15.48 G |
| ResNet18 | 99.85 | 69.00 | 11.69 M | 0.0111 | 1.82 G |
| MobileNet v2 | 92.83 | 82.30 | 3.50 M | 0.0105 | 0.32 G |
| Transformer | 96.33 | 72.10 | 85.80 M | 0.0139 | 16.85 G |
| Swin Transformer | 98.18 | 75.30 | 27.52 M | 0.0115 | 4.35 G |
| PBNet | 94.67 | 88.10 | 1.05 M | 0.0225 | 43.24 G |
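Parameter counts like those in the table can be reproduced with plain PyTorch, while FLOP estimates typically require a profiler. The sketch below uses the thop package as one common option; this choice is an assumption, not necessarily the tool the authors used.

```python
# Counting trainable parameters with plain PyTorch; the thop profiler
# is one common way to estimate FLOPs (its use here is an assumption,
# not necessarily the authors' measurement tool).
import torch
import torchvision.models as models
from thop import profile  # pip install thop

model = models.mobilenet_v2()
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"parameters: {n_params / 1e6:.2f} M")   # ~3.50 M, matching the table

x = torch.randn(1, 3, 224, 224)
flops, _ = profile(model, inputs=(x,))
print(f"FLOPs: {flops / 1e9:.2f} G")
```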