Peer-Review Record

PBNet: Combining Transformer and CNN in Passport Background Texture Printing Image Classification

Electronics 2024, 13(21), 4160; https://doi.org/10.3390/electronics13214160
by Jiafeng Xu 1,*, Dawei Jia 1, Zhizhe Lin 2,*, Teng Zhou 2,3, Jie Wu 4 and Lin Tang 5
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 24 September 2024 / Revised: 15 October 2024 / Accepted: 21 October 2024 / Published: 23 October 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents a model to enhance the classification of passport background textures. I have the following suggestions for the authors' consideration:

While the authors acknowledge the limitations of the current dataset, it would be beneficial to provide more details on how the dataset can be expanded. Could the authors elaborate on plans for augmenting the dataset with more diverse and complex textures, especially counterfeit examples?

Although the authors analyze misclassified images, additional visualizations showing the Grad-CAM activations for misclassified samples could help readers better understand the limitations of the model and how it could be improved.
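To make this suggestion concrete, below is a minimal sketch of how such overlays could be produced with the open-source pytorch-grad-cam package; the ResNet18 stand-in, dummy batch, and input size are assumptions for illustration, not the authors' actual PBNet code.

```python
import torch
from torchvision.models import resnet18
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# Stand-in backbone for illustration; PBNet would be visualized the same way.
model = resnet18(weights=None).eval()
target_layers = [model.layer4[-1]]        # assumed last convolutional block

cam = GradCAM(model=model, target_layers=target_layers)

# Dummy batch standing in for misclassified passport-texture crops.
images = torch.rand(4, 3, 224, 224)
heatmaps = cam(input_tensor=images)       # (4, 224, 224) maps in [0, 1]

for i in range(images.shape[0]):
    rgb = images[i].permute(1, 2, 0).numpy()            # HWC float in [0, 1]
    overlay = show_cam_on_image(rgb, heatmaps[i], use_rgb=True)
    # overlay is a uint8 heatmap blend; saving it next to the true and
    # predicted labels would show readers where the model looked.
```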

The paper introduces focal loss to address class imbalance, but the explanation of its impact on model performance could be clearer.
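For reference, the standard focal loss of Lin et al. (2017), which the paper adopts, can be written as follows; explaining how its focusing parameter reshapes the loss would address this point.

```latex
% Focal loss (Lin et al., 2017): p_t is the predicted probability of the
% true class, \alpha_t a class-balancing weight, and \gamma \ge 0 the
% focusing parameter; \gamma = 0 recovers weighted cross-entropy.
\mathrm{FL}(p_t) = -\,\alpha_t \,(1 - p_t)^{\gamma} \log(p_t)
```

The modulating factor (1 - p_t)^gamma shrinks the loss of well-classified examples, shifting gradient mass toward hard, typically minority-class samples.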

The paper mentions the use of a lightweight model for deployment on mobile devices. However, it would be useful to discuss the trade-offs between model accuracy and inference time in real-world applications.
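As one way to ground such a trade-off discussion, here is a minimal latency-measurement sketch; the MobileNet V2 stand-in and 224x224 input are assumptions, and real mobile benchmarks would be run on the target hardware.

```python
import time
import torch
from torchvision.models import mobilenet_v2

# Stand-in lightweight model; PBNet itself would be timed the same way.
model = mobilenet_v2(weights=None).eval()
x = torch.rand(1, 3, 224, 224)            # assumed input resolution

with torch.inference_mode():
    for _ in range(10):                   # warm-up to stabilize timings
        model(x)
    runs = 100
    t0 = time.perf_counter()
    for _ in range(runs):
        model(x)
    latency_ms = (time.perf_counter() - t0) * 1000 / runs

print(f"Mean CPU latency: {latency_ms:.1f} ms/image")
```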


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

Your approach is interesting, but I have some comments:

1) Section 3.1: how did you collect your dataset? Is this dataset available for further work?

2) Line 277: you should describe the equation of the sigmoid activation function (a reference form is given after this list).

3) Did you test other methods for handling class imbalance besides focal loss?

4) Figure 4(g): why is there a blue square on the photo?

5) Section 4.3 does not need to include the equations of the usual metrics; they are well known in the literature. You should just cite other works where they are described and investigated in detail. I suggest this: https://www.sciencedirect.com/science/article/pii/S0952197623003330

6) Can you compare your performance with other custom models available in the literature on the same dataset?

7) The conclusion should be supported by results; try to summarize the main achievements from a quantitative point of view.
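Regarding point 2, the sigmoid activation the reviewer asks the authors to write out has the standard logistic form:

```latex
% Logistic sigmoid: squashes any real input into the interval (0, 1).
\sigma(x) = \frac{1}{1 + e^{-x}}
```

Its (0, 1) output range is what makes it a natural choice for gating or probability-style weighting.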

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

There are some potential shortcomings:

1. Limited comparison with state-of-the-art methods specifically designed for passport authentication.

2. Lack of detailed analysis on the model's performance on different types of forgeries.

3. Absence of discussion on the model's robustness against adversarial attacks.


As a reviewer, I would pose the following questions to improve the quality of this research paper:

1. How does the model perform in real-world conditions, such as varying lighting conditions or when dealing with worn or damaged passports?

2. Have you conducted any experiments to test the model's robustness against potential adversarial attacks specific to passport forgery?

3. What is the inference time of PBNet on mobile devices? Can you provide benchmarks comparing it to other lightweight models?

4. How does the model handle cases where the passport background texture is partially obscured or altered?

5. Have you conducted any ablation studies to demonstrate the individual contributions of each component in PBNet?

6. While focal loss helps mitigate the impact of class imbalance, the dataset is highly imbalanced. Could the authors explore additional techniques such as class re-sampling, synthetic minority over-sampling (SMOTE), or other loss functions like weighted cross-entropy to improve the model's performance on minority classes? (A minimal weighted cross-entropy sketch follows this list.)

7. PBNet is compared with several models like MobileNet V2 and ResNet18. Could the authors further extend this comparison by benchmarking against other recent hybrid CNN-Transformer models (e.g., Swin Transformer or Vision Transformers (ViTs)) to provide a broader understanding of PBNet's relative performance?
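As a concrete instance of the alternatives named in question 6, below is a minimal PyTorch sketch of weighted cross-entropy with inverse-frequency class weights; the class counts and batch are placeholder values, not the paper's data.

```python
import torch
import torch.nn as nn

# Placeholder per-class sample counts for an imbalanced dataset.
class_counts = torch.tensor([5000.0, 300.0, 120.0, 80.0])

# Inverse-frequency weights, normalized to average 1 across classes.
weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4)            # dummy model outputs (batch of 8)
targets = torch.randint(0, 4, (8,))   # dummy ground-truth labels
loss = criterion(logits, targets)     # errors on rare classes cost more
```

The re-sampling route could instead use torch.utils.data.WeightedRandomSampler, while SMOTE-style oversampling (e.g., via the imbalanced-learn package) operates on feature vectors rather than raw images.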

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The paper "PBNet: Combining Transformer and CNN in Passport Background Texture Printing Image Classification" describes a unique method for automating passport background texture classification using Convolutional Neural Networks (CNN) and Transformer models. The authors propose PBNet, a parallel hybrid network architecture that combines CNN's local feature extraction capabilities with Transformers' global feature dependency recognition. The study discusses issues such as sample imbalance, difficulty deploying models on mobile devices, and the intrinsic complexity of passport texture photos. To address these issues, PBNet uses a Feature Enhancement Module (FEM) to improve the interaction between CNN and Transformer features, as well as focal loss to deal with imbalanced datasets. Experimental results show that PBNet beats various cutting-edge models in terms of classification accuracy while remaining computationally efficient, making it appropriate for deployment in resource-constrained contexts. However, the authors should add FLOPs alongside parameters. The sota methods are not up to date. I would recommend applying for the papers listed below.
1. Lee, G.Y.; Dam, T.; Ferdaus, M.M.; Poenar, D.P.; Duong, V.N. WATT-EffNet: A Lightweight and Accurate Model for Classifying Aerial Disaster Images. IEEE Geoscience and Remote Sensing Letters 2023, 20, 1-5.

2. Kyrkou, C.; Theocharides, T. EmergencyNet: Efficient Aerial Image Classification for Drone-Based Emergency Monitoring Using Atrous Convolutional Feature Fusion. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2020, 13, 1687-1699.
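To illustrate the FLOPs request above, here is a minimal profiling sketch using fvcore; the MobileNet V2 stand-in and input size are assumptions, and fvcore counts fused multiply-adds as single operations.

```python
import torch
from torchvision.models import mobilenet_v2
from fvcore.nn import FlopCountAnalysis, parameter_count

# Stand-in model; PBNet would be profiled the same way.
model = mobilenet_v2(weights=None).eval()
dummy = torch.rand(1, 3, 224, 224)        # assumed input resolution

flops = FlopCountAnalysis(model, dummy)   # counts fused multiply-adds
print(f"GFLOPs: {flops.total() / 1e9:.2f}")
print(f"Params (M): {parameter_count(model)[''] / 1e6:.2f}")
```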


Comments on the Quality of English Language

NA

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors addressed the concerns.

Reviewer 3 Report

Comments and Suggestions for Authors

Accept in present form

Reviewer 4 Report

Comments and Suggestions for Authors

The authors have addressed all my concerns. I recommend accepting the paper in its current form.
