Enhancing Fine-Grained Image Recognition with Multi-Channel Self-Attention Mechanisms: A Focus on Fruit Fly Species Classification
Abstract
1. Introduction
- (1) First, the incorporation of a multi-channel self-attention mechanism enhances feature extraction, allowing subtle differences between fruit fly species to be recognized;
- (2) Second, the use of long short-term memory (LSTM) networks as feature extractors makes the framework more robust, supporting consistent and accurate recognition across diverse backgrounds. Together, these innovations advance fine-grained image-recognition techniques tailored specifically to fruit fly identification.
2. Related Works
2.1. Current Research on Fine-Grained Image-Recognition Methods
2.2. Introducing the Multi-Channel Self-Attention Mechanism
3. Fine-Grained Image-Recognition Method for Fruit Flies
3.1. Fruit Fly Fine-Grained Image Bottom Feature Extraction
3.2. Multi-Channel Self-Attention Feature Fusion of Fine-Grained Images of Fruit Flies
- (1) The first branch is the global attention representation. This branch uses a convolutional layer with 16 kernels of size 1 × 1; combining its output with the bottom features of Drosophila yields the global attention representation, in which the important weights are more prominent, and finally the attention feature map;
- (2) The second branch is the local attention. This branch uses the spatial attention representation of CBAM (Convolutional Block Attention Module, a lightweight attention module), and the calculation process of the spatial feature map is as follows:
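As an illustration of the local branch, the sketch below assumes the standard CBAM formulation of spatial attention: the feature map is pooled along the channel axis by average and by max, the two pooled maps are convolved, and a sigmoid produces one weight per spatial position. The function names, the 3 × 3 kernel size, and the uniform kernel weights are hypothetical choices made for the example (CBAM itself uses a learned 7 × 7 convolution), and this is not the authors' implementation.

```python
import math

def spatial_attention(features, kernel):
    """CBAM-style spatial attention for a C x H x W nested list.

    kernel has shape 2 x k x k (one k x k slice per pooled map). In a trained
    model these weights are learned; here the caller supplies them.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    k = len(kernel[0])
    pad = k // 2

    # Channel-wise average and max pooling give two H x W maps.
    avg_map = [[sum(features[c][i][j] for c in range(channels)) / channels
                for j in range(width)] for i in range(height)]
    max_map = [[max(features[c][i][j] for c in range(channels))
                for j in range(width)] for i in range(height)]
    pooled = [avg_map, max_map]

    attention = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = 0.0
            # Zero-padded convolution over the two pooled maps.
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < height and 0 <= x < width:
                            total += kernel[m][di][dj] * pooled[m][y][x]
            attention[i][j] = 1.0 / (1.0 + math.exp(-total))  # sigmoid
    return attention

def apply_attention(features, attention):
    """Scale every channel by the shared H x W attention map."""
    return [[[v * a for v, a in zip(row, att_row)]
             for row, att_row in zip(channel, attention)]
            for channel in features]

# Toy 2 x 2 x 2 feature map and an untrained, uniform kernel (assumed values).
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [0.0, 1.0]]]
kernel = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]
att = spatial_attention(features, kernel)
refined = apply_attention(features, att)
```

Pure Python lists keep the example dependency-free; a tensor library would normally replace the explicit loops.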
3.3. Design of Fruit Fly Fine-Grained Image Classification and Recognition Device
3.3.1. Softmax Loss Function
3.3.2. A-Softmax Loss Function
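To make the difference between the two loss functions concrete, the following is a minimal sketch under the standard definitions, which are assumed here rather than taken from the paper. The softmax loss is the cross-entropy of the softmax distribution. A-Softmax normalizes the class weight vectors, drops the bias, and replaces the target-class logit ‖x‖ cos θ with ‖x‖ ψ(θ), where ψ(θ) = (−1)^k cos(mθ) − 2k for θ in [kπ/m, (k+1)π/m]. The margin m = 4 and the function names are illustrative assumptions.

```python
import math

def softmax_loss(logits, label):
    """Cross-entropy of the softmax distribution for one sample."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]

def a_softmax_loss(x, weights, label, m=4):
    """A-Softmax loss for one feature vector x and per-class weight vectors."""
    x_norm = math.sqrt(sum(v * v for v in x))
    logits = []
    for cls, w in enumerate(weights):
        w_norm = math.sqrt(sum(v * v for v in w))
        cos_theta = sum(a * b for a, b in zip(x, w)) / (x_norm * w_norm)
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        if cls == label:
            theta = math.acos(cos_theta)
            k = min(int(theta * m / math.pi), m - 1)  # index of the angular interval
            psi = (-1) ** k * math.cos(m * theta) - 2 * k
            logits.append(x_norm * psi)
        else:
            logits.append(x_norm * cos_theta)
    return softmax_loss(logits, label)
```

With m = 1 the margin function reduces to cos θ, so the loss equals a softmax loss on normalized weights; larger m penalizes the same angle more heavily, which pushes features of one class into a tighter angular region.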
4. Experimental Analysis
- (1) The multi-channel self-attention mechanism captures both global and local features in an image and strengthens task-related feature representations by assigning a different weight to each channel;
- (2) As a feature extractor, the LSTM handles long-term dependencies in sequence data and thereby extracts the low-level image features that are crucial for identifying fine-grained objects such as fruit flies.
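One way to picture an LSTM acting on an image is to treat each pixel row as one time step, so the cell state carries information from earlier rows to later ones. The sketch below follows the standard LSTM gate equations; the row-by-row scan, the parameter layout, and the function names are assumptions made for illustration, since how the image is serialized is not stated here.

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM step.

    params[gate] = (W, b), where W has shape hidden x (hidden + input)
    and gate is one of "input", "forget", "output", "candidate".
    """
    joined = h + x  # concatenate previous hidden state and current input

    def affine(gate):
        W, b = params[gate]
        return [sum(w * v for w, v in zip(row, joined)) + bias
                for row, bias in zip(W, b)]

    i = [_sigmoid(z) for z in affine("input")]
    f = [_sigmoid(z) for z in affine("forget")]
    o = [_sigmoid(z) for z in affine("output")]
    g = [math.tanh(z) for z in affine("candidate")]
    # The forget gate keeps part of the old cell state; the input gate admits new content.
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def extract_row_features(image, params, hidden_size):
    """Scan image rows top to bottom and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for row in image:
        h, c = lstm_step(list(row), h, c, params)
    return h
```

The final hidden state serves as a fixed-length feature vector; the forget gate is what lets information from early rows survive to the end of the scan.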
4.1. Experimental Dataset
4.2. Evaluation Criteria
4.3. Parameter Analysis
4.4. Feature Extraction Effect Analysis
4.5. Analysis of Fine-Grained Image-Recognition Effect of Fruit Fly
5. Conclusions
- (1) Introducing more advanced deep learning models: as deep learning technology continues to develop, more advanced models, such as the Transformer and CNN-RNN, can be introduced into fruit fly fine-grained image recognition to improve feature extraction and classification;
- (2) Combining multimodal information: fine-grained image recognition of fruit flies can draw on other modalities, such as infrared and ultraviolet images, which may make recognition more robust and accurate. It is therefore worth trying to fuse information from different modalities into the model to improve recognition accuracy;
- (3) Strengthening data augmentation: data augmentation is an effective way to improve the generalization ability of the model. Images can be augmented by random transformation, cropping, and rotation, so that the model adapts better to varied scenarios and conditions. In the future, more effective data augmentation techniques can be explored to further improve the recognition accuracy of fine-grained images of fruit flies.
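The augmentation operations named in item (3) can be sketched as follows. This is a minimal illustration on images stored as nested lists; rotation is restricted to multiples of 90 degrees so that no interpolation is needed, and the horizontal flip stands in for a generic random transformation. All function names and these restrictions are assumptions of the example, not a pipeline described in the paper.

```python
import random

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def rotate90(image, times):
    """Rotate clockwise by 90 degrees, `times` times."""
    for _ in range(times % 4):
        image = [list(col) for col in zip(*image[::-1])]
    return image

def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def augment(image, crop_h, crop_w, rng=None):
    """Apply a random crop, a random 90-degree rotation, and a random flip."""
    rng = rng or random.Random()
    out = random_crop(image, crop_h, crop_w, rng)
    out = rotate90(out, rng.randrange(4))
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out
```

Passing a seeded `random.Random` instance makes the augmentation reproducible, which helps when comparing training runs.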
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Number of Channels | Kappa Coefficient | Computational Cost (ms) |
---|---|---|
4 | 0.75 | 15 |
8 | 0.95 | 32 |
16 | 0.96 | 58 |
32 | 0.97 | 77 |
64 | 0.98 | 103 |
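The table reports a Kappa coefficient for each channel count. Assuming this is Cohen's kappa, it can be computed from a confusion matrix as the observed agreement corrected for the agreement expected by chance. The sketch below shows the calculation; the function name and the example matrix are made up for illustration and are not data from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)

# Hypothetical two-class example: observed agreement 0.7, chance agreement 0.5.
example = [[20, 5], [10, 15]]
kappa = cohen_kappa(example)  # approximately 0.4
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why kappa is more informative than raw accuracy when class sizes differ.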
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lu, Y.; Yi, K.; Xu, Y. Enhancing Fine-Grained Image Recognition with Multi-Channel Self-Attention Mechanisms: A Focus on Fruit Fly Species Classification. Appl. Sci. 2024, 14, 5328. https://doi.org/10.3390/app14125328