Fusion of Attention-Based Convolution Neural Network and HOG Features for Static Sign Language Recognition
Abstract
1. Introduction
- (1) We implemented an efficient feature extraction technique that fuses the HOG feature descriptor with an attention-based CNN to recognize hand sign gestures.
- (2) The CBAM module is embedded in the CNN to enhance feature learning at various scales, so that the model can handle variations in the location and shape of hand gestures and focus on the most informative features.
- (3) We evaluated the approach on two datasets using three metrics, namely precision, recall, and F1-score.
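
The three metrics named in contribution (3) can be illustrated with a minimal, library-free sketch. The labels below are made-up toy data, not results from the paper, and the helper name `precision_recall_f1` is hypothetical; the definitions are the standard per-class ones (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two).

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for three gesture classes (illustrative only).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

scores = {cls: precision_recall_f1(y_true, y_pred, cls) for cls in ("A", "B", "C")}
for cls, (p, r, f) in scores.items():
    print(f"{cls}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

For a multi-class problem such as sign recognition, these per-class values are typically averaged over all classes; which averaging the authors used is not stated in this section.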
2. Related Work
3. Methodology
3.1. Dataset
3.2. Pre-Processing
3.3. Feature Extraction Module
3.3.1. HOG Feature Descriptor
3.3.2. CBAM Attention-Based CNN Feature Extraction
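
As a rough illustration of the attention block named in this heading, the sketch below follows the general CBAM design of Woo et al. [26]: channel attention from a shared MLP over average- and max-pooled descriptors, followed by spatial attention from a convolution over the channel-wise average and max maps. It is a NumPy sketch with random weights, not the authors' implementation; the feature-map size, the reduction ratio `r = 8`, and the kernel size `k = 7` are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: (H, W, C). Returns one weight in (0, 1) per channel."""
    avg = feat.mean(axis=(0, 1))  # global average pooling -> (C,)
    mx = feat.max(axis=(0, 1))    # global max pooling -> (C,)

    def mlp(x):
        # Shared two-layer MLP with a ReLU bottleneck.
        return np.maximum(x @ w1, 0.0) @ w2

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(feat, kernel):
    """kernel: (k, k, 2). Returns one weight in (0, 1) per spatial location."""
    stacked = np.stack([feat.mean(axis=2), feat.max(axis=2)], axis=2)  # (H, W, 2)
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(stacked, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    h, w = feat.shape[:2]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k, :] * kernel)
    return sigmoid(out)


def cbam(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a feature map."""
    feat = feat * channel_attention(feat, w1, w2)             # broadcast over H, W
    return feat * spatial_attention(feat, kernel)[:, :, None]  # broadcast over C


rng = np.random.default_rng(0)
H, W, C, r, k = 16, 16, 32, 8, 7  # hypothetical sizes
feat = rng.random((H, W, C))
w1 = rng.normal(scale=0.1, size=(C, C // r))
w2 = rng.normal(scale=0.1, size=(C // r, C))
kernel = rng.normal(scale=0.1, size=(k, k, 2))

refined = cbam(feat, w1, w2, kernel)
print(refined.shape)
```

The output keeps the input shape, so a block like this can sit between a convolution and a pooling layer without changing the rest of the network.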

| Algorithm 1: Proposed algorithm for recognition of sign language gestures |
|---|
| Input: Image dataset with size (128 × 128 × 3) |
| Output: Sign language gestures |
| // HOG feature extraction |
| Step 1: Convert each image from the training dataset into a grayscale image. |
| Step 2: Calculate the gradient for each pixel of the image. |
| Step 3: Divide the input image into 8 × 8 cells and compute the histogram of each cell. |
| Step 4: Merge all histograms to obtain the HOG feature vector. |
| Output: HOG feature vector of dimension 1 × 8100. |
| // Attention-based CNN feature extraction |
| Step 1: Define the CNN layers, including convolution, normalization, CBAM, and max pooling. |
| Step 2: Flatten the output of the last max pooling layer into a 1-dimensional feature vector. |
| Step 3: Concatenate the flattened CNN feature with the HOG feature. |
| Step 4: Feed the merged feature into the fully connected layer. |
| Step 5: Classify the 36 sign gestures in the final output layer with a Softmax activation function. |
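
The shapes involved in Algorithm 1 can be traced with a short NumPy sketch. It first checks where a 1 × 8100 HOG vector could come from: assuming cells of 8 × 8 pixels, 2 × 2 cells per block with a stride of one cell, and 9 orientation bins (the block size and bin count are assumptions, chosen because they reproduce the stated dimension), a 128 × 128 image yields 15 × 15 × 4 × 9 = 8100 values. The fusion step is then shown with random stand-in vectors; the CNN feature length and the hidden-layer width are hypothetical, since the layer sizes are not given here.

```python
import numpy as np


def hog_length(image_size=128, cell=8, block=2, bins=9):
    """Length of a HOG descriptor with overlapping blocks (stride of one cell)."""
    cells_per_side = image_size // cell           # 128 / 8 = 16
    blocks_per_side = cells_per_side - block + 1  # 16 - 2 + 1 = 15
    return blocks_per_side ** 2 * block ** 2 * bins


def softmax(z):
    z = z - z.max()  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()


rng = np.random.default_rng(0)

hog_vec = rng.random(hog_length())  # stand-in for a real HOG vector
cnn_vec = rng.random(4 * 4 * 64)    # hypothetical flattened CNN output

# Concatenate the flattened CNN feature with the HOG feature.
fused = np.concatenate([cnn_vec, hog_vec])

# Fully connected layer (hypothetical width of 128) followed by a
# 36-way softmax output layer; weights are random placeholders.
w_fc = rng.normal(scale=0.01, size=(fused.size, 128))
w_out = rng.normal(scale=0.01, size=(128, 36))
hidden = np.maximum(fused @ w_fc, 0.0)
probs = softmax(hidden @ w_out)

print(hog_length(), fused.shape, probs.shape)
```

Concatenation keeps both descriptors intact and lets the fully connected layer learn how to weight handcrafted and learned features against each other.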
4. Results and Discussion
4.1. Comparison with Previous Studies
4.2. Time Complexity and Order of the Proposed Method
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Das, S.; Biswas, S.K.; Purkayastha, B. Automated Indian sign language recognition system by fusing deep and handcrafted features. Multimed. Tools Appl. 2023, 82, 16905–16927.
2. Damaneh, M.M.; Mohanna, F.; Jafari, P. Static hand gesture recognition in sign language based on convolutional neural network with feature extraction method using ORB descriptor and Gabor filter. Expert Syst. Appl. 2023, 211, 118559.
3. de Castro, G.Z.; Guerra, R.R.; Guimarães, F.G. Automatic translation of sign language with multi-stream 3D CNN and generation of artificial depth maps. Expert Syst. Appl. 2023, 215, 119394.
4. Nandi, U.; Ghorai, A.; Singh, M.M.; Changdar, C.; Bhakta, S.; Kumar Pal, R. Indian sign language alphabet recognition system using CNN with diffGrad optimizer and stochastic pooling. Multimed. Tools Appl. 2023, 82, 9627–9648.
5. Miah, A.S.M.; Hasan, A.M.; Shin, J.; Okuyama, Y.; Tomioka, Y. Multistage spatial attention-based neural network for hand gesture recognition. Computers 2023, 12, 13.
6. Marin, G.; Dominio, F.; Zanuttigh, P. Hand gesture recognition with Leap Motion and Kinect devices. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 1565–1569.
7. Lahiani, H.; Neji, M. Hand gesture recognition method based on HOG-LBP features for mobile devices. Procedia Comput. Sci. 2018, 126, 254–263.
8. Parvathy, P.; Subramaniam, K.; Prasanna Venkatesan, G.K.D.; Karthikaikumar, P.; Varghese, J.; Jayasankar, T. Development of hand gesture recognition system using machine learning. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 6793–6800.
9. Sharma, S.; Singh, S. Vision-based hand gesture recognition using deep learning for the interpretation of sign language. Expert Syst. Appl. 2021, 182, 115657.
10. Xu, J.; Wang, H.; Zhang, J.; Cai, L. Robust hand gesture recognition based on RGB-D Data for natural human–computer interaction. IEEE Access 2022, 10, 54549–54562.
11. Masood, S.; Thuwal, H.C.; Srivastava, A. American sign language character recognition using convolution neural network. In Smart Computing and Informatics: Proceedings of the First International Conference on SCI 2016; Springer: Singapore, 2018; Volume 2, pp. 403–412.
12. Sruthi, C.J.; Lijiya, A. Signet: A deep learning based Indian sign language recognition system. In Proceedings of the 2019 International Conference on Communication and Signal Processing (ICCSP), Chennai, India, 4–6 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 596–600.
13. Ma, Y.; Xu, T.; Kim, K. Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition. Sensors 2022, 22, 5959.
14. Eid, A.; Schwenker, F. Visual Static Hand Gesture Recognition Using Convolutional Neural Network. Algorithms 2023, 16, 361.
15. Suneetha, M.; Prasad, M.V.D.; Kishore, P.V.V. Multi-view motion modelled deep attention networks (M2DA-Net) for video-based sign language recognition. J. Vis. Commun. Image Represent. 2021, 78, 103161.
16. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164.
17. Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck attention module. arXiv 2018, arXiv:1807.06514.
18. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519.
19. Zhang, L.; Tian, Q.; Ruan, Q.; Shi, Z. A simple and effective static gesture recognition method based on attention mechanism. J. Vis. Commun. Image Represent. 2023, 92, 103783.
20. Barczak, A.L.C.; Reyes, N.H.; Abastillas, M.; Piccio, A.; Susnjak, T. A new 2D static hand gesture colour image dataset for ASL gestures. Res. Lett. Inf. Math. Sci. 2011, 15, 12–20.
21. Kothadiya, D.R.; Bhatt, C.M.; Rehman, A.; Alamri, F.S.; Saba, T. SignExplainer: An Explainable AI-Enabled Framework for Sign Language Recognition with Ensemble Learning. IEEE Access 2023, 11, 47410–47419.
22. Sharma, S.; Singh, S. ISL recognition system using integrated mobile-net and transfer learning method. Expert Syst. Appl. 2023, 221, 119772.
23. Choudhury, A.; Rana, H.S.; Bhowmik, T. Handwritten Bengali numeral recognition using HOG based feature extraction algorithm. In Proceedings of the 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 22–23 February 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 687–690.
24. Sharma, A.; Mittal, A.; Singh, S.; Awatramani, V. Hand gesture recognition using image processing and feature extraction techniques. Procedia Comput. Sci. 2020, 173, 181–190.
25. Arun, C.; Gopikakumari, R. Optimisation of both classifier and fusion based feature set for static American sign language recognition. IET Image Process. 2020, 14, 2101–2109.
26. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–19.
27. Wang, Y.; Zhang, Z.; Feng, L.; Ma, Y.; Du, Q. A new attention-based CNN approach for crop mapping using time series Sentinel-2 images. Comput. Electron. Agric. 2021, 184, 106090.
28. Tato, A.; Nkambou, R. Improving Adam optimizer. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
29. Katoch, S.; Singh, V.; Tiwary, U.S. Indian Sign Language recognition system using SURF with SVM and CNN. Array 2022, 14, 100141.
30. Rathi, P.; Kuwar Gupta, R.; Agarwal, S.; Shukla, A. Sign language recognition using ResNet50 deep neural network architecture. In Proceedings of the 5th International Conference on Next Generation Computing Technologies, Dehradun, India, 20–21 December 2019; SSRN: Parkville, Australia, 2020.
31. Barbhuiya, A.A.; Karsh, R.K.; Jain, R. Gesture recognition from RGB images using convolutional neural network-attention based system. Concurr. Comput. Pract. Exp. 2022, 34, e7230.
32. Adeyanju, I.A.; Bello, O.O.; Azeez, M.A. Development of an American sign language recognition system using Canny edge and histogram of oriented gradient. Niger. J. Technol. Dev. 2022, 19, 195–205.
33. Bhaumik, G.; Govil, M.C. SpAtNet: A spatial feature attention network for hand gesture recognition. Multimed. Tools Appl. 2023.
34. Kothadiya, D.R.; Bhatt, C.M.; Saba, T.; Rehman, A.; Bahaj, S.A. SIGNFORMER: DeepVision Transformer for Sign Language Recognition. IEEE Access 2023, 11, 4730–4739.
35. Umar, S.S.I.; Iro, Z.S.; Zandam, A.Y.; Shitu, S.S. Accelerated Histogram of Oriented Gradients for Human Detection. Ph.D. Thesis, Universiti Teknologi Malaysia, Johor Bahru, Malaysia, 2016.

| Parameters | Values | 
|---|---|
| Input size | 128 × 128 | 
| Initial Learning Rate | 0.001 | 
| Optimizer | Nadam | 
| Batch Size | 32 | 
| Cost Function | Categorical Cross-entropy | 
| Epoch | 30 | 
| Beta 1 | 0.9 | 
| Beta 2 | 0.999 | 
| Epsilon | 1 × 10⁻⁸ |
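
To make the optimizer settings in the table concrete, here is a minimal pure-Python sketch of a single Nadam update using the listed learning rate, Beta 1, Beta 2, and epsilon. It uses a simplified textbook form of Nadam (Adam with a Nesterov-style look-ahead on the first moment); some library implementations also apply a momentum-decay schedule, which is omitted here. The function name `nadam_step` and the toy objective f(θ) = θ² are illustrative assumptions, not part of the paper.

```python
import math


def nadam_step(theta, grad, m, v, t,
               lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified Nadam update for a scalar parameter (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Nesterov-style look-ahead: blend corrected momentum with current gradient.
    look_ahead = beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t)
    theta = theta - lr * look_ahead / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective f(theta) = theta**2 with gradient 2 * theta,
# run for 30 steps to mirror the epoch count in the table.
theta, m, v = 1.0, 0.0, 0.0
history = [theta]
for t in range(1, 31):
    grad = 2 * theta
    theta, m, v = nadam_step(theta, grad, m, v, t)
    history.append(theta)

print(history[0], history[-1])
```

With these settings each early step moves the parameter by roughly twice the learning rate, since the look-ahead term adds the current gradient on top of the corrected momentum.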

| Folds | Dataset 1 | Dataset 2 |
|---|---|---|
| 1 | 99.10 | 99.69 | 
| 2 | 99.22 | 99.77 | 
| 3 | 99.24 | 99.79 | 
| 4 | 99.22 | 99.81 | 
| 5 | 99.30 | 99.83 | 
| Average Accuracy (%) | 99.22 | 99.77 | 
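
The averaging in the last row of the fold table is a plain arithmetic mean over the five folds. The short sketch below recomputes it from the per-fold values copied from the table; the dictionary layout is just an illustrative choice.

```python
from statistics import mean

# Per-fold accuracies (%) copied from the table above.
fold_accuracy = {
    "Dataset 1": [99.10, 99.22, 99.24, 99.22, 99.30],
    "Dataset 2": [99.69, 99.77, 99.79, 99.81, 99.83],
}

averages = {name: mean(values) for name, values in fold_accuracy.items()}
for name, value in averages.items():
    print(f"{name}: {value:.3f}")
```

The listed folds average to 99.216 for Dataset 1 and 99.778 for Dataset 2 before rounding to two decimals.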

| Dataset | Reference | Method (Year) | Accuracy (%) | Training Parameters | Training Epochs | Inference Time (s) |
|---|---|---|---|---|---|---|
| Massey ASL | [30] | 2-level ResNet-50 (2020) | 99.03 | 25,636,712 | 200 | 0.025 |
| Massey ASL | [31] | VGG16 + Attention (2022) | 98.02 | 138.35 M | 30 | - |
| Massey ASL | [32] | Canny + HOG + KNN (2022) | 97.6 | - | - | 0.39 |
| Massey ASL | [33] | Spatial feature learning network (2023) | 80.44 | 3.8 M | - | 2.23 |
| Massey ASL | Proposed | HOG + CBAM-based CNN | 99.22 | 2.3 M | 30 | 0.023 |
| Static ISL | [34] | Vision Transformer based approach (2023) | 99.29 | 7 M | 5 | - |
| Static ISL | [21] | Ensemble (ResNet50 + Attention) (2022) | 98.20 | 23.5 M | 50 | - |
| Static ISL | Proposed | HOG + CBAM-based CNN | 99.79 | 2.3 M | 30 | 0.023 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kumari, D.; Anand, R.S. Fusion of Attention-Based Convolution Neural Network and HOG Features for Static Sign Language Recognition. Appl. Sci. 2023, 13, 11993. https://doi.org/10.3390/app132111993