Borno-Net: A Real-Time Bengali Sign-Character Detection and Sentence Generation System Using Quantized Yolov4-Tiny and LSTMs
Abstract
1. Introduction
- An end-to-end system is proposed for detecting Bengali sign characters and generating meaningful sentences.
- A quantization technique is proposed for the YoloV4-Tiny model, which detects hand signs and predicts the corresponding characters, so that the model can be deployed on edge devices.
- An LSTM-based sentence-generation model is proposed that takes the characters predicted by the detection model as input and generates meaningful sentences.
- The proposed system achieves a mAP of 99.7% in the detection phase, and the sentence-generation model achieves an accuracy of 99.12%.
- A comprehensive performance analysis of several detection models, including Yolov4, Yolov4-Tiny, and Yolov7, is also provided.
2. Literature Review
3. Dataset Description
- Compound Character (Label 46): It is known as hasantha and is utilized to create a compound character. The structure of a Compound Character is shown in Figure 3.
- Space (Label 47): It is used to create a gap between two words.
- End of Sentence (Label 48): It is used to indicate a sentence’s end.
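As an illustration of how the three control labels could be used downstream, the sketch below turns a stream of predicted label indices into text. Labels 46, 47, and 48 follow the list above. The `LABEL_TO_CHAR` excerpt, the characters assigned to labels 0 to 2, and the function name `labels_to_text` are hypothetical. Rendering label 46 as the Bengali hasanta sign (U+09CD) placed between two consonants is also an assumption, not something the dataset description states.

```python
# Control labels taken from the dataset description.
COMPOUND, SPACE, END = 46, 47, 48

# Bengali sign virama (hasanta); joining two consonants with it forms a conjunct.
HASANTA = "\u09cd"

# Hypothetical excerpt of the label table; a real table has one entry per character class.
LABEL_TO_CHAR = {
    0: "\u0995",  # BENGALI LETTER KA
    1: "\u09a4",  # BENGALI LETTER TA
    2: "\u09b0",  # BENGALI LETTER RA
}


def labels_to_text(labels):
    """Convert a sequence of predicted label indices into a text string."""
    out = []
    for label in labels:
        if label == END:
            break  # stop at the end-of-sentence sign
        if label == COMPOUND:
            out.append(HASANTA)
        elif label == SPACE:
            out.append(" ")
        else:
            out.append(LABEL_TO_CHAR[label])
    return "".join(out)


# ka + hasanta + ta, a space, then ra; the end label terminates the sentence.
text = labels_to_text([0, 46, 1, 47, 2, 48])
```

Keeping the control labels as named constants separates the character table from the sentence-assembly logic, so the table can be swapped without touching the loop.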
4. Proposed Methodology
4.1. Sign-Character Detection
4.2. YoloV4-Tiny Model Quantization
- The weights are quantized before they are convolved with the input. When a layer uses batch normalization, the batch-normalization parameters are folded into the weights before quantization using Equation (4), where $\gamma$ is the batch-normalization scale parameter, $\sigma_B^2$ is a moving-average estimate of the batch variance of the convolution results, $w$ denotes the weights, and $\varepsilon$ is a small constant for numerical stability.
- Each element’s quantization is based on the number of convolution layers, the cutting range, and the point-wise quantization coefficient q provided in Equation (5).
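The two steps above can be sketched in a few lines of plain Python. Equations (4) and (5) are not reproduced in this text, so the sketch assumes the common quantization-aware-training formulation: the batch-normalization scale is folded into the weights as `gamma * w / sqrt(var + eps)`, and each element is then clamped to a range `[a, b]` and rounded onto `n_levels` evenly spaced values. The function names, the 256 levels, and the sample numbers are all illustrative assumptions.

```python
import math


def fold_batch_norm(weights, gamma, running_var, eps=1e-5):
    """Fold the batch-norm scale into weights: gamma * w / sqrt(running_var + eps)."""
    scale = gamma / math.sqrt(running_var + eps)
    return [w * scale for w in weights]


def fake_quantize(x, a, b, n_levels=256):
    """Clamp x to [a, b], then snap it to the nearest of n_levels evenly spaced values."""
    step = (b - a) / (n_levels - 1)
    clamped = min(max(x, a), b)
    return round((clamped - a) / step) * step + a


# Hypothetical weights and statistics for one output channel of a convolution layer.
weights = [0.42, -1.30, 0.07, 2.50]
folded = fold_batch_norm(weights, gamma=0.9, running_var=0.25)

# Use the observed weight range as the clamping range (an assumption).
a, b = min(folded), max(folded)
quantized = [fake_quantize(w, a, b) for w in folded]
```

With 256 levels, each quantized value differs from its folded weight by at most half a step, which is the rounding error an 8-bit representation of this range would introduce.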
4.3. Sentence Generation
4.3.1. Encoder LSTM
4.3.2. Decoder LSTM
4.3.3. Parameter Settings
5. Experimental Results Analysis
5.1. Evaluation of the YoloV4-Tiny Model
5.2. Performance Comparison of Yolo Models
5.3. Evaluation of Language Model
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Sanzidul Islam, M.; Sultana Sharmin Mousumi, S.; Jessan, N.A.; Shahariar Azad Rabby, A.; Akhter Hossain, S. Ishara-Lipi: The First Complete Multipurpose Open Access Dataset of Isolated Characters for Bangla Sign Language. In Proceedings of the 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), Sylhet, Bangladesh, 21–22 September 2018; pp. 1–4.
2. Rahaman, M.A.; Jasim, M.; Ali, M.H.; Hasanuzzaman, M. Bangla language modeling algorithm for automatic recognition of hand-sign-spelled Bangla sign language. Front. Comput. Sci. 2020, 14, 143302.
3. Kudrinko, K.; Flavin, E.; Zhu, X.; Li, Q. Wearable sensor-based sign language recognition: A comprehensive review. IEEE Rev. Biomed. Eng. 2020, 14, 82–97.
4. Sharma, S.; Singh, S. Vision-based sign language recognition system: A Comprehensive Review. In Proceedings of the 2020 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 26–28 February 2020; IEEE: New York, NY, USA, 2020; pp. 140–144.
5. Dima, T.F.; Ahmed, M.E. Using YOLOv5 Algorithm to Detect and Recognize American Sign Language. In Proceedings of the 2021 International Conference on Information Technology (ICIT), Amman, Jordan, 14–15 July 2021; IEEE: New York, NY, USA, 2021; pp. 603–607.
6. Urmee, P.P.; Al Mashud, M.A.; Akter, J.; Jameel, A.S.M.M.; Islam, S. Real-time bangla sign language detection using xception model with augmented dataset. In Proceedings of the 2019 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), Bangalore, India, 15–16 November 2019; IEEE: New York, NY, USA, 2019; pp. 1–5.
7. Shanta, S.S.; Anwar, S.T.; Kabir, M.R. Bangla sign language detection using sift and cnn. In Proceedings of the 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Bengaluru, India, 10–12 July 2018; IEEE: New York, NY, USA, 2018; pp. 1–6.
8. Bhadra, R.; Kar, S. Sign Language Detection from Hand Gesture Images using Deep Multi-layered Convolution Neural Network. In Proceedings of the 2021 IEEE Second International Conference on Control, Measurement and Instrumentation (CMI), Kolkata, India, 8–10 January 2021; IEEE: New York, NY, USA, 2021; pp. 196–200.
9. Rafiq, R.B.; Hakim, S.A.; Tabashum, T. Real-time Vision-based Bangla Sign Language Detection using Convolutional Neural Network. In Proceedings of the 2021 International Conference on Advances in Computing and Communications (ICACC), Kochi, India, 21–23 October 2021; IEEE: New York, NY, USA, 2021; pp. 1–5.
10. Hoque, O.B.; Jubair, M.I.; Akash, A.F.; Islam, S. Bdsl36: A dataset for bangladeshi sign letters recognition. In Proceedings of the 15th Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020.
11. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
12. Ma, D.; Hirota, K.; Dai, Y.; Jia, Z. Dynamic Sign Language Recognition Based on Improved Residual-LSTM Network; IEEE: New York, NY, USA, 2021.
13. Talukder, D.; Jahara, F. Real-time bangla sign language detection with sentence and speech generation. In Proceedings of the 2020 23rd International Conference on Computer and Information Technology (ICCIT), Tejgaon, Dhaka, 19–21 December 2020; IEEE: New York, NY, USA, 2020; pp. 1–6.
14. Wang, H.; Chai, X.; Hong, X.; Zhao, G.; Chen, X. Isolated sign language recognition with grassmann covariance matrices. ACM Trans. Access. Comput. (TACCESS) 2016, 8, 1–21.
15. Camgoz, N.C.; Hadfield, S.; Koller, O.; Ney, H.; Bowden, R. Neural sign language translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7784–7793.
16. Venugopalan, A.; Reghunadhan, R. Applying Hybrid Deep Neural Network for the Recognition of Sign Language Words Used by the Deaf COVID-19 Patients. Arab. J. Sci. Eng. 2022, 48, 1349–1362.
17. Kamruzzaman, M. Arabic sign language recognition and generating Arabic speech using convolutional neural network. Wirel. Commun. Mob. Comput. 2020, 2020, 3685614.
18. Khan, N.S.; Abid, A.; Abid, K. A novel natural language processing (NLP)–based machine translation model for English to Pakistan sign language translation. Cogn. Comput. 2020, 12, 748–765.
19. Talukder, D.; Jahara, F.; Barua, S.; Haque, M.M. OkkhorNama: BdSL Image Dataset for Real Time Object Detection Algorithms. In Proceedings of the 2021 IEEE Region 10 Symposium (TENSYMP), Jeju, Republic of Korea, 23–25 August 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.
20. Hasan, S.N.; Hasan, M.J.; Alam, K.S. Shongket: A Comprehensive and Multipurpose Dataset for Bangla Sign Language Detection. In Proceedings of the 2021 International Conference on Electronics, Communications and Information Technology (ICECIT), Khulna, Bangladesh, 14–16 September 2021; IEEE: New York, NY, USA, 2021; pp. 1–4.
21. Wadhawan, A.; Kumar, P. Deep learning-based sign language recognition system for static signs. Neural Comput. Appl. 2020, 32, 7957–7968.
22. Basnin, N.; Nahar, L.; Hossain, M.S. An integrated CNN-LSTM model for Bangla lexical sign language recognition. In Proceedings of the International Conference on Trends in Computational and Cognitive Engineering, Online, 21–22 October 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 695–707.
23. Ahmed, S.; Islam, M.; Hassan, J.; Ahmed, M.U.; Ferdosi, B.J.; Saha, S.; Shopon, M. Hand sign to Bangla speech: A deep learning in vision based system for recognizing hand sign digits and generating Bangla speech. arXiv 2019, arXiv:1901.05613.
24. Islam, M.M.; Uddin, M.R.; AKhtar, M.N.; Alam, K.R. Recognizing multiclass Static Sign Language words for deaf and dumb people of Bangladesh based on transfer learning techniques. Informatics Med. Unlocked 2022, 33, 101077.
25. Shurid, S.A.; Amin, K.H.; Mirbahar, M.S.; Karmaker, D.; Mahtab, M.T.; Khan, F.T.; Alam, M.G.R.; Alam, M.A. Bangla Sign Language Recognition and Sentence Building Using Deep Learning. In Proceedings of the 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Gold Coast, Australia, 16–18 December 2020; IEEE: New York, NY, USA, 2020; pp. 1–9.
26. Angona, T.M.; Shaon, A.S.; Niloy, K.T.R.; Karim, T.; Tasnim, Z.; Reza, S.S.; Mahbub, T.N. Automated Bangla sign language translation system for alphabets by means of MobileNet. TELKOMNIKA (Telecommun. Comput. Electron. Control.) 2020, 18, 1292–1301.
27. Podder, K.K.; Tabassum, S.; Khan, L.E.; Salam, K.M.A.; Maruf, R.I.; Ahmed, A. Design of a sign language transformer to enable the participation of persons with disabilities in remote healthcare systems for ensuring universal healthcare coverage. In Proceedings of the 2021 IEEE Technology & Engineering Management Conference-Europe (TEMSCON-EUR), Virtual, 17–20 May 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.
28. Rahaman, M.A.; Hossain, M.P.; Rana, M.M.; Rahman, M.A.; Akter, T. A rule based system for bangla voice and text to bangla sign language interpretation. In Proceedings of the 2020 2nd International Conference on Sustainable Technologies for Industry 4.0 (STI), Dhaka, Bangladesh, 19–20 December 2020; IEEE: New York, NY, USA, 2020; pp. 1–6.
29. Khan, S.A.; Joy, A.D.; Asaduzzaman, S.; Hossain, M. An efficient sign language translator device using convolutional neural network and customized ROI segmentation. In Proceedings of the 2019 2nd International Conference on Communication Engineering and Technology (ICCET), Nagoya, Japan, 12–15 April 2019; IEEE: New York, NY, USA, 2019; pp. 152–156.
30. Das, S.; Imtiaz, M.S.; Neom, N.H.; Siddique, N.; Wang, H. A hybrid approach for Bangla sign language recognition using deep transfer learning model with random forest classifier. Expert Syst. Appl. 2023, 213, 118914.
31. Miah, A.S.M.; Shin, J.; Hasan, M.A.M.; Rahim, M.A. BenSignNet: Bengali Sign Language Alphabet Recognition Using Concatenated Segmentation and Convolutional Neural Network. Appl. Sci. 2022, 12, 3933.
32. Hassan, N. Bangla Sign Language Gesture Recognition System: Using CNN Model. Sci. Prepr. 2022.
33. Akash, S.K.; Chakraborty, D.; Kaushik, M.M.; Babu, B.S.; Zishan, M.S.R. Action Recognition Based Real-time Bangla Sign Language Detection and Sentence Formation. In Proceedings of the 2023 3rd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 7–8 January 2023; IEEE: New York, NY, USA, 2023; pp. 311–315.
34. Tazalli, T.; Aunshu, Z.A.; Liya, S.S.; Hossain, M.; Mehjabeen, Z.; Ahmed, M.S.; Hossain, M.I. Computer vision-based Bengali sign language to text generation. In Proceedings of the 2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS), Genova, Italy, 5–7 December 2022; IEEE: New York, NY, USA, 2022; pp. 1–6.
35. Hasib, A.; Khan, S.S.; Eva, J.F.; Khatun, M.; Haque, A.; Shahrin, N.; Rahman, R.; Murad, H.; Islam, M.; Hussein, M.R.; et al. BDSL 49: A Comprehensive Dataset of Bangla Sign Language. arXiv 2022, arXiv:2208.06827.
Research | Year | Dataset | Method | Performance |
---|---|---|---|---|
Talukder et al. [19] | 2021 | OkkhorNama | Yolo V5 | mAP: 98.02% |
Hasan et al. [20] | 2021 | Shongket | KNN, SVM, Random Forest | Digit Accuracy: 95%; Letter Accuracy: 91.3% |
Rafiq et al. [9] | 2021 | Bangla 10 Words Dataset | Custom CNN Model | Test Accuracy: 97% |
Talukder et al. [13] | 2020 | BdSL Dataset | Yolo V4 | Overall Accuracy: 97.95% |
Angona et al. [26] | 2020 | BdSL36 | MobileNet | Overall Accuracy: 95.71% |
Das et al. [30] | 2023 | IsharaBochon and Ishara-Lipi | Hybrid Model | Overall Accuracy: 91.67% |
Miah et al. [31] | 2022 | BdSL Alphabet, KU-BdSL and Ishara-Lipi | BenSignNet | Overall Accuracy: 94%, 99.60%, and 99.60%, respectively |
Hassan et al. [32] | 2022 | Author-Developed Private Dataset | Custom CNN Model | Overall Accuracy: 92% |
Akash et al. [33] | 2023 | Author-Developed Private Dataset | Blazepose and LSTM | Overall Accuracy: 87.14% |
Tazalli et al. [34] | 2022 | Author-Developed Private Dataset | Yolo V5 | Overall Accuracy: 51.44% |
Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
momentum | 0.95 | classes | 49 |
decay | 0.0005 | Cut_Mix | 0 |
steps | 24,000, 27,000 | random | 1 |
burn_in | 1000 | mosaic | 1 |
batch size | 64 | subdivision | 16 |
channel | 3 | filter | 162 |
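
Two values in the table above are linked by the usual Darknet convention for YOLO configuration files: the convolution layer just before each detection layer uses (classes + 5) × anchors-per-scale filters. The sketch below checks that relationship for the table's values. The three anchors per scale, the helper name `yolo_head_filters`, and the reading of batch size divided by subdivision as the number of images processed per forward pass are assumptions based on that convention, not statements from the table.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters in the convolution layer that feeds a YOLO detection layer.

    Each anchor predicts 4 box coordinates, 1 objectness score,
    and one score per class.
    """
    return (num_classes + 5) * anchors_per_scale


# Values copied from the hyperparameter table.
classes = 49        # labels 0 to 48, including the three control signs
batch_size = 64
subdivisions = 16

filters = yolo_head_filters(classes)
mini_batch = batch_size // subdivisions

print(filters)      # 162, matching the "filter" entry in the table
print(mini_batch)   # 4 images per forward pass under the stated assumption
```

If the class count changes, the filter count has to be recomputed with the same formula; otherwise the detection layer's input no longer matches the number of predictions it expects.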
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Begum, N.; Rahman, R.; Jahan, N.; Khan, S.S.; Helaly, T.; Haque, A.; Khatun, N. Borno-Net: A Real-Time Bengali Sign-Character Detection and Sentence Generation System Using Quantized Yolov4-Tiny and LSTMs. Appl. Sci. 2023, 13, 5219. https://doi.org/10.3390/app13095219