Attention-Based CNN-RNN Arabic Text Recognition from Natural Scene Images
Abstract
1. Introduction
2. Literature Review
3. Proposed CNN-RNN Attention Model
3.1. Dataset and Preprocessing
3.2. Segmented Dataset
4. Commonly Used Deep Learning Techniques
Algorithm 1. Proposed algorithm used for data preprocessing.
procedure PRE-PROCESSING(Dataset)
    LocalizedTextImages = empty list
    for each Image in Dataset do
        ImageBackground = PreBinarization(Image)
        ImageGradient = CalculateGradient(ImageBackground)
        ImageSmoothen = SmoothenGradient(ImageGradient)
        ThresholdedImage = ApplyThreshold(ImageSmoothen)
        DilatedImage = ApplyDilation(ThresholdedImage)
        OutliersRemovedImage = ApplyConnectedComponentAnalysis(DilatedImage)
        append ApplyTextLocalization(OutliersRemovedImage) to LocalizedTextImages
    end for
    return LocalizedTextImages
end procedure
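Algorithm 1 names its steps but not their parameters. The Python sketch below is one way to realize the same data flow with NumPy; every concrete choice in it is an assumption for illustration, not the authors' implementation: grayscale normalization standing in for `PreBinarization`, a central-difference gradient magnitude, a 3 × 3 mean filter for smoothing, a mean-plus-one-standard-deviation global threshold, 3 × 3 binary dilation, 8-connected component labelling with a hypothetical `min_area` cut-off for outlier removal, and bounding-box cropping as text localization.

```python
import numpy as np
from collections import deque


def pre_binarization(image):
    """Hypothetical stand-in for PreBinarization: grayscale scaled to [0, 1]."""
    img = np.asarray(image, dtype=float)
    if img.ndim == 3:  # RGB -> gray by averaging channels (assumption)
        img = img.mean(axis=2)
    rng = img.max() - img.min()
    return (img - img.min()) / rng if rng > 0 else np.zeros_like(img)


def calculate_gradient(img):
    """Gradient magnitude from central differences along rows and columns."""
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)


def box_filter(img, k=3):
    """Smooth with a k x k mean filter (edge-padded borders)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def threshold(img):
    """Assumed global threshold: keep pixels above mean + one std."""
    return img > img.mean() + img.std()


def dilate(mask, k=3):
    """Binary dilation with a k x k square structuring element."""
    pad = k // 2
    p = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


def connected_components(mask, min_area=20):
    """8-connected labelling; returns (y0, x0, y1, x1) boxes of components
    with at least min_area pixels (smaller blobs are treated as outliers)."""
    h, w = mask.shape
    seen = np.zeros_like(mask)
    boxes = []
    for r in range(h):
        for c in range(w):
            if mask[r, c] and not seen[r, c]:
                seen[r, c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny in (y - 1, y, y + 1):
                        for nx in (x - 1, x, x + 1):
                            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                                seen[ny, nx] = True
                                queue.append((ny, nx))
                if len(pixels) >= min_area:
                    ys, xs = zip(*pixels)
                    boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return boxes


def pre_processing(dataset, min_area=20):
    """For each image, return a list of cropped candidate text regions."""
    localized = []
    for image in dataset:
        gray = pre_binarization(image)
        smooth = box_filter(calculate_gradient(gray))
        mask = dilate(threshold(smooth))
        boxes = connected_components(mask, min_area)
        localized.append([gray[y0:y1, x0:x1] for y0, x0, y1, x1 in boxes])
    return localized
```

Calling `pre_processing([image])` on a grayscale or RGB array returns, per image, a list of crops around high-gradient regions such as text strokes. The threshold rule and `min_area` are placeholders that would need tuning on real frames; the pure-Python flood fill favors readability over speed.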
Overview of RNN
5. Description of Implementation
6. Experimental Results
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
TV Channel | Training Lines | Training Words | Testing Lines | Testing Words |
---|---|---|---|---|
AL-JazeeraHD-TV | 1725 | 7590 | 380 | 1286 |
Russia-Today-TV | 2076 | 12,999 | 301 | 1946 |
France 24-TV | 1821 | 5372 | 264 | 978 |
TunisiaNat-TV | 1900 | 8950 | 290 | 1094 |
AIISD-TV | 5840 | 26,203 | 812 | 5136 |
TV Channel | Training Lines | Training Words | Testing Lines | Testing Words |
---|---|---|---|---|
AL-JazeeraHD-TV | 1430 | 5970 | 310 | 980 |
AL-Arabiya | 410 | 2230 | 265 | 1457 |
France 24-TV | 1690 | 4887 | 219 | 870 |
BBC-Arabic | 155 | 350 | 95 | 180 |
AIISD-TV | 5030 | 21,104 | 780 | 3013 |
Source | Training Lines | Training Words | Testing Lines | Testing Words |
---|---|---|---|---|
– | 1100 | 3320 | 409 | 1110 |
Architecture | Alif_Test_1 ChRR (%) | Alif_Test_1 LiRR (%) | Alif_Test_1 WoRR (%) | Alif_Test_2 ChRR (%) | Alif_Test_2 LiRR (%) | Alif_Test_2 WoRR (%) |
---|---|---|---|---|---|---|
ConvNet with LSTM | 91.27 | 54.9 | 70.29 | 92.37 | 56.9 | 71.9 |
Deep belief net | 89.98 | 40.05 | 60.58 | 87.8 | 43.7 | 62.78 |
HC with LSTM | 85.44 | 60.15 | 53.4 | 87.14 | 62.30 | 50.31 |
ABBYY | 82.4 | 25.99 | 50.0 | 83.26 | 26.91 | 49.80 |
Hybrid CNN-RNN | 93.2 | 78.5 | 40.5 | 96.2 | 79.5 | 39.5 |
MLP_AE_LSTM | 88.50 | 33.5 | 61.22 | 88.50 | 33.5 | 61.22 |
Hi-MDLSTM | 95.55 | 71.33 | 85.72 | 96.55 | 70.67 | 85.71 |
Proposed Architecture | 98.73 | 82.21 | 87.06 | 97.09 | 79.91 | 85.98 |
TV Channels | AcTiV_Test_1 ChRR (%) | AcTiV_Test_1 LiRR (%) | AcTiV_Test_1 WoRR (%) | AcTiV_Test_2 ChRR (%) | AcTiV_Test_2 LiRR (%) | AcTiV_Test_2 WoRR (%) |
---|---|---|---|---|---|---|
AL-JazeeraHD-TV and France 24-TV | 94.9 | 60.51 | 75.55 | 96.29 | 69.54 | 86.11 |
Russia-Today-TV and TunisiaNat-TV | 95.33 | 70.28 | 88.31 | 97.72 | 78.33 | 88.14 |
AIISD-TV | 84.09 | 55.39 | 64.17 | 90.71 | 61.07 | 75.64 |
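The results tables report ChRR, LiRR and WoRR, read here as character, line and word recognition rates. The sketch below assumes one common convention from Arabic video-text benchmarks rather than the paper's own definitions, which may differ: ChRR = 100 × (N_chars − Σ edit distance) / N_chars over character sequences, WoRR as the same formula applied to whitespace-separated word tokens, and LiRR as the percentage of text lines transcribed exactly. The function names `levenshtein` and `recognition_rates` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def recognition_rates(predictions, references):
    """Assumed ChRR / WoRR / LiRR definitions over paired text lines, in percent."""
    pairs = list(zip(predictions, references))
    n_chars = sum(len(r) for r in references)
    char_err = sum(levenshtein(p, r) for p, r in pairs)
    n_words = sum(len(r.split()) for r in references)
    word_err = sum(levenshtein(p.split(), r.split()) for p, r in pairs)
    lines_ok = sum(p == r for p, r in pairs)
    return {
        "ChRR": 100.0 * (n_chars - char_err) / n_chars,
        "WoRR": 100.0 * (n_words - word_err) / n_words,
        "LiRR": 100.0 * lines_ok / len(references),
    }
```

Because Python strings index Unicode code points, the same functions apply unchanged to Arabic transcriptions. Under this edit-distance form, ChRR and WoRR can fall below zero when predictions contain many spurious insertions, whereas LiRR, being an exact-match rate, is always the strictest of the three.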
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Butt, H.; Raza, M.R.; Ramzan, M.J.; Ali, M.J.; Haris, M. Attention-Based CNN-RNN Arabic Text Recognition from Natural Scene Images. Forecasting 2021, 3, 520-540. https://doi.org/10.3390/forecast3030033