Detection and Recognition of Bilingual Urdu and English Text in Natural Scene Images Using a Convolutional Neural Network–Recurrent Neural Network Combination with a Connectionist Temporal Classification Decoder
Abstract
1. Introduction
2. Related Works
2.1. Text Detection
2.2. Text Recognition
2.3. Urdu Dataset and Research
2.4. English Language Dataset and Research
2.5. Bilingual Text Detection and Recognition
3. Limitations of Previous Studies and the Contributions of This Study
3.1. Limitations of Previous Studies
3.2. Our Contributions
- Develop a model that handles the cursive Urdu script, whose characters appear in various positions and shapes;
- Develop a robust mechanism that addresses challenges associated with the detection and recognition of bilingual text (Urdu and English) with differing script characteristics and orientations;
- Establish a method for detecting text in complex NSIs with varying factors such as brightness, blur, background complexity, and nontext regions and objects;
- Construct a bilingual dataset of Urdu and English text from a preliminary unilingual dataset to address the lack of an adequate dataset featuring NSIs with Urdu and English text;
- Create a pipeline for accurate text detection and recognition using a CNN, an RNN, and a connectionist temporal classification (CTC) decoder:
- ◦ The proposed pipeline used a customized CNN for feature extraction.
- ◦ It was evaluated with different RNN types and numbers of hidden units.
- ◦ The best models derived using this pipeline were subjected to ablation studies.
4. Materials and Methods
4.1. Materials
4.1.1. Natural Scene Images
4.1.2. Cropped Urdu Characters
4.1.3. Cropped Urdu Words
4.1.4. Cropped English Characters
4.1.5. Dataset Creation and Preprocessing
- In this framework, the preliminary base dataset [13] originally contains the following:
- ◦ Bilingual NSIs containing Urdu and English text;
- ◦ Images of cropped Urdu characters segmented from these NSIs, with their labels;
- ◦ Images of cropped Urdu words segmented from these NSIs, with their labels.
- This dataset was previously used to detect only Urdu text. It is enhanced for bilingual text detection in the data creation stage.
- ◦ This stage involves manually segmenting images of cropped English characters from the preliminary bilingual NSIs. The resulting dataset is used to train the model on English characters.
- ◦ Additionally, ground-truth labels are manually created for this dataset of cropped English characters.
- ◦ The unilingual dataset is thus converted into a bilingual dataset (Figure 6).
- The output segments of the data creation stage are fed as the input to the data augmentation stage. In this stage, the sizes of all four segments of the dataset are increased using augmentation techniques such as rotation, width shift, height shift, zoom, blur, and color adjustments. These augmentation techniques are carefully designed to preserve the inherent characteristics of both Urdu and English text and enhance the performance of the model in detecting bilingual text in NSIs.
- The output segments are also fed to the data preprocessing stage, which is carefully designed to improve the quality of the input segments. Preprocessing techniques such as sharpening and grayscale conversion are applied to enhance feature detection and minimize background complexity. This stage is important for the accurate detection and recognition of bilingual text in NSIs (a brief augmentation and preprocessing sketch follows this list).
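The following is a minimal sketch of the kind of augmentation and preprocessing described above, using OpenCV and NumPy. The parameter values (rotation range, shift range, zoom range, kernel sizes, brightness factors) are illustrative assumptions rather than the exact settings used in this work.

```python
# Sketch of data augmentation and preprocessing for cropped text images.
# Parameter values are illustrative assumptions, not the paper's exact settings.
import cv2
import numpy as np

def augment(image: np.ndarray) -> np.ndarray:
    """Apply a random rotation, width/height shift, zoom, blur, and brightness change."""
    h, w = image.shape[:2]
    angle = np.random.uniform(-5, 5)                       # small rotation keeps text legible
    tx, ty = np.random.uniform(-0.05, 0.05, 2) * (w, h)    # width and height shift
    scale = np.random.uniform(0.9, 1.1)                    # zoom in/out
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    m[:, 2] += (tx, ty)
    out = cv2.warpAffine(image, m, (w, h), borderMode=cv2.BORDER_REPLICATE)
    out = cv2.GaussianBlur(out, (3, 3), 0)                 # mild blur
    out = cv2.convertScaleAbs(out, alpha=np.random.uniform(0.8, 1.2), beta=0)  # brightness/color adjustment
    return out

def preprocess(image: np.ndarray) -> np.ndarray:
    """Grayscale conversion followed by sharpening to emphasize stroke edges."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    sharpen_kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    return cv2.filter2D(gray, -1, sharpen_kernel)
```

Because the transformations are mild (small angles, shifts, and zooms), they enlarge the dataset without distorting the contextual character shapes that cursive Urdu depends on.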
4.2. Methods: Proposed Methodology
- The pipeline begins with an image being fed into the text detection module, which localizes the text instances in the image and draws the boundaries of the detected text regions.
- The text recognition module then performs feature extraction on the cropped Urdu characters, cropped Urdu words, and cropped English characters separately, followed by sequence modeling and sequence decoding.
- The output reflects the text recognized from the NSI and indicates how well the proposed pipeline performs.
- ◦ The predicted text is the primary output: a readable form of the text extracted by the model from the input NSIs (a brief CTC decoding sketch follows this list).
- ◦ The text recognition rate is the secondary output: a measure of the correctness of the predicted text.
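To illustrate how the predicted-text output is produced, the following is a minimal sketch assuming a trained recognizer whose softmax output gives per-timestep character probabilities; greedy CTC decoding collapses these into text. The `CHARSET` list is a hypothetical placeholder (the actual inventory also includes Urdu characters) and must match the class ordering used by the model.

```python
# Sketch of greedy CTC decoding of the recognizer's per-timestep probabilities.
import tensorflow as tf

CHARSET = list("abcdefghijklmnopqrstuvwxyz0123456789")  # placeholder; real inventory includes Urdu glyphs

def ctc_greedy_decode(y_pred: tf.Tensor) -> list:
    """y_pred: (batch, timesteps, num_classes + 1) softmax output; returns decoded strings."""
    batch, timesteps = y_pred.shape[0], y_pred.shape[1]   # assumes concrete (eager) shapes
    input_len = tf.fill([batch], timesteps)               # every sample uses all time steps
    decoded, _ = tf.keras.backend.ctc_decode(y_pred, input_length=input_len, greedy=True)
    texts = []
    for seq in decoded[0].numpy():
        # CTC decoding has already removed blanks and repeats; -1 entries are padding
        texts.append("".join(CHARSET[i] for i in seq if i != -1))
    return texts
```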
4.2.1. Text Detection Module
4.2.2. Text Recognition Module
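As an illustration of this module, the following is a minimal Keras sketch of a CRNN trained with a CTC loss, in the spirit of the CNN+BLSTM-512 configuration reported later in the paper. The input resolution, convolutional block sizes, dense-layer width, and character-inventory size are assumptions made for the sketch, not the authors' exact settings.

```python
# Minimal CRNN + CTC sketch (CNN feature extractor -> BLSTM sequence model -> per-timestep softmax).
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG_H, IMG_W = 64, 256   # assumed input size of a cropped word image
NUM_CLASSES = 120        # assumed Urdu + English character inventory (CTC blank added below)

def build_crnn(rnn_units: int = 512) -> Model:
    inputs = layers.Input(shape=(IMG_H, IMG_W, 1), name="image")

    # CNN feature extractor: three convolution/pooling blocks
    x = inputs
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)

    # Treat the width axis as the time axis and fold height into the feature axis
    _, h, w, c = x.shape
    x = layers.Permute((2, 1, 3))(x)           # (width, height, channels)
    x = layers.Reshape((w, h * c))(x)          # (timesteps, features)

    # Sequence modelling: bidirectional LSTM with 512 hidden units, then a dense layer
    x = layers.Bidirectional(layers.LSTM(rnn_units, return_sequences=True))(x)
    x = layers.Dense(256, activation="relu")(x)

    # Per-timestep class probabilities; the extra class is the CTC blank symbol
    outputs = layers.Dense(NUM_CLASSES + 1, activation="softmax", name="char_probs")(x)
    return Model(inputs, outputs, name="crnn_ctc")

def ctc_loss(y_true, y_pred, input_length, label_length):
    # Keras' built-in batched CTC cost; y_true holds label indices, and the
    # length tensors give the valid time steps and label lengths per sample.
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)

model = build_crnn()
model.summary()
```

At inference time, the softmax output of such a model would be passed to a CTC decoder (e.g., the greedy decoding sketch shown earlier in Section 4.2) to obtain the predicted text.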
5. Evaluation Indicators and Their Significance
5.1. CRR
- Xi is the number of characters in the ith actual label;
- Yi is the number of characters in the ith predicted label;
- Ai[j] is the jth character of the ith actual label;
- Pi[j] is the jth character of the ith predicted label;
- δ(Ai[j], Pi[j]) is an indicator function that equals 1 when Ai[j] and Pi[j] match and 0 otherwise;
- Min(Xi, Yi) denotes the minimum of the actual and predicted label lengths, i.e., the number of character positions the system compares;
- Max(Xi, Yi) denotes the maximum of the actual and predicted label lengths, which normalizes the comparison.
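The CRR equation itself did not survive extraction; a plausible reconstruction, consistent with the symbols defined above and with the usual definition of a character recognition rate (with N denoting the total number of labels, as in Section 5.2), is:

```latex
\mathrm{CRR} = \frac{100}{N}\sum_{i=1}^{N}
\frac{\sum_{j=1}^{\min(X_i,\,Y_i)} \delta\bigl(A_i[j],\, P_i[j]\bigr)}{\max(X_i,\, Y_i)},
\qquad
\delta\bigl(A_i[j],\, P_i[j]\bigr) =
\begin{cases}
1, & A_i[j] = P_i[j],\\
0, & \text{otherwise.}
\end{cases}
```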
5.2. WRR
- N is the total number of labels;
- Ai is the ith actual label;
- Pi is the ith predicted label;
- δ(Ai, Pi) is an indicator function that equals 1 when the predicted label exactly matches the actual label and 0 otherwise.
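As above, the WRR equation is reconstructed here under the assumption that it takes the standard form:

```latex
\mathrm{WRR} = \frac{100}{N}\sum_{i=1}^{N} \delta\bigl(A_i,\, P_i\bigr),
\qquad
\delta\bigl(A_i,\, P_i\bigr) =
\begin{cases}
1, & A_i = P_i,\\
0, & \text{otherwise.}
\end{cases}
```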
6. Results
6.1. Model Performance Under Different RNNs and Hidden Units
6.2. Accuracy and Loss Trends of Proposed Models
6.3. Ablation Studies of Proposed Models
6.4. Analysis of Error Cases
6.5. Comparison of Proposed Models with Existing Models
7. Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Edwards, J.; Brown, K. Bilingualism and Multilingualism. In Encyclopedia of Language & Linguistics, 2nd ed.; Elsevier Science: Amsterdam, The Netherlands, 2006; ISBN 978-0-08-044854-1. Available online: https://www.sciencedirect.com/referencework/9780080448541/encyclopedia-of-language-and-linguistics (accessed on 7 July 2025).
- Romaine, S.; Brown, K. Global Distribution of Multilingualism. In Encyclopedia of Language & Linguistics, 2nd ed.; Elsevier Science: Amsterdam, The Netherlands, 2006; ISBN 978-0-08-044854-1. Available online: https://www.sciencedirect.com/referencework/9780080448541/encyclopedia-of-language-and-linguistics (accessed on 7 July 2025).
- Howard, G.; Patricia, J. Ethnolinguistic identity theory: A social psychological approach to language maintenance. Int. J. Sociol. Lang 2009, 1987, 69–100. [Google Scholar] [CrossRef]
- Urdu, Silk Roads Programme, UNESCO. Available online: https://en.unesco.org/silkroad/silk-road-themes/languages-and-endanger-languages/urdo#:~:text=Except%20Pakistan%20%26%20India%2C%20there%20are,UAE%2C%20the%20UK%20and%20Zambia (accessed on 7 July 2025).
- Tariq, R. Language Policy and Localization in Pakistan: Proposal for a Paradigmatic Shift. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=6f7fc1d0d12fc0898abfc7304179b942b4d66533 (accessed on 7 July 2025).
- Tariq, R. Language Policy and Education in Pakistan; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar] [CrossRef]
- Abdul, S.; Syed, M.; Maya, D.; Francisco, D.; Liaquat, C. The glocalization of English in the Pakistan linguistic landscape. World Englishes 2017, 36, 645–665. [Google Scholar] [CrossRef]
- Judd, T.; Ehinger, K.; Durand, F.; Torralba, A. Learning to predict where humans look. In Proceedings of the IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 2106–2113. [Google Scholar]
- Agrahari, A.; Ghosh, R. Multi-Oriented Text Detection in Natural Scene Images Based on the Intersection of MSER with the Locally Binarized Image. Procedia Comput. Sci. 2020, 171, 322–330. [Google Scholar] [CrossRef]
- Pisarov, J.; Mester, G. The Future of Autonomous Vehicle; Research Gate: Berlin, Germany, 2020. [Google Scholar] [CrossRef]
- Saeeda, N.; Khizar, H.; Muhammad, R.; Muhammad, A.; Sajjad, M.; Samee, K. The optical character recognition of Urdu-like cursive script. Pattern Recognit. 2014, 47, 1229–1248. [Google Scholar] [CrossRef]
- Umair, M.; Zubair, M.; Dawood, F.; Ashfaq, S.; Bhatti, M.S.; Hijji, M.; Sohail, A. A Multi-Layer Holistic Approach for Cursive Text Recognition. Appl. Sci. 2022, 12, 12652. [Google Scholar] [CrossRef]
- Chandio, A.A.; Asikuzzaman, M.; Pickering, M.; Leghari, M. Cursive-text: A comprehensive dataset for end-to-end Urdu text recognition in natural scene images. Data Brief 2020, 31, 105749. [Google Scholar] [CrossRef]
- Samia, N.; Thomas, J.F.; Kakul, A.; Rahma, A.-M. Second Language Acquisition and the Impact of First Language Writing Orientation; IGI Global Scientific Publishing: Hershey, PA, USA, 2014; pp. 28–43. [Google Scholar] [CrossRef]
- Ali, D.; Wahab, K.; Dunren, C. Urdu language processing: A survey. Artif. Intell. Rev. 2017, 47, 279–311. [Google Scholar] [CrossRef]
- Saad, A.; Saeeda, N.; Muhammad, R.; Rubiyah, Y. Arabic Cursive Text Recognition from Natural Scene Images. Appl. Sci. 2019, 9, 236. [Google Scholar] [CrossRef]
- Muhammad, S.; Kashif, Z. Urdu character recognition: A systematic literature review. Int. J. Appl. Pattern Recognit. 2021, 6, 283. [Google Scholar] [CrossRef]
- Manoj, S.; Narendra, S. A Survey on Handwritten Character Recognition (HCR) Techniques for English Alphabets. Adv. Vis. Comput. Int. J. 2016, 3, 1–12. [Google Scholar] [CrossRef]
- Sahare, P.; Dhok, S.B. Review of Text Extraction Algorithms for Scene-text and Document Images. IETE Tech. Rev. 2016, 34, 144–164. [Google Scholar] [CrossRef]
- Ammar, A.; Noman, A.; Alexander, G. Urdu Sentiment Analysis with Deep Learning Methods. IEEE Access 2021, 9, 97803–97812. [Google Scholar] [CrossRef] [PubMed]
- Arafat, S.Y.; Iqbal, M.J. Urdu-Text Detection and Recognition in Natural Scene Images Using Deep Learning. IEEE Access 2020, 8, 96787–96803. [Google Scholar] [CrossRef]
- Chandio, A.A.; Pickering, M.; Shafi, K. Character classification and recognition for Urdu texts in natural scene images. In Proceedings of the International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 3–4 March 2018; pp. 1–6. [Google Scholar] [CrossRef]
- Ali, A.; Pickering, M.; Shafi, K. Urdu Natural Scene Character Recognition using Convolutional Neural Networks. In Proceedings of the 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), London, UK, 12–14 March 2018; pp. 29–34. [Google Scholar] [CrossRef]
- Chandio, A.A.; Asikuzzaman, M.; Pickering, M.R. Cursive Character Recognition in Natural Scene Images Using a Multilevel Convolutional Neural Network Fusion. IEEE Access 2020, 8, 109054–109070. [Google Scholar] [CrossRef]
- Chandio, A.A.; Asikuzzaman, M.; Pickering, M.R.; Leghari, M. Cursive Text Recognition in Natural Scene Images Using Deep Convolutional Recurrent Neural Network. IEEE Access 2022, 10, 10062–10078. [Google Scholar] [CrossRef]
- Ali, M.; Foroosh, H. Character recognition in natural scene images using rank-1 tensor decomposition. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 2891–2895. [Google Scholar] [CrossRef]
- Akbani, O.; Gokrani, A.; Quresh, M.; Khan, F.M.; Behlim, S.I.; Syed, T.Q. Character recognition in natural scene images. In Proceedings of the 2015 International Conference on Information and Communication Technologies (ICICT), Karachi, Pakistan, 28–30 October 2015; pp. 1–6. [Google Scholar] [CrossRef]
- Chandio, A.A.; Pickering, M. Convolutional Feature Fusion for Multi-Language Text Detection in Natural Scene Images. In Proceedings of the 2019 2nd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 30–31 January 2019; pp. 1–6. [Google Scholar] [CrossRef]
- Butt, M.A.; Ul-Hasan, A.; Shafait, F. TraffSign: Multilingual Traffic Signboard Text Detection and Recognition for Urdu and English. In Document Analysis Systems. DAS 2022, Lecture Notes in Computer Science; Uchida, S., Barney, E., Eglin, V., Eds.; Springer: Cham, Switzerland, 2022; Volume 13237. [Google Scholar] [CrossRef]
- Dawson, H.L.; Dubrule, O.; John, C.M. Impact of dataset size and convolutional neural network architecture on transfer learning for carbonate rock classification. Comput. Geosci. 2023, 171, 105284. [Google Scholar] [CrossRef]
- Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
- Panhwar, M.A.; Memon, K.A.; Abro, A.; Zhongliang, D.; Khuhro, S.A.; Memon, S. Signboard detection and text recognition using artificial neural networks. In Proceedings of the 2019 IEEE 9th International Conference on Electronics Information and Emergency Communication, Beijing, China, 12–14 July 2019; pp. 16–19. [Google Scholar]
- Hossain, M.S.; Alwan, A.F.; Pervin, M. Road sign text detection using contrast intensify maximally stable extremal regions. In Proceedings of the 2018 IEEE Symposium on Computer Applications & Industrial Electronics, Penang Island, Malaysia, 28–29 April 2018; pp. 321–325. [Google Scholar]
- Basavaraju, H.T.; Manjunath Aradhya, V.N.; Guru, D.S. A novel arbitrary oriented multilingual text detection in images/video. In Information and Decision Sciences. AISC; Satapathy, S.C., Tavares, J.M.R.S., Bhateja, V., Mohanty, J.R., Eds.; Springer: Singapore, 2018; Volume 701, pp. 519–529. [Google Scholar] [CrossRef]
- Tian, S.; Bhattacharya, U.; Lu, S.; Su, B.; Wang, Q.; Wei, X.; Lu, Y.; Tan, C.L. Multilingual scene character recognition with co-occurrence of histogram of oriented gradients. Pattern Recognit. 2016, 51, 125–134. [Google Scholar] [CrossRef]
- Liao, M.; Wan, Z.; Yao, C.; Chen, K.; Bai, X. Real-time scene text detection with differentiable binarization. AAAI 2020, 34, 11474–11481. [Google Scholar] [CrossRef]
- Saha, S.; Chakraborty, N.; Kundu, S.; Paul, S.; Mollah, A.F.; Basu, S.; Sarkar, R. Multi-lingual scene text detection and language identification. Pattern Recognit. Lett. 2020, 138, 16–22. [Google Scholar] [CrossRef]
- Bains, J.K.; Singh, S.; Sharma, A. Dynamic features based stroke recognition system for signboard images of Gurmukhi text. Multimed. Tools Appl. 2021, 80, 665–689. [Google Scholar] [CrossRef]
- Aberdam, A.; Litman, R.; Tsiper, S.; Anschel, O.; Slossberg, R.; Mazor, S.; Manmatha, R.; Perona, P. Sequence-to-sequence contrastive learning for text recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15302–15312. [Google Scholar]
- Cheng, Z.; Bai, F.; Xu, Y.; Zheng, G.; Pu, S.; Zhou, S. Focusing attention: Towards accurate text recognition in natural images. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5076–5084. [Google Scholar]
- Lu, N.; Yu, W.; Qi, X.; Chen, Y.; Gong, P.; Xiao, R.; Bai, X. MASTER: Multi-aspect non-local network for scene text recognition. Pattern Recognit. 2021, 117, 107980. [Google Scholar] [CrossRef]
- Shi, B.; Bai, X.; Yao, C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 2298–2304. [Google Scholar] [CrossRef]
- Chen, X.; Jin, L.; Zhu, Y.; Luo, C.; Wang, T. Text recognition in the wild: A survey. ACM Comput. Surv. (CSUR) 2021, 54, 1–35. [Google Scholar] [CrossRef]
- Busta, M.; Patel, Y.; Matas, J. E2E-MLT—An unconstrained end-to-end method for multi-language scene text. In Proceedings of the Asian Conference on Computer Vision 2018 (ACCV 2018), Perth, Australia, 2–6 December 2018; Carneiro, G., You, S., Eds.; Springer: Cham, Switzerland, 2019; Volume 11367, pp. 127–143. [Google Scholar] [CrossRef]
- Chollet, F. Keras: The Python Deep Learning library, in Encyclopedia of Machine Learning and Data Mining; Springer: Berlin/Heidelberg, Germany, 2016; ISBN 978-3-319-63913-0. Available online: https://keras.io/ (accessed on 7 July 2025).
- Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J. Array programming with NumPy. Nature 2020, 585, 357–362. [Google Scholar] [CrossRef]
- McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; pp. 51–56. [Google Scholar] [CrossRef]
Reference | Methodology | Dataset | Language(s) | Limitations |
---|---|---|---|---|
[33] | Maximally Stable Extremal Regions | ICDAR2001, ICDAR2002 | English | Struggles with varying luminous conditions and cursive text |
[34] | Laplacian Component Analysis | Hua’s dataset, MRRC, MSRA | English and Chinese | Lacks robustness for bilingual text with different orientations as well as with cursive text recognition |
[35] | Histogram of Oriented Gradients | Custom dataset | English, Chinese, and Bengali | Manual feature extraction; time-consuming; and complex structure |
[36] | Differentiation Binarization | MSRA-TD500 | English and Chinese | Issues with overlapping objects, detailed textures, and cluttered backgrounds; struggles with cursive text recognition |
[37] | Maximally Stable Extremal Regions | KAIST, COCO, CTW1500, CVSI, ICDAR | English, Chinese, and Korean | Challenges in the presence of lighting changes and lacks robustness for bilingual text with different orientations |
[38] | CNN | Custom dataset | Gurmukhi | Limited to Gurmukhi script and cannot generalize to other scripts or conditions
[39,40] | Attention Mechanism | IIIT5k, SVT, ICDAR | English, Chinese, and Hindi | Sensitive to attention drifts; accuracy depends on precise attention alignment; and unilingual focus |
[41,42,43] | Neural Networks | Benchmark datasets | English and ICDAR Languages | High sensitivity to text variability and alignment issues; struggles with pattern distortions and curved text |
[21] | Recurrent CNN | Custom dataset | Urdu | Dataset comprises Urdu printed text manually pasted on images |
[22] | Histogram of Oriented Gradients | Custom dataset | Urdu | Manual feature extraction and classifier dependency |
[23] | CNN | Custom dataset | Urdu | Unilingual focus; struggles with complex cursive word recognition |
[24] | Multilevel Feature Fusion | Custom dataset | Urdu | Focuses on isolated Urdu characters |
[25] | Convolutional Recurrent Neural Network | Custom dataset | Urdu | Limited dataset and lower accuracy compared to the proposed model
[26] | Rank-1 Tensor Decomposition | Char74, ICDAR | English | Limited to English only and requires robust evaluation datasets |
[27] | CNN | Char74k, ICDAR | English | Unilingual focus; lacking robustness under complex real-world NSIs |
[28] | VGG-16 | ICDAR 2017 bilingual dataset | Urdu and Arabic | Focuses only on cursive text; lacks robustness for bilingual text with different orientations
[29] | CNN | DLL-TraffSiD | Urdu and English | Small dataset; prone to overfitting; limited scalability |
[32] | Artificial Neural Network | Custom dataset | Urdu and English | Dependent on training data quality; struggles with font variations; has lower accuracy compared to the proposed model |
Dataset | Natural Scene Images | Cropped Urdu Characters | Cropped Urdu Words | Cropped English Characters |
---|---|---|---|---|
Preliminary unilingual base dataset [13] | 945 | 19,901 | 14,099 | ----- |
Proposed bilingual dataset (after data creation stage) | 945 | 19,901 | 14,099 | 14,224 |
Proposed bilingual dataset (after data augmentation and data preprocessing stages) | 2835 | 59,703 | 42,297 | 42,672 |
Dataset | NSIs | Cropped Urdu Characters | Cropped Urdu Words | Cropped English Characters |
---|---|---|---|---|
Proposed dataset | 2835 | 59,703 | 42,297 | 42,672 |
Training dataset | 1985 | 41,792 | 29,608 | 29,871 |
Validation dataset | 567 | 11,941 | 8459 | 8534 |
Test dataset | 283 | 5970 | 4230 | 4267 |
No. | Model | RNN Type | RNN Hidden Units | Urdu CRR (%) | Urdu WRR (%) | English CRR (%) |
---|---|---|---|---|---|---|
1 | CNN+RNN | GRU | 256 | 85.4 | 87.2 | 87.6 |
2 | CNN+RNN | GRU | 512 | 85.8 | 83.7 | 89.4 |
3 | CNN+RNN | BGRU | 256 | 91.4 | 89.3 | 94.5 |
4 | CNN+RNN | BGRU | 512 | 97.8 | 96.1 | 96.5 |
5 | CNN+RNN | LSTM | 256 | 88.7 | 87.4 | 89.1 |
6 | CNN+RNN | LSTM | 512 | 89.6 | 88.3 | 90.5 |
7 | CNN+RNN | BLSTM | 256 | 98.2 | 96.4 | 99.2 |
8 | CNN+RNN | BLSTM | 512 | 98.5 | 97.2 | 99.2 |
Model | Augmentation | Dense Layer | BGRU Layers | Urdu CRR (%) | Urdu WRR (%) | English CRR (%) |
---|---|---|---|---|---|---|
CNN+BGRU-512 | Omitted | Absent | Present | 89.9 | 84.1 | 90.4 |
CNN+BGRU-512 | Omitted | Present | Absent | 89.4 | 85.3 | 90.1 |
CNN+BGRU-512 | Omitted | Present | Present | 97.2 | 91.6 | 92.6 |
CNN+BGRU-512 | Applied | Absent | Present | 93.3 | 91.9 | 93.1 |
CNN+BGRU-512 | Applied | Present | Absent | 94.1 | 92.2 | 93.5 |
CNN+BGRU-512 | Applied | Present | Present | 97.8 | 96.1 | 96.5 |
Model | Augmentation | Dense Layer | BLSTM Layers | Urdu CRR (%) | Urdu WRR (%) | English CRR (%) |
---|---|---|---|---|---|---|
CNN+BLSTM-512 | Omitted | Absent | Present | 90.4 | 86.1 | 91.2 |
CNN+BLSTM-512 | Omitted | Present | Absent | 90.1 | 86.6 | 90.5 |
CNN+BLSTM-512 | Omitted | Present | Present | 98.1 | 93.4 | 93.5 |
CNN+BLSTM-512 | Applied | Absent | Present | 95.3 | 92.2 | 96.3 |
CNN+BLSTM-512 | Applied | Present | Absent | 96.1 | 94.3 | 96.7 |
CNN+BLSTM-512 | Applied | Present | Present | 98.5 | 97.2 | 99.2 |
Reference | Methods | Urdu CRR (%) | Urdu WRR (%) | English CRR (%) |
---|---|---|---|---|
[22] M. Pickering | HOG | 73.0 | ----- | ----- |
[23] K. Shafi | CNN | 88.6 | ----- | ----- |
[24] A. A. Chandio | MLFF | 93.0 | ----- | ----- |
[25] A. A. Chandio | CRNN | 95.7 | 87.1 | ----- |
Proposed model CNN+BGRU-512 (second best) | CRNN | 97.2 | 91.6 | 92.6 |
Proposed model CNN+BLSTM-512 (best) | CRNN | 98.1 | 93.4 | 93.5 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).