MDPI - Publisher of Open Access Journals

19 pages, 1779 KiB

Open Access Article

Through the Eyes of the Viewer: The Cognitive Load of LLM-Generated vs. Professional Arabic Subtitles

by Hussein Abu-Rayyash and Isabel Lacruz

J. Eye Mov. Res. 2025, 18(4), 29; https://doi.org/10.3390/jemr18040029 - 14 Jul 2025

Viewed by 116

Abstract

As streaming platforms adopt artificial intelligence (AI)-powered subtitle systems to satisfy global demand for instant localization, the cognitive impact of these automated translations on viewers remains largely unexplored. This study used a web-based eye-tracking protocol to compare the cognitive load imposed by GPT-4o-generated Arabic subtitles with that imposed by professional human translations among 82 native Arabic speakers who viewed a 10 min episode (“Syria”) from the BBC comedy drama series State of the Union. Participants were randomly assigned to view the same episode with either professionally produced Arabic subtitles (Amazon Prime’s human translations) or machine-generated GPT-4o Arabic subtitles. In a between-subjects design, with English proficiency entered as a moderator, we collected fixation count, mean fixation duration, gaze distribution, and attention concentration (K-coefficient) as indices of cognitive processing. GPT-4o subtitles raised cognitive load on every metric: viewers produced 48% more fixations in the subtitle area, recorded 56% longer fixation durations, and spent 81.5% more time reading the automated subtitles than the professional subtitles. The subtitle-area K-coefficient tripled (from 0.10 to 0.30), indicating a shift from ambient scanning to focal processing. Viewers with advanced English proficiency showed the largest disruptions, which indicates that higher linguistic competence increases sensitivity to subtle translation shortcomings. These results challenge claims that large language models (LLMs) lighten the viewer’s burden: despite fluent surface quality, GPT-4o subtitles demand far more cognitive resources than expert human subtitles, which reinforces the need for human oversight in audiovisual translation (AVT) and media accessibility.
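
The abstract reports the K-coefficient without defining it. A common definition, the coefficient K introduced by Krejtz and colleagues, takes each fixation's z-scored duration minus the z-scored amplitude of the saccade that follows it; positive mean values indicate focal attention and negative values ambient attention. The Python sketch below assumes that definition, normalizing over the whole recording and averaging over an area of interest such as the subtitle region. The function names and data are hypothetical illustrations, not the authors' analysis pipeline.

```python
from statistics import mean, pstdev

def k_values(durations, amplitudes):
    """Per-fixation K_i = z(duration_i) - z(amplitude of the following saccade).

    Normalization (mean, population SD) uses the whole recording, so that a
    subset (e.g. fixations inside the subtitle area) can be compared with it.
    """
    if len(durations) != len(amplitudes) or len(durations) < 2:
        raise ValueError("need at least two paired fixation/saccade samples")
    mu_d, sd_d = mean(durations), pstdev(durations)
    mu_a, sd_a = mean(amplitudes), pstdev(amplitudes)
    if sd_d == 0 or sd_a == 0:
        raise ValueError("durations and amplitudes must vary")
    return [(d - mu_d) / sd_d - (a - mu_a) / sd_a
            for d, a in zip(durations, amplitudes)]

def mean_k(ks, mask):
    """Average K over the fixations selected by a boolean mask (e.g. an AOI)."""
    selected = [k for k, keep in zip(ks, mask) if keep]
    if not selected:
        raise ValueError("mask selects no fixations")
    return mean(selected)

# Hypothetical data: fixation durations (ms) and next-saccade amplitudes (deg)
durations = [100, 300, 200, 400]
amplitudes = [8, 2, 6, 4]
in_subtitle_area = [False, True, False, True]

ks = k_values(durations, amplitudes)
print(mean_k(ks, in_subtitle_area))  # positive value: focal viewing in the AOI
```

Normalizing over the full recording matters: averaging K over exactly the fixations used for normalization always gives approximately zero, so the statistic is informative only for a subset (an area of interest or a time window) set against a wider baseline.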

15 pages, 2124 KiB

Open Access Article

Toward Building a Domain-Based Dataset for Arabic Handwritten Text Recognition

by Khawlah Alhefdhi, Abdulmalik Alsalman and Safi Faizullah

Electronics 2025, 14(12), 2461; https://doi.org/10.3390/electronics14122461 - 17 Jun 2025

Viewed by 327

Abstract

The automatic recognition of handwritten text has recently been widely discussed in the research community. Handwritten text recognition is considered a challenging task for cursive scripts, such as the Arabic script, because of their complex properties. Although the demand for automatic text recognition is growing, especially to assist in digitizing archival documents, few datasets are available for Arabic handwritten text compared with other languages. In this paper, we present novel work on building the Real Estate and Judicial Documents dataset (REJD dataset), which aims to facilitate the recognition of Arabic text in millions of archived documents. This paper also discusses the use of Optical Character Recognition (OCR) and deep learning techniques and is intended to serve as the initial version in a series of experiments and enhancements designed to achieve optimal results.

18 pages, 373 KiB

Open Access Article

Machine Learning- and Deep Learning-Based Multi-Model System for Hate Speech Detection on Facebook

by Amna Naseeb, Muhammad Zain, Nisar Hussain, Amna Qasim, Fiaz Ahmad, Grigori Sidorov and Alexander Gelbukh

Algorithms 2025, 18(6), 331; https://doi.org/10.3390/a18060331 - 1 Jun 2025

Cited by 2 | Viewed by 597

Abstract

Hate speech is a complex topic that transcends language, culture, and even social spheres. Recently, the spread of hate speech on social media sites like Facebook has added a new layer of complexity to online safety and content moderation. This study seeks to address this problem by developing an Arabic script-based tool for automatically detecting hate speech in Roman Urdu, an informal script used most commonly for South Asian digital communication. Roman Urdu is relatively complex because it has no standardized spelling, which leads to syntactic variations and increases the difficulty of hate speech detection. To tackle this problem, we adopt a holistic strategy that combines six machine learning (ML) and four deep learning (DL) models with a dataset of Facebook comments, which was preprocessed (tokenization, stopword removal, etc.) and vectorized (TF-IDF, word embeddings). The ML algorithms used in this study are LR, SVM, RF, NB, KNN, and GBM. We also use deep learning architectures such as CNN, RNN, LSTM, and GRU to further increase classification accuracy. The experimental results show that deep learning models outperform traditional ML approaches by a significant margin, with CNN and LSTM achieving accuracies of 95.1% and 96.2%, respectively. As far as we are aware, this is the first work that investigates QLoRA for fine-tuning large models for the task of offensive language detection in Roman Urdu.

(This article belongs to the Special Issue Linguistic and Cognitive Approaches to Dialog Agents)
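
The preprocessing and TF-IDF vectorization steps named in the hate speech abstract above can be illustrated with a small standard-library sketch. The tokenizer, the stopword list, and the TF-IDF variant (term count divided by document length, times an unsmoothed log(N/df) inverse document frequency) are assumptions chosen for clarity; the abstract does not state the paper's settings, and libraries such as scikit-learn use a smoothed idf by default.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real Roman Urdu list would differ.
STOPWORDS = {"the", "is", "a", "and"}

def tokenize(text):
    """Lowercase, split on runs of non-alphanumeric characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def tfidf(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Placeholder comments standing in for preprocessed Facebook data.
docs = ["The movie is good", "the movie is bad", "bad bad day"]
for vec in tfidf(docs):
    print(vec)
```

A term that appears in every document receives weight zero under this variant, which is one reason smoothed idf formulas are common in practice.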

14 pages, 8984 KiB

Open Access Article

Shared Memory and History: The Abrahamic Legacy in Medieval Judaeo-Arabic Poetry from the Cairo Genizah

by Ahmed Mohamed Sheir

Religions 2024, 15(12), 1431; https://doi.org/10.3390/rel15121431 - 26 Nov 2024

Viewed by 1762

Abstract

The Cairo Genizah collections provide scholars with profound insight into Jewish culture, history, and the deeply intertwined relationships between Jews, Muslims, and Christians. Among these treasures are often overlooked Arabic poetic fragments from the eleventh to fifteenth centuries, which illuminate the shared Abrahamic legacy. This paper mainly explores two unpublished poetic fragments written in Judaeo-Arabic (Arabic in Hebrew script), analyzing how they reflect a shared Jewish–Muslim cultural memory and history, particularly through reverence for Abraham, Isaac, Jacob, and other key figures central to both traditions across the medieval Mediterranean and Middle East. By situating these poetic voices within broader historical and cultural contexts, this study underscores the role of poetry in reflecting sociocultural and historical dimensions while fostering cross-cultural and religious coexistence. It demonstrates how poetry acts as a bridge between religion, history, and culture by revealing the shared Abrahamic heritage of Jews and Muslims in two Arabic poetic fragments from the Cairo Genizah.

(This article belongs to the Special Issue Jewish-Muslim Relations in the Past and Present)

19 pages, 4823 KiB

Open Access Article

Evaluating Feature Impact Prior to Phylogenetic Analysis Using Machine Learning Techniques

by Osama A. Salman and Gábor Hosszú

Information 2024, 15(11), 696; https://doi.org/10.3390/info15110696 - 4 Nov 2024

Viewed by 1396

Abstract

The purpose of this paper is to describe a feature selection algorithm and its application to enhancing the accuracy of phylogenetic tree reconstruction by improving the efficiency of tree construction. Machine learning models such as deep neural networks (DNNs), support vector machines (SVMs), and random forests (RFs) were applied to Arabic and Aramaic scripts, and each model was used to compare the phylogenies. The methodology was applied to a dataset containing Arabic and Aramaic scripts, demonstrating its relevance to a range of phylogenetic analyses. The results show that feature selection by DNNs plays an essential role and outperforms the other models in terms of area under the curve (AUC) and equal error rate (EER) across various datasets and fold sizes. Furthermore, both the SVM and RF models are valuable for understanding the strengths and limitations of these approaches in the context of phylogenetic analysis. This method not only simplifies the tree structures but also improves their Consistency Index values, and it therefore offers a robust framework for evolutionary studies. The findings highlight the application of machine learning in phylogenetics, suggesting a path toward accurate and efficient evolutionary analyses and enabling a deeper understanding of evolutionary relationships.

(This article belongs to the Special Issue Feature Papers in Artificial Intelligence 2024)
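
The two evaluation metrics reported in the abstract above, AUC and EER, can both be computed from a classifier's scores for positive and negative samples. The sketch below assumes that higher scores mean "positive" and uses the pairwise (Mann-Whitney) formulation of AUC and a threshold sweep for EER; the score lists are hypothetical, since the abstract does not describe how scores are produced for the script features.

```python
def auc(pos_scores, neg_scores):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need both positive and negative scores")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def equal_error_rate(pos_scores, neg_scores):
    """Approximate EER by trying every observed score as a threshold.

    A sample is accepted when its score >= threshold.
    FRR: fraction of positives rejected; FAR: fraction of negatives accepted.
    Returns (eer, threshold) where |FAR - FRR| is smallest, with the EER
    estimated as the mean of FAR and FRR at that threshold.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need both positive and negative scores")
    best = None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        frr = sum(s < t for s in pos_scores) / len(pos_scores)
        far = sum(s >= t for s in neg_scores) / len(neg_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Hypothetical classifier scores
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.3, 0.2, 0.1]
print(auc(pos, neg), equal_error_rate(pos, neg))
```

Higher AUC and lower EER are better; with only a few scores the sweep rarely finds FAR exactly equal to FRR, which is why the midpoint at the closest threshold is used as the estimate.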

19 pages, 1401 KiB

Open Access Article

Enhancing Arabic Sentiment Analysis of Consumer Reviews: Machine Learning and Deep Learning Methods Based on NLP

by Hani Almaqtari, Feng Zeng and Ammar Mohammed

Algorithms 2024, 17(11), 495; https://doi.org/10.3390/a17110495 - 3 Nov 2024

Cited by 2 | Viewed by 1856

Abstract

Sentiment analysis uses Natural Language Processing (NLP) techniques to extract opinions from text, which is critical for businesses looking to refine strategies and better understand customer feedback. Understanding people’s sentiments about products through emotional tone analysis is therefore paramount. However, analyzing sentiment in Arabic and its dialects poses challenges due to the language’s intricate morphology, right-to-left script, and nuanced emotional expressions. To address these challenges, this study introduces the Arb-MCNN-Bi Model, which integrates the strengths of the transformer-based AraBERT (Arabic Bidirectional Encoder Representations from Transformers) model with a Multi-channel Convolutional Neural Network (MCNN) and a Bidirectional Gated Recurrent Unit (BiGRU) for Arabic sentiment analysis. AraBERT, designed specifically for Arabic, captures rich contextual information through word embeddings. These embeddings are processed by the MCNN to enhance feature extraction and by the BiGRU to retain long-term dependencies. The final output is obtained through feedforward neural networks. The study compares the proposed model with various machine learning and deep learning methods, applying advanced NLP techniques such as Term Frequency-Inverse Document Frequency (TF-IDF), n-gram, Word2Vec (Skip-gram), and fastText (Skip-gram). Experiments are conducted on three Arabic datasets: the Arabic Customer Reviews Dataset (ACRD), Large-scale Arabic Book Reviews (LABR), and the Hotel Arabic Reviews dataset (HARD). The Arb-MCNN-Bi model with AraBERT achieved accuracies of 96.92%, 96.68%, and 92.93% on the ACRD, HARD, and LABR datasets, respectively. These results demonstrate the model’s effectiveness in analyzing Arabic text data and its advantage over traditional approaches.

(This article belongs to the Topic Multimodal Sentiment Analysis Based on Deep Learning Methods Such as Convolutional Neural Networks)

18 pages, 4420 KiB

Open Access Article

Machine Learning Approach for Arabic Handwritten Recognition

by A. M. Mutawa, Mohammad Y. Allaho and Monirah Al-Hajeri

Appl. Sci. 2024, 14(19), 9020; https://doi.org/10.3390/app14199020 - 6 Oct 2024

Cited by 2 | Viewed by 3565

Abstract

Text recognition is an important area of the pattern recognition field. Natural language processing (NLP) and pattern recognition have been used effectively in script recognition, and much research has been conducted on handwritten script recognition. However, Arabic handwritten text recognition has received little attention compared with other languages, so it is crucial to develop new models that can recognize Arabic handwritten text. Most existing models for recognizing Arabic text are based on traditional machine learning techniques. We therefore implemented a new model using deep learning techniques by integrating two deep neural networks. In the new model, the Residual Network (ResNet) architecture is used to extract features from raw images. Then, a Bidirectional Long Short-Term Memory (BiLSTM) network and connectionist temporal classification (CTC) are used for sequence modeling. Our system improved the recognition rate of Arabic handwritten text compared with other models of a similar type, achieving a character error rate (CER) of 13.2% and a word error rate (WER) of 27.31%. In conclusion, the domain of Arabic handwriting recognition is advancing swiftly with the use of sophisticated deep learning methods.

(This article belongs to the Special Issue Applied Intelligence in Natural Language Processing)
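
Two pieces of the pipeline described above can be shown without a neural network: decoding the per-frame outputs of a CTC-trained BiLSTM, and scoring the result with CER and WER. The sketch below uses best-path (greedy) CTC decoding, which merges repeated labels and then drops the blank symbol, and computes CER and WER as Levenshtein distance divided by reference length. The blank index 0, the label IDs, and the example strings are hypothetical; the paper may use beam-search decoding or a different normalization.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Best-path CTC decoding: collapse consecutive repeats, then drop blanks."""
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: character edits / reference length (ref non-empty)."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

def wer(ref, hyp):
    """Word error rate: word edits / number of reference words (ref non-empty)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

# Hypothetical per-frame argmax labels from a CTC-trained network (0 = blank)
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 0, 3]))
print(cer("abc de", "abd de"), wer("abc de", "abd de"))
```

The blank between the two runs of label 1 is what lets CTC emit a genuinely doubled character, while the repeated 2s collapse into one.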

20 pages, 5247 KiB

Open Access Article

Design With, Not For, Local Community: Utilizing e-Participation Tools in the Design of Socially Sustainable Vertical Emirati Public Housing

by Omar Sherzad M. Shareef and Khaled Galal Ahmed

Buildings 2024, 14(7), 2235; https://doi.org/10.3390/buildings14072235 - 20 Jul 2024

Cited by 3 | Viewed by 1852

Abstract

The United Arab Emirates (UAE) is slowly transitioning from traditional single-family public housing to a ‘vertical’ typology to meet the increasing demand for public housing, address the scarcity of land in urban areas, and contribute to achieving its local agenda for sustainable development goals. However, Emirati residents have not been directly involved in the design process of the limited number of recently developed vertical public housing projects. This research aims to involve a sample of Emirati residents, representing the target group for vertical public housing, in the pre-occupancy evaluation of the design of Al Ghurfa, the most recently developed vertical public housing project, focusing mainly on how well this design attains social sustainability. The research method comprised four phases: developing a conceptual framework from reviews of relevant literature, digitalizing the case study design, developing the conventional and e-Participation interview scenarios and scripts, and selecting a sample of young Emirati citizens to participate in the study. The study highlighted the participating residents’ preferences and concerns regarding the design of this pioneering vertical public housing project. The findings revealed the interviewed citizens’ perceptions of the investigated social sustainability principles in the vertical housing design, pertaining to mixed-use development within and outside the vertical residential building, social integration among neighbors in the building, vertical and horizontal accessibility inside and outside the building, security measures for the residents and their privacy, design measures for high-quality living environments, the user-responsive design of the housing units, and the importance of residents’ involvement in the design.
These findings informed a set of recommended design actions for attaining social sustainability in vertical housing design tailored to the specific needs of Emirati residents. The research also demonstrated the successful combination of conventional and advanced e-Participation tools for involving residents in assessing the professional design of vertical public housing, an emerging typology that is expected to prevail in the near future.

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

27 pages, 7405 KiB

Open Access Article

Informing the Design of an Accessible Arabic Typeface: A Visual Analysis to Identify Letterform Features of Dyslexia-Friendly Typefaces

by Muneera Mohamed Hejres and Amanda J. Tinker

Societies 2024, 14(4), 45; https://doi.org/10.3390/soc14040045 - 29 Mar 2024

Cited by 1 | Viewed by 4379

Abstract

Dyslexia-friendly typefaces for the Latin script have proliferated over the past decade. These typefaces are designed to tackle the challenges of the dyslexic reading experience by manipulating letterforms and typographic attributes; several studies have reported a positive effect on the reading experience. To date, no working dyslexia-friendly Arabic typefaces are available to the public. The present study is part of a larger practice-based research project in which a novel dyslexia-friendly Arabic typeface is designed using a user-centred design approach. The current visual analysis marks the developmental phase, identifying the letterform features of dyslexia-friendly Latin typefaces that can be mapped to the Arabic script. This article explores the typographic features of dyslexia-friendly Latin typefaces through a qualitative visual analysis, employing a proposed modified version of Leeuwen’s Typographic Distinctive Features Framework. The results are discussed in light of the Arabic script’s visual implications for the dyslexic reading experience. The findings of this study are used to create a list of design considerations for a dyslexia-friendly Arabic typeface.

(This article belongs to the Special Issue Visual Arts and Design: Practice-Based Research)

33 pages, 9169 KiB

Open Access Article

Dhad—A Children’s Handwritten Arabic Characters Dataset for Automated Recognition

by Sarab AlMuhaideb, Najwa Altwaijry, Ahad D. AlGhamdy, Daad AlKhulaiwi, Raghad AlHassan, Haya AlOmran and Aliyah M. AlSalem

Appl. Sci. 2024, 14(6), 2332; https://doi.org/10.3390/app14062332 - 10 Mar 2024

Cited by 1 | Viewed by 2451

Abstract

This study examines the recognition of handwritten Arabic characters, specifically targeting children’s handwriting. Given the inherent complexities of the Arabic script, which include semi-cursive styles, distinct character forms depending on position, and diacritical marks, the domain demands specialized attention. While prior research has largely concentrated on adult handwriting, the focus here is on children’s handwritten Arabic characters, an area with distinct challenges, such as variations in writing quality and increased distortions. To this end, we introduce a novel dataset, “Dhad”, refined for enhanced quality and quantity. Our investigation employs a three-fold experimental approach that explores pre-trained deep learning models (i.e., MobileNet, ResNet50, and DenseNet121), a custom-designed Convolutional Neural Network (CNN) architecture, and traditional classifiers (i.e., Support Vector Machine (SVM), Random Forest (RF), and Multilayer Perceptron (MLP)) leveraging deep visual features. The results illustrate the efficacy of fine-tuned pre-existing models, the potential of custom CNN designs, and the intricacies associated with disjointed classification paradigms. The pre-trained MobileNet model achieved the best test accuracy, 93.59%, on the Dhad dataset. Additionally, as a conceptual proposal, we introduce the idea of a computer application designed specifically for children aged 7–12, aimed at improving Arabic handwriting skills. Our concluding reflections emphasize the need for nuanced dataset curation, advanced model architectures, and cohesive training strategies to navigate the multifaceted challenges of Arabic character recognition.

(This article belongs to the Special Issue Digital Image Processing: Advanced Technologies and Applications)

26 pages, 6098 KiB

Open Access Article

Unveiling Sentiments: A Comprehensive Analysis of Arabic Hajj-Related Tweets from 2017–2022 Utilizing Advanced AI Models

by Hanan M. Alghamdi

Big Data Cogn. Comput. 2024, 8(1), 5; https://doi.org/10.3390/bdcc8010005 - 2 Jan 2024

Cited by 7 | Viewed by 4061

Abstract

Sentiment analysis plays a crucial role in understanding public opinion and social media trends. It involves analyzing the emotional tone and polarity of a given text. When applied to Arabic text, this task becomes particularly challenging due to the language’s complex morphology, right-to-left script, and intricate nuances in expressing emotions. Social media has emerged as a powerful platform for individuals to express their sentiments, especially regarding religious and cultural events. Consequently, sentiment analysis in the context of Hajj has become a compelling subject of study. This research paper presents a comprehensive sentiment analysis of tweets discussing the annual Hajj pilgrimage over a six-year period. Using a combination of machine learning and deep learning models, this study analyzed sentiment in a sizable dataset of Arabic tweets. The process involved pre-processing, feature extraction, and sentiment classification. The objective was to uncover the prevailing sentiments associated with Hajj across different years, before, during, and after each Hajj event. Notably, the results show that BERT, an advanced transformer-based model, outperformed the other models in accurately classifying sentiment, which underscores its effectiveness in capturing the complexities inherent in Arabic text.

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

15 pages, 909 KiB

Open Access Article

A Chinese–Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization

by Canglan Liu, Wushouer Silamu and Yanbing Li

Appl. Sci. 2023, 13(19), 10589; https://doi.org/10.3390/app131910589 - 22 Sep 2023

Cited by 3 | Viewed by 1959

Abstract

Low-resource languages often suffer from insufficient data, which leads to poor machine translation quality. One approach to address this issue is data augmentation, which creates new data by transforming existing data through methods such as flipping, cropping, rotating, and adding noise. In low-resource machine translation, pseudo-parallel corpora have traditionally been generated by randomly replacing words. However, this method can introduce ambiguity, as the same word may have different meanings in different contexts. This study proposes a new approach for low-resource machine translation that generates pseudo-parallel corpora by replacing phrases. The performance of this approach is compared with that of other data augmentation methods, and combining it with those methods is observed to further improve performance. To enhance the robustness of the model, R-Drop regularization, an effective method for improving the quality of machine translation, is also used. The proposed method was tested on Chinese–Kazakh (Arabic script) translation tasks, yielding performance improvements of 4.99 and 7.7 for Chinese-to-Kazakh and Kazakh-to-Chinese translation, respectively. Combining phrase-replacement generation of pseudo-parallel corpora with R-Drop regularization thus significantly advances machine translation performance for low-resource languages.

(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
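
R-Drop, as it is usually formulated, passes each training example through the model twice with independent dropout masks and adds to the two negative log-likelihood terms a symmetric KL divergence between the two predicted distributions, weighted by a coefficient alpha. The Python sketch below computes that loss for a single output position from two probability vectors; the weight alpha and the example distributions are assumptions, since the abstract does not report the paper's settings. In training, the loss would be summed over target tokens and the probabilities would come from the translation model.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; q must be > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def r_drop_loss(p1, p2, target, alpha=1.0):
    """R-Drop loss for one output position.

    p1, p2 -- predicted distributions from two forward passes of the same
              input under different dropout masks
    target -- index of the gold token
    alpha  -- weight of the consistency term (hyperparameter; assumed here)
    """
    nll = -math.log(p1[target]) - math.log(p2[target])
    symmetric_kl = 0.5 * (kl(p1, p2) + kl(p2, p1))
    return nll + alpha * symmetric_kl

# Hypothetical outputs of two dropout passes over a 3-token vocabulary
p1 = [0.7, 0.2, 0.1]
p2 = [0.5, 0.3, 0.2]
print(r_drop_loss(p1, p2, target=0, alpha=1.0))
```

When the two passes agree, the KL term vanishes and the loss reduces to ordinary cross-entropy on both passes; disagreement between dropout sub-models is what the regularizer penalizes.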

17 pages, 557 KiB

Open Access Article

The Role of Transliterated Words in Linking Bilingual News Articles in an Archive

by Muzammil Khan, Sarwar Shah Khan, Yasser Alharbi, Ali Alferaidi, Talal Saad Alharbi and Kusum Yadav

Appl. Sci. 2023, 13(7), 4435; https://doi.org/10.3390/app13074435 - 31 Mar 2023

Cited by 4 | Viewed by 1694

Abstract

Retrieving a specific digital information object in response to a user query from huge, evolving, multilingual news archives is challenging and complicated. Processing becomes even harder when the archive includes low-resourced, morphologically complex languages in scripts such as Urdu and Arabic. Computing similarity against a query and among news articles in huge and evolving collections may be inaccurate and time-consuming at run time. This paper introduces a Similarity Measure based on Transliteration Words (SMTW), that is, English words transliterated in Urdu script, for linking news articles extracted from multiple online sources during the preservation process. The SMTW links Urdu-to-English news articles using an upgraded Urdu-to-English lexicon that includes transliteration words. The SMTW was exhaustively evaluated on datasets of different sizes, and the results were compared with the Common Ratio Measure for Dual Language (CRMDL). The experimental results show that the SMTW was more effective than the CRMDL for linking Urdu-to-English news articles: precision improved from 50% to 60%, recall improved from 67% to 82%, and the impact of common terms also improved.

(This article belongs to the Special Issue Advanced Computational and Linguistic Analytics)

19 pages, 1469 KiB

Open Access Article

Linguistic Variation, Social Meaning and Covert Prestige in a Northern Moroccan Arabic Variety

by Montserrat Benítez Fernández

Languages 2023, 8(1), 89; https://doi.org/10.3390/languages8010089 - 21 Mar 2023

Cited by 1 | Viewed by 4213

Abstract

This paper addresses how gender and age, as macro-sociological factors, influence variation and change in the Northern Moroccan Arabic variety of Ouezzane, and how social meaning plays a role in this variation. To do so, it examines the high degree of variability in the realization of two phonetic variables, the voiceless alveolar plosive /t/ and the voiceless uvular plosive /q/, in a corpus of semi-scripted interviews with 20 local informants. The data for the study were gathered during several fieldwork campaigns carried out between 2014 and 2021. The analysis combines quantitative and qualitative methods. Quantitative comparisons are drawn across gender and three age categories (under 30, between 30 and 50, and over 50) to search for gender and/or age markers, while the data are qualitatively analyzed with regard to the increased use of certain allophones, the attrition and loss of other variants, and informants’ metalinguistic comments on those traits. These two methods make it possible to identify how the phonetic variables analyzed contribute to the construction of various identities, such as an “older person” identity, as well as self-affiliation with particular social groups, such as “artisans” or “rural women”, from which other groups, such as male university graduates, are keen to distance themselves.

16 pages, 4781 KiB

Open Access Article

A Multi-Layer Holistic Approach for Cursive Text Recognition

by Muhammad Umair, Muhammad Zubair, Farhan Dawood, Sarim Ashfaq, Muhammad Shahid Bhatti, Mohammad Hijji and Abid Sohail

Appl. Sci. 2022, 12(24), 12652; https://doi.org/10.3390/app122412652 - 9 Dec 2022

Cited by 9 | Viewed by 2739

Abstract

Urdu is a widely spoken language in several South Asian countries and in communities worldwide. Recognizing Urdu text is relatively hard compared with other languages because of its cursive writing style. Urdu script belongs to the non-Latin cursive family of scripts, like Arabic, Hindi, and Chinese. Urdu is written in several styles, among which ‘Nastaleeq’ is the most popular and widely used. The localization/detection and recognition of Urdu Nastaleeq text remain challenging because it follows a modified version of the Arabic script. This research study presents a methodology to recognize and classify Urdu text in Nastaleeq font, regardless of the text position in the image. The proposed solution consists of two steps. In the first step, text detection is performed using Connected Component Analysis (CCA) and a Long Short-Term Memory (LSTM) neural network. In the second step, a hybrid Convolutional Neural Network and Recurrent Neural Network (CNN-RNN) architecture is deployed to recognize the detected text. The image containing Urdu text is binarized and segmented to produce single-line text images, which are fed to the hybrid CNN-RNN model; the model recognizes the text and saves it to a text file. The proposed technique outperforms existing techniques, achieving an overall accuracy of 97.47%.

(This article belongs to the Section Computing and Artificial Intelligence)
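
The detection step above pairs Connected Component Analysis with an LSTM, and the recognition step starts from a binarized image. The sketch below shows only the classical part: fixed-threshold binarization and 8-connected component labeling that returns a bounding box per ink blob, in plain Python on a list-of-lists grayscale image. The threshold of 128, the 8-connectivity, and the example image are assumptions; the LSTM-based detection and the CNN-RNN recognizer are not shown.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Dark pixels (ink) -> 1, light background -> 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def connected_components(binary):
    """Label 8-connected foreground regions with breadth-first search.

    Returns bounding boxes (top, left, bottom, right) in the order the
    components are first reached by a row-major scan.
    """
    h = len(binary)
    w = len(binary[0]) if binary else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):          # visit all 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Hypothetical 3x5 grayscale image (0 = black ink, 255 = white paper)
gray = [
    [255, 0, 0, 255, 255],
    [255, 0, 255, 255, 0],
    [255, 255, 255, 255, 0],
]
print(connected_components(binarize(gray)))
```

In production code, OpenCV's `cv2.connectedComponentsWithStats` performs the labeling step and also reports per-component areas, which are useful for filtering out noise specks before grouping components into text lines.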
