Search Results (35)

Search Parameters:
Keywords = alphabetic reading

13 pages, 688 KiB  
Article
Syntactic Information Extraction in the Parafovea: Evidence from Two-Character Phrases in Chinese
by Zijia Lu
Behav. Sci. 2025, 15(7), 935; https://doi.org/10.3390/bs15070935 - 10 Jul 2025
Viewed by 176
Abstract
This study investigates syntactic parafoveal processing in Chinese reading using a boundary paradigm with two-character verb–object phrases. Participants (N = 120 undergraduates) viewed sentences with manipulated previews (identity, syntactically consistent, and syntactically inconsistent). Results showed a selective syntactic preview effect: syntactic violations reduced target-word skipping rates, but fixation durations remained unaffected. This dissociation contrasts with the robust syntactic preview benefits observed in alphabetic languages, highlighting how the lack of morphological markers in Chinese constrains parafoveal processing. The findings challenge parallel processing models while supporting language-specific modulation of universal cognitive mechanisms. Our results advance understanding of hierarchical information extraction in reading, with implications for developing cross-linguistic reading models.
(This article belongs to the Section Cognition)
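Skipping rate and fixation durations are the two kinds of measure this abstract contrasts. As a minimal sketch, assuming each trial is available as an ordered list of fixations tagged with the index of the fixated word (the `Fixation` record and function names are hypothetical, not the study's analysis code), the usual first-pass measures for a target word can be computed like this:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fixation:
    word: int      # index of the fixated word in the sentence
    duration: int  # fixation duration in ms


def first_pass_measures(
    fixations: List[Fixation], target: int
) -> Tuple[Optional[bool], Optional[int], Optional[int]]:
    """Return (skipped, first_fixation_duration, gaze_duration) for `target`.

    The target counts as skipped when the eyes land on a word to its right
    before any fixation on the target itself. If the reader never gets as
    far as the target, all three values are None (trial excluded).
    """
    for i, fix in enumerate(fixations):
        if fix.word > target:
            return True, None, None
        if fix.word == target:
            # Gaze duration: sum of consecutive first-pass fixations on the target.
            gaze = 0
            j = i
            while j < len(fixations) and fixations[j].word == target:
                gaze += fixations[j].duration
                j += 1
            return False, fix.duration, gaze
    return None, None, None


def skipping_rate(trials: List[List[Fixation]], target: int) -> float:
    """Proportion of valid trials in which the target was skipped in first pass."""
    flags = [first_pass_measures(t, target)[0] for t in trials]
    valid = [f for f in flags if f is not None]
    if not valid:
        raise ValueError("no valid trials")
    return sum(valid) / len(valid)
```

In a boundary-paradigm analysis these values would be computed per preview condition and then compared; in this sketch a later regression back to the target does not count toward gaze duration.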

12 pages, 239 KiB  
Article
What Is Scripture for Thomas Aquinas?
by Piotr Roszak and Krzysztof Krzemiński
Religions 2025, 16(7), 845; https://doi.org/10.3390/rel16070845 - 26 Jun 2025
Viewed by 221
Abstract
St. Thomas Aquinas defines theology (sacra doctrina) as the communication of wisdom that comes from God and leads to Him. According to Thomas, what matters is to read the Bible as a whole and not as a cluster of random books. Revelation, and the Bible as its testimony, cannot be reduced to a mere literal communication of divine truth. More fundamental than the biblical words (verba) themselves is the reality (res) to which they refer: the salvific truth communicated by God. The Thomistic approach to Scripture in theology is shaped by four complementary dimensions: auctoritas (power of authority), sensus (meaning), finis (purpose), and documentum (testimony). In this light, Scripture functions as the “alphabet” of theology, the foundational semantic structure through which revealed truth is expressed and transmitted.

21 pages, 534 KiB  
Article
The Influence of Home Language and Literacy Environment and Parental Self-Efficacy on Chilean Preschoolers’ Early Literacy Outcomes
by Pelusa Orellana, Maria Cockerill, Maria Francisca Valenzuela, Malva Villalón, Carmen De la Maza and Pamela Inostroza
Educ. Sci. 2025, 15(6), 668; https://doi.org/10.3390/educsci15060668 - 28 May 2025
Viewed by 491
Abstract
We examined the effects of shared reading workshops on children’s early literacy outcomes. Data were collected for 240 children, 144 of whom had parents or caregivers who participated in a shared reading workshop. The remaining 129 were included as a comparison group. Pre- and post-intervention measures of the home language and literacy environment (HLLE), narrative skills, alphabet knowledge, and parental self-efficacy were collected. Findings show significantly higher scores in alphabet knowledge and narrative skills for children whose parents implemented shared reading. Parental self-efficacy increased after participation in the workshops. Correlations between HLLE, parental self-efficacy, and children’s outcomes were low yet significant. To further investigate the role of HLLE as a mediator of children’s outcomes, we used structural equation modeling. Results show an interaction of HLLE on children’s narrative skills and alphabet knowledge.
(This article belongs to the Section Language and Literacy Education)

20 pages, 815 KiB  
Article
Investigating the Relationship Between Oral Reading Miscues and Comprehension in L2 Chinese
by Sicheng Wang
Languages 2025, 10(5), 115; https://doi.org/10.3390/languages10050115 - 19 May 2025
Viewed by 620
Abstract
Reading comprehension in Chinese as a second language (L2 Chinese) presents unique challenges due to the language’s logographic writing system. Analysis of oral reading miscues reveals specific patterns in L2 learners’ reading processes and comprehension difficulties. Despite established theoretical frameworks for miscue analysis in alphabetic languages, empirical research on miscues in logographic systems such as Chinese remains limited, particularly regarding their relationship with reading comprehension. This study investigates the relationship between oral reading miscues and literal comprehension of Chinese texts among L2 Chinese learners. Sixty-six intermediate-level Chinese learners from U.S. universities participated in the study. Oral reading and sentence-level translation tasks were administered to examine miscues and assess comprehension. Analysis of the oral reading data identified 14 types of oral reading miscues, which were grouped into four categories: orthographic, syntactic, semantic, and word processing miscues. Results showed strong negative correlations between oral reading miscues and comprehension. Orthographic, syntactic, and semantic miscues were negatively correlated with reading comprehension performance, while word processing miscues showed no significant correlation with comprehension. The findings reveal the complex relationship between character recognition, word processing behaviors, and comprehension in L2 Chinese reading, and suggest a need for a nuanced approach to oral reading error correction in L2 Chinese reading instruction. Based on the findings, pedagogical implications for effective reading instruction and reading assessment in L2 Chinese classrooms are discussed.

20 pages, 2009 KiB  
Article
Driving Factors in the Development of Eye Movement Patterns in Chinese Reading: The Roles of Linguistic Ability and Oculomotor Maturation
by Meihua Guo, Nina Liu, Jingen Wu, Chengchieh Li and Guoli Yan
Behav. Sci. 2025, 15(4), 426; https://doi.org/10.3390/bs15040426 - 26 Mar 2025
Viewed by 528
Abstract
The mechanisms that drive the development of children’s eye movement patterns during reading remain an unresolved debate, with three competing hypotheses: the oculomotor-tuning hypothesis, the linguistic-proficiency hypothesis, and a combined hypothesis that incorporates both. This study examined eye movement patterns in 215 Chinese children from first to fifth grade using sentence-reading tasks. Oculomotor maturation was measured through saccade tasks, and linguistic abilities were assessed using Chinese character recognition and vocabulary knowledge tests. Path analysis explored how these factors predict temporal and spatial eye movement measures. Results indicated that temporal measures were primarily driven by linguistic abilities, supporting the linguistic-proficiency hypothesis. Spatial measures, however, were influenced by both linguistic abilities and oculomotor maturation, supporting the combined hypothesis. These findings diverge from the predictions of the E-Z Reader model for alphabetic scripts, likely due to the unique visual complexity of Chinese characters.
(This article belongs to the Special Issue Children’s Cognitive Development in Social and Cultural Contexts)

11 pages, 1298 KiB  
Article
The Contribution of Sustained Attention and Response Inhibition to Reading Comprehension Among Japanese Adolescents
by Inbar Lucia Trinczer, Yarden Dankner, Shira Frances-Israeli, Yoshi A. Okamoto, Dav Clark and Lilach Shalev
Children 2024, 11(10), 1245; https://doi.org/10.3390/children11101245 - 16 Oct 2024
Viewed by 1596
Abstract
Background: Previous studies have demonstrated the influential role of sustained attention in reading comprehension in alphabetic writing systems. However, little is known about how these cognitive functions contribute to reading comprehension in non-alphabetic systems such as Japanese. This study addresses this gap, focusing on how sustained attention and response inhibition function in a writing system where some characters represent meanings rather than sounds, adding another layer of difficulty to the complex process of reading; Methods: Seventy-five Japanese ninth-grade students performed a task assessing sustained attention and response inhibition. The cognitive test was administered on tablets to enable feasible parallel group administration while maintaining high comparability with ecological classroom settings. Reading comprehension was measured using an exam that the participants took as part of their educational routine; Results: Both sustained attention and response inhibition contributed significantly to the reading comprehension of Japanese ninth-grade students; Conclusions: These results replicate previous findings on the contribution of sustained attention to reading comprehension in alphabetic writing systems and extend them to a non-alphabetic system. Moreover, our findings identify another important cognitive factor in reading comprehension, namely response inhibition. We suggest that response inhibition may play a crucial role in reading non-alphabetic writing systems that pose high cognitive demands, such as Japanese.
(This article belongs to the Special Issue Cognitive and Linguistic Development in Children and Adolescents)

20 pages, 5193 KiB  
Article
The Effect of Visual Word Segmentation Cues in Tibetan Reading
by Danhui Wang, Dingyi Niu, Tianzhi Li and Xiaolei Gao
Brain Sci. 2024, 14(10), 964; https://doi.org/10.3390/brainsci14100964 - 25 Sep 2024
Viewed by 962
Abstract
Background/Objectives: In languages with within-word segmentation cues, removing or replacing these cues in a text hinders reading and lexical recognition and adversely affects saccade target selection during reading. However, the outcome of artificially introducing visual word segmentation cues into a language that lacks them is unknown. Tibetan exemplifies a language that provides no visual cues for word segmentation, relying solely on visual cues for morpheme segmentation. Moreover, previous studies have not examined word segmentation in Tibetan. Therefore, this study investigated the effects of artificially incorporated visual word segmentation cues, as well as the basic units of information processing, in Tibetan reading. Methods: We used eye-tracking technology and conducted two experiments with Tibetan sentences that artificially incorporated interword spaces and color alternation markings as visual segmentation cues. Conclusions: The results indicated that interword spaces facilitate reading and lexical recognition and aid saccade target selection during reading. Color alternation markings facilitate reading and vocabulary recognition but do not affect saccade target selection. Words are more likely than morphemes to be the basic units of information processing and exhibit greater psychological reality. These findings shed light on the nature and rules of Tibetan reading and provide fundamental data for improving eye movement control models of reading in alphabetic writing systems. Furthermore, our results may offer practical guidance and a scientific basis for improving the efficiency of reading, information processing, and word segmentation in Tibetan.
(This article belongs to the Section Neurolinguistics)
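The two artificial cues manipulated in this study can be illustrated with a short sketch. It assumes the text has already been segmented into words, each given as a list of syllables, and that every syllable is followed by the tsheg (U+0F0B), the Tibetan intersyllabic mark. Placing a space after the word-final tsheg, the two default colors, and the function names are illustrative assumptions, not the study's stimulus specification.

```python
import html
from typing import List, Sequence

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)


def render_word(syllables: Sequence[str]) -> str:
    """Join a word's syllables, with a tsheg after every syllable."""
    return "".join(s + TSHEG for s in syllables)


def unspaced(words: List[Sequence[str]]) -> str:
    """Baseline condition: ordinary Tibetan text with no word boundary cue."""
    return "".join(render_word(w) for w in words)


def with_interword_spaces(words: List[Sequence[str]]) -> str:
    """Spaced condition: a space inserted between words (assumed layout)."""
    return " ".join(render_word(w) for w in words)


def with_color_alternation(
    words: List[Sequence[str]], colors: Sequence[str] = ("black", "blue")
) -> str:
    """Color condition: HTML in which consecutive words alternate colors."""
    spans = []
    for i, word in enumerate(words):
        color = colors[i % len(colors)]
        text = html.escape(render_word(word))
        spans.append(f'<span style="color:{color}">{text}</span>')
    return "".join(spans)
```

With real data the syllable strings would be Tibetan script; ASCII placeholders work equally well for checking the layout logic.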

18 pages, 622 KiB  
Essay
Learning to Read in Hebrew and Arabic: Challenges and Pedagogical Approaches
by Martin Luther Chan
Educ. Sci. 2024, 14(7), 765; https://doi.org/10.3390/educsci14070765 - 12 Jul 2024
Viewed by 1554
Abstract
Hebrew and Arabic are Semitic languages that use abjad alphabets, a consonant-primary writing system in which vowels appear only as optional diacritics. The relatively predictable morphology of Semitic languages renders abjad writing feasible, with literate native speakers relying on grammatical and lexical familiarity to infer vowel sounds from consonantal texts. However, in the context of foreign language acquisition, abjads present unique difficulties in attaining literacy. Due to the absence of written vowels, learners of Hebrew and Arabic face manifold challenges, such as phonetic ambiguity, extensive homography, and morphological unpredictability. Therefore, the inherent complexities of abjad alphabets necessitate targeted pedagogical intervention that increases metalinguistic awareness and strengthens learners’ reading skills, specifically by recreating elements of native-speaker literacy education in the second language context. This article explores the linguistic challenges that abjads pose for foreign language students and how pedagogical methodologies can be optimized to improve long-term learning outcomes.

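The point that vowels are optional diacritics in an abjad can be made concrete: removing the combining marks from pointed (vowelled) text leaves the consonantal skeleton that experienced readers normally see. A minimal sketch, assuming that every nonspacing mark (Unicode category Mn) may be dropped; note that this also removes non-vowel points such as the Hebrew shin dot and dagesh, and would strip accents from Latin text as well.

```python
import unicodedata


def consonantal_skeleton(text: str) -> str:
    """Remove nonspacing combining marks (vowel points etc.) from text."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


# Hebrew "shalom" written with qamats, shin dot and holam.
pointed_hebrew = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
# Arabic "kataba" written with three fathas.
vowelled_arabic = "\u0643\u064e\u062a\u064e\u0628\u064e"
```

Applied to these two strings, the function returns the unpointed forms שלום and كتب, which is exactly the ambiguity the learner faces: the same consonants admit several vocalizations.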
15 pages, 520 KiB  
Article
Learning to Read in an Intermediate Depth Orthography: The Longitudinal Role of Grapheme Sounding on Different Types of Reading Fluency
by Sandra Fernandes, Luís Querido and Arlette Verhaeghe
Behav. Sci. 2024, 14(5), 396; https://doi.org/10.3390/bs14050396 - 10 May 2024
Viewed by 2266
Abstract
Phonological processing skills, such as phonological awareness, are known predictors of reading acquisition in alphabetic languages with varying degrees of orthographic complexity. However, the role of multi-letter-sound knowledge, an important foundation for early reading development, in supporting reading fluency development remains to be determined. This study examined whether two core foundational skills, phonemic awareness and grapheme sounding, predict reading fluency development in an intermediate-depth orthography. The participants were 62 children learning to read in European Portuguese, who were assessed longitudinally on phonemic awareness, complex grapheme sounding, and reading fluency (decoding, word, and text) from Grade 2 to Grade 3. The results showed that grapheme sounding predicted reading fluency development after controlling for nonverbal intelligence, vocabulary, short-term verbal memory, and phonemic awareness. Grapheme sounding played a prominent role in predicting reading fluency outcomes, whereas phonemic awareness (both accuracy and time per correct item) did not contribute to any of the three types of reading fluency. That grapheme sounding predicted reading fluency is likely because proficient reading requires mastery of complex grapheme-phoneme correspondences. These findings provide insights into the cognitive processes underlying reading development in intermediate-depth orthographies and have implications for early literacy instruction.

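What makes a grapheme "complex" is that several letters map onto a single phoneme, so the reader must first parse the letter string into graphemes. As an illustration only, here is a greedy longest-match segmenter over a deliberately small, hypothetical inventory of European Portuguese digraphs (ch, lh, nh, rr, ss). Context-dependent cases such as qu/gu before e/i and nasal-vowel spellings are left out, and this is not the study's assessment procedure.

```python
from typing import List, Tuple

# Hypothetical, deliberately small inventory of context-free Portuguese digraphs.
COMPLEX_GRAPHEMES: Tuple[str, ...] = ("ch", "lh", "nh", "rr", "ss")


def segment_graphemes(
    word: str, inventory: Tuple[str, ...] = COMPLEX_GRAPHEMES
) -> List[str]:
    """Greedy longest-match segmentation of a lower-case word into graphemes."""
    longest = max(len(g) for g in inventory)
    graphemes: List[str] = []
    i = 0
    while i < len(word):
        # Try multi-letter graphemes first, longest to shortest.
        for size in range(longest, 1, -1):
            chunk = word[i:i + size]
            if chunk in inventory:
                graphemes.append(chunk)
                i += size
                break
        else:
            # No multi-letter match: the single letter is its own grapheme.
            graphemes.append(word[i])
            i += 1
    return graphemes
```

For example, `segment_graphemes("filho")` yields `["f", "i", "lh", "o"]`: four graphemes from five letters, which is the mismatch a grapheme-sounding task probes.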
27 pages, 381 KiB  
Article
The Effects of Orthography on the Pronunciation of Nasal Vowels by L1 Japanese Learners of L3 French: Evidence from a Longitudinal Study of Speech in Interaction
by Cyrille Granget, Cecilia Gunnarsson, Inès Saddour, Clara Solier, Vera Serrau and Charlotte Alazard
Educ. Sci. 2024, 14(3), 234; https://doi.org/10.3390/educsci14030234 - 23 Feb 2024
Viewed by 2595
Abstract
In recent decades, a vast literature has documented crosslinguistic influences on the acquisition of L2 phonology, in particular the effects of spelling on pronunciation. However, how these findings, notably the effects of L1 phonology and spelling on L2 pronunciation, can be taken into account in language teaching remains to be examined. These studies are based on experimental cross-sectional methods and mainly focus on L2 English learning by speakers of languages with an alphabetic system. In French, there are few studies on crosslinguistic influences on the acquisition of the nasal vowels (/ɑ̃/, /ɔ̃/ and /ɛ̃/) and few experimental studies pointing to a possible effect of orthography on the pronunciation of these phonemes. The results of experimental studies are difficult to transpose to the language classroom because they are based on word or sentence reading and writing activities, which are far removed from the conversational activities practised in the classroom in interaction with peers and the teacher. Hence, we opted here for a case study of the effect of spelling on the production of nasal vowels in interaction tasks. We conducted a longitudinal study during the first year of extensive learning of French (4 h 30 min per week). The results of a perceptual analysis by expert listeners show that (i) learners spell nasal vowels with <n> or <m> in 98% of obligatory contexts; (ii) most nasal vowels are perceived as nasal vowels in speech (72%), the others being perceived as vowels followed by a nasal consonant (19.5%) or as oral vowels (8.5%); (iii) consonantisation is stronger when the learner produces a word spontaneously than when (s)he repeats it; (iv) consonantisation decreases over time (a learning effect); and (v) it varies according to the vowel, /ɛ̃/ being less consonantised than /ɑ̃/ and /ɔ̃/. Finally, we propose a didactic discussion in light of intelligibility and the influence of orthography.

21 pages, 17631 KiB  
Article
The Serbian Sign Language Alphabet: A Unique Authentic Dataset of Letter Sign Gestures
by Mladen Radaković, Marina Marjanović, Ivana Ristić, Valentin Kuleto, Milena P. Ilić and Svetlana Dabić-Miletić
Mathematics 2024, 12(4), 525; https://doi.org/10.3390/math12040525 - 8 Feb 2024
Viewed by 2066
Abstract
Language barriers and the communication difficulties of individuals with developmental disabilities are two major causes of the communication problems that societies worldwide encounter. A particularly challenging group is hearing-impaired people, who have difficulties with communication, reading, writing, learning, and social interactions that substantially affect their quality of life. This article details a Serbian Sign Language alphabet database and the method for creating it, in order to provide a foundation for addressing the various societal challenges faced by persons who use the Serbian language. For this study, 41 people performed, in front of a computer camera, the Serbian Sign Language signs that represent the letters of the Serbian alphabet. Hand and body key points were identified in the recorded video clips, and their numerical values were then stored in a database for further processing. In total, 8346 video clips of people making recognized hand gestures were gathered, processed, classified, and archived. This paper provides a thorough technique that may be applied to comparable tasks and details the process of constructing a dataset based on Serbian Sign Language alphabet signs. The dataset was created using custom-made Python 3.11 software. Dynamic video clips capturing the subjects’ entire movements were included in the dataset to fill the gaps left by similar efforts based on static photographs. Thus, the purpose of this investigation is to employ innovative technology to support the community of hearing-impaired people in areas such as general inclusion, education, communication, and empowerment.
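The storage step described above (key points extracted from each clip, with their numerical values kept in a database) might look like the following minimal sketch using Python's built-in `sqlite3` module. The table layout, the 2-D (x, y) coordinates, and the function names are hypothetical; the abstract does not specify the schema or the key-point detector.

```python
import sqlite3
from typing import List, Tuple

Point = Tuple[float, float]


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and ensure the key-point table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS keypoints (
               clip_id TEXT NOT NULL,
               letter  TEXT NOT NULL,    -- alphabet sign shown in the clip
               frame   INTEGER NOT NULL,
               point   INTEGER NOT NULL, -- key-point index within the frame
               x REAL NOT NULL,
               y REAL NOT NULL,
               PRIMARY KEY (clip_id, frame, point))"""
    )
    return conn


def store_clip(
    conn: sqlite3.Connection, clip_id: str, letter: str, frames: List[List[Point]]
) -> int:
    """Insert every key point of every frame; return the number of rows written."""
    rows = [
        (clip_id, letter, f, p, x, y)
        for f, points in enumerate(frames)
        for p, (x, y) in enumerate(points)
    ]
    conn.executemany("INSERT INTO keypoints VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def load_clip(conn: sqlite3.Connection, clip_id: str) -> List[List[Point]]:
    """Rebuild the per-frame key-point lists for one clip, in frame order."""
    cur = conn.execute(
        "SELECT frame, point, x, y FROM keypoints "
        "WHERE clip_id = ? ORDER BY frame, point",
        (clip_id,),
    )
    frames: dict = {}
    for frame, _point, x, y in cur:
        frames.setdefault(frame, []).append((x, y))
    return [frames[f] for f in sorted(frames)]
```

Keeping one row per key point per frame preserves the temporal sequence of a dynamic gesture, which is what distinguishes this kind of dataset from one built on static photographs.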

13 pages, 18095 KiB  
Article
Minoan Cryptanalysis: Computational Approaches to Deciphering Linear A and Assessing Its Connections with Language Families from the Mediterranean and the Black Sea Areas
by Aaradh Nepal and Francesco Perono Cacciafoco
Information 2024, 15(2), 73; https://doi.org/10.3390/info15020073 - 25 Jan 2024
Cited by 2 | Viewed by 4545
Abstract
During the Bronze Age, the inhabitants of regions of Crete, mainland Greece, and Cyprus inscribed their languages using, among other scripts, a writing system called Linear A. These symbols, mainly characterized by combinations of lines, have remained a mystery since their discovery. Not only is the corpus very small, but it is also challenging to link Minoan, the language behind Linear A, to any known language. Most decipherment attempts use the phonetic values of Linear B, a grammatological offspring of Linear A, to ‘read’ Linear A. However, this yields meaningless words. Recently, novel approaches to deciphering the script have emerged that involve a computational component. In this paper, two such approaches are combined to account for the biases involved in provisionally assigning Linear B phonetic values to Linear A and to shed more light on the possible connections of Linear A with other scripts and languages from the region. The limitations inherent in such approaches are also discussed. First, a feature-based similarity measure is used to compare Linear A with the Carian Alphabet and the Cypriot Syllabary. A few Linear A symbols are matched with symbols from the Carian Alphabet and the Cypriot Syllabary. Finally, using the derived phonetic values, Linear A is compared with Ancient Egyptian, Luwian, Hittite, Proto-Celtic, and Uralic using a consonantal approach. Some possible word matches are identified in each language.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)
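A feature-based similarity measure of the kind mentioned in this abstract can be sketched as follows. This is an assumption-laden illustration, not the authors' measure: each sign is described by a hypothetical set of binary visual features, and signs are compared with the Jaccard index (features shared by both signs divided by features present in either).

```python
from typing import Dict, FrozenSet, List, Tuple

Features = FrozenSet[str]


def jaccard(a: Features, b: Features) -> float:
    """Shared features / features present in either sign (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_candidates(
    query: Features, inventory: Dict[str, Features], k: int = 3
) -> List[Tuple[str, float]]:
    """Top-k signs from a comparison script, by similarity (ties broken by name)."""
    scored = [(name, jaccard(query, feats)) for name, feats in inventory.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]
```

A matched sign's phonetic value in the comparison script could then serve as a provisional reading, which is the role the derived phonetic values play in the subsequent consonantal comparison.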
17 pages, 2897 KiB  
Article
Is Short-Term Memory Made of Two Processing Units? Clues from Italian and English Literatures down Several Centuries
by Emilio Matricciani
Information 2024, 15(1), 6; https://doi.org/10.3390/info15010006 - 20 Dec 2023
Cited by 3 | Viewed by 1859
Abstract
We propose that short-term memory (STM), when processing a sentence, uses two independent units in series. The clues for this model emerge from the study of many novels from Italian and English literature. This simple model, referring to the surface of language, seems to describe mathematically the input-output characteristics of a complex mental process involved in reading or writing a sentence. We show that there are no significant mathematical or statistical differences between the two literary corpora when considering deep-language variables and linguistic communication channels. Therefore, the surface mathematical structure of alphabetical languages is very deeply rooted in the human mind, independently of the language used. The first processing unit is linked to the number of words between two contiguous interpunctions, variable Ip, which ranges approximately within Miller’s 7 ± 2 range; the second unit is linked to the number of such word intervals contained in a sentence, variable MF, which ranges approximately from 1 to 6. The overall capacity required to process a sentence fully ranges from 8.3 to 61.2 words, values that can be converted into time by assuming a reading speed, giving a range of 2.6 to 19.5 s for fast readers and 5.3 to 30.1 s for the average reader. Since a sentence conveys meaning, the surface features we have found might be a starting point toward an information theory that includes meaning.
(This article belongs to the Special Issue Feature Papers in Information in 2023)
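Both variables can be read off plain text. A minimal sketch, assuming that interpunctions are the marks , ; : . ? ! and that sentences end at . ? or !; the paper's exact punctuation set and tokenization may differ, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed interpunction set and sentence terminators.
INTERPUNCTION = r"[,;:.?!]"
SENTENCE_BREAK = r"(?<=[.?!])\s+"


def word_intervals(sentence: str) -> List[int]:
    """Word counts between contiguous interpunctions (one Ip value per interval)."""
    chunks = re.split(INTERPUNCTION, sentence)
    return [len(c.split()) for c in chunks if c.split()]


def sentence_profile(text: str) -> List[Tuple[List[int], int]]:
    """For each sentence, return (list of Ip values, MF = number of intervals)."""
    profile = []
    for sentence in re.split(SENTENCE_BREAK, text.strip()):
        ips = word_intervals(sentence)
        if ips:
            profile.append((ips, len(ips)))
    return profile
```

Summing the Ip values of a sentence gives its length in words, the quantity the abstract relates to the overall capacity needed to process a sentence.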

18 pages, 6495 KiB  
Article
A Smart Real-Time Parking Control and Monitoring System
by Abdelrahman Osman Elfaki, Wassim Messoudi, Anas Bushnag, Shakour Abuzneid and Tareq Alhmiedat
Sensors 2023, 23(24), 9741; https://doi.org/10.3390/s23249741 - 10 Dec 2023
Cited by 29 | Viewed by 24700
Abstract
Smart parking is an artificial intelligence-based solution to the challenges of inefficient use of parking slots, wasted time, congestion producing high CO2 emissions, inflexible payment methods, and the need to protect parked vehicles from theft and vandalism. Nothing is worse than parking congestion caused by drivers looking for open spaces. This is common in large parking lots, underground garages, and multi-story car parks, where visibility is limited and signage can be confusing or difficult to read, so drivers have no idea where available parking spaces are. In this paper, a smart real-time parking management system is introduced. The developed system addresses these challenges by dynamically allocating parking slots while taking the overall parking situation into consideration, by providing a mechanism for booking a specific parking slot through our Artificial Intelligence (AI)-based application, and by providing a mechanism to ensure that each car is parked in its correct place. To provide cost flexibility, we developed two technical solutions at different costs: the first is based on a motion sensor and the second on a range-finder sensor. A plate detection and recognition system detects the vehicle’s license plate from an image captured by an IoT device, and the system recognizes the extracted English alphabet characters and Hindu-Arabic numerals. The proposed solution was built and field-tested to demonstrate its applicability. We measured and analyzed key data such as vehicle plate detection accuracy, vehicle plate recognition accuracy, transmission delay time, and processing delay time.
(This article belongs to the Special Issue Advanced IoT Systems in Smart Cities)
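The booking-and-verification logic described in this abstract can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions: the allocation policy (nearest free slot to the entrance) is a simple stand-in for the paper's dynamic allocation, the slot distances are hypothetical, and plate text is assumed to be restricted to the Latin letters A to Z and the digits 0 to 9, matching the character set the abstract names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


def normalize_plate(raw: str) -> str:
    """Keep only Latin letters A-Z and digits 0-9 (assumed plate alphabet)."""
    return "".join(re.findall(r"[A-Z0-9]", raw.upper()))


@dataclass
class ParkingLot:
    distances: Dict[str, float]  # slot id -> metres from entrance (hypothetical)
    occupied: Set[str] = field(default_factory=set)
    bookings: Dict[str, str] = field(default_factory=dict)  # slot id -> plate

    def free_slots(self) -> List[str]:
        taken = self.occupied | set(self.bookings)
        return [s for s in self.distances if s not in taken]

    def allocate(self, plate: str) -> Optional[str]:
        """Book the free slot closest to the entrance, or return None if full."""
        free = self.free_slots()
        if not free:
            return None
        slot = min(free, key=lambda s: (self.distances[s], s))
        self.bookings[slot] = normalize_plate(plate)
        return slot

    def on_vehicle_detected(self, slot: str, raw_plate: str) -> bool:
        """Called when a slot sensor fires; True if the booked car parked there."""
        self.occupied.add(slot)
        return self.bookings.get(slot) == normalize_plate(raw_plate)
```

The sensor callback is where either hardware variant (motion sensor or range finder) would plug in; a `False` result corresponds to a car parked in the wrong place.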

19 pages, 687 KiB  
Article
A Rényi-Type Limit Theorem on Random Sums and the Accuracy of Likelihood-Based Classification of Random Sequences with Application to Genomics
by Leonid Hanin and Lyudmila Pavlova
Mathematics 2023, 11(20), 4254; https://doi.org/10.3390/math11204254 - 11 Oct 2023
Viewed by 1470
Abstract
We study the classification of random sequences of characters selected from a given alphabet into two classes characterized by distinct character selection probabilities and length distributions. The classification is based on the sign of the log-likelihood score (LLS), which consists of a random sum and a random term depending on the length distributions of the two classes. For long sequences selected from a large alphabet, computing misclassification error rates is not feasible either theoretically or computationally. To mitigate this problem, we computed limiting distributions for two versions of the normalized LLS applicable to long sequences whose class-specific length follows a translated negative binomial distribution (TNBD). The two limiting distributions turned out to be plain or transformed Erlang distributions. This allowed us to establish the asymptotic accuracy of the likelihood-based classification of random sequences with TNBD length distributions. Our limit theorem generalizes a classic theorem on geometric random sums due to Rényi and is closely related to the published results of V. Korolev and coworkers on negative binomial random sums. As an illustration, we applied our limit theorem to the classification of DNA sequences contained in the genome of the bacterium Bacillus subtilis into two classes: protein-coding genes and standard noncoding open reading frames. We found that TNBDs provide an excellent fit to the length distributions of both classes and that the limiting distributions capture the essential features of the normalized empirical LLS fairly well.
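The classification rule itself can be written down compactly: for a sequence of length n, the LLS adds the per-character log ratios log(p1(c)/p2(c)) and the length term log(P1(N = n)/P2(N = n)), and its sign picks the class. A minimal sketch, assuming independently drawn characters, a TNBD parameterized as "shift plus the number of failures before the r-th success with success probability p" (one common parameterization; the paper's may differ), and ties assigned to class 2. The function names are hypothetical.

```python
import math
from typing import Callable, Mapping

LengthLogPmf = Callable[[int], float]


def log_tnbd_pmf(n: int, r: float, p: float, shift: int) -> float:
    """log P(N = n) for a negative binomial translated by `shift` (assumed form)."""
    k = n - shift
    if k < 0:
        return float("-inf")
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )


def log_likelihood_score(
    seq: str,
    probs1: Mapping[str, float],
    probs2: Mapping[str, float],
    length1: LengthLogPmf,
    length2: LengthLogPmf,
) -> float:
    """Random sum of per-character log ratios plus the length log ratio."""
    char_term = sum(math.log(probs1[c] / probs2[c]) for c in seq)
    n = len(seq)
    return char_term + length1(n) - length2(n)


def classify(seq, probs1, probs2, length1, length2) -> int:
    """Class 1 if the LLS is positive, otherwise class 2."""
    return 1 if log_likelihood_score(seq, probs1, probs2, length1, length2) > 0 else 2
```

For genomic data the alphabet would be the four nucleotides (or codons), with class-specific probabilities and TNBD parameters estimated from the two classes of sequences.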