Search Results (28)

Search Parameters:
Keywords = corpus phonetics

15 pages, 828 KB  
Article
N-Gram and RNN-LM Language Model Integration for End-to-End Amazigh Speech Recognition
by Meryam Telmem, Naouar Laaidi, Youssef Ghanou and Hassan Satori
Mach. Learn. Knowl. Extr. 2025, 7(4), 164; https://doi.org/10.3390/make7040164 - 10 Dec 2025
Viewed by 185
Abstract
This work investigates how different language modeling techniques affect the performance of an end-to-end automatic speech recognition (ASR) system for the Amazigh language. A (CNN-BiLSTM-CTC) model enhanced with an attention mechanism was used as the baseline. During decoding, two external language models were integrated using shallow fusion: a trigram N-gram model built with KenLM and a recurrent neural network language model (RNN-LM) trained on the same Tifdigit corpus. Four decoding methods were compared: greedy decoding; beam search; beam search with an N-gram language model; and beam search with a compact recurrent neural network language model. Experimental results on the Tifdigit dataset reveal a clear trade-off: the N-gram language model produces the best results compared to RNN-LM, with a phonetic error rate (PER) of 0.0268, representing a relative improvement of 4.0% over the greedy baseline model, and translates into an accuracy of 97.32%. This suggests that N-gram models can outperform neural approaches when reliable, limited data and lexical resources are available. The improved N-gram approach notably outperformed both simple beam search and the RNN neural language model. This improvement is due to higher-order context modeling, its optimized interpolation weights, and its adaptive lexical weighting tailored to the phonotactic structure of the Amazigh language. Full article
(This article belongs to the Section Learning)
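As a rough illustration of the shallow-fusion decoding this abstract describes, the sketch below combines an acoustic (CTC) hypothesis score with an external language-model score; the weight value and the toy Amazigh hypotheses are invented, not taken from the paper. (The reported 97.32% accuracy follows directly from the PER: 1 − 0.0268 = 0.9732.)

```python
import math

def shallow_fusion_score(ctc_log_prob, lm_log_prob, lm_weight=0.5):
    """Score one beam hypothesis as acoustic score + weighted external LM score.

    Generic shallow fusion; the paper plugs in either a KenLM trigram or an
    RNN-LM here, and lm_weight=0.5 is an arbitrary illustrative value.
    """
    return ctc_log_prob + lm_weight * lm_log_prob

# Toy example: the external LM can overturn a slightly better acoustic score.
hypotheses = {
    "yan sin krad": {"ctc": math.log(0.40), "lm": math.log(0.20)},
    "yan sin qrad": {"ctc": math.log(0.42), "lm": math.log(0.05)},
}
best = max(hypotheses,
           key=lambda h: shallow_fusion_score(hypotheses[h]["ctc"], hypotheses[h]["lm"]))
print(best)  # "yan sin krad": the LM-preferred hypothesis wins
```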

18 pages, 1819 KB  
Article
Speech Markers of Parkinson’s Disease: Phonological Features and Acoustic Measures
by Ratree Wayland, Rachel Meyer and Kevin Tang
Brain Sci. 2025, 15(11), 1162; https://doi.org/10.3390/brainsci15111162 - 29 Oct 2025
Viewed by 1036
Abstract
Background/Objectives: Parkinson’s disease (PD) affects both articulatory and phonatory subsystems, leading to characteristic speech changes known as hypokinetic dysarthria. However, few studies have jointly analyzed these subsystems within the same participants using interpretable deep-learning-based measures. Methods: Speech data from the PC-GITA corpus, including 50 Colombian Spanish speakers with PD and 50 age- and sex-matched healthy controls were analyzed. We combined phonological feature posteriors—probabilistic indices of articulatory constriction derived from the Phonet deep neural network—with harmonics-to-noise ratio (HNR) as a laryngeal measure. Linear mixed-effects models tested how these measures related to disease severity (UPDRS, UPDRS-speech, and Hoehn and Yahr), age, and sex. Results: PD participants showed significantly higher [continuant] posteriors, especially for dental stops, reflecting increased spirantization and articulatory weakening. In contrast, [sonorant] posteriors did not differ from controls, indicating reduced oral constriction without a shift toward more open, approximant-like articulations. HNR was predicted by vowel height and sex but did not distinguish PD from controls, likely reflecting ON-medication recordings. Conclusions: These findings demonstrate that deep-learning-derived articulatory features can capture early, subphonemic weakening in PD speech—particularly for coronal consonants—while single-parameter laryngeal indices such as HNR are less sensitive under medicated conditions. By linking spectral energy patterns to interpretable phonological categories, this approach provides a transparent framework for detecting subtle articulatory deficits and developing feature-level biomarkers of PD progression. Full article
(This article belongs to the Section Behavioral Neuroscience)
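The modelling step described here can be sketched generically as a linear mixed-effects model: a phonological-feature posterior regressed on severity and sex, with a by-speaker random intercept. The data frame below is synthetic and the column names (continuant, updrs_speech, speaker) are placeholders, not the PC-GITA variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_speakers, n_items = 20, 10
speaker = np.repeat(np.arange(n_speakers), n_items)
updrs_speech = np.repeat(rng.integers(0, 4, n_speakers), n_items)
sex = np.repeat(rng.choice(["F", "M"], n_speakers), n_items)
# Synthetic effect: higher severity -> higher [continuant] posterior, plus speaker noise.
continuant = (0.5 + 0.05 * updrs_speech
              + rng.normal(0, 0.03, n_speakers)[speaker]
              + rng.normal(0, 0.05, speaker.size))

df = pd.DataFrame({"speaker": speaker, "updrs_speech": updrs_speech,
                   "sex": sex, "continuant": continuant})
model = smf.mixedlm("continuant ~ updrs_speech + C(sex)", df, groups=df["speaker"])
print(model.fit().summary())
```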

31 pages, 5187 KB  
Article
Investigation of ASR Models for Low-Resource Kazakh Child Speech: Corpus Development, Model Adaptation, and Evaluation
by Diana Rakhimova, Zhansaya Duisenbekkyzy and Eşref Adali
Appl. Sci. 2025, 15(16), 8989; https://doi.org/10.3390/app15168989 - 14 Aug 2025
Cited by 1 | Viewed by 2255
Abstract
This study focuses on the development and evaluation of automatic speech recognition (ASR) systems for Kazakh child speech, an underexplored domain in both linguistic and computational research. A specialized acoustic corpus was constructed for children aged 2 to 8 years, incorporating age-related vocabulary stratification and gender variation to capture phonetic and prosodic diversity. The data were collected from three sources: a custom-designed Telegram bot, high-quality Dictaphone recordings, and naturalistic speech samples recorded in home and preschool environments. Four ASR models, Whisper, DeepSpeech, ESPnet, and Vosk, were evaluated. Whisper, ESPnet, and DeepSpeech were fine-tuned on the curated corpus, while Vosk was applied in its standard pretrained configuration. Performance was measured using five evaluation metrics: Word Error Rate (WER), BLEU, Translation Edit Rate (TER), Character Similarity Rate (CSRF2), and Accuracy. The results indicate that ESPnet achieved the highest accuracy (32%) and the lowest WER (0.242) for sentences, while Whisper performed well in semantically rich utterances (Accuracy = 33%; WER = 0.416). Vosk demonstrated the best performance on short words (Accuracy = 68%) and yielded the highest BLEU score (0.600) for short words. DeepSpeech showed moderate improvements in accuracy, particularly for short words (Accuracy = 60%), but faced challenges with longer utterances, achieving an Accuracy of 25% for sentences. These findings emphasize the critical importance of age-appropriate corpora and domain-specific adaptation when developing ASR systems for low-resource child speech, particularly in educational and therapeutic contexts. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
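Of the five metrics listed, Word Error Rate is the most central; a generic implementation via word-level edit distance looks roughly like the following. This is not the authors' evaluation code, and the Kazakh example is invented.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("бала мектепке барады", "бала мектепке барды"))  # 1 substitution / 3 words ≈ 0.33
```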

20 pages, 1765 KB  
Article
Can Informativity Effects Be Predictability Effects in Disguise?
by Vsevolod Kapatsinski
Entropy 2025, 27(7), 739; https://doi.org/10.3390/e27070739 - 10 Jul 2025
Viewed by 1049
Abstract
Recent work in corpus linguistics has observed that informativity predicts articulatory reduction of a linguistic unit above and beyond the unit’s predictability in the local context, i.e., the unit’s probability given the current context. Informativity of a unit is the inverse of average (log-scaled) predictability and corresponds to its information content. Research in the field has interpreted effects of informativity as speakers being sensitive to the information content of a unit in deciding how much effort to put into pronouncing it or as accumulation of memories of pronunciation details in long-term memory representations. However, average predictability can improve the estimate of local predictability of a unit above and beyond the observed predictability in that context, especially when that context is rare. Therefore, informativity can contribute to explaining variance in a dependent variable like reduction above and beyond local predictability simply because informativity improves the (inherently noisy) estimate of local predictability. This paper shows how to estimate the proportion of an observed informativity effect that is likely to be artifactual, due entirely to informativity improving the estimates of predictability, via simulation. The proposed simulation approach can be used to investigate whether an effect of informativity is likely to be real, under the assumption that corpus probabilities are an unbiased estimate of probabilities driving reduction behavior, and how much of it is likely to be due to noise in predictability estimates, in any real dataset. Full article
(This article belongs to the Special Issue Complexity Characteristics of Natural Language)
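The two quantities being disentangled here can be made concrete with bigram counts from a toy corpus: local predictability is P(unit | preceding context), while informativity is the frequency-weighted average surprisal, −log₂ P(unit | context), over all of the unit's contexts. The miniature corpus below is invented for illustration.

```python
import math
from collections import Counter

corpus = "the cat sat on the mat the cat ate the rat".split()
bigrams = Counter(zip(corpus[:-1], corpus[1:]))
context_totals = Counter(corpus[:-1])

def predictability(context, word):
    """Local predictability: P(word | preceding word) from bigram counts."""
    return bigrams[(context, word)] / context_totals[context]

def informativity(word):
    """Frequency-weighted average surprisal of `word` across its contexts."""
    pairs = [(c, n) for (c, w), n in bigrams.items() if w == word]
    total = sum(n for _, n in pairs)
    return sum(n / total * -math.log2(predictability(c, word)) for c, n in pairs)

print(predictability("the", "cat"))  # 0.5: local predictability of "cat" after "the"
print(informativity("cat"))          # 1.0: average surprisal of "cat" over its contexts
```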

22 pages, 3293 KB  
Article
Phonetically Based Corpora for Anglicisms: A Tijuana–San Diego Contact Outcome
by Ruben Roberto Peralta-Rivera, Carlos Ivanhoe Gil-Burgoin and Norma Esthela Valenzuela-Miranda
Languages 2025, 10(6), 143; https://doi.org/10.3390/languages10060143 - 16 Jun 2025
Viewed by 2237
Abstract
Research in Loanword Phonology has extensively examined the adaptation processes of Anglicisms into recipient languages. In the Tijuana–San Diego border region, where English and Spanish have reciprocally existed, Anglicisms exhibit two main phonetic patterns: some structures exhibit Spanish phonetic properties, while others preserve English phonetic features. This study analyzes 131 vowel tokens drawn from spontaneous conversations with 28 bilingual speakers in Tijuana, recruited via the sociolinguistic ‘friend-of-a-friend’ approach. Specifically, it focuses on monosyllabic Anglicisms with monophthongs by examining the F1 and F2 values using Praat. The results were compared with theoretical vowel targets in English and Spanish through Euclidean distance analysis. Dispersion plots generated in R further illustrate the acoustic distribution of vowel realizations. The results reveal that some vowels closely match Spanish targets, others align with English, and several occupy intermediate acoustic spaces. Based on these patterns, the study proposes two phonetically based corpora—Phonetically Adapted Anglicisms (PAA) and Phonetically Non-Adapted Anglicisms (PNAA)—to capture the nature of Anglicisms in this contact setting. This research offers an empirically grounded basis for cross-dialectal comparison and language contact studies from a phonetically based approach. Full article
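The Euclidean-distance comparison described in this abstract amounts to measuring a token's F1/F2 and finding the closest reference vowel target; the sketch below uses rough textbook formant values, not the study's reference set.

```python
import math

# Reference vowel targets in (F1, F2) Hz; values are approximate textbook figures.
targets = {
    "Spanish /i/": (280, 2300),
    "English /ɪ/": (400, 1900),
    "Spanish /e/": (450, 2000),
}

def nearest_target(f1, f2):
    """Return the reference vowel with the smallest Euclidean distance in F1-F2 space."""
    return min(targets, key=lambda v: math.dist((f1, f2), targets[v]))

print(nearest_target(390, 1950))  # this token lands closer to English /ɪ/ than Spanish /i/
```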

29 pages, 2368 KB  
Article
Chinese “Dialects” and European “Languages”: A Comparison of Lexico-Phonetic and Syntactic Distances
by Chaoju Tang, Vincent J. van Heuven, Wilbert Heeringa and Charlotte Gooskens
Languages 2025, 10(6), 127; https://doi.org/10.3390/languages10060127 - 29 May 2025
Cited by 1 | Viewed by 5733
Abstract
In this article, we tested some specific claims made in the literature on relative distances among European languages and among Chinese dialects, suggesting that some language varieties within the Sinitic family traditionally called dialects are, in fact, more linguistically distant from one another than some European varieties that are traditionally called languages. More generally, we examined whether distances among varieties within and across European language families were larger than those within and across Sinitic language varieties. To this end, we computed lexico-phonetic as well as syntactic distance measures for comparable language materials in six Germanic, five Romance and six Slavic languages, as well as for six Mandarin and nine non-Mandarin (‘southern’) Chinese varieties. Lexico-phonetic distances were expressed as the length-normalized MPI-weighted Levenshtein distances computed on the 100 most frequently used nouns in the 32 language varieties. Syntactic distance was implemented as the (complement of) the Pearson correlation coefficient found for the PoS trigram frequencies established for a parallel corpus of the same four texts translated into each of the 32 languages. The lexico-phonetic distances proved to be relatively large and of approximately equal magnitude in the Germanic, Slavic and non-Mandarin Chinese language varieties. However, the lexico-phonetic distances among the Romance and Mandarin languages were considerably smaller, but of similar magnitude. Cantonese (Guangzhou dialect) was lexico-phonetically as distant from Standard Mandarin (Beijing dialect) as European language pairs such as Portuguese–Italian, Portuguese–Romanian and Dutch–German. Syntactically, however, the differences among the Sinitic varieties were about ten times smaller than the differences among the European languages, both within and across the families—which provides some justification for the Chinese tradition of calling the Sinitic varieties dialects of the same language. Full article
(This article belongs to the Special Issue Dialectal Dynamics)
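The syntactic measure used here, the complement of the Pearson correlation between part-of-speech trigram frequency vectors, can be sketched as follows; the trigram counts are invented for illustration.

```python
import numpy as np

# Frequencies of the same set of PoS trigrams (e.g., DET-NOUN-VERB, ...) in two languages.
pos_trigram_counts_a = np.array([30, 12, 5, 9, 0, 4], dtype=float)
pos_trigram_counts_b = np.array([28, 10, 7, 8, 1, 6], dtype=float)

def syntactic_distance(a, b):
    """1 minus the Pearson correlation of the two trigram frequency vectors."""
    r = np.corrcoef(a, b)[0, 1]
    return 1.0 - r

print(syntactic_distance(pos_trigram_counts_a, pos_trigram_counts_b))
```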

18 pages, 2345 KB  
Article
SGM-EMA: Speech Enhancement Method Score-Based Diffusion Model and EMA Mechanism
by Yuezhou Wu, Zhiri Li and Hua Huang
Appl. Sci. 2025, 15(10), 5243; https://doi.org/10.3390/app15105243 - 8 May 2025
Viewed by 2345
Abstract
The score-based diffusion model has made significant progress in the field of computer vision, surpassing the performance of generative models, such as variational autoencoders, and has been extended to applications such as speech enhancement and recognition. This paper proposes a U-Net architecture using a score-based diffusion model and an efficient multi-scale attention mechanism (EMA) for the speech enhancement task. The model leverages the symmetric structure of U-Net to extract speech features and captures contextual information and local details across different scales using the EMA mechanism, improving speech quality in noisy environments. We evaluate the method on the VoiceBank-DEMAND (VB-DMD) dataset and the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus–TUT Sound Events 2017 (TIMIT-TUT) dataset. The experimental results show that the proposed model performed well in terms of speech quality perception (PESQ), extended short-time objective intelligibility (ESTOI), and scale-invariant signal-to-distortion ratio (SI-SDR). Especially when processing out-of-dataset noisy speech, the proposed method achieved excellent speech enhancement results compared to other methods, demonstrating the model’s strong generalization capability. We also conducted an ablation study on the SDE solver and the EMA mechanism, and the results show that the reverse diffusion method outperformed the Euler–Maruyama method, and the EMA strategy could improve the model performance. The results demonstrate the effectiveness of these two techniques in our system. Nevertheless, since the model is specifically designed for Gaussian noise, its performance under non-Gaussian or complex noise conditions may be limited. Full article
(This article belongs to the Special Issue Application of Deep Learning in Speech Enhancement Technology)
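Among the reported metrics, scale-invariant SDR has a compact closed form; a generic implementation on a synthetic signal is sketched below (not the authors' evaluation code).

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference            # projection of the estimate onto the reference
    noise = estimate - target
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))

t = np.linspace(0, 1, 16000)
clean = np.sin(2 * np.pi * 220 * t)
noisy = clean + 0.1 * np.random.default_rng(0).normal(size=t.size)
print(si_sdr(noisy, clean))
```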

11 pages, 1727 KB  
Article
JLMS25 and Jiao-Liao Mandarin Speech Recognition Based on Multi-Dialect Knowledge Transfer
by Xuchen Li, Yiqun Wang, Xiaoyang Liu, Kun Su, Zhaochen Li, Yitian Wang, Bin Jiang, Kang Xie and Jie Liu
Appl. Sci. 2025, 15(3), 1670; https://doi.org/10.3390/app15031670 - 6 Feb 2025
Cited by 1 | Viewed by 2067
Abstract
Jiao-Liao Mandarin, a distinguished dialect in China, reflects the linguistic features and cultural heritage of the Jiao-Liao region. However, the labor-intensive and costly nature of manual transcription limits the scale of transcribed corpora, posing challenges for speech recognition. We present JLMS25, a transcribed corpus for Jiao-Liao Mandarin, alongside a novel multi-dialect knowledge transfer (MDKT) framework for low-resource speech recognition. By leveraging phonetic and linguistic knowledge from neighboring dialects, the MDKT framework improves recognition in resource-constrained settings. It comprises an acoustic feature extractor, a dialect feature extractor, and two modules—WFAdapter (weight decomposition adapter) and AttAdapter (attention-based adapter)—to enhance adaptability and mitigate overfitting. The training involves a three-phase strategy: multi-dialect AID-ASR multi-task learning in phase one, freezing the dialect feature extractor in phase two, and fine-tuning only the adapters in phase three. Experiments on the Jiao-Liao Mandarin subset of the KeSpeech dataset and JLMS25 dataset show that MDKT outperforms full-parameter fine-tuning, reducing Character Error Rate (CER) by 5.4% and 7.7% and Word Error Rate (WER) by 6.1% and 10.8%, respectively. Full article
(This article belongs to the Special Issue AI for Sustainability and Innovation—2nd Edition)
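As a generic picture of the parameter-efficient adaptation described above: a small bottleneck adapter with a residual connection is trained while the backbone stays frozen. The paper's WFAdapter and AttAdapter designs are more specific than this sketch, and the dimensions below are arbitrary.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Generic bottleneck adapter: down-project, nonlinearity, up-project, residual add."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

# Phase-3-style fine-tuning: freeze the backbone, train only the adapter parameters.
backbone = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False
adapter = Adapter(256)
x = torch.randn(2, 50, 256)  # (batch, frames, features)
out = adapter(backbone(x))
print(out.shape, sum(p.numel() for p in adapter.parameters() if p.requires_grad))
```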

21 pages, 1280 KB  
Article
Quantifying and Characterizing Phonetic Reduction in Italian Natural Speech
by Loredana Schettino and Francesco Cutugno
Languages 2025, 10(1), 14; https://doi.org/10.3390/languages10010014 - 16 Jan 2025
Cited by 1 | Viewed by 1520
Abstract
The main purpose of this study is to test a method for the analysis of phonetic variation in natural speech. The method takes into account the continuous nature of the speech flow and allows for the investigation of the systematic variation phenomena that occur in the speech net of the cross-word coarticulation phenomena that are expected in connected speech. We will describe some of the most frequent phonetic variation patterns that may be observed in the speech chain seen as a sequence of syllables, in relation to internal syllabic structure and lexical stress. The present study concerns speech data from the Italian section of the NOCANDO corpus. The data consist of about 1000 syllables extracted from monological speech from different speakers. In two different analysis layers, we attempted to align the “phonological” expected form and observed realisation. The results of this attempt led to the definition of syllabic deletion, substitution, or insertion when the alignment fails. The proposed method provides insight into the phonetic variation processes that can systematically occur in natural speech with relation to specific linguistic structures; in particular, unstressed syllables are most likely to undergo variation phenomena, and systematic differences concern the syllabic position of the segmental change, in that the presence of lexical stress prevents vowel deletion or centralization, but allows for onset changes (such as consonant cluster simplification or lenition). Full article
(This article belongs to the Special Issue Speech Variation in Contemporary Italian)
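A simplified version of the alignment step described above: compare an expected (citation-form) syllable sequence with the observed realisation and label each mismatch as a deletion, insertion, or substitution. The Italian example and its syllabification are invented for illustration.

```python
from difflib import SequenceMatcher

expected = ["pe", "rò", "a", "des", "so"]   # expected "phonological" form, syllabified
observed = ["pe", "ra", "des", "so"]        # observed realisation

# Label every non-matching stretch of the alignment.
for tag, i1, i2, j1, j2 in SequenceMatcher(a=expected, b=observed).get_opcodes():
    if tag != "equal":
        print(tag, expected[i1:i2], "->", observed[j1:j2])
```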

16 pages, 1512 KB  
Article
An End-To-End Speech Recognition Model for the North Shaanxi Dialect: Design and Evaluation
by Yi Qin and Feifan Yu
Sensors 2025, 25(2), 341; https://doi.org/10.3390/s25020341 - 9 Jan 2025
Cited by 2 | Viewed by 1473
Abstract
The coal mining industry in Northern Shaanxi is robust, with a prevalent use of the local dialect, known as “Shapu”, characterized by a distinct Northern Shaanxi accent. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the North Shaanxi dialect, leveraging the Conformer architecture. To tailor the model to the coal mining context, we developed a specialized corpus reflecting the phonetic characteristics of the dialect and its usage in the industry. We investigated feature extraction techniques suitable for the North Shaanxi dialect, focusing on the unique pronunciation of initial consonants and vowels. A preprocessing module was designed to accommodate the dialect’s rapid speech tempo and polyphonic nature, enhancing recognition performance. To enhance the decoder’s text generation capability, we replaced the Conformer decoder with a Transformer architecture. Additionally, to mitigate the computational demands of the model, we incorporated Connectionist Temporal Classification (CTC) joint training for optimization. The experimental results on our self-established voice dataset for the Northern Shaanxi coal mining industry demonstrate that the proposed Conformer–Transformer–CTC model achieves a 9.2% and 10.3% reduction in the word error rate compared to the standalone Conformer and Transformer models, respectively, confirming the advancement of our method. The next step will involve researching how to improve the performance of dialect speech recognition by integrating external language models and extracting pronunciation features of different dialects, thereby achieving better recognition results. Full article
(This article belongs to the Section Intelligent Sensors)
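The CTC joint training mentioned here usually interpolates a CTC loss on the encoder outputs with the attention decoder's cross-entropy loss; a minimal sketch with random tensors follows (shapes and the weight λ are illustrative, not the paper's settings).

```python
import torch
import torch.nn as nn

vocab, frames, tokens = 30, 50, 8
log_probs = torch.randn(frames, 1, vocab).log_softmax(-1)   # encoder CTC head: (T, N, C)
targets = torch.randint(1, vocab, (1, tokens))              # label indices, 0 reserved for blank
ctc_loss = nn.CTCLoss(blank=0)(log_probs, targets,
                               torch.tensor([frames]), torch.tensor([tokens]))

decoder_logits = torch.randn(1, tokens, vocab)               # attention decoder outputs
att_loss = nn.CrossEntropyLoss()(decoder_logits.view(-1, vocab), targets.view(-1))

lam = 0.3                                                    # CTC weight
loss = lam * ctc_loss + (1 - lam) * att_loss
print(float(loss))
```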

28 pages, 4013 KB  
Article
Buenas no[tʃ]es y mu[ts]isimas gracias: A Sociophonetic Study of the Alveolar Affricate in Peninsular Spanish Political Speech
by Matthew Pollock
Languages 2024, 9(6), 218; https://doi.org/10.3390/languages9060218 - 14 Jun 2024
Viewed by 3133
Abstract
While variation in the southern Peninsular Spanish affricate /tʃ/ has been considered in the context of deaffrication to [ʃ], this study examines an emergent variant [ts] in the context of sociolinguistic identity and style in political speech. Based on a corpus of public speech from Madrid and Andalusia, Spain, this study examines the phonetic and sociolinguistic characteristics of the affricate, finding variation in the quality of the frication portion of the segment through an analysis of segment duration (ms), the center of gravity (Hz), and a categorical identification of realization type. The results suggest that both linguistic variables, like phonetic environment, stress, lexical frequency, and following vowel formant height, as well as extralinguistic variables, like speaker city, gender, political affiliation, and speech context, condition use. Based on these findings, it appears that production of the alveolar affricate [ts] is an incipient sociolinguistic marker in the process of acquiring social meaning. It is particularly associated with female speech and prestige norms that transcend regional identification. This alveolar variant serves as an additional sociolinguistic resource accessible for identity development among politicians and offers insight into ongoing change in the affricate inventory of southern and northern-central Peninsular Spanish. Full article
(This article belongs to the Special Issue Phonetics and Phonology of Ibero-Romance Languages)
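The centre-of-gravity measure applied to the frication portion can be computed generically as the power-weighted mean frequency of the spectrum; the sketch below checks it on a synthetic 5 kHz tone rather than real [ts] tokens, and is not the study's Praat script.

```python
import numpy as np

def center_of_gravity(signal, sr):
    """Power-weighted mean frequency of the magnitude spectrum, in Hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1 / sr)
    return np.sum(freqs * power) / np.sum(power)

sr = 44100
t = np.arange(0, 0.05, 1 / sr)
tone = np.sin(2 * np.pi * 5000 * t)          # stand-in for high-frequency frication energy
print(round(center_of_gravity(tone, sr)), "Hz")  # ≈ 5000 Hz
```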

11 pages, 849 KB  
Article
Gemination in Child Egyptian Arabic: A Corpus-Based Study
by Abdullah Alfaifi, Fawaz Qasem and Hassan Bokhari
Languages 2024, 9(6), 202; https://doi.org/10.3390/languages9060202 - 31 May 2024
Cited by 2 | Viewed by 1885
Abstract
This paper examines patterns of gemination in child Egyptian Arabic, with a focus on how gemination functions as a repair strategy, using data from the Egyptian Arabic Salama Corpus. The findings show that the phonological development of Egyptian Arabic-speaking children of geminated consonants correlates with previously established developmental stages. Initial stages involve the acquisition of labial geminates, transitioning through an increased use of alveolar and velar geminates, to the acquisition of rhotic and lateral geminates in later phases. The findings also suggest that gemination is not merely a phonetic phenomenon in child phonology, but also shows the children’s awareness of the phonology of the dialect, especially the moraicity of vowels and consonants. Full article
24 pages, 4100 KB  
Article
Robustness and Complexity in Italian Mid Vowel Contrasts
by Margaret E. L. Renwick
Languages 2024, 9(4), 150; https://doi.org/10.3390/languages9040150 - 18 Apr 2024
Cited by 2 | Viewed by 3565
Abstract
Accounts of phonological contrast traditionally invoke a binary distinction between unpredictable lexically stored phonemes and contextually predictable allophones, whose patterning reveals speakers’ knowledge about their native language. This paper explores the complexity of contrasts among Italian mid vowels from a multifaceted perspective considering the lexicon, linguistic structure, usage, and regional variety. The Italian mid vowels are marginally contrastive due to a scarcity of minimal pairs alongside variation in phonetic realization. The analysis considers corpus data, which indicate that the marginal contrasts among front vowels vs. back vowels are driven by different sources and forces. Functional loads are low; while front /e ɛ/ have the weakest lexical contrast among all Italian vowels, back /o ɔ/ are separated by somewhat more minimal pairs. Among stressed front vowels, height is predicted by syllable structure and is context-dependent in some Italian varieties. Meanwhile, the height of back mid vowels is predicted by lexical frequency, in line with expectations of phonetic reduction in high-frequency contexts. For both front and back vowels, the phonetic factor of duration predicts vowel height, especially in closed syllables, suggesting its use for contrast enhancement. The results have implications for a proposed formalization of Italian mid vowel variation. Full article
(This article belongs to the Special Issue Phonetic and Phonological Complexity in Romance Languages)
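The functional-load argument about scarce minimal pairs rests on counting lexical items whose transcriptions differ only in the target vowel pair; a toy version over an invented four-entry lexicon:

```python
lexicon = {"pesca_fruit": "pɛska", "pesca_fishing": "peska",
           "botte_blows": "bɔtte", "botte_barrel": "botte"}

def minimal_pairs(lex, a, b):
    """Return word pairs whose transcriptions differ only in the segments a vs. b."""
    pairs = []
    items = list(lex.items())
    for i, (w1, t1) in enumerate(items):
        for w2, t2 in items[i + 1:]:
            if len(t1) == len(t2):
                diffs = [(x, y) for x, y in zip(t1, t2) if x != y]
                if diffs in ([(a, b)], [(b, a)]):
                    pairs.append((w1, w2))
    return pairs

print(minimal_pairs(lexicon, "e", "ɛ"))  # [('pesca_fruit', 'pesca_fishing')]
```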

13 pages, 18095 KB  
Article
Minoan Cryptanalysis: Computational Approaches to Deciphering Linear A and Assessing Its Connections with Language Families from the Mediterranean and the Black Sea Areas
by Aaradh Nepal and Francesco Perono Cacciafoco
Information 2024, 15(2), 73; https://doi.org/10.3390/info15020073 - 25 Jan 2024
Cited by 3 | Viewed by 7369
Abstract
During the Bronze Age, the inhabitants of regions of Crete, mainland Greece, and Cyprus inscribed their languages using, among other scripts, a writing system called Linear A. These symbols, mainly characterized by combinations of lines, have, since their discovery, remained a mystery. Not only is the corpus very small, but it is challenging to link Minoan, the language behind Linear A, to any known language. Most decipherment attempts involve using the phonetic values of Linear B, a grammatological offspring of Linear A, to ‘read’ Linear A. However, this yields meaningless words. Recently, novel approaches to deciphering the script have emerged which involve a computational component. In this paper, two such approaches are combined to account for the biases involved in provisionally assigning Linear B phonetic values to Linear A and to shed more light on the possible connections of Linear A with other scripts and languages from the region. Additionally, the limitations inherent in such approaches are discussed. Firstly, a feature-based similarity measure is used to compare Linear A with the Carian Alphabet and the Cypriot Syllabary. A few Linear A symbols are matched with symbols from the Carian Alphabet and the Cypriot Syllabary. Finally, using the derived phonetic values, Linear A is compared with Ancient Egyptian, Luwian, Hittite, Proto-Celtic, and Uralic using a consonantal approach. Some possible word matches are identified from each language. Full article
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)
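The feature-based similarity measure mentioned here can be caricatured as describing each sign by a small vector of visual features and matching signs across scripts by minimal feature distance; the sign names and feature values below are invented placeholders, not the real Linear A or Carian inventories.

```python
# Toy feature vectors: (stroke count, crossings, loops) per sign.
linear_a = {"AB01": (2, 1, 0), "AB02": (3, 0, 1)}
carian = {"C_a": (2, 1, 0), "C_b": (4, 0, 1)}

def best_match(sign_features, other):
    """Find the sign in `other` with the smallest feature distance (L1)."""
    def dist(u, v):
        return sum(abs(x - y) for x, y in zip(u, v))
    return min(other, key=lambda s: dist(sign_features, other[s]))

for sign, feats in linear_a.items():
    print(sign, "->", best_match(feats, carian))
```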
18 pages, 4741 KB  
Article
Research on a Mongolian Text to Speech Model Based on Ghost and ILPCnet
by Qing-Dao-Er-Ji Ren, Lele Wang, Wenjing Zhang and Leixiao Li
Appl. Sci. 2024, 14(2), 625; https://doi.org/10.3390/app14020625 - 11 Jan 2024
Viewed by 2109
Abstract
The core challenge of speech synthesis technology is how to convert text information into an audible audio form to meet the needs of users. In recent years, the quality of speech synthesis based on end-to-end speech synthesis models has been significantly improved. However, due to the characteristics of the Mongolian language and the lack of an audio corpus, the Mongolian speech synthesis model has achieved few results, and there are still some problems with the performance and synthesis quality. First, the phoneme information of Mongolian was further improved and a Bang-based pre-training model was constructed to reduce the error rate of Mongolian phonetic synthesized words. Second, a Mongolian speech synthesis model based on Ghost and ILPCnet was proposed, named the Ghost-ILPCnet model, which was improved based on the Para-WaveNet acoustic model, replacing ordinary convolution blocks with stacked Ghost modules to generate Mongolian acoustic features in parallel and improve the speed of speech generation. At the same time, the improved vocoder ILPCnet had a high synthesis quality and low complexity compared to other vocoders. Finally, a large number of data experiments were conducted on the proposed model to verify its effectiveness. The experimental results show that the Ghost-ILPCnet model has a simple structure, fewer model generation parameters, fewer hardware requirements, and can be trained in parallel. The average subjective opinion score of its synthesized speech reached 4.48 and the real-time rate reached 0.0041. It ensures the naturalness and clarity of synthesized speech, speeds up the synthesis speed, and effectively improves the performance of the Mongolian speech synthesis model. Full article
(This article belongs to the Special Issue Audio, Speech and Language Processing)
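A generic Ghost module in the spirit of GhostNet, which the model above stacks in place of ordinary convolutions: a small "primary" convolution produces a few intrinsic feature maps, and a cheap depthwise convolution generates the remaining ("ghost") maps. Channel sizes are illustrative; this is not the paper's exact Ghost-ILPCnet block.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch, out_ch, ratio=2, kernel=1, cheap_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio
        self.primary = nn.Conv1d(in_ch, init_ch, kernel, padding=kernel // 2)
        # Depthwise "cheap operation" that derives the remaining channels.
        self.cheap = nn.Conv1d(init_ch, out_ch - init_ch, cheap_kernel,
                               padding=cheap_kernel // 2, groups=init_ch)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 80, 200)           # (batch, mel channels, frames)
print(GhostModule(80, 128)(x).shape)   # torch.Size([1, 128, 200])
```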
