Search Results (289)

Search Parameters:
Keywords = corpus construction

21 pages, 1112 KiB  
Article
Evaluative Grammar and Non-Standard Comparatives: A Cross-Linguistic Analysis of Ukrainian and English
by Oksana Kovtun
Languages 2025, 10(8), 191; https://doi.org/10.3390/languages10080191 - 6 Aug 2025
Abstract
This study examines non-standard comparative and superlative adjective forms in Ukrainian and English, emphasizing their evaluative meanings and grammatical deviations. While prescriptive grammar dictates conventional comparison patterns, modern discourse (particularly in advertising, informal communication, and literary texts) exhibits an increasing prevalence of innovative comparative structures. Using a corpus-based approach, this research identifies patterns of positive and negative evaluative meanings, revealing that positive evaluations dominate non-standard comparatives in both languages, particularly in advertising (English: 78.5%, Ukrainian: 80.2%). However, English exhibits a higher tolerance for grammatical flexibility, while Ukrainian maintains a more restricted use, primarily in commercial and expressive discourse. The findings highlight the pragmatic and evaluative functions of such constructions, including hyperbolic emphasis, rhetorical contrast, and branding strategies. These insights contribute to research on comparative grammar, sentiment analysis, and natural language processing, particularly in modeling evaluative structures in computational linguistics.

28 pages, 2335 KiB  
Article
Fine-Tuning Pre-Trained Large Language Models for Price Prediction on Network Freight Platforms
by Pengfei Lu, Ping Zhang, Jun Wu, Xia Wu, Yunsheng Mao and Tao Liu
Mathematics 2025, 13(15), 2504; https://doi.org/10.3390/math13152504 - 4 Aug 2025
Viewed by 197
Abstract
Various factors influence the formation and adjustment of network freight prices, including transportation costs, cargo characteristics, and policies and regulations. The interaction of these factors increases the difficulty of accurately predicting network freight prices through regressions or other machine learning models, especially when the amount and quality of training data are limited. This paper introduces large language models (LLMs) to predict network freight prices using their inherent prior knowledge. Different data sorting methods and serialization strategies are employed to construct the corpora for the LLMs, which are then tested on multiple base models. A few-shot sample dataset is constructed to test the performance of models under insufficient information. Chain of Thought (CoT) is employed to construct a corpus that demonstrates the reasoning process in freight price prediction. The models are trained in two configurations: cross-entropy loss with LoRA fine-tuning and cosine annealing learning rate adjustment, and Mean Absolute Error (MAE) loss with full fine-tuning and OneCycle learning rate adjustment. The experimental results demonstrate that LLMs are better than or competitive with the best comparison model. Tests on a few-shot dataset demonstrate that LLMs outperform most comparison models. This method provides a new reference for predicting network freight prices.

23 pages, 1604 KiB  
Article
Fine-Tuning Large Language Models for Kazakh Text Simplification
by Alymzhan Toleu, Gulmira Tolegen and Irina Ualiyeva
Appl. Sci. 2025, 15(15), 8344; https://doi.org/10.3390/app15158344 - 26 Jul 2025
Viewed by 370
Abstract
This paper addresses the text simplification task for Kazakh, a morphologically rich, low-resource language, by introducing KazSim, an instruction-tuned model built on multilingual large language models (LLMs). First, we develop a heuristic pipeline to identify complex Kazakh sentences, manually validating its performance on 400 examples and comparing it against a purely LLM-based selection method; we then use this pipeline to assemble a parallel corpus of 8709 complex–simple pairs via LLM augmentation. For the simplification task, we benchmark KazSim against standard Seq2Seq systems, domain-adapted Kazakh LLMs, and zero-shot instruction-following models. On an automatically constructed test set, KazSim (Llama-3.3-70B) achieves BLEU 33.50, SARI 56.38, and F1 87.56 with a length ratio of 0.98, outperforming all baselines. We also explore prompt language (English vs. Kazakh) and conduct human evaluation with three native speakers: KazSim scores 4.08 for fluency, 4.09 for meaning preservation, and 4.42 for simplicity, significantly above GPT-4o-mini. Error analysis shows that remaining failures cluster into tone change, tense change, and semantic drift, reflecting Kazakh’s agglutinative morphology and flexible syntax.
(This article belongs to the Special Issue Natural Language Processing and Text Mining)

36 pages, 702 KiB  
Article
Enhancing Code-Switching Research Through Comparable Corpora: Introducing the El Paso Bilingual Corpus
by Margot Vanhaverbeke, Renata Enghels, María del Carmen Parafita Couto and Iva Ivanova
Languages 2025, 10(7), 174; https://doi.org/10.3390/languages10070174 - 21 Jul 2025
Viewed by 581
Abstract
Research on language contact outcomes, such as code-switching, continues to face theoretical and methodological challenges, particularly due to the difficulty of comparing findings across studies that use divergent data collection methods. Accordingly, scholars have emphasized the need for publicly available and comparable bilingual corpora. This paper introduces the El Paso Bilingual Corpus, a new Spanish–English bilingual corpus recorded in El Paso (TX) in 2022, designed to be methodologically comparable to the Bangor Miami Corpus. The paper is structured in three main sections. First, we review the existing Spanish–English corpora and examine the theoretical challenges posed by studies using non-comparable methodologies, thereby underscoring the gap addressed by the El Paso Bilingual Corpus. Second, we outline the corpus creation process, discussing participant recruitment, data collection, and transcription, and provide an overview of these data, including participants’ sociolinguistic profiles. Third, to demonstrate the practical value of methodologically aligned corpora, we report a comparative case study on diminutive expressions in the El Paso and Bangor Miami corpora, illustrating how shared collection protocols can elucidate the role of community-specific social factors in bilinguals’ morphosyntactic choices.

15 pages, 588 KiB  
Review
Archaeometry of Ancient Mortar-Based Materials in Roman Regio X and Neighboring Territories: A First Review
by Simone Dilaria
Minerals 2025, 15(7), 746; https://doi.org/10.3390/min15070746 - 16 Jul 2025
Viewed by 351
Abstract
This review synthesizes the corpus of archaeometric and analytical investigations focused on mortar-based materials, including wall paintings, plasters, and concrete, in the Roman Regio X and neighboring territories of northeastern Italy from the mid-1970s to the present. Organized into three principal categories (wall paintings and pigments, structural and foundational mortars, and flooring preparations), the analysis highlights the main methodological advances and progress in petrographic microscopy, mineralogical analysis, and mechanical testing of ancient mortars. Despite extensive case studies, the review identifies a critical need for systematic, statistically robust, and chronologically anchored datasets to fully reconstruct socio-economic and technological landscapes of this provincial region. This work offers a programmatic research agenda aimed at bridging current gaps and fostering integrated understandings of ancient construction technologies in northern Italy. The full forms of the abbreviations used throughout the text to describe the analytical equipment are provided at the end of the document in the “Abbreviations” section.

18 pages, 957 KiB  
Article
CHTopo: A Multi-Source Large-Scale Chinese Toponym Annotation Corpus
by Peng Ye, Yujin Jiang and Yadi Wang
Information 2025, 16(7), 610; https://doi.org/10.3390/info16070610 - 16 Jul 2025
Viewed by 353
Abstract
Toponyms are fundamental geographical resources characterized by their spatial attributes, distinct from general nouns. While natural language provides rich toponymic data beyond traditional surveying methods, its qualitative ambiguity and inherent uncertainty challenge systematic extraction. Traditional toponym recognition methods based on part-of-speech tagging focus only on the surface-level features of words and fail to handle complex scenarios such as alias nesting, metonymy ambiguity, and mixed punctuation. This leads to the loss of toponym semantic integrity and deviations in geographic entity recognition. This study proposes a set of Chinese toponym annotation specifications that integrate spatial semantics. Using the XML markup language, it combines the spatial location characteristics of toponyms with linguistic features and designs fine-grained annotation rules to address the limitations of traditional methods in semantic integrity and geographic entity recognition. On this basis, by integrating multi-source corpora from the Encyclopedia of China: Chinese Geography and People’s Daily, a large-scale Chinese toponym annotation corpus (CHTopo) covering five major categories of toponyms has been constructed. The performance of this annotated corpus was evaluated through toponym recognition, exploring the construction methods of a large-scale, diversified, and high-coverage Chinese toponym annotated corpus from the perspectives of applicability and practicality. CHTopo can provide foundational support for geographic information extraction, spatial knowledge graphs, and geoparsing research, bridging linguistic and geospatial intelligence.
(This article belongs to the Special Issue Text Mining: Challenges, Algorithms, Tools and Applications)

15 pages, 820 KiB  
Article
From Sacred to Secular: Daoist Robes as Instruments of Identity Negotiation in Ming Dynasty Literature
by Xiangyang Bian, Menghe Tian and Liyan Zhou
Religions 2025, 16(7), 903; https://doi.org/10.3390/rel16070903 - 14 Jul 2025
Viewed by 428
Abstract
Daoist robes in Ming Dynasty literature underwent a marked transformation from exclusive religious vestments to widespread secular attire. Originally confined to Daoist priests and sacred rites, these garments began to appear in everyday work, entertainment, and ceremonies across social strata. Drawing on a hand-coded corpus of novels that yields robe-related passages, analyzing textual references from Ming novels, Daoist canonical works, and visual artifacts, and applying clothing psychology and semiotic theory, this study elucidates how Daoist robes were re-coded as secular fashion symbols. For example, scholar-officials donned Daoist robes to convey moral prestige, laborers adopted them to signal upward mobility, and merchants wore them to impersonate the educated elite for commercial gain. By integrating close textual reading with cultural theory, the article advances a three-stage model (sacred uniform, ritual costume, and secular fashion) that clarifies the semantic flow of Daoist robes. In weddings and funerals, many commoners flaunted Daoist robes despite sumptuary laws, using them to assert honor and status. These adaptations reflect both the erosion of Daoist institutional authority and the dynamic process of identity construction through dress in late Ming society. Our interdisciplinary analysis highlights an East Asian perspective on the interaction of religion and fashion, offering historical insight into the interplay between religious symbolism and sociocultural identity formation.

20 pages, 4177 KiB  
Article
Joint Entity–Relation Extraction for Knowledge Graph Construction in Marine Ranching Equipment
by Du Chen, Zhiwu Gao, Sirui Li, Xuruixue Guo, Yaqi Wu, Haiyu Zhang and Delin Zhang
Appl. Sci. 2025, 15(13), 7611; https://doi.org/10.3390/app15137611 - 7 Jul 2025
Viewed by 354
Abstract
The construction of marine ranching is a crucial component of China’s Blue Granary strategy, yet the fragmented knowledge system in marine ranching equipment impedes intelligent management and operational efficiency. This study proposes the first knowledge graph (KG) framework tailored for marine ranching equipment, integrating hybrid ontology design, joint entity–relation extraction, and graph-based knowledge storage: (1) The limitations of existing KGs were identified through targeted questionnaires for diverse users and employees; (2) A domain ontology was constructed through a combination of top-down and bottom-up approaches, defining seven key concepts and eight semantic relationships; (3) Semi-structured data from enterprises and standards, combined with unstructured data from the literature, were systematically collected, cleaned via Scrapy and regular expressions, and standardized into JSON format, forming a domain-specific corpus of 1456 annotated sentences; (4) A novel BERT-BiGRU-CRF model was developed, leveraging contextual embeddings from BERT, parameter-efficient sequence modeling via BiGRU (Bidirectional Gated Recurrent Unit), and label dependency optimization using CRF (Conditional Random Field). The TE + SE + Ri + BMESO tagging strategy was introduced to address multi-relation extraction challenges by linking theme entities to secondary entities; (5) The Neo4j-based KG encapsulated 2153 nodes and 3872 edges, enabling scalable visualization and dynamic updates. Experimental results demonstrated superior performance over BiLSTM-CRF and BERT-BiLSTM-CRF, achieving 86.58% precision, 77.82% recall, and 81.97% F1 score. This study not only proposes the first structured KG framework for marine ranching equipment but also offers a transferable methodology for vertical domain knowledge extraction.
(This article belongs to the Section Marine Science and Engineering)
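The precision, recall, and F1 figures quoted in the abstract above can be cross-checked with the standard harmonic-mean definition of F1. The sketch below is only a sanity check on the reported numbers; it assumes the F1 score was computed from the same aggregate precision and recall, which the abstract does not state, and the function name `f1_score` is a hypothetical helper, not part of the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2 * precision * recall / (precision + recall)

# Values quoted in the abstract, in percent.
reported_precision = 86.58
reported_recall = 77.82

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")
```

Rounded to two decimals, the result agrees with the 81.97% reported in the abstract.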

23 pages, 2203 KiB  
Review
Digital Academic Leadership in Higher Education Institutions: A Bibliometric Review Based on CiteSpace
by Olaniyi Joshua Olabiyi, Carl Jansen van Vuuren, Marieta Du Plessis, Yujie Xue and Chang Zhu
Educ. Sci. 2025, 15(7), 846; https://doi.org/10.3390/educsci15070846 - 2 Jul 2025
Cited by 1 | Viewed by 804
Abstract
The continuous evolution of technology compels higher education leaders to adapt to VUCA (volatile, uncertain, complex, and ambiguous) and BANI (brittle, anxious, non-linear, and incomprehensible) environments through innovative strategies that ensure institutional relevance. While VUCA emphasizes the challenges posed by rapid change and uncertain decision-making, BANI underscores the fragility of systems, heightened anxiety, unpredictable causality, and the collapse of established patterns. Navigating these complexities requires agility, resilience, and visionary leadership to ensure that institutions remain adaptable and future-ready. This study presents a bibliometric analysis of digital academic leadership in higher education transformation, examining empirical studies, reviews, book chapters, and proceedings papers published from 2014 to 2024 (an 11-year period) in the Web of Science: Science Citation Index Expanded (SCIE) and Social Science Citation Index (SSCI). Using CiteSpace software (version 6.3.R1, 64-bit), we analyzed 5837 documents, identifying 24 key publications that formed a network of 90 nodes and 256 links. The reduction to 24 publications occurred as part of a structured bibliometric analysis using CiteSpace, which employs algorithmic thresholds to identify the most influential and structurally significant publications within a large corpus. These 24 documents form the core co-citation network, which serves as a conceptual backbone for further thematic interpretation. This was the result of a multi-step refinement process using CiteSpace’s default thresholds and clustering algorithms to detect the most influential nodes based on centrality, citation burst, and network clustering. Our findings reveal six primary research clusters: “Enhancing Academic Performance”, “Digital Leadership Scale Adaptation”, “Construction Industry”, “Innovative Work Behavior”, “Development Business Strategy”, and “Education.” The analysis demonstrates a significant increase in publications over the decade, with the highest concentration in 2024, reflecting growing scholarly interest in this field. Keyword analysis shows “digital leadership”, “digital transformation”, “performance”, and “innovation” as dominant terms, highlighting the field’s evolution from technology-focused approaches to holistic leadership frameworks. Geographical analysis reveals significant contributions from Pakistan, Ireland, and India, indicating valuable insights emerging from diverse global contexts. These findings suggest that effective digital academic leadership requires not only technical competencies but also transformational capabilities, communication skills, and innovation management to enhance student outcomes and institutional performance in an increasingly digitalized educational landscape.

22 pages, 548 KiB  
Article
Readability Formulas for Elementary School Texts in Mexican Spanish
by Daniel Fajardo-Delgado, Lino Rodriguez-Coayahuitl, María Guadalupe Sánchez-Cervantes, Miguel Ángel Álvarez-Carmona and Ansel Y. Rodríguez-González
Appl. Sci. 2025, 15(13), 7259; https://doi.org/10.3390/app15137259 - 27 Jun 2025
Viewed by 316
Abstract
Readability formulas are mathematical functions that assess the ‘difficulty’ level of a given text. They play a crucial role in aligning educational texts with student reading abilities; however, existing models are often not tailored to specific linguistic or regional contexts. This study aims to develop and evaluate two novel readability formulas specifically designed for Mexican Spanish, targeting elementary education levels. The formulas were trained on a corpus of 540 texts drawn from official elementary-level textbooks issued by the Mexican public education system. The first formula was constructed using multiple linear regression, emulating the structure of traditional readability models. The second was derived through genetic programming (GP), a machine learning technique that evolves symbolic expressions based on training data. Both approaches prioritize interpretability and use standard textual features, such as sentence length, word length, and lexical and syntactic complexity. Experimental results show that the proposed formulas outperform several well-established Spanish and non-Spanish readability formulas in distinguishing between grade levels, particularly for early and intermediate stages of elementary education. The GP-based formula achieved the highest alignment with target grade levels while maintaining a clear analytical form. These findings underscore the potential of combining machine learning with interpretable modeling techniques and highlight the importance of linguistic and curricular adaptation in readability assessment tools.
(This article belongs to the Special Issue Machine Learning and Soft Computing: Current Trends and Applications)

24 pages, 3367 KiB  
Article
From Policy to Practice: A Comparative Topic Modeling Study of Smart Forestry in China
by Yukun Cao, Yafang Zhang, Yuchen Shi and Yue Ren
Forests 2025, 16(6), 1019; https://doi.org/10.3390/f16061019 - 18 Jun 2025
Viewed by 455
Abstract
The accelerated penetration of digital technology into natural ecosystems has led to the digital transformation of forest ecological spaces. Smart forestry, as a key pathway for digital-intelligence-enabled ecological governance, plays an important role in global sustainable development and multi-level governance. However, due to differences in functional positioning, resource capacity, and policy translation mechanisms, semantic shifts and disconnections arise between central policies, local policies, and practical implementation, thereby affecting policy execution and governance effectiveness. Fujian Province has been identified as a key pilot region for smart forestry practices in China, owing to its early adoption of informatization strategies and distinctive ecological conditions. This study constructed a corpus of smart forestry texts, including central policies, local policies, and local media reports from 2010 to 2025, and analyzed it with the Latent Dirichlet Allocation (LDA) topic modeling method. Seven potential themes were identified and categorized into three overarching dimensions: technological empowerment, governance mechanisms, and ecological goals. The results show that central policies emphasize macro strategy and ecological security, local policies focus on platform construction and governance coordination, and local practice features digital innovation and ecological value transformation. Three transmission paths are summarized to support smart forestry policy optimization and inform digital ecological governance globally.
(This article belongs to the Section Forest Economics, Policy, and Social Science)

24 pages, 1461 KiB  
Article
Syllable-, Bigram-, and Morphology-Driven Pseudoword Generation in Greek
by Kosmas Kosmidis, Vassiliki Apostolouda and Anthi Revithiadou
Appl. Sci. 2025, 15(12), 6582; https://doi.org/10.3390/app15126582 - 11 Jun 2025
Viewed by 447
Abstract
Pseudowords are essential in (psycho)linguistic research, offering a way to study language without meaning interference. Various methods for creating pseudowords exist, but each has its limitations. Traditional approaches modify existing words, risking unintended recognition. Modern algorithmic methods use high-frequency n-grams or syllable deconstruction but often require specialized expertise. Currently, no automatic process for pseudoword generation is designed explicitly for Greek, which is our primary focus. Therefore, we developed SyBig-r-Morph, a novel application that constructs pseudowords using syllables as the main building block, replicating Greek phonotactic patterns. SyBig-r-Morph draws input from word lists and databases that include syllabification, word length, part of speech, and frequency information. It categorizes syllables by position to ensure phonotactic consistency with user-selected morphosyntactic categories and can optionally assign stress to generated words. Additionally, the tool uses multiple lexicons to eliminate phonologically invalid combinations. Its modular architecture allows easy adaptation to other languages. To further evaluate its output, we conducted a manual assessment using a tool that verifies phonotactic well-formedness based on phonological parameters derived from a corpus. Most SyBig-r-Morph words passed the stricter phonotactic criteria, confirming the tool’s sound design and linguistic adequacy.
(This article belongs to the Special Issue Computational Linguistics: From Text to Speech Technologies)

13 pages, 1136 KiB  
Article
Machine Learning-Driven Acoustic Feature Classification and Pronunciation Assessment for Mandarin Learners
by Gulnur Arkin, Tangnur Abdukelim, Hankiz Yilahun and Askar Hamdulla
Appl. Sci. 2025, 15(11), 6335; https://doi.org/10.3390/app15116335 - 5 Jun 2025
Viewed by 465
Abstract
Based on acoustic feature analysis, this study systematically examines the differences in vowel pronunciation characteristics among Mandarin learners at various proficiency levels. A speech corpus containing samples from advanced, intermediate, and elementary learners (N = 50) and standard speakers (N = 10) was constructed, with a total of 5880 samples. Support Vector Machine (SVM) and ID3 decision tree algorithms were employed to classify patterns of vowel formant parameters (F1-F2). The results demonstrate that SVM significantly outperforms the ID3 algorithm in vowel classification, with an average accuracy of 92.09% for the three learner groups (92.38% for advanced, 92.25% for intermediate, and 91.63% for elementary), an improvement of 2.05 percentage points compared to ID3 (p < 0.05). Learners’ vowel production exhibits systematic deviations, particularly pronounced in complex vowels for the elementary group. For instance, the apical vowel “ẓ” has a deviation of 2.61 Bark (standard group: F1 = 3.39/F2 = 8.13; elementary group: F1 = 3.42/F2 = 10.74), while the advanced group’s deviations are generally less than 0.5 Bark (e.g., the deviation for vowel “a” is only 0.09 Bark). The difficulty of tongue position control strongly correlates with the deviation magnitude (r = 0.87, p < 0.001). This study confirms the effectiveness of objective assessment methods based on formant analysis in speech acquisition research, provides a theoretical basis for algorithm optimization in speech evaluation systems, and holds significant application value for the development of Computer-Assisted Language Learning (CALL) systems and the improvement of multi-ethnic Mandarin speech recognition technology.
(This article belongs to the Collection Fishery Acoustics)
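Two of the figures in the abstract above can be reproduced from the numbers it quotes. The sketch below assumes that the reported deviation is the Euclidean distance between group means in the F1-F2 plane (in Bark) and that the average accuracy is an unweighted mean over the three learner groups; the abstract states neither, so both are assumptions. The helper names `bark_deviation` and `mean_accuracy` are hypothetical.

```python
import math

def bark_deviation(reference, learner):
    """Euclidean distance between two (F1, F2) points, both given in Bark.

    Assumption: the abstract's 'deviation' is this distance.
    """
    return math.dist(reference, learner)

def mean_accuracy(accuracies):
    """Unweighted mean of per-group accuracies (assumed averaging scheme)."""
    return sum(accuracies) / len(accuracies)

# (F1, F2) values for the apical vowel, as quoted in the abstract.
standard_group = (3.39, 8.13)
elementary_group = (3.42, 10.74)
deviation = bark_deviation(standard_group, elementary_group)
print(f"deviation = {deviation:.2f} Bark")

# SVM accuracies (percent) for advanced, intermediate, elementary learners.
average = mean_accuracy([92.38, 92.25, 91.63])
print(f"average accuracy = {average:.2f}%")
```

Under these assumptions the rounded outputs match the 2.61 Bark and 92.09% quoted in the abstract, with nearly all of the deviation coming from F2.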

24 pages, 3152 KiB  
Article
EHMQA-GPT: A Knowledge Augmented Large Language Model for Personalized Elderly Health Management
by Shaofu Lin, Yidan Duan, Tao Zhou, Xiliang Liu and Jiaojiao Wang
Information 2025, 16(6), 467; https://doi.org/10.3390/info16060467 - 30 May 2025
Viewed by 664
Abstract
Due to training limitations, general LLMs often lack sufficient accuracy and practicality in specialized domains such as elderly health management. To help alleviate this issue, this paper introduces EHMQA-GPT, the first domain-specific LLM tailored for non-specialist users (caregivers, elderly individuals, family members, and community health workers) for low-risk, daily health consultations in real-world scenarios. EHMQA-GPT innovates in two aspects: (1) professional corpus construction: we established a multi-dimensional annotation system, integrating EHM-KB, EHM-SFT, and EHM-Eval, to achieve vector representation and hierarchical classification of domain knowledge; and (2) knowledge-enhanced large language model construction: based on ChatGLM3-6B, we integrated knowledge retrieval mechanisms and supervised fine-tuning strategies, enhanced the generation effect through knowledge base retrieval, and achieved deep alignment of domain knowledge through mixed supervised fine-tuning. In experiments covering six fields, EHMQA-GPT achieves an accuracy of 78.1%, which is 22.3% higher than ChatGLM3-6B. The subjective assessment uses a dual verification system (GPT-4 automatic scoring plus blind review by gerontology experts), in which EHMQA-GPT is significantly superior to the baseline model in three dimensions: knowledge accuracy (+38.9%), logical coherence (+39.4%), and practical guidance (+31.4%). The proposed framework and corpus provide a novel and scalable foundation for future research and deployment of LLMs in elderly health.

12 pages, 351 KiB  
Article
HOTGAME: A Corpus of Early House and Techno Music from Germany and America
by Tim Ziemer
Metrics 2025, 2(2), 8; https://doi.org/10.3390/metrics2020008 - 29 May 2025
Viewed by 393
Abstract
Many publications on early house and techno music have the character of documentation and include (auto-)biographical statements from contemporaries of the scene. This literature has led to many statements, hypotheses, and conclusions. The weaknesses of such sources are their selective and subjective nature, and the danger of unclear memories, romanticization, and constructive memory. Consequently, a validation through content-based, quantitative music analyses is desirable. For this purpose, the HOuse and Techno music from Germany and AMErica (HOTGAME) corpus was built. Metrics from the field of data quality control show that the corpus is representative and explanatory for house and techno music from Germany and the United States of America between 1984 and 1994. HOTGAME can serve as a reliable source for the analysis of early house and techno music using big data methods, like inferential statistics and machine learning.