Search Results (64)

Search Parameters:
Keywords = syntactic characteristics

41 pages, 3112 KB  
Article
A Bird’s-Eye View on a New Stochastic Interpretation of Quantum Mechanics
by Olavo L. Silva Filho and Marcello Ferreira
Mathematics 2025, 13(21), 3571; https://doi.org/10.3390/math13213571 - 6 Nov 2025
Viewed by 201
Abstract
Since the early twentieth century, quantum mechanics has sought an interpretation that offers a consistent worldview. Many proposals have been advanced in this search, but all of them introduce, at some point, interpretive elements (semantics) that have no correlate in the formalism (syntactics). This distance between semantics and syntactics is one of the major reasons why interpretations of the formalism are so abstruse and diverse. To overcome this issue, we propose an alternative stochastic interpretation, based exclusively on the formal structure of the Schrödinger equation, without resorting to external assumptions such as the collapse of the wave function or the role of the observer. We present four mathematically equivalent derivations of the Schrödinger equation based on four constructs: the characteristic function, Boltzmann entropy, the Central Limit Theorem (CLT), and the Langevin equation. All of them rest on axioms that are already interpreted and offer complementary perspectives on the quantum formalism. The results show that the Schrödinger equation can be derived from well-defined probabilistic principles and that the wave function represents a probability amplitude in configuration space, with dispersions linked to the CLT. We conclude that quantum mechanics has a stochastic underpinning, originating from the separation between particle and field subsystems, which allows an objective description of quantum behavior as a mean-field theory, analogous, but not identical, to Brownian motion, without the need for arbitrary ontological entities.
(This article belongs to the Special Issue Advances in Mathematics for Quantum Mechanics)
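
As a point of reference for the claim that the wave function is a probability amplitude in configuration space, the textbook polar (Madelung) decomposition below shows how the Schrödinger equation yields a continuity equation for the probability density. This is a standard identity offered for orientation only, not a reproduction of the four derivations presented in the paper.

```latex
% Illustrative only: textbook polar decomposition of the wave function,
% not the authors' derivation.
\begin{align}
  \psi(\mathbf{x},t) &= \sqrt{\rho(\mathbf{x},t)}\, e^{\, i S(\mathbf{x},t)/\hbar},
  \qquad \rho = |\psi|^2, \\
  i\hbar\,\partial_t \psi &= -\frac{\hbar^2}{2m}\nabla^2\psi + V\psi
  \;\;\Longrightarrow\;\;
  \partial_t \rho + \nabla\!\cdot\!\Big(\rho\,\frac{\nabla S}{m}\Big) = 0 .
\end{align}
```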

17 pages, 1747 KB  
Article
Weighted Transformer Classifier for User-Agent Progression Modeling, Bot Contamination Detection, and Traffic Trust Scoring
by Geza Lucz and Bertalan Forstner
Mathematics 2025, 13(19), 3153; https://doi.org/10.3390/math13193153 - 2 Oct 2025
Viewed by 315
Abstract
In this paper, we present a unique method to determine the level of bot contamination of web-based user agents. It is common practice for bots and robotic agents to masquerade as human-like traffic to avoid content and performance limitations. This paper continues our previous work, using over 600 million web log entries collected from over 4000 domains to derive and generalize how the prominence of specific web browser versions progresses over time, assuming genuine human agency. Here, we introduce a parametric model capable of reproducing this progression in a tunable way. This simulation allows us to tag human-generated traffic in our data accurately. Together with the highest-confidence self-tagged bot traffic, we use it to train a Transformer-based classifier that can determine bot contamination, a "botness" metric of user agents, without prior labels. Unlike traditional syntactic or rule-based filters, our model learns temporal patterns of raw and heuristic-derived features, capturing nuanced shifts in request volume, response ratios, content targeting, and entropy-based indicators over time. This rolling-window pre-classification of traffic allows content providers to bin streams according to their bot infusion levels and direct them to several specifically tuned filtering pipelines, given the current load levels and available free resources. We also show that aggregated traffic data from multiple sources can enhance our model's accuracy and can be further tailored to regional characteristics using localized metadata from standard web server logs. Our ability to adjust the heuristics to geographical or use-case specifics makes our method robust and flexible. Our evaluation highlights that 65% of unclassified traffic is bot-based, underscoring the urgency of robust detection systems. We also propose practical methods for independent or third-party verification and further classification by abusiveness.
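
For readers unfamiliar with the general setup, the following is a minimal sketch of a Transformer encoder that scores per-user-agent sequences of windowed traffic features as a "botness" value. The feature count, layer sizes, mean pooling, and sigmoid output are assumptions for illustration and do not reproduce the authors' weighted classifier.

```python
# Minimal sketch (not the paper's model): a Transformer encoder that maps a
# sequence of per-window traffic feature vectors for one user agent to a
# "botness" score in [0, 1]. Feature layout and sizes are assumed.
import torch
import torch.nn as nn

class BotnessClassifier(nn.Module):
    def __init__(self, n_features=16, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)           # embed raw/heuristic features
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)                     # botness logit

    def forward(self, x):                  # x: (batch, windows, n_features)
        h = self.encoder(self.proj(x))     # contextualize the rolling windows
        return torch.sigmoid(self.head(h.mean(dim=1)))        # pool over time -> score

# Usage on dummy data: 8 user agents, 24 rolling windows, 16 features each.
scores = BotnessClassifier()(torch.randn(8, 24, 16))
print(scores.shape)  # torch.Size([8, 1])
```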

31 pages, 900 KB  
Article
Distribution and Timing of Verbal Backchannels in Conversational Speech: A Quantitative Study
by Michael Paierl, Anneliese Kelterer and Barbara Schuppler
Languages 2025, 10(8), 194; https://doi.org/10.3390/languages10080194 - 15 Aug 2025
Viewed by 2162
Abstract
This paper explores backchannels, short listener responses such as "mhm", which play an important role in managing turn-taking and grounding in spontaneous conversation. While previous work has largely focused on their acoustic cues or on listeners' behavior in isolation, this study investigates whether and when backchannels occur by taking into account both the prosodic characteristics and the communicative functions of the interlocutor's speech preceding them. Using a corpus of spontaneous dyadic conversations in Austrian German annotated with continuous turn-taking labels, we analyze the distribution of backchannels across different turn-taking contexts and examine which acoustic features affect their occurrence and timing by means of Conditional Inference Trees and linear mixed-effects regression models. Our findings show that the turn-taking function of the interlocutor's utterance is a significant predictor of whether a backchannel occurs: backchannels tend to occur most frequently after longer and syntactically complete utterances by the interlocutor. Moreover, prosodic features such as utterance duration, articulation-rate variability, and rising or falling intensity affect the timing of listener responses, with significant differences across turn-taking functions. These results highlight the value of continuous turn-taking annotations for investigating conversational dynamics and demonstrate how turn-taking function and prosody jointly shape backchannel behavior in spontaneous conversation.
(This article belongs to the Special Issue Current Trends in Discourse Marker Research)
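
As an illustration of the kind of linear mixed-effects analysis described in the abstract, here is a minimal sketch using statsmodels; the column names, toy data, and random-intercept structure are assumptions, not the study's actual variables or model specification.

```python
# Hedged sketch (not the authors' code): a linear mixed-effects model of
# backchannel timing with a random intercept per conversation, in the spirit
# of the analysis described above. All column names and the toy data are
# hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "bc_latency": rng.normal(0.4, 0.2, n),        # s from utterance end to backchannel
    "utt_duration": rng.normal(2.0, 0.8, n),      # interlocutor utterance length (s)
    "artic_rate_var": rng.normal(1.0, 0.3, n),    # articulation-rate variability
    "intensity_slope": rng.normal(0.0, 1.0, n),   # rising vs. falling intensity
    "conversation": rng.integers(0, 10, n),       # grouping factor (dyad)
})

model = smf.mixedlm(
    "bc_latency ~ utt_duration + artic_rate_var + intensity_slope",
    data=df, groups=df["conversation"],
)
print(model.fit().summary())
```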

23 pages, 1115 KB  
Article
Research on Mongolian–Chinese Neural Machine Translation Based on Implicit Linguistic Features and Deliberation Networks
by Qingdaoerji Ren, Shike Li, Xuerong Wei, Yatu Ji and Nier Wu
Electronics 2025, 14(15), 3144; https://doi.org/10.3390/electronics14153144 - 7 Aug 2025
Viewed by 946
Abstract
Sequence-to-sequence neural machine translation (NMT) has achieved great success on many language pairs. However, its performance remains constrained in low-resource settings such as Mongolian–Chinese translation due to its strong reliance on large-scale parallel corpora. To address this issue, we propose ILFDN-Transformer, a Mongolian–Chinese NMT model that integrates implicit linguistic features and a deliberation network to improve translation quality under limited-resource conditions. Specifically, we leverage the BART pre-trained language model to capture deep semantic representations of source sentences and apply knowledge distillation to integrate the resulting implicit linguistic features into the Transformer encoder, providing enhanced semantic support. During decoding, we introduce a deliberation mechanism that guides the generation process by referencing linguistic knowledge encoded in a multilingual pre-trained model, thereby improving the fluency and coherence of target translations. Furthermore, considering the flexible word-order characteristics of Mongolian, we propose a Mixed Positional Encoding (MPE) method that combines absolute positional encoding with LSTM-based dynamic encoding, enabling the model to better adapt to complex syntactic variations. Experimental results show that ILFDN-Transformer achieves a BLEU score improvement of 3.53 over the baseline Transformer model, demonstrating the effectiveness of the proposed method.
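
A rough sketch of the Mixed Positional Encoding idea (absolute sinusoidal encoding combined with an LSTM-derived dynamic encoding) is shown below. How the two signals are combined in ILFDN-Transformer is not specified here, so the simple additive mix and all dimensions are assumptions.

```python
# A rough sketch of the idea behind Mixed Positional Encoding as described
# above (absolute sinusoidal encoding plus an LSTM-derived dynamic encoding);
# the additive combination below is an assumption, not the paper's exact rule.
import math
import torch
import torch.nn as nn

class MixedPositionalEncoding(nn.Module):
    def __init__(self, d_model=256, max_len=512):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)                            # absolute (sinusoidal) part
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)   # dynamic part

    def forward(self, emb):                                  # emb: (batch, seq, d_model)
        dynamic, _ = self.lstm(emb)                          # order-sensitive encoding
        return emb + self.pe[: emb.size(1)] + dynamic        # mix both position signals

out = MixedPositionalEncoding()(torch.randn(2, 10, 256))
print(out.shape)  # torch.Size([2, 10, 256])
```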

19 pages, 435 KB  
Article
Translation as Pedagogy: Dharmagupta’s Didactic Rendering of the Diamond Sutra (Vajracchedikā-Prajñāpāramitā-Sūtra) and Sanskrit Instruction in the Sui–Tang Period
by Jiayi Wang and Nan Wang
Religions 2025, 16(8), 959; https://doi.org/10.3390/rel16080959 - 24 Jul 2025
Viewed by 931
Abstract
The Diamond Sutra (Vajracchedikā-Prajñāpāramitā-Sūtra) translated by the Sui Dynasty monk Dharmagupta is the fourth Chinese rendition of the Diamond Sutra. Characterized by unprecedented linguistic opacity and syntactic complexity within the history of Buddhist textual transmission, this translation’s distinctive features have attracted significant scholarly attention. This study synthesizes existing academic perspectives and employs Sanskrit–Chinese textual criticism and comparative analysis of parallel translations to conduct a granular examination of Dharmagupta’s retranslation. Our findings reveal that this text fundamentally deviates from conventional sutras designed for religious dissemination or liturgical recitation. Its defining traits, including morphological calquing of Sanskrit structures, simplified pronominal systems, and etymologically prioritized equivalence, collectively reflect a pedagogical focus characteristic of language instructional texts. Dharmagupta’s approach epitomizes a translation-as-pedagogy paradigm, with the text’s deviations from conventional norms resulting from the interplay of religious development, historical context, and translator agency. We argue that the Diamond Sutra retranslation constitutes a radical experimental paradigm in translation history, warranting re-evaluation of its significance within the broader trajectory of Buddhist textual practice.
24 pages, 2269 KB  
Article
This Is the Way People Are Negative Anymore: Mapping Emotionally Negative Affect in Syntactically Positive Anymore Through Sentiment Analysis of Tweets
by Christopher Strelluf and Thomas T. Hills
Languages 2025, 10(6), 136; https://doi.org/10.3390/languages10060136 - 10 Jun 2025
Viewed by 1457
Abstract
The adverb anymore is standardly a negative polarity item (NPI), which must be licensed by triggers of non-positive polarity. Some Englishes also allow anymore in positive-polarity clauses. Linguists have posited that this non-polarity anymore (NPAM) carries a feature of negative affect. However, this claim is based on elicited judgments, and linguists have argued that respondents cannot reliably evaluate NPAM via conscious judgment. To solve this problem, we employ sentiment analysis to examine the relationship between NPAM and negative affect in a Twitter corpus. Using two complementary sentiment analytic frameworks, we demonstrate that words occurring with NPAM have lower valence, higher arousal, and lower dominance than words occurring with NPI-anymore. Broadly, this confirms NPAM’s association with negative affect in natural-language productions. We additionally identify inter- and intra-regional differences in affective dimensions, as well as variability across different types of NPI trigger, showing that the relationship between negative affect and NPAM is not monolithic dialectally, syntactically, or semantically. The project demonstrates the utility of sentiment analysis for examining emotional characteristics of low-frequency variables, providing a new tool for dialectology, micro-syntax, and variationist sociolinguistics.
(This article belongs to the Special Issue Linguistics of Social Media)
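
To make the sentiment-analytic setup concrete, the toy sketch below averages valence/arousal/dominance ratings of words co-occurring with "anymore" in a tweet. The miniature lexicon, example tweets, and preprocessing are invented for illustration and do not reflect the paper's norms or its NPAM/NPI coding.

```python
# Illustrative sketch only: averaging valence/arousal/dominance (VAD) ratings
# of words that co-occur with "anymore" in a tweet. The tiny lexicon and the
# example tweets are made up.
from statistics import mean

VAD = {  # hypothetical word -> (valence, arousal, dominance) ratings
    "rude":   (2.5, 5.8, 4.0),
    "people": (6.2, 4.0, 5.5),
    "great":  (8.0, 5.0, 6.5),
    "tired":  (3.0, 3.5, 4.0),
}

def context_vad(tweet, target="anymore"):
    words = [w.strip(".,!?").lower() for w in tweet.split()]
    if target not in words:
        return None
    rated = [VAD[w] for w in words if w in VAD and w != target]
    if not rated:
        return None
    return tuple(round(mean(dim), 2) for dim in zip(*rated))

print(context_vad("People are so rude anymore!"))    # mean (valence, arousal, dominance)
print(context_vad("I don't feel tired anymore."))
```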

18 pages, 766 KB  
Article
Multi-Task Sequence Tagging for Denoised Causal Relation Extraction
by Yijia Zhang, Chaofan Liu, Yuan Zhu and Wanyu Chen
Mathematics 2025, 13(11), 1737; https://doi.org/10.3390/math13111737 - 24 May 2025
Viewed by 621
Abstract
Extracting causal relations from natural language texts is crucial for uncovering causality, yet most existing causal relation extraction models are single-task learning models that cannot fully exploit attributes such as part-of-speech tags and chunk analysis. However, word characteristics drawn from multiple domains are highly relevant for causal relation extraction, while words such as adjectives and linking verbs bring additional noise that limits the effectiveness of single-task learning methods. Furthermore, causalities from diverse domains raise an additional challenge, as existing models tend to falter across multiple domains compared to a single one. In light of this, we propose a multi-task sequence tagging model, MPC-CE, which exploits additional information about causality and related tasks to improve causal relation extraction from noisy data. By modeling auxiliary tasks, MPC-CE promotes a hierarchical understanding of linguistic structure and semantic roles, filtering noise and isolating salient entities. Furthermore, its sparse sharing paradigm retains only the most broadly beneficial parameters by pruning redundant ones during training, enhancing model generalization. Empirical results on two datasets show F1 improvements of 2.19% and 3.12%, respectively, over the baselines, demonstrating that our proposed model can effectively enhance causal relation extraction with semantic features across multiple syntactic tasks, offering the representational power to overcome pervasive noise and cross-domain issues.
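
The following is a generic multi-task sequence-tagging skeleton (one shared encoder, separate heads for the causal task and auxiliary tasks) meant only to illustrate the setup; MPC-CE's sparse-sharing and pruning mechanics are not reproduced, and the tag-set sizes and BiLSTM encoder are assumptions. In such a setup the training loss would typically sum the per-task cross-entropies.

```python
# Generic multi-task sequence-tagging skeleton (not the MPC-CE model): a shared
# encoder with one tagging head per task. Tag-set sizes are placeholders.
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    def __init__(self, vocab=10000, d=128, n_causal=5, n_pos=20, n_chunk=7):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.enc = nn.LSTM(d, d, batch_first=True, bidirectional=True)  # shared encoder
        self.heads = nn.ModuleDict({          # one classifier per tagging task
            "causal": nn.Linear(2 * d, n_causal),   # main task: cause/effect spans
            "pos":    nn.Linear(2 * d, n_pos),      # auxiliary: part-of-speech
            "chunk":  nn.Linear(2 * d, n_chunk),    # auxiliary: chunking
        })

    def forward(self, token_ids):             # token_ids: (batch, seq)
        h, _ = self.enc(self.emb(token_ids))
        return {task: head(h) for task, head in self.heads.items()}

logits = MultiTaskTagger()(torch.randint(0, 10000, (4, 30)))
print({k: v.shape for k, v in logits.items()})
```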

16 pages, 1051 KB  
Article
Kafka’s Literary Style: A Mixed-Method Approach
by Carsten Strathausen, Wenyi Shang and Andrei Kazakov
Humanities 2025, 14(3), 61; https://doi.org/10.3390/h14030061 - 12 Mar 2025
Viewed by 1594
Abstract
In this essay, we examine how the polyvalence of meaning in Kafka’s texts is engineered both semantically (on the narrative level) and syntactically (on the linguistic level), and we ask whether a computational approach can shed new light on the long-standing debate about the major characteristics of Kafka’s literary style. A mixed-method approach means that we seek out points of connection that interlink traditional humanist (i.e., interpretative) and computational (i.e., quantitative) methods of investigation. Following the introduction, the second section of our article provides a critical overview of the existing scholarship from both a humanist and a computational perspective. We argue that the main methodological difference between traditional humanist and AI-enhanced computational studies of Kafka’s literary style lies not in the use of statistics but in the new interpretative possibilities enabled by AI methods to explore stylistic features beyond the scope of human comprehension. In the third and fourth sections of our article, we introduce our own stylometric approach to Kafka, detail our methods, and interpret our findings. Rather than focusing on training an AI model capable of accurately attributing authorship to Kafka, we examine whether AI could help us detect significant stylistic differences between the writing Kafka himself published during his lifetime (Kafka Core) and his posthumous writings edited and published by Max Brod.
(This article belongs to the Special Issue Franz Kafka in the Age of Artificial Intelligence)
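
As a small illustration of distance-based stylometry (not the authors' pipeline), the sketch below computes Burrows' Delta over relative frequencies of the most frequent words for two placeholder text samples; a real comparison of the Kafka Core and Brod-edited corpora would use full texts and a much larger feature set.

```python
# A small stylometric sketch: Burrows' Delta over relative frequencies of the
# most frequent words, a common way to compare two text groups. The toy texts
# below are placeholders, not Kafka's actual writings.
import numpy as np
from collections import Counter

def rel_freqs(text, vocab):
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return np.array([counts[w] / total for w in vocab])

corpus = {
    "core_sample": "der prozess beginnt und niemand weiss warum er beginnt",
    "brod_sample": "das schloss bleibt fern und niemand erreicht es je",
}
vocab = [w for w, _ in Counter(" ".join(corpus.values()).split()).most_common(10)]
profiles = {k: rel_freqs(v, vocab) for k, v in corpus.items()}

# z-score each word's frequency across samples, then Delta = mean |z difference|
mat = np.vstack(list(profiles.values()))
z = (mat - mat.mean(axis=0)) / (mat.std(axis=0) + 1e-9)
delta = np.mean(np.abs(z[0] - z[1]))
print(f"Burrows' Delta between samples: {delta:.3f}")
```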

19 pages, 867 KB  
Article
Exploring the Boundaries Between LLM Code Clone Detection and Code Similarity Assessment on Human and AI-Generated Code
by Zixian Zhang and Takfarinas Saber
Big Data Cogn. Comput. 2025, 9(2), 41; https://doi.org/10.3390/bdcc9020041 - 13 Feb 2025
Cited by 4 | Viewed by 3933
Abstract
As Large Language Models (LLMs) continue to advance, their capabilities in code clone detection have garnered significant attention. While much research has assessed LLM performance on human-generated code, the proliferation of LLM-generated code raises critical questions about their ability to detect clones across both human- and LLM-created codebases, a capability that remains largely unexplored. This paper addresses this gap by evaluating two versions of LLaMA3 on these distinct types of datasets. Additionally, we perform a deeper analysis beyond simple prompting, examining the nuanced relationship between code cloning and code similarity that LLMs infer. We further explore how fine-tuning impacts LLM performance in clone detection, offering new insights into the interplay between code clones and similarity in human versus AI-generated code. Our findings reveal that LLaMA models excel in detecting syntactic clones but face challenges with semantic clones. Notably, the models perform better on LLM-generated datasets for semantic clones, suggesting a potential bias. Fine-tuning enhances the ability of LLMs to comprehend code semantics, improving their performance in both code clone detection and code similarity assessment. Our results offer valuable insights into the effectiveness and characteristics of LLMs in clone detection and code similarity assessment, providing a foundation for future applications and guiding further research in this area.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
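
For orientation, here is a hedged sketch of prompting an instruction-tuned LLaMA-style model to judge whether two snippets are clones, using the Hugging Face transformers API. The checkpoint name (a gated model requiring access), prompt wording, and decoding settings are assumptions and not the paper's evaluation protocol.

```python
# Hedged sketch: prompt-based clone judgment with a causal LM. The checkpoint,
# prompt, and decoding settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"   # hypothetical checkpoint choice (gated)
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16, device_map="auto")

code_a = "def add(a, b):\n    return a + b"
code_b = "def total(x, y):\n    return x + y"
prompt = (
    "Are the following two code snippets clones of each other? "
    "Answer 'yes' or 'no' and give a similarity score from 0 to 1.\n\n"
    f"Snippet A:\n{code_a}\n\nSnippet B:\n{code_b}\n\nAnswer:"
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tok.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```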

23 pages, 622 KB  
Article
MalHAPGNN: An Enhanced Call Graph-Based Malware Detection Framework Using Hierarchical Attention Pooling Graph Neural Network
by Wenjie Guo, Wenbiao Du, Xiuqi Yang, Jingfeng Xue, Yong Wang, Weijie Han and Jingjing Hu
Sensors 2025, 25(2), 374; https://doi.org/10.3390/s25020374 - 10 Jan 2025
Cited by 7 | Viewed by 4167
Abstract
While deep learning techniques have been extensively employed in malware detection, there is a notable challenge in effectively embedding malware features. Current neural network methods primarily capture superficial characteristics, lacking in-depth semantic exploration of functions and failing to preserve structural information at the file level. Motivated by these challenges, this paper introduces MalHAPGNN, a novel framework for malware detection that leverages a hierarchical attention pooling graph neural network based on enhanced call graphs. First, to ensure semantic richness, an attribute-enhanced function embedding method based on Bidirectional Encoder Representations from Transformers (BERT) is proposed for the extraction of node attributes in the function call graph. Subsequently, this work designs a hierarchical graph neural network that integrates attention mechanisms and pooling operations, complemented by function node sampling and structural learning strategies. This framework delivers a comprehensive profile of malicious code across semantic, syntactic, and structural dimensions. Extensive experiments conducted on the Kaggle and VirusShare datasets demonstrate that the proposed framework outperforms other graph neural network (GNN)-based malware detection methods.
(This article belongs to the Special Issue Security of IoT-Enabled Infrastructures in Smart Cities)
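
The sketch below is not the MalHAPGNN architecture; it only illustrates the general pattern of message passing over a function call graph followed by attention pooling into a single file-level vector. Node features, which the paper derives from BERT-based function embeddings, are random placeholders here.

```python
# Bare-bones illustration: one graph-convolution-style step over a call graph,
# then attention pooling to a file-level vector and a benign/malicious head.
import torch
import torch.nn as nn

class CallGraphEncoder(nn.Module):
    def __init__(self, d_in=64, d_hid=64):
        super().__init__()
        self.gc = nn.Linear(d_in, d_hid)        # shared per-node transform
        self.att = nn.Linear(d_hid, 1)          # attention score per function node
        self.cls = nn.Linear(d_hid, 2)          # benign vs. malicious logits

    def forward(self, x, adj):                  # x: (nodes, d_in), adj: (nodes, nodes)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = torch.relu(self.gc(adj @ x / deg))              # mean-aggregate neighbors
        w = torch.softmax(self.att(h), dim=0)               # attention over nodes
        graph_vec = (w * h).sum(dim=0)                      # attention pooling
        return self.cls(graph_vec)

x = torch.randn(6, 64)                          # 6 function nodes with placeholder features
adj = (torch.rand(6, 6) > 0.6).float()
adj = torch.clamp(((adj + adj.T) > 0).float() + torch.eye(6), max=1)   # symmetrize + self-loops
print(CallGraphEncoder()(x, adj))               # logits for the whole call graph
```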

22 pages, 2244 KB  
Article
Mismatch Negativity Unveils Tone Perception Strategies and Degrees of Tone Merging: The Case of Macau Cantonese
by Han Wang, Fei Gao and Jingwei Zhang
Brain Sci. 2024, 14(12), 1271; https://doi.org/10.3390/brainsci14121271 - 17 Dec 2024
Cited by 1 | Viewed by 1577
Abstract
Background/Objectives: Previous studies have examined the role of working memory in cognitive tasks such as syntactic, semantic, and phonological processing, thereby contributing to our understanding of linguistic information management and retrieval. However, the real-time processing of phonological information, particularly in relation to suprasegmental features like tone, whose contour is a time-varying signal, remains a relatively underexplored area within the framework of Information Processing Theory (IPT). This study aimed to address this gap by investigating the real-time processing of similar tonal information by native Cantonese speakers, thereby providing a deeper understanding of how IPT applies to auditory processing. Methods: Specifically, this study combined assessments of cognitive functions, an AX discrimination task, and electroencephalography (EEG) to investigate the discrimination results and real-time processing characteristics of native Macau Cantonese speakers perceiving three pairs of similar tones. Results: The behavioral results confirmed that the T2–T5 merger in Macau Cantonese is complete and that the T3–T6 and T4–T6 mergers are ongoing, with perceptual merging rates of 45.46% and 27.28%, respectively. Mismatch negativity (MMN) results from the passive oddball experiment revealed distinct temporal processing patterns for the three tone pairs. Cognitive functions, particularly attention and working memory, significantly influenced tone discrimination, with more pronounced effects observed in the mean MMN amplitude during T4–T6 discrimination. Differences in MMN peak latency between T3–T6 and T4–T6 further suggested the use of different perceptual strategies for these contour-related tones. Specifically, the T3–T6 pair can be perceived from early signal input, whereas the perception of T4–T6 relies on continuous signal input. Conclusions: This distinction in cognitive resource allocation may explain the different merging rates of the two tone pairs. By focusing on the perceptual difficulty of tone pairs and employing EEG techniques, this study revealed the temporal processing of similar tones by native speakers, providing new insights into tone phoneme processing and speech variation.
(This article belongs to the Collection Collection on Neurobiology of Language)
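
As a reminder of how an MMN is typically quantified, the sketch below computes a deviant-minus-standard difference wave from synthetic oddball epochs; the sampling rate, epoch window, and injected negativity are invented and bear no relation to the study's recordings.

```python
# Illustrative only: the MMN is usually obtained as the deviant-minus-standard
# difference wave in a passive oddball paradigm. Synthetic epochs stand in for
# the EEG data here.
import numpy as np

fs = 500                                    # sampling rate (Hz), assumed
t = np.arange(-0.1, 0.5, 1 / fs)            # epoch from -100 ms to 500 ms
rng = np.random.default_rng(1)

standard = rng.normal(0, 1.0, (120, t.size))             # 120 standard-tone epochs
deviant = rng.normal(0, 1.0, (30, t.size))               # 30 deviant-tone epochs
deviant += -2.0 * np.exp(-((t - 0.18) ** 2) / 0.002)     # injected negativity near 180 ms

mmn = deviant.mean(axis=0) - standard.mean(axis=0)       # difference wave (µV)
peak = t[np.argmin(mmn)]
print(f"MMN peak amplitude {mmn.min():.2f} µV at {peak * 1000:.0f} ms")
```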

23 pages, 621 KB  
Article
An Updated Overview of the Austroasiatic Components of Vietnamese
by Mark Alves
Languages 2024, 9(12), 377; https://doi.org/10.3390/languages9120377 - 17 Dec 2024
Viewed by 4312
Abstract
This article presents an updated view of the language history of Vietnamese from its native Austroasiatic roots, including key historical phonological, morphological, and syntactic features and developments; a characterization of its Austroasiatic etyma; and the context of this information in Vietnamese linguistic ethnohistory. It is now possible to make better-supported claims and more precise characterizations due to improved understanding of the history of Austroasiatic and Vietic and their reconstructions, the nature and effect of language contact with Chinese, and the process of typological convergence of the ancestral language of Vietnamese. This study shows that, while Vietnamese is not a typologically characteristic Austroasiatic language, the Austroasiatic components of the Vietnamese lexicon and linguistic structure are more prominent than previously supposed.
(This article belongs to the Special Issue Current Issues in Vietnamese Linguistics)
21 pages, 3698 KB  
Article
Child-Centric Robot Dialogue Systems: Fine-Tuning Large Language Models for Better Utterance Understanding and Interaction
by Da-Young Kim, Hyo Jeong Lym, Hanna Lee, Ye Jun Lee, Juhyun Kim, Min-Gyu Kim and Yunju Baek
Sensors 2024, 24(24), 7939; https://doi.org/10.3390/s24247939 - 12 Dec 2024
Cited by 1 | Viewed by 2256
Abstract
Dialogue systems must understand children’s utterance intentions by considering their unique linguistic characteristics, such as syntactic incompleteness, pronunciation inaccuracies, and creative expressions, to enable natural conversational engagement in child–robot interactions. Even state-of-the-art large language models (LLMs) for language understanding and contextual awareness cannot comprehend children’s intent as accurately as humans because of these distinctive features. An LLM-based dialogue system should therefore learn how humans interpret children’s speech in order to enhance its intention-reasoning performance in verbal interactions with children. To this end, we propose a fine-tuning methodology that utilizes LLM–human judgment discrepancy data and interactive response data. The former represent cases in which the LLM and human judgments of the contextual appropriateness of a child’s answer to a robot’s question diverge. The latter consist of robot responses suitable for children’s utterance intentions, generated by the LLM. We developed a fine-tuned dialogue system using these datasets to achieve human-like interpretations of children’s utterances and to respond adaptively. Our system was evaluated through human assessment using the Robotic Social Attributes Scale (RoSAS) and Sensibleness and Specificity Average (SSA) metrics. Consequently, it supports the effective interpretation of children’s utterance intentions and enables natural verbal interactions, even in cases of syntactic incompleteness and mispronunciation.
(This article belongs to the Special Issue Challenges in Human-Robot Interactions for Social Robotics)
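
A rough sketch of how the two data sources described above might be assembled into instruction-style fine-tuning records follows; the field names, wording, and JSON layout are assumptions for illustration, not the authors' data format.

```python
# Hedged sketch: turning discrepancy cases and interactive responses into
# instruction-style fine-tuning records. All fields and examples are hypothetical.
import json

discrepancy_cases = [  # LLM vs. human judgments of a child's answer
    {"question": "What do you want to play?", "child_answer": "Dino go vroom!",
     "llm_judgment": "not appropriate", "human_judgment": "appropriate"},
]
interactive_responses = [  # robot responses matched to the child's intent
    {"child_utterance": "Dino go vroom!",
     "robot_response": "A racing dinosaur! Should we make it zoom around the room?"},
]

records = []
for c in discrepancy_cases:
    records.append({
        "instruction": "Judge whether the child's answer fits the robot's question.",
        "input": f"Question: {c['question']}\nChild: {c['child_answer']}",
        "output": c["human_judgment"],          # train toward the human judgment
    })
for r in interactive_responses:
    records.append({
        "instruction": "Reply to the child in a natural, supportive way.",
        "input": r["child_utterance"],
        "output": r["robot_response"],
    })

print(json.dumps(records, indent=2))
```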

22 pages, 1599 KB  
Article
Single-Stage Entity–Relation Joint Extraction of Pesticide Registration Information Based on HT-BES Multi-Dimensional Labeling Strategy
by Chenyang Dong, Shiyu Xi, Yinchao Che, Shufeng Xiong, Xinming Ma, Lei Xi and Shuping Xiong
Algorithms 2024, 17(12), 559; https://doi.org/10.3390/a17120559 - 6 Dec 2024
Viewed by 896
Abstract
Pesticide registration information is an essential part of the pesticide knowledge base. However, the large amount of unstructured text data that it contains poses significant challenges for knowledge storage, retrieval, and utilization. To address the characteristics of pesticide registration text, such as high information density, complex logical structures, large spans between entities, and heterogeneous entity lengths, and to overcome the challenges faced by traditional joint extraction methods, including triplet overlap, exposure bias, and redundant computation, we propose a single-stage entity–relation joint extraction model based on HT-BES multi-dimensional labeling (MD-SERel). First, in the encoding layer, to address the complex structural characteristics of pesticide registration texts, we employ RoBERTa combined with a multi-head self-attention mechanism to capture the deep semantic features of the text. Simultaneously, syntactic features are extracted using a syntactic dependency tree and graph neural networks to enhance the model’s understanding of text structure. We then integrate the semantic and syntactic features, enriching the character vector representations and thus improving the model’s ability to represent complex textual data. Second, in the multi-dimensional labeling framework layer, we use HT-BES multi-dimensional labeling, in which the model assigns multiple labels to each character. These labels encode entity boundaries, positions, and head–tail entity association information, which naturally resolves overlapping triplets. By using a parallel scoring function and fine-grained classification components, the joint extraction of entities and relations is transformed into a multi-label sequence labeling task organized by relation dimension. This process involves no interdependent steps, enabling single-stage parallel labeling, preventing exposure bias, and reducing computational redundancy. Finally, in the decoding layer, entity–relation triplets are decoded from the labels predicted by the fine-grained classification. The experimental results demonstrate that the MD-SERel model performs well on both the Pesticide Registration Dataset (PRD) and the general DuIE dataset. On the PRD, compared to the best baseline model, training is 1.2 times faster, inference is 1.2 times faster, and the F1 score is improved by 1.5%, demonstrating its knowledge extraction capabilities for pesticide registration documents. On the DuIE dataset, the MD-SERel model also achieved better results than the baseline, demonstrating its strong generalization ability. These findings will provide technical support for the construction of pesticide knowledge bases.
(This article belongs to the Special Issue Algorithms for Feature Selection (3rd Edition))
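
To visualize the single-stage labeling idea, the schematic below scores every character against every (relation, label) pair in one parallel pass; the dimensions, threshold, and placeholder encoder output are assumptions and do not reproduce the MD-SERel model.

```python
# Schematic only: per-character multi-label scoring across relation dimensions
# in a single parallel pass, so spans and head-tail associations could be
# decoded without sequential steps. Encoder output is a random placeholder.
import torch
import torch.nn as nn

class MultiDimTaggingHead(nn.Module):
    def __init__(self, d_model=256, n_relations=8, n_labels=5):
        super().__init__()
        self.n_relations, self.n_labels = n_relations, n_labels
        self.score = nn.Linear(d_model, n_relations * n_labels)

    def forward(self, char_repr):                     # char_repr: (batch, seq, d_model)
        logits = self.score(char_repr)                # parallel scoring, single stage
        b, s, _ = logits.shape
        return logits.view(b, s, self.n_relations, self.n_labels)

char_repr = torch.randn(2, 50, 256)                   # stand-in for RoBERTa+GNN features
labels = MultiDimTaggingHead()(char_repr).sigmoid() > 0.5   # multi-label decisions
print(labels.shape)  # torch.Size([2, 50, 8, 5])
```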

16 pages, 434 KB  
Article
Paternal and Maternal Speech at 3 Months Postpartum: An Exploratory Study on the Effect of Parental Role and Birth Weight
by Erica Neri, Alessandra Provera and Francesca Agostini
Behav. Sci. 2024, 14(11), 1007; https://doi.org/10.3390/bs14111007 - 30 Oct 2024
Viewed by 1186
Abstract
Recent research highlights a growing interest in early interactions between fathers and their infants, acknowledging the significant influence these interactions have on developmental outcomes. However, there is a limited understanding of the specific characteristics of paternal infant-directed speech (IDS), especially in the context of premature birth. This study aimed to analyze the functional and morpho-syntactic features of paternal IDS to full-term (FT) and preterm (PT) infants at 3 months, comparing it with the maternal communicative style. Additionally, the study explored the influence of the severity of preterm birth according to birth weight, distinguishing between extremely low-birth-weight (ELBW) and very low-birth-weight (VLBW) infants. Seventy-one father–infant and mother–infant dyads (24 FT, 22 ELBW, 25 VLBW) were recruited at 3 months (corrected age for PT infants). Parent–infant interactions were video recorded to assess lexical, syntactic, and functional aspects of paternal and maternal speech. Results revealed lower verbosity and lexical variability in paternal IDS compared to maternal IDS. No differences were found between parents of the PT and FT groups. Overall, these findings contribute to a better understanding of the patterns of parent–infant communication in both FT and PT dyads, confirming the importance of involving both mothers and fathers from the early stages of development.
(This article belongs to the Section Developmental Psychology)
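
As a toy illustration of two of the speech measures mentioned above, the snippet below computes verbosity (token count) and lexical variability (type-token ratio) from transcribed utterances; the example utterances are invented.

```python
# Toy illustration: verbosity (token count) and lexical variability
# (type-token ratio) from transcribed infant-directed speech. Utterances are invented.
def verbosity_and_ttr(utterances):
    tokens = [w.strip(".,!?").lower() for u in utterances for w in u.split()]
    ttr = round(len(set(tokens)) / len(tokens), 2) if tokens else 0.0
    return len(tokens), ttr

father = ["Look at you!", "Yes, look at you smiling."]
mother = ["Who is smiling so much today?", "Are you telling me a story, sweetheart?"]
print("father:", verbosity_and_ttr(father))   # (token count, type-token ratio)
print("mother:", verbosity_and_ttr(mother))
```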