Search Results (16)

Search Parameters:
Keywords = proper nouns

13 pages, 3483 KiB  
Article
Classification of English Words into Grammatical Notations Using Deep Learning Technique
by Muhammad Imran, Sajjad Hussain Qureshi, Abrar Hussain Qureshi and Norah Almusharraf
Information 2024, 15(12), 801; https://doi.org/10.3390/info15120801 - 11 Dec 2024
Cited by 7 | Viewed by 1395
Abstract
The impact of artificial intelligence (AI) on English language learning has become the center of attention in the past few decades. This study, with its potential to transform English language instruction and offer various instructional approaches, provides valuable insights and knowledge. To fully grasp the potential advantages of AI, more research is needed to improve, validate, and test AI algorithms and architectures. Grammatical notations provide a word’s information to the readers. If a word’s images are properly extracted and categorized using a CNN, it can help non-native English speakers improve their learning habits. The classification of parts of speech into different grammatical notations is the major problem that non-native English learners face. This situation stresses the need to develop a computer-based system using a machine learning algorithm to classify words into proper grammatical notations. A convolutional neural network (CNN) was applied to classify English words into nine classes: noun, pronoun, adjective, determiner, verb, adverb, preposition, conjunction, and interjection. A simulation of the selected model was performed in MATLAB. The model achieved an overall accuracy of 97.22%. The CNN showed 100% accuracy for pronouns, determiners, verbs, adverbs, and prepositions; 95% for nouns, adjectives, and conjunctions; and 90% for interjections. The significant results (p < 0.0001) of the chi-square test supported the use of the CNN by non-native English learners. The proposed approach is an important source of word classification for non-native English learners by putting the word image into the model. This not only helps beginners in English learning but also helps in setting standards for evaluating documents.
(This article belongs to the Special Issue Applications of Machine Learning and Convolutional Neural Networks)
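The overall accuracy quoted in this abstract is consistent with an unweighted mean of the nine per-class accuracies. The short Python check below is only a sketch: it assumes every class contributes equally, which the abstract does not state, and the variable names are illustrative.

```python
# Per-class accuracies (%) as quoted in the abstract.
per_class_accuracy = {
    "noun": 95, "pronoun": 100, "adjective": 95, "determiner": 100,
    "verb": 100, "adverb": 100, "preposition": 100,
    "conjunction": 95, "interjection": 90,
}

# Assumption: classes are equally sized, so overall accuracy is the plain mean.
overall = sum(per_class_accuracy.values()) / len(per_class_accuracy)
print(f"{overall:.2f}%")  # 97.22%
```

If the test set were unbalanced, the overall figure would instead be a weighted mean and need not match.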

25 pages, 6212 KiB  
Article
Qualitative Analysis of Responses in Estimating Older Adults Cognitive Functioning in Spontaneous Speech: Comparison of Questions Asked by AI Agents and Humans
by Toshiharu Igarashi, Katsuya Iijima, Kunio Nitta and Yu Chen
Healthcare 2024, 12(21), 2112; https://doi.org/10.3390/healthcare12212112 - 23 Oct 2024
Viewed by 1496
Abstract
Background/Objectives: Artificial Intelligence (AI) technology is gaining attention for its potential in cognitive function assessment and intervention. AI robots and agents can offer continuous dialogue with the elderly, helping to prevent social isolation and support cognitive health. Speech-based evaluation methods are promising as they reduce the burden on elderly participants. AI agents could replace human questioners, offering efficient and consistent assessments. However, existing research lacks sufficient comparisons of elderly speech content when interacting with AI versus human partners, and detailed analyses of factors like cognitive function levels and dialogue partner effects on speech elements such as proper nouns and fillers. Methods: This study investigates how elderly individuals’ cognitive functions influence their communication patterns with both human and AI conversational partners. A total of 34 older people (12 men and 22 women) living in the community were selected from a silver human resource centre and day service centre in Tokyo. Cognitive function was assessed using the Mini-Mental State Examination (MMSE), and participants engaged in semi-structured daily conversations with both human and AI partners. Results: The study examined the frequency of fillers, proper nouns, and “listen back” in conversations with AI and humans. Results showed that participants used more fillers in human conversations, especially those with lower cognitive function. In contrast, proper nouns were used more in AI conversations, particularly by those with higher cognitive function. Participants also asked for explanations more often in AI conversations, especially those with lower cognitive function. These findings highlight differences in conversation patterns based on cognitive function and the conversation partner being either AI or human. Conclusions: These results suggest that there are differences in conversation patterns depending on the cognitive function of the participants and whether the conversation partner is a human or an AI. This study aims to provide new insights into the effective use of AI agents in dialogue with the elderly, contributing to the improvement of elderly welfare.

16 pages, 2121 KiB  
Article
Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu
by Fida Ullah, Alexander Gelbukh, Muhammad Tayyab Zamir, Edgardo Manuel Felipe Riverón and Grigori Sidorov
Computers 2024, 13(10), 258; https://doi.org/10.3390/computers13100258 - 10 Oct 2024
Cited by 2 | Viewed by 2735
Abstract
Identifying and categorizing proper nouns in text, known as named entity recognition (NER), is crucial for various natural language processing tasks. However, developing effective NER techniques for low-resource languages like Urdu poses challenges due to limited training data, particularly in the nastaliq script. To address this, our study introduces a novel data augmentation method, “contextual word embeddings augmentation” (CWEA), for Urdu, aiming to enrich existing datasets. The extended dataset, comprising 160,132 tokens and 114,912 labeled entities, significantly enhances the coverage of named entities compared to previous datasets. We evaluated several transformer models on this augmented dataset, including BERT-multilingual, RoBERTa-Urdu-small, BERT-base-cased, and BERT-large-cased. Notably, the BERT-multilingual model outperformed others, achieving the highest macro F1 score of 0.982%. This surpassed the macro F1 scores of the RoBERTa-Urdu-small (0.884%), BERT-large-cased (0.916%), and BERT-base-cased (0.908%) models. Additionally, our neural network model achieved a micro F1 score of 96%, while the RNN model achieved 97% and the BiLSTM model achieved a macro F1 score of 96% on augmented data. Our findings underscore the efficacy of data augmentation techniques in enhancing NER performance for low-resource languages like Urdu.
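The abstract reports both macro and micro F1 scores, which can diverge when entity types are unevenly represented. The Python sketch below illustrates the difference under stated assumptions: the per-class counts are hypothetical and not taken from the paper, and the helper names (`f1`, `macro_f1`, `micro_f1`) are illustrative rather than the authors' evaluation code.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Unweighted mean of per-class F1: rare classes weigh as much as common ones."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts: Dict[str, Counts]) -> float:
    """F1 over pooled counts: dominated by the most frequent classes."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical counts for three entity types (illustration only).
toy = {"PERSON": (90, 10, 10), "LOCATION": (40, 5, 15), "ORGANIZATION": (5, 5, 10)}
print(round(macro_f1(toy), 3), round(micro_f1(toy), 3))  # 0.7 0.831
```

In this toy case the poorly recognized rare class pulls the macro score well below the micro score, which is why the two figures are not directly comparable across models.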

24 pages, 3254 KiB  
Article
Construction and Inference Method of Semantic-Driven, Spatio-Temporal Derivation Relationship Network for Place Names
by Wenjie Dong, Xi Mao, Wenjuan Lu, Jizhou Wang and Yao Cheng
ISPRS Int. J. Geo-Inf. 2024, 13(9), 327; https://doi.org/10.3390/ijgi13090327 - 13 Sep 2024
Cited by 2 | Viewed by 1125
Abstract
As the proper noun for geographical entities, place names provide an intuitive way to identify and access specific geographic locations, playing a key role in semantic expression and spatial retrieval. However, existing research has insufficiently explored the spatio-temporal derivation relationships of place names, failing to fully utilize these relationships to enhance the connectivity between place names and improve spatial retrieval capabilities. Therefore, this paper conducts research on the spatio-temporal derivation relationships of place names, defines them in a standardized manner, clarifies the boundary conditions and identification methods, and then constructs a spatio-temporal derivation network of place names for expression and uses this network to carry out reasoning research on spatial adjacency relationships. Experiments and results showed that using the theory and methods of this paper to identify the spatio-temporal derivation relationships of Canadian place names achieves an accuracy rate of 98.5% and a recall rate of 93.4%, and the reasoning results can effectively improve the accuracy of query results. The research enriches the theoretical framework of spatio-temporal derivation relationships of place names, solves the current problems of unclear definition and inability to automatically identify spatio-temporal derivation relationships, and provides new perspectives and tools for the application practice in the field of geographical information science.

44 pages, 998 KiB  
Article
Semantic and Morphosyntactic Differences among Nouns: A Template-Based and Modular Cognitive Model
by Mohamed El Idrissi
Mathematics 2024, 12(12), 1777; https://doi.org/10.3390/math12121777 - 7 Jun 2024
Viewed by 1675
Abstract
The noun category exhibits diverse dissimilarities, characterised at the semantic (e.g., countable/uncountable) or/and morphosyntactic (e.g., determined/determinerless) level, which may be more or less important depending on languages. In this paper, we would like to discuss those differences, which we named inter-word and inter-process morphosyntactic variations. The Riffian language served us as a reference in our enquiries, before referring to other languages to show how our discoveries could be applied to them. By putting in perspective those aspects, this led us to propose a formal mathematical model denoted as a Template-Based and Modular Cognitive model. The latter is able to predict the nonlinear dynamic mapping of lexical items onto morphological templates. The aims of this article are thus manifold and cover theoretical issues. We demonstrate that nouns are organised and distributed in modular cognitive sets, having their own morphological template and unmarked forms. The extent of these sets and their number as well as the template, are specific to each language. All sorts of markers can compose with the template, but some, namely countability markers, are prevalent among several languages with no relationship. This approach allows us to explain the marking discrepancies existing between different kinds of nouns (borrowed, proper, countable and uncountable nouns) for a given linguistic variety or between languages. The main assumption of this model is that these irregular markings are caused by a template shift, occurring when items undergo a process of word and meaning formation. Our contribution represents an initial stride toward understanding the fundamental patterns of morphosyntax and opens venues for applying this mathematical model with other behavioural and natural phenomena.

18 pages, 2545 KiB  
Article
Toward Non-Taxonomic Structuring of Scientific Notions: The Case of the Language of Chemistry and the Environment
by Tomara Gotkova, Francesca Ingrosso, Polina Mikhel and Alain Polguère
Languages 2024, 9(3), 95; https://doi.org/10.3390/languages9030095 - 13 Mar 2024
Cited by 1 | Viewed by 1698
Abstract
This paper addresses the crucial question of the structuring of scientific Notions for the purpose of their proper teaching/acquisition. It aims to demonstrate that non-taxonomic structures, derived from the systematic lexicographic definition of terminological lexical units, can be rigorously constructed and are adequate for implementing a non-isolationist approach to terminology modeling: one that embeds the description of terminological units within a more global model of the general lexicon. Using theoretical and descriptive principles of Explanatory Combinatorial Lexicology and the lexicography of lexical networks known as Lexical Systems, we apply our approach to the core terminology of chemistry and chemistry-related environmental terminology. This allows us to propose Notion building road maps for three languages (English, French and Russian) that can be used as guides for the teaching/acquisition of chemistry Notions. Additionally, exploiting the special case of the noun carbon, which pertains to chemistry, environmental science and, even, general language, we demonstrate the potential of our non-isolationist approach for interfacing distinct sectors of terminological knowledge.
(This article belongs to the Special Issue Terminology in the Digital World)

34 pages, 1677 KiB  
Article
Existential Constructions, Definiteness Effects, and Linguistic Contact: At the Crossroads between Spanish and Catalan
by Jorge Agulló
Languages 2024, 9(1), 11; https://doi.org/10.3390/languages9010011 - 23 Dec 2023
Cited by 1 | Viewed by 3682
Abstract
Existential sentences in Spanish are sensitive to the definiteness or quantification restriction or effect, which prevents personal pronouns, proper nouns, and definite constituents from occupying the pivot position. Contact varieties between Spanish, a robust language as regards the effect, and Catalan, which has a weaker version, remain largely unexplored. This paper fills this void. A large corpus was gathered to quantitatively study the variation between definite and indefinite pivots. Examples involving definite, specific pivots and even proper names, hitherto unnoticed, are brought to the fore. The pivot of the existential in Spanish is argued to bear Partitive case, as shown by (i) pronominal existential pivots in other Romance languages, (ii) the phi-feature defectiveness of the clitic out of the pivot position, (iii) and partitive pronouns with unaccusatives in Spanish. The hypothesis is put forth that varieties of Spanish in contact with Catalan no longer relate Partitive case to the non-definiteness of the pivot.
(This article belongs to the Special Issue New Approaches to Spanish Dialectal Grammar)

12 pages, 1353 KiB  
Article
From Name to Myth (Based on Russian Cultural and Literary Tradition)
by Olesia D. Surikova and Elena L. Berezovich
Religions 2023, 14(11), 1412; https://doi.org/10.3390/rel14111412 - 10 Nov 2023
Viewed by 1780
Abstract
This paper analyzes the cases wherein a previously unknown and unique mythological character (with his/her specific behavior, “personal” traits, appearance, origin, etc.) is generated by a cultural linguistic sign or a fragment of text. This research is based on the Russian cultural and linguistic tradition, mainly in its dialectal version (the language of Russian peasants). Its sources include data published in the late 19th–early 21st century in dictionaries of Russian dialects and, primarily, the unpublished field materials of the Ural Federal University Toponymic Expedition, covering data from the Russian North, the Urals, and the Volga region. According to their nature or origin, the names of characters studied in this paper derive from two types of linguistic signs: (1) Names based on usual forms of standard vocabulary that can be both proper and common nouns; the former may refer to various categories, such as toponyms (names of geographical objects), chrononyms (names of calendar dates), hagionyms (names of saints), names of icons, etc. (2) Names originating from a text, usually folkloric; these are word combinations or phrases that only act as a single unit within their “parent” text. Sometimes, but less often, these consist of one word that is of key importance in the source text. Such a phrase or word can migrate outside the “parent” text or genre, expanding their lexical combinability and changing their syntactic regime to become a name of a mythological character. It takes two sources of motivation for a new character to emerge: a linguistic one (a word that seeks a new context) and a cultural one (a semiotically intense context, such as a situation associated with danger, prohibition, omens, aggression, or magical practices). The combination of these incentives is not uncommon, so the stock of mythology used for names is being constantly renewed.
(This article belongs to the Special Issue Slavic Paganism(s): Past and Present)
16 pages, 2374 KiB  
Article
Text Matching in Insurance Question-Answering Community Based on an Integrated BiLSTM-TextCNN Model Fusing Multi-Feature
by Zhaohui Li, Xueru Yang, Luli Zhou, Hongyu Jia and Wenli Li
Entropy 2023, 25(4), 639; https://doi.org/10.3390/e25040639 - 10 Apr 2023
Cited by 6 | Viewed by 2708
Abstract
Along with the explosion of ChatGPT, the artificial intelligence question-answering system has been pushed to a climax. Intelligent question-answering enables computers to simulate people’s behavior habits of understanding a corpus through machine learning, so as to answer questions in professional fields. How to obtain more accurate answers to personalized questions in professional fields is the core content of intelligent question-answering research. As one of the key technologies of intelligent question-answering, the accuracy of text matching is related to the development of the intelligent question-answering community. Aiming to solve the problem of polysemy of text, the Enhanced Representation through Knowledge Integration (ERNIE) model is used to obtain the word vector representation of text, which makes up for the lack of prior knowledge in the traditional word vector representation model. Additionally, there are also problems of homophones and polyphones in Chinese, so this paper introduces the phonetic character sequence of the text to distinguish them. In addition, aiming at the problem that there are many proper nouns in the insurance field that are difficult to identify, after conventional part-of-speech tagging, proper nouns are distinguished by especially defining their parts of speech. After the above three types of text-based semantic feature extensions, this paper also uses the Bi-directional Long Short-Term Memory (BiLSTM) and TextCNN models to extract the global features and local features of the text, respectively. It can obtain the feature representation of the text more comprehensively. Thus, the text matching model integrating BiLSTM and TextCNN fusing Multi-Feature (namely MFBT) is proposed for the insurance question-answering community. The MFBT model aims to solve the problems that affect the answer selection in the insurance question-answering community, such as proper nouns, nonstandard sentences and sparse features. Taking the question-and-answer data of the insurance library as the sample, the MFBT text-matching model is compared and evaluated with other models. The experimental results show that the MFBT text-matching model has higher evaluation index values, including accuracy, recall and F1, than other models. The model trained by historical search data can better help users in the insurance question-and-answer community obtain the answers they need and improve their satisfaction.
(This article belongs to the Section Multidisciplinary Applications)

12 pages, 608 KiB  
Article
Adapting Off-the-Shelf Speech Recognition Systems for Novel Words
by Wiam Fadel, Toumi Bouchentouf, Pierre-André Buvet and Omar Bourja
Information 2023, 14(3), 179; https://doi.org/10.3390/info14030179 - 13 Mar 2023
Cited by 2 | Viewed by 3485
Abstract
Current speech recognition systems with fixed vocabularies have difficulties recognizing Out-of-Vocabulary words (OOVs) such as proper nouns and new words. This leads to misunderstandings or even failures in dialog systems. Ensuring effective speech recognition is crucial for the proper functioning of robot assistants. Non-native accents, new vocabulary, and aging voices can cause malfunctions in a speech recognition system. If this task is not executed correctly, the assistant robot will inevitably produce false or random responses. In this paper, we used a statistical approach based on distance algorithms to improve OOV correction. We developed a post-processing algorithm to be combined with a speech recognition model. In this sense, we compared two distance algorithms: Damerau–Levenshtein and Levenshtein distance. We validated the performance of the two distance algorithms in conjunction with five off-the-shelf speech recognition models. Damerau–Levenshtein, as compared to the Levenshtein distance algorithm, succeeded in minimizing the Word Error Rate (WER) when using the MoroccanFrench test set with five speech recognition systems, namely VOSK API, Google API, Wav2vec2.0, SpeechBrain, and Quartznet pre-trained models. Our post-processing method works regardless of the architecture of the speech recognizer, and its results on our MoroccanFrench test set outperformed the five chosen off-the-shelf speech recognizer systems.
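The two distance measures compared in this abstract differ in one operation: Damerau–Levenshtein also counts a swap of two adjacent characters as a single edit. The Python sketch below shows that difference and a simple nearest-word correction step. It is an illustration under assumptions, not the authors' post-processing algorithm: it uses the restricted (optimal string alignment) variant of Damerau–Levenshtein, and the `correct` helper, its vocabulary, and its `max_distance` threshold are hypothetical.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance with insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted (optimal string alignment) variant: adjacent swaps cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def correct(token: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Hypothetical helper: swap in the closest vocabulary word if it is near enough."""
    best = min(vocabulary, key=lambda w: damerau_levenshtein(token, w))
    return best if damerau_levenshtein(token, best) <= max_distance else token


print(levenshtein("casalbanca", "casablanca"))          # 2
print(damerau_levenshtein("casalbanca", "casablanca"))  # 1
print(correct("casalbanca", ["casablanca", "rabat"]))   # casablanca
```

Because a transposed pair costs one edit instead of two, a transposition-aware distance can rank the intended word closer than plain Levenshtein does, which is one plausible reason for the lower WER reported.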

12 pages, 1366 KiB  
Article
The Processing Differences between Chinese Proper Nouns and Common Nouns in the Left and Right Hemispheres of the Brain
by Zijia Lu and Xuejun Bai
Brain Sci. 2023, 13(3), 424; https://doi.org/10.3390/brainsci13030424 - 28 Feb 2023
Cited by 4 | Viewed by 9404
Abstract
In this study, we investigated whether there were differences between the processing of Chinese proper nouns and common nouns in the left and that in the right hemispheres of the brain by using a visual half-field technique. The experimental materials included four types of proper nouns (people’s names, landmark names, country names, and brand names), four types of common nouns (animals, fruits and vegetables, tools, and abstract nouns), and pseudowords. Participants were asked to judge whether target words that had been quickly presented in their left or right visual field were meaningful words. The results showed that there was a distinction between the processing of the two types of words in the left and right hemispheres. There was no significant difference in the processing of the two types of nouns in the right hemisphere, but the left hemisphere processed common nouns more effectively than proper nouns. Furthermore, the processing difference of proper nouns between the two hemispheres was less than that of common nouns, suggesting that proper nouns have a smaller lateralization effect than common nouns.
(This article belongs to the Section Neurolinguistics)

27 pages, 1752 KiB  
Article
N-to-D Movement within Compounds and Phrases: Referential Compounding, -s Possessives, and Title Expressions in Dutch
by Marijke De Belder
Languages 2022, 7(4), 304; https://doi.org/10.3390/languages7040304 - 29 Nov 2022
Viewed by 2262
Abstract
Noun–noun concatenations can differ along two parameters. They can be compounds, i.e., single words, or constructs, i.e., constituents, and they can have modificational non-heads or referential non-heads. Of the four logical possibilities, one was argued not to exist: compounds of which the non-head is referential were considered to be principally excluded. In this article, I argue that Dutch has compounds with a referential non-head. They resemble the Dutch s-possessive in that their non-heads involve movement to a referential layer. However, unlike the possessive structures, the compounding structure contains head incorporation which results in word-hood. The article further discusses title expressions, such as Prince Charles, which are argued to be referential construct states. Together with the syntactic structure of titles plus proper names, the referential compounds further contribute evidence to the idea that a ban on N-to-D movement for certain uniquely referring roots, such as sun and Bronx, is extra-syntactic.
24 pages, 786 KiB  
Article
Experiences on the Improvement of Logic-Based Anaphora Resolution in English Texts
by Stefano Ferilli and Domenico Redavid
Electronics 2022, 11(3), 372; https://doi.org/10.3390/electronics11030372 - 26 Jan 2022
Cited by 4 | Viewed by 3508
Abstract
Anaphora resolution is a crucial task for information extraction. Syntax-based approaches are based on the syntactic structure of sentences. Knowledge-poor approaches aim at avoiding the need for further external resources or knowledge to carry out their task. This paper proposes a knowledge-poor, syntax-based approach to anaphora resolution in English texts. Our approach improves the traditional algorithm that is considered the standard baseline for comparison in the literature. Its most relevant contributions are in its ability to handle differently different kinds of anaphoras, and to disambiguate alternate associations using gender recognition of proper nouns. The former is obtained by refining the rules in the baseline algorithm, while the latter is obtained using a machine learning approach. Experimental results on a standard benchmark dataset used in the literature show that our approach can significantly improve the performance over the standard baseline algorithm used in the literature, and compares well also to the state-of-the-art algorithm that thoroughly exploits external knowledge. It is also efficient. Thus, we propose to use our algorithm as the new baseline in the literature.
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

11 pages, 891 KiB  
Article
Text Classification Model Enhanced by Unlabeled Data for LaTeX Formula
by Hua Cheng, Renjie Yu, Yixin Tang, Yiquan Fang and Tao Cheng
Appl. Sci. 2021, 11(22), 10536; https://doi.org/10.3390/app112210536 - 9 Nov 2021
Cited by 5 | Viewed by 3178
Abstract
Generic language models pretrained on large unspecific domains are currently the foundation of NLP. Labeled data are limited in most model training due to the cost of manual annotation, especially in domains including massive Proper Nouns such as mathematics and biology, where it affects the accuracy and robustness of model prediction. However, directly applying a generic language model on a specific domain does not work well. This paper introduces a BERT-based text classification model enhanced by unlabeled data (UL-BERT) in the LaTeX formula domain. A two-stage pretraining model based on BERT (TP-BERT) is pretrained by unlabeled data in the LaTeX formula domain. A double-prediction pseudo-labeling (DPP) method is introduced to obtain high confidence pseudo-labels for unlabeled data by self-training. Moreover, a multi-round teacher–student model training approach is proposed for UL-BERT model training with few labeled data and more unlabeled data with pseudo-labels. Experiments on the classification of the LaTeX formula domain show that the classification accuracies have been significantly improved by UL-BERT where the F1 score has been mostly enhanced by 2.76%, and lower resources are needed in model training. It is concluded that our method may be applicable to other specific domains with enormous unlabeled data and limited labeled data.

45 pages, 323 KiB  
Article
Compensating for Language Deficits in Amnesia II: H.M.’s Spared versus Impaired Encoding Categories
by Donald G. MacKay, Laura W. Johnson and Chris Hadley
Brain Sci. 2013, 3(2), 415-459; https://doi.org/10.3390/brainsci3020415 - 27 Mar 2013
Cited by 8 | Viewed by 6706
Abstract
Although amnesic H.M. typically could not recall where or when he met someone, he could recall their topics of conversation after long interference-filled delays, suggesting impaired encoding for some categories of novel events but not others. Similarly, H.M. successfully encoded into internal representations (sentence plans) some novel linguistic structures but not others in the present language production studies. For example, on the Test of Language Competence (TLC), H.M. produced uncorrected errors when encoding a wide range of novel linguistic structures, e.g., violating reliably more gender constraints than memory-normal controls when encoding referent-noun, pronoun-antecedent, and referent-pronoun anaphora, as when he erroneously and without correction used the gender-inappropriate pronoun “her” to refer to a man. In contrast, H.M. never violated corresponding referent-gender constraints for proper names, suggesting that his mechanisms for encoding proper name gender-agreement were intact. However, H.M. produced no more dysfluencies, off-topic comments, false starts, neologisms, or word and phonological sequencing errors than controls on the TLC. Present results suggest that: (a) frontal mechanisms for retrieving and sequencing word, phrase, and phonological categories are intact in H.M., unlike in category-specific aphasia; (b) encoding mechanisms in the hippocampal region are category-specific rather than item-specific, applying to, e.g., proper names rather than words; (c) H.M.’s category-specific mechanisms for encoding referents into words, phrases, and propositions are impaired, with the exception of referent gender, person, and number for encoding proper names; and (d) H.M. overuses his intact proper name encoding mechanisms to compensate for his impaired mechanisms for encoding other functionally equivalent linguistic information.