Search Results (35)

Search Parameters:
Keywords = recursive word

26 pages, 389 KB  
Article
Weak Monotone Fixed Points for Positive–Negative Guarded Language Systems in a Length-Based Ultrametric Space
by Laura Ajeti, Hristo Hristov, Atanas Ilchev and Boyan Zlatanov
Axioms 2026, 15(6), 440; https://doi.org/10.3390/axioms15060440 - 13 Jun 2026
Viewed by 128
Abstract
We study positive–negative guarded systems of language equations over a fixed finite alphabet. The ambient space is the complete ultrametric space of all formal languages equipped with a length-based distance, where two languages are close whenever they agree on all words up to a sufficiently large length. The systems considered here contain both positive recursive dependencies and negative dependencies expressed through language complements. To handle this mixed structure, we introduce a suitable product order on pairs of languages and prove that the associated system operator has the weak monotone property. We show that the complement is an isometry for the length-based ultrametric and establish a signed wrapping estimate for guarded positive and negative language terms. These estimates lead to an ordered contraction principle for comparable pairs. As a consequence, the canonical lower and upper Picard iterations converge to the same limit, which is the unique fixed pair of the system. We also derive an explicit convergence rate and a finite-depth certification result: after a prescribed number of iterations, the approximants agree with the fixed-point semantics on all words below a given length. Additional symmetry assumptions are shown to force the unique fixed pair to be diagonal, reducing the system to a single language equation. Finally, we discuss an application to trace-based policies for tool-using AI agents. In this interpretation, finite executions of an agent are represented as words over an alphabet of observable tool-events, and the two components of the fixed point provide a stable semantics for policy-defined admissible and risky trace classes. The resulting framework gives a mathematically certified method for finite-depth analysis of recursive trace-based policies based on ultrametric fixed-point techniques. Full article
(This article belongs to the Special Issue Theory and Applications in Functional Analysis)
18 pages, 335 KB  
Article
Guarded Language Operators as Contractions in a Length-Based Ultrametric Space
by Hristo Hristov, Atanas Ilchev, Hristina Kulina and Boyan Zlatanov
Mathematics 2026, 14(10), 1644; https://doi.org/10.3390/math14101644 - 12 May 2026
Viewed by 219
Abstract
We study a class of wrapping operators acting on the space of formal languages over a fixed finite alphabet. The underlying space is equipped with a length-based ultrametric, in which two languages are close whenever they coincide on all sufficiently short words. We prove that every wrapping operator generated by a finite family of guards with positive total guard length is a contraction. As a consequence, Banach’s contraction principle yields existence and uniqueness of a fixed point for the corresponding recursive language equation, together with convergence of the Picard iteration from an arbitrary initial language. We also obtain an explicit quantitative estimate for the rate of convergence. This makes it possible to determine how many iterations are sufficient to recover the fixed point correctly on all words up to a prescribed length. Several examples illustrate the theory, including operators with different guard lengths and a case showing that convergence in the length-based ultrametric does not coincide with set-theoretic convergence. An application to recursive structures and document validation is also presented, including recursive data formats, abstract syntax trees, and a restricted fragment of JSON schemas. The results provide a formal foundation for validation together with explicit bounds for correctness on inputs of bounded length. Full article
(This article belongs to the Section C: Mathematical Analysis)
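The Picard iteration described in the abstract above can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the article: the operator F(X) = {ε} ∪ a·X·b, the cut-off `MAX_LEN`, the function names, and the convention that the distance between two languages is 2^(-n), where n is the length of the shortest word on which they differ.

```python
MAX_LEN = 8  # assumed cut-off: only words up to this length are tracked


def wrap(lang):
    """Hypothetical guarded operator F(X) = {""} | a.X.b, truncated to MAX_LEN."""
    out = {""}
    for w in lang:
        if len(w) + 2 <= MAX_LEN:  # guards "a" and "b" add total length 2
            out.add("a" + w + "b")
    return frozenset(out)


def distance(x, y):
    """Assumed length-based ultrametric: 2**-n, n = length of the shortest
    word in the symmetric difference (0.0 if the tracked words agree)."""
    diff = frozenset(x) ^ frozenset(y)
    if not diff:
        return 0.0
    return 2.0 ** -min(len(w) for w in diff)


def picard(start, steps):
    """Return the list [X0, F(X0), F(F(X0)), ...] with steps applications of F."""
    lang = frozenset(start)
    history = [lang]
    for _ in range(steps):
        lang = wrap(lang)
        history.append(lang)
    return history


if __name__ == "__main__":
    for name, start in [("empty", set()), ("{'b'}", {"b"})]:
        history = picard(start, 6)
        gaps = [distance(a, b) for a, b in zip(history, history[1:])]
        print(name, sorted(history[-1], key=len), gaps)
```

In this toy setting the successive distances shrink by a factor of 2^(-2) per step, matching the total guard length of 2, and both starting languages reach the same truncated fixed point, the words aⁿbⁿ of length at most `MAX_LEN`.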
20 pages, 3073 KB  
Article
Polygon Dissections via Lucas-Inspired Encoding
by Aybeyan Selim, Muzafer Saracevic and Omer Aydin
Mathematics 2026, 14(10), 1631; https://doi.org/10.3390/math14101631 - 11 May 2026
Viewed by 920
Abstract
Classical enumeration of triangulations and angulations of convex polygons is governed by the Catalan and Fuss–Catalan families. In this paper, we introduce a Lucas-inspired symbolic encoding framework for a restricted subclass of triangulations, called Lucas-compatible triangulations. The purpose of the framework is not to replace classical Catalan enumeration, but to provide a complementary structural layer that records admissible local reductions through two canonical operations. Within this restricted setting, the geometric objects remain Catalan-based, whereas the associated encoding space satisfies a Fibonacci-type recurrence. We formalize the reduction model, define admissible Lucas words, and prove structural properties of the encoding map. We further present recursive generation algorithms, analyze their output-sensitive complexity, and compare the size of the encoding space with the size of the full triangulation space. In addition, we discuss geometric constraints, equivalence phenomena, and potential uses of the encoding in compact representation, constrained enumeration, and recursion-guided generation of polygon dissections. Computational experiments support the theoretical predictions and illustrate how the proposed encoding yields a compressed symbolic view of a restricted but mathematically meaningful class of dissections. Full article
29 pages, 2186 KB  
Article
Insights for Curriculum-Oriented Instruction of Programming Paradigms for Non-Computer Science Majors: Survey and Public Q&A Evidence
by Ji-Hye Oh and Hyun-Seok Park
Appl. Sci. 2026, 16(3), 1191; https://doi.org/10.3390/app16031191 - 23 Jan 2026
Viewed by 593
Abstract
This study examines how different programming paradigms are associated with learning experiences and cognitive challenges as encountered by non-computer science novice learners. Using a case-study approach situated within specific instructional contexts, we integrate survey data from undergraduate students with large-scale public question-and-answer data from Stack Overflow to explore paradigm-related difficulty patterns. Four instructional contexts—C, Java, Python, and Prolog—were examined as pedagogical instantiations of imperative, object-oriented, functional-style, and logic-based paradigms using text clustering, word embedding models, and interaction-informed complexity metrics. The analysis identifies distinct patterns of learning challenges across paradigmatic contexts, including difficulties related to low-level memory management in C-based instruction, abstraction and design reasoning in object-oriented contexts, inference-driven reasoning in Prolog-based instruction, and recursion-related challenges in functional-style programming tasks. Survey responses exhibit tendencies that are broadly consistent with patterns observed in public Q&A data, supporting the use of large-scale community-generated content as a complementary source for learner-centered educational analysis. Based on these findings, the study discusses paradigm-aware instructional implications for programming education tailored to non-major learners within comparable educational settings. The results provide empirical support for differentiated instructional approaches and offer evidence-informed insights relevant to curriculum-oriented teaching and future research on adaptive learning systems. Full article
20 pages, 2254 KB  
Article
A Hybrid Deep Learning and Optimization Model for Enterprise Archive Semantic Retrieval
by Xiaonan Shi, Junhe Chen, Yumo Wang and Limei Fu
Appl. Sci. 2025, 15(23), 12381; https://doi.org/10.3390/app152312381 - 21 Nov 2025
Viewed by 729
Abstract
By searching for and summarizing the relevant information of the enterprise, we can build relevant knowledge maps, supplement and enrich the existing knowledge base, and support existing experiments and subsequent algorithm improvements. The extracted input text of enterprise archives is described via relation extraction and semantic analysis to improve the efficiency of archive retrieval and reduce the cost of communication. On the basis of the analysis of previous research, an enterprise archive semantic retrieval algorithm based on deep learning technology is constructed, that is, the BERT + BiGRU + CRF + HHO_improved model, to extract the relevant information of the enterprise. In the model, the Bidirectional Encoder Representations from Transformers (BERT) model is used to preprocess the Chinese word embedding, and the question-and-answer data are generated from the actual enterprise file database. Next, a Bidirectional Gated Recursive Unit (BiGRU) is used with the attention mechanism to capture the contextual features of the sequence. The Conditional Random Field (CRF) classifier is subsequently used to classify the text related to the enterprise archives, and the obtained data are labeled in sequence. Moreover, the swarm intelligence algorithm is introduced to dynamically optimize the model parameters and data processing strategies further to improve the generalization ability and adaptability of the model. The Harris Hawks Optimizer Improved (HHO_improved) algorithm is used to optimize the parameters of the CRF module to increase the performance and efficiency of named entity recognition. On the independently constructed dataset, the advantages of our algorithm are verified via comparative experiments with a variety of semantic matching algorithms and ablation experiments on the CRF and HHO_improved. The CRF and HHO_improved play essential roles in improving model performance. 
The obtained knowledge extraction results are expected to supplement and enhance the existing knowledge base, simplify the workflow, assist the enterprise’s dynamic production task management, and improve the efficiency of enterprise operations. The proposed algorithm achieves an accuracy improvement of 36.33%, 43.88%, 15.24%, and 12.41% over the BERT, BiGRU, BERT + BiGRU, and BERT + BiGRU + CRF models, respectively. Full article
81 pages, 17721 KB  
Review
Interactive Coupling Relaxation of Dipoles and Wagner Charges in the Amorphous State of Polymers Induced by Thermal and Electrical Stimulations: A Dual-Phase Open Dissipative System Perspective
by Jean Pierre Ibar
Polymers 2025, 17(2), 239; https://doi.org/10.3390/polym17020239 - 19 Jan 2025
Viewed by 1750
Abstract
This paper addresses the author’s current understanding of the physics of interactions in polymers under a voltage field excitation. The effect of a voltage field coupled with temperature to induce space charges and dipolar activity in dielectric materials can be measured by very sensitive electrometers. The resulting characterization methods, thermally stimulated depolarization (TSD) and thermal-windowing deconvolution (TWD), provide a powerful way to study local and cooperative relaxations in the amorphous state of matter that are, arguably, essential to understanding the glass transition, molecular motions in the rubbery and molten states and even the processes leading to crystallization. Specifically, this paper describes and tries to explain ‘interactive coupling’ between molecular motions in polymers by their dielectric relaxation characteristics when polymeric samples have been submitted to thermally induced polarization by a voltage field followed by depolarization at a constant heating rate. Interactive coupling results from the modulation of the local interactions by the collective aspect of those interactions, a recursive process pursuant to the dynamics of the interplay between the free volume and the conformation of dual-conformers, two fundamental basic units of the macromolecules introduced by this author in the “dual-phase” model of interactions. This model reconsiders the fundamentals of the TSD and TWD results in a different way: the origin of the dipoles formation, induced or permanent dipoles; the origin of the Wagner space charges and the Tg,ρ transition; the origin of the TLL manifestation; the origin of the Debye elementary relaxations’ compensation or parallelism in a relaxation map; and finally, the dual-phase origin of their super-compensations. 
In other words, this paper is an attempt to link the fundamentals of TSD and TWD activation and deactivation of dipoles that produce a current signal with the statistical parameters of the “dual-phase” model of interactions underlying the Grain-Field Statistics. Full article
12 pages, 2666 KB  
Article
Statistical Signal Integrity Analysis on DFE with Nonideal Latch Model
by Junyong Park
Electronics 2025, 14(1), 202; https://doi.org/10.3390/electronics14010202 - 6 Jan 2025
Cited by 1 | Viewed by 2080
Abstract
This paper introduces the nonideal latch model for the decision feedback equalizer (DFE) for statistical signal integrity (SI) analysis. The DFE equalizes inter-symbol-interference (ISI) noise from the channel in the time domain. The nonideal DFE may propagate an error due to the ISI noise, and the nonideal latch in the DFE may also generate a bit error in the DFE operation. The dynamic latch in the slicer of the DFE circuit amplifies the received signal in a recursive manner. During the amplification, the voltage difference between the signal and the threshold voltage may be less amplified when the amplification time is not enough. Thus, the nonideal dynamic latch is another error source in the DFE operation. In order to reflect the effect of the nonideal latch, the gray zone is defined based on the transfer function of the dynamic latch with iterations. In other words, the gray zone is approximated with the Gaussian distribution and reflected into the statistical eye diagram. As a result of the nonideal latch model, the statistical eye diagram has blurred probability density functions (PDFs). Full article
(This article belongs to the Special Issue Advances in Signals and Systems Research)
28 pages, 3221 KB  
Article
Dissimilation in Hispano-Romance Diminutive Suffixation
by Claire Julia Lozano and Travis G. Bradley
Languages 2024, 9(12), 380; https://doi.org/10.3390/languages9120380 - 20 Dec 2024
Viewed by 2508
Abstract
A highly productive derivational process, diminutive suffixation in Spanish (e.g., gatito ~ gatiko/gatico ‘little/well-known/beloved/awful cat’ < gato ‘cat’) has received much attention in the morphology–phonology interface literature. The present study contributes a novel comparative analysis of a dissimilatory alternation between diminutive suffix allomorphs -ito/a and -ico/a (-iko/a) across three Hispano-Romance varieties. In Judeo-Spanish, the voiceless dorsal stop [k] of default -iko/a dissimilates to coronal [t] after any dorsal segment [k, ɡ, ɡʷ, x, w] in the base-final syllable. In Colombian Spanish, the voiceless coronal stop [t] of default -ito/a dissimilates to dorsal [k] after only an identical [t] in the base-final syllable. By contrast, Castilian Spanish -ito/a does not dissimilate, thereby providing a baseline for comparison. All three varieties allow for optional iteration of the suffix, which conveys greater smallness or endearment than the simple diminutive, e.g., Castilian Spanish gatitito ‘little/beloved kitty’, without dissimilation. Iterated diminutives in Colombian Spanish show two patterns of dissimilation, which have not been fully acknowledged in the previous literature. For example, either (i) [it] and [ik] alternate to avoid adjacent identical syllable onsets, e.g., gat[ikitíko], or (ii) [it] is iterated until alternating with word-final [ik], e.g., gat[ititíko]. In all three Hispano-Romance varieties, base-final unstressed vowels are deleted before a vowel-initial diminutive suffix, followed by unstressed -o/a, and stress (indicated by an acute accent) is shifted rightward onto the penultimate syllable of the diminutive word. Vowel deletion and stress shift apply recursively in iterated diminutives. We propose an Optimality Theory analysis of these alternations in terms of suffix allomorphy that is phonologically conditioned by consonantal place dissimilation. 
The analysis is formalized as an interaction among constraints that enforce prosodic unmarkedness, output–output correspondence, allomorph preference, and similarity avoidance. We consider theoretical alternatives and compare our analysis to other recent proposals. Full article
(This article belongs to the Special Issue Phonetics and Phonology of Ibero-Romance Languages)
15 pages, 328 KB  
Article
Partial Metrics Viewed as w-Distances: Extending Some Powerful Fixed-Point Theorems
by Salvador Romaguera and Pedro Tirado
Mathematics 2024, 12(24), 3991; https://doi.org/10.3390/math12243991 - 18 Dec 2024
Viewed by 1088
Abstract
Involving w-distances and hybrid contractions that combine conditions of the Ćirić type and Samet et al. type, we obtain some general fixed-point results for quasi-metric spaces from which powerful and significant fixed-point theorems on partial metric spaces are deduced as special cases. We present examples showing that our results are real generalizations of those corresponding to the partial metric case and we give an application to the study of recursive equations where the usual Baire partial metric on a domain of words is replaced with a suitable w-distance. Our approach is inspired on the nice fact, stated by Matthews, that every partial metric induces a weighted quasi-metric. Then, we define the notion of a strong w-distance and deduce that every partial metric is a symmetric strong w-distance for its induced weighted quasi-metric space. Full article
(This article belongs to the Special Issue Novel Approaches in Fuzzy Sets and Metric Spaces)
9 pages, 14858 KB  
Proceeding Paper
An Experimental Study for Localization Using Lidar Point Cloud Similarity
by Sai S. Reddy, Luis Jaimes and Onur Toker
Eng. Proc. 2024, 82(1), 89; https://doi.org/10.3390/ecsa-11-20446 - 25 Nov 2024
Viewed by 931
Abstract
In this paper, we consider the use of high-definition maps for autonomous vehicle (AV) localization. An autonomous vehicle may have a variety of sensors, including cameras, lidars, and Global Positioning System (GPS) sensors. Each sensor technology has its own pros and cons; for example, GPS may not be very effective in a city environment with high-rise buildings; cameras may not be very effective in poorly illuminated environments; and lidars simply generate a relatively dense local point cloud. In a typical autonomous vehicle system, all of these sensors are present and sensor fusion algorithms are used to extract the most accurate information. Using our AV research vehicle, we drove on our university campus and recorded Real Time Kinematic-GPS (RTK-GPS) (ZED-F9P) and Velodyne Lidar (VLP-16) data in a time-synchronized fashion. In other words, for every GPS location on our campus, we have lidar-generated point cloud data, resulting in a simple high-definition map of the campus. The main challenge that we look to overcome in this paper is thus: given a high-definition map of the environment and local point cloud data generated by a single lidar scan, determine the AV research vehicle’s location by using point cloud “similarity” metrics. We first propose a computationally simple similarity metric and then describe a recursive Kalman filter-like approach for localization. The effectiveness of the proposed similarity metric has been demonstrated using the experimental data. Full article
24 pages, 2131 KB  
Article
Improving Text Classification in Agricultural Expert Systems with a Bidirectional Encoder Recurrent Convolutional Neural Network
by Xiaojuan Guo, Jianping Wang, Guohong Gao, Li Li, Junming Zhou and Yancui Li
Electronics 2024, 13(20), 4054; https://doi.org/10.3390/electronics13204054 - 15 Oct 2024
Cited by 5 | Viewed by 2264
Abstract
With the rapid development of internet and AI technologies, Agricultural Expert Systems (AESs) have become crucial for delivering technical support and decision-making in agricultural management. However, traditional natural language processing methods often struggle with specialized terminology and context, and they lack the adaptability to handle complex text classifications. The diversity and evolving nature of agricultural texts make deep semantic understanding and integration of contextual knowledge especially challenging. To tackle these challenges, this paper introduces a Bidirectional Encoder Recurrent Convolutional Neural Network (AES-BERCNN) tailored for short-text classification in agricultural expert systems. We designed an Agricultural Text Encoder (ATE) with a six-layer transformer architecture to capture both preceding and following word information. A recursive convolutional neural network based on Gated Recurrent Units (GRUs) was also developed to merge contextual information and learn complex semantic features, which are then combined with the ATE output and refined through max-pooling to form the final feature representation. The AES-BERCNN model was tested on a self-constructed agricultural dataset, achieving an accuracy of 99.63% in text classification. Its generalization ability was further verified on the Tsinghua News dataset. Compared to other models such as TextCNN, DPCNN, BiLSTM, and BERT-based models, the AES-BERCNN shows clear advantages in agricultural text classification. This work provides precise and timely technical support for intelligent agricultural expert systems. Full article
17 pages, 13756 KB  
Communication
Sign Language Interpreting System Using Recursive Neural Networks
by Erick A. Borges-Galindo, Nayely Morales-Ramírez, Mario González-Lee, José R. García-Martínez, Mariko Nakano-Miyatake and Hector Perez-Meana
Appl. Sci. 2024, 14(18), 8560; https://doi.org/10.3390/app14188560 - 23 Sep 2024
Cited by 6 | Viewed by 3236
Abstract
According to the World Health Organization (WHO), 5% of people around the world have hearing disabilities, which limits their capacity to communicate with others. Recently, scientists have proposed systems based on deep learning techniques to create a sign language-to-text translator, expecting this to help deaf people communicate; however, the performance of such systems is still low for practical scenarios. Furthermore, the proposed systems are language-oriented, which leads to particular problems related to the signs for each language. For this reason, to address this problem, in this paper, we propose a system based on a Recursive Neural Network (RNN) focused on Mexican Sign Language (MSL) that uses the spatial tracking of hands and facial expressions to predict the word that a person intends to communicate. To achieve this, we trained four RNN-based models using a dataset of 600 clips that were 30 s long; each word included 30 clips. We conducted two experiments; we tailored the first experiment to determine the most well-suited model for the target application and measure the accuracy of the resulting system in offline mode; in the second experiment, we measured the accuracy of the system in online mode. We assessed the system’s performance using the following metrics: the precision, recall, F1-score, and the number of errors during online scenarios, and the results computed indicate an accuracy of 0.93 in the offline mode and a higher performance for the online operating mode compared to previously proposed approaches. These results underscore the potential of the proposed scheme in scenarios such as teaching, learning, commercial transactions, and daily communications among deaf and non-deaf people. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
29 pages, 1152 KB  
Article
A Descriptive and Experimental Investigation of Recursive Compounds in English: Their Semantic, Syntactic, and Phonological Characterization
by Makiko Mukai
Languages 2024, 9(5), 175; https://doi.org/10.3390/languages9050175 - 11 May 2024
Viewed by 3047
Abstract
The aim of this study is to experimentally capture the semantic, syntactic, and phonological properties of recursive compounds in English. We asked 22 native speakers of English to judge the semantic, syntactic, and phonological properties of 20 recursive compounds that are inherently ambiguous in interpretation (e.g., university entrance exam). We found variations among the participants in each of these three basic aspects. For semantic interpretation, there was a tendency among the participants to prefer left-branching interpretation (‘an exam for university entrance’) over right-branching interpretation (‘an entrance exam in a university’). Using a lexical integrity effect for the syntactic tests, it was found that certain recursive compounds allow for coordination inside. Phonologically, speaker variation was observed in whether and how recursive compounds were pronounced, with 16 participants obeying the Lexical Category Prominence Rule. Full article
(This article belongs to the Special Issue Word-Formation Processes in English)
18 pages, 1061 KB  
Article
Automatic Spell-Checking System for Spanish Based on the Ar2p Neural Network Model
by Eduard Puerto, Jose Aguilar and Angel Pinto
Computers 2024, 13(3), 76; https://doi.org/10.3390/computers13030076 - 12 Mar 2024
Cited by 4 | Viewed by 3102
Abstract
Currently, approaches to correcting misspelled words have problems when the words are complex or massive. This is even more serious in the case of Spanish, where there are very few studies in this regard. So, proposing new approaches to word recognition and correction remains a research topic of interest. In particular, an interesting approach is to computationally simulate the brain process for recognizing misspelled words and their automatic correction. Thus, this article presents an automatic recognition and correction system of misspelled words in Spanish texts, for the detection of misspelled words, and their automatic amendments, based on the systematic theory of pattern recognition of the mind (PRTM). The main innovation of the research is the use of the PRTM theory in this context. Particularly, a corrective system of misspelled words in Spanish based on this theory, called Ar2p-Text, was designed and built. Ar2p-Text carries out a recursive process of analysis of words by a disaggregation/integration mechanism, using specialized hierarchical recognition modules that define formal strategies to determine if a word is well or poorly written. A comparative evaluation shows that the precision and coverage of our Ar2p-Text model are competitive with other spell-checkers. In the experiments, the system achieves better performance than the three other systems. In general, Ar2p-Text obtains an F-measure of 83%, above the 73% achieved by the other spell-checkers. Our hierarchical approach reuses a lot of information, allowing for the improvement of the text analysis processes in both quality and efficiency. Preliminary results show that the above will allow for future developments of technologies for the correction of words inspired by this hierarchical approach. Full article
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)
15 pages, 2748 KB  
Article
Keyword Pool Generation for Web Text Collecting: A Framework Integrating Sample and Semantic Information
by Xiaolong Wu, Chong Feng, Qiyuan Li and Jianping Zhu
Mathematics 2024, 12(3), 405; https://doi.org/10.3390/math12030405 - 26 Jan 2024
Cited by 2 | Viewed by 2388
Abstract
Keyword pools are used as search queries to collect web texts, largely determining the size and coverage of the samples and provide a data base for subsequent text mining. However, how to generate a refined keyword pool with high similarity and some expandability is a challenge. Currently, keyword pools for search queries aimed at collecting web texts either lack an objective generation method and evaluation system, or have a low utilization rate of sample semantic information. Therefore, this paper proposed a keyword generation framework that integrates sample and semantic information to construct a complete and objective keyword pool generation and evaluation system. The framework includes a data phase and a modeling phase, and its core is in the modeling phase, where both feature ranking and model performance are considered. A regression model about a topic vector and word vectors is constructed for the first time based on word embedding, and keyword pools are generated from the perspective of model performance. In addition, two keyword generation methods, Recursive Feature Introduction (RFI) and Recursive Feature Introduction and Elimination (RFIE), are also proposed in this paper. Different feature ranking algorithms, keyword generation methods and regression models are compared in the experiments. The results show that: (1) When using RFI to generate keywords, the regression model using ranked features has better prediction performance than the baseline model, and the number of generated keywords is refiner, and the prediction performance of the regression model using tree-based ranked features is significantly better than that of the one using SHAP-based ranked features. (2) The prediction performance of the regression model using RFI with tree-based ranked features is significantly better than that using Recursive Feature Elimination (RFE) with tree-based one. 
(3) All four regression models using RFI/RFE with SHAP-based/tree-based ranked features have significantly higher average similarity scores and cumulative advantages than the baseline model (the model using RFI with unranked features). (4) Light Gradient Boosting Machine (LGBM) using RFI with SHAP-based ranked features has significantly better prediction performance, higher average similarity scores, and cumulative advantages. In conclusion, our framework can generate a keyword pool that is more similar to the topic, and more refined and expandable, which provides certain research ideas for expanding the research sample size while ensuring the coverage of topics in web text collecting. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)