Search Results (30)

Search Parameters:
Keywords = rule-based machine translation

25 pages, 887 KB  
Article
Transformation of Real-World Contracts to Smart Contracts for Blockchain Applications
by Cecilia E. Chen, Xuanyu Liu, Limin Jia, Bo Liang, Yan Zhu and Tong Wu
Electronics 2026, 15(7), 1514; https://doi.org/10.3390/electronics15071514 - 3 Apr 2026
Viewed by 459
Abstract
The widespread adoption of smart contracts, self-executing agreements on the blockchain, is hindered by the complexity of translating real-world contracts, often written in multiple languages, into their digital counterparts. This paper addresses this challenge by introducing an innovative approach based on Contract Text Markup Language (CTML), an extensible markup language specifically designed to facilitate the automatic generation of smart contracts from multilingual contracts. CTML overcomes traditional method limitations by employing a two-stage transformation process: (1) Contract Abstraction and Markup: CTML redefines grammar rules and incorporates encoding extensions to transform multilingual contracts into structured, marked-up contracts. This process effectively abstracts the essential details of the original contract, enabling language-agnostic interpretation. (2) Domain-Specific Language (DSL) Translation and Smart Contract Code Generation: The marked-up contract is then seamlessly translated into a DSL program, capturing the legal concepts in a machine-readable format. Finally, the DSL program is automatically compiled into executable smart contract code, ready for deployment on the blockchain. The effectiveness of the proposed approach is demonstrated using a legal contract in both English and Chinese. Therefore, the CTML-based approach can automatically generate smart contracts from multilingual contracts, enabling a more inclusive and accessible smart contract ecosystem. Full article
(This article belongs to the Section Computer Science & Engineering)
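
The two-stage pipeline described in the abstract (markup and abstraction, then DSL translation and code generation) can be illustrated with a toy sketch. The `<clause>` tag, its attribute names, and the DSL syntax below are invented for illustration only and are not CTML's actual grammar.

```python
import re

# A toy "marked-up" clause; tag and attribute names are hypothetical.
MARKUP = '<clause party="Buyer" action="pay" amount="1000" deadline_days="30"/>'

def parse_markup(markup):
    # Stage 1 (abstraction and markup): reduce a clause to a
    # language-agnostic attribute dictionary.
    return dict(re.findall(r'(\w+)="([^"]*)"', markup))

def to_dsl(fields):
    # Stage 2 (DSL translation): render the abstracted clause as a tiny
    # invented DSL statement, which a compiler would then turn into
    # executable smart contract code.
    return (f'obligation {fields["party"]} must {fields["action"]} '
            f'{fields["amount"]} within {fields["deadline_days"]} days')
```

Because stage 1 yields a language-agnostic attribute dictionary, the same stage-2 translation would apply whether the source contract was English or Chinese.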

40 pages, 13676 KB  
Review
Interfacial Interactions of Nanoparticles and Molecular Nanostructures with Model Membrane Systems: Mechanisms, Methods, and Applications
by Konstantin Balashev
Membranes 2026, 16(4), 134; https://doi.org/10.3390/membranes16040134 - 1 Apr 2026
Viewed by 1146
Abstract
This review surveys how nanoparticles and biomolecular nanosized structures interact with model membrane systems, and how these interfacial processes govern their performance in drug and gene delivery, antimicrobial strategies, biosensing, and nanotoxicology. The nanostructures covered include polymeric nanoparticles, lipid-based carriers, peptide nanostructures, dendrimers, and multifunctional hybrids. Model membranes span Langmuir monolayers, supported lipid bilayers, vesicles/liposomes across sizes, and emerging hybrid or asymmetric constructs that better approximate native complexity. Mechanistically, interactions follow recurrent routes—surface adsorption, bilayer insertion, pore formation, and lipid extraction/reorganization—regulated by particle size, morphology, charge, ligand architecture, and lipophilicity, in conjunction with membrane composition, phase state, curvature, and asymmetry. A multiscale toolkit links structure, mechanics, and dynamics: Langmuir troughs and Brewster Angle Microscopy map thermodynamics and mesoscale morphology; atomic force microscopy and quartz crystal microbalance with dissipation resolve nanoscale topography and viscoelasticity; fluorescence microscopy/spectroscopy reports on localization and packing; neutron and X-ray reflectometry quantify vertical structure; molecular dynamics provides atomistic pathways and design hypotheses. Historically, the field advanced from early monolayers and bilayers, through the fluid mosaic model, to raft microdomains and modern biomimetic systems, enabling increasingly realistic experiments. Key advances include cross-method integration linking experimental observations with image-based computational models; persistent debates concern the translation from simplified models to living membranes, the role of dynamic coronas, and scale/force-field limits in simulations. 
Future efforts should prioritize hybrid models incorporating proteins and asymmetric lipidomes, standardized reporting and reference systems, rigorous coupling of experiments with calibrated simulations and machine learning, and alignment with safety-by-design and regulatory expectations, thereby shifting interfacial measurements from descriptive observation to predictive design rules. Full article

25 pages, 1436 KB  
Article
Entropy-Augmented Forecasting and Portfolio Construction at the Industry-Group Level: A Causal Machine-Learning Approach Using Gradient-Boosted Decision Trees
by Gil Cohen, Avishay Aiche and Ron Eichel
Entropy 2026, 28(1), 108; https://doi.org/10.3390/e28010108 - 16 Jan 2026
Viewed by 722
Abstract
This paper examines whether information-theoretic complexity measures enhance industry-group return forecasting and portfolio construction within a machine-learning framework. Using daily data for 25 U.S. GICS industry groups spanning more than three decades, we augment gradient-boosted decision tree models with Shannon entropy and fuzzy entropy computed from recent return dynamics. Models are estimated at weekly, monthly, and quarterly horizons using a strictly causal rolling-window design and translated into two economically interpretable allocation rules: a maximum-profit strategy and a minimum-risk strategy. Results show that the top-performing strategy, the weekly maximum-profit model augmented with Shannon entropy, achieves an accumulated return exceeding 30,000%, substantially outperforming both the baseline model and the fuzzy-entropy variant. On monthly and quarterly horizons, entropy and fuzzy entropy generate smaller but robust improvements by maintaining lower volatility and better downside protection. Industry allocations display stable and economically interpretable patterns: profit-oriented strategies concentrate primarily in cyclical and growth-sensitive industries such as semiconductors, automobiles, technology hardware, banks, and energy, while minimum-risk strategies consistently favor defensive industries including utilities, food, beverage and tobacco, real estate, and consumer staples. Overall, the results demonstrate that entropy-based complexity measures improve both economic performance and interpretability, yielding industry-rotation strategies that are simultaneously more profitable, more stable, and more transparent. Full article
(This article belongs to the Special Issue Entropy, Artificial Intelligence and the Financial Markets)
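
As a rough illustration of the entropy features involved, the sketch below computes Shannon entropy over a discretized window of daily returns. The three-state bucketing and the 0.005 threshold are arbitrary illustrative choices, not the paper's feature construction; fuzzy entropy (not shown) replaces hard state counts with similarity-weighted pattern matches.

```python
from collections import Counter
from math import log2

def bucket_returns(returns, threshold=0.005):
    # Hypothetical 3-state discretization of daily returns: down / flat / up.
    return ['down' if r < -threshold else 'up' if r > threshold else 'flat'
            for r in returns]

def shannon_entropy(states):
    # Shannon entropy (in bits) of the empirical state distribution of a
    # recent return window; higher values mean less predictable dynamics.
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * log2(c / n) for c in counts.values())
```

A rolling feature would apply `shannon_entropy(bucket_returns(window))` to each trailing window before feeding it to the gradient-boosted trees alongside the raw return features.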

23 pages, 7685 KB  
Article
Literal Pattern Analysis of Texts Written with the Multiple Form of Characters: A Comparative Study of the Human and Machine Styles
by Kazuya Hayata
Entropy 2026, 28(1), 36; https://doi.org/10.3390/e28010036 - 27 Dec 2025
Viewed by 423
Abstract
Setting aside languages with no written form, texts in nearly every language on this planet are written in a single script. But every rule has its exceptions, and Japanese is a very rare one: its texts are written in a mixture of three kinds of characters. In European languages, by contrast, no text is written in a mixture of the Latin, Cyrillic, and Greek alphabets. For several currently available Japanese texts, we conduct a quantitative analysis of how the three character types are mixed, using a methodology that applies a binary pattern approach to the character-class sequence generated from each text. Specifically, we consider the texts of the former and present constitutions as well as a famous American story that has been translated into Japanese at least 13 times. For the latter, a comparison is made between the human translations and four machine translations by DeepL and Google Translate. As metrics of divergence and diversity, the Hellinger distance, chi-square value, normalized Shannon entropy, and Simpson's diversity index are employed. Numerical results suggest that, in terms of entropy, the 17 translations fall into three clusters, and that overall the machine-translated texts exhibit higher entropy than the human translations. This finding suggests that the present method can provide a useful tool for stylometry and author attribution. Comparison with the diversity index further confirms the capabilities of the entropic measure. Lastly, in addition to the abovementioned texts, applicability to the Japanese version of the periodic table of the elements is investigated. Full article
(This article belongs to the Special Issue Entropy-Based Time Series Analysis: Theory and Applications)
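
The divergence and diversity metrics named in the abstract are standard and easy to state. The sketch below computes three of them from character-class probability vectors (e.g. the kanji/hiragana/katakana proportions of a text); it is a generic illustration, not the paper's code.

```python
from math import log, sqrt

def normalized_entropy(p):
    # Shannon entropy divided by log(k): 1 = uniform mix of the k classes,
    # 0 = a single class used exclusively.
    k = len(p)
    return -sum(x * log(x) for x in p if x > 0) / log(k)

def simpson_diversity(p):
    # 1 - sum(p_i^2): probability that two sampled characters
    # belong to different classes.
    return 1 - sum(x * x for x in p)

def hellinger(p, q):
    # Hellinger distance between two distributions, in [0, 1].
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))) / sqrt(2)
```

Clustering the translations then reduces to comparing these scalar measures, or the pairwise Hellinger distances, across the 17 texts.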

25 pages, 1910 KB  
Review
Natural Language Processing in Generating Industrial Documentation Within Industry 4.0/5.0
by Izabela Rojek, Olga Małolepsza, Mirosław Kozielski and Dariusz Mikołajewski
Appl. Sci. 2025, 15(23), 12662; https://doi.org/10.3390/app152312662 - 29 Nov 2025
Cited by 1 | Viewed by 1887
Abstract
Deep learning (DL) methods have revolutionized natural language processing (NLP), enabling industrial documentation systems to process and generate text with high accuracy and fluency. Modern deep learning models, such as transformers and recurrent neural networks (RNNs), learn contextual relationships in text, making them ideal for analyzing and creating complex industrial documentation. Transformer-based architectures, such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), are ideally suited for tasks such as text summarization, content generation, and question answering, which are crucial for documentation systems. Pre-trained language models, tuned to specific industrial datasets, support domain-specific vocabulary, ensuring the generated documentation complies with industry standards. Deep learning-based systems can use sequential models, such as those used in machine translation, to generate documentation in multiple languages, promoting accessibility, and global collaboration. Using attention mechanisms, these models identify and highlight critical sections of input data, resulting in the generation of accurate and concise documentation. Integration with optical character recognition (OCR) tools enables DL-based NLP systems to digitize and interpret legacy documents, streamlining the transition to automated workflows. Reinforcement learning and human feedback loops can enhance a system’s ability to generate consistent and contextually relevant text over time. These approaches are particularly effective in creating dynamic documentation that is automatically updated based on data from sensors, registers, or other sources in real time. The scalability of DL techniques enables industrial organizations to efficiently produce massive amounts of documentation, reducing manual effort and improving overall efficiency. 
NLP has become a fundamental technology for automating the generation, maintenance, and personalization of industrial documentation within the Industry 4.0, 5.0, and emerging Industry 6.0 paradigms. Recent advances in large language models, search-assisted generation, and multimodal architectures have significantly improved the accuracy and contextualization of technical manuals, maintenance reports, and compliance documents. However, persistent challenges such as domain-specific terminology, data scarcity, and the risk of hallucinations highlight the limitations of current approaches in safety-critical manufacturing environments. This review synthesizes state-of-the-art methods, comparing rule-based, neural, and hybrid systems while assessing their effectiveness in addressing industrial requirements for reliability, traceability, and real-time adaptation. Human–AI collaboration and the integration of knowledge graphs are transforming documentation workflows as factories evolve toward cognitive and autonomous systems. The review included 32 articles published between 2018 and 2025. These bibliometric findings suggest that the high percentage of conference papers (69.6%) may indicate a field still in its conceptual phase, which contextualizes the articles' emphasis on proposed architectures rather than on their industrial validation. Most research was conducted in computer science, suggesting early stages of technological maturity. The leading countries were China and India, but neither had large publication counts, nor were leading researchers or affiliations observed, suggesting significant research dispersion. However, the most frequently observed SDGs indicate a clear health context, focusing on “industry innovation and infrastructure” and “good health and well-being”. Full article
(This article belongs to the Special Issue Emerging and Exponential Technologies in Industry 4.0)

21 pages, 1605 KB  
Article
Risk Management Challenges in Maritime Autonomous Surface Ships (MASSs): Training and Regulatory Readiness
by Hyeri Park, Jeongmin Kim, Min Jung, Suk-young Kang, Daegun Kim, Changwoo Kim and Unkyu Jang
Appl. Sci. 2025, 15(20), 10993; https://doi.org/10.3390/app152010993 - 13 Oct 2025
Viewed by 2525
Abstract
Maritime Autonomous Surface Ships (MASSs) raise safety and regulatory challenges that extend beyond technical reliability. This study builds on a published system-theoretic process analysis (STPA) of degraded operations that identified 92 loss scenarios. These scenarios were reformulated into a two-round Delphi survey with 20 experts from academic, industry, seafaring, and regulatory backgrounds. Panelists rated each scenario on severity, likelihood, and detectability. To avoid rank reversal, common in the Risk Priority Number, an adjusted index was applied. Initial concordance was low (Kendall’s W = 0.07), reflecting diverse perspectives. After feedback, Round 2 reached substantial agreement (W = 0.693, χ2 = 3265.42, df = 91, p < 0.001) and produced a stable Top 10. High-priority items involved propulsion and machinery, communication links, sensing, integrated control, and human–machine interaction. These risks are further exacerbated by oceanographic conditions, such as strong currents, wave-induced motions, and biofouling, which can impair propulsion efficiency and sensor accuracy. This highlights the importance of environmental resilience in MASS safety. These clusters were translated into five action bundles that addressed fallback procedures, link assurance, sensor fusion, control chain verification, and alarm governance. The findings show that Remote Operator competence and oversight are central to MASS safety. At the same time, MASSs rely on artificial intelligence systems that can fail in degraded states, for example, through reduced explainability in decision making, vulnerabilities in sensor fusion, or adversarial conditions such as fog-obscured cameras. Recognizing these AI-specific challenges highlights the need for both human oversight and resilient algorithmic design. They support explicit inclusion of Remote Operators in the STCW convention, along with watchkeeping and fatigue rules for Remote Operation Centers. 
This study provides a consensus-based baseline for regulatory debate, while future work should extend these insights through quantitative system modeling. Full article
(This article belongs to the Special Issue Risk and Safety of Maritime Transportation)
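
Kendall's W, the concordance statistic used above to track consensus across Delphi rounds, can be computed from a raters-by-items score matrix. Below is a minimal sketch without the tie correction (real scenario ratings, which contain many ties, would need it).

```python
def kendalls_w(ratings):
    # Kendall's coefficient of concordance for m raters scoring n items.
    # `ratings` is a list of m rows of n scores; W = 1 means perfect
    # agreement on the ordering, W = 0 means no agreement.
    m, n = len(ratings), len(ratings[0])
    rank_rows = []
    for row in ratings:
        # Convert each rater's scores to ranks 1..n (ties not corrected).
        order = sorted(range(n), key=lambda i: row[i])
        ranks = [0] * n
        for r, i in enumerate(order, start=1):
            ranks[i] = r
        rank_rows.append(ranks)
    totals = [sum(col) for col in zip(*rank_rows)]
    mean = m * (n + 1) / 2
    s = sum((t - mean) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

With 20 raters and 92 scenarios, the Round 2 value of W = 0.693 reported above corresponds to substantial (though not perfect) agreement on the risk ordering.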

27 pages, 913 KB  
Article
Criticality Assessment of Wind Turbine Defects via Multispectral UAV Fusion and Fuzzy Logic
by Pavlo Radiuk, Bohdan Rusyn, Oleksandr Melnychenko, Tomasz Perzynski, Anatoliy Sachenko, Serhii Svystun and Oleg Savenko
Energies 2025, 18(17), 4523; https://doi.org/10.3390/en18174523 - 26 Aug 2025
Cited by 3 | Viewed by 1220
Abstract
Ensuring the structural integrity of wind turbines is crucial for the sustainability of wind energy. A significant challenge remains in transitioning from mere defect detection to objective, scalable criticality assessment for prioritizing maintenance. In this work, we propose a novel comprehensive framework that leverages multispectral unmanned aerial vehicle (UAV) imagery and a novel standards-aligned Fuzzy Inference System to automate this task. Our contribution is validated on two open research-oriented datasets representing small on- and offshore machines: the public AQUADA-GO and Thermal WTB Inspection datasets. An ensemble of YOLOv8n models trained on fused RGB-thermal data achieves a mean Average Precision (mAP@.5) of 92.8% for detecting cracks, erosion, and thermal anomalies. The core novelty, a 27-rule Fuzzy Inference System derived from the IEC 61400-5 standard, translates quantitative defect parameters into a five-level criticality score. The system’s output demonstrates exceptional fidelity to expert assessments, achieving a mean absolute error of 0.14 and a Pearson correlation of 0.97. This work provides a transparent, repeatable, and engineering-grounded proof of concept, demonstrating a promising pathway toward predictive, condition-based maintenance strategies and supporting the economic viability of wind energy. Full article
(This article belongs to the Special Issue Optimal Control of Wind and Wave Energy Converters)
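
A fuzzy inference system of this kind can be sketched compactly: fuzzify crisp defect measurements with membership functions, fire AND-rules, and defuzzify to a criticality level. The two inputs, membership breakpoints, and four rules below are invented for illustration and bear no relation to the paper's 27 IEC 61400-5-derived rules.

```python
def tri(x, a, b, c):
    # Triangular membership function peaking at b, zero outside (a, c).
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def criticality(size_mm, delta_t):
    # Toy two-input FIS with weighted-average defuzzification onto a
    # 1-5 criticality scale (all breakpoints are hypothetical).
    small, large = tri(size_mm, -1, 0, 10), tri(size_mm, 5, 15, 100)
    cool, hot = tri(delta_t, -1, 0, 8), tri(delta_t, 4, 12, 50)
    # Rule strength = min (fuzzy AND); each rule targets a level anchor.
    rules = [(min(small, cool), 1), (min(small, hot), 3),
             (min(large, cool), 3), (min(large, hot), 5)]
    num = sum(w * lvl for w, lvl in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 1.0
```

A standards-derived system like the paper's would replace these toy rules with ones traceable to the defect classes and severity criteria of IEC 61400-5.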

17 pages, 609 KB  
Article
GPT-Based Text-to-SQL for Spatial Databases
by Hui Wang, Li Guo, Yubin Liang, Le Liu and Jiajin Huang
ISPRS Int. J. Geo-Inf. 2025, 14(8), 288; https://doi.org/10.3390/ijgi14080288 - 24 Jul 2025
Cited by 2 | Viewed by 4078
Abstract
Text-to-SQL for spatial databases enables the translation of natural language questions into corresponding SQL queries, allowing non-experts to easily access spatial data, which has gained increasing attention from researchers. Previous research has primarily focused on rule-based methods. However, these methods have limitations when dealing with complicated or unknown natural language questions. While advanced machine learning models can be trained, they typically require large labeled training datasets, which are severely lacking for spatial databases. Recently, Generative Pre-Trained Transformer (GPT) models have emerged as a promising paradigm for Text-to-SQL tasks in relational databases, driven by carefully designed prompts. In response to the severe lack of datasets for spatial databases, we have created a publicly available dataset that supports both English and Chinese. Furthermore, we propose a GPT-based method to construct prompts for spatial databases, which incorporates geographic and spatial database knowledge into the prompts and requires only a small number of training samples, such as 1, 3, or 5 examples. Extensive experiments demonstrate that incorporating geographic and spatial database knowledge into prompts improves the accuracy of Text-to-SQL tasks for spatial databases. Our proposed method can help non-experts access spatial databases more easily and conveniently. Full article
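
The prompt-construction idea generalizes readily: assemble the schema, injected geographic/spatial-SQL knowledge, and a handful of examples ahead of the target question. The section headers, the trailing `SELECT` completion cue, and the PostGIS-flavored strings in the sketch below are illustrative assumptions, not the paper's actual template.

```python
def build_spatial_prompt(schema, knowledge, examples, question):
    # Assemble a few-shot Text-to-SQL prompt for a spatial database.
    # `examples` is a list of (question, sql) pairs, e.g. 1, 3, or 5 shots.
    parts = ["### Spatial database schema", schema,
             "### Geographic and spatial SQL knowledge", knowledge,
             "### Examples"]
    for q, sql in examples:
        parts += [f"-- Question: {q}", sql]
    # End with the target question and a completion cue for the model.
    parts += [f"-- Question: {question}", "SELECT"]
    return "\n".join(parts)
```

The resulting string would be sent to the GPT model, whose completion (starting from `SELECT`) is the candidate spatial SQL query.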

24 pages, 939 KB  
Review
Advances in Amazigh Language Technologies: A Comprehensive Survey Across Processing Domains
by Oussama Akallouch, Mohammed Akallouch and Khalid Fardousse
Information 2025, 16(7), 600; https://doi.org/10.3390/info16070600 - 13 Jul 2025
Cited by 1 | Viewed by 4371
Abstract
The Amazigh language, spoken by millions across North Africa, presents unique computational challenges due to its complex morphological system, dialectal variation, and multiple writing systems. This survey examines technological advances over the past decade across four key domains: natural language processing, speech recognition, optical character recognition, and machine translation. We analyze the evolution from rule-based systems to advanced neural models, demonstrating how researchers have addressed resource constraints through innovative approaches that blend linguistic knowledge with machine learning. Our analysis reveals uneven progress across domains, with optical character recognition reaching high maturity levels while machine translation remains constrained by limited parallel data. Beyond technical metrics, we explore applications in education, cultural preservation, and digital accessibility, showing how these technologies enable Amazigh speakers to participate in the digital age. This work illustrates that advancing language technology for marginalized languages requires fundamentally different approaches that respect linguistic diversity while ensuring digital equity. Full article

19 pages, 16096 KB  
Article
Evaluating Translation Quality: A Qualitative and Quantitative Assessment of Machine and LLM-Driven Arabic–English Translations
by Tawffeek A. S. Mohammed
Information 2025, 16(6), 440; https://doi.org/10.3390/info16060440 - 26 May 2025
Cited by 7 | Viewed by 5657
Abstract
This study investigates translation quality between Arabic and English, comparing traditional rule-based machine translation systems, modern neural machine translation tools such as Google Translate, and large language models like ChatGPT. The research adopts both qualitative and quantitative approaches to assess the efficacy, accuracy, and contextual fidelity of translations. It particularly focuses on the translation of idiomatic and colloquial expressions as well as technical texts and genres. Using well-established evaluation metrics such as bilingual evaluation understudy (BLEU), translation error rate (TER), and character n-gram F-score (chrF), alongside the qualitative translation quality assessment model proposed by Juliane House, this study investigates the linguistic and semantic nuances of translations generated by different systems. This study concludes that although metric-based evaluations like BLEU and TER are useful, they often fail to fully capture the semantic and contextual accuracy of idiomatic and expressive translations. Large language models, particularly ChatGPT, show promise in addressing this gap by offering more coherent and culturally aligned translations. However, both systems demonstrate limitations that necessitate human post-editing for high-stakes content. The findings support a hybrid approach, combining machine translation tools with human oversight for optimal translation quality, especially in languages with complex morphology and culturally embedded expressions like Arabic. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
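
Of the metrics listed, chrF is simple enough to sketch from scratch. The version below is a simplified character n-gram F-score (whitespace stripped, uniform averaging over n-gram orders), not the exact reference implementation used in evaluation toolkits.

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams with whitespace removed, as chrF typically does.
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Average character n-gram precision and recall over orders 1..max_n,
    # combined with an F-beta score (beta=2 weights recall, as in chrF).
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r) if p + r else 0.0
```

Because it operates on characters rather than words, chrF is comparatively forgiving of the rich morphology that complicates word-level metrics like BLEU for Arabic.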

23 pages, 711 KB  
Article
Comparison of Grammar Characteristics of Human-Written Corpora and Machine-Generated Texts Using a Novel Rule-Based Parser
by Simon Strübbe, Irina Sidorenko and Renée Lampe
Information 2025, 16(4), 274; https://doi.org/10.3390/info16040274 - 28 Mar 2025
Cited by 1 | Viewed by 2244
Abstract
As the prevalence of machine-written texts grows, it has become increasingly important to distinguish between human- and machine-generated content, especially when such texts are not explicitly labeled. Current artificial intelligence (AI) detection methods primarily focus on human-like characteristics, such as emotionality and subjectivity. However, these features can be easily modified through AI humanization, which involves altering word choice. In contrast, altering the underlying grammar without affecting the conveyed information is considerably more challenging. Thus, the grammatical characteristics of a text can be used as additional indicators of its origin. To address this, we employ a newly developed rule-based parser to analyze the grammatical structures in human- and machine-written texts. Our findings reveal systematic grammatical differences between human- and machine-written texts, providing a reliable criterion for the determination of the text origin. We further examine the stability of this criterion in the context of AI humanization and translation to other languages. Full article

19 pages, 3746 KB  
Article
The Impact of the Human Factor on Communication During a Collision Situation in Maritime Navigation
by Leszek Misztal and Paulina Hatlas-Sowinska
Appl. Sci. 2025, 15(5), 2797; https://doi.org/10.3390/app15052797 - 5 Mar 2025
Cited by 1 | Viewed by 1752
Abstract
In this paper, the authors draw attention to the significant impact of the human factor during collision situations in maritime navigation. The problems in the communication process between navigators are so excessive that the authors propose automatic communication. This is an alternative method to the current one. The presented system comprehensively performs communication tasks during a sea voyage. To reach the mentioned goal, AI methods of natural language processing and additional properties of metaontology (ontology supplemented with objective functions) are applied. Dedicated to maritime transport applications, the model for translating a natural language into an ontology consists of multiple steps and uses AI methods of classification for the recognition of a message from the ship’s bridge. The reverse model is also multi-stage and uses a created rule-based knowledge base to create natural-language sentences built on the basis of the ontology. Validation of the model’s accuracy results was conducted through accuracy assessment coefficients for information classification, commonly used in science. Receiver operating characteristic (ROC) curves represent the results in the datasets. The presented solution of the designed architecture of the system as well as algorithms developed in the software prototype confirmed the correctness of the assumptions in the described study. The authors demonstrated that it is feasible to successfully apply metaontology and machine learning methods in the proposed prototype software for ship-to-ship communication. Full article
(This article belongs to the Section Marine Science and Engineering)

19 pages, 2296 KB  
Article
A Hybrid Approach to Ontology Construction for the Badini Kurdish Language
by Media Azzat, Karwan Jacksi and Ismael Ali
Information 2024, 15(9), 578; https://doi.org/10.3390/info15090578 - 19 Sep 2024
Cited by 1 | Viewed by 4074
Abstract
Semantic ontologies have been widely utilized as crucial tools within natural language processing, underpinning applications such as knowledge extraction, question answering, machine translation, text comprehension, information retrieval, and text summarization. While the Kurdish language, a low-resource language, has been the subject of some ontological research in other dialects, a semantic web ontology for the Badini dialect remains conspicuously absent. This paper addresses this gap by presenting a methodology for constructing and utilizing a semantic web ontology for the Badini dialect of the Kurdish language. A Badini annotated corpus (UOZBDN) was created and manually annotated with part-of-speech (POS) tags. Subsequently, an HMM-based POS tagger model was developed using the UOZBDN corpus and applied to annotate additional text for ontology extraction. Ontology extraction was performed by employing predefined rules to identify nouns and verbs from the model-annotated corpus and subsequently forming semantic predicates. Robust methodologies were adopted for ontology development, resulting in a high degree of precision. The POS tagging model attained an accuracy of 95.04% when applied to the UOZBDN corpus. Furthermore, a manual evaluation conducted by Badini Kurdish language experts yielded a 97.42% accuracy rate for the extracted ontology. Full article
(This article belongs to the Special Issue Knowledge Representation and Ontology-Based Data Management)
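The rule-based extraction step described in the abstract can be illustrated with a minimal sketch: given a POS-tagged sentence, pair each verb with its nearest surrounding nouns to form a semantic predicate triple. The tag names (`NOUN`, `VERB`), the triple shape, and the English sample tokens are assumptions for readability, not the paper's actual rules or UOZBDN data.

```python
def extract_predicates(tagged_tokens):
    """Return (noun, verb, noun) triples found in one POS-tagged sentence."""
    triples = []
    for i, (word, tag) in enumerate(tagged_tokens):
        if tag != "VERB":
            continue
        # nearest noun before and after the verb
        before = next((w for w, t in reversed(tagged_tokens[:i]) if t == "NOUN"), None)
        after = next((w for w, t in tagged_tokens[i + 1:] if t == "NOUN"), None)
        if before and after:
            triples.append((before, word, after))
    return triples

sentence = [("cat", "NOUN"), ("chases", "VERB"), ("mouse", "NOUN")]
print(extract_predicates(sentence))  # [('cat', 'chases', 'mouse')]
```

A real Badini pipeline would need dialect-specific word order (e.g. verb-final clauses) and a richer tag set, but the lookup-and-pair pattern is the same.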
20 pages, 2098 KB  
Article
Tibetan Sentence Boundaries Automatic Disambiguation Based on Bidirectional Encoder Representations from Transformers on Byte Pair Encoding Word Cutting Method
by Fenfang Li, Zhengzhang Zhao, Li Wang and Han Deng
Appl. Sci. 2024, 14(7), 2989; https://doi.org/10.3390/app14072989 - 2 Apr 2024
Cited by 4 | Viewed by 2251
Abstract
Sentence Boundary Disambiguation (SBD) is crucial for building datasets for tasks such as machine translation, syntactic analysis, and semantic analysis. Currently, most automatic sentence segmentation in Tibetan relies on rule-based methods, statistical learning, or a combination of the two; these approaches place high demands on the corpus and on the researchers' linguistic expertise, and manual annotation is costly. In this study, we explore Tibetan SBD using deep learning. First, we analyze the characteristics of Tibetan and various subword techniques, selecting Byte Pair Encoding (BPE) and SentencePiece (SP) for text segmentation and training Bidirectional Encoder Representations from Transformers (BERT) pre-trained language models. Second, we study Tibetan SBD with different BERT pre-trained language models, learning the ambiguity of the shad (“།”) at different positions in modern Tibetan texts and using the model to determine whether a given shad functions as a sentence delimiter. We also introduce four BERT-based models, BERT-CNN, BERT-RNN, BERT-RCNN, and BERT-DPCNN, for performance comparison. Finally, to verify the performance of pre-trained language models on the SBD task, we conduct SBD experiments on both the publicly available Tibetan pre-trained language model TiBERT and the multilingual pre-trained language model Multi-BERT. The experimental results show that the F1 score of the BERT (BPE) model trained in this study reaches 95.32% on 465,669 Tibetan sentences, nearly five percentage points higher than BERT (SP) and Multi-BERT. The SBD method based on pre-trained language models lays the foundation for building datasets for later Tibetan pre-training, summary extraction, and machine translation tasks. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
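The BPE segmentation the abstract selects can be sketched in its classic form: repeatedly merge the most frequent adjacent symbol pair in a frequency-weighted vocabulary. This is a toy sketch on English words for readability; the paper's actual Tibetan training would use a full BPE/SentencePiece implementation, and the corpus here is hypothetical.

```python
from collections import Counter

def bpe_merges(word_freqs, num_merges):
    """Learn BPE merge rules from a {word: frequency} vocabulary."""
    # represent each word as a tuple of single-character symbols
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # count every adjacent symbol pair, weighted by word frequency
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # apply the merge everywhere it occurs
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

print(bpe_merges({"low": 5, "lower": 2, "lowest": 1}, 2))
# [('l', 'o'), ('lo', 'w')]
```

The learned merge rules are then replayed, in order, to segment unseen text into subwords, which is what feeds the BERT tokenizer.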
13 pages, 415 KB  
Article
Developing an Urdu Lemmatizer Using a Dictionary-Based Lookup Approach
by Saima Shaukat, Muhammad Asad and Asmara Akram
Appl. Sci. 2023, 13(8), 5103; https://doi.org/10.3390/app13085103 - 19 Apr 2023
Cited by 7 | Viewed by 3904
Abstract
Lemmatization aims at returning the root form of a word. A lemmatizer is envisioned as a vital instrument that can assist in many Natural Language Processing (NLP) tasks, including Information Retrieval, Word Sense Disambiguation, Machine Translation, Text Reuse, and Plagiarism Detection. Previous studies have focused on developing lemmatizers using rule-based approaches for English and other highly resourced languages. However, there have been no thorough efforts to develop a lemmatizer for most South Asian languages, specifically Urdu. Urdu is a morphologically rich language with many inflectional and derivational forms, which makes the development of an efficient Urdu lemmatizer a challenging task. A standardized lemmatizer would contribute towards establishing much-needed methodological resources for this low-resourced language, which are required to boost the performance of many Urdu NLP applications. This paper presents a lemmatization system for the Urdu language based on a novel dictionary lookup approach. The contributions made through this research are the following: (1) the development of a large benchmark corpus for the Urdu language, (2) the exploration of the relationship between part-of-speech tags and the lemmatizer, and (3) the development of standard approaches for an Urdu lemmatizer. Furthermore, we experimented with the impact of Part of Speech (PoS) on our proposed dictionary lookup approach. The empirical results showed that we achieved the best accuracy score of 76.44% through the proposed dictionary lookup approach. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
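The dictionary lookup approach with PoS influence described above can be sketched as a keyed lookup with a fallback. The dictionary entries, the `(word, PoS)` keying, and the English examples are hypothetical placeholders, not the paper's actual Urdu resources.

```python
# Hypothetical lemma dictionary keyed on (surface form, PoS tag).
LEMMA_DICT = {
    ("running", "VERB"): "run",
    ("books", "NOUN"): "book",
}

def lemmatize(word, pos=None):
    """Look up (word, PoS); fall back to a PoS-agnostic match, then to the word itself."""
    if pos is not None and (word, pos) in LEMMA_DICT:
        return LEMMA_DICT[(word, pos)]
    # PoS-agnostic fallback: first entry matching the surface form alone
    for (w, _), lemma in LEMMA_DICT.items():
        if w == word:
            return lemma
    return word  # out-of-dictionary words are returned unchanged

print(lemmatize("running", "VERB"))  # run
print(lemmatize("books"))            # book
print(lemmatize("unknown"))          # unknown
```

Keying on the PoS tag is what lets the lookup disambiguate forms whose lemma depends on word class, which is the relationship the paper's PoS experiments probe.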