MDPI - Publisher of Open Access Journals

14 pages, 1646 KB

Open Access Article

Arabic WikiTableQA: Benchmarking Question Answering over Arabic Tables Using Large Language Models

by Fawaz Alsolami and Asmaa Alrayzah

Electronics 2025, 14(19), 3829; https://doi.org/10.3390/electronics14193829 - 27 Sep 2025

Viewed by 264

Table-based question answering (TableQA) has made significant progress in recent years; however, most advancements have focused on English datasets and SQL-based techniques, leaving Arabic TableQA largely unexplored. This gap is especially critical given the widespread use of structured Arabic content in domains such as government, education, and media. The main challenge lies in the absence of benchmark datasets and the difficulty that large language models (LLMs) face when reasoning over long, complex tables in Arabic, due to token limitations and morphological complexity. To address this, we introduce Arabic WikiTableQA, the first large-scale dataset for non-SQL Arabic TableQA, constructed from the WikiTableQuestions dataset and enriched with natural questions and gold-standard answers. We developed three methods to evaluate this dataset: a direct input approach, a sub-table selection strategy using SQL-like filtering, and a knowledge-guided framework that filters the table using semantic graphs. Experimental results with an LLM show that the graph-guided approach outperforms the others, achieving 74% accuracy, compared to 64% for sub-table selection and 45% for direct input, demonstrating its effectiveness in handling long and complex Arabic tables. Full article

(This article belongs to the Special Issue Deep Learning Approaches for Natural Language Processing)

10 pages, 1340 KB

Open Access Article

Genomic Analysis of Cardiovascular Diseases Utilizing Space Omics and Medical Atlas

by Ryung Lee, Abir Rayhun, Jang Keun Kim, Cem Meydan, Afshin Beheshti, Kyle Sporn, Rahul Kumar, Jacques Calixte, M. Windy McNerney, Jainam Shah, Ethan Waisberg, Joshua Ong and Christopher Mason

Genes 2025, 16(9), 996; https://doi.org/10.3390/genes16090996 - 25 Aug 2025

Viewed by 803

Abstract

Background: The Space Omics and Medical Atlas (SOMA) is an extensive database containing gene expression information from samples collected during the short-duration Inspiration4 spaceflight mission in 2021. Given our prior understanding of the genetic basis for cardiovascular diseases in spaceflight, including orthostatic intolerance and cardiac deconditioning, we aimed to characterize changes in differential gene expression among astronauts using SOMA-derived data and curated cardiovascular pathways. Methods: Using the KEGG 2021 database, we curated a list of genes related to cardiovascular adaptations in spaceflight, focusing on pathways such as fluid shear stress and atherosclerosis, lipid metabolism, arrhythmogenic ventricular hypertrophy, and cardiac muscle contraction. Genes were cross-matched to spaceflight-relevant datasets from the Open Science Data Repository (OSDR). Differential expression analysis was performed using DESeq2 (v1.40.2, R) with normalization by median-of-ratios, paired pre-/post-flight covariates, and log2 fold change shrinkage using apeglm. Differentially expressed genes (DEGs) were defined as |log2FC| ≥ 1 and FDR < 0.05 (Benjamini–Hochberg correction). Module score analyses were conducted across SOMA cell types to confirm conserved cardiac adaptation genes. Results: A total of 185 spaceflight-relevant genes were analyzed. Statistically significant changes were observed in immune-related cardiovascular pathways, particularly within monocytes and T cells. Persistent upregulation of arrhythmogenic genes such as GJA1 was noted at post-flight day 82. WikiPathways enrichment revealed additional pathways, including focal adhesion, insulin signaling, and heart development. Conclusions: Short-duration spaceflight induces significant gene expression changes that are relevant to cardiovascular disease risk. These changes are mediated largely through immune signaling and transcriptional regulation in peripheral blood mononuclear cells. Findings highlight the need for tailored countermeasures and longitudinal monitoring in future long-duration missions. Full article

(This article belongs to the Section Molecular Genetics and Genomics)
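
The DEG criterion quoted in the abstract above (|log2FC| ≥ 1 and FDR < 0.05 after Benjamini–Hochberg correction) can be illustrated with a minimal Python sketch. This is not the authors' DESeq2 pipeline: the function names, gene labels, and numbers below are hypothetical and serve only to show how the two thresholds combine.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    results: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 1.0,
    fdr_cutoff: float = 0.05,
) -> List[str]:
    """Keep genes with |log2FC| >= lfc_cutoff and BH-adjusted p < fdr_cutoff.

    `results` maps a gene label to (log2 fold change, raw p-value).
    """
    genes = list(results)
    fdr = benjamini_hochberg([results[g][1] for g in genes])
    return [
        g
        for g, q in zip(genes, fdr)
        if abs(results[g][0]) >= lfc_cutoff and q < fdr_cutoff
    ]


# Hypothetical toy input, not data from the study.
toy = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (-1.4, 0.010),
    "GENE_C": (0.3, 0.020),
    "GENE_D": (1.2, 0.600),
}
degs = select_degs(toy)
print(degs)
```

In this toy input, GENE_C fails the fold-change cutoff and GENE_D fails the FDR cutoff, so only the first two genes are retained.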

18 pages, 1897 KB

Open Access Article

The Role of the Wnt/β-Catenin Pathway in the Modulation of Doxorubicin-Induced Cytotoxicity in Cardiac H9c2 Cells by Sulforaphane and Quercetin

by Viktória Líšková, Barbora Svetláková and Miroslav Barančík

Int. J. Mol. Sci. 2025, 26(16), 7858; https://doi.org/10.3390/ijms26167858 - 14 Aug 2025

Viewed by 664

Abstract

This study investigates the role of sulforaphane (SFN) and quercetin (QCT) in alleviating oxidative stress and modulating the cellular responses induced by doxorubicin (DOX) in rat H9c2 cardiomyoblast cells. The potential mechanisms involving Wnt/β-catenin signaling and the antioxidant response were examined. We found that SFN effectively mitigated DOX-induced cytotoxicity in H9c2 cells, and its effects significantly exceeded those of QCT. Levels of superoxide dismutase isoforms 1 (SOD-1) and 2 (SOD-2) were upregulated following SFN and QCT pretreatment in cells exposed to DOX. Additionally, β-catenin levels were increased following both SFN and QCT treatment, even in the presence of DOX. For QCT, elevated β-catenin levels were associated with increased phosphorylation and inactivation of glycogen synthase kinase 3-β. The critical role of Wnt/β-catenin signaling in the response of H9c2 cells to DOX was confirmed using the Wnt/β-catenin inhibitor WIKI-4. This inhibitor increased the sensitivity of cells to DOX, and the decreased cellular viability after pretreatment with WIKI-4 was linked to inhibition of SOD activities. In conclusion, sulforaphane and quercetin exert a protective effect against doxorubicin-induced cytotoxicity in H9c2 cells through the Wnt/β-catenin pathway and in association with modulation of enzymes related to the cellular antioxidant response. Full article

(This article belongs to the Special Issue Molecular Research in Cardiovascular Disease, 3rd Edition)

32 pages, 2182 KB

Open Access Article

Detection of Biased Phrases in the Wiki Neutrality Corpus for Fairer Digital Content Management Using Artificial Intelligence

by Abdullah, Muhammad Ateeb Ather, Olga Kolesnikova and Grigori Sidorov

Big Data Cogn. Comput. 2025, 9(7), 190; https://doi.org/10.3390/bdcc9070190 - 21 Jul 2025

Viewed by 1067

Abstract

Detecting biased language in large-scale corpora, such as the Wiki Neutrality Corpus, is essential for promoting neutrality in digital content. This study systematically evaluates a range of machine learning (ML) and deep learning (DL) models for the detection of biased and pre-conditioned phrases. Conventional classifiers, including Extreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), are compared with advanced neural architectures such as Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory (LSTM) networks, and Generative Adversarial Networks (GANs). A novel hybrid architecture is proposed, integrating DistilBERT, LSTM, and GANs within a unified framework. Extensive experimentation with intermediate variants DistilBERT + LSTM (without GAN) and DistilBERT + GAN (without LSTM) demonstrates that the fully integrated model consistently outperforms all alternatives. The proposed hybrid model achieves a cross-validation accuracy of 99.00%, significantly surpassing traditional baselines such as XGBoost (96.73%) and LightGBM (96.83%). It also exhibits superior stability, statistical significance (paired t-tests), and favorable trade-offs between performance and computational efficiency. The results underscore the potential of hybrid deep learning models for capturing subtle linguistic bias and advancing more objective and reliable automated content moderation systems. Full article

18 pages, 2200 KB

Open Access Article

A Self-Supervised Adversarial Deblurring Face Recognition Network for Edge Devices

by Hanwen Zhang, Myun Kim, Baitong Li and Yanping Lu

J. Imaging 2025, 11(7), 241; https://doi.org/10.3390/jimaging11070241 - 15 Jul 2025

Viewed by 620

Abstract

With the advancement of information technology, human activity recognition (HAR) has been widely applied in fields such as intelligent surveillance, health monitoring, and human–computer interaction. As a crucial component of HAR, facial recognition plays a key role, especially in vision-based activity recognition. However, current facial recognition models on the market perform poorly in handling blurry images and dynamic scenarios, limiting their effectiveness in real-world HAR applications. This study aims to construct a fast and accurate facial recognition model based on novel adversarial learning and deblurring theory to enhance its performance in human activity recognition. The model employs a generative adversarial network (GAN) as the core algorithm, optimizing its generation and recognition modules by decomposing the global loss function and incorporating a feature pyramid, thereby solving the balance challenge in GAN training. Additionally, deblurring techniques are introduced to improve the model’s ability to handle blurry and dynamic images. Experimental results show that the proposed model achieves high accuracy and recall rates across multiple facial recognition datasets, with an average recall rate of 87.40% and accuracy rates of 81.06% and 79.77% on the YTF, IMDB-WIKI, and WiderFace datasets, respectively. These findings confirm that the model effectively addresses the challenges of recognizing faces in dynamic and blurry conditions in human activity recognition, demonstrating significant application potential. Full article

(This article belongs to the Special Issue Techniques and Applications in Face Image Analysis)

22 pages, 2204 KB

Open Access Article

Gender Classification Using Face Vectors: A Deep Learning Approach Without Classical Models

by Semiha Makinist and Galip Aydin

Information 2025, 16(7), 531; https://doi.org/10.3390/info16070531 - 24 Jun 2025

Viewed by 2786

Abstract

In recent years, deep learning techniques have become increasingly prominent in face recognition tasks, particularly through the extraction and classification of face vectors. These vectors enable the inference of demographic attributes such as gender, age, and ethnicity. This study introduces a gender classification approach based solely on face vectors, avoiding the use of traditional machine learning algorithms. Face embeddings were generated using three popular models: dlib, ArcFace, and FaceNet512. For classification, the Average Neural Face Embeddings (ANFE) technique was applied by calculating distances between vectors. To improve gender recognition performance for Asian individuals, a new dataset was created by scraping facial images and related metadata from AsianWiki. The experimental evaluations revealed that ANFE models based on ArcFace achieved classification accuracies of 93.1% for Asian women and 90.2% for Asian men. In contrast, the models utilizing dlib embeddings performed notably lower, with accuracies dropping to 76.4% for women and 74.3% for men. Among the tested models, FaceNet512 provided the best results, reaching 97.5% accuracy for female subjects and 94.2% for males. Furthermore, this study includes a comparative analysis between ANFE and other commonly used gender classification methods. Full article

(This article belongs to the Section Artificial Intelligence)
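
The abstract above describes classification by calculating distances between face vectors. One plausible reading, sketched below in Python, is a nearest-centroid rule: average the embeddings of each class, then assign a new embedding to the class whose average is closest. The averaging step, the Euclidean distance, the function names, and the two-dimensional toy vectors are all assumptions made for illustration; the paper's actual ANFE procedure and embedding models (dlib, ArcFace, FaceNet512) are not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(embedding: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose average embedding is nearest to `embedding`."""
    return min(centroids, key=lambda label: euclidean(embedding, centroids[label]))


# Hypothetical 2-D stand-ins for real face embeddings (which have far more dimensions).
train = {
    "female": [[0.9, 0.1], [0.8, 0.2]],
    "male": [[0.1, 0.9], [0.2, 0.7]],
}
centroids = {label: mean_vector(vecs) for label, vecs in train.items()}
prediction = classify([0.7, 0.3], centroids)
print(prediction)
```

A rule of this kind needs no trained classifier on top of the embeddings, which fits the abstract's stated aim of avoiding traditional machine learning algorithms.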

37 pages, 3049 KB

Open Access Article

English-Arabic Hybrid Semantic Text Chunking Based on Fine-Tuning BERT

by Mai Alammar, Khalil El Hindi and Hend Al-Khalifa

Computation 2025, 13(6), 151; https://doi.org/10.3390/computation13060151 - 16 Jun 2025

Cited by 1 | Viewed by 1933

Abstract

Semantic text chunking refers to segmenting text into semantically coherent chunks, i.e., into sets of statements that are semantically related. Semantic chunking is an essential pre-processing step in various NLP tasks, e.g., document summarization, sentiment analysis, and question answering. In this paper, we propose a hybrid, two-step semantic text chunking method that combines unsupervised semantic text chunking, based on the similarities between sentence embeddings, with pre-trained language models (PLMs), specifically BERT fine-tuned on the semantic textual similarity (STS) task, to provide flexible and effective semantic text chunking. We evaluated the proposed method in English and Arabic. To the best of our knowledge, no Arabic dataset has been created to assess semantic text chunking at this level. Therefore, we created AraWiki50k, inspired by an existing English dataset, to evaluate our proposed text chunking method. Our experiments showed that exploiting BERT fine-tuned on STS improves results over unsupervised semantic chunking by an average of 7.4 in the PK metric and 11.19 in the WindowDiff metric on four English evaluation datasets, and by 0.12 in PK and 2.29 in WindowDiff on the Arabic dataset. Full article

(This article belongs to the Section Computational Social Science)
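
The unsupervised step mentioned in the abstract above, chunking based on similarities between sentence embeddings, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's two-step method: a chunk boundary is placed wherever the cosine similarity between adjacent sentence embeddings falls below a threshold. The threshold value, the function names, and the toy two-dimensional vectors are hypothetical; in practice the embeddings would come from a sentence encoder.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_by_similarity(
    sentences: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Start a new chunk whenever adjacent sentences are less similar than `threshold`."""
    if not sentences:
        return []
    chunks: List[List[str]] = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append([])  # similarity dropped: open a new chunk
        chunks[-1].append(sentences[i])
    return chunks


# Hypothetical sentences with toy embeddings; a real encoder would supply these vectors.
sentences = ["s1", "s2", "s3", "s4"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
chunks = chunk_by_similarity(sentences, embeddings)
print(chunks)
```

Here the similarity drops sharply between the second and third sentences, so the sketch splits the text into two chunks of two sentences each.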

15 pages, 1677 KB

Open Access Article

Screening out microRNAs and Their Molecular Pathways with a Potential Role in the Regulation of Parvovirus B19 Infection Through In Silico Analysis

by Vívian de Almeida Salvado, Arthur Daniel Rocha Alves, Wagner Luis da Costa Nunes Pimentel Coelho, Mayla Abrahim Costa, Alexandro Guterres and Luciane Almeida Amado

Int. J. Mol. Sci. 2025, 26(11), 5038; https://doi.org/10.3390/ijms26115038 - 23 May 2025

Viewed by 655

Abstract

Parvovirus B19 (B19V) infection in healthy individuals is commonly asymptomatic or has non-specific symptoms, such as fever, headache, chills, myalgia, rash, and arthralgia. However, some groups of individuals, such as pregnant women, patients with hemolytic disorders, and immunocompromised individuals, may present severe forms of the infection, which may even lead to a negative outcome. To better understand what leads to this divergence of outcomes in different populational groups, this study sought to analyze the role of miRNAs in the pathogenesis of B19V infection. The miRNAs that potentially bind to the B19V transcripts were identified using complete genomic sequences retrieved from Genbank and miRNAs cataloged in miRbase. The results of this alignment between the seed region of the miRNAs with the B19V complete genome identified 1517 miRNAs that showed 100% identity, of which 412 are bound to NS1, VP1, and VP2 transcripts. Based on the number of total binds to the genome, these miRNAs were ranked, and the top five, miR-4799-5p, miR-5690, miR-335-3p, miR-193b-5p, and miR-6771-3p, were selected to evaluate the target genes and signaling pathways in which they act. We identified 214 common genes among the top five miRNAs, and five of these genes bind to at least two of these miRNAs. Based on WikiPathways and KEGG, these 214 genes act on 29 statistically significant pathways, and the three main pathways were selected. Our results revealed some miRNAs that may be involved in regulating B19V replication and that can act as potential biomarkers for the prognosis of infection. Full article

(This article belongs to the Special Issue Regulation by Non-Coding RNAs 2025)

21 pages, 726 KB

Open Access Article

Improving Age Estimation in Occluded Facial Images with Knowledge Distillation and Layer-Wise Feature Reconstruction

by Shuangfei Yu and Qilu Zhao

Appl. Sci. 2025, 15(11), 5806; https://doi.org/10.3390/app15115806 - 22 May 2025

Cited by 1 | Viewed by 975

Abstract

With the widespread application of facial image-based age estimation technologies in fields such as marketing, medical aesthetics, and intelligent surveillance, their importance has become increasingly evident. However, in real-world scenarios, the facial images obtained are often incomplete due to occlusions caused by masks or sunglasses, which obscure the eyes, mouth, or nose to varying degrees. Such occlusions lead to the loss of critical facial feature information, thereby reducing the accuracy of age estimation. Although prior research has explored de-occlusion methods for occluded facial images, there remains a lack of studies focusing on the implicit facial feature information present in fixed occlusion patterns. To address this issue, this study proposes a novel method for reconstructing occluded facial features to enhance age estimation accuracy under occlusion conditions. This study introduces a facial feature reconstruction network based on knowledge distillation and feature reconstruction. The primary objective is to leverage complete facial information from a teacher model to guide a student network in fully extracting effective information from the unoccluded regions of occluded images. Additionally, the proposed method reconstructs feature maps of the occluded regions through a meticulous, layer-wise feature reconstruction process. The reconstructed network can then act as a feature encoder to provide more informative features for the age estimation regression module. Experimental results demonstrate that the proposed approach achieves superior performance in age estimation with randomly occluded images on the MORPH-2, AFAD, CACD, and IMDB-WIKI datasets, with mean absolute errors (MAE) of 4.27, 4.83, 5.15, and 5.71, respectively. These results outperform existing occluded facial age estimation methods based on attention mechanisms and generative facial image reconstruction. Full article

(This article belongs to the Special Issue Application of Artificial Intelligence in Face Recognition Research)

25 pages, 1834 KB

Open Access Article

Modeling Semantic-Aware Prompt-Based Argument Extractor in Documents

by Yipeng Zhou, Jiaxin Fan, Qingchuan Zhang, Lin Zhu and Xingchen Sun

Appl. Sci. 2025, 15(10), 5279; https://doi.org/10.3390/app15105279 - 9 May 2025

Viewed by 670

Abstract

Event extraction aims to identify and structure event information from unstructured text, playing a critical role in real-world applications such as news analysis, public opinion discovery, and intelligence gathering. Traditional approaches, however, struggle with event co-occurrence and long-distance dependencies. To address these challenges, we introduce the Semantic-aware Prompt-based Argument Extractor (SPARE) model, which integrates entity extraction, heterogeneous graph construction, event type detection, and argument filling. By constructing a document–sentence–entity heterogeneous graph and employing graph convolutional networks (GCNs), the model effectively captures global semantic associations and interactions between cross-sentence triggers and arguments. Additionally, a position-aware semantic role (SRL) attention mechanism is proposed to enhance the association between semantic and positional information, improving argument extraction accuracy in the context of event co-occurrence. The experimental outcomes on the Richly Annotated Multilingual Schema-guided Event Structure (RAMS) and WikiEvents datasets display considerable F1 score improvements, which confirms the model’s effectiveness. Full article

12 pages, 8504 KB

Open Access Article

Altered Lactylation Myocardial Tissue May Contribute to a More Severe Energy-Deprived State of the Tissue and Left Ventricular Outflow Tract Obstruction in HOCM

by Ruoxuan Li, Jing Wang, Jia Zhao, Jiao Liu, Yuze Qin, Yue Wang, Yiming Yuan, Nan Kang, Lu Yao, Fan Yang, Ke Feng, Lanlan Zhang, Shengjun Ta, Bo Wang and Liwen Liu

Bioengineering 2025, 12(4), 379; https://doi.org/10.3390/bioengineering12040379 - 3 Apr 2025

Viewed by 1200

Abstract

Hypertrophic cardiomyopathy (HCM) is the most common hereditary cardiovascular disease. In general, obstructive hypertrophic cardiomyopathy (HOCM) is more closely related to severe clinical symptoms and adverse clinical outcomes. Therefore, it is necessary to explore the possible causes of HOCM, which may help physicians better understand the disease and effectively control and manage the progression of the disease. In recent years, the discovery of lactylation has provided scholars with a new direction to explore the occurrence of diseases. In cardiovascular diseases, this post-translational modification can exacerbate cardiac dysfunction, and it can also promote the cardiac repair process after myocardial infarction. In this study, we used the myocardial tissue of mice carrying the Myh7 V878A gene mutation site for protein lactylation detection. Through a further analysis of the enriched pathways using KEGG enrichment, GO enrichment, and Wiki Pathways enrichment, we found that the enriched pathways with lactylation modifications in the HOCM mice mainly included the fatty acid oxidation pathway, the tricarboxylic acid cycle pathway, the adrenergic signaling pathway in cardiomyocytes, and the cardiomyocyte hypertrophy pathway. Among the above pathways, significant changes in lactylation occurred in proteins including Acads, Acaa2, Mdh2, Myl2, and Myl3. We used the COIP experiment to verify the omics results and the ELISA assay to verify the function of the enzymes. We found that a decrease in lactylation modifications also led to a decrease in enzyme function. The abnormalities of these proteins not only lead to abnormalities in energy metabolism in the myocardial tissue of HOCM but also may affect myocardial contractility, resulting in the impaired contractile function of HOCM. The results of this study lay a preliminary theoretical foundation for further exploring the pathogenesis of HOCM. Full article

(This article belongs to the Section Cellular and Molecular Bioengineering)

15 pages, 3027 KB

Open Access Article

TQAgent: Enhancing Table-Based Question Answering with Knowledge Graphs and Tree-Structured Reasoning

by Jianbin Zhao, Pengfei Zhang, Yuzhen Wang, Rui Xin, Xiuyuan Lu, Ripeng Li, Shuai Lyu, Zhonghong Ou and Meina Song

Appl. Sci. 2025, 15(7), 3788; https://doi.org/10.3390/app15073788 - 30 Mar 2025

Cited by 1 | Viewed by 2189

Abstract

Table-based question answering (TableQA) has emerged as an important task in natural language processing, yet existing models face challenges in handling complex reasoning and mitigating hallucinations, especially when dealing with diverse table structures. We introduce TQAgent, a framework designed to enhance table-based reasoning by incorporating knowledge graphs and tree-structured reasoning paths. TQAgent reduces hallucinations and improves model reliability by grounding reasoning in external knowledge and dynamically sampling high-confidence paths. Additionally, it employs knowledge distillation techniques for lightweight deployment. Experimental results on the TabFact, WikiTQ, and FeTaQA datasets show significant performance improvements, with accuracy increases of up to 4% over baseline models. TQAgent’s dynamic operation planning and knowledge graph integration enable effective multi-step reasoning and better handling of diverse table data. Furthermore, the framework achieves state-of-the-art results, surpassing traditional large-scale models in both reasoning accuracy and computational efficiency. These findings open new avenues for future research in table-based question answering and model deployment optimization. Full article

17 pages, 10698 KB

Open Access Article

Unveiling FAM111B: A Pan-Cancer Biomarker for DNA Repair and Immune Infiltration

by Fang Wei, Wanying Li, Ting Zhou, Xianglin Yuan and Lihong Zhang

Int. J. Mol. Sci. 2025, 26(7), 3151; https://doi.org/10.3390/ijms26073151 - 28 Mar 2025

Cited by 3 | Viewed by 1116

Abstract

Recent evidence indicates that FAM111B is significantly involved in the progression of various cancers. Nonetheless, the potential pan-cancer implications of FAM111B have not been systematically investigated. In this study, FAM111B’s expression and oncogenic potential were studied using TCGA and GTEx data via GEPIA2, TIMER2.0, and STRING tools. Pathway enrichment analyses with the GO, KEGG, Reactome, and WikiPathways databases were conducted to explore its role in cancer development. The results were validated via multiplex immunofluorescence assays of pancreatic cancer tissues, microarray assays of ovarian cancer tissues, and protein transcriptomics of ovarian cancer cells. The expression levels of FAM111B were elevated in most cancer types and were associated with poor prognostic outcomes. Mechanistically, FAM111B expression was positively correlated with the expression of genes involved in DNA homologous recombination repair and with the infiltration of Th2 CD4+ T cells. These observations were further substantiated in ovarian cancer cell lines and tissue specimens from pancreatic and ovarian cancers. FAM111B functions as a biomarker for the DNA repair pathway and Th2 CD4+ T-cell infiltration in human malignancies. Full article

(This article belongs to the Section Molecular Immunology)

20 pages, 682 KB

Open Access Article

Sentence Interaction and Bag Feature Enhancement for Distant Supervised Relation Extraction

by Wei Song and Qingchun Liu

AI 2025, 6(3), 51; https://doi.org/10.3390/ai6030051 - 4 Mar 2025

Viewed by 1135

Abstract

Background: Distant supervision employs external knowledge bases to automatically match with text, allowing for the automatic annotation of sentences. Although this method effectively tackles the challenge of manual labeling, it inevitably introduces noisy labels. Traditional approaches typically employ sentence-level attention mechanisms, assigning lower weights to noisy sentences to mitigate their impact. But this approach overlooks the critical importance of information flow between sentences. Additionally, previous approaches treated an entire bag as a single classification unit, giving equal importance to all features within the bag. However, they failed to recognize that different dimensions of features have varying levels of significance. Method: To overcome these challenges, this study introduces a novel network that incorporates sentence interaction and a bag-level feature enhancement (ESI-EBF) mechanism. We concatenate sentences within a bag into a continuous context, allowing information to flow freely between them during encoding. At the bag level, we partition the features into multiple groups based on dimensions, assigning an importance coefficient to each sub-feature within a group. This enhances critical features while diminishing the influence of less important ones. In the end, the enhanced features are utilized to construct high-quality bag representations, facilitating more accurate classification by the classification module. Result: The experimental findings from the New York Times (NYT) and Wiki-20m datasets confirm the efficacy of our suggested encoding approach and feature improvement module. Our method also outperforms state-of-the-art techniques on these datasets, achieving superior relation extraction accuracy. Full article

(This article belongs to the Section AI Systems: Theory and Applications)

17 pages, 766 KB

Open Access Article

Semi-Supervised Relation Extraction Corpus Construction and Models Creation for Under-Resourced Languages: A Use Case for Slovene

by Timotej Knez, Miha Štravs and Slavko Žitnik

Information 2025, 16(2), 143; https://doi.org/10.3390/info16020143 - 15 Feb 2025

Cited by 1 | Viewed by 769

Abstract

The goal of relation extraction is to recognize head and tail entities in a document and determine a relation between them. While a lot of progress was made in solving automated relation extraction in widely used languages such as English, the use of these methods for under-resourced languages and domains is limited due to the lack of training data. In this work, we present a pipeline using distant supervision for constructing a relation extraction corpus in an arbitrary language. The corpus construction combines Wikipedia documents in the target language with relations in the WikiData knowledge graph. We demonstrate the process by constructing a new corpus for relation extraction in the Slovene language. Our corpus captures 20 unique relation types. The final corpus contains 811,032 relations annotated in 244,437 sentences. We use the corpus to train models using three architectures and evaluate them on the task of Slovene relation extraction. We achieve comparable performance to approaches on English data. Full article

(This article belongs to the Special Issue Information Extraction and Language Discourse Processing)

Search Results (122)
