The green sea turtle (Chelonia mydas), a widely distributed species, plays a crucial role in maintaining the marine ecosystem. However, studies on C. mydas require accurate and comprehensive genome annotation information. Long-read direct transcriptome data of C. mydas were obtained using direct RNA sequencing on the Oxford Nanopore Technologies (ONT) platform from blood tissue of a single captive individual. A total of 4061 novel transcripts were obtained by comparing long-read direct transcripts with the genome annotation of C. mydas. We also predicted 2402 CDSs on the novel transcripts. Among them, 1208 (50.29%) had functional annotation information in the databases. In addition, we predicted and analyzed AS events, fusion transcripts, methylation sites, poly(A)s, and lncRNAs in the C. mydas long-read direct transcriptome. Overall, our study provides a long-read direct blood transcriptome for C. mydas to complement and improve its genome annotation. This valuable resource will contribute to future research on C. mydas. Additionally, the analyses of transcriptome structure mentioned above may provide new insights and ideas for the study of C. mydas. Full article

(This article belongs to the Special Issue Evolutionary Biology of Aquatic Animals)

15 pages, 449 KB

Open AccessArticle

Insights into Copy Number Variation Architecture in Black Bengal Goat Genome

by Sonali Sonejita Nayak, Shikha Mittal and Manjit Panigrahi

Int. J. Mol. Sci. 2026, 27(9), 4045; https://doi.org/10.3390/ijms27094045 - 30 Apr 2026

Abstract

Copy number variations (CNVs) are a major source of structural genomic diversity that influence adaptation, reproduction, and production traits in livestock. The Black Bengal goat, an economically important Indian breed known for its high fecundity, superior skin quality, and resilience to humid tropical climates, was studied to uncover its structural genomic landscape. We performed whole-genome CNV analysis using high-depth (10×) sequencing data from eight individuals. A total of 31,816 CNVs were identified, predominantly duplications, with an average length of approximately 45 kb. These CNVs were combined into 8910 copy number variation regions (CNVRs) covering approximately 0.15 Gb (about 5.3% of the autosomal genome). CNVR hotspots were mainly located on chromosome 1. Gene annotation showed that regions overlapping with CNVs and CNVRs contained over 1987 protein-coding genes that are involved in pathways related to immunity, reproduction, metabolism, and extracellular matrix (ECM) organization. The presence of copy number variations involving genes such as GDF9 and BMPR1B, on chromosomes 7 and 6, respectively, is important because it indicates that the breed has a high reproductive capacity due to dosage-sensitive duplications. Changes in the extracellular matrix and increased dermal strength have been linked to duplications of genes such as COL6A1, LAMC2, LAMB3, FMN1, and CLDN1. This helps explain the superior hide quality of the breed. This research offers a comprehensive map of CNVs and CNVRs within the genome of the Black Bengal goat. It demonstrates how these duplications lead to structural changes that enhance both reproductive performance and skin resilience. These findings provide a valuable genomic resource for future marker-assisted selection, comparative genomics, and conservation breeding programs aimed at preserving indigenous goat populations. Full article

(This article belongs to the Special Issue 25th Anniversary of IJMS: Updates and Advances in Molecular Genetics and Genomics)

8 pages, 528 KB

Open AccessData Descriptor

Whole-Genome Sequencing Dataset from Two High-Risk Breast Cancer Families Negative for BRCA1/2 and Other Known Susceptibility Genes

by Silvia González-Martínez, Alejandra Rezqallah Arón, José Manuel Pérez-García, José Palacios, Belén Pérez-Mies, Javier Román, Laia Garrigos, Judith Balmaña, Daniela Camacho, Sandra Íñiguez-Muñoz, Diego M. Marzese and Javier Cortés

Data 2026, 11(5), 99; https://doi.org/10.3390/data11050099 - 30 Apr 2026

Abstract

Hereditary breast cancer (BC) remains unexplained in a substantial proportion of families who test negative for BRCA1/2 and other known susceptibility genes. To contribute to the genomic characterization of these unresolved cases, we generated a whole-genome sequencing (WGS) dataset from six women belonging to two unrelated high-risk families, each comprising three sisters diagnosed with BC. All participants had previously received negative results in conventional multigene panel testing. WGS was performed on peripheral blood DNA using the Illumina NovaSeq platform, followed by variant calling against GRCh38 and the comprehensive annotation of single-nucleotide variants, indels, and structural variants. For each family, we identified shared ClinVar-annotated variants, rare exonic or splice-site alterations, and intronic variants located within a curated set of 286 cancer-related genes. The dataset includes per-patient VCF files, copy number variation annotations, and family-level variant summaries. Raw and processed data are publicly available through the Sequence Read Archive and Zenodo. This resource supports variant reinterpretation, exploration of regulatory and intronic regions, and methodological benchmarking in the study of familial BC beyond established susceptibility genes. Full article

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

10 pages, 2106 KB

Open AccessArticle

Identification and Characterization of a Novel Bovine Adenovirus Which Represents a Distinct Evolutionary Branch

by Jinyu Sui, Suchun Wang, Zihao Pan and Kaicheng Wang

Viruses 2026, 18(5), 522; https://doi.org/10.3390/v18050522 - 30 Apr 2026

Abstract

Bovine adenovirus (BAdV) is associated with respiratory and enteric diseases in cattle. In this study, the complete genome of a novel BAdV strain (named BAdV/LN/CHN/2023) was sequenced and annotated using next-generation sequencing (NGS). The viral genome comprises 32,391 base pairs with a GC content of 44.93%, encoding 33 predicted open reading frames (ORFs), consistent with the genomic organization of mastadenoviruses. Comparative genomic analysis confirmed that BAdV/LN/CHN/2023 contains conserved structural and functional motifs characteristic of the genus Mastadenovirus. Phylogenetic analysis revealed that BAdV/LN/CHN/2023 shares low similarity with all currently recognized bovine mastadenoviruses classified by the International Committee on Taxonomy of Viruses (ICTV). In addition, an ORF encoding the 146R protein was annotated in this strain; this feature has not been identified in any previously recognized bovine mastadenovirus. This study presents the first full-length genomic sequence of a putative BAdV-11 strain, and based on ICTV criteria, we propose that this strain represents a novel mastadenovirus species, supported by phylogenetic distance and genomic divergence. Our findings expand the known genetic diversity of BAdVs and contribute to a better understanding of their evolutionary relationships. Full article

(This article belongs to the Section Animal Viruses)

24 pages, 2196 KB

Open AccessArticle

Regulatory Variation at TERT and TERC Shows Limited Association with Early-Onset Alzheimer’s Disease in Carriers of the Mexican Founder Mutation PSEN1 A431E

by Celeste Patricia Gazcón-Rivas, Iliannis Yisel Roa-Bruzón, Luis Félix Duany-Almira, Cesar Aly Valdéz-Gaxiola, Sofia Dumois-Petersen, Luis Eduardo Figuera-Villanueva, Antonio Quintero-Ramos, Carmen Magdalena Gurrola-Díaz, Daniel Ortuño-Sahagun, Yeminia Valle and Oscar Arias-Carrión

Med. Sci. 2026, 14(2), 228; https://doi.org/10.3390/medsci14020228 - 30 Apr 2026

Abstract

Background: Early-onset Alzheimer’s disease (EOAD) caused by autosomal dominant mutations provides a deterministic framework for investigating genetic modifiers of neurodegeneration. Telomere biology has emerged as a central regulator of genomic stability, cellular ageing, and stress response integration, yet its role in EOAD, particularly in under-represented populations, remains poorly defined. Methods: We conducted a cross-sectional case–control study to evaluate the genetic distribution, disease association, and predicted regulatory consequences of common variants in the telomere maintenance genes TERT and TERC in individuals from Western Mexico. The EOAD group comprised genetically confirmed carriers of the PSEN1 p.Ala431Glu (A431E) founder mutation with clinical EOAD (n = 69), and controls were unrelated individuals without dementia (n = 179). Five common variants were analyzed: rs2242652, rs2853677, rs2736100, and rs10069690 (TERT), and rs12696304 (TERC). Results: Genotype distributions in controls conformed to the Hardy–Weinberg equilibrium. Single-variant analyses showed no significant allele-level associations with EOAD. However, a preliminary genotype-level enrichment for the GC genotype at rs12696304 (TERC) was observed among EOAD cases compared with controls. Linkage disequilibrium analysis revealed low r² values (<0.20), supporting variant independence. Population-level allele frequency comparisons revealed ancestry-dependent divergence across loci; in silico functional annotation localised all variants to non-coding regulatory regions. GTEx-based analyses indicated that rs12696304 acts as an eQTL for ACTRT3 in whole blood and pituitary, as well as for LRRC34 in the cerebellar hemisphere, suggesting a potential regulatory network within the TERC locus (3q26.2).
Conclusions: Overall, common regulatory variants in TERT did not show strong independent effects on EOAD susceptibility in PSEN1 A431E carriers. However, the convergence of association patterns, functional annotation, and regulatory evidence provides hypothesis-generating support for the TERC locus (3q26.2), particularly rs12696304, as a candidate region for further investigation. Additional studies integrating telomere dynamics, functional validation, and multi-omics analyses are needed to clarify the role of telomere biology in the pathogenesis of autosomal dominant EOAD. Full article
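
The Hardy–Weinberg check mentioned in the Results above can be illustrated with a minimal Python sketch. This is not the authors' analysis: the function name `hwe_chi_square` and the genotype counts in the usage lines are hypothetical, and the abstract does not state which test or software was used. The sketch computes the standard one-degree-of-freedom chi-square statistic comparing observed genotype counts at a biallelic locus with those expected from the estimated allele frequencies.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic for Hardy-Weinberg equilibrium at a biallelic locus.

    n_aa, n_ab, n_bb are observed counts of the two homozygous genotypes
    and the heterozygous genotype (hypothetical inputs for illustration).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts that match HWE expectations exactly (p = 0.5) give a statistic of 0.
perfect = hwe_chi_square(25, 50, 25)

# A strong heterozygote deficit exceeds 3.841, the 5% critical value for 1 df.
deficit = hwe_chi_square(45, 10, 45)
```

A statistic below 3.841 is conventionally read as no significant departure from equilibrium at the 5% level; exact tests are often preferred when genotype counts are small.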

(This article belongs to the Section Neurosciences)

18 pages, 6709 KB

Open AccessArticle

Genomic Diversity of Vaginal Lactobacillus crispatus Prophages from South African Women

by Adijat Ozohu Jimoh, Anika Chicken, Brandon Maust, Colin Feng, Seth Rakoff-Nahoum, Jo-Ann S. Passmore, Brian R. Kullin, Simona Kraberger, Fatima Aysha Hussain, Heather B. Jaspan, Arvind Varsani and Anna-Ursula Happel

Viruses 2026, 18(5), 519; https://doi.org/10.3390/v18050519 - 30 Apr 2026

Abstract

Lactobacillus crispatus is widely associated with optimal sexual and reproductive health outcomes. While L. crispatus genomes commonly harbor prophages, little is known about their genomic diversity and potential inducibility by clinically relevant compounds. We induced and characterized four bacteriophages from four L. crispatus strains isolated from vaginal secretions of South African adolescents. Sequenced viral DNA from induced phages was assembled, and their respective genomes were annotated and compared to bacteriophage reference genomes. All the phage genomes range in size from 42.9 to 48.3 kbp. Of the four phages, UC101 and UC164 shared <90% pairwise intergenomic similarity to reference phages, suggesting that they represent new species. To explore factors potentially associated with prophage activation, L. crispatus strains were exposed to physiological concentrations of copper ions and tenofovir, selected based on their common use by women in Africa and reported associations with altered vaginal bacterial community composition. The presence of phage-like particles following exposure to copper ions (2.0 × 10⁻⁶ M–3.0 × 10⁻⁶ M) and tenofovir (500 ng/mL) was observed by transmission electron microscopy, suggesting possible prophage activation under these conditions. This study provides new insights into the genomic diversity of inducible L. crispatus phages and presents hypothesis-generating evidence regarding their potential inducibility using copper ions and tenofovir. Full article

(This article belongs to the Special Issue Viruses in the Reproductive Tract)

19 pages, 3739 KB

Open AccessArticle

Bacillus velezensis M4 from Northeast Chinese Soybean Paste Combines Nattokinase and Antibacterial Activities

by Yin Feng, Yuexin Gao, Linxi Wang, Bo Nan, Jingsheng Liu and Yuhua Wang

Foods 2026, 15(9), 1553; https://doi.org/10.3390/foods15091553 - 30 Apr 2026

Abstract

A bacterial strain M4 exhibiting high nattokinase (NK) activity and favorable antibacterial properties was isolated from fermented soybean paste in Northeast China. Based on morphological observation, physiological and biochemical characterization, 16S rDNA sequence analysis, and whole-genome sequencing, the strain was identified as Bacillus velezensis. Its probiotic potential and safety were systematically evaluated using a combination of in vitro assays and genome mining. Genomic analysis revealed that M4 possessed a complete genome consisting of a single circular chromosome of 4,473,838 bp with a GC content of 46.94%, encoding 4516 predicted proteins. Functional domain annotation identified four proteins (XLQ58132.1, XLQ58158.1, XLQ59409.1, and XLQ59873.1) containing both the Peptidase inhibitor I9 and Peptidase S8 domains, confirming the presence of the typical molecular signature of NK. Furthermore, the genome harbored 132 genes encoding carbohydrate-active enzymes, 37 biosynthetic gene clusters, and 142 genes encoding proteolytic enzymes. Comparative genomic analysis revealed a close phylogenetic relationship with other B. velezensis strains and identified 98 strain-specific genes. Safety assessment demonstrated that M4 exhibited no hemolytic activity, was susceptible to eight commonly tested antibiotics, and lacked genes encoding high-risk virulence factors. Probiotic characterization indicated that M4 exhibited certain levels of gastrointestinal tolerance, acid resistance, bile salt resistance, antioxidant activity, and antibacterial properties. In conclusion, B. velezensis M4 shows potential for development as a functional strain. Full article

(This article belongs to the Section Food Microbiology)

17 pages, 9499 KB

Open AccessArticle

Genome-Based Analysis of Chromosomal Colistin Non-Susceptibility in Stenotrophomonas pavanii Isolated from the Phycosphere of Pectinodesmus pectinatus

by Heejin Ahn, Hyunwoo Zin, Muhammad Akmal and Tae-Jin Choi

Antibiotics 2026, 15(5), 451; https://doi.org/10.3390/antibiotics15050451 - 30 Apr 2026

Abstract

Background/Objectives: Freshwater microalgae–bacteria consortia are increasingly utilized in wastewater treatment and biomass production. However, bacteria associated with the algal phycosphere may act as environmental reservoirs of multidrug-resistant (MDR) phenotypes and antibiotic resistance genes (ARGs), including resistance to last-resort antibiotics such as colistin. Methods: An axenic culture of the freshwater microalga Pectinodesmus pectinatus was established using a NaClO-based cleaning protocol. Three phycosphere-associated bacterial strains (Chryseobacterium sp., Pseudomonas monteilii, and Stenotrophomonas pavanii) were isolated and identified by 16S rRNA gene analysis. Antimicrobial susceptibility testing was performed using broth microdilution against 16 antibiotics. Whole-genome sequencing of the most resistant isolate, S. pavanii, was conducted using Oxford Nanopore technology, followed by genome annotation and in silico resistome analysis using CARD, AMRFinderPlus, and ResFinder. Results: Among the three isolates, S. pavanii exhibited the broadest resistance profile, including high minimum inhibitory concentrations (MICs) to multiple β-lactams and colistin (MIC ≥ 16 μg/mL). No plasmid-borne mcr genes were detected. Instead, the genome encoded multiple chromosomal determinants potentially associated with polymyxin non-susceptibility, including lipid A and lipopolysaccharide modification pathways (e.g., arn genes and eptA), outer-membrane maintenance and LPS transport systems, multidrug efflux pumps, and regulatory elements. Integration of genomic and phenotypic data suggested that the observed colistin non-susceptibility may be associated with intrinsic chromosomal determinants inferred from whole-genome analysis. Conclusions: This study demonstrates that the P. pectinatus phycosphere can harbor multidrug-resistant (MDR) bacteria, including strains exhibiting colistin non-susceptibility potentially associated with a repertoire of intrinsic chromosomal resistance mechanisms inferred from genomic analysis. Therefore, freshwater microalgae-based systems should be considered potential environmental reservoirs contributing to the dissemination of antimicrobial resistance. Full article

(This article belongs to the Section Genetic and Biochemical Studies of Antibiotic Activity and Resistance)

21 pages, 3794 KB

Open AccessArticle

Type 1 Diabetes and Multiple Sclerosis Share General Autoimmunity Genetic Variation

by Maristella Steri, Alessandro Testori, Valeria Orrù and Magdalena Zoledziewska

Genes 2026, 17(5), 531; https://doi.org/10.3390/genes17050531 - 30 Apr 2026

Abstract

Background/Objectives: Type 1 diabetes (T1D) and multiple sclerosis (MS) are autoimmune, multifactorial, organ-specific disorders mediated by immune cells. Their co-occurrence has been partially attributed to shared genetics and environmental factors. We aimed to dissect the shared genetic architecture between T1D and MS using large-scale genome-wide association studies (GWASs) and colocalization analyses. Methods: We applied a Bayesian colocalization framework to two large-scale GWAS data sets: a T1D study comprising 18,942 cases and 501,638 controls, and an MS GWAS including 14,802 cases and 26,703 controls. Results: We identified 26 shared colocalizing association signals between T1D and MS. Among them, seven loci (EOMES, RGS14, DLL1, ZNF438/ZEB1, SESN3, WARS1/SLC25A47, and IRF8) were novel for T1D and two (UBAC2 and LAT) for MS. Several signals showed supportive evidence in additional datasets and demonstrated functional annotation characteristics consistent with disease involvement. Conclusions: Colocalization can be a powerful discovery tool for disorders with shared genetic architecture, as prioritizing shared rather than individual causal variants may enhance the detection of novel loci. Our findings indicate that T1D and MS predominantly share general autoimmune susceptibility signals (17/26) rather than disease-specific (private) signals, the latter often with opposite directions of effect (9/26), underscoring their immunological heterogeneity. Full article

(This article belongs to the Special Issue Genetic Aspects of Autoimmune Diseases)

19 pages, 2388 KB

Open AccessArticle

Machine Learning-Based Genome-Wide Association Study Reveals Genetic Loci Associated with Body Measurement Traits in Yili Horses

by Zhehong Shen, Liping Yang, Yuheng Xue, Xiaokang Chang, Jingxuan Shen, Weijun Sun, Yaqi Zeng, Jun Meng and Xinkui Yao

Animals 2026, 16(9), 1373; https://doi.org/10.3390/ani16091373 - 29 Apr 2026

Abstract

Body measurement traits are key indicators for evaluating growth performance, production potential, and breeding value in Yili horses. However, studies investigating the association between body measurement traits and mutation loci in Yili horses remain limited. In this study, 255 adult Yili mares were used as the study population, including 152 speed-type and 103 meat-type individuals. Whole-genome resequencing was performed, and four phenotypic traits and body weight were measured. A mixed linear model (MLM)-based genome-wide association study (GWAS) was conducted using GEMMA (v 0.98.5), incorporating age, farm effects, and the top three principal components as covariates. In parallel, a machine learning-based GWAS (ML-GWAS) framework integrating Lasso regression for feature selection and Random Forest (RF) with five-fold cross-validation was applied to improve the detection of complex genetic signals. Using both conventional GWAS methods and machine learning-based GWAS approaches, a total of 238 mutation loci significantly associated with body measurement traits were identified, and 277 candidate genes were annotated. These genes may play a role in several biological processes, including skeletal development, muscle formation, cell growth, energy metabolism, and protein synthesis. The results indicate that genetic differences have already emerged among different Yili horse populations at the genomic level. Furthermore, this study demonstrates that integrating machine learning with conventional GWAS effectively improves the detection efficiency of loci associated with complex traits, while also providing new molecular evidence for understanding the genetic mechanisms underlying differences in body measurement traits among Yili horse groups. Full article

(This article belongs to the Special Issue Advances in Genetic Variability and Selection of Equines)

18 pages, 1512 KB

Open AccessArticle

STEA: Histologically Validated and Reference-Independent Major Cell-Type Annotation for Spatial Transcriptomics Reveals Relevant Cellular Organization and Architecture of Tumor Microenvironment

by Qian Li, Qingyang Zhang, Fanhong Zeng, Irene Oi-Lin Ng and Daniel Wai-Hung Ho

Cancers 2026, 18(9), 1425; https://doi.org/10.3390/cancers18091425 - 29 Apr 2026

Abstract

Background: Recent advances in spatial transcriptomic technologies enable in situ gene expression profiling while preserving spatial context. This capability is particularly important for studying the tumor microenvironment (TME), where diverse and admixed cell populations interact within highly organized spatial niches that influence tumor progression and therapeutic response. However, the limited resolution of early spatial transcriptomic platforms results in each spatial spot capturing transcripts from multiple cell types, making accurate spot deconvolution or annotation a critical yet challenging step in downstream data analysis. This complexity is particularly prominent in heterogeneous samples such as the TME, where multiple cell types are highly admixed and reliable single-cell reference atlases are often unavailable. Methods: We developed STEA, a novel and accurate reference-independent, enrichment-based annotation algorithm for major cell types. Unlike existing approaches, STEA does not require single-cell RNA sequencing datasets as a reference, offering both flexibility and computational efficiency in execution. Results: We performed comprehensive benchmarking using a variety of simulated datasets across different platforms and scenarios and demonstrated the superior accuracy of STEA. Apart from synthetic data, we also evaluated multiple real datasets to further exemplify its practical applicability on both oncology-related and oncology-unrelated data. More importantly, we demonstrated high concordance between STEA predictions and histological classification by an experienced pathologist.
Conclusion: Our STEA algorithm provides a practical reference-independent framework to complement the cutting-edge spatial transcriptomics in genomics studies, facilitating accurate downstream high-dimensional spatial characterization of cellular and molecular landscapes, reconstruction of tissue architecture as well as cell–cell communication in malignant and non-malignant scenarios. Taken together, our comprehensive evaluation demonstrates the robustness and reliability of STEA, highlighting its potential as a valuable tool for studying complex tissue organization, particularly within heterogeneous TME. Full article

(This article belongs to the Special Issue From Molecular Genomics and the Tumor Microenvironment to Precision Diagnosis and Therapy in Liver Cancer)

19 pages, 1738 KB

Open AccessArticle

Whole-Genome Sequencing in Premature Coronary Artery Disease in South Asians: A Pilot Case–Control Study

by Iftikhar Ali Ch, Azhar Chaudhry, Fazal Jalil, Yasir Ali, Waseem Iqbal, Yusra Javed, Salman Khalid, Azeen Razzaq, Muhammad Azhar, Amna Nadeem, Tayyab Afzal, Naeem Tahirkheli, Ankur Kalra and Khurram Nasir

Cardiogenetics 2026, 16(2), 9; https://doi.org/10.3390/cardiogenetics16020009 - 29 Apr 2026

Abstract

Background/Objectives: Coronary artery disease (CAD) remains the leading cause of mortality worldwide, with South Asia bearing a disproportionately high and rising burden, particularly at younger ages. The present study aimed to investigate genetic variants associated with premature coronary artery disease (PCAD) using whole-genome sequencing (WGS). Methods: WGS was conducted on 12 individuals (five PCAD cases, seven matched controls) to assess feasibility and methodology for future large-scale research. High-quality genomic DNA was sequenced at a minimum read depth of 10× with a quality threshold of Q30. Variant calling with stringent quality control identified single-nucleotide polymorphisms (SNPs), followed by annotation against gnomAD for allele frequencies and ClinVar for pathogenicity. Protein-coding variants were filtered, and candidate genes were prioritized for comparative analysis between cases and controls. Results: An average of over 8.8 million SNPs per individual was identified, with comparable overall variant distributions between cases and controls. Initial analyses revealed 120 SNPs exclusively present in PCAD cases. All protein-coding variants were rare (allele frequency < 0.0001), and none were previously classified as pathogenic in ClinVar. After filtration, 87 candidate genes were prioritized. Enriched or unique variants in PCAD cases mapped to genes involved in lipid metabolism, endothelial dysfunction, inflammatory signaling, immune regulation, thrombosis, vascular remodeling, and metabolic processes. Additional variants were identified in genes related to smooth muscle proliferation, oxidative stress, and other biological pathways. Conclusions: This WGS pilot study provides an initial overview of the genomic landscape of PCAD in a South Asian cohort, highlighting rare variants across multiple biological pathways implicated in atherosclerosis that need validation in a large-scale study. Full article

(This article belongs to the Topic Biomarkers in Cardiovascular Disease—Chances and Risks, 2nd Volume)

18 pages, 2423 KB

Open AccessArticle

UK Biobank-Based Genetic and Proteomic Network Insights into Metabolic Dysfunction-Associated Steatotic Liver Disease Pathogenesis

by Sang Wook Kang, Su Kang Kim, Ju Yeon Ban and Min Su Park

Int. J. Mol. Sci. 2026, 27(9), 3920; https://doi.org/10.3390/ijms27093920 - 28 Apr 2026

Viewed by 69

Abstract

Metabolic dysfunction-associated steatotic liver disease (MASLD) is increasingly recognized as a systemic disorder shaped by genetic variants and network-level interactions beyond obesity and insulin resistance. This study aimed to define the genetic and proteomic architecture of MASLD by integrating GWAS and plasma proteomic profiling from the UK Biobank. Genome-wide association analyses were conducted under additive and dominant models, with functional annotations performed using SIFT, PolyPhen-2, PROVEAN, REVEL, CADD, MutationTaster, and conservation metrics (GERP++, phyloP, phastCons, and B-statistic). Differential protein expression was assessed using the Olink® platform, and STRING was applied for protein–protein interaction analysis. MASLD patients showed male predominance and significant differences in hepatic (AST, ALT, GGT, PDFF), metabolic (glucose, triglycerides, TyG index), and inflammatory markers (CRP, neutrophils, NLR, CAR). GWAS confirmed PNPLA3 (rs738409, I148M) and TM6SF2 (rs58542926, E167K) as major risk variants, while SAMM50 and NCAN showed weaker but conserved associations. Proteomics revealed downregulation of IGFBP2, IGFBP1, PON3, CKB, and APOF and upregulation of CPM, IGSF9, GUSB, ACY1, AFM, LEP, and GSTA1/3. PPI analysis identified ADIPOQ, LEP, FGF21, and ADH1B as central hubs in metabolic and inflammatory regulation. MASLD should be regarded as a network disease involving lipid metabolism, insulin/IGF signaling, mitochondrial function, and ECM–inflammatory pathways. These findings highlight PNPLA3 and TM6SF2 as major genetic drivers, while SAMM50, NCAN, and peripheral proteins contribute regulatory roles, suggesting novel biomarkers and therapeutic targets. Full article

(This article belongs to the Special Issue 25th Anniversary of IJMS: Advances in Molecular Endocrinology and Metabolism)

20 pages, 925 KB

Open AccessReview

Integrating Protein Language Models with Multimodal Embeddings to Accelerate Function Prediction of Uncharacterized Proteins

by Ruyang Cheng, Tianyu Liu, Chentao Liao, Xiaomin Wu, Lingyun Zhu and Shaowei Zhang

Int. J. Mol. Sci. 2026, 27(9), 3891; https://doi.org/10.3390/ijms27093891 - 27 Apr 2026

Viewed by 110

Abstract

Accurate prediction of protein function is fundamental to progress in biotechnology and biomedicine, yet progress remains severely hampered by the widening chasm between exponentially growing genomic data and the limited capacity for functional annotation. High-throughput sequencing and metagenomics have driven an explosion in sequence data that far outstrips experimental characterization. UniProt now contains over 203 million protein entries, of which only ~2% have been experimentally validated. This widening “sequence–function gap” exceeds the reach of traditional homology-based tools such as BLAST (v2.17.0) and HMMER (v3.2), which are inherently constrained by sequence identity thresholds. The emergence of Protein Language Models (PLMs), including ESM and ProtTrans, has introduced a transformative paradigm, thereby shifting functional inference from similarity-based retrieval to geometric reasoning within learned semantic spaces. Nevertheless, current approaches remain largely confined to unimodal or narrowly bimodal frameworks, failing to capture the inherently multidimensional determinants of enzymatic function, including active-site geometry, chemical reaction logic, and literature-embedded semantic context. This review systematically adopts a multimodal global-fusion perspective, elucidating how three-dimensional geometric features, chemical reaction semantics, and textual knowledge graphs are synergistically integrated around PLMs as a core backbone. We delineate complementary mechanisms and integration strategies that together enable fine-grained protein function annotation beyond the performance ceiling of single-sequence methods. Furthermore, we survey the translational potential of such frameworks from computational prediction to real biological applications, and critically examine persistent bottlenecks including activity cliffs, transition-state inference, and conformational dynamics. 
We identify the integration of physics-informed machine learning with dynamics-aware architectures as a pivotal direction toward a causal, mechanism-level understanding of protein function. Full article

(This article belongs to the Special Issue Advances in Protein Structure-Function and Drug Discovery)

11 pages, 12230 KB

Open AccessArticle

Molecular Characterization and Comparative Genomics of Two Staphylococcus pseudintermedius Strains from Humans in Egypt

by Ola K. Elsakhawy, Haitham Elaadli, Yassien Badr, May Raouf, Stephen A. Kania, Hend Altaib and Mohamed A. Abouelkhair

Vet. Sci. 2026, 13(5), 424; https://doi.org/10.3390/vetsci13050424 - 27 Apr 2026

Viewed by 150

Abstract

Staphylococcus pseudintermedius is an opportunistic bacterium previously associated with dogs but has recently been found in human infections, raising zoonotic concerns. Genomic characterization of human S. pseudintermedius isolates can provide preliminary information on antibiotic resistance, pathogenicity, and genomic features relevant to host range. Two S. pseudintermedius isolates (hereafter referred to as S. pseudintermedius EGH1 and S. pseudintermedius EGH2) from human clinical samples in Egypt were sequenced using the Illumina NovaSeq X Plus platform. To assess genetic relatedness to human S. pseudintermedius isolates worldwide, multilocus sequence typing (MLST), pangenome analysis, and antimicrobial resistance gene profiling were performed. The sequencing produced a total of 9,499,989 reads for S. pseudintermedius EGH1 and 9,567,531 reads for S. pseudintermedius EGH2. Sequences were assembled with Geneious Prime® 2025 and annotated using NCBI Prokaryotic Genome Annotation Pipeline v6.10. Pangenome analysis identified 9574 genes, comprising 1681 core genes (17.56%), 180 soft-core genes (1.88%), 837 shell genes (8.74%), and 6876 cloud genes (71.82%). MLST was conducted on human S. pseudintermedius genome assemblies using MLST v2.23.0. The analysis revealed both isolates as novel sequence types: S. pseudintermedius EGH1 was assigned ST-3037 with a new allele (purA-107), and S. pseudintermedius EGH2 was assigned ST-2874. Clonal relationships among S. pseudintermedius isolates were evaluated using the eBURST algorithm. This study presents the first next-generation genome sequencing and comparative genomic analysis of S. pseudintermedius isolates from humans in Egypt. Future studies integrating genomic, epidemiological, and phenotypic data are required. Full article
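
The pangenome category shares quoted in this abstract can be rechecked with a short Python sketch. The gene counts are taken from the abstract; the variable names are illustrative only, and the sketch does not reproduce the pangenome analysis itself, only the arithmetic behind the reported percentages.

```python
# Gene counts per pangenome category, as reported in the abstract.
counts = {"core": 1681, "soft-core": 180, "shell": 837, "cloud": 6876}

# The categories should add up to the reported pangenome size (9574 genes).
total = sum(counts.values())

# Share of each category as a percentage, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} genes ({pct}%)")
```

The four categories sum to the stated total, and the rounded shares agree with the percentages given in the abstract.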

(This article belongs to the Special Issue Molecular Insights into Zoonotic and Animal Pathogens: A One Health Perspective)
