Genes

2026

16 pages, 2778 KB

Open Access | Article

Genome-Wide Characterization and Transcriptional Profiling of the WRKY Gene Family During Heartwood Formation in Dalbergia odorifera

by Ruoke Ma, Yueyao Xu, Heng Liu, Qianying Wei, Jia Luo, Boling Liu and Yunlin Fu

Genes 2026, 17(4), 386; https://doi.org/10.3390/genes17040386 - 28 Mar 2026

Viewed by 524

Abstract

Background: The WRKY transcription factor family represents one of the most crucial transcription factor families in plants, regulating diverse physiological processes. The heartwood of Dalbergia odorifera is a prized material for both high-quality rosewood and traditional medicinal applications, exhibiting exceptional economic value. However, the roles of WRKY transcription factors in the growth and development of D. odorifera, particularly their correlation with heartwood formation, remain unexplored. Methods: WRKY transcription factors were identified through bioinformatics analysis using the published genome data of D. odorifera. Phylogenetic comparative analysis was performed based on the Arabidopsis classification system. Collinearity analysis was conducted to investigate the evolutionary dynamics and expansion mechanisms of the WRKY gene family, and differential expression analysis was performed across tissues. Results: A total of 94 WRKY genes were unevenly distributed across 10 chromosomes and systematically designated as DodWRKY1 to DodWRKY94 according to their chromosomal positions. The WRKY family was classified into three major clades (Groups I, II, and III), with Group II further subdivided into five subgroups (IIa–IIe). Purifying selection served as the primary force shaping the WRKY family, with whole-genome or segmental duplication acting as the dominant expansion mechanism; these duplication events contributed to functional divergence, whereas genes within the same subgroup retained conserved structural features and motif compositions. DodWRKY14 (subgroup IIb) and DodWRKY58/68 (subgroup IIc) were highly expressed in the transition zone, suggesting a potential involvement in heartwood formation. 
Conclusions: This study provides a comprehensive characterization of the DodWRKY family and identifies candidate genes associated with heartwood formation, thereby establishing a foundation for further investigation into the molecular mechanisms underlying heartwood development.

28 pages, 25430 KB

Open Access | Article

Unraveling Circadian Rhythm Disorder-Related Gene Signatures and Molecular Subtypes in Ulcerative Colitis: An Analysis of Bulk and Single-Cell Transcriptomics

by Meng Sun, Xiaowei Fu, Xiaoyun Zhu, Dingqiao Xu, Shengyu Zhang, Yingshu Tan, Yaqing Mao, Yongming Li and Shanting Liao

Genes 2026, 17(4), 383; https://doi.org/10.3390/genes17040383 - 27 Mar 2026

Viewed by 855

Abstract

Background: Ulcerative colitis (UC) is an intestinal disease characterized by long-term inflammation. Circadian rhythm disorder (CRD) affects various biological activities and has been linked to several diseases, including UC. This study aimed to investigate the role and significance of CRD in UC. Methods: Bulk RNA-seq data from five independent UC cohorts were obtained from the Gene Expression Omnibus (GEO) database and integrated into a single dataset. The dataset underwent differential analysis to identify differentially expressed genes (DEGs) in association with CRD. Expression levels and pathway enrichment of CRD genes were analyzed, and signature genes were identified using machine learning algorithms. Based on these signature genes, a UC risk prediction model and CRD-related molecular subtypes were established. Furthermore, single-cell RNA-seq data of UC were analyzed to discuss the key role of CRD and signature genes in the UC microenvironment. RT-PCR analysis was employed to validate the expression levels of the identified signature genes. Results: 247 DEGs associated with CRD in UC were identified (referred to as CRD-DEGs). Gene set enrichment analysis (GSEA) revealed a strong association between CRD and inflammation, as well as immune cell infiltration in UC. This association potentially impacts intestinal fibrosis. A comparison of three machine learning algorithms (Lasso, SVM-RFE, and Random Forest) resulted in the identification of 12 signature genes. A UC risk prediction model and two UC CRD subtypes were developed using these genes. Among them, STXBP1 was identified by all three machine learning algorithms and was further analyzed. STXBP1 was predominantly enriched in pathways related to inflammatory response. Elevated levels of STXBP1 are mainly caused by reduced levels of methylation of its gene promoter. RT-PCR confirmed elevated expression of certain genes in mouse UC models. 
Conclusions: This study is the first to establish a strong association between CRD and the onset of UC. The newly developed UC nomogram based on CRD demonstrated high predictive accuracy, although further clinical validation is required. Clarifying the intrinsic relationship between CRD and UC improves our understanding of the potential pathogenesis of UC. This study introduces novel ideas and methods for early diagnosis, treatment, and prognosis of UC.
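
The abstract reports that STXBP1 was selected by all three machine learning algorithms. A minimal sketch of that kind of consensus step is given below. It assumes each algorithm yields a set of selected genes; apart from STXBP1, the gene names are hypothetical placeholders, and the abstract does not state how the 12 signature genes were combined, so the voting rule here is only an illustration.

```python
from collections import Counter


def consensus_genes(selections, min_votes):
    """Return genes chosen by at least `min_votes` of the selection methods."""
    votes = Counter(gene for genes in selections.values() for gene in set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_votes)


# Hypothetical selections: only STXBP1 is named in the abstract,
# GENE_A..GENE_D are placeholders for illustration.
selections = {
    "lasso": {"STXBP1", "GENE_A", "GENE_B"},
    "svm_rfe": {"STXBP1", "GENE_B", "GENE_C"},
    "random_forest": {"STXBP1", "GENE_A", "GENE_D"},
}

shared_by_all = consensus_genes(selections, min_votes=3)
shared_by_two = consensus_genes(selections, min_votes=2)
```

With these placeholder sets, requiring all three votes keeps only STXBP1, while lowering the threshold to two votes admits genes supported by any pair of methods.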

12 pages, 3790 KB

Open Access | Article

Bioinformatics and Preliminary Functional Analysis of OsPP2C61

by Hao Wang, Enjie Xu, Yujiao Shi, Nuoyan Li, Jinyilin Leng, Yuan Luo, Jianyang Sun, Yaofang Zhang and Zhongyou Pei

Genes 2026, 17(4), 374; https://doi.org/10.3390/genes17040374 - 25 Mar 2026

Viewed by 550

Abstract

Background: Protein phosphatase 2Cs (PP2Cs) constitute the largest phosphatase family in plants, playing a pivotal role in signal transduction. Within this family, the PP2C.D subfamily exerts significant influence on cell elongation and stress adaptation by mediating the ‘SAUR-PP2C.D-H+-ATPase’ regulatory module in the auxin signaling pathway. In rice, OsPP2C61 is a PP2C member whose molecular features and potential regulatory context remain unclear. Methods: We conducted a preliminary characterization of OsPP2C61 through integrated bioinformatics analysis, spatiotemporal expression profiling, and subcellular localization experiments in tobacco leaf cells. Results: OsPP2C61 encodes a 377-amino-acid protein predicted to be hydrophilic, basic, and structurally unstable. Secondary-structure prediction identified three major elements with random coils as the predominant component, whereas 3D modeling indicated alternating α-helices and β-sheets consistent with a canonical PP2C fold. Phylogenetic inference placed OsPP2C61 within the PP2C.D clade and revealed conserved motifs shared with OsPP2C25, OsPP2C28, and OsPP2C39. Promoter analysis showed enrichment of abscisic acid (ABA)- and methyl jasmonate (MeJA)-responsive elements along with multiple stress-related cis-regulatory motifs. Spatiotemporal expression analysis showed that OsPP2C61 is highly expressed in roots. Subcellular localization assays further demonstrated that the OsPP2C61-GFP fusion protein localizes to the nucleus and the plasma membrane when transiently expressed in epidermal cells of Nicotiana benthamiana. Conclusions: This work delivers the first comprehensive characterization of OsPP2C61, establishing a foundation for mechanistic studies and positioning OsPP2C61 as a candidate gene for rice improvement.

2025

14 pages, 2527 KB

Open Access | Article

Genome-Wide Identification and Expression Pattern of the SPP Gene Family in Cotton (Gossypium hirsutum) Under Abiotic Stress

by Cuijie Cui, Chao Wang, Shangfu Ren and Huiqin Wang

Genes 2025, 16(12), 1500; https://doi.org/10.3390/genes16121500 - 15 Dec 2025

Viewed by 757

Abstract

Background: Sucrose metabolism plays a crucial role in plant responses to abiotic stresses such as drought and high temperatures, significantly influencing plant growth and yield formation. In higher plants, the second step in sucrose bioconversion involves sucrose phosphate phosphatase (SPP) hydrolyzing sucrose-6-phosphate to form sucrose. This study determined the number of SPP gene family members in upland cotton (Gossypium hirsutum) and systematically analyzed their fundamental characteristics, physicochemical properties, phylogenetic relationships, chromosomal localization, and expression patterns across different tissues and under various abiotic stresses. Methods: The SPP gene family in G. hirsutum was identified using Hidden Markov Models (HMMER) and the NCBI Conserved Domain Database (NCBI CDD), and its physicochemical properties were analyzed via the SOPMA online analysis website. Phylogenetic relationships were determined using MEGA 12.0 software. Promoter regions were analyzed with PlantCARE, sequence patterns were identified via MEME, and transcriptome data were downloaded from the CottonMD database. Results: This study identified four members of the G. hirsutum SPP gene family, with amino acid lengths ranging from 335 to 1015, molecular weights between 38.38 and 113.28 kDa, and theoretical isoelectric points (pI) between 5.39 and 6.33. These genes are localized across four chromosomes. The SPP gene family in G. hirsutum exhibits closer phylogenetic relationships with SPP genes in Arabidopsis thaliana and Chenopodium quinoa. Their promoter regions are rich in cis-elements associated with multiple abiotic stress resistance functions, and their expression patterns vary across different tissues and under different abiotic stress conditions. Conclusions: The GhSPP gene may play an important role in the growth and development of upland cotton and its responses to salt stress and drought. Therefore, it could be considered as a candidate gene for future functional analysis of cotton resistance to salt and drought stress.

2024

17 pages, 5446 KB

Open Access | Article

NF-κB Activation as a Key Driver in Chronic Lymphocytic Leukemia Evolution to Richter’s Syndrome: Unraveling the Influence of Immune Microenvironment Dynamics

by Paulo Rohan, Renata Binato and Eliana Abdelhay

Genes 2024, 15(11), 1434; https://doi.org/10.3390/genes15111434 - 5 Nov 2024

Cited by 5 | Viewed by 2987

Abstract

Background/Objectives: Chronic lymphocytic leukemia (CLL) is the most common adult leukemia in Western countries and it can progress to Richter’s syndrome (RS), a more aggressive condition. The NF-κB pathway is pivotal in CLL pathogenesis, driven mainly by B-cell receptor (BCR) signaling. However, recent evidence indicates that BCR signaling is reduced in RS, raising questions about whether and how NF-κB activity is maintained in RS. This study aims to elucidate the triggers and dynamics of NF-κB activation and the progression from CLL to RS. Methods: Integrated single-cell RNA sequencing data from peripheral blood samples of four CLL–RS patients were analyzed. NF-κB pathway activity and gene expression profiles were assessed to determine changes in NF-κB components and their targets. Tumor microenvironment composition and cell–cell communication patterns were inferred to explore NF-κB regulatory mechanisms. Results: RS samples showed increased proportions of malignant cells expressing NF-κB components, including NFKB1, NFKB2, RELA, IKBKG, MAP3K14, CHUK, and IKBKB, with significantly higher expression levels than in CLL. Enhanced NF-κB pathway activity in RS cells was associated with targets involved in immune modulation. The tumor microenvironment in RS displayed significant compositional changes, and signaling inference revealed enhanced cell–cell communication via BAFF and APRIL pathways, involving interactions with receptors such as BAFF-R and TACI on RS cells. Conclusions: The findings from this study reveal an active state of NF-κB in RS and suggest that this state plays a critical role in the evolution of CLL to RS, which is modulated by alternative signaling pathways and the influence of the tumor microenvironment.

20 pages, 1640 KB

Open Access | Editor’s Choice | Article

Reduced-Cost Genotyping by Resequencing in Peanut Breeding Programs Using Tecan Allegro Targeted Resequencing V2

by Cheng-Jung Sung, Roshan Kulkarni, Andrew Hillhouse, Charles E. Simpson, John Cason and Mark D. Burow

Genes 2024, 15(11), 1364; https://doi.org/10.3390/genes15111364 - 24 Oct 2024

Cited by 1 | Viewed by 2557

Abstract

The identification of informative molecular markers is useful for linkage mapping and can benefit genome-wide association studies by providing fine-scale information about sequence variations. However, high-throughput genotyping approaches are not cost-effective for labs that require frequent use, such as breeding programs that need to perform genotyping on large populations with hundreds of individuals. The number of single nucleotide polymorphism (SNP) markers generated by those approaches can be far more than needed for most breeding programs; instead, breeders focus on the use of at most hundreds of polymorphic molecular markers for analysis. To help make the use of molecular markers a routine tool for breeding programs, we aimed to develop a cost-effective genotyping system by using the Tecan Allegro Targeted Resequencing V2 kit. This kit provides a customized probe design, which means that all the DNA fragments synthesized are known targets. SNPs obtained from previous peanut next-generation sequencing data were pre-filtered and selected as targets. These SNP targets were polymorphic among different tetraploid accessions and were selected to be distinguishable from paralogs. A total of 5154 probes were designed to detect 2770 SNP targets and were tested on 48 accessions, which include some closely related sister lines from a breeding population. The results indicated that genotyping by a targeted resequencing approach reduced the cost from around USD 28 (SNP chip and GBS) to USD 18 per sample, while providing polymorphic markers with accurate SNP calls. With this cost-effective genotyping platform, pre-selected SNP markers can be used effectively and routinely for more breeding programs.
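
The figures quoted in this abstract can be put into perspective with a little arithmetic. The sketch below uses only the numbers stated above (USD 28 and USD 18 per sample, 5154 probes, 2770 SNP targets); the population size of 500 samples is a hypothetical value chosen to illustrate "hundreds of individuals" and is not taken from the paper.

```python
def percent_reduction(old_cost, new_cost):
    """Relative cost reduction, in percent of the old cost."""
    return (old_cost - new_cost) / old_cost * 100


cost_chip_or_gbs = 28.0   # USD per sample, from the abstract
cost_targeted = 18.0      # USD per sample, from the abstract
probes = 5154             # from the abstract
snp_targets = 2770        # from the abstract

reduction_pct = percent_reduction(cost_chip_or_gbs, cost_targeted)
probes_per_target = probes / snp_targets

# Hypothetical breeding population size, for illustration only.
n_samples = 500
total_saving = n_samples * (cost_chip_or_gbs - cost_targeted)

print(f"Cost reduction: {reduction_pct:.1f}%")
print(f"Probes per SNP target: {probes_per_target:.2f}")
print(f"Saving for {n_samples} samples: USD {total_saving:.0f}")
```

Under these assumptions, the per-sample cost drops by roughly a third, and each SNP target is covered by a little under two probes on average.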

10 pages, 1434 KB

Open Access | Editor’s Choice | Article

PhenoMetaboDiff: R Package for Analysis and Visualization of Phenotype Microarray Data

by Rini Pauly, Mehtab Iqbal, Narae Lee, Bridgette Allen Moffitt, Sara Moir Sarasua, Luyi Li, Nina Christine Hubig and Luigi Boccuto

Genes 2024, 15(11), 1362; https://doi.org/10.3390/genes15111362 - 24 Oct 2024

Cited by 3 | Viewed by 2253

Abstract

Background: PhenoMetaboDiff is a novel R package for computational analysis and visualization of data generated by Biolog Phenotype Mammalian Microarrays (PM-Ms). These arrays measure the energy production of mammalian cells in different metabolic environments, assess the metabolic activity of cells exposed to various drugs or energy sources, and compare the metabolic profiles of cells from individuals affected by specific disorders versus healthy controls. Methods: PhenoMetaboDiff has several modules that facilitate statistical analysis by sample comparisons using the non-parametric Mann–Whitney U-test, the integration of the OPM package (an R package for analysing OmniLog® phenotype microarray data) for robust file conversion, and calculation of slope and area under the curve (AUC). In addition, the built-in visualization allows specific wells to be visualized in selected pathways for a particular time slice. Results: Compared to the standard OPM package, the features developed in PhenoMetaboDiff assess metabolic profiles by employing statistical tests and visualize the dynamic nature of the energy production in several conditions. Examples of how this package can be used are demonstrated for several rare disease conditions. The incorporation of a graphical user interface expands the utility of this program to both expert and novice users of R. Conclusions: PhenoMetaboDiff makes the deployment of the cutting-edge Biolog system available to any researcher.
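
The three quantities named in the Methods (slope, AUC, and the Mann–Whitney U statistic) can be illustrated with a short, self-contained sketch. This is not PhenoMetaboDiff's code, which is an R package whose internals are not described here; the function names, the trapezoidal rule for AUC, and the least-squares slope are assumptions made for illustration, written in Python.

```python
def auc_trapezoid(times, values):
    """Area under a kinetic curve using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2
    return area


def slope_least_squares(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs where a > b count 1, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Toy well readings at four time points (illustrative values only).
times = [0, 1, 2, 3]
readings = [0, 2, 4, 6]
area = auc_trapezoid(times, readings)
slope = slope_least_squares(times, readings)
u_cases = mann_whitney_u([4, 5, 6], [1, 2, 3])
```

The U statistic alone is not a p-value; in practice a statistics library would convert it to one, which is outside the scope of this sketch.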

18 pages, 4797 KB

Open Access | Editor’s Choice | Article

coiTAD: Detection of Topologically Associating Domains Based on Clustering of Circular Influence Features from Hi-C Data

by Drew Houchens, H. M. A. Mohit Chowdhury and Oluwatosin Oluwadare

Genes 2024, 15(10), 1293; https://doi.org/10.3390/genes15101293 - 30 Sep 2024

Cited by 2 | Viewed by 3613

Abstract

Background/Objectives: Topologically associating domains (TADs) are key structural units of the genome, playing a crucial role in gene regulation. TAD boundaries are enriched with specific biological markers and have been linked to genetic diseases, making consistent TAD detection essential. However, accurately identifying TADs remains challenging due to the lack of a definitive validation method. This study aims to develop a novel algorithm, termed coiTAD, which introduces an innovative approach for preprocessing Hi-C data to improve TAD prediction. This method employs a proposed “circle of influence” (COI) approach derived from Hi-C contact matrices. Methods: The coiTAD algorithm is based on the creation of novel features derived from the circle of influence in input contact matrices, which are subsequently clustered using the HDBSCAN clustering algorithm. The TADs are extracted from the clustered features based on intra-cluster interactions, thereby providing a more accurate method for identifying TADs. Results: Rigorous tests were conducted using both simulated and real Hi-C datasets. The algorithm’s validation included analysis of boundary proteins such as H3K4me1, RNAPII, and CTCF. coiTAD consistently matched other TAD prediction methods. Conclusions: The coiTAD algorithm represents a novel approach for detecting TADs. At its core, the circle-of-influence methodology introduces an innovative strategy for preparing Hi-C data, enabling the assessment of interaction strengths between genomic regions. This approach facilitates a nuanced analysis that effectively captures structural variations within chromatin. Ultimately, the coiTAD algorithm enhances our understanding of chromatin organization and offers a robust tool for genomic research. The source code for coiTAD is publicly available, and the URL can be found in the Data Availability Statement section.

14 pages, 5171 KB

Open Access | Article

Key Genes FECH and ALAS2 under Acute High-Altitude Exposure: A Gene Expression and Network Analysis Based on Expression Profile Data

by Yifan Zhao, Lingling Zhu, Dawei Shi, Jiayue Gao and Ming Fan

Genes 2024, 15(8), 1075; https://doi.org/10.3390/genes15081075 - 14 Aug 2024

Cited by 6 | Viewed by 3518

Abstract

High-altitude acclimatization refers to the physiological adjustments by which the human body gradually adapts to hypoxic conditions after ascending to high altitude. This study analyzed three mRNA expression profile datasets from the GEO database, focusing on 93 healthy residents from low altitudes (≤1400 m). Peripheral blood samples were collected for analysis on the third day after these individuals rapidly ascended to higher altitudes (3000–5300 m). The analysis identified significant differential expression in 382 genes, with 361 genes upregulated and 21 downregulated. Further, gene ontology (GO) annotation analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis indicated that the top-ranked enriched pathways are upregulated, involving blood gas transport, erythrocyte development and differentiation, and heme biosynthetic process. Network analysis highlighted ten key genes, namely, SLC4A1, FECH, EPB42, SNCA, GATA1, KLF1, GYPB, ALAS2, DMTN, and GYPA. Analysis revealed that two of these key genes, FECH and ALAS2, play a critical role in the heme biosynthetic process, which is pivotal in the development and maturation of red blood cells. These findings provide new insights into the key gene mechanisms of high-altitude acclimatization and identify potential biomarkers and targets for personalized acclimatization strategies.

20 pages, 5637 KB

Open Access | Article

Genome-Wide Identification and Characterization of Ammonium Transporter (AMT) Genes in Chlamydomonas reinhardtii

by Wenhui Hu, Dan Wang, Shuangshuang Zhao, Jiaqi Ji, Jing Yang, Yiqin Wan and Chao Yu

Genes 2024, 15(8), 1002; https://doi.org/10.3390/genes15081002 - 31 Jul 2024

Cited by 4 | Viewed by 2590

Abstract

Ammonium transporters (AMTs) are vital plasma membrane proteins facilitating NH₄⁺ uptake and transport, crucial for plant growth. Identifying favorable AMT genes is a main goal in improving ammonium-tolerant algae. However, there have been no reports on the systematic identification and expression analysis of Chlamydomonas reinhardtii (C. reinhardtii) AMT genes. This study comprehensively identified eight CrAMT genes, distributed across eight chromosomes, all containing more than 10 transmembrane structures. Phylogenetic analysis revealed that all CrAMTs belonged to the AMT1 subfamily. The conserved motifs and domains of CrAMTs were similar to those of the AMT1 members of OsAMTs and AtAMTs. Notably, the gene fragments of CrAMTs are longer and contain more introns compared to those of AtAMTs and OsAMTs. The promoter regions of CrAMTs are enriched with cis-elements associated with plant hormones and light response. Under NH₄⁺ treatment, CrAMT1;1 and CrAMT1;3 were significantly upregulated, while CrAMT1;2, CrAMT1;4, and CrAMT1;6 were notably downregulated. CrAMT1;7 and CrAMT1;8 were also downregulated, albeit less markedly. Transgenic algae overexpressing CrAMT1;7 did not show a significant difference in growth compared to CC-125, while transgenic algae with CrAMT1;7 knockdown exhibited growth inhibition. Transgenic algae with overexpressed or knocked-down CrAMT1;8 displayed reduced growth compared to CC-125, which also resulted in the suppression of other CrAMT genes. None of the transgenic algae showed better growth than CC-125 at high ammonium levels. In summary, our study has unveiled the potential role of CrAMT genes in high-ammonium environments and can serve as a foundational research platform for investigating ammonium-tolerant algal species.

22 pages, 4152 KB

Open Access | Article

Dissecting the Genetic Architecture of Morphological Traits in Sunflower (Helianthus annuus L.)

by Yavuz Delen, Semra Palali-Delen, Gen Xu, Mohamed Neji, Jinliang Yang and Ismail Dweikat

Genes 2024, 15(7), 950; https://doi.org/10.3390/genes15070950 - 19 Jul 2024

Cited by 4 | Viewed by 3411

Abstract

The sunflower (Helianthus annuus L.) is one of the most important oil crops in the world. Several component traits, including flowering time, plant height, stem diameter, seed weight, and kernel weight, determine sunflower seed and oil yield. Although the genetic mechanisms governing the variation of these yield-related traits have been studied using various approaches, genome-wide association studies (GWAS) have not been widely applied to sunflowers. In this study, a set of 342 sunflower accessions was evaluated in 2019 and 2020 using an incomplete randomized block design, and GWAS was conducted utilizing two complementary approaches: the mixed linear model (MLM) and the fixed and random model circulating probability unification (farmCPU) model by fitting 226,779 high-quality SNPs. As a result, GWAS identified a number of trait-associated SNPs. Those SNPs were located close to several genes that may serve as a basis for further molecular characterization and provide promising targets for sunflower yield improvement.

13 pages, 2603 KB

Open Access | Editor’s Choice | Article

CrossMP: Enabling Cross-Modality Translation between Single-Cell RNA-Seq and Single-Cell ATAC-Seq through Web-Based Portal

by Zhen Lyu, Sabin Dahal, Shuai Zeng, Juexin Wang, Dong Xu and Trupti Joshi

Genes 2024, 15(7), 882; https://doi.org/10.3390/genes15070882 - 5 Jul 2024

Viewed by 4249

Abstract

In recent years, there has been a growing interest in profiling multiomic modalities within individual cells simultaneously. One such example is integrating combined single-cell RNA sequencing (scRNA-seq) data and single-cell transposase-accessible chromatin sequencing (scATAC-seq) data. Integrated analysis of diverse modalities has helped researchers make more accurate predictions and gain a more comprehensive understanding than with single-modality analysis. However, generating such multimodal data is technically challenging and expensive, leading to limited availability of single-cell co-assay data. Here, we propose a model for cross-modal prediction between the transcriptome and chromatin profiles in single cells. Our model is based on a deep neural network architecture that learns the latent representations from the source modality and then predicts the target modality. It demonstrates reliable performance in accurately translating between these modalities across multiple paired human scATAC-seq and scRNA-seq datasets. Additionally, we developed CrossMP, a web-based portal allowing researchers to upload their single-cell modality data through an interactive web interface and predict the other type of modality data, using high-performance computing resources at the backend.

18 pages, 9277 KB

Open Access | Editor’s Choice | Article

Analysis of Hyperosmotic Tolerance Mechanisms in Gracilariopsis lemaneiformis Based on Weighted Co-Expression Network Analysis

by Baoheng Xiao, Xiaoqing Feng, Pingping Li and Zhenghong Sui

Genes 2024, 15(6), 781; https://doi.org/10.3390/genes15060781 - 13 Jun 2024

Viewed by 2325

Abstract

We conducted transcriptome sequencing on salt-tolerant mutants X5 and X3, and a control (Ctr) strain of Gracilariopsis lemaneiformis after treatment with artificial seawater at varying salinities (30‰, 45‰, and 60‰) for 3 weeks. Differentially expressed genes were identified and a weighted co-expression network analysis was conducted. The blue, red, and tan modules were most closely associated with salinity, while the black, cyan, light cyan, and yellow modules showed a close correlation with strain attributes. KEGG enrichment of genes from the aforementioned modules revealed that the key enrichment pathways for salinity attributes included the proteasome and carbon fixation in photosynthesis, whereas the key pathways for strain attributes consisted of lipid metabolism, oxidative phosphorylation, soluble N-ethylmaleimide-sensitive factor-activating protein receptor (SNARE) interactions in vesicular transport, and porphyrin and chlorophyll metabolism. Gene expression for the proteasome and carbon fixation in photosynthesis was higher in all strains at 60‰. In addition, gene expression in the proteasome pathway was higher in X5-60 than in Ctr-60 and X3-60. Based on the above data and relevant literature, we speculated that mutant X5 likely copes with high salt stress by upregulating genes related to lysosome and carbon fixation in photosynthesis. The proteasome may be reset to adjust the organism’s proteome composition to adapt to high-salt environments, while carbon fixation may aid in maintaining material and energy metabolism for normal life activities by enhancing carbon dioxide uptake via photosynthesis. The differences between X5-30 and Ctr-30 in the expression of genes involved in the synthesis of secondary metabolites, oxidative phosphorylation, and SNARE interactions in vesicular transport suggested that X5-30 may differ from Ctr-30 in lipid metabolism, energy metabolism, and vesicular transport. Finally, within the key pathways that correlated well with salinity and strain traits, the key genes significantly correlated with those traits were identified by correlation analysis.

11 pages, 453 KB

Open AccessArticle

The Effect of Genome Parametrization and SNP Marker Subsetting on Genomic Selection in Autotetraploid Alfalfa

by Nelson Nazzicari, Nicolò Franguelli, Barbara Ferrari, Luciano Pecetti and Paolo Annicchiarico

Genes 2024, 15(4), 449; https://doi.org/10.3390/genes15040449 - 2 Apr 2024

Cited by 13 | Viewed by 2742

Abstract

Background: Alfalfa, the most economically important forage legume worldwide, features modest genetic progress due to long selection cycles and the extent of the non-additive genetic variance associated with its autotetraploid genome. Methods: To improve the efficiency of genomic selection in alfalfa, we explored the effects of genome parametrization (as tetraploid and diploid dosages, plus allele ratios) and SNP marker subsetting (all available SNPs, only genic regions, and only non-genic regions) on genomic regressions, together with various levels of filtering on reading depth and missing rates. We used genotyping by sequencing-generated data and focused on traits of different genetic complexity, i.e., dry biomass yield in moisture-favorable (FE) and drought stress (SE) environments, leaf size, and the onset of flowering, which were assessed in 143 genotyped plants from a genetically broad European reference population and their phenotyped half-sib progenies. Results: On average, the allele ratio improved the predictive ability compared with other genome parametrizations (+7.9% vs. tetraploid dosage, +12.6% vs. diploid dosage), while using all the SNPs offered an advantage compared with any specific SNP subsetting (+3.7% vs. genic regions, +7.6% vs. non-genic regions). However, when focusing on specific traits, different combinations of genome parametrization and subsetting achieved better performances. We also released Legpipe2, an SNP calling pipeline tailored for reduced representation (GBS, RAD) in medium-sized genotyping experiments. Full article
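The three genome parametrizations named in this abstract can be illustrated with a minimal sketch. It assumes a single biallelic SNP described by its reference and alternative read counts, and it derives the tetraploid dosage by naive rounding of the allele ratio. The function name `parametrize`, the rounding rule, and the way heterozygous classes are merged are illustrative assumptions, not the procedure used by the authors or by Legpipe2.

```python
def parametrize(ref_reads: int, alt_reads: int) -> dict:
    """Encode one SNP call in three ways from its read counts (illustrative only)."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads at this site; treat the genotype as missing")

    # Allele ratio: fraction of reads supporting the alternative allele, in [0, 1].
    allele_ratio = alt_reads / depth

    # Tetraploid dosage: 0..4 copies of the alternative allele.
    # Assumption: naive rounding; real dosage callers model sequencing error.
    tetraploid = round(allele_ratio * 4)

    # Diploid-style dosage: keep both homozygotes, merge all heterozygous classes into 1.
    if tetraploid == 0:
        diploid = 0
    elif tetraploid == 4:
        diploid = 2
    else:
        diploid = 1

    return {
        "allele_ratio": allele_ratio,
        "tetraploid_dosage": tetraploid,
        "diploid_dosage": diploid,
    }


print(parametrize(30, 10))
```

The allele ratio keeps the continuous read-count signal, whereas the two dosage encodings discretize it, which is one plausible reason the parametrizations could differ in predictive ability.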

17 pages, 1389 KB

Open AccessArticle

Data Augmentation Enhances Plant-Genomic-Enabled Predictions

by Osval A. Montesinos-López, Mario Alberto Solis-Camacho, Leonardo Crespo-Herrera, Carolina Saint Pierre, Gloria Isabel Huerta Prado, Sofia Ramos-Pulido, Khalid Al-Nowibet, Roberto Fritsche-Neto, Guillermo Gerard, Abelardo Montesinos-López and José Crossa

Genes 2024, 15(3), 286; https://doi.org/10.3390/genes15030286 - 24 Feb 2024

Cited by 5 | Viewed by 4488

Abstract

Genomic selection (GS) is revolutionizing plant breeding. However, its practical implementation is still challenging, since many factors affect its accuracy. For this reason, this research explores data augmentation with the goal of improving its accuracy. Deep neural networks with data augmentation (DA) generate synthetic data from the original training set to increase the training set and to improve the prediction performance of any statistical or machine learning algorithm. There is much empirical evidence of their success in many computer vision applications. Due to this, DA was explored in the context of GS using 14 real datasets. We found empirical evidence that DA is a powerful tool to improve the prediction accuracy, since we improved the prediction accuracy of the top lines in the 14 datasets under study. On average, across datasets and traits, the gain in prediction performance of the DA approach relative to the Conventional method in the top 20% of lines in the testing set was 108.4% in terms of the NRMSE and 107.4% in terms of the MAAPE, but a worse performance was observed on the whole testing set. We encourage more empirical evaluations to support our findings. Full article

12 pages, 1854 KB

Open AccessEditor’s ChoiceArticle

Transcription Factor Regulation of Gene Expression Network by ZNF385D and HAND2 in Carotid Atherosclerosis

by Ming Tan, Lars Juel Andersen, Niels Eske Bruun, Matias Greve Lindholm, Qihua Tan and Martin Snoer

Genes 2024, 15(2), 213; https://doi.org/10.3390/genes15020213 - 7 Feb 2024

Cited by 1 | Viewed by 2818

Abstract

Carotid intima-media thickness (CIMT) is a surrogate indicator for atherosclerosis and has been shown to predict cardiovascular risk in multiple large studies. Identification of molecular markers for carotid atheroma plaque formation can be critical for early intervention and prevention of atherosclerosis. This study performed transcription factor (TF) network analysis of global gene expression data focusing on two TF genes, ZNF385D and HAND2, whose polymorphisms have been recently reported to show association with CIMT. Genome-wide gene expression data were measured from pieces of carotid endarterectomy collected from 34 hypertensive patients (atheroma plaque of stages IV and above according to the Stary classification) each paired with one sample of distant macroscopically intact tissue (stages I and II). Transcriptional regulation networks or the regulons were reconstructed for ZNF385D (5644 target genes) and HAND2 (781 target genes) using network inference. Their association with the progression of carotid atheroma was examined using gene-set enrichment analysis, which showed extremely high statistical significance for the regulons of both ZNF385D and HAND2 (p < 6.95 × 10⁻⁷), suggesting the involvement of expression quantitative trait loci (eQTL). Functional annotation of the regulon genes found heavy involvement in the immune system’s response to inflammation and infection in the development of atherosclerosis. Detailed examination of the regulation and correlation patterns suggests that activities of the two TF genes could have high clinical and interventional impacts on impairing carotid atheroma plaque formation and preventing carotid atherosclerosis. Full article

2023

16 pages, 729 KB

Open AccessArticle

Base-Excision Repair Mutational Signature in Two Sebaceous Carcinomas of the Eyelid

by Eugenio Sangiorgi, Federico Giannuzzi, Clelia Molinario, Giulia Rapari, Melania Riccio, Giovanni Cuffaro, Federica Castri, Roberta Benvenuto, Maurizio Genuardi, Daniela Massi and Gustavo Savino

Genes 2023, 14(11), 2055; https://doi.org/10.3390/genes14112055 - 8 Nov 2023

Cited by 2 | Viewed by 2469

Abstract

Personalized medicine aims to develop tailored treatments for individual patients based on specific mutations present in the affected organ. This approach has proven paramount in cancer treatment, as each tumor carries distinct driver mutations that respond to targeted drugs and, in some cases, may confer resistance to other therapies. Particularly for rare conditions, personalized medicine has the potential to revolutionize treatment strategies. Rare cancers often lack extensive datasets of molecular and pathological information, large-scale trials for novel therapies, and established treatment guidelines. Consequently, surgery is frequently the only viable option for many rare tumors, when feasible, as traditional multimodal approaches employed for more common cancers often play a limited role. Sebaceous carcinoma of the eyelid is an exceptionally rare cancer affecting the eye’s adnexal tissues, most frequently reported in Asia, but whose prevalence is significantly increasing even in Europe and the US. The sole established curative treatment is surgical excision, which can lead to significant disfigurement. In cases of metastatic sebaceous carcinoma, validated drug options are currently lacking. In this project, we set out to characterize the mutational landscape of two sebaceous carcinomas of the eyelid following surgical excision. Utilizing available bioinformatics tools, we demonstrated our ability to identify common features promptly and accurately in both tumors. These features included a Base-Excision Repair mutational signature, a notably high tumor mutational burden, and key driver mutations in somatic tissues. These findings had not been previously reported in similar studies. This report underscores how, in the case of rare tumors, it is possible to comprehensively characterize the mutational landscape of each individual case, potentially opening doors to targeted therapeutic options. Full article

15 pages, 1626 KB

Open AccessArticle

SNPtotree—Resolving the Phylogeny of SNPs on Non-Recombining DNA

by Zehra Köksal, Claus Børsting, Leonor Gusmão and Vania Pereira

Genes 2023, 14(10), 1837; https://doi.org/10.3390/genes14101837 - 22 Sep 2023

Cited by 3 | Viewed by 3751

Abstract

Genetic variants on non-recombining DNA and the hierarchical order in which they accumulate are commonly of interest. This variant hierarchy can be established and combined with information on the population and geographic origin of the individuals carrying the variants to find population structures and infer migration patterns. Further, individuals can be assigned to the characterized populations, which is relevant in forensic genetics, genetic genealogy, and epidemiologic studies. However, there is currently no straightforward method to obtain such a variant hierarchy. Here, we introduce the software SNPtotree v1.0, which uniquely determines the hierarchical order of variants on non-recombining DNA without error-prone manual sorting. The algorithm uses pairwise variant comparisons to infer their relationships and integrates the combined information into a phylogenetic tree. Variants that have contradictory pairwise relationships or ambiguous positions in the tree are removed by the software. When benchmarked using two human Y-chromosomal massively parallel sequencing datasets, SNPtotree outperforms traditional methods in the accuracy of phylogenetic trees for sequencing data with high amounts of missing information. The phylogenetic trees of variants created using SNPtotree can be used to establish and maintain publicly available phylogeny databases to further explore genetic epidemiology and genealogy, as well as population and forensic genetics. Full article
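The pairwise comparison idea described in this abstract can be sketched as follows. The sketch assumes complete data (no missing genotypes) and represents each variant by the set of individuals carrying its derived allele. The function name `pairwise_relation` and the relation labels are hypothetical and do not reflect the SNPtotree code or its handling of missing information.

```python
def pairwise_relation(carriers_a: set, carriers_b: set) -> str:
    """Classify two variants on non-recombining DNA by their derived-allele carriers."""
    if carriers_a == carriers_b:
        return "equivalent"        # same branch of the tree
    if carriers_b < carriers_a:
        return "a_upstream_of_b"   # every carrier of b also carries a
    if carriers_a < carriers_b:
        return "b_upstream_of_a"
    if carriers_a.isdisjoint(carriers_b):
        return "parallel"          # variants sit on separate branches
    # Partial overlap cannot occur on a single tree without recombination or error.
    return "contradictory"


# Hypothetical carriers of three variants among individuals s1..s4.
carriers = {
    "v1": {"s1", "s2", "s3"},
    "v2": {"s1", "s2"},
    "v3": {"s4"},
}
print(pairwise_relation(carriers["v1"], carriers["v2"]))
print(pairwise_relation(carriers["v2"], carriers["v3"]))
```

Under these assumptions, variants involved in a "contradictory" pair would be candidates for removal, in line with the abstract's statement that variants with contradictory pairwise relationships are removed.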

9 pages, 2055 KB

Open AccessArticle

PMIDigest: Interactive Review of Large Collections of PubMed Entries to Distill Relevant Information

by Jorge Novoa, Mónica Chagoyen, Carlos Benito, F. Javier Moreno and Florencio Pazos

Genes 2023, 14(4), 942; https://doi.org/10.3390/genes14040942 - 19 Apr 2023

Cited by 9 | Viewed by 4252

Abstract

Scientific knowledge is being accumulated in the biomedical literature at an unprecedented pace. The most widely used database with biomedicine-related article abstracts, PubMed, currently contains more than 36 million entries. Users performing searches in this database for a subject of interest face thousands of entries (articles) that are difficult to process manually. In this work, we present an interactive tool for automatically digesting large sets of PubMed articles: PMIDigest (PubMed IDs digester). The system allows for classification/sorting of articles according to different criteria, including the type of article and different citation-related figures. It also calculates the distribution of MeSH (medical subject headings) terms for categories of interest, providing a picture of the themes addressed in the set. These MeSH terms are highlighted in the article abstracts in different colors depending on the category. An interactive representation of the interarticle citation network is also presented in order to easily locate article “clusters” related to particular subjects, as well as their corresponding “hub” articles. In addition to PubMed articles, the system can also process a set of Scopus or Web of Science entries. In summary, with this system, the user can have a “bird’s eye view” of a large set of articles and their main thematic tendencies and obtain additional information not evident in a plain list of abstracts. Full article

20 pages, 1802 KB

Open AccessReview

Computational Biology Helps Understand How Polyploid Giant Cancer Cells Drive Tumor Success

by Matheus Correia Casotti, Débora Dummer Meira, Aléxia Stefani Siqueira Zetum, Bruno Cancian de Araújo, Danielle Ribeiro Campos da Silva, Eldamária de Vargas Wolfgramm dos Santos, Fernanda Mariano Garcia, Flávia de Paula, Gabriel Mendonça Santana, Luana Santos Louro, Lyvia Neves Rebello Alves, Raquel Furlani Rocon Braga, Raquel Silva dos Reis Trabach, Sara Santos Bernardes, Thomas Erik Santos Louro, Eduardo Cremonese Filippi Chiela, Guido Lenz, Elizeu Fagundes de Carvalho and Iúri Drumond Louro

Genes 2023, 14(4), 801; https://doi.org/10.3390/genes14040801 - 26 Mar 2023

Cited by 23 | Viewed by 7506

Abstract

Precision and organization govern the cell cycle, ensuring normal proliferation. However, some cells may undergo abnormal cell divisions (neosis) or variations of mitotic cycles (endopolyploidy). Consequently, the formation of polyploid giant cancer cells (PGCCs), critical for tumor survival, resistance, and immortalization, can occur. Newly formed cells end up accessing numerous multicellular and unicellular programs that enable metastasis, drug resistance, tumor recurrence, and self-renewal or diverse clone formation. An integrative literature review was carried out, searching articles in several sites, including: PUBMED, NCBI-PMC, and Google Academic, published in English, indexed in referenced databases and without a publication time filter, but prioritizing articles from the last 3 years, to answer the following questions: (i) “What is the current knowledge about polyploidy in tumors?”; (ii) “What are the applications of computational studies for the understanding of cancer polyploidy?”; and (iii) “How do PGCCs contribute to tumorigenesis?” Full article

16 pages, 3618 KB

Open AccessArticle

Understanding Drug Resistance of Wild-Type and L38HL Insertion Mutant of HIV-1 C Protease to Saquinavir

by Sankaran Venkatachalam, Nisha Murlidharan, Sowmya R. Krishnan, C. Ramakrishnan, Mpho Setshedi, Ramesh Pandian, Debmalya Barh, Sandeep Tiwari, Vasco Azevedo, Yasien Sayed and M. Michael Gromiha

Genes 2023, 14(2), 533; https://doi.org/10.3390/genes14020533 - 20 Feb 2023

Cited by 9 | Viewed by 3519

Abstract

Acquired immunodeficiency syndrome (AIDS) is one of the most challenging infectious diseases to treat on a global scale. Understanding the mechanisms underlying the development of drug resistance is necessary for novel therapeutics. HIV subtype C is known to harbor mutations at critical positions of HIV aspartic protease compared to HIV subtype B, which affects the binding affinity. Recently, a novel double-insertion mutation at codon 38 (L38HL) was characterized in HIV subtype C protease, whose effects on the interaction with protease inhibitors are hitherto unknown. In this study, the potential of the L38HL double-insertion in HIV subtype C protease to induce a drug resistance phenotype towards the protease inhibitor Saquinavir (SQV) was probed using various computational techniques, such as molecular dynamics simulations, binding free energy calculations, local conformational changes and principal component analysis. The results indicate that the L38HL mutation exhibits an increase in flexibility at the hinge and flap regions with a decrease in the binding affinity of SQV in comparison with wild-type HIV protease C. Further, we observed a wide opening at the binding site in the L38HL variant due to an alteration in flap dynamics, leading to a decrease in interactions with the binding site of the mutant protease. This is supported by an altered direction of motion of flap residues in the L38HL variant compared with the wild-type. These results provide deep insights into understanding the potential drug resistance phenotype in infected individuals. Full article

10 pages, 2570 KB

Open AccessTechnical Note

DraculR: A Web-Based Application for In Silico Haemolysis Detection in High-Throughput microRNA Sequencing Data

by Melanie D. Smith, Shalem Y. Leemaqz, Tanja Jankovic-Karasoulos, Dylan McCullough, Dale McAninch, Anya L. Arthurs, James Breen, Claire T. Roberts and Katherine A. Pillman

Genes 2023, 14(2), 448; https://doi.org/10.3390/genes14020448 - 9 Feb 2023

Cited by 3 | Viewed by 3135

Abstract

The search for novel microRNA (miRNA) biomarkers in plasma is hampered by haemolysis, the lysis and subsequent release of red blood cell contents, including miRNAs, into surrounding fluid. The biomarker potential of miRNAs comes in part from their multicompartment origin and the long-lived nature of miRNA transcripts in plasma, giving researchers a functional window for tissues that are otherwise difficult or disadvantageous to sample. The inclusion of red-blood-cell-derived miRNA transcripts in downstream analysis introduces a source of error that is difficult to identify post hoc and may lead to spurious results. Where access to a physical specimen is not possible, our tool will provide an in silico approach to haemolysis prediction. We present DraculR, an interactive Shiny/R application that enables a user to upload miRNA expression data from short-read sequencing of human plasma as a raw read counts table and interactively calculate a metric that indicates the degree of haemolysis contamination. The code, DraculR web tool and its tutorial are freely available as detailed herein. Full article

11 pages, 579 KB

Open AccessReview

Networks as Biomarkers: Uses and Purposes

by Caterina Alfano, Lorenzo Farina and Manuela Petti

Genes 2023, 14(2), 429; https://doi.org/10.3390/genes14020429 - 8 Feb 2023

Cited by 19 | Viewed by 3596

Abstract

Network-based approaches are often used to analyze gene expression data or protein–protein interactions but are not usually applied to study the relationships between different biomarkers. Given the clinical need for more comprehensive and integrative biomarkers that can help to identify personalized therapies, the integration of biomarkers of different natures is an emerging trend in the literature. Network analysis can be used to analyze the relationships between different features of a disease; nodes can be disease-related phenotypes, gene expression, mutational events, protein quantification, imaging-derived features and more. Since different biomarkers can exert causal effects on one another, describing such interrelationships can be used to better understand the underlying mechanisms of complex diseases. Networks as biomarkers are not yet commonly used, despite being proven to lead to interesting results. Here, we discuss the ways in which they have been used to provide novel insights into disease susceptibility, disease development and severity. Full article

15 pages, 6994 KB

Open AccessArticle

An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF

by Kai Liu, Qi Chen and Guo-Hua Huang

Genes 2023, 14(2), 421; https://doi.org/10.3390/genes14020421 - 6 Feb 2023

Cited by 10 | Viewed by 3111

Abstract

Gene families, which are parts of a genome’s information storage hierarchy, play a significant role in the development and diversity of multicellular organisms. Several studies have focused on the characteristics of gene families, such as function, homology, or phenotype. However, statistical and correlation analyses on the distribution of gene family members in the genome have yet to be conducted. Here, a novel framework incorporating gene family analysis and genome selection based on NMF-ReliefF is reported. Specifically, the proposed method starts by obtaining gene families from the TreeFam database and determining the number of gene families within the feature matrix. Then, NMF-ReliefF is used to select features from the gene feature matrix, which is a new feature selection algorithm that overcomes the inefficiencies of traditional methods. Finally, a support vector machine is utilized to classify the acquired features. The results show that the framework achieved an accuracy of 89.1% and an AUC of 0.919 on the insect genome test set. We also employed four microarray gene data sets to evaluate the performance of the NMF-ReliefF algorithm. The outcomes show that the proposed method may strike a delicate balance between robustness and discrimination. Additionally, the proposed method’s categorization is superior to state-of-the-art feature selection approaches. Full article

18 pages, 816 KB

Open AccessReview

Translational Bioinformatics Applied to the Study of Complex Diseases

by Matheus Correia Casotti, Débora Dummer Meira, Lyvia Neves Rebello Alves, Barbara Gomes de Oliveira Bessa, Camilly Victória Campanharo, Creuza Rachel Vicente, Carla Carvalho Aguiar, Daniel de Almeida Duque, Débora Gonçalves Barbosa, Eldamária de Vargas Wolfgramm dos Santos, Fernanda Mariano Garcia, Flávia de Paula, Gabriel Mendonça Santana, Isabele Pagani Pavan, Luana Santos Louro, Raquel Furlani Rocon Braga, Raquel Silva dos Reis Trabach, Thomas Santos Louro, Elizeu Fagundes de Carvalho and Iúri Drumond Louro

Genes 2023, 14(2), 419; https://doi.org/10.3390/genes14020419 - 6 Feb 2023

Cited by 16 | Viewed by 5964

Abstract

Translational Bioinformatics (TBI) is defined as the union of translational medicine and bioinformatics. It emerges as a major advance in science and technology by covering everything from the most basic database discoveries to the development of algorithms for molecular and cellular analysis, as well as their clinical applications. This technology makes it possible to access the knowledge of scientific evidence and apply it to clinical practice. This manuscript aims to highlight the role of TBI in the study of complex diseases, as well as its application to the understanding and treatment of cancer. An integrative literature review was carried out, obtaining articles through several websites, among them: PUBMED, Science Direct, NCBI-PMC, Scientific Electronic Library Online (SciELO), and Google Academic, published in English, Spanish, and Portuguese, indexed in the referred databases and answering the following guiding question: “How does TBI provide a scientific understanding of complex diseases?” An additional effort is aimed at the dissemination, inclusion, and perpetuation of TBI knowledge from the academic environment to society, helping the study, understanding, and elucidation of complex disease mechanics and their treatment. Full article

21 pages, 1401 KB

Open AccessArticle

Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier

by Magdalena Kircher, Josefin Säurich, Michael Selle and Klaus Jung

Genes 2023, 14(2), 387; https://doi.org/10.3390/genes14020387 - 1 Feb 2023

Viewed by 3773

Abstract

Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. Hence, either too weak or too optimistic an accuracy is then reported and the estimated model performance cannot be reproduced on independent data. It is then also doubtful whether a classifier qualifies for clinical usage. We estimate classifier performances in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we use two outlier detection methods within a bootstrap procedure to estimate the outlier probability for each sample and evaluate classifiers before and after outlier removal by means of cross-validation. We found that the removal of outliers changed the classification performance notably. For the most part, removing outliers improved the classification results. Taking into account the fact that there are various, sometimes unclear reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier with and without outliers in training and test data. This provides a more diverse picture of a classifier’s performance and prevents reporting models that later turn out not to be applicable for clinical diagnoses. Full article
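The bootstrap idea in this abstract, estimating for each sample the probability of being flagged as an outlier, can be sketched in a few lines. The sketch works on one numeric value per sample and uses a single stand-in detection rule (a modified z-score based on the median and the median absolute deviation). The paper uses two outlier detection methods on transcriptomics data, so the rule, the function names, the threshold of 3.5, and the toy data are all assumptions made for illustration.

```python
import random
import statistics


def flag_outliers(values, threshold=3.5):
    """Stand-in detection rule: modified z-score from the median and the MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)  # no spread in this resample: flag nothing
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def outlier_probability(values, n_boot=200, seed=0):
    """Fraction of bootstrap resamples containing a sample in which it is flagged."""
    rng = random.Random(seed)
    n = len(values)
    drawn = [0] * n    # resamples in which sample i appears at least once
    flagged = [0] * n  # resamples in which sample i appears and is flagged
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        flags = flag_outliers([values[i] for i in idx])
        status = dict(zip(idx, flags))  # copies of one sample share the same flag
        for i, is_flagged in status.items():
            drawn[i] += 1
            flagged[i] += is_flagged
    return [flagged[i] / drawn[i] if drawn[i] else float("nan") for i in range(n)]


# Toy data: nine similar values and one extreme value in the last position.
expression = [9.7, 9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 25.0]
probabilities = outlier_probability(expression)
print([round(p, 2) for p in probabilities])
```

Samples whose estimated probability exceeds a chosen cut-off could then be removed before re-running cross-validation, so that classifier performance can be reported both with and without them.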

20 pages, 36173 KB

Open AccessArticle

Reconstruction of Single-Cell Trajectories Using Stochastic Tree Search

by Jingyi Zhai, Hongkai Ji and Hui Jiang

Genes 2023, 14(2), 318; https://doi.org/10.3390/genes14020318 - 26 Jan 2023

Viewed by 2342

Abstract

The recent advancement in single-cell RNA sequencing technologies enables the understanding of dynamic cellular processes at the single-cell level. Using trajectory inference methods, pseudotimes can be estimated based on reconstructed single-cell trajectories which can be further used to gain biological knowledge. Existing methods for modeling cell trajectories, such as minimal spanning tree or k-nearest neighbor graph, often lead to locally optimal solutions. In this paper, we propose a penalized likelihood-based framework and introduce a stochastic tree search (STS) algorithm aiming at the global solution in a large and non-convex tree space. Both simulated and real data experiments show that our approach is more accurate and robust than other existing methods in terms of cell ordering and pseudotime estimation. Full article

14 pages, 2836 KB

Open AccessArticle

Identification of TRPC6 as a Novel Diagnostic Biomarker of PM-Induced Chronic Obstructive Pulmonary Disease Using Machine Learning Models

by Kyu-Ree Dhong, Jae-Hyeong Lee, You-Rim Yoon and Hye-Jin Park

Genes 2023, 14(2), 284; https://doi.org/10.3390/genes14020284 - 21 Jan 2023

Cited by 7 | Viewed by 3768

Abstract

Chronic obstructive pulmonary disease (COPD) was the third most prevalent cause of mortality worldwide in 2010; it results from a progressive and fatal deterioration of lung function because of cigarette smoking and particulate matter (PM). Therefore, it is important to identify molecular biomarkers that can diagnose the COPD phenotype to plan therapeutic efficacy. To identify potential novel biomarkers of COPD, we first obtained COPD and the normal lung tissue gene expression dataset GSE151052 from the NCBI Gene Expression Omnibus (GEO). A total of 250 differentially expressed genes (DEGs) were investigated and analyzed using GEO2R, gene ontology (GO) functional annotation, and Kyoto Encyclopedia of Genes and Genomes (KEGG) identification. The GEO2R analysis revealed that TRPC6 was the sixth most highly expressed gene in patients with COPD. The GO analysis indicated that the upregulated DEGs were mainly concentrated in the plasma membrane, transcription, and DNA binding. The KEGG pathway analysis indicated that the upregulated DEGs were mainly involved in pathways related to cancer and axon guidance. TRPC6, one of the most abundant genes among the top 10 differentially expressed total RNAs (fold change ≥ 1.5) between the COPD and normal groups, was selected as a novel COPD biomarker based on the results of the GEO dataset and analysis using machine learning models. The upregulation of TRPC6 was verified in PM-stimulated RAW264.7 cells, which mimicked COPD conditions, compared to untreated RAW264.7 cells by a quantitative reverse transcription polymerase chain reaction. In conclusion, our study suggests that TRPC6 can be regarded as a potential novel biomarker for COPD pathogenesis. Full article

20 pages, 2837 KB

Open AccessArticle

Client Applications and Server-Side Docker for Management of RNASeq and/or VariantSeq Workflows and Pipelines of the GPRO Suite

by Ahmed Ibrahem Hafez, Beatriz Soriano, Aya Allah Elsayed, Ricardo Futami, Raquel Ceprian, Ricardo Ramos-Ruiz, Genis Martinez, Francisco Jose Roig, Miguel Angel Torres-Font, Fernando Naya-Catala, Josep Alvar Calduch-Giner, Lucia Trilla-Fuertes, Angelo Gamez-Pozo, Vicente Arnau, Jose Maria Sempere-Luna, Jaume Perez-Sanchez, Toni Gabaldon and Carlos Llorens

Genes 2023, 14(2), 267; https://doi.org/10.3390/genes14020267 - 19 Jan 2023

Cited by 5 | Viewed by 5015

Abstract

The GPRO suite is an in-progress bioinformatic project for -omics data analysis. As part of the continued growth of this project, we introduce a client- and server-side solution for comparative transcriptomics and analysis of variants. The client side consists of two Java applications called “RNASeq” and “VariantSeq” to manage pipelines and workflows based on the most common command line interface tools for RNA-seq and Variant-seq analysis, respectively. As such, “RNASeq” and “VariantSeq” are coupled with a Linux server infrastructure (named GPRO Server-Side) that hosts all dependencies of each application (scripts, databases, and command line interface software). Implementation of the Server-Side requires a Linux operating system, PHP, SQL, Python, bash scripting, and third-party software. The GPRO Server-Side can be installed, via a Docker container, on the user’s PC under any operating system or on remote servers, as a cloud solution. “RNASeq” and “VariantSeq” are both available as desktop (RCP compilation) and web (RAP compilation) applications. Each application has two execution modes: a step-by-step mode enables each step of the workflow to be executed independently, and a pipeline mode allows all steps to be run sequentially. “RNASeq” and “VariantSeq” also feature an experimental, online support system called GENIE that consists of a virtual (chatbot) assistant and a pipeline jobs panel coupled with an expert system. The chatbot can troubleshoot issues with the usage of each tool, the pipeline jobs panel provides information about the status of each computational job executed in the GPRO Server-Side, and the expert system provides the user with a potential recommendation to identify or fix failed analyses. Our solution is a ready-to-use, topic-specific platform that combines the user-friendliness, robustness, and security of desktop software with the efficiency of cloud/web applications to manage pipelines and workflows based on command line interface software.

2022

20 pages, 11313 KB

Open AccessArticle

Genome-Wide Identification and Analysis of the MADS-Box Gene Family in Almond Reveal Its Expression Features in Different Flowering Periods

by Xingyue Liu, Dongdong Zhang, Zhenfan Yu, Bin Zeng, Shaobo Hu, Wenwen Gao, Xintong Ma, Yawen He and Huanxue Qin

Genes 2022, 13(10), 1764; https://doi.org/10.3390/genes13101764 - 29 Sep 2022

Cited by 4 | Viewed by 3436

Abstract

The MADS-box gene family is an important family of transcription factors involved in multiple processes, such as plant growth and development, stress response, and, in particular, flowering time and floral organ development. Almonds are the best-selling nuts in the international fruit trade, accounting for more than 50% of the world’s dried fruit trade, and one of the main economic fruit trees in Kashgar, Xinjiang. In addition, almonds contain a variety of nutrients, such as protein and dietary fiber, which can supplement the human diet. They also have the functions of nourishing the yin and kidneys, improving eyesight, and strengthening the brain, and they can be applied to various diseases. However, there is no report on the MADS-box gene family in almond (Prunus dulcis). In this study, a total of 67 PdMADS genes distributed across 8 chromosomes were identified from the genome of almond ‘Wanfeng’. The PdMADS members were divided into five subgroups (Mα, Mβ, Mγ, Mδ, and MIKC), and the members in each subgroup had conserved motif types and exon and intron numbers. The number of exons of PdMADS members ranged from 1 to 20, and the number of introns ranged from 0 to 19. The number of exons and introns of different subfamily members varied greatly. The results of gene duplication analysis showed that the PdMADS members had 16 pairs of segmental duplications and 9 pairs of tandem duplications, so we further explored the relationship between the MADS-box gene members in almond and those in Arabidopsis thaliana, Oryza sativa, Malus domestica, and Prunus persica based on colinear genes and evolutionary selection pressure. The results of the cis-acting elements showed that the PdMADS members were extensively involved in a variety of processes, such as almond growth and development, hormone regulation, and stress response. In addition, the expression patterns of PdMADS members differed significantly across six floral transcriptome samples from two almond cultivars, ‘Wanfeng’ and ‘Nonpareil’. Subsequently, the fluorescence quantitative expression levels of 15 PdMADS genes were highly similar to the transcriptome expression patterns, and the gene expression levels increased in the samples at different flowering stages, indicating that the two almond cultivars expressed different PdMADS genes during the flowering process. It is worth noting that the difference in flowering time between ‘Wanfeng’ and ‘Nonpareil’ may be caused by the different expression activities of PdMADS47 and PdMADS16 during the dormancy period, resulting in different processes of vernalization. We identified a total of 13,515 target genes in the genome based on the MIKC DNA-binding sites. The GO and KEGG enrichment results showed that these target genes play important roles in protein function and multiple pathways. In summary, we conducted bioinformatics and expression pattern studies on the PdMADS gene family and investigated six flowering samples from two almond cultivars, the early-flowering ‘Wanfeng’ and late-flowering ‘Nonpareil’, for quantitative expression level identification. These findings lay a foundation for future in-depth studies on the mechanism of PdMADS gene regulation during flowering in different almond cultivars.
