Feature Papers in Bioinformatics

A topical collection in Genes (ISSN 2073-4425). This collection belongs to the section "Bioinformatics".

Viewed by 28177

Editor


Prof. Dr. Stefano Lonardi
Guest Editor
Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
Interests: computational molecular biology; bioinformatics; genomics; epigenetics; data mining

Topical Collection Information

Dear Colleagues,

This topical collection, “Feature Papers in Bioinformatics”, aims to collect high-quality research articles, review articles, and communications on advances in bioinformatics. Because its aim is to illustrate frontier research in the field through selected works, we encourage Editorial Board Members of the Section “Bioinformatics” to contribute feature papers reflecting the latest progress in their research fields, or to invite relevant senior experts and colleagues to contribute. We aim to present our Section as an attractive open-access publishing platform for bioinformatics. Topics include, but are not limited to:

  • Molecular sequence analysis
  • Sequencing and genotyping technologies
  • Regulation and epigenomics
  • Transcriptomics, including single-cell
  • Metagenomics
  • Population and statistical genetics
  • Evolutionary, compressive, and comparative genomics
  • Structure and function of non-coding RNAs
  • Computational proteomics and proteogenomics
  • Protein structure and function
  • Biological networks
  • Computational systems biology
  • Privacy of biomedical data
  • Bioimaging

Prof. Dr. Stefano Lonardi
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the collection website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • sequence analysis
  • sequencing technologies
  • genotyping technologies
  • gene regulation
  • epigenomics
  • epigenetics
  • transcriptomics
  • single-cell
  • metagenomics
  • population genetics
  • statistical genetics
  • comparative genomics
  • non-coding RNAs
  • proteomics
  • proteogenomics
  • systems biology
  • privacy of biomedical data
  • bioimaging

Published Papers (17 papers)

2024

11 pages, 453 KiB  
Article
The Effect of Genome Parametrization and SNP Marker Subsetting on Genomic Selection in Autotetraploid Alfalfa
by Nelson Nazzicari, Nicolò Franguelli, Barbara Ferrari, Luciano Pecetti and Paolo Annicchiarico
Genes 2024, 15(4), 449; https://doi.org/10.3390/genes15040449 - 02 Apr 2024
Viewed by 494
Abstract
Background: Alfalfa, the most economically important forage legume worldwide, features modest genetic progress due to long selection cycles and the extent of the non-additive genetic variance associated with its autotetraploid genome. Methods: To improve the efficiency of genomic selection in alfalfa, we explored the effects of genome parametrization (as tetraploid and diploid dosages, plus allele ratios) and SNP marker subsetting (all available SNPs, only genic regions, and only non-genic regions) on genomic regressions, together with various levels of filtering on reading depth and missing rates. We used genotyping by sequencing-generated data and focused on traits of different genetic complexity, i.e., dry biomass yield in moisture-favorable (FE) and drought stress (SE) environments, leaf size, and the onset of flowering, which were assessed in 143 genotyped plants from a genetically broad European reference population and their phenotyped half-sib progenies. Results: On average, the allele ratio improved the predictive ability compared with other genome parametrizations (+7.9% vs. tetraploid dosage, +12.6% vs. diploid dosage), while using all the SNPs offered an advantage compared with any specific SNP subsetting (+3.7% vs. genic regions, +7.6% vs. non-genic regions). However, when focusing on specific traits, different combinations of genome parametrization and subsetting achieved better performances. We also released Legpipe2, an SNP calling pipeline tailored for reduced representation (GBS, RAD) in medium-sized genotyping experiments.

17 pages, 1389 KiB  
Article
Data Augmentation Enhances Plant-Genomic-Enabled Predictions
by Osval A. Montesinos-López, Mario Alberto Solis-Camacho, Leonardo Crespo-Herrera, Carolina Saint Pierre, Gloria Isabel Huerta Prado, Sofia Ramos-Pulido, Khalid Al-Nowibet, Roberto Fritsche-Neto, Guillermo Gerard, Abelardo Montesinos-López and José Crossa
Genes 2024, 15(3), 286; https://doi.org/10.3390/genes15030286 - 24 Feb 2024
Viewed by 1320
Abstract
Genomic selection (GS) is revolutionizing plant breeding. However, its practical implementation is still challenging, since there are many factors that affect its accuracy. For this reason, this research explores data augmentation with the goal of improving its accuracy. Deep neural networks with data augmentation (DA) generate synthetic data from the original training set to increase the training set and to improve the prediction performance of any statistical or machine learning algorithm. There is much empirical evidence of their success in many computer vision applications. Due to this, DA was explored in the context of GS using 14 real datasets. We found empirical evidence that DA is a powerful tool to improve the prediction accuracy, since we improved the prediction accuracy of the top lines in the 14 datasets under study. On average, across datasets and traits, the gain in prediction performance of the DA approach relative to the conventional method in the top 20% of lines in the testing set was 108.4% in terms of the NRMSE and 107.4% in terms of the MAAPE, but a worse performance was observed on the whole testing set. We encourage more empirical evaluations to support our findings.

12 pages, 1854 KiB  
Article
Transcription Factor Regulation of Gene Expression Network by ZNF385D and HAND2 in Carotid Atherosclerosis
by Ming Tan, Lars Juel Andersen, Niels Eske Bruun, Matias Greve Lindholm, Qihua Tan and Martin Snoer
Genes 2024, 15(2), 213; https://doi.org/10.3390/genes15020213 - 07 Feb 2024
Viewed by 855
Abstract
Carotid intima-media thickness (CIMT) is a surrogate indicator for atherosclerosis and has been shown to predict cardiovascular risk in multiple large studies. Identification of molecular markers for carotid atheroma plaque formation can be critical for early intervention and prevention of atherosclerosis. This study performed transcription factor (TF) network analysis of global gene expression data focusing on two TF genes, ZNF385D and HAND2, whose polymorphisms have been recently reported to show association with CIMT. Genome-wide gene expression data were measured from pieces of carotid endarterectomy collected from 34 hypertensive patients (atheroma plaque of stages IV and above according to the Stary classification) each paired with one sample of distant macroscopically intact tissue (stages I and II). Transcriptional regulation networks or the regulons were reconstructed for ZNF385D (5644 target genes) and HAND2 (781 target genes) using network inference. Their association with the progression of carotid atheroma was examined using gene-set enrichment analysis with extremely high statistical significance for regulons of both ZNF385D and HAND2 (p < 6.95 × 10⁻⁷) suggesting the involvement of expression quantitative trait loci (eQTL). Functional annotation of the regulon genes found heavy involvement in the immune system’s response to inflammation and infection in the development of atherosclerosis. Detailed examination of the regulation and correlation patterns suggests that activities of the two TF genes could have high clinical and interventional impacts on impairing carotid atheroma plaque formation and preventing carotid atherosclerosis.

2023

16 pages, 729 KiB  
Article
Base-Excision Repair Mutational Signature in Two Sebaceous Carcinomas of the Eyelid
by Eugenio Sangiorgi, Federico Giannuzzi, Clelia Molinario, Giulia Rapari, Melania Riccio, Giovanni Cuffaro, Federica Castri, Roberta Benvenuto, Maurizio Genuardi, Daniela Massi and Gustavo Savino
Genes 2023, 14(11), 2055; https://doi.org/10.3390/genes14112055 - 08 Nov 2023
Viewed by 972
Abstract
Personalized medicine aims to develop tailored treatments for individual patients based on specific mutations present in the affected organ. This approach has proven paramount in cancer treatment, as each tumor carries distinct driver mutations that respond to targeted drugs and, in some cases, may confer resistance to other therapies. Particularly for rare conditions, personalized medicine has the potential to revolutionize treatment strategies. Rare cancers often lack extensive datasets of molecular and pathological information, large-scale trials for novel therapies, and established treatment guidelines. Consequently, surgery is frequently the only viable option for many rare tumors, when feasible, as traditional multimodal approaches employed for more common cancers often play a limited role. Sebaceous carcinoma of the eyelid is an exceptionally rare cancer affecting the eye’s adnexal tissues, most frequently reported in Asia, but whose prevalence is significantly increasing even in Europe and the US. The sole established curative treatment is surgical excision, which can lead to significant disfigurement. In cases of metastatic sebaceous carcinoma, validated drug options are currently lacking. In this project, we set out to characterize the mutational landscape of two sebaceous carcinomas of the eyelid following surgical excision. Utilizing available bioinformatics tools, we demonstrated our ability to identify common features promptly and accurately in both tumors. These features included a Base-Excision Repair mutational signature, a notably high tumor mutational burden, and key driver mutations in somatic tissues. These findings had not been previously reported in similar studies. This report underscores how, in the case of rare tumors, it is possible to comprehensively characterize the mutational landscape of each individual case, potentially opening doors to targeted therapeutic options.

15 pages, 1626 KiB  
Article
SNPtotree—Resolving the Phylogeny of SNPs on Non-Recombining DNA
by Zehra Köksal, Claus Børsting, Leonor Gusmão and Vania Pereira
Genes 2023, 14(10), 1837; https://doi.org/10.3390/genes14101837 - 22 Sep 2023
Viewed by 1705
Abstract
Genetic variants on non-recombining DNA and the hierarchical order in which they accumulate are commonly of interest. This variant hierarchy can be established and combined with information on the population and geographic origin of the individuals carrying the variants to find population structures and infer migration patterns. Further, individuals can be assigned to the characterized populations, which is relevant in forensic genetics, genetic genealogy, and epidemiologic studies. However, there is currently no straightforward method to obtain such a variant hierarchy. Here, we introduce the software SNPtotree v1.0, which uniquely determines the hierarchical order of variants on non-recombining DNA without error-prone manual sorting. The algorithm uses pairwise variant comparisons to infer their relationships and integrates the combined information into a phylogenetic tree. Variants that have contradictory pairwise relationships or ambiguous positions in the tree are removed by the software. When benchmarked using two human Y-chromosomal massively parallel sequencing datasets, SNPtotree outperforms traditional methods in the accuracy of phylogenetic trees for sequencing data with high amounts of missing information. The phylogenetic trees of variants created using SNPtotree can be used to establish and maintain publicly available phylogeny databases to further explore genetic epidemiology and genealogy, as well as population and forensic genetics.

9 pages, 2055 KiB  
Article
PMIDigest: Interactive Review of Large Collections of PubMed Entries to Distill Relevant Information
by Jorge Novoa, Mónica Chagoyen, Carlos Benito, F. Javier Moreno and Florencio Pazos
Genes 2023, 14(4), 942; https://doi.org/10.3390/genes14040942 - 19 Apr 2023
Cited by 2 | Viewed by 1351
Abstract
Scientific knowledge is being accumulated in the biomedical literature at an unprecedented pace. The most widely used database with biomedicine-related article abstracts, PubMed, currently contains more than 36 million entries. Users performing searches in this database for a subject of interest face thousands of entries (articles) that are difficult to process manually. In this work, we present an interactive tool for automatically digesting large sets of PubMed articles: PMIDigest (PubMed IDs digester). The system allows for classification/sorting of articles according to different criteria, including the type of article and different citation-related figures. It also calculates the distribution of MeSH (medical subject headings) terms for categories of interest, providing a picture of the themes addressed in the set. These MeSH terms are highlighted in the article abstracts in different colors depending on the category. An interactive representation of the interarticle citation network is also presented in order to easily locate article “clusters” related to particular subjects, as well as their corresponding “hub” articles. In addition to PubMed articles, the system can also process a set of Scopus or Web of Science entries. In summary, with this system, the user can have a “bird’s eye view” of a large set of articles and their main thematic tendencies and obtain additional information not evident in a plain list of abstracts.

20 pages, 1802 KiB  
Review
Computational Biology Helps Understand How Polyploid Giant Cancer Cells Drive Tumor Success
by Matheus Correia Casotti, Débora Dummer Meira, Aléxia Stefani Siqueira Zetum, Bruno Cancian de Araújo, Danielle Ribeiro Campos da Silva, Eldamária de Vargas Wolfgramm dos Santos, Fernanda Mariano Garcia, Flávia de Paula, Gabriel Mendonça Santana, Luana Santos Louro, Lyvia Neves Rebello Alves, Raquel Furlani Rocon Braga, Raquel Silva dos Reis Trabach, Sara Santos Bernardes, Thomas Erik Santos Louro, Eduardo Cremonese Filippi Chiela, Guido Lenz, Elizeu Fagundes de Carvalho and Iúri Drumond Louro
Genes 2023, 14(4), 801; https://doi.org/10.3390/genes14040801 - 26 Mar 2023
Cited by 4 | Viewed by 2955
Abstract
Precision and organization govern the cell cycle, ensuring normal proliferation. However, some cells may undergo abnormal cell divisions (neosis) or variations of mitotic cycles (endopolyploidy). Consequently, the formation of polyploid giant cancer cells (PGCCs), critical for tumor survival, resistance, and immortalization, can occur. Newly formed cells end up accessing numerous multicellular and unicellular programs that enable metastasis, drug resistance, tumor recurrence, and self-renewal or diverse clone formation. An integrative literature review was carried out, searching articles in several sites, including: PUBMED, NCBI-PMC, and Google Academic, published in English, indexed in referenced databases and without a publication time filter, but prioritizing articles from the last 3 years, to answer the following questions: (i) “What is the current knowledge about polyploidy in tumors?”; (ii) “What are the applications of computational studies for the understanding of cancer polyploidy?”; and (iii) “How do PGCCs contribute to tumorigenesis?”

16 pages, 3618 KiB  
Article
Understanding Drug Resistance of Wild-Type and L38HL Insertion Mutant of HIV-1 C Protease to Saquinavir
by Sankaran Venkatachalam, Nisha Murlidharan, Sowmya R. Krishnan, C. Ramakrishnan, Mpho Setshedi, Ramesh Pandian, Debmalya Barh, Sandeep Tiwari, Vasco Azevedo, Yasien Sayed and M. Michael Gromiha
Genes 2023, 14(2), 533; https://doi.org/10.3390/genes14020533 - 20 Feb 2023
Cited by 1 | Viewed by 1609
Abstract
Acquired immunodeficiency syndrome (AIDS) is one of the most challenging infectious diseases to treat on a global scale. Understanding the mechanisms underlying the development of drug resistance is necessary for novel therapeutics. HIV subtype C is known to harbor mutations at critical positions of HIV aspartic protease compared to HIV subtype B, which affects the binding affinity. Recently, a novel double-insertion mutation at codon 38 (L38HL) was characterized in HIV subtype C protease, whose effects on the interaction with protease inhibitors are hitherto unknown. In this study, the potential of L38HL double-insertion in HIV subtype C protease to induce a drug resistance phenotype towards the protease inhibitor, Saquinavir (SQV), was probed using various computational techniques, such as molecular dynamics simulations, binding free energy calculations, local conformational changes and principal component analysis. The results indicate that the L38HL mutation exhibits an increase in flexibility at the hinge and flap regions with a decrease in the binding affinity of SQV in comparison with wild-type HIV protease C. Further, we observed a wide opening at the binding site in the L38HL variant due to an alteration in flap dynamics, leading to a decrease in interactions with the binding site of the mutant protease. It is supported by an altered direction of motion of flap residues in the L38HL variant compared with the wild-type. These results provide deep insights into understanding the potential drug resistance phenotype in infected individuals.

10 pages, 2570 KiB  
Technical Note
DraculR: A Web-Based Application for In Silico Haemolysis Detection in High-Throughput microRNA Sequencing Data
by Melanie D. Smith, Shalem Y. Leemaqz, Tanja Jankovic-Karasoulos, Dylan McCullough, Dale McAninch, Anya L. Arthurs, James Breen, Claire T. Roberts and Katherine A. Pillman
Genes 2023, 14(2), 448; https://doi.org/10.3390/genes14020448 - 09 Feb 2023
Cited by 1 | Viewed by 1239
Abstract
The search for novel microRNA (miRNA) biomarkers in plasma is hampered by haemolysis, the lysis and subsequent release of red blood cell contents, including miRNAs, into surrounding fluid. The biomarker potential of miRNAs comes in part from their multicompartment origin and the long-lived nature of miRNA transcripts in plasma, giving researchers a functional window for tissues that are otherwise difficult or disadvantageous to sample. The inclusion of red-blood-cell-derived miRNA transcripts in downstream analysis introduces a source of error that is difficult to identify post hoc and may lead to spurious results. Where access to a physical specimen is not possible, our tool will provide an in silico approach to haemolysis prediction. We present DraculR, an interactive Shiny/R application that enables a user to upload miRNA expression data from a short-read sequencing of human plasma as a raw read counts table and interactively calculate a metric that indicates the degree of haemolysis contamination. The code, DraculR web tool and its tutorial are freely available as detailed herein.

11 pages, 579 KiB  
Review
Networks as Biomarkers: Uses and Purposes
by Caterina Alfano, Lorenzo Farina and Manuela Petti
Genes 2023, 14(2), 429; https://doi.org/10.3390/genes14020429 - 08 Feb 2023
Cited by 3 | Viewed by 1544
Abstract
Network-based approaches are often used to analyze gene expression data or protein–protein interactions but are not usually applied to study the relationships between different biomarkers. Given the clinical need for more comprehensive and integrative biomarkers that can help to identify personalized therapies, the integration of biomarkers of different natures is an emerging trend in the literature. Network analysis can be used to analyze the relationships between different features of a disease; nodes can be disease-related phenotypes, gene expression, mutational events, protein quantification, imaging-derived features and more. Since different biomarkers can exert causal effects between them, describing such interrelationships can be used to better understand the underlying mechanisms of complex diseases. Networks as biomarkers are not yet commonly used, despite being proven to lead to interesting results. Here, we discuss in which ways they have been used to provide novel insights into disease susceptibility, disease development and severity.

15 pages, 6994 KiB  
Article
An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF
by Kai Liu, Qi Chen and Guo-Hua Huang
Genes 2023, 14(2), 421; https://doi.org/10.3390/genes14020421 - 06 Feb 2023
Cited by 3 | Viewed by 1365
Abstract
Gene families, which are parts of a genome’s information storage hierarchy, play a significant role in the development and diversity of multicellular organisms. Several studies have focused on the characteristics of gene families, such as function, homology, or phenotype. However, statistical and correlation analyses on the distribution of gene family members in the genome have yet to be conducted. Here, a novel framework incorporating gene family analysis and genome selection based on NMF-ReliefF is reported. Specifically, the proposed method starts by obtaining gene families from the TreeFam database and determining the number of gene families within the feature matrix. Then, NMF-ReliefF is used to select features from the gene feature matrix, which is a new feature selection algorithm that overcomes the inefficiencies of traditional methods. Finally, a support vector machine is utilized to classify the acquired features. The results show that the framework achieved an accuracy of 89.1% and an AUC of 0.919 on the insect genome test set. We also employed four microarray gene data sets to evaluate the performance of the NMF-ReliefF algorithm. The outcomes show that the proposed method may strike a delicate balance between robustness and discrimination. Additionally, the proposed method’s categorization is superior to state-of-the-art feature selection approaches.

18 pages, 816 KiB  
Review
Translational Bioinformatics Applied to the Study of Complex Diseases
by Matheus Correia Casotti, Débora Dummer Meira, Lyvia Neves Rebello Alves, Barbara Gomes de Oliveira Bessa, Camilly Victória Campanharo, Creuza Rachel Vicente, Carla Carvalho Aguiar, Daniel de Almeida Duque, Débora Gonçalves Barbosa, Eldamária de Vargas Wolfgramm dos Santos, Fernanda Mariano Garcia, Flávia de Paula, Gabriel Mendonça Santana, Isabele Pagani Pavan, Luana Santos Louro, Raquel Furlani Rocon Braga, Raquel Silva dos Reis Trabach, Thomas Santos Louro, Elizeu Fagundes de Carvalho and Iúri Drumond Louro
Genes 2023, 14(2), 419; https://doi.org/10.3390/genes14020419 - 06 Feb 2023
Cited by 3 | Viewed by 2612
Abstract
Translational Bioinformatics (TBI) is defined as the union of translational medicine and bioinformatics. It emerges as a major advance in science and technology by covering everything, from the most basic database discoveries, to the development of algorithms for molecular and cellular analysis, as well as their clinical applications. This technology makes it possible to access the knowledge of scientific evidence and apply it to clinical practice. This manuscript aims to highlight the role of TBI in the study of complex diseases, as well as its application to the understanding and treatment of cancer. An integrative literature review was carried out, obtaining articles through several websites, among them: PUBMED, Science Direct, NCBI-PMC, Scientific Electronic Library Online (SciELO), and Google Academic, published in English, Spanish, and Portuguese, indexed in the referred databases and answering the following guiding question: “How does TBI provide a scientific understanding of complex diseases?” An additional effort is aimed at the dissemination, inclusion, and perpetuation of TBI knowledge from the academic environment to society, helping the study, understanding, and elucidation of complex disease mechanics and their treatment.

21 pages, 1401 KiB  
Article
Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier
by Magdalena Kircher, Josefin Säurich, Michael Selle and Klaus Jung
Genes 2023, 14(2), 387; https://doi.org/10.3390/genes14020387 - 01 Feb 2023
Viewed by 1727
Abstract
Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. Hence, an accuracy that is either too weak or too optimistic is then reported, and the estimated model performance cannot be reproduced on independent data. It is then also doubtful whether a classifier qualifies for clinical usage. We estimate classifier performances in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we use two outlier detection methods within a bootstrap procedure to estimate the outlier probability for each sample and evaluate classifiers before and after outlier removal by means of cross-validation. We found that the removal of outliers changed the classification performance notably. For the most part, removing outliers improved the classification results. Taking into account the fact that there are various, sometimes unclear reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier with and without outliers in training and test data. This provides a more diverse picture of a classifier’s performance and prevents reporting models that later turn out to be not applicable for clinical diagnoses.

20 pages, 36173 KiB  
Article
Reconstruction of Single-Cell Trajectories Using Stochastic Tree Search
by Jingyi Zhai, Hongkai Ji and Hui Jiang
Genes 2023, 14(2), 318; https://doi.org/10.3390/genes14020318 - 26 Jan 2023
Viewed by 1139
Abstract
The recent advancement in single-cell RNA sequencing technologies enables the understanding of dynamic cellular processes at the single-cell level. Using trajectory inference methods, pseudotimes can be estimated based on reconstructed single-cell trajectories which can be further used to gain biological knowledge. Existing methods [...] Read more.
Recent advances in single-cell RNA sequencing technologies enable the understanding of dynamic cellular processes at the single-cell level. Using trajectory inference methods, pseudotimes can be estimated from reconstructed single-cell trajectories, which can then be used to gain biological knowledge. Existing methods for modeling cell trajectories, such as those based on a minimum spanning tree or a k-nearest neighbor graph, often lead to locally optimal solutions. In this paper, we propose a penalized likelihood-based framework and introduce a stochastic tree search (STS) algorithm that aims at the global solution in a large and non-convex tree space. Both simulated and real data experiments show that our approach is more accurate and robust than other existing methods in terms of cell ordering and pseudotime estimation.

14 pages, 2836 KiB  
Article
Identification of TRPC6 as a Novel Diagnostic Biomarker of PM-Induced Chronic Obstructive Pulmonary Disease Using Machine Learning Models
by Kyu-Ree Dhong, Jae-Hyeong Lee, You-Rim Yoon and Hye-Jin Park
Genes 2023, 14(2), 284; https://doi.org/10.3390/genes14020284 - 21 Jan 2023
Cited by 5 | Viewed by 1813
Abstract
Chronic obstructive pulmonary disease (COPD) was the third most prevalent cause of mortality worldwide in 2010; it results from a progressive and fatal deterioration of lung function because of cigarette smoking and particulate matter (PM). Therefore, it is important to identify molecular biomarkers that can diagnose the COPD phenotype in order to plan effective therapy. To identify potential novel biomarkers of COPD, we first obtained the COPD and normal lung tissue gene expression dataset GSE151052 from the NCBI Gene Expression Omnibus (GEO). A total of 250 differentially expressed genes (DEGs) were investigated and analyzed using GEO2R, gene ontology (GO) functional annotation, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway identification. The GEO2R analysis revealed that TRPC6 was the sixth most highly expressed gene in patients with COPD. The GO analysis indicated that the upregulated DEGs were mainly concentrated in the plasma membrane, transcription, and DNA binding. The KEGG pathway analysis indicated that the upregulated DEGs were mainly involved in pathways related to cancer and axon guidance. TRPC6, one of the most abundant genes among the top 10 differentially expressed total RNAs (fold change ≥ 1.5) between the COPD and normal groups, was selected as a novel COPD biomarker based on the results of the GEO dataset and analysis using machine learning models. The upregulation of TRPC6 was verified in PM-stimulated RAW264.7 cells, which mimicked COPD conditions, compared to untreated RAW264.7 cells by quantitative reverse transcription polymerase chain reaction. In conclusion, our study suggests that TRPC6 can be regarded as a potential novel biomarker for COPD pathogenesis.
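
The fold-change cutoff mentioned in this abstract can be illustrated with a minimal sketch. It is a plain ratio of group means on linear-scale values, not the GEO2R procedure and not the authors' pipeline. The function name `upregulated_genes`, the placeholder gene names, and all expression values are hypothetical.

```python
from statistics import mean


def upregulated_genes(case, control, min_fold_change=1.5):
    """Return (gene, fold change) pairs with mean(case) / mean(control) >= cutoff."""
    hits = []
    for gene, case_values in case.items():
        control_mean = mean(control[gene])
        if control_mean <= 0:
            continue  # ratio undefined; skipped in this sketch
        fold_change = mean(case_values) / control_mean
        if fold_change >= min_fold_change:
            hits.append((gene, fold_change))
    # Largest fold change first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy linear-scale expression values; gene names and numbers are made up.
case = {"GENE_A": [30.0, 34.0], "GENE_B": [10.0, 11.0], "GENE_C": [18.0, 22.0]}
control = {"GENE_A": [10.0, 6.0], "GENE_B": [10.0, 10.0], "GENE_C": [10.0, 10.0]}
hits = upregulated_genes(case, control)
```

In practice, a fold-change filter of this kind is usually combined with a statistical test and multiple-testing correction, which this sketch leaves out.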

20 pages, 2837 KiB  
Article
Client Applications and Server-Side Docker for Management of RNASeq and/or VariantSeq Workflows and Pipelines of the GPRO Suite
by Ahmed Ibrahem Hafez, Beatriz Soriano, Aya Allah Elsayed, Ricardo Futami, Raquel Ceprian, Ricardo Ramos-Ruiz, Genis Martinez, Francisco Jose Roig, Miguel Angel Torres-Font, Fernando Naya-Catala, Josep Alvar Calduch-Giner, Lucia Trilla-Fuertes, Angelo Gamez-Pozo, Vicente Arnau, Jose Maria Sempere-Luna, Jaume Perez-Sanchez, Toni Gabaldon and Carlos Llorens
Genes 2023, 14(2), 267; https://doi.org/10.3390/genes14020267 - 19 Jan 2023
Viewed by 2296
Abstract
The GPRO suite is an in-progress bioinformatic project for -omics data analysis. As part of the continued growth of this project, we introduce a client- and server-side solution for comparative transcriptomics and analysis of variants. The client-side consists of two Java applications called “RNASeq” and “VariantSeq” to manage pipelines and workflows based on the most common command line interface tools for RNA-seq and Variant-seq analysis, respectively. As such, “RNASeq” and “VariantSeq” are coupled with a Linux server infrastructure (named GPRO Server-Side) that hosts all dependencies of each application (scripts, databases, and command line interface software). Implementation of the Server-Side requires a Linux operating system, PHP, SQL, Python, bash scripting, and third-party software. The GPRO Server-Side can be installed, via a Docker container, on the user’s PC under any operating system or on remote servers, as a cloud solution. “RNASeq” and “VariantSeq” are both available as desktop (RCP compilation) and web (RAP compilation) applications. Each application has two execution modes: a step-by-step mode enables each step of the workflow to be executed independently, and a pipeline mode allows all steps to be run sequentially. “RNASeq” and “VariantSeq” also feature an experimental, online support system called GENIE that consists of a virtual (chatbot) assistant and a pipeline jobs panel coupled with an expert system. The chatbot can troubleshoot issues with the usage of each tool, the pipeline jobs panel provides information about the status of each computational job executed in the GPRO Server-Side, and the expert system provides the user with a potential recommendation to identify or fix failed analyses.
Our solution is a ready-to-use, topic-specific platform that combines the user-friendliness, robustness, and security of desktop software with the efficiency of cloud/web applications to manage pipelines and workflows based on command line interface software.

2022

20 pages, 11313 KiB  
Article
Genome-Wide Identification and Analysis of the MADS-Box Gene Family in Almond Reveal Its Expression Features in Different Flowering Periods
by Xingyue Liu, Dongdong Zhang, Zhenfan Yu, Bin Zeng, Shaobo Hu, Wenwen Gao, Xintong Ma, Yawen He and Huanxue Qin
Genes 2022, 13(10), 1764; https://doi.org/10.3390/genes13101764 - 29 Sep 2022
Cited by 1 | Viewed by 1798
Abstract
The MADS-box gene family is an important family of transcription factors involved in multiple processes, such as plant growth and development, stress response, and, in particular, flowering time and floral organ development. Almonds are the best-selling nuts in the international fruit trade, accounting for more than 50% of the world’s dried fruit trade, and are one of the main economic fruit trees in Kashgar, Xinjiang. In addition, almonds contain a variety of nutrients, such as protein and dietary fiber, which can supplement people’s nutrition. They also have the functions of nourishing the yin and kidneys, improving eyesight, and strengthening the brain, and they can be applied to various diseases. However, there is no report on the MADS-box gene family in almond (Prunus dulcis). In this study, a total of 67 PdMADS genes distributed across 8 chromosomes were identified from the genome of almond ‘Wanfeng’. The PdMADS members were divided into five subgroups (Mα, Mβ, Mγ, Mδ, and MIKC), and the members in each subgroup had conserved motif types and exon and intron numbers. The number of exons of PdMADS members ranged from 1 to 20, and the number of introns ranged from 0 to 19. The number of exons and introns of different subfamily members varied greatly. The results of gene duplication analysis showed that the PdMADS members had 16 pairs of segmental duplications and 9 pairs of tandem duplications. We further explored the relationship between the MADS-box gene members in almond and those in Arabidopsis thaliana, Oryza sativa, Malus domestica, and Prunus persica based on colinear genes and evolutionary selection pressure. The analysis of cis-acting elements showed that the PdMADS members were extensively involved in a variety of processes, such as almond growth and development, hormone regulation, and stress response.
In addition, the expression patterns of PdMADS members differed significantly across six floral transcriptome samples from two almond cultivars, ‘Wanfeng’ and ‘Nonpareil’. Subsequently, the fluorescence quantitative expression levels of the 15 PdMADS genes were found to be highly similar to the transcriptome expression patterns, and the gene expression levels increased in the samples at different flowering stages, indicating that the two almond cultivars expressed different PdMADS genes during the flowering process. It is worth noting that the difference in flowering time between ‘Wanfeng’ and ‘Nonpareil’ may be caused by the different expression activities of PdMADS47 and PdMADS16 during the dormancy period, resulting in different processes of vernalization. We identified a total of 13,515 target genes in the genome based on the MIKC DNA-binding sites. The GO and KEGG enrichment results showed that these target genes play important roles in protein function and multiple pathways. In summary, we conducted bioinformatics and expression pattern studies on the PdMADS gene family and investigated six flowering samples from two almond cultivars, the early-flowering ‘Wanfeng’ and late-flowering ‘Nonpareil’, for quantitative expression level identification. These findings lay a foundation for future in-depth studies on the mechanism of PdMADS gene regulation during flowering in different almond cultivars.