Special Issue "Bioinformatics and Computational Biology 2019"

A special issue of Cells (ISSN 2073-4409).

Deadline for manuscript submissions: closed (31 December 2019).

Special Issue Editors

Dr. Ka-Chun Wong
Guest Editor
Department of Computer Science, City University of Hong Kong, Hong Kong
Interests: bioinformatics; computational biology; applied machine learning; data science; evolutionary computation and numerical optimization
Dr. Kathleen Steinhofel
Guest Editor
Department of Informatics, King's College London, London, UK
Interests: bioinformatics; combinatorial optimization; energy landscape analysis; algorithms; structural prediction; evolutionary computation
Dr. James Cai
Guest Editor
Department of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, USA
Interests: biostatistics; human genetics; gene expression and regulation; computational statistics; data science; evolutionary bioinformatics
Dr. Pingzhao Hu
Guest Editor
Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Canada
Interests: health informatics; deep learning; electronic health records; genomics; medical imaging; mobile health data; drug discovery; precision medicine

Special Issue Information

Dear Colleagues,

In recent years, we have witnessed a series of breakthroughs in molecular biology and its companion technologies. For instance, CRISPR-Cas9 technology has attracted worldwide attention for its translational applications to human embryo gene editing, and the latest high-throughput sequencing technologies, such as nanopore sequencing, have enabled us to respond rapidly to the spread of Ebola virus. Nonetheless, those wet-lab technologies have raised a multitude of computational challenges, such as off-target sequence analysis, combinatorial sequence mutation patterns, genotype–phenotype relationship mining, high-throughput sequencing data processing, and genomic data integration with other medical data (e.g., electronic health records and medical imaging).

Therefore, in this Special Issue, we envision that bioinformatics and computational biology will play a larger role than in the past in addressing these computational challenges. We invite your contributions in the form of original research articles, reviews, or shorter perspective articles on all aspects related to the theme of “Bioinformatics and Computational Biology”. Articles with sound methodology and scientific practice are particularly welcome. Relevant topics include, but are not limited to, the following:

  • Bioinformatics
  • Computational biology
  • Biostatistics
  • Health informatics
  • Medical informatics
  • Evolutionary bioinformatics
  • Genome research
  • Metagenomic research
  • High-throughput sequencing technology
  • High-throughput biodata management
  • Emerging topics in bioinformatics
  • Emerging topics in computational biology

Dr. Ka-Chun Wong
Dr. Kathleen Steinhofel
Dr. James Cai
Dr. Pingzhao Hu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Cells is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Bioinformatics
  • Computational biology
  • Biostatistics
  • Health informatics
  • Medical informatics
  • Evolutionary bioinformatics
  • Genomics
  • Metagenomics
  • High-throughput sequencing technology
  • High-throughput biodata management

Published Papers (20 papers)

Research

Open Access Article
Understanding Calcium-Dependent Conformational Changes in S100A1 Protein: A Combination of Molecular Dynamics and Gene Expression Study in Skeletal Muscle
Cells 2020, 9(1), 181; https://doi.org/10.3390/cells9010181 - 10 Jan 2020
Abstract
The S100A1 protein, involved in various physiological activities through the binding of calcium ions (Ca2+), participates in several protein-protein interaction (PPI) events after Ca2+-dependent activation. The present work investigates Ca2+-dependent conformational changes in the helix-EF hand-helix using the molecular dynamics (MD) simulation approach, which facilitates the understanding of Ca2+-dependent structural and dynamic distinctions between the apo and holo forms of the protein. Furthermore, the process of ion binding was simulated by molecular dynamics by inserting Ca2+ into the bulk of the apo structure. The outcomes of the simulation were examined using cluster analysis and a variety of structural metrics, such as interhelical angle estimation, solvent accessible surface area, hydrogen bond analysis, and contact analysis. Ca2+ triggered a rise in the interhelical angles of S100A1 on the binding site and in the solvent accessible surface area. Significant configurational regulations were observed in the holo protein. The findings contribute to understanding the molecular basis of the association of Ca2+ with the S100A1 protein and the Ca2+-mediated conformational changes in the protein target. In addition, we investigated the expression profile of S100A1 in myoblast differentiation and muscle regeneration. These data showed that S100A1 is expressed in skeletal muscles; however, its expression decreases with time during the process of myoblast differentiation.
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)

Open Access Article
Single-Cell Expression Variability Implies Cell Function
Cells 2020, 9(1), 14; https://doi.org/10.3390/cells9010014 - 19 Dec 2019
Abstract
As single-cell RNA sequencing (scRNA-seq) data becomes widely available, cell-to-cell variability in gene expression, or single-cell expression variability (scEV), has been increasingly appreciated. However, it remains unclear whether this variability is functionally important and, if so, what its implications are for multi-cellular organisms. Here, we analyzed multiple scRNA-seq data sets from lymphoblastoid cell lines (LCLs), lung airway epithelial cells (LAECs), and dermal fibroblasts (DFs) and, for each cell type, selected a group of homogeneous cells with highly similar expression profiles. We estimated the scEV levels for genes after correcting the mean-variance dependency in that data and identified 465, 466, and 364 highly variable genes (HVGs) in LCLs, LAECs, and DFs, respectively. The functions of these HVGs were enriched in biological processes precisely relevant to the function of the cell type from which the scRNA-seq data used to identify the HVGs were generated; for example, cytokine signaling pathways were enriched in HVGs identified in LCLs, collagen formation in LAECs, and keratinization in DFs. We repeated the same analysis with scRNA-seq data from induced pluripotent stem cells (iPSCs) and identified only 79 HVGs with no statistically significant enriched functions; the overall scEV in iPSCs was of negligible magnitude. Our results support the “variation is function” hypothesis, arguing that scEV is required for cell type-specific, higher-level system function. Thus, quantifying and characterizing scEV are of importance for our understanding of normal and pathological cellular processes.

Open Access Article
Tristetraprolin/ZFP36 Regulates the Turnover of Autoimmune-Associated HLA-DQ mRNAs
Cells 2019, 8(12), 1570; https://doi.org/10.3390/cells8121570 - 04 Dec 2019
Abstract
HLA class II genes encode highly polymorphic heterodimeric proteins functioning to present antigens to T cells and stimulate a specific immune response. Many HLA genes are strongly associated with autoimmune diseases as they stimulate self-antigen specific CD4+ T cells driving pathogenic responses against host tissues or organs. High expression of HLA class II risk genes is associated with autoimmune diseases, influencing the strength of the CD4+ T-mediated autoimmune response. The expression of HLA class II genes is regulated at both transcriptional and post-transcriptional levels. Protein components of the RNP complex binding the 3′UTR and affecting mRNA processing have previously been identified. Following on from this, the regulation of HLA-DQ2.5 risk genes, the main susceptibility genetic factor for celiac disease (CD), was investigated. The DQ2.5 molecule, encoded by HLA-DQA1*05 and HLA-DQB1*02 alleles, presents the antigenic gluten peptides to CD4+ T lymphocytes, activating the autoimmune response. The zinc-finger protein Tristetraprolin (TTP) or ZFP36 was identified to be a component of the RNP complex and has been described as a factor modulating mRNA stability. The 3′UTRs of CD-associated HLA-DQA1*05 and HLA-DQB1*02 mRNAs do not contain canonical TTP binding consensus sequences; therefore, an in silico approach focusing on mRNA secondary structure accessibility and stability was undertaken. Key structural differences specific to the CD-associated mRNAs were uncovered, allowing them to strongly interact with TTP through their 3′UTR, conferring a rapid turnover, in contrast to lower affinity binding to HLA non-CD associated mRNA.

Open Access Article
An Efficient and Flexible Method for Deconvoluting Bulk RNA-Seq Data with Single-Cell RNA-Seq Data
Cells 2019, 8(10), 1161; https://doi.org/10.3390/cells8101161 - 27 Sep 2019
Abstract
Estimating cell type compositions for complex diseases is an important step for investigating cellular heterogeneity, understanding disease etiology, and potentially facilitating early disease diagnosis and prevention. Here, we developed a computational statistical method, referred to as Multi-Omics Matrix Factorization (MOMF), to estimate the cell-type compositions of bulk RNA sequencing (RNA-seq) data by leveraging cell type-specific gene expression levels from single-cell RNA sequencing (scRNA-seq) data. MOMF not only directly models the count nature of gene expression data, but also effectively accounts for the uncertainty of cell type-specific mean gene expression levels. We demonstrate the benefits of MOMF through three real data applications, i.e., glioblastoma (GBM), colorectal cancer (CRC) and type II diabetes (T2D) studies. MOMF is able to accurately estimate disease-related cell type proportions, i.e., oligodendrocyte progenitor cells and macrophage cells, which are strongly associated with the survival of GBM and CRC, respectively.

Open Access Feature Paper Article
Automated Counting of Cancer Cells by Ensembling Deep Features
Cells 2019, 8(9), 1019; https://doi.org/10.3390/cells8091019 - 02 Sep 2019
Abstract
High-content and high-throughput digital microscopes have generated large image sets in biological experiments and clinical practice. Automatic image analysis techniques, such as cell counting, are in high demand. Here, cell counting was treated as a regression problem using image features (phenotypes) extracted by deep learning models. Three deep convolutional neural network models were developed to regress image features to their cell counts in an end-to-end way. Theoretically, ensembling imaging phenotypes should have better representative ability than a single type of imaging phenotype. We implemented this idea by integrating two types of imaging phenotypes (dot density map and foreground mask) extracted by two autoencoders and regressing the ensembled imaging phenotypes to cell counts afterwards. Two publicly available datasets with synthetic microscopic images were used to train and test the proposed models. Root mean square error, mean absolute error, mean absolute percent error, and Pearson correlation were applied to evaluate the models’ performance. The well-trained models were also applied to predict the cancer cell counts of real microscopic images acquired in a biological experiment to evaluate the roles of two colorectal-cancer-related genes. The proposed model by ensembling deep imaging features showed better performance in terms of smaller errors and larger correlations than those based on a single type of imaging feature. Overall, all models’ predictions showed a high correlation with the true cell counts. The ensembling-based model integrated high-level imaging phenotypes to improve the estimation of cell counts from high-content and high-throughput microscopic images.
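The four evaluation measures named in the cell-counting abstract above (root mean square error, mean absolute error, mean absolute percent error, and Pearson correlation) can be illustrated with a short sketch. This is not the authors' evaluation code; the function name `regression_metrics` and the example counts are hypothetical, and the sketch assumes every true count is nonzero so that the percent error is defined.

```python
import math


def regression_metrics(true_counts, predicted_counts):
    """Return RMSE, MAE, MAPE (in percent) and Pearson r for paired counts.

    Hypothetical helper for illustration; assumes all true counts are nonzero
    and that both sequences have some variation (otherwise r is undefined).
    """
    n = len(true_counts)
    if n == 0 or n != len(predicted_counts):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - t for t, p in zip(true_counts, predicted_counts)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, true_counts)) / n

    # Pearson correlation: covariance divided by the product of the spreads.
    mean_t = sum(true_counts) / n
    mean_p = sum(predicted_counts) / n
    cov = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(true_counts, predicted_counts))
    var_t = sum((t - mean_t) ** 2 for t in true_counts)
    var_p = sum((p - mean_p) ** 2 for p in predicted_counts)
    pearson = cov / math.sqrt(var_t * var_p)

    return {"rmse": rmse, "mae": mae, "mape": mape, "pearson": pearson}


if __name__ == "__main__":
    # Made-up counts, used only to show the call.
    print(regression_metrics([10, 20, 30, 40], [12, 18, 33, 41]))
```

Smaller RMSE, MAE, and MAPE together with a larger Pearson correlation correspond to the "smaller errors and larger correlations" criterion the abstract uses to compare models.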

Open Access Article
Graph Convolutional Network and Convolutional Neural Network Based Method for Predicting lncRNA-Disease Associations
Cells 2019, 8(9), 1012; https://doi.org/10.3390/cells8091012 - 30 Aug 2019
Cited by 2
Abstract
Aberrant expression of long non-coding RNAs (lncRNAs) is often associated with diseases, and identification of disease-related lncRNAs is helpful for elucidating complex pathogenesis. Recent methods for predicting associations between lncRNAs and diseases integrate their pertinent heterogeneous data. However, they fail to deeply integrate the topological information of the heterogeneous network comprising lncRNAs, diseases, and miRNAs. We proposed a novel method based on the graph convolutional network and convolutional neural network, referred to as GCNLDA, to infer disease-related lncRNA candidates. First, a heterogeneous network containing lncRNA, disease, and miRNA nodes was constructed. The embedding matrix of a lncRNA-disease node pair was constructed according to various biological premises about lncRNAs, diseases, and miRNAs. A new framework based on a graph convolutional network and a convolutional neural network was developed to learn network and local representations of the lncRNA-disease pair. On the left side of the framework, the autoencoder based on graph convolution deeply integrated topological information within the heterogeneous lncRNA-disease-miRNA network. Moreover, as different node features have discriminative contributions to the association prediction, an attention mechanism was constructed at the node feature level. The left side learnt the network representation of the lncRNA-disease pair. The convolutional neural networks on the right side of the framework learnt the local representation of the lncRNA-disease pair by focusing on the similarities, associations, and interactions that are only related to the pair. Compared to several state-of-the-art prediction methods, GCNLDA had superior performance. Case studies on stomach cancer, osteosarcoma, and lung cancer confirmed that GCNLDA effectively discovers the potential lncRNA-disease associations.

Open Access Article
Characterizing Human Cell Types and Tissue Origin Using the Benford Law
Cells 2019, 8(9), 1004; https://doi.org/10.3390/cells8091004 - 29 Aug 2019
Abstract
Processing massive transcriptomic datasets in a meaningful manner requires novel, possibly interdisciplinary, approaches. One principle that can address this challenge is the Benford law (BL), which posits that the occurrence probability of a leading digit in a large numerical dataset decreases as its value increases. Here, we analyzed large single-cell and bulk RNA-seq datasets to test whether cell types and tissue origins can be differentiated based on the adherence of specific genes to the BL. Then, we used the Benford adherence scores of these genes as inputs to machine-learning algorithms and tested their separation accuracy. We found that genes selected based on their first-digit distributions can distinguish between cell types and tissue origins. Moreover, despite the simplicity of this novel feature-selection method, its separation accuracy is higher than that of the mean-expression level approach and is similar to that of the differential expression approach. Thus, the BL can be used to obtain biological insights from massive amounts of numerical genomics data, a capability that could be utilized in various biomedical applications, e.g., to resolve samples of unknown primary origin, identify possible sample contaminations, and provide insights into the molecular basis of cancer subtypes.
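As a rough illustration of the Benford law described in the abstract above, the expected probability of leading digit d is log10(1 + 1/d), so a 1 leads about 30.1% of values and a 9 about 4.6%. The sketch below compares an observed first-digit distribution with that expectation. The adherence score used here (mean absolute deviation between observed and expected frequencies) is an assumption chosen for simplicity and may differ from the score defined in the paper; the function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under the Benford law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a nonzero number.

    Scientific notation is used so that values such as 0.3 or 1234 are
    handled without floating-point division artifacts.
    """
    if x == 0:
        raise ValueError("zero has no leading digit")
    return int(f"{abs(x):.15e}"[0])


def benford_adherence(values):
    """Mean absolute deviation from the Benford distribution (0 = perfect fit).

    Assumed scoring rule for illustration only; non-positive values are
    skipped, as zero expression counts carry no leading digit.
    """
    digits = [leading_digit(v) for v in values if v > 0]
    if not digits:
        raise ValueError("no positive values to score")
    counts = Counter(digits)
    n = len(digits)
    expected = benford_expected()
    return sum(abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    powers_of_two = [2 ** k for k in range(1, 101)]  # known to follow the law closely
    uniform_digits = list(range(1, 10)) * 10          # every leading digit equally common
    print(benford_adherence(powers_of_two))
    print(benford_adherence(uniform_digits))
```

A lower score means closer adherence. In a feature-selection setting like the one the abstract describes, such a score would be computed per gene across samples, though the exact procedure is not given in this listing.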

Open Access Article
Transcription Factors Indirectly Regulate Genes through Nuclear Colocalization
Cells 2019, 8(7), 754; https://doi.org/10.3390/cells8070754 - 20 Jul 2019
Cited by 1
Abstract
Various types of data, including genomic sequences, transcription factor (TF) knockout data, TF-DNA interaction and expression profiles, have been used to decipher TF regulatory mechanisms. However, most of the genes affected by knockout of a particular TF are not bound by that factor. Here, I showed that this interesting result can be partially explained by considering the nuclear positioning of TF knockout affected genes and TF bound genes. I found that a statistically significant number of TF knockout affected genes show nuclear colocalization with genes bound by the corresponding TF. Although these TF knockout affected genes are not directly bound by the corresponding TF, the TF tends to be in the same cellular component as the TFs that directly bind these genes. TF knockout affected genes show co-expression and tend to be involved in the same biological process as the spatially adjacent genes that are bound by the corresponding TF. These results demonstrate that TFs can regulate genes through nuclear colocalization without direct DNA binding, complementing the conventional view that TFs directly bind DNA to regulate genes. My findings will have implications in understanding TF regulatory mechanisms.

Open Access Article
Sugar Transporter Proteins (STPs) in Gramineae Crops: Comparative Analysis, Phylogeny, Evolution, and Expression Profiling
Cells 2019, 8(6), 560; https://doi.org/10.3390/cells8060560 - 08 Jun 2019
Abstract
Sugar transporter proteins (STPs), such as H+/sugar symporters, play essential roles in plants’ sugar transport, growth, and development, and possess an important potential to enhance plants’ performance of multiple agronomic traits, especially crop yield and stress tolerance. However, the evolutionary dynamics of this important gene family in Gramineae crops are still not well-documented and the functional differentiation of rice STP genes remains unclear. To address this gap, we conducted a comparative genomic study of STP genes in seven representative Gramineae crops, which are Brachypodium distachyon (Bd), Hordeum vulgare (Hv), Setaria italica (Si), Sorghum bicolor (Sb), Zea mays (Zm), Oryza rufipogon (Or), and Oryza sativa ssp. japonica (Os). In total, 177 STP genes were identified and grouped into four clades. Of the four clades, Clade I, Clade III, and Clade IV showed an observable expansion in number compared to Clade II. Our results on the identified duplication events and the divergence times of duplicate gene pairs indicated that tandem and whole-genome duplication (WGD)/segmental duplication events played crucial roles in the STP gene family expansion of some Gramineae crops (except for Hv) during a long-term evolutionary process. However, the expansion mechanisms of the STP gene family differed among the tested species. Further selective force studies revealed that the STP gene family in Gramineae crops was under purifying selection and that different clades and orthologous groups experienced different selective forces. Furthermore, expression analysis showed that rice STP genes play important roles not only in flower organ development but also under various abiotic stresses (cold, high-temperature, and submergence stresses), blast infection, and wounding. The current study highlighted the expansion and evolutionary patterns of the STP gene family in Gramineae genomes and provided important information for the future functional analysis of Gramineae crop STP genes.

Open Access Article
In Silico Analysis of Bioactive Peptides in Invasive Sea Grass Halophila stipulacea
Cells 2019, 8(6), 557; https://doi.org/10.3390/cells8060557 - 07 Jun 2019
Cited by 1
Abstract
Halophila stipulacea is a well-known invasive marine sea grass in the Mediterranean Sea. Having been introduced into the Mediterranean Sea via the Suez Canal, it is considered a Lessepsian migrant. Although, unlike other invasive marine seaweeds, it has not demonstrated serious negative impacts on indigenous species, it does have remarkable invasive properties. The present in silico study reveals the biotechnological features of H. stipulacea by showing bioactive peptides from its RuBisCO protein. These features include antioxidant and hypolipidemic activities and the inhibition of dipeptidyl peptidase-IV and angiotensin-converting enzyme. The reported data open up new applications for such bioactive peptides in the field of pharmacy, medicine and also the food industry.

Open Access Article
In Silico Genome-Wide Analysis of Respiratory Burst Oxidase Homolog (RBOH) Family Genes in Five Fruit-Producing Trees, and Potential Functional Analysis on Lignification of Stone Cells in Chinese White Pear
Cells 2019, 8(6), 520; https://doi.org/10.3390/cells8060520 - 29 May 2019
Cited by 4
Abstract
The accumulation of lignin in fruit has a significant negative impact on the quality of fruit-producing trees, and in particular the lignin formation stimulates the development of stone cells in pear fruit. Reactive oxygen species (ROS) are essential for lignin polymerization. However, knowledge of the RBOH family, a key enzyme family in ROS metabolism, is lacking for most fruit trees. In this study, a total of 40 RBOHs were identified from five fruit-producing trees (Pyrus bretschneideri, Prunus persica, Citrus sinensis, Vitis vinifera, and Prunus mume), and 10 of these sequences came from Pyrus bretschneideri. Multiple sequence alignments revealed that all 10 PbRBOHs contained the NADPH_Ox domain and the six alpha-helical transmembrane domains (TM-I to TM-VI). Chromosome localization and interspecies phylogenetic tree analysis showed that the 10 PbRBOHs are irregularly distributed on 8 chromosomes and that 3 PbRBOHs (PbRBOHA, PbRBOHB, and PbRBOHD) are closely related to known lignification-related RBOHs. Furthermore, hormone response pattern analysis showed that the transcription of PbRBOHs is regulated by SA, ABA and MeJA. Reverse transcription-quantitative real-time polymerase chain reaction (qRT-PCR) and transcriptome sequencing analysis showed that PbRBOHA, PbRBOHB, and PbRBOHD had high transcript abundance in pear fruit, and the transcriptional trends of PbRBOHA and PbRBOHD were consistent with the change of stone cell content during fruit development. In addition, subcellular localization revealed that PbRBOHA and PbRBOHD are distributed on the plasma membrane. Combined with the changes in apoplastic superoxide (O2•−) content and the spatio-temporal expression analysis, these results indicate that PbRBOHA and PbRBOHD, which are candidate genes, may play an important role in ROS metabolism during the lignification of pear stone cells. This study not only provides insight into the molecular characteristics of the RBOH family in fruit-producing trees, but also lays the foundation for studying the role of ROS in plant lignification.

Open Access Article
Multi-Path Dilated Residual Network for Nuclei Segmentation and Detection
Cells 2019, 8(5), 499; https://doi.org/10.3390/cells8050499 - 23 May 2019
Cited by 2
Abstract
As a typical biomedical detection task, nuclei detection has been widely used in human health management, disease diagnosis and other fields. However, detecting cells in microscopic images is still challenging because the nuclei are commonly small and dense, with many of them overlapping in the images. The key step in detecting nuclei is to segment the cell targets accurately. Based on the Mask R-CNN model, we designed a multi-path dilated residual network that segments and detects dense small objects and effectively addresses the problem of information loss for small objects in deep neural networks. The experimental results on two typical nuclear segmentation data sets show that our model has better recognition and segmentation capability for dense small targets.

Open Access Article
Evolutionary Divergence of Duplicated Hsf Genes in Populus
Cells 2019, 8(5), 438; https://doi.org/10.3390/cells8050438 - 10 May 2019
Cited by 2
Abstract
Heat shock transcription factors (Hsfs), which function as the activator of heat shock proteins (Hsps), play multiple roles in response to environmental stress and the development of plants. The Hsf family had experienced gene expansion via whole-genome duplication from a single cell algae [...] Read more.
Heat shock transcription factors (Hsfs), which function as the activator of heat shock proteins (Hsps), play multiple roles in response to environmental stress and the development of plants. The Hsf family had experienced gene expansion via whole-genome duplication from a single cell algae to higher plants. However, how the Hsf gene family went through evolutionary divergence after genome duplication is unknown. As a model wood species, Populus trichocarpa is widely distributed in North America with various ecological and climatic environments. In this study, we used P. trichocarpa as materials and identified the expression divergence of the PtHsf gene family in developmental processes, such as dormant bud formation and opening, catkins development, and in response to environments. Through the co-expression network, we further discovered the divergent co-expressed genes that related to the functional divergence of PtHsfs. Then, we studied the alternative splicing events, single nucleotide polymorphism distribution and tertiary structures of members of the PtHsf gene family. In addition to expression divergence, we uncovered the evolutionary divergence in the protein level which may be important to new function formations and for survival in changing environments. This study comprehensively analyzed the evolutionary divergence of a member of the PtHsf gene family after genome duplication, paving the way for further gene function analysis and genetic engineering. Full article
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)

Open Access Article
Evolutionary Conservation and Divergence of Genes Encoding 3-Hydroxy-3-methylglutaryl Coenzyme A Synthase in the Allotetraploid Cotton Species Gossypium hirsutum
Cells 2019, 8(5), 412; https://doi.org/10.3390/cells8050412 - 03 May 2019
Abstract
Polyploidization is important for the speciation and subsequent evolution of many plant species. Analyses of the duplicated genes produced via polyploidization events may clarify the origin and evolution of gene families. During terpene biosynthesis, 3-hydroxy-3-methylglutaryl coenzyme A synthase (HMGS) functions as a key enzyme in the mevalonate pathway. In this study, we first identified a total of 53 HMGS genes in 23 land plant species, while no HMGS genes were detected in three green algae species. The phylogenetic analysis suggested that plant HMGS genes may have originated from a common ancestral gene before clustering in different branches during the divergence of plant lineages. Then, we detected six HMGS genes in the allotetraploid cotton species Gossypium hirsutum, twice the number found in the diploid cotton species Gossypium raimondii and Gossypium arboreum. The comparison of gene structures and phylogenetic analysis of HMGS genes revealed conserved evolution during polyploidization in Gossypium. Moreover, the expression patterns indicated that the six GhHMGS genes were expressed in all tested tissues, with most genes being considerably expressed in the roots, and that they were responsive to various phytohormone treatments and abiotic stresses. The sequence and expression divergence of duplicated genes in G. hirsutum implied the sub-functionalization of GhHMGS1A and GhHMGS1D as well as GhHMGS3A and GhHMGS3D, whereas it implied the pseudogenization of GhHMGS2A and GhHMGS2D. Collectively, our study unraveled the evolutionary history of HMGS genes in green plants and from diploid to allotetraploid cotton, and illustrated the different evolutionary fates of duplicated HMGS genes resulting from polyploidization.
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)

Open Access Article
Structure Based Design and Molecular Docking Studies for Phosphorylated Tau Inhibitors in Alzheimer’s Disease
Cells 2019, 8(3), 260; https://doi.org/10.3390/cells8030260 - 19 Mar 2019
Cited by 7
Abstract
The purpose of our study is to identify phosphorylated tau (p-tau) inhibitors. P-tau has recently received great interest as a potential drug target in Alzheimer’s disease (AD). The continued failure of Aβ-targeted therapeutics calls for an alternative drug target to treat AD. There is increasing evidence and growing awareness that tau plays a central role in AD pathophysiology, including tangle formation and abnormal activation of phosphatases/kinases, leading to p-tau aggregation in AD neurons. In the present study, we performed computational pharmacophore modeling, molecular docking, and simulation studies for p-tau in order to identify hyperphosphorylated sites. We found multiple serine sites that altered the flanking sequences of the R1/R2 repeats in the tau protein, affecting the microtubule binding ability of tau. The ligand molecules exhibited p-O ester scaffolds with inhibitory and/or blocking actions against serine residues of p-tau. Our molecular docking results revealed five ligands that showed high docking scores and optimal protein-ligand interactions with p-tau. These five ligands showed the best pharmacokinetic and physicochemical properties, including good absorption, distribution, metabolism, and excretion (ADME) and admetSAR toxicity test results. The p-tau pharmacophore-based drug discovery models provide comprehensive and rapid drug interventions in AD and tauopathies, and are expected to be a prospective therapeutic approach in AD.
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)

Open Access Article
Retroelement—Linked Transcription Factor Binding Patterns Point to Quickly Developing Molecular Pathways in Human Evolution
Cells 2019, 8(2), 130; https://doi.org/10.3390/cells8020130 - 06 Feb 2019
Cited by 4 | Correction
Abstract
Background: Retroelements (REs) are transposable elements occupying ~40% of the human genome that can regulate genes by providing transcription factor binding sites (TFBS). The RE-linked TFBS profile can serve as a marker of the evolution of gene transcriptional regulation. This approach allows for interrogating the regulatory evolution of organisms with RE-rich genomes. We aimed to characterize the evolution of transcriptional regulation for human genes and molecular pathways using RE-linked TFBS accumulation as a metric. Methods: We characterized human genes and molecular pathways either enriched or deficient in RE-linked TFBS regulation. We used the ENCODE database with mapped TFBS for 563 transcription factors in 13 human cell lines. For 24,389 genes and 3124 molecular pathways, we calculated a score of RE-linked TFBS regulation reflecting the regulatory evolution rate at the level of individual genes and molecular pathways. Results: The major groups enriched in RE regulation deal with gene regulation by microRNAs, olfaction, color vision, fertilization, cellular immune response, and amino acid and fatty acid metabolism and detoxication. The deficient groups were involved in translation, RNA transcription and processing, chromatin organization, and molecular signaling. Conclusion: We identified genes and molecular processes that have characteristics of especially high or low evolutionary rates at the level of RE-linked TFBS regulation in the human lineage.
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)

Open Access Article
A High Efficient Biological Language Model for Predicting Protein–Protein Interactions
Cells 2019, 8(2), 122; https://doi.org/10.3390/cells8020122 - 03 Feb 2019
Cited by 9
Abstract
Many life activities and key functions in organisms are maintained by different types of protein–protein interactions (PPIs). In order to accelerate the discovery of PPIs for different species, many computational methods have been developed. Unfortunately, even though computational methods are constantly evolving, efficient methods for predicting PPIs from protein sequence information have not been found for many years due to limiting factors in both methodology and technology. Given the similarity between biological sequences and languages, developing a biological language processing technology may provide a brand new theoretical perspective and a feasible method for the study of biological sequences. In this paper, a pure biological language processing model is proposed for predicting protein–protein interactions using only protein sequences. The model was constructed based on a feature representation method for biological sequences called bio-to-vector (Bio2Vec) and a convolutional neural network (CNN). Bio2Vec obtains protein sequence features by using a “bio-word” segmentation system and a word representation model that learns the distributed representation of each “bio-word”. Bio2Vec supplies a framework that allows researchers to consider the context information and implicit semantic information of a biological sequence. A remarkable improvement in PPI prediction performance has been observed by using the proposed model compared with state-of-the-art methods. The presentation of this approach marks the start of “bio language processing technology,” which could cause a technological revolution and could be applied to improve the quality of predictions in other problems.
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)

Open Access Article
The Catalase Gene Family in Cotton: Genome-Wide Characterization and Bioinformatics Analysis
Cells 2019, 8(2), 86; https://doi.org/10.3390/cells8020086 - 24 Jan 2019
Cited by 6
Abstract
Catalases (CATs), which are encoded by the catalase gene family, are a notable type of reactive oxygen species (ROS)-metabolizing protein implicated in various physiological functions in plant growth, development and stress responses. However, no systematic study has been performed in cotton. In the present study, we identified 7 and 7 CAT genes in the genomes of Gossypium hirsutum L. and G. barbadense L., respectively. The results of the phylogenetic and synteny analysis showed that the CAT genes were divided into two groups, and that whole-genome duplication (WGD) or polyploidy events contributed to the expansion of the Gossypium CAT gene family. Expression pattern analysis showed that the CAT gene family possessed temporal and spatial specificity and was induced by Verticillium dahliae infection. In addition, we predicted the putative molecular regulatory mechanisms of the CAT gene family. Based on the analysis and preliminary verification results, we hypothesized that the CAT gene family, which might be regulated by transcription factors (TFs), alternative splicing (AS) events and miRNAs at different levels, played roles in cotton development and stress tolerance by modulating ROS metabolism. This is the first report on the genome-scale analysis of the cotton CAT gene family, and these data will help further study the roles of CAT genes during stress responses, leading to crop improvement.
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)

Review

Open Access Review
Large-Scale Assessment of Bioinformatics Tools for Lysine Succinylation Sites
Cells 2019, 8(2), 95; https://doi.org/10.3390/cells8020095 - 28 Jan 2019
Cited by 5
Abstract
Lysine succinylation is a form of posttranslational modification of proteins that plays an essential functional role in every aspect of cell metabolism in both prokaryotes and eukaryotes. Aside from experimental identification of succinylation sites, there has been an intense effort geared towards the development of sequence-based prediction through machine learning, due to its promising and essential properties of being highly accurate, robust and cost-effective. In spite of these advantages, several problems need attention in the design and development of succinylation site predictors. Notwithstanding the many studies employing machine learning approaches, few articles have examined this bioinformatics field in a systematic manner. Thus, we review the advancements in current state-of-the-art prediction models, datasets, and online resources and illustrate the challenges and limitations in order to present a useful guideline for developing powerful succinylation site prediction tools.
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)

Other

Open Access Data Descriptor
CVm6A: A Visualization and Exploration Database for m6As in Cell Lines
Cells 2019, 8(2), 168; https://doi.org/10.3390/cells8020168 - 17 Feb 2019
Cited by 1
Abstract
N6-methyladenosine (m6A) has been identified in various biological processes and plays important regulatory functions in diverse cells. However, there is still no visualization database for exploring global m6A patterns across cell lines. Here we collected all available MeRIP-Seq and m6A-CLIP-Seq datasets from public databases and identified 340,950 and 179,201 m6A peaks in 23 human and eight mouse cell lines, respectively. Those m6A peaks were further classified into mRNA and lncRNA groups. To better understand the potential function of m6A, we then mapped m6A peaks to different subcellular components and gene regions. Among the human m6A modifications, 190,050 and 150,900 peaks were identified in cancer and non-cancer cells, respectively. Finally, all results were integrated and imported into a visualized cell-dependent m6A database, CVm6A. We believe the specificity of CVm6A could significantly contribute to research on the function and regulation of cell-dependent m6A modification in disease and development.
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)