Trends and Future Perspectives in Genome Annotation

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Technologies and Resources for Genetics".

Deadline for manuscript submissions: closed (30 June 2021) | Viewed by 10332

Special Issue Editor


Guest Editor
Biología Molecular e Ingeniería Bioquímica, Universidad Pablo de Olavide, Sevilla 41013, Spain
Interests: bioinformatics; genome annotation; sequence analysis; microbial genomics; rare disease

Special Issue Information

Dear Colleagues,

The current genomics era is generating a wealth of biological data in the form of complete genome sequences. Once a new genome is available, the first step is to annotate it, which includes both gene finding and functional annotation. Computational tools for sequence annotation have been developed over the last three decades, but we now need better tools for the easy and fast annotation of complete genomes. The resulting annotations should also include organized data from standardized sources. They should also make use of up-to-date functional information and of algorithms such as machine learning and deep learning, which today pose new challenges in bioinformatics.

This Special Issue welcomes original research articles or reviews that present future perspectives for the annotation of complete genomes from eukaryotic or prokaryotic organisms, as well as from viruses, which usually have a high number of uncharacterized genes. Subjects can include structural annotation with new or updated gene finders; functional annotation with new algorithms or annotation sources; evaluation of methods and visualization of results from one or several genomes; and the comparison of annotations from different organisms, highlighting the differences between them. We particularly welcome methods or protocols that are easy to use for experimental researchers who are now sequencing their genomes of interest and want to get the most out of them. Such researchers may want to annotate all their genes or to fill the gap left by the uncharacterized ones.

Prof. Dr. Antonio J. Pérez-Pulido
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Genome annotation
  • Computational genomics
  • Gene prediction
  • Functional annotation
  • Sequence analysis
  • Bioinformatics

Published Papers (3 papers)


Research

12 pages, 1192 KiB  
Article
FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow
by Anna Vlasova, Toni Hermoso Pulido, Francisco Camara, Julia Ponomarenko and Roderic Guigó
Genes 2021, 12(10), 1645; https://doi.org/10.3390/genes12101645 - 19 Oct 2021
Cited by 2 | Viewed by 3939
Abstract
Functional annotation allows adding biologically relevant information to predicted features in genomic sequences, and it is, therefore, an important procedure of any de novo genome sequencing project. It is also useful for proofreading and improving gene structural annotation. Here, we introduce FA-nf, a pipeline implemented in Nextflow, a versatile computational workflow management engine. The pipeline integrates different annotation approaches, such as NCBI BLAST+, DIAMOND, InterProScan, and KEGG. It starts from a protein sequence FASTA file and, optionally, a structural annotation file in GFF format, and produces several files, such as GO assignments, output summaries of the abovementioned programs and final annotation reports. The pipeline can be broken easily into smaller processes for the purpose of parallelization and easily deployed in a Linux computational environment, thanks to software containerization, thus helping to ensure full reproducibility.
(This article belongs to the Special Issue Trends and Future Perspectives in Genome Annotation)
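The FA-nf abstract says the pipeline starts from a protein FASTA file and can be broken into smaller processes for parallelization. The following Python sketch illustrates that idea only. It is not FA-nf's code, since FA-nf is written in Nextflow, and the function names are hypothetical. It parses protein FASTA text and splits the records into fixed-size batches, each of which could be handed to a separate annotation job such as a BLAST+ or InterProScan run.

```python
from typing import Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header: Optional[str] = None
    seq_parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(seq_parts)))
            header = line[1:].strip()
            seq_parts = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            seq_parts.append(line)  # multi-line sequences are concatenated
    if header is not None:
        records.append((header, "".join(seq_parts)))
    return records


def chunk_records(records: List[Record], size: int) -> Iterator[List[Record]]:
    """Yield consecutive batches of at most `size` records (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialize records back to FASTA, wrapping sequences at `width` columns."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For example, `list(chunk_records(parse_fasta(text), 500))` would yield batches of up to 500 proteins, and `to_fasta(batch)` writes each batch as an input file for one job. Within Nextflow, this kind of splitting is usually done with the built-in `splitFasta` operator rather than a custom script. The sketch only makes the batching logic explicit.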

10 pages, 988 KiB  
Article
The Conservation of Low Complexity Regions in Bacterial Proteins Depends on the Pathogenicity of the Strain and Subcellular Location of the Protein
by Pablo Mier and Miguel A. Andrade-Navarro
Genes 2021, 12(3), 451; https://doi.org/10.3390/genes12030451 - 22 Mar 2021
Cited by 3 | Viewed by 2805
Abstract
Low complexity regions (LCRs) in proteins are characterized by amino acid frequencies that differ from the average. These regions evolve faster and tend to be less conserved between homologs than globular domains. They are less common in bacteria than in eukaryotes. Studying their conservation could help provide hypotheses about their function. To obtain the appropriate evolutionary focus for this rapidly evolving feature, here we study the conservation of LCRs in bacterial strains and compare their high variability to the closeness of the strains. For this, we selected 20 taxonomically diverse bacterial species and obtained the completely sequenced proteomes of two strains per species. We identified all orthologous pairs for each of the 20 strain pairs. For each orthologous pair, we computed the conservation of two types of LCRs: compositionally biased regions (CBRs) and homorepeats (polyX). Our results show that, in bacteria, Q-rich CBRs are the most conserved, while A-rich CBRs and polyA are the most variable. LCRs generally show higher conservation when comparing pathogenic strains. However, this result depends on protein subcellular location: LCRs accumulate in extracellular and outer membrane proteins, with conservation increased in the extracellular proteins of pathogens and decreased for polyX in the outer membrane proteins of pathogens. We conclude that these dependencies support the functional importance of LCRs in host–pathogen interactions.
(This article belongs to the Special Issue Trends and Future Perspectives in Genome Annotation)
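The following Python sketch makes the notion of a homorepeat (polyX) concrete. It finds runs of a single repeated amino acid and compares the polyX residue types of two orthologs. The minimum run length of 4 and the Jaccard-based score are illustrative assumptions. The abstract does not give the article's own definitions of polyX or of LCR conservation, and these may differ, for example by allowing interrupted repeats. Compositionally biased regions (CBRs) are not covered here.

```python
import re
from typing import List, Set, Tuple

PolyX = Tuple[str, int, int]  # (residue, 0-based start, exclusive end)


def find_polyx(seq: str, min_len: int = 4) -> List[PolyX]:
    """Return maximal runs of a single residue with length >= min_len.

    min_len=4 is an assumed threshold chosen for illustration only.
    """
    if min_len < 2:
        raise ValueError("min_len must be >= 2")
    # (.)\1{n,} matches one character followed by at least n more copies of it;
    # greedy matching makes each reported run maximal.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def polyx_types(seq: str, min_len: int = 4) -> Set[str]:
    """Residue letters that form at least one polyX in `seq`."""
    return {res for res, _, _ in find_polyx(seq, min_len)}


def shared_polyx_fraction(seq_a: str, seq_b: str, min_len: int = 4) -> float:
    """Hypothetical conservation score: Jaccard index of polyX residue types.

    Returns NaN when neither ortholog contains a polyX, because the score is
    undefined in that case.
    """
    a, b = polyx_types(seq_a, min_len), polyx_types(seq_b, min_len)
    if not a and not b:
        return float("nan")
    return len(a & b) / len(a | b)
```

For instance, `find_polyx("MAAAAKQQQQQL")` reports a polyA at positions 1 to 5 and a polyQ at positions 6 to 11. A score over residue types deliberately ignores repeat length and position. A study focused on fast-evolving repeats might instead align the orthologs and compare repeat boundaries.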

10 pages, 4103 KiB  
Article
Protein-Coding Genes of Helicobacter pylori Predominantly Present Purifying Selection though Many Membrane Proteins Suffer from Selection Pressure: A Proposal to Analyze Bacterial Pangenomes
by Alejandro Rubio and Antonio J. Pérez-Pulido
Genes 2021, 12(3), 377; https://doi.org/10.3390/genes12030377 - 6 Mar 2021
Cited by 3 | Viewed by 2467
Abstract
The current availability of complete genome sequences has revealed that bacterial genomes can carry genes that are not present in the genomes of all strains of a given species. The genes shared by all strains make up the core genome of the species, but the pangenome can be much larger and usually includes genes that appear in only one strain. Once the pangenome of a species has been estimated, further studies can be undertaken to generate new knowledge, such as the study of evolutionary selection on protein-coding genes. Most genes of a pangenome are expected to be under purifying selection, which ensures the conservation of their function, especially those in the core group. However, some genes can be subject to selection pressure, such as virulence genes that need to escape the host immune system. This is more common in the accessory group of the pangenome. We analyzed 180 strains of Helicobacter pylori, a bacterium that colonizes the gastric mucosa of half the world's population and has a low number of genes (around 1500 per strain and 3000 in the pangenome). After estimating the pangenome, we calculated the evolutionary selection for each gene. We found that 85% of the genes are subject to purifying selection, while the remaining genes show some degree of selection pressure. As expected, the latter group is enriched in genes encoding membrane proteins putatively involved in interaction with host tissues. In addition, this group contains a high number of uncharacterized genes and of genes encoding putative spurious proteins. This suggests that they could be false positives from the gene finders used to identify them. Altogether, these results suggest that this kind of analysis can be useful to validate gene predictions and to functionally characterize proteins in complete genomes.
(This article belongs to the Special Issue Trends and Future Perspectives in Genome Annotation)
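The core, accessory, and strain-specific partition described in the abstract reduces to set operations over a gene presence/absence table. The following Python sketch assumes that gene families have already been clustered across strains. That clustering step, where most of the real work lies, is not shown. The function name and the strain and family identifiers are hypothetical and do not come from the article.

```python
from collections import Counter
from typing import Dict, Set


def partition_pangenome(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into pangenome, core, accessory and unique sets.

    `presence` maps each strain name to the set of gene-family IDs found in it
    (an assumed input format; real tools produce presence/absence matrices).
    """
    if not presence:
        raise ValueError("need at least one strain")
    strains = list(presence.values())
    pangenome = set().union(*strains)   # families seen in any strain
    core = set.intersection(*strains)   # families shared by all strains
    # Count in how many strains each family occurs.
    counts = Counter(fam for fams in strains for fam in fams)
    unique = {fam for fam, n in counts.items() if n == 1}  # single-strain families
    return {
        "pangenome": pangenome,
        "core": core,
        "accessory": pangenome - core,
        "unique": unique,
    }
```

With three toy strains, where `famA` is in all three strains, `famC` is in two, and `famB` and `famD` are in one each, the core is `{famA}` and the accessory set is `{famB, famC, famD}`. Of the accessory families, `famB` and `famD` are also strain-specific. Strict "present in all strains" core definitions are sensitive to missed gene calls. Many pangenome studies therefore relax the threshold (for example, to 95% or 99% of strains). The sketch keeps the strict definition used in the abstract.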
