Biology 2013, 2(4), 1465-1487; doi:10.3390/biology2041465
Article

Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana

1 Department of Electrical Engineering and Information Technology, University of Naples Federico II, Napoli 80125, Italy
2 Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London, WC1E 6BT, UK
3 Department of Agricultural Sciences, University of Naples Federico II, Napoli 80055, Italy
These authors contributed equally to this work.
* Author to whom correspondence should be addressed.
Received: 25 September 2013; in revised form: 19 November 2013 / Accepted: 20 November 2013 / Published: 9 December 2013
(This article belongs to the Special Issue Insights from Plant Genomes)
Abstract: Arabidopsis thaliana became the model organism for plant studies because of its small diploid genome, rapid life cycle and small adult size. Its genome was the first among plants to be sequenced, and it became the reference in plant genomics. However, the Arabidopsis genome has an inherently complex organization: it underwent ancient whole genome duplications, followed by gene reduction, diploidization events and extended rearrangements, which relocated and split up the retained portions. These events, together with probable chromosome reductions, dramatically increased the complexity of the genome and limit its role as a reference. Identifying paralogs and single copy genes within a highly duplicated genome is a prerequisite for understanding its organization and evolution and for improving its exploitation in comparative genomics. This identification is still controversial, even in the widely studied Arabidopsis genome, partly because there is no reference bioinformatics pipeline that can exhaustively identify paralogs and singleton genes. We describe here a complete computational strategy to detect both duplicated and single copy genes in a genome, and we discuss the methodological issues that may strongly affect the results, their quality and their reliability. We used this approach to analyze the organization of Arabidopsis nuclear protein coding genes. Besides classifying computationally defined paralogs into networks and single copy genes into different classes, the analysis revealed further intriguing aspects of the genome annotation and of the gene relationships in this reference plant species. Since our results may be useful for comparative genomics and genome functional analyses, we organized a dedicated web interface to make them accessible to the scientific community.
Keywords: gene duplication; paralog genes; single copy genes; singleton; gene network; Arabidopsis; genome annotation
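
The abstract speaks of classifying paralogs into networks and separating them from single copy genes, but it does not spell out the algorithm. As an illustration only, the sketch below assumes that pairwise paralog relationships have already been called by some upstream similarity search, treats each network as a connected component of the resulting graph, and labels every gene without a partner as single copy. The function name classify_genes, the input format and the gene identifiers are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def classify_genes(genes, similar_pairs):
    """Split genes into paralog networks and single copy genes.

    genes: iterable of gene identifiers (hypothetical IDs in the demo below).
    similar_pairs: iterable of (gene_a, gene_b) pairs that an upstream
        similarity search has already called paralogs (assumed input).
    """
    # Build an undirected graph of paralog relationships.
    neighbors = defaultdict(set)
    for a, b in similar_pairs:
        if a != b:  # a gene matching itself is not evidence of duplication
            neighbors[a].add(b)
            neighbors[b].add(a)

    networks, singletons, seen = [], [], set()
    for gene in genes:
        if gene in seen:
            continue
        seen.add(gene)
        if gene not in neighbors:
            # No paralog partner at all: treat as a single copy gene.
            singletons.append(gene)
            continue
        # Depth-first traversal collects one connected component.
        component, stack = set(), [gene]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        networks.append(sorted(component))
    return networks, singletons


if __name__ == "__main__":
    demo_genes = ["g1", "g2", "g3", "g4", "g5"]
    demo_pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g4")]
    nets, singles = classify_genes(demo_genes, demo_pairs)
    print("networks:", nets)
    print("single copy:", singles)
```

In this toy input, g1 and g3 land in the same network although they are linked only through g2, which is the behaviour that distinguishes a network from a list of pairs. The thresholds used to call two genes paralogs are deliberately left outside the sketch, since the abstract stresses that such methodological choices strongly affect the results.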

Cite This Article

MDPI and ACS Style

Sangiovanni, M.; Vigilante, A.; Chiusano, M.L. Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana. Biology 2013, 2, 1465-1487.

AMA Style

Sangiovanni M, Vigilante A, Chiusano ML. Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana. Biology. 2013; 2(4):1465-1487.

Chicago/Turabian Style

Sangiovanni, Mara; Vigilante, Alessandra; Chiusano, Maria L. 2013. "Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana." Biology 2, no. 4: 1465-1487.

Biology EISSN 2079-7737 Published by MDPI AG, Basel, Switzerland