Retrotransposons in Plant Genomes: Structure, Identification, and Classification through Bioinformatics and Machine Learning
Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170001, Colombia
Department of Systems and Informatics, Universidad de Caldas, Manizales 170001, Colombia
Department of Electronics and Automatization, Universidad Autónoma de Manizales, Manizales 170001, Colombia
Institut de Recherche pour le Développement, CIRAD, University Montpellier, 34000 Montpellier, France
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2019, 20(15), 3837; https://doi.org/10.3390/ijms20153837
Received: 21 June 2019 / Revised: 31 July 2019 / Accepted: 2 August 2019 / Published: 6 August 2019
(This article belongs to the Section Molecular Plant Sciences)
Transposable elements (TEs) are genomic units able to move within the genome of virtually all organisms. Because of their repetitive nature and high structural diversity, the identification and classification of TEs remain a challenge in sequenced genomes. Although TEs were initially regarded as “junk DNA”, it has been demonstrated that they play key roles in chromosome structure, gene expression and regulation, and adaptation and evolution. A highly reliable annotation of these elements is therefore crucial to better understand genome function and evolution. To date, many bioinformatics tools have been developed to address TE detection and classification, but problematic aspects remain, such as the reliability, precision, and speed of the analyses. Machine learning and deep learning comprise algorithms that can make automatic predictions and decisions in a wide variety of scientific applications. They have been tested in bioinformatics and, more specifically, for TE classification, with encouraging results. In this review, we discuss important aspects of TEs, such as their structure, their importance in the evolution and architecture of the host, and their current classifications and nomenclatures. We also address current methods for identifying and classifying TEs, along with their limitations.