Publications by IRD scientists

Orozco-Arias S., Isaza G., Guyot Romain, Tabares-Soto R. (2019). A systematic review of the application of machine learning in the detection and classification of transposable elements. PeerJ, 7, p. e8311 [29 p.]. ISSN 2167-8359.

Document title
A systematic review of the application of machine learning in the detection and classification of transposable elements
Publication year
2019
Document type
Article indexed in the Web of Science WOS:000503384400008
Authors
Orozco-Arias S., Isaza G., Guyot Romain, Tabares-Soto R.
Source
PeerJ, 2019, 7, p. e8311 [29 p.] ISSN 2167-8359
Background: Transposable elements (TEs) constitute the most common repeated sequences in eukaryotic genomes. Recent studies demonstrated their deep impact on species diversity, adaptation to the environment and diseases. Although there are many conventional bioinformatics algorithms for detecting and classifying TEs, none have achieved reliable results on different types of TEs. Machine learning (ML) techniques can automatically extract hidden patterns and novel information from labeled or non-labeled data and have been applied to solving several scientific problems.
Methodology: We followed the Systematic Literature Review (SLR) process, applying the six stages of the review protocol from it, but added a previous stage, which aims to detect the need for a review. Then search equations were formulated and executed in several literature databases. Relevant publications were scanned and used to extract evidence to answer research questions.
Results: Several ML approaches have already been tested on other bioinformatics problems with promising results, yet there are few algorithms and architectures available in literature focused specifically on TEs, despite representing the majority of the nuclear DNA of many organisms. Only 35 articles were found and categorized as relevant in TE or related fields.
Conclusions: ML is a powerful tool that can be used to address many problems. Although ML techniques have been used widely in other biological tasks, their utilization in TE analyses is still limited. Following the SLR, it was possible to notice that the use of ML for TE analyses (detection and classification) is an open problem, and this new field of research is growing in interest.
Classification scheme
Basic sciences / Analysis and research techniques [020]; Computer science [122]
Location
IRD collection [F B010077475]
IRD identifier
fdi:010077475