Background: With the increase in the size of genomic datasets describing variability in populations, extracting relevant information becomes increasingly useful as well as complex. Recently, computational methodologies such as Supervised Machine Learning and specifically Convolutional Neural Networks have been proposed to make inferences on demographic and adaptive processes using genomic data. Even though it was already shown to be powerful and efficient in different fields of investigation, Supervised Machine Learning has still to be explored as to unfold its enormous potential in evolutionary genomics. Results: The paper proposes a method based on Supervised Machine Learning for classifying genomic data, represented as windows of genomic sequences from a sample of individuals belonging to the same population. A Convolutional Neural Network is used to test whether a genomic window shows the signature of natural selection. Training performed on simulated data show that the proposed model can accurately predict neutral and selection processes on portions of genomes taken from real populations with almost 90% accuracy.

Identification of natural selection in genomic data with deep convolutional neural network

Nguembang Fadja A.
Primo
;
Riguzzi F.
Secondo
;
Bertorelle G.
Penultimo
;
Trucchi E.
Ultimo
2021

Abstract

Background: With the increase in the size of genomic datasets describing variability in populations, extracting relevant information becomes increasingly useful as well as complex. Recently, computational methodologies such as Supervised Machine Learning and specifically Convolutional Neural Networks have been proposed to make inferences on demographic and adaptive processes using genomic data. Even though it was already shown to be powerful and efficient in different fields of investigation, Supervised Machine Learning has still to be explored as to unfold its enormous potential in evolutionary genomics. Results: The paper proposes a method based on Supervised Machine Learning for classifying genomic data, represented as windows of genomic sequences from a sample of individuals belonging to the same population. A Convolutional Neural Network is used to test whether a genomic window shows the signature of natural selection. Training performed on simulated data show that the proposed model can accurately predict neutral and selection processes on portions of genomes taken from real populations with almost 90% accuracy.
2021
Nguembang Fadja, A.; Riguzzi, F.; Bertorelle, G.; Trucchi, E.
File in questo prodotto:
File Dimensione Formato  
s13040-021-00280-9.pdf

accesso aperto

Descrizione: Full text editoriale
Tipologia: Full text (versione editoriale)
Licenza: Creative commons
Dimensione 1.99 MB
Formato Adobe PDF
1.99 MB Adobe PDF Visualizza/Apri

I documenti in SFERA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11392/2469739
Citazioni
  • ???jsp.display-item.citation.pmc??? 2
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 6
social impact