
Machine Learning Algorithms Highlight tRNA Information Content and Chargaff's Second Parity Rule Score as Important Features in Discriminating Probiotics from Non-Probiotics

Bergamini, Carlo M (first author); Bianchi, Nicoletta (second author); Taccioli, Cristian (last author)
2022

Abstract

Simple Summary: Probiotics are beneficial microorganisms that live as symbionts in the human gut microbiome. Identifying new probiotics is therefore of paramount importance from both public health and commercial perspectives. In this study, we show for the first time that Artificial Intelligence algorithms can identify novel probiotics and discriminate them from pathogenic organisms in the human gut. We also identified the information content of tRNA sequences as the key genomic feature characterizing probiotics.

Probiotic bacteria are microorganisms with beneficial effects on human health and are currently used in numerous food supplements. However, no selection process can effectively distinguish probiotics from non-probiotic organisms on the basis of their genomic characteristics. In the current study, four Machine Learning algorithms were employed to identify probiotic bacteria based on their DNA characteristics. Although the prediction accuracies of all algorithms were excellent, the Neural Network returned the highest scores in all the evaluation metrics, discriminating probiotics from non-probiotics with an accuracy greater than 90%. Interestingly, our analysis also highlighted the information content of the tRNA sequences as the most important feature in distinguishing the two groups of organisms, probably because tRNAs have regulatory functions and might have allowed probiotics to evolve faster in the human gut environment. Through the methodology presented here, it was also possible to identify seven promising new probiotics that have a higher information content in their tRNA sequences than non-probiotics. In conclusion, we show for the first time that Machine Learning methods can discriminate human probiotic from non-probiotic organisms, with the information within tRNA sequences emerging as the most important genomic feature in distinguishing them.
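The abstract and title name two sequence-derived features, the information content of tRNA sequences and a Chargaff's second parity rule score, without stating how they were computed. As an illustration only, the sketch below assumes that information content can be approximated by the Shannon entropy of the base composition, and that a parity score can be built from the A/T and G/C imbalance within a single strand. Both function names and both formulas are hypothetical and may differ from the definitions used in the paper.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per base, of the A/C/G/T composition.

    Assumption: used here as a stand-in for "information content";
    the paper's own definition is not given in the abstract.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def chargaff_pr2_score(seq: str) -> float:
    """Hypothetical second-parity-rule score in [0, 1].

    Chargaff's second parity rule says that within one strand the
    counts of A and T, and of G and C, are approximately equal.
    A score of 1.0 means perfect balance; lower values mean more skew.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")

    def skew(x: str, y: str) -> float:
        pair_total = counts[x] + counts[y]
        return abs(counts[x] - counts[y]) / pair_total if pair_total else 0.0

    return 1.0 - (skew("A", "T") + skew("G", "C")) / 2.0


if __name__ == "__main__":
    example = "GCGGATTTAGCTCAGTTGGGAGAGCGCCAGACT"  # arbitrary illustrative sequence
    print(shannon_entropy(example), chargaff_pr2_score(example))
```

Values like these, computed per genome or per tRNA gene set, are the kind of numeric features that could be fed to a classifier; the actual feature set and preprocessing are described in the full text, not here.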
Bergamini, Carlo M; Bianchi, Nicoletta; Giaccone, Valerio; Catellani, Paolo; Alberghini, Leonardo; Stella, Alessandra; Biffani, Stefano; Yaddehige, Sachithra Kalhari; Bobbo, Tania; Taccioli, Cristian
Files in this record:
File: 2022 Begamini CM et al Biology.pdf
Access: open access
Description: Publisher's full text
Type: Full text (publisher's version)
License: Creative Commons
Size: 684.43 kB
Format: Adobe PDF

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2500750