Integrating Host Genetics and Clinical Setting in Machine Learning Models: Predicting COVID-19 Prognosis for Healthcare Decision-Making (The FeMiNa Study)

D'Aversa, Elisabetta; Antonica, Bianca; Grisafi, Miriana; Asselta, Rosanna; Passaro, Angelina; Volpato, Stefano; Remelli, Francesca; Castellazzi, Massimiliano; Salvatori, Francesca; Singh, Ajay Vikram; Tisato, Veronica; Gemmati, Donato
2026

Abstract

Highlights: What are the main findings? Optimizing eXtreme Gradient Boosting (XGB) for f1 avoids losing several patients to misdiagnosis. Genetic data enhance the model's power for mortality prediction. What are the implications of the main findings? Integrating genetics in ML enables a more personalized medical approach.
Background/Objectives: COVID-19 has had a tremendous impact, causing a massive number of deaths worldwide. The inadequacy of health facilities resulted in a shortage of resources and the exhaustion of frontline workers, who had to manage many patients in a short time with no tools to prioritize those at high risk. This study aimed to disclose the architecture of such a complex disease and to enhance the management of hospitalized patients, preventing severe outcomes.
Methods: We performed a retrospective multicenter study aimed at refining the best predictive model for COVID-19 mortality, integrating 19 genetic and 13 clinical features. We trained three machine learning (ML) models, gradient boosting machine (GBM), XGB and random forest (RF), on a dataset of 532 Italian patients hospitalized with COVID-19, drawn from the 605 recruited during the first wave of the pandemic, when vaccines were not available.
Results: All the models achieved high values for the accuracy, AUROC, f1, f2 and PR-AUC metrics. XGB optimized for f1 performed best, yielding fewer false positives than optimization for f2 or PR-AUC (Nf1 = 26 versus Nf2 = 27, NPR-AUC = 29) and, above all, fewer false negatives (Nf1 = 63 versus Nf2 = 69, NPR-AUC = 69), the reduction of false negatives being the main goal. We then examined feature importance to understand which features contribute to the model decision: age was the main driver of mortality prediction, followed by ventilation. The remaining importance was evenly distributed between genetic features (HLA-DRA rs3135363, PPARGC1A rs192678, CRP rs2808635, ABO rs657152) and the other clinical features, showing that genetic data did not confound, but rather enhanced, the power of the model.
Conclusions: Our results suggest that integrating genetic and clinical data into ML models is crucial for identifying high-risk cases within the vast heterogeneity of the disease, enabling the P4-medicine approach to improve patient outcomes and support the healthcare system.
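
To make the f1 and f2 metrics named in the Results concrete, the following minimal sketch computes the F-beta score from confusion-matrix counts and selects the decision threshold that maximizes it. The abstract does not state how the optimization was carried out, so threshold selection is an assumption made here for illustration only. The labels, scores, and function names (`confusion_counts`, `f_beta`, `best_threshold`) are hypothetical toy values, not study data or the authors' code.

```python
from typing import Sequence, Tuple

# Hypothetical toy data (NOT from the study): 1 = died, 0 = survived,
# paired with made-up predicted mortality probabilities.
Y_TRUE = [0, 0, 1, 0, 0, 1, 1, 0, 1]
SCORES = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]


def confusion_counts(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int]:
    """Count (tp, fp, fn) when score >= threshold is predicted as class 1."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    return tp, fp, fn


def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold(y_true: Sequence[int], scores: Sequence[float],
                   beta: float) -> float:
    """Return the observed score that maximizes F-beta when used as threshold.

    Ties keep the lowest threshold, because candidates are scanned in
    ascending order and only a strictly better score replaces the best.
    """
    candidates = sorted(set(scores))
    if not candidates:
        raise ValueError("scores must not be empty")
    best_t, best_f = candidates[0], -1.0
    for t in candidates:
        f = f_beta(*confusion_counts(y_true, scores, t), beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t


for beta in (1.0, 2.0):
    t = best_threshold(Y_TRUE, SCORES, beta)
    tp, fp, fn = confusion_counts(Y_TRUE, SCORES, t)
    print(f"beta={beta}: threshold={t}, TP={tp}, FP={fp}, FN={fn}")
```

On these toy values the f2-optimal threshold is lower than the f1-optimal one, because f2 weights recall more heavily than precision. This behavior of the toy data is not meant to reproduce the false-positive and false-negative counts reported in the Results.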
D'Aversa, Elisabetta; Antonica, Bianca; Grisafi, Miriana; Asselta, Rosanna; Paraboschi, Elvezia Maria; Passaro, Angelina; Volpato, Stefano; Remelli, F...

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2621971