Control-Integrated Pipeline for Diseasome Analysis: A Case Study on Hypertension Comorbidities in U.S. Veterans
Magnoni, A. (first author); Pedrini, D.; Pistolesi, C.; Brombo, G.; Zuliani, G.; Calore, E.; Raparelli, V.; Schifano, S. F.; Zambelli, C. (last author)
2026
Abstract
Diseasomes have emerged as valuable tools for analyzing and understanding the complex comorbidity correlations that characterize chronic clinical conditions. To date, they have typically been studied without distinguishing associations specific to the disease of interest from those that also occur in the general population, which substantially weakens and limits the results. To address this issue and improve methodological robustness, we developed a novel pipeline that integrates a Control group directly into the Diseasome analysis, designed and implemented as a straightforward multi-step approach for processing patient-level health-related big data gathered from either large cohort studies or Electronic Health Records (EHRs). The methodology was validated on a de-identified, publicly available, ICD-9 code-based EHR cohort from the U.S. Department of Veterans Affairs (VA), predominantly composed of U.S. veterans receiving care within the VA healthcare system (n = 925,639; mean age 64.3 ± 13.2 years; hypertension (HTN) group n_HTN = 595,782; Control group n_Control = 329,857). The primary outcome measure is pairwise disease co-occurrence, quantified as Relative Risk (RR) with 95% confidence intervals and complemented by the Matthews Correlation Coefficient (φ) and a Label Permutation Test to identify associations statistically enriched in the HTN group relative to Controls; all significance decisions are based on Benjamini–Hochberg FDR-corrected p-values (α = 0.05). Eight disease-disease associations were identified as statistically specific to, and stronger in, the HTN cohort (all FDR-corrected p < 0.05). The strongest were Drug Abuse–Chronic Liver Disease (RR = 3.81, 95% CI: 3.75–3.88), Bipolar Disorder–Psychosis (RR = 3.69, 95% CI: 3.59–3.80), and Chronic Liver Disease–Alcohol Abuse (RR = 3.32, 95% CI: 3.27–3.38). All of them are consistent with shared molecular and genetic evidence extracted from DISGENET databases, which provides complementary context in a hypothesis-generating framework.
The proposed pipeline provides a generalizable, statistically grounded framework for building disease-specific comorbidity networks that can support clinical risk stratification, comorbidity screening, and hypothesis generation for targeted preventive interventions in chronic disease management.

| File | Size | Format | |
|---|---|---|---|
| Control-Integrated_Pipeline_for_Diseasome_Analysis_A_Case_Study_on_Hypertension_Comorbidities_in_U.S._Veterans.pdf (open access; type: full text, publisher's version; license: Creative Commons) | 3.09 MB | Adobe PDF | |
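
The co-occurrence statistics named in the abstract (Relative Risk with a 95% confidence interval, the φ coefficient, and Benjamini–Hochberg correction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the standard 2×2 contingency-table definitions and a Katz log-method interval, the function names are hypothetical, the Label Permutation Test is omitted, and the counts in the usage lines are made up for illustration.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Relative risk of disease B given disease A, with a Katz log-method CI.

    Assumed 2x2 layout (patient counts within one cohort):
      a: has A and B    b: has A, not B
      c: has B, not A   d: has neither
    """
    risk_with_a = a / (a + b)
    risk_without_a = c / (c + d)
    rr = risk_with_a / risk_without_a
    # Standard error of log(RR) under the Katz approximation.
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


def phi_coefficient(a, b, c, d):
    """Matthews correlation (phi) for the same 2x2 table."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up counts for one disease pair; not data from the study.
    rr, lower, upper = relative_risk(20, 80, 10, 90)
    print(f"RR = {rr:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
    print(f"phi = {phi_coefficient(20, 80, 10, 90):.3f}")
    # Made-up raw p-values for four disease pairs.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In a control-integrated setting, the same functions would be applied separately to the HTN and Control cohorts, and a pair would be retained only if its association is significantly stronger in the HTN cohort after correction.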
Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.


