Coronary artery disease (CAD) is the world’s leading cause of death and morbidity. It has a multifactorial aetiology that includes a strong polygenic component. The latest meta-analysis, of over a million samples, identified 279 genome-wide significant associations that together account for only 30–40% of CAD heritability. Epistasis (i.e., genetic interactions), which has been empirically demonstrated in CAD, could explain some of this ‘missing heritability’. Epistasis also has implications for the transferability of polygenic risk scores across ethnic groups. However, most genome-wide association studies (GWAS) use statistical methods (e.g., logistic regression) that capture only additive genetic associations. These methods forgo non-linear relationships because robustly identifying epistatic interactions poses an intractable combinatorial challenge.
We have therefore developed a machine-learning-based GWAS platform, VariantSpark. It identifies genetic variants associated with phenotypes from whole genomes whilst accounting for non-linear relationships such as epistasis. Applying VariantSpark to a CAD cohort of 51,107 samples from the UKBiobank, we uncovered 25 independent loci significantly associated with CAD. Importantly, VariantSpark recovered previously identified loci, including PMAIP1-MC4R and AAK1, using 30% fewer samples than previous meta-analyses. Furthermore, VariantSpark validated two known CAD loci, LPA and CDKN2B-AS1, in the independent TOPMed CAD cohort (n = 11,326), whereas logistic regression validated only the CDKN2B-AS1 locus.
We hypothesise that VariantSpark's increased detection power stems from its ability to account for epistasis. Indeed, our secondary epistasis search tool, BitEpi, found significant epistasis between the LPA and CDKN2B-AS1 loci in both the UKBiobank and TOPMed cohorts. Taken together, our results suggest that VariantSpark accounts for non-linear relationships, including epistasis, and therefore has greater power to detect genetic variants associated with complex diseases such as CAD in relatively small cohorts.