Oral Presentation GENEMAPPERS 2024

Effective Prioritisation of Novel Genes in Rare Mendelian Diseases through Machine Learning (#37)

Jacob E Munro 1,2, Mark Bennett 1,2, Melanie Bahlo 1,2
  1. Population Health and Immunity Division, Walter and Eliza Hall Institute, Melbourne, VIC, Australia
  2. Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia

The primary challenge in Mendelian disease research lies in identifying pathogenic variants from sequencing data. Researchers typically start with a panel of established disease-associated genes and search for disruptive variants in these genes. Extending the search beyond this panel is difficult and yields many putative causative variants in "genes of uncertain significance" (GUS). Manually evaluating and prioritising these GUS is challenging and laborious, and is often intractable for larger disease cohorts. Here, we explore machine learning approaches to prioritise novel genes based on a panel of established disease-associated genes. Our flexible methodology, PanRank, integrates multiple knowledge domains, including Gene Ontology, genic intolerance metrics from population databases (gnomAD), gene expression from GTEx and protein-protein interactions from STRING, alongside a panel of disease-associated genes. The resulting model generates rankings for putative disease-associated genes, and clusters known and putative disease genes into modules based on similarity.
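
The idea of ranking candidate genes by their resemblance to an established panel can be illustrated with a deliberately simple stand-in. The sketch below is not PanRank and makes no claim about its model, which this abstract does not specify: it assumes each gene has already been summarised as a numeric feature vector (for example, values derived from the knowledge domains listed above) and scores unlabelled genes by cosine similarity to the panel centroid. The function names, gene identifiers and feature values are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(features, panel):
    """Rank non-panel genes by similarity to the centroid of the panel genes.

    features: dict mapping gene name -> feature vector (hypothetical input)
    panel:    set of established disease-associated gene names
    Returns candidate gene names, most panel-like first.
    """
    panel_vecs = [features[g] for g in panel if g in features]
    if not panel_vecs:
        raise ValueError("no panel genes have feature vectors")
    # Centroid = per-feature mean across the panel genes.
    centroid = [sum(col) / len(panel_vecs) for col in zip(*panel_vecs)]
    scores = {
        gene: cosine(vec, centroid)
        for gene, vec in features.items()
        if gene not in panel
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up data: two panel genes and two candidates.
toy_features = {
    "PANEL_1": [0.9, 0.8, 0.1],
    "PANEL_2": [0.8, 0.9, 0.2],
    "CAND_X": [0.85, 0.8, 0.15],
    "CAND_Y": [0.1, 0.2, 0.9],
}
toy_ranking = rank_candidates(toy_features, {"PANEL_1", "PANEL_2"})
print(toy_ranking)
```

A supervised model trained on the panel would normally replace the centroid score; the sketch only shows the shape of the input (a panel plus per-gene features) and of the output (an ordered list of candidates).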

To demonstrate the utility of our approach, we focus on prioritising novel genes for epilepsy and compare gene rankings to the competing tool GLOWgenes [1]. Epilepsy is a heterogeneous disease with nearly 1,000 Mendelian disease-associated genes in the curated Genes4Epilepsy panel [2]. We trained both tools using dominant and recessive genes from Genes4Epilepsy and validated their performance on a hold-out set of recently discovered epilepsy genes. Both tools showed an enrichment of high rankings for the hold-out sets, with PanRank performing significantly better than GLOWgenes for dominantly inherited genes. For dominantly inherited epilepsy genes, the search space can be limited to the top-scoring ~10% of genes with PanRank and the top ~20% with GLOWgenes. By substantially reducing the search space for new disease-gene associations, our approach enables researchers to focus efforts on segregating and functionally validating variants in GUS that are more likely to cause disease.
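
A hold-out validation of this kind can be summarised by asking where the held-out genes fall in a tool's ranking and what share of them is recovered within a chosen top fraction. The sketch below is one plausible way to compute those two summaries, not the evaluation actually used in this work; the function names and the toy ranking are hypothetical, and the numbers carry no biological meaning.

```python
import math


def holdout_percentiles(ranking, holdout):
    """Percentile position (0-100, lower is better) of each hold-out gene.

    ranking: list of gene names, best-scoring first
    holdout: iterable of held-out gene names
    Hold-out genes absent from the ranking are skipped.
    """
    position = {gene: i for i, gene in enumerate(ranking, start=1)}
    n = len(ranking)
    return {g: 100.0 * position[g] / n for g in holdout if g in position}


def recall_at_top(ranking, holdout, top_fraction):
    """Fraction of hold-out genes found within the top `top_fraction` of the ranking."""
    holdout = list(holdout)
    if not holdout:
        raise ValueError("hold-out set is empty")
    cutoff = math.ceil(top_fraction * len(ranking))
    top = set(ranking[:cutoff])
    return sum(1 for g in holdout if g in top) / len(holdout)


# Toy, made-up ranking of 20 genes with three held-out genes.
toy_ranking = [f"G{i}" for i in range(1, 21)]
toy_holdout = ["G1", "G2", "G15"]
print(holdout_percentiles(toy_ranking, toy_holdout))
print(recall_at_top(toy_ranking, toy_holdout, 0.25))
```

Computing `recall_at_top` across a range of fractions for each tool gives a direct view of how far the search space can be narrowed while still retaining most held-out genes.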

  1. de la Fuente, Lorena, et al. "Prioritization of new candidate genes for rare genetic diseases by a disease-aware evaluation of heterogeneous molecular networks." International Journal of Molecular Sciences 24.2 (2023): 1661.
  2. Oliver, Karen L., et al. "Genes4Epilepsy: an epilepsy gene resource." Epilepsia 64.5 (2023): 1368-1375.