Oral Presentation GENEMAPPERS 2024

Demonstration of technical reproducibility of polygenic scores (#19)

Tian Lin 1, Laura Ziser 1, Leanne Wallace 1, Jian Zeng 1, Sonia Shah 1, Anjali Henders 1, Naomi Wray 1,2,3
  1. Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
  2. Department of Psychiatry, University of Oxford, Oxford, England, UK
  3. Big Data Institute, University of Oxford, Oxford, England, UK

Polygenic scores (PGS) have emerged as a widely used approach in genetic research and are positioned for translation into clinical application. However, the consistency of PGS measurements across different technical batches and platforms is not well documented.

To address this, we accessed genome-wide data generated on two CEPH control samples that have been included as technical benchmarks in each batch processed in our lab. These data were generated across multiple platforms: Illumina and Affymetrix arrays (up to 25 times per array type), ~1X low-pass whole genome sequence (WGS), and ~30X WGS sourced from the public dataset of the 1000 Genomes Project. Data were processed through standard quality control pipelines and imputed to a locally hosted copy of the Haplotype Reference Consortium (HRC) reference panel. PGS were generated for each copy of the genome-wide data using SNP weights derived with the state-of-the-art method SBayesRC from genome-wide association study (GWAS) data for over 130 traits. SBayesRC generates SNP weights for 7 million SNPs; for a handful of traits the available GWAS data were sparse, so SNP weights for only 1 million SNPs were generated for PGS calculation.
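As a rough illustration of the comparison described above, the sketch below computes a PGS as a weighted sum of allele dosages and measures the spread of scores across technical replicates of one sample. It is a minimal sketch on simulated data: the function name `polygenic_score`, the noise level, and the toy SNP weights are assumptions made for illustration, and it does not reproduce SBayesRC or the pipeline used in this study.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0 to 2 per SNP).

    Missing dosages (NaN) contribute nothing to the score, which is an
    assumption of this sketch, not a statement about the study's pipeline.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.nansum(dosages * weights))

# Simulated (hypothetical) data: one individual, 1000 SNPs with small effects.
rng = np.random.default_rng(0)
n_snps = 1000
weights = rng.normal(0.0, 0.01, n_snps)
true_dosage = rng.integers(0, 3, n_snps).astype(float)

# Five technical replicates: the same genome with small dosage noise,
# standing in for batch or platform differences after imputation.
replicates = [
    np.clip(true_dosage + rng.normal(0.0, 0.02, n_snps), 0.0, 2.0)
    for _ in range(5)
]

scores = np.array([polygenic_score(r, weights) for r in replicates])
spread = scores.max() - scores.min()  # range of the PGS across replicates
print("PGS per replicate:", np.round(scores, 4))
print("Range across replicates:", round(float(spread), 4))
```

With many small-effect SNPs, per-SNP dosage errors largely average out in the sum, so the range across replicates stays small relative to the score itself.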

Our results provide an empirical demonstration of minimal differences in PGS across genotyping batches and technical platforms, particularly for highly polygenic traits. For the small number of traits with very large-effect SNPs (such as type 1 diabetes), PGS are significantly affected when the large-effect SNPs are missed or incorrectly read.
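The contrast between highly polygenic traits and traits with very large-effect SNPs can be sketched with a toy calculation. The weights below are invented for illustration (they are not estimates for any real trait), and treating a missed SNP as simply dropped from the sum is an assumption of this sketch.

```python
import numpy as np

def score_shift_if_missed(weights, dosages, idx):
    """Change in PGS when the SNP at position idx is dropped from the sum."""
    weights = np.asarray(weights, dtype=float)
    dosages = np.asarray(dosages, dtype=float)
    full = np.dot(weights, dosages)
    keep = np.ones(len(weights), dtype=bool)
    keep[idx] = False
    return float(full - np.dot(weights[keep], dosages[keep]))

dosages = np.ones(1000)  # heterozygous at every SNP, for simplicity

# Hypothetical highly polygenic trait: 1000 SNPs of equal small effect.
weights_polygenic = np.full(1000, 0.01)

# Hypothetical trait with one very large-effect SNP among small effects.
weights_large = weights_polygenic.copy()
weights_large[0] = 2.0

shift_small = score_shift_if_missed(weights_polygenic, dosages, 0)
shift_large = score_shift_if_missed(weights_large, dosages, 0)
print("Shift, polygenic trait:", round(shift_small, 4))
print("Shift, large-effect SNP:", round(shift_large, 4))
```

In this toy setting, missing one SNP moves the polygenic score by a single small weight, whereas missing the large-effect SNP removes a sizeable fraction of the total score.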

These findings provide empirical support that PGS are subject to very little technical variability.