The missing heritability of common human traits and diseases has long been a puzzle: it is the gap between the heritable phenotypic variation and the variation explained by genotyped variants. Rare genetic variants have been hypothesised to account for this gap, yet such variants remain elusive in many diseases because of the scale of genotyping required to detect them. Renewed interest in genetic studies of enriched, high-risk families offers a solution; however, identifying and ascertaining such families presents its own challenges.
To address this challenge, we developed a scalable graphical method for pedigree prediction from large population-based cohorts, allowing family-based studies to proceed within existing cohorts recruited for studying complex diseases. Conventionally, relatedness is used only to reduce the sample to independent members. Instead, we use relatedness to reconstruct pedigrees and incorporate it into a linear mixed model for genetic association testing. This enables accurate family-based imputation of rare variants from the subset of the population with whole-genome sequencing.
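As a rough illustration of the graph view of relatedness (not the method developed here), the sketch below links individuals whose pairwise kinship coefficient exceeds a degree-based cutoff and reports the connected components as candidate family groups. The function names, the input format of (sample, sample, kinship) triples and the example values are all hypothetical; the cutoff 2^-(d + 1.5) for degree d follows the commonly used KING-style kinship boundaries and is an assumption, not a detail taken from this work.

```python
from collections import defaultdict


def degree_cutoff(max_degree):
    """Lower kinship bound for relatives up to `max_degree` (KING-style convention, assumed)."""
    return 2.0 ** -(max_degree + 1.5)


def candidate_pedigrees(kinship_pairs, max_degree=3):
    """Group samples into connected components of the relatedness graph.

    kinship_pairs: iterable of (sample_a, sample_b, kinship) triples (hypothetical format).
    Returns a list of sorted sample lists, largest group first.
    """
    cutoff = degree_cutoff(max_degree)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, phi in kinship_pairs:
        root_a, root_b = find(a), find(b)  # registers both samples, related or not
        if phi >= cutoff and root_a != root_b:
            parent[root_a] = root_b  # an edge joins the two groups

    groups = defaultdict(set)
    for sample in list(parent):
        groups[find(sample)].add(sample)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Made-up kinship values for five samples.
example_pairs = [("A", "B", 0.25), ("B", "C", 0.25), ("C", "D", 0.01), ("D", "E", 0.12)]
print(candidate_pedigrees(example_pairs, max_degree=3))
```

Connected components only delimit who belongs together; assigning parent, sibling and more distant relationships within each group is a separate step that this sketch does not attempt.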
We applied our method to the Busselton Health Study cohort, which includes data on cardiovascular events and measurements of lipid classes and species, and found that 92% of the individuals were connected at a 7th-degree distance. We identified 745 pedigrees, ranging from duos and trios to larger families of up to 37 members. Gene-centric rare-variant association testing on the joint family- and population-based imputation showed signals in both coding and non-coding regions. For example, transforming growth factor beta receptor III (TGFBR3) carried a group of rare missense variants associated with cardiovascular disease risk (P = 2.58 × 10⁻⁸). Some rare-variant signals lie within regions with known common-variant association signals. This suggests that rare variation could be driving those signals; its discovery not only accounts for some of the missing heritability of cardiovascular disease but also refines the mapping of the underlying genetic cause.