Authors
John S Malamon, John J Farrell, Li Charlie Xia, Beth A Dombroski, Wan-Ping Lee, Rueben G Das, Badri N Vardarajan, Jessica Way, Amanda B Kuzma, Otto Valladares, Yuk Yee Leung, Allison J Scanlon, Irving Antonio Barrera Lopez, Jack Brehony, Kim C Worley, Nancy R Zhang, Li-San Wang, Lindsay A Farrer, Gerard D Schellenberg
Publication date
2022/5/20
Journal
bioRxiv
Pages
2022.05.19.492472
Publisher
Cold Spring Harbor Laboratory
Description
Background
Reliable detection and accurate genotyping of structural variants (SVs) and insertions/deletions (indels) from whole-genome sequence (WGS) data remain a significant challenge. We present protocols for variant calling, quality control, call merging, sensitivity analysis, in silico genotyping, and laboratory validation that generate a high-quality deletion call set from whole-genome sequences as part of the Alzheimer’s Disease Sequencing Project (ADSP). The dataset contains 578 individuals from 111 families.
Methods
We applied two complementary pipelines (Scalpel and Parliament) for SV/indel calling, breakpoint refinement, genotyping, and local reassembly to produce a high-quality annotated call set. Sensitivity was measured in sample replicates (N=9) for all callers using in silico variant spike-in across a wide range of event sizes. We focused on deletions because these events were more reliably called. To evaluate caller specificity, we developed a novel metric, the D-score, that leverages deletion-sharing frequencies within and outside of families to rank recurring deletions. Overall quality across size bins was assessed with the kinship coefficient. Individual callers were evaluated for computational cost, performance, sensitivity, and specificity. Call quality was evaluated by Sanger sequencing of predicted loss-of-function (LOF) variants, variants near AD candidate genes, and randomly selected genome-wide deletions ranging from 2 to 17,000 bp.
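As an illustration of the spike-in sensitivity measurement described above, the following is a minimal sketch and not the authors' implementation. It assumes deletions are represented as (chrom, start, end) tuples and that a spiked-in deletion counts as recovered when some call matches it with at least 50% reciprocal overlap. The matching rule, the threshold, and the function names reciprocal_overlap and sensitivity_by_size_bin are all assumptions introduced here for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def reciprocal_overlap(a, b):
    """Reciprocal overlap of two (chrom, start, end) intervals, in [0, 1]."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    # Fraction of the overlap relative to each interval; keep the smaller one.
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def sensitivity_by_size_bin(spiked, called, bin_edges, min_overlap=0.5):
    """Fraction of spiked-in deletions recovered, per half-open size bin.

    spiked, called: lists of (chrom, start, end) deletions.
    bin_edges: sorted sizes in bp; bin i covers [bin_edges[i], bin_edges[i+1]).
    min_overlap: assumed reciprocal-overlap threshold for a match.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for truth in spiked:
        size = truth[2] - truth[1]
        idx = bisect_right(bin_edges, size) - 1
        if idx < 0 or idx >= len(bin_edges) - 1:
            continue  # size falls outside every bin
        label = (bin_edges[idx], bin_edges[idx + 1])
        totals[label] += 1
        if any(reciprocal_overlap(truth, call) >= min_overlap for call in called):
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}
```

The linear scan over calls keeps the sketch short; an interval index would be a more practical choice for genome-scale call sets.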
Results
We generated a high-quality deletion call set across a wide range of event sizes consisting of 152,301 deletions with an average of 263 per genome. A …