Problem statement
Modern genotyping and sequencing technologies are generating a variety of reference datasets that can be used for genotype imputation in association studies. Combining reference panels from different populations can often improve imputation accuracy (e.g., see Howie et al. 2011), but it is not clear how best to merge panels that are genotyped at different sets of variants.
Howie et al. 2009 proposed a solution for the special case where one reference panel contains a subset of the variants in another reference panel. We previously released a combined 1000 Genomes + HapMap 3 panel that takes advantage of this framework, and it was also used in the WTCCC2 studies.
Many association studies are now using the latest 1000 Genomes data to drive their genotype imputation, but they may also have sequenced additional individuals from the population being studied. It makes sense to combine these resources in order to use all available reference information, but in this case each reference panel will contain many variants that are not found in the other; that is, the "hierarchical" variant framework of Howie et al. 2009 no longer applies.
With this in mind, we have devised a new strategy for combining reference panels created by different sequencing or genotyping studies.