HapMap 3 + 1,000 Genomes Pilot haplotypes -- NCBI build 36 (hg18) coordinates

   --The 1,000 Genomes files are based on a sequence data freeze from Mar 2010; the phased haplotypes were released Jun 2010.

   --The HapMap 3 files are from release #2 (Feb 2009).

   --We have processed these files so that they can be used as a single, composite reference panel with IMPUTE2.

Haplotype, legend, sample, and genetic map files
Download packages (warning: large files)
These downloads contain reference panels from HapMap 3 and the 1,000 Genomes Project. Each dataset includes the 1,000 Genomes Pilot Project haplotypes from one panel, together with all available HapMap 3 haplotypes except those from individuals who also appear in that 1,000 Genomes panel. We removed these duplicate haplotypes so that the two datasets can be combined without "double counting" any haplotypes during imputation. We also filtered both sets of haplotypes to remove SNPs with apparent quality issues.
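To illustrate what removing duplicate haplotypes involves, here is a minimal sketch. It assumes an IMPUTE2-style haplotype layout (one line per SNP, two space-separated 0/1 columns per individual, in the same order as a list of sample IDs). The function name and the sample IDs are hypothetical; this is not the script used to build the distributed panels.

```python
def drop_duplicate_haplotypes(hap_lines, sample_ids, exclude_ids):
    """Remove both haplotype columns for every individual in exclude_ids.

    Assumed layout: one line per SNP, two 0/1 columns per individual,
    ordered like sample_ids.
    """
    exclude = set(exclude_ids)
    # Column indices to keep: 2*i and 2*i+1 for each retained individual.
    keep_cols = []
    for i, sid in enumerate(sample_ids):
        if sid not in exclude:
            keep_cols.extend((2 * i, 2 * i + 1))

    out = []
    for line in hap_lines:
        fields = line.split()
        if len(fields) != 2 * len(sample_ids):
            raise ValueError("expected two haplotype columns per sample")
        out.append(" ".join(fields[c] for c in keep_cols))
    return out


# Hypothetical example: individual "S2" is also in the 1,000 Genomes panel.
hap = ["0 1 1 1 0 0", "1 0 0 0 1 1"]
filtered = drop_duplicate_haplotypes(hap, ["S1", "S2", "S3"], {"S2"})
print(filtered)
```

Dropping whole individuals (both columns) rather than single haplotypes keeps the remaining panel consistently diploid.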

When using these combined panels, you should set the -Ne argument of IMPUTE2 to 20000, as explained here.
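As a rough sketch of where the -Ne setting goes, the helper below assembles an IMPUTE2 argument list using the -m, -h, -l, -g, -int, -Ne, and -o options. The helper name, file names, and interval are hypothetical placeholders. For simplicity it passes a single haplotype/legend pair; it does not show how to combine the HapMap 3 and 1,000 Genomes files in one run.

```python
import shlex


def build_impute2_command(map_file, hap_file, legend_file, gen_file,
                          start_bp, end_bp, out_prefix, ne=20000):
    """Return an IMPUTE2 argument list (hypothetical helper).

    ne defaults to 20000, the value recommended for these combined panels.
    """
    return [
        "impute2",
        "-m", map_file,       # genetic map
        "-h", hap_file,       # reference haplotypes
        "-l", legend_file,    # reference legend
        "-g", gen_file,       # study genotypes
        "-int", str(start_bp), str(end_bp),
        "-Ne", str(ne),
        "-o", out_prefix,
    ]


cmd = build_impute2_command("genetic_map_chr22.txt", "chr22.hap",
                            "chr22.legend", "study_chr22.gen",
                            20000000, 25000000, "chr22_imputed")
print(shlex.join(cmd))
```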

To see an example command that combines haplotypes from HapMap 3 and the 1,000 Genomes Project in a single imputation analysis, go here.

To see our rationale for using all HapMap 3 haplotypes together, rather than focusing on population-matched subsets, go here.

As noted above, we performed a small amount of filtering on the HapMap 3 and 1,000 Genomes files to help them work together as a composite reference panel. If you prefer unfiltered 1,000 Genomes Pilot Project haplotypes, you can download them from here; similarly, you can download unfiltered HapMap 3 haplotypes from here.

We found problems with the 1,000 Genomes CHB+JPT haplotypes, so we decided not to distribute them. You can now obtain high-quality haplotypes for imputation in East Asian and other populations from here.



NOTE: When combining datasets in an imputation analysis, you should always take great care to ensure that they have been aligned to the same strand convention. In this case, we have already aligned the HapMap 3 and 1,000 Genomes data to the '+' strand of the human reference sequence, and we have removed SNPs with unresolvable strand flips between panels. Consequently, you only need to make sure that your own study dataset is also aligned to the '+' strand before imputing from the combined panel.
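One common way to check a study dataset against the '+'-strand reference alleles is to compare allele pairs SNP by SNP, as in the minimal sketch below. It assumes you can look up each SNP's two reference alleles (for example, from the a0/a1 columns of a legend file); the function name and category labels are hypothetical, and this is not the procedure used to prepare the panels. A/T and C/G SNPs are flagged as ambiguous because their alleles look the same on either strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def strand_status(study_alleles, ref_alleles):
    """Classify a study SNP relative to '+'-strand reference alleles."""
    s = {a.upper() for a in study_alleles}
    r = {a.upper() for a in ref_alleles}
    # Skip indels or unknown codes; only single-base alleles are handled.
    if not s <= COMPLEMENT.keys() or not r <= COMPLEMENT.keys():
        return "non-snp"
    # A/T and C/G SNPs are their own complement, so strand can't be inferred.
    if {COMPLEMENT[a] for a in r} == r:
        return "ambiguous"
    if s <= r:
        return "match"
    # Complementing the study alleles recovers the reference alleles.
    if {COMPLEMENT[a] for a in s} <= r:
        return "flip"
    return "mismatch"


print(strand_status(("T", "C"), ("A", "G")))
```

SNPs classified as "flip" can be complemented before imputation; "ambiguous" and "mismatch" SNPs usually need further checking (for example, by allele frequency) or removal.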