Using IMPUTE with the HapMap2, HapMap3 and 1000 Genomes Project Data


A main use of IMPUTE is to impute genotypes using the haplotypes from HapMap2, HapMap3 and the 1000 Genomes Project. To facilitate this, we have prepared these haplotype sets in the format required by IMPUTE for all 22 autosomes. Make sure that the base-pair positions in your genotype data come from the same genome build as the haplotype, recombination rate and strand files you use.
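One quick way to catch a build mismatch is to compare, for SNPs that share an rsID, the position in your genotype file with the position in the panel's legend file. The Python sketch below is a hypothetical helper, not part of IMPUTE. It assumes the usual .gen layout (SNP id, rsID, position, allele A, allele B, then three genotype probabilities per individual) and a legend file with one header line followed by rsID, position and two allele columns. If many shared rsIDs disagree on position, the two files are probably on different builds.

```python
def read_legend(lines):
    """Map rsID -> base-pair position from legend-file lines.

    Assumed layout: one header line, then whitespace-separated columns
    rsID, position, allele0, allele1.
    """
    it = iter(lines)
    next(it, None)  # skip the header line
    legend = {}
    for line in it:
        fields = line.split()
        if fields:
            legend[fields[0]] = int(fields[1])
    return legend


def position_agreement(gen_lines, legend_lines):
    """Return (shared, agreeing): how many .gen SNPs have an rsID found in
    the legend, and how many of those also have the same position.

    Assumed .gen layout: SNP id, rsID, position, allele A, allele B,
    then three genotype probabilities per individual.
    """
    legend = read_legend(legend_lines)
    shared = agreeing = 0
    for line in gen_lines:
        fields = line.split()
        if not fields:
            continue
        rsid, pos = fields[1], int(fields[2])
        if rsid in legend:
            shared += 1
            if legend[rsid] == pos:
                agreeing += 1
    return shared, agreeing
```

Both functions accept any iterable of lines, so open file handles (for example for example.gen and CEU.0908.chr22.legend) can be passed directly.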
We recommend that genome-wide imputation of genotypes be carried out in relatively small chunks to avoid running out of RAM on your computer. For imputation of the WTCCC dataset we used a chunk size of 7Mb; the imputed chunks were then concatenated to produce an imputed file for each chromosome. The chunk boundaries can be specified using the -int option. The -buffer option should also be used to avoid edge effects when imputing in relatively small chunks.
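The chunking scheme described above can be scripted. The sketch below uses hypothetical helper names and is not part of IMPUTE: it splits a region into consecutive 7Mb intervals for -int, builds an IMPUTE command line for each chunk using the options shown in the examples on this page plus -buffer, and concatenates the per-chunk outputs in order. It assumes that the -int bounds are inclusive and that the output files have no header lines. The value given to -buffer is passed through unchanged, so check the IMPUTE option list for the units it expects.

```python
import shutil


def make_chunks(start, end, size=7_000_000):
    """Split [start, end] into consecutive, non-overlapping intervals of at
    most `size` bp (7Mb by default, the WTCCC chunk size)."""
    chunks = []
    s = start
    while s <= end:
        e = min(s + size - 1, end)  # assumes -int bounds are inclusive
        chunks.append((s, e))
        s = e + 1
    return chunks


def impute_command(hap, legend, gmap, gen, chunk, out, buffer=None):
    """Build the argument list for one IMPUTE run on one chunk."""
    cmd = ["./impute", "-h", hap, "-l", legend, "-m", gmap, "-g", gen,
           "-int", str(chunk[0]), str(chunk[1]), "-o", out]
    if buffer is not None:
        # Passed through unchanged; the units are not assumed here.
        cmd += ["-buffer", str(buffer)]
    return cmd


def concatenate(chunk_outputs, combined_path):
    """Append per-chunk result files, in chunk order, into one file.

    Assumes the chunk outputs have no header lines to strip.
    """
    with open(combined_path, "wb") as dst:
        for path in chunk_outputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

For example, make_chunks(20_000_000, 25_000_000) returns the single interval (20000000, 25000000) used in the examples on this page. Each command list can be run with subprocess.run(cmd, check=True), after which concatenate() joins the chunk outputs into one file per chromosome.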

1000 Genomes Project (August 2009 CEU haplotypes) - NCBI Build 36 (dbSNP b126)
Polymorphic files - The August 2009 release of phased data from the 1000 Genomes
Project. The file contains the haplotypes, legend files, recombination rates and one
example file.
[CEU]

Strand files - Affy500k [constructed using the Affymetrix Nsp and Sty annotation files], Affy6.0 [constructed using the Affymetrix 6.0 annotation file]
Example - the CEU file contains a set of 20 simulated individuals on chromosome 22 (example.gen). Below is an example of imputing these individuals using the CEU panel in the interval 20-25Mb. Note: no strand file is needed because this is simulated data. For real data you would need to either supply a strand file, align your genotype data to the + strand, or use the -fix_strand option.
./impute -h CEU.0908.chr22.hap -l CEU.0908.chr22.legend -m genetic_map_chr22_combined_b36.txt -g example.gen -int 20000000 25000000 -o example.results
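If you align the strand of your genotype data yourself, the core operation is to complement allele A and allele B of every SNP reported on the - strand. The sketch below uses hypothetical helper names and assumes a strand file with two whitespace-separated columns (base-pair position, then + or -) and the .gen layout SNP id, rsID, position, allele A, allele B, followed by the genotype probabilities. SNPs missing from the strand file are left unchanged. A/T and C/G SNPs cannot be checked this way, because complementing them only relabels the alleles, so the strand annotation itself must be correct.

```python
# Complementary bases; anything else is left unchanged.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_strand(lines):
    """Map base-pair position -> '+' or '-' from strand-file lines.

    Assumed layout: two whitespace-separated columns, position and strand.
    """
    strand = {}
    for line in lines:
        fields = line.split()
        if fields:
            strand[int(fields[0])] = fields[1]
    return strand


def align_to_plus(gen_lines, strand):
    """Complement allele A and allele B of every .gen SNP on the '-' strand.

    Genotype probabilities are untouched: allele A is still allele A,
    only its base is rewritten as seen on the + strand. SNPs absent from
    `strand` are assumed to be on the + strand already.
    """
    aligned = []
    for line in gen_lines:
        fields = line.split()
        if not fields:
            continue
        if strand.get(int(fields[2])) == "-":
            fields[3] = COMPLEMENT.get(fields[3], fields[3])
            fields[4] = COMPLEMENT.get(fields[4], fields[4])
        aligned.append(" ".join(fields))
    return aligned
```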

HapMap 3 (release 2) haplotypes - NCBI Build 36 (dbSNP b126)
Polymorphic files - Phased haplotypes from release 2 of the HapMap 3
dataset for all the populations: ASW, CEU, CHD, GIH, JPT+CHB, LWK,
MEX, MKK, TSI, YRI and a combined CEU+TSI set. The file contains
the haplotypes, legend files, recombination rates and one example file.
[HM3]

Strand files - Affy500k [constructed using the Affymetrix Nsp and Sty annotation files], Affy6.0 [constructed using the Affymetrix 6.0 annotation file]
Example - the HM3 file contains a set of 20 simulated individuals on chromosome 22 (example.gen). Below is an example of imputing these individuals using the CEU+TSI panel in the interval 20-25Mb. Note: no strand file is needed because this is simulated data. For real data you would need to either supply a strand file, align your genotype data to the + strand, or use the -fix_strand option.
./impute -h CEU+TSI.chr22.hap -l hapmap3.r2.b36.chr22.legend -m genetic_map_chr22_combined_b36.txt -g example.gen -int 20000000 25000000 -o example.results
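This command pairs a population-specific haplotype file (CEU+TSI.chr22.hap) with a legend file (hapmap3.r2.b36.chr22.legend), so before running IMPUTE on a panel you have unpacked it can be worth checking that the two files line up. The sketch below uses a hypothetical helper name and assumes the IMPUTE .hap layout (one row per legend SNP, in legend order, with one 0/1 column per haplotype) and a legend file with one header line. It returns the number of SNPs and haplotypes, or raises an error if the files disagree.

```python
def panel_dimensions(hap_lines, legend_lines):
    """Return (n_snps, n_haplotypes) for a .hap/.legend pair.

    Raises ValueError if the row counts differ or the .hap rows are ragged.
    """
    legend_rows = [l for l in list(legend_lines)[1:] if l.strip()]  # skip header
    hap_rows = [l.split() for l in hap_lines if l.strip()]
    if len(hap_rows) != len(legend_rows):
        raise ValueError(
            f"{len(hap_rows)} haplotype rows but {len(legend_rows)} legend SNPs")
    widths = {len(row) for row in hap_rows}
    if len(widths) > 1:
        raise ValueError("haplotype rows have differing numbers of columns")
    n_haplotypes = widths.pop() if widths else 0
    return len(hap_rows), n_haplotypes
```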

HapMap rel#24 - NCBI Build 36 (dbSNP b126)
Polymorphic files - each panel's files contain only the SNPs that are polymorphic in that panel,
i.e. the CEU haplotypes only contain data at SNPs that are polymorphic in the CEU panel.
The files contain the haplotypes and associated legend files.
[CEU]
[YRI]
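A SNP is polymorphic in a panel when both alleles appear among that panel's haplotypes. The sketch below shows how a panel could be reduced to its polymorphic SNPs while keeping the .hap rows and legend rows in step. It assumes the IMPUTE .hap layout (one row per legend SNP, one 0/1 column per haplotype) and a legend file with one header line. It only illustrates the definition; it is not the procedure used to build these files, and the function names are hypothetical.

```python
def is_polymorphic(hap_row):
    """True if a .hap row (one SNP, one 0/1 allele per haplotype) has both alleles."""
    return set(hap_row.split()) == {"0", "1"}


def keep_polymorphic(hap_lines, legend_lines):
    """Return (hap_rows, legend_rows) restricted to polymorphic SNPs.

    The legend header line is kept as the first legend row.
    """
    legend = [l for l in legend_lines if l.strip()]
    header, legend_rows = legend[0], legend[1:]
    hap_rows = [l for l in hap_lines if l.strip()]
    if len(hap_rows) != len(legend_rows):
        raise ValueError("legend and haplotype files have different SNP counts")
    kept_hap, kept_legend = [], [header]
    for hap, leg in zip(hap_rows, legend_rows):
        if is_polymorphic(hap):
            kept_hap.append(hap)
            kept_legend.append(leg)
    return kept_hap, kept_legend
```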
Recombination rate files (NB: these are the same as the rel#22 rates) [CEU]   [YRI]   [COMBINED]
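IMPUTE reads these recombination rate files through the -m option, as in the chromosome 22 examples on this page. A related task is converting base-pair positions to genetic positions, for example to see how many cM a 7Mb chunk spans. The sketch below assumes a header line followed by three columns, position, COMBINED_rate(cM/Mb) and Genetic_Map(cM), sorted by position; please check that your copy of the file is laid out this way. It interpolates linearly in the Genetic_Map(cM) column and clamps positions outside the map to its end points. The function names are hypothetical.

```python
import bisect


def read_genetic_map(lines):
    """Parse a recombination rate file into parallel lists (bp, cM).

    Assumed layout: header line, then position, rate (cM/Mb), genetic map (cM).
    """
    it = iter(lines)
    next(it, None)  # skip the header line
    positions, cms = [], []
    for line in it:
        fields = line.split()
        if fields:
            positions.append(int(fields[0]))
            cms.append(float(fields[2]))
    return positions, cms


def genetic_position(positions, cms, bp):
    """Linearly interpolate the genetic position (cM) of base-pair position bp."""
    if bp <= positions[0]:
        return cms[0]
    if bp >= positions[-1]:
        return cms[-1]
    i = bisect.bisect_right(positions, bp)  # positions[i-1] <= bp < positions[i]
    x0, x1 = positions[i - 1], positions[i]
    y0, y1 = cms[i - 1], cms[i]
    return y0 + (bp - x0) / (x1 - x0) * (y1 - y0)


def interval_cm(positions, cms, start, end):
    """Genetic length (cM) of the base-pair interval [start, end]."""
    return genetic_position(positions, cms, end) - genetic_position(positions, cms, start)
```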
Strand files - Affy500k [constructed using the Affymetrix Nsp and Sty annotation files], Affy6.0 [constructed using the Affymetrix 6.0 annotation file]

HapMap rel#22 - NCBI Build 36 (dbSNP b126)
Polymorphic files - each panel's files contain only the SNPs that are polymorphic in that panel,
i.e. the CEU haplotypes only contain data at SNPs that are polymorphic in the CEU panel.
The files contain the haplotypes and associated legend files.
[CEU]
[YRI]
[JPT+CHB]
Consensus files - these files contain SNPs that occur in all 3 of the HapMap panels.
There are also files for all combinations of the panels, which are useful for imputation
of admixed individuals.
Single panels: [CEU]   [YRI]   [JPT+CHB]
Pairs of panels: [CEU+CHB+JPT]   [CEU+YRI]   [CHB+JPT+YRI]
Combined panels: [CEU+YRI+CHB+JPT]
[Legend files]
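Conceptually, a consensus or combined panel keeps the SNPs found in every contributing panel and places the haplotypes from each panel side by side. The sketch below illustrates that idea only; it is not how these files were produced, and the function names are hypothetical. It matches SNPs by position, assumes the .hap and legend layouts used elsewhere on this page (one .hap row per legend SNP with one 0/1 column per haplotype; a legend header line followed by rsID, position, allele0, allele1), and drops SNPs whose alleles are coded differently between panels, since the 0/1 codes would then refer to different bases.

```python
def load_panel(hap_lines, legend_lines):
    """Map position -> (allele0, allele1, haplotype alleles) for one panel."""
    legend_rows = [l.split() for l in list(legend_lines)[1:] if l.strip()]
    hap_rows = [l.split() for l in hap_lines if l.strip()]
    if len(legend_rows) != len(hap_rows):
        raise ValueError("legend and haplotype files have different SNP counts")
    return {int(leg[1]): (leg[2], leg[3], hap)
            for leg, hap in zip(legend_rows, hap_rows)}


def combine_panels(*panels):
    """Keep SNPs present in every panel with identical allele coding and
    concatenate their haplotype columns, panel by panel."""
    shared = set(panels[0])
    for panel in panels[1:]:
        shared &= set(panel)
    combined = {}
    for pos in sorted(shared):
        codings = {(panel[pos][0], panel[pos][1]) for panel in panels}
        if len(codings) != 1:
            continue  # 0/1 would mean different bases in different panels
        a0, a1 = codings.pop()
        combined[pos] = (a0, a1, [h for panel in panels for h in panel[pos][2]])
    return combined
```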

Recombination rate files [CEU]   [YRI]   [COMBINED]
Strand files - Affy500k [constructed using the Affymetrix Nsp and Sty annotation files], Affy6.0 [constructed using the Affymetrix 6.0 annotation file]

HapMap rel#21 - NCBI Build 35 (dbSNP b125)
Polymorphic files - each panel's files contain only the SNPs that are polymorphic in that panel,
i.e. the CEU haplotypes only contain data at SNPs that are polymorphic in the CEU panel.
The files contain the haplotypes and associated legend files.
[CEU]
[YRI]
[JPT+CHB]
Recombination rate files [CEU]    [YRI]    [JPT+CHB]    [COMBINED]
Strand files - Affy500k [constructed using the Affymetrix Nsp and Sty annotation files]


Contact Information

If you have any questions regarding the use of this program, please send an email to both of the following people:

Dr. Bryan Howie (howie <at> stats <dot> ox <dot> ac <dot> uk).
Dr. Jonathan Marchini (marchini <at> stats <dot> ox <dot> ac <dot> uk).

It is a good idea to include a copy of the screen output (in the ./summary file) with your email, as this helps us identify any problems.