1000 Genomes haplotypes -- Phase I integrated variant set release (v3) in NCBI build 37 (hg19) coordinates

   --These files are based on sequence data freezes from 23 Nov 2010 (low-coverage genomes) and 21 May 2011 (high-coverage exomes); the phased haplotypes were released Mar 2012 ("version 3" of the Phase I integrated data). For more details, see the README file.

   --The haplotypes include an integrated set of SNPs, insertion/deletion polymorphisms, and structural variants. The haplotypes were inferred from sequence data by BEAGLE (Brian and Sharon Browning, University of Washington) and MaCH/Thunder (Yun Li, University of North Carolina and Goncalo Abecasis, University of Michigan).

   --The original haplotypes can be downloaded as VCF ("variant call format") files from the 1000 Genomes FTP site.

   --Note that the Phase I integrated haplotypes were originally released in Oct 2011 and revised in Feb 2012. These releases were discovered to have data quality issues pertaining to insertion/deletion and structural variants, so you should re-download the files if you got them prior to the v3 release.

   --Updated 05 Mar 2012 to remove spurious INDELs (release v2) and add chromosome X data.

   --Updated 19 Apr 2012 to remove additional problematic INDELs (release v3).
      --Changed naming convention for variants without IDs in the release VCF files (see README file).
      --Removed 'type', 'source', and 'rsq' columns from legend files in main download (see README file).
      --Posted separate legend files that still contain the 'type', 'source', and 'rsq' columns (see below).

   --Updated 26 Aug 2012 to add a version of the reference panel that is limited to variants with more than one minor allele copy ("macGT1", or "minor allele count greater than 1") across all 1,092 individuals.
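
The "macGT1" criterion can be illustrated with a short Python sketch. This is not the procedure used to build the filtered panel, which is not described here; it only shows what "minor allele count greater than 1" means for one row of a haplotype file. It assumes each row holds space-separated 0/1 alleles, one per haplotype, and the function names are hypothetical.

```python
def minor_allele_count(hap_line):
    """Count copies of the less common allele in one haplotype-file row.

    Assumes a biallelic variant coded as space-separated 0/1 values,
    one value per reference haplotype (an assumption about the layout).
    """
    alleles = hap_line.split()
    ones = sum(1 for a in alleles if a == "1")
    return min(ones, len(alleles) - ones)


def passes_mac_gt1(hap_line):
    """Return True if the variant has more than one minor allele copy."""
    return minor_allele_count(hap_line) > 1
```

A variant seen on only one haplotype (a singleton) fails this test, as does a monomorphic variant.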

Haplotype, legend, sample, and genetic map files
Download packages (warning: large files)
This download contains reference data for 1,092 individuals from Africa, Asia, Europe, and the Americas. We provide one worldwide haplotype file for each chromosome since we recommend using all available reference haplotypes with IMPUTE2, regardless of the ancestry of your study data. For background information on this approach, see here.

If you really want to use just a subset of the haplotypes, you can manually parse the haplotype files with a utility like the Linux 'cut' command. The sample file that comes in the download package contains the information needed to subset the files.
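
As a rough sketch of that subsetting step, the Python below picks out the haplotype columns for one group of individuals. It rests on assumptions you should check against the README: that the sample file has a header row with a 'group' column, that its rows are in the same order as the individuals in the haplotype files, and that each individual occupies two consecutive space-separated columns. The function names are hypothetical.

```python
def haplotype_columns(sample_lines, keep_group, column="group"):
    """Return 0-based haplotype column indices for the chosen group.

    Assumes sample_lines[0] is a header naming the columns and that
    individual i (0-based) owns haplotype columns 2*i and 2*i + 1.
    """
    header = sample_lines[0].split()
    col_idx = header.index(column)
    cols = []
    individuals = [ln for ln in sample_lines[1:] if ln.strip()]
    for i, line in enumerate(individuals):
        if line.split()[col_idx] == keep_group:
            cols.extend([2 * i, 2 * i + 1])
    return cols


def subset_haplotype_line(hap_line, cols):
    """Keep only the selected columns from one haplotype-file row."""
    alleles = hap_line.split()
    return " ".join(alleles[c] for c in cols)


def cut_field_list(cols):
    """Format the columns as a 1-based field list for 'cut -d " " -f'."""
    return ",".join(str(c + 1) for c in cols)
```

The haplotype files are large and gzip-compressed, so in practice you would stream them line by line (for example with Python's gzip.open in text mode) rather than load them whole. The legend files do not need to change, since subsetting removes columns (haplotypes) and not rows (variants).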

When using these combined panels, you should set the -Ne argument of IMPUTE2 to 20000, as explained here.

You should use IMPUTE version 2.2.0 or later to ensure proper handling of these reference panels.
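
To show where the -Ne setting fits, here is a minimal Python sketch that assembles an IMPUTE2 command line as an argument list. The options shown (-m, -h, -l, -g, -int, -Ne, -o) are standard IMPUTE2 arguments, but the executable name and every file name are placeholders, not the names used in the download package.

```python
def build_impute2_command(map_file, hap_file, legend_file, gen_file,
                          start, end, out_file, ne=20000,
                          executable="impute2"):
    """Build an IMPUTE2 argument list for one imputation interval.

    'executable' and all file paths are placeholders chosen by the caller.
    """
    return [
        executable,
        "-m", map_file,               # genetic map
        "-h", hap_file,               # reference haplotypes
        "-l", legend_file,            # reference legend
        "-g", gen_file,               # study genotypes
        "-int", str(start), str(end), # interval to impute, in base pairs
        "-Ne", str(ne),               # 20000 for the combined panel
        "-o", out_file,               # output file
    ]
```

The resulting list can be passed to subprocess.run, or joined with spaces to inspect the command before running it.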


 [ALL (macGT1)]

 [README (macGT1)]
Annotated legend files
Download packages (warning: large files)
We are working on extending our legend file format to accommodate a wide range of sequence annotations in reference datasets. Non-numeric annotations may cause problems when current versions of IMPUTE2 (v2.2.2 or earlier) are used with the -filt_rules_l option, so we have created a separate download for the legend files that contain this kind of information. These files will be absorbed into the main download package with the next release of IMPUTE2.

 [ANNOTATED_LEGEND_FILES]