Background on IMPUTE

The sections below provide some basic information about the different versions of the IMPUTE software.

Home

Version 0.5 (top)

IMPUTE v0.5 has now been superseded by IMPUTE v1.0, although we are keeping the website and software available for posterity. The description of IMPUTE v1 below is equally applicable to IMPUTE v0.5.

Go to the main IMPUTE v0.5 website.

Version 1 (top)

IMPUTE v1 (or "IMPUTE1", for short) is designed to be used with a reference panel of known haplotypes, such as those provided by the International HapMap Project or the 1,000 Genomes Project, and a study sample genotyped at a subset of the SNPs in the reference panel. IMPUTE1 fills in missing genotypes (shown as red ?'s in the figure above) by extrapolating linkage disequilibrium patterns from the reference panel to the study individuals. This analysis scheme is referred to as Scenario A by Howie et al. (2009) and in the figure above.

The basic method underlying IMPUTE1 (which was described by Marchini et al. [2007]) has been widely used to improve power in genome-wide association studies. Until recently, the software implementing this method was called IMPUTE v0.X.Y, where X is an integer between 1 and 5. We are still supporting and developing IMPUTE1. The method is not ideal in all settings, however, so we conducted a major revision of the modeling framework and software; this revised approach is implemented in IMPUTE v2, which is described below.

Go to the main IMPUTE v1 website.

Version 2 (top)

IMPUTE v2 (or "IMPUTE2", for short) is based on the same population genetic model as IMPUTE1, but IMPUTE2 embeds this model in a more flexible statistical framework. This framework allows IMPUTE2 to increase accuracy (by using more of the information in the data) and to handle a broader variety of imputation datasets.

One important kind of dataset to which IMPUTE2 can be applied is depicted above. In this example, we still want to impute the missing genotypes in a set of study individuals (those in the bottom panel), but we now have two different reference panels that can inform the imputation: a set of known haplotypes (top panel), as in Scenario A, and a set of unphased genotypes (middle panel) observed at a subset of the SNPs in the top panel. This kind of dataset, which is becoming increasingly common, is referred to as Scenario B by Howie et al. (2009).

IMPUTE2 is uniquely suited to handle Scenario B in a unified, integrated analysis framework. The Howie et al. (2009) paper focused on the setting of a case-control study where the controls form the middle panel and the cases form the bottom panel, but the program can also be used with other multi-tiered reference panels. Currently, it is most commonly used with 1,000 Genomes haplotypes as the first reference panel and HapMap 3 haplotypes as the second reference panel.

IMPUTE2 can also be applied in other imputation datasets that pose problems for IMPUTE1, including:

Datasets with a single, unphased reference panel (possibly with sporadically missing genotypes)
Reference panels (phased or unphased) containing a large number of chromosomes

Full details of IMPUTE2, including computational considerations and accuracy comparisons with other imputation programs, are provided in Howie et al. (2009). We are still making a number of extensions and improvements to the method, and we would appreciate any feedback that you might care to provide.

Go to the main IMPUTE v2 website.

References (top)

[1] J. Marchini, B. Howie, S. Myers, G. McVean and P. Donnelly (2007) A new multipoint method for genome-wide association studies via imputation of genotypes. Nature Genetics 39: 906-913 [Free Access PDF] [Supplementary Material] [News and Views Article]

[2] B. N. Howie, P. Donnelly and J. Marchini (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genetics 5(6): e1000529 [Open Access Article]