Example command: Imputation with one phased and one unphased reference panel


Example command

IMPUTATION WITH ONE PHASED AND ONE UNPHASED REFERENCE PANEL

Sometimes it is useful to combine a phased reference panel with an unphased reference panel when imputing genotypes in a study. For example, Howie et al. (2009) considered a hybrid reference panel that included phased haplotypes from HapMap and unphased genotypes from population controls typed on multiple SNP chips (they referred to this configuration as "Scenario B"). By using the genetic information in both panels simultaneously, IMPUTE2 can achieve a better combination of accuracy and coverage than it would with either panel alone.

The following command shows how to run this kind of analysis with IMPUTE2, using the example data that come with the program download:

./impute2 \
 -m ./Example/example.chr22.map \
 -h ./Example/example.chr22.1kG.haps \
 -l ./Example/example.chr22.1kG.legend \
 -g_ref ./Example/example.chr22.reference.gens \
 -strand_g_ref ./Example/example.chr22.reference.strand \
 -g ./Example/example.chr22.study.gens \
 -strand_g ./Example/example.chr22.study.strand \
 -int 20.4e6 20.5e6 \
 -Ne 20000 \
 -o ./Example/example.chr22.one.phased.one.unphased.impute2

Comments

  • This is a somewhat complicated scenario, and some restrictions are necessary to make sure the statistical machinery will produce good results. Ideally, the study data (-g file) should contain a subset of the SNPs in the unphased reference panel (-g_ref file), which should in turn contain a subset of the SNPs in the phased reference panel (-h and -l files). If your dataset deviates substantially from these conditions, you may obtain sub-optimal imputation accuracy. Please feel free to contact us if you want advice on whether this scheme will work with your dataset.

  • Here we have used the -strand_g and -strand_g_ref options to provide strand files to the program. These files tell IMPUTE2 how to align the allele coding of the study genotypes (-g file) and the unphased reference genotypes (-g_ref file) with the coding of the phased reference haplotypes (-h and -l files; assumed to be aligned to the '+' strand of the human genome reference sequence). You must always align the allele codings across your input datasets, either before running IMPUTE2 or during a run with the options described here.

  • Additional options must be invoked if you want to include the -g_ref panel in your association tests (e.g., as part of your control set). This process requires a fair amount of imputation expertise, and we prefer to advise people about it on an individual basis. If you are interested in using this approach, please contact us.

  • This example tells the program to produce results for a 100 kb region (positions 20,400,000-20,500,000) on a single chromosome. IMPUTE2 assumes that each input file contains only one chromosome and that all input files in a single run come from the same chromosome. Applying the program to a much larger region, such as a whole chromosome or the whole genome, requires running many such jobs with different values of the -int parameter, usually in parallel on a computing cluster. For more details about how to do this, see here.
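
As a rough illustration of the last comment, the sketch below prints one pair of -int values per chunk of a larger region. The function name print_intervals, the region boundaries, and the 100 kb chunk size are all assumptions chosen for illustration; they are not recommendations from this page. Each chunk starts one position after the previous one ends, so no position is covered twice.

```shell
#!/bin/sh
# Sketch only: split a region into consecutive, non-overlapping chunks and
# print the -int arguments for each chunk.
# print_intervals is a hypothetical helper, not part of IMPUTE2.
print_intervals() {
    start=$1   # first position of the region
    end=$2     # last position of the region
    size=$3    # chunk length in base pairs (illustrative value)
    while [ "$start" -le "$end" ]; do
        stop=$((start + size - 1))
        # Truncate the final chunk at the end of the region.
        if [ "$stop" -gt "$end" ]; then
            stop=$end
        fi
        echo "-int $start $stop"
        start=$((stop + 1))
    done
}

# Example: a 300 kb region split into three 100 kb chunks.
print_intervals 20000001 20300000 100000
```

Each printed pair would take the place of the -int values in the example command, and each job would also need its own -o file name so that the chunks do not overwrite one another.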


How to use example commands

All of the data files in the example command above are included in the Example/ directory that comes with the IMPUTE2 software download. You should run the command from the main download directory, which is the one that contains the impute2 executable. For example, if you just downloaded a software package named impute_v2.X.Y_i386.tgz and unpacked it according to the directions here, you can reach the appropriate directory by typing "cd impute_v2.X.Y_i386/" on the command line.
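
One quick way to confirm that you are in the right directory is to check for the two things the example command relies on: an executable named impute2 and an Example/ directory. The helper below, check_impute2_dir, is a hypothetical name introduced for this sketch and is not part of the IMPUTE2 download.

```shell
#!/bin/sh
# Sketch only: report whether a directory looks like the main IMPUTE2
# download directory (contains an executable impute2 and an Example/ folder).
# check_impute2_dir is a hypothetical helper, not part of IMPUTE2.
check_impute2_dir() {
    dir=${1:-.}   # directory to check; defaults to the current one
    if [ -x "$dir/impute2" ] && [ -d "$dir/Example" ]; then
        echo "ready"
    else
        echo "not ready"
        return 1
    fi
}

# Check the current directory; "|| true" keeps the script going either way.
check_impute2_dir . || true
```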

Once you are in the right directory, you should be able to run the example command by entering it into a Unix-style terminal window. Depending on the settings of your computer, this may be as simple as highlighting the command text in your web browser, using the browser's Copy command, and then using the Paste command in your terminal window. (You may then need to press Enter to start the run.)

Note that every line of the example command except the last ends with the '\' character. The backslash is not an argument to IMPUTE2; it is the shell's line-continuation character, which means "keep reading the next line as part of a single command." We use it to split the command over multiple lines so that it is easier to read. This is a valid way to enter commands in a Unix-style terminal window, but it would be equivalent to put all of the arguments on a single line, separated by spaces.
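
The equivalence can be checked without running IMPUTE2 at all. In the sketch below, show_args is a hypothetical stand-in for the program that simply prints each argument it receives in brackets; the multi-line and single-line forms produce the same argument list.

```shell
#!/bin/sh
# Sketch only: show_args is a hypothetical stand-in for ./impute2 that
# prints every argument it receives, one pair of brackets per argument.
show_args() {
    printf '[%s]' "$@"
    printf '\n'
}

# The same arguments written over two lines with a trailing backslash...
multi=$(show_args -int 20.4e6 20.5e6 \
 -Ne 20000)

# ...and written on a single line.
single=$(show_args -int 20.4e6 20.5e6 -Ne 20000)

echo "$multi"
echo "$single"
```

The backslash must be the very last character on its line; if a space or any other character follows it, the shell no longer treats it as a line continuation.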

You do not have to run IMPUTE2 exactly as in the example. Some of the arguments shown here are optional, and there are many other options that could be added to modify the behavior of the program. For a full list of available options, see here.