IMPUTATION WITH ONE PHASED AND ONE UNPHASED REFERENCE PANEL, WITH ADDITIONAL OPTIONS
Sometimes it is useful to combine a phased reference panel with an unphased reference panel when imputing genotypes in a study. For example, Howie et al. (2009) considered a hybrid reference panel that included phased haplotypes from HapMap and unphased genotypes from population controls typed on multiple SNP chips (they referred to this configuration as "Scenario B"). By using the genetic information in both panels simultaneously, IMPUTE2 can achieve a better combination of accuracy and coverage than it would with either panel alone.
Here we perform the same basic analysis as in the basic example with one phased and one unphased reference panel, but we add several options that modify the behavior of IMPUTE2.
The following command shows how to run this kind of analysis with IMPUTE2, using the example data that come with the program download:
./impute2 \
 -m ./Example/example.chr22.map \
 -h ./Example/example.chr22.1kG.haps \
 -l ./Example/example.chr22.1kG.legend \
 -g_ref ./Example/example.chr22.reference.gens \
 -strand_g_ref ./Example/example.chr22.reference.strand \
 -exclude_snps_g_ref ./Example/example.chr22.reference.snp.exclusions \
 -g ./Example/example.chr22.study.gens \
 -strand_g ./Example/example.chr22.study.strand \
 -align_by_maf_g \
 -sample_g ./Example/example.study.samples \
 -exclude_samples_g ./Example/example.study.sample.exclusions \
 -int 20.4e6 20.5e6 \
 -Ne 20000 \
 -k 100 \
 -burnin 5 \
 -iter 20 \
 -pgs \
 -no_sample_qc_info \
 -o_gz \
The comments below focus on the specialized options used in the example above; for comments on the general imputation scenario, see the basic example with one phased and one unphased reference panel.
The -exclude_snps_g_ref option specifies a few SNPs to remove from the -g_ref file, using different types of SNP IDs. These might be SNPs that failed QC testing, for example.
The -align_by_maf_g option tells the program to use minor allele frequencies to align the allele coding of A/T and C/G SNPs between the -g file and the -l file. However, the -strand_g option takes precedence over -align_by_maf_g, and in this case all of the genotyped SNPs have explicit alignments in the strand file, so the -align_by_maf_g flag has no effect.
This run includes both a -sample_g file and an -exclude_samples_g file. The sample file tells IMPUTE2 which samples in the -g file are which, and the exclusions file tells it the IDs of samples that should be removed from the analysis. These might be individuals who showed systematic data quality problems on a genome-wide SNP chip, for example.
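To preview which individuals would remain after exclusion, something like the following sketch may help. It rests on assumptions not stated above: the sample file has two header lines followed by one row per individual, the exclusions file lists one ID per line, and those IDs match the second column of the sample file. The toy file names and IDs are made up for illustration; check the formats of your own files before relying on this.

```shell
workdir=$(mktemp -d)

# Toy sample file: two header lines, then one row per individual (assumed layout).
cat > "$workdir/toy.samples" <<'EOF'
ID_1 ID_2 missing
0 0 0
fam1 ind1 0
fam2 ind2 0
fam3 ind3 0
EOF

# Toy exclusion list: one sample ID per line (assumed layout).
printf '%s\n' ind2 > "$workdir/toy.exclusions"

# Read the exclusion list first, then print every sample ID that is not in it.
# Assumption: exclusion IDs are matched against column 2 of the sample file.
kept=$(awk 'NR == FNR { drop[$1] = 1; next }
            FNR > 2 && !($2 in drop) { print $2 }' \
       "$workdir/toy.exclusions" "$workdir/toy.samples")

echo "$kept"
```

This only inspects the files; IMPUTE2 itself performs the actual exclusion when the -exclude_samples_g option is supplied.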
Here we have increased -k from its default value of 80 to 100. This generally improves imputation accuracy, but it also increases IMPUTE2's running time. To offset the extra running time in this example, we have decreased -burnin from its default of 10 to 5 and -iter from its default of 30 to 20.
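To make the trade-off concrete, the arithmetic below compares the number of iterations that remain after burn-in under the default settings and under the settings in this example. It assumes that -iter counts the total number of MCMC iterations, burn-in included; the variable names are illustrative only.

```shell
# Assumption: -iter is the total iteration count, with burn-in included.
default_burnin=10
default_iter=30
example_burnin=5
example_iter=20

# Iterations that remain once the burn-in iterations are discarded.
default_main=$((default_iter - default_burnin))
example_main=$((example_iter - example_burnin))

echo "default settings: $default_main post-burn-in iterations"
echo "this example:     $example_main post-burn-in iterations"
```

Under that assumption, the example trims both the burn-in and the post-burn-in sampling relative to the defaults.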
The -pgs flag tells the program to "predict genotyped SNPs"; that is, to replace the original study genotypes with LD-based imputed genotypes in the output file.
The -no_sample_qc_info flag suppresses the output file that shows quality control metrics for each individual in the -g file.
The -o_gz flag specifies that the main output file should be compressed by the gzip algorithm; this is useful if you are running jobs that produce large output files.
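A compressed output file can be inspected without first unpacking it to disk. The sketch below uses standard gzip tools on a toy file; the file name and the single record in it are placeholders invented for illustration, not real IMPUTE2 output, so substitute the path written by your own run.

```shell
# Hypothetical stand-in for the compressed main output file.
out_gz="$(mktemp -d)/toy.impute2.gz"

# Toy record for illustration only (not real IMPUTE2 output).
printf '%s\n' '--- rs0000 20400000 A G 0.9 0.1 0' | gzip -c > "$out_gz"

# Stream the first line without writing an uncompressed copy to disk.
first_line=$(gzip -dc "$out_gz" | head -n 1)
echo "$first_line"

# Count the records the same way.
n_records=$(gzip -dc "$out_gz" | wc -l | tr -d ' ')
echo "records: $n_records"
```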
How to use example commands
All of the data files in the example command above are included in the Example/ directory that comes with the IMPUTE2 software download. You should run the command from the main download directory, which is the one that contains the impute2 executable. For example, if you just downloaded a software package named impute_v2.X.Y_i386.tgz and unpacked it according to the download instructions, you can reach the appropriate directory by typing "cd impute_v2.X.Y_i386/" on the command line.
Once you have found the right directory, you should be able to run the example command by entering it into a Unix-style terminal window. Depending on the settings of your computer, this may be as simple as highlighting the command text in your web browser, using the browser's Copy command, and then using the Paste command in your terminal window. (You may then need to hit 'enter' to start the run.)
Note that most lines in the example command end with the '\' character. This is not actually part of the command; it is just a shorthand notation that means "keep reading the next line as part of a single command." We use this notation to split the command over multiple lines so it is easier to read. This is a valid way to enter commands in a Unix-style terminal window, but it would be equivalent to put all of the arguments on a single line, separated by spaces.
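As a quick check of this equivalence, the sketch below passes the same arguments to a command in both styles. The helper function count_args is hypothetical and simply stands in for any program; it reports how many arguments it received.

```shell
# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

# Arguments split across lines with trailing backslashes...
multi=$(count_args -Ne 20000 \
  -k 100 \
  -burnin 5)

# ...and the same arguments written on a single line.
single=$(count_args -Ne 20000 -k 100 -burnin 5)

echo "multi-line: $multi arguments, single-line: $single arguments"
```

Both forms deliver the same six arguments. Note that the backslash must be the very last character on its line; a trailing space after it would break the continuation.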
You do not have to run IMPUTE2 exactly as in the example. Some of the arguments shown here are optional, and there are many other options that could be added to modify the behavior of the program. For a full list of available options, see the IMPUTE2 options documentation.