|
This table explains the formatting requirements for input data files that can be supplied to IMPUTE2. Some of these files allow more than one ID per SNP, but the program identifies SNPs internally by their base pair positions (which means that duplicate SNPs at a single position can cause problems). In all of these files, it is important that SNPs appear in base pair position order, from lowest to highest. It is also crucial that all SNP positions come from the same genome assembly (e.g., NCBI Build 36) so the program can combine information across input files. |
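As a rough illustration of the ordering requirement above, the Python sketch below scans a list of base pair positions and reports duplicated or out-of-order entries. The function name `check_positions` is hypothetical and not part of IMPUTE2, and the sketch assumes the positions have already been read from an input file as integers.

```python
def check_positions(positions):
    """Return a list of problems found in a sequence of base pair positions.

    Hypothetical helper (not part of IMPUTE2). Rows are numbered from 1.
    """
    problems = []
    for i in range(1, len(positions)):
        prev, cur = positions[i - 1], positions[i]
        if cur == prev:
            # Two SNPs at the same position: IMPUTE2 identifies SNPs by position.
            problems.append(f"duplicate position {cur} at rows {i} and {i + 1}")
        elif cur < prev:
            # Positions must run from lowest to highest.
            problems.append(f"position {cur} at row {i + 1} is out of order (follows {prev})")
    return problems
```

An empty result means the positions are strictly increasing; any other result lists the rows that would need to be fixed before running the program.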
Flag | Default | Description |
-g <file>
REQUIRED unless |
none |
File containing genotypes for a study cohort that you want to impute or phase. The format of this file is described on our file format webpage and is the same as the output format from our genotype calling program CHIAMO.
If you do not supply a file of unphased genotypes via this argument, you must supply a file of phased study haplotypes via the |
-m <file>
REQUIRED |
none |
Fine-scale recombination map for the region to be analyzed. This file should have three columns: physical position (in base pairs), recombination rate between the current position and the next position in the map (in cM/Mb), and genetic map position (in cM). The file should also have a header line with an unbroken character string for each column (e.g., "position COMBINED_rate(cM/Mb) Genetic_Map(cM)").
All of our |
-h <file 1> <file 2> | none |
File of known haplotypes, with one row per SNP and one column per haplotype. All alleles must be coded as 0 or 1, and each
In IMPUTE2, it is possible to specify two |
-l <file 1> <file 2> | none |
Legend file(s) with information about the SNPs in the -h file(s). Each file should have four columns: rsID, physical position (in base pairs), allele 0, and allele 1. The last two columns specify the alleles underlying the 0/1 coding in the corresponding -h file; these alleles can take values in {A,C,G,T}. Each legend file should also have a header line with an unbroken character string for each column (e.g., "rsID position a0 a1"). We provide legend files for data from the HapMap Project and the 1000 Genomes Project in our
When using two -h files with IMPUTE2, you must supply the corresponding legend files in the same order (i.e., the file with more SNPs comes first). |
-g_ref <file> | none | File containing unphased genotypes to use as a reference panel for imputation. This file should follow the same format as the -g file. A -g_ref file can be used as the lone reference panel for imputation, or it can be combined with a single -h file to create a two-tiered reference panel (in the latter case, the -g_ref file should contain roughly a subset of the SNPs in the -h file). |
-known_haps_g <file> | none |
File containing known haplotypes for the study cohort. The format is the same as the output format from IMPUTE2's
If your study dataset is fully phased, you can replace the The |
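The column layouts described in the table for the -m, -h, and -l files can be checked with a short script before running the program. The following Python sketch is one way to do that, not an official tool: all function names are hypothetical, and it assumes that columns are separated by whitespace and that the -h file has no header line (the table does not state either point explicitly).

```python
VALID_ALLELES = {"A", "C", "G", "T"}


def parse_map(lines):
    """Parse -m text: a header line, then position, rate (cM/Mb), map position (cM)."""
    records = []
    for lineno, line in enumerate(lines[1:], start=2):  # line 1 is the header
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"map line {lineno}: expected 3 columns, found {len(fields)}")
        records.append((int(fields[0]), float(fields[1]), float(fields[2])))
    return records


def parse_legend(lines):
    """Parse -l text: a header line, then rsID, position, allele 0, allele 1."""
    records = []
    for lineno, line in enumerate(lines[1:], start=2):  # line 1 is the header
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"legend line {lineno}: expected 4 columns, found {len(fields)}")
        rsid, position, a0, a1 = fields
        if a0 not in VALID_ALLELES or a1 not in VALID_ALLELES:
            raise ValueError(f"legend line {lineno}: alleles must be one of A, C, G, T")
        records.append((rsid, int(position), a0, a1))
    return records


def parse_haplotypes(lines):
    """Parse -h text: one row per SNP, one 0/1 column per haplotype (assumed: no header)."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        alleles = line.split()
        if any(a not in ("0", "1") for a in alleles):
            raise ValueError(f"haplotype line {lineno}: alleles must be coded as 0 or 1")
        if rows and len(alleles) != len(rows[0]):
            raise ValueError(f"haplotype line {lineno}: inconsistent number of haplotypes")
        rows.append([int(a) for a in alleles])
    return rows


def legend_matches_haplotypes(legend, haplotypes):
    """A legend should describe exactly the SNPs (rows) in its -h file."""
    return len(legend) == len(haplotypes)
```

Each parser takes a list of text lines (for example, the result of reading a file and calling `splitlines()`), so the same checks can be applied to a paired -h and -l file before they are passed to IMPUTE2.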