|
The options in this table control the format and naming conventions of output files printed by IMPUTE2. |
Flag | Default | Description |
|
|
Name of main output file. Follows the same format as the
|
|
|
Name of SNP-wise information file with one line per SNP and a single header line at the beginning.
This file always contains the following columns (header tags shown in parentheses):
1. SNP identifier from
2. rsID (rs_id)
3. base pair position (position)
4. expected frequency of allele coded '1' in the -o file (exp_freq_a1)
5. measure of the observed statistical information associated with the allele frequency estimate (info) [details]
6. average certainty of best-guess genotypes (certainty)
7. internal "type" assigned to SNP (type)

Depending on the command-line options invoked, there may also be columns labeled info_typeX, concord_typeX, and r2_typeX. IMPUTE2 assigns every SNP an internal "type" that reflects the combination of input datasets that include data for that SNP; here, X gives the type, which takes values in

For SNPs that have genotypes in the -g file, concord_typeX is the concordance between the input genotypes and the best-guess imputed genotypes, where the input genotypes at that SNP have been masked internally and then imputed as if the SNP were of type X; similarly, r2_typeX is the squared correlation between input and masked/imputed genotypes at a SNP. The info_typeX column is the same information metric used in column 5, but here it is applied to genotypes that have been imputed from pseudo-type X SNPs in the leave-one-out masking experiment.

These columns are useful for post-hoc quality control; we will soon explain how we use them in our section on Best Practices for Imputation. |
|
|
Name of log file that records a summary of the screen output. |
|
|
Name of file that records warnings generated by IMPUTE2. |
|
|
"Output SNPs": specifies the SNP types that will be printed to the output file (SNP labeling is discussed in the Overview). By default, all imputed and genotyped SNPs are included in the output, i.e.,
" |
|
Specifies that the main output file should be compressed by the gzip utility; this also applies to some non-standard output files that can become large. | |
|
3 | Specifies the number of decimal places to use for reporting genotype probabilities in the main output file. |
|
Suppresses printing of info_typeX, concord_typeX, and r2_typeX columns in the -i file. | |
|
Suppresses printing of per-sample quality control metrics file. The default is to print a file named
" |
|
|
IMPUTE2 always implicitly phases the study genotypes
( In addition to this "best-guess" haplotype file, the program also prints the certainty that each successive pair of heterozygous SNPs is correctly phased. These certainties occur in a file named " As illustrated by our example commands, it is possible to use the |
|
|
"Predict Genotyped SNPs": Tells the program to replace the input genotypes from the
|
|
|
Unlike
WARNING: This is an appealing option that will "fill in" sporadically missing genotypes in your input data. However, it is possible that this could cause subtle problems in downstream association testing. We therefore suggest that you use caution when applying this option. |
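To make the concord_typeX and r2_typeX columns described in the table more concrete, the following Python sketch computes a concordance and a squared correlation between input genotypes and best-guess imputed genotypes at a single SNP. It is an illustration only, not IMPUTE2's own code: the function names are hypothetical, genotypes are assumed to be coded as allele counts (0, 1, 2) with `None` for missing values, and IMPUTE2's exact calculation (for example, whether it correlates best-guess calls or expected dosages) may differ.

```python
import math


def _complete_pairs(observed, imputed):
    """Keep only samples where both genotypes are present (None = missing)."""
    return [(o, i) for o, i in zip(observed, imputed)
            if o is not None and i is not None]


def concordance(observed, imputed):
    """Fraction of samples whose imputed best-guess genotype equals the input genotype."""
    pairs = _complete_pairs(observed, imputed)
    if not pairs:
        return math.nan
    return sum(1 for o, i in pairs if o == i) / len(pairs)


def r_squared(observed, imputed):
    """Squared Pearson correlation between input and imputed allele counts."""
    pairs = _complete_pairs(observed, imputed)
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_o = sum(o for o, _ in pairs) / n
    mean_i = sum(i for _, i in pairs) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in pairs)
    var_o = sum((o - mean_o) ** 2 for o, _ in pairs)
    var_i = sum((i - mean_i) ** 2 for _, i in pairs)
    if var_o == 0 or var_i == 0:
        # Correlation is undefined when either vector is constant (monomorphic SNP).
        return math.nan
    return (cov * cov) / (var_o * var_i)


# Hypothetical genotypes for five samples at one masked SNP.
observed_genotypes = [0, 1, 2, 1, 0]
imputed_genotypes = [0, 1, 2, 2, 0]
print(concordance(observed_genotypes, imputed_genotypes))
print(r_squared(observed_genotypes, imputed_genotypes))
```

Concordance tends to look high at rare SNPs simply because most genotypes are homozygous for the common allele, which is one reason the squared correlation is often examined alongside it.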
Details about 'info' metric

IMPUTE2 reports an information metric in the fifth column of its

Our metric typically takes values between 0 and 1, where values near 1 indicate that a SNP has been imputed with high certainty. The metric can occasionally take negative values when the imputation is very uncertain, and we automatically assign a value of -1 when the metric is undefined (e.g., because it wasn't calculated).

Investigators often use the info metric to remove poorly imputed SNPs from their association testing results. There is no universal cutoff value for post-imputation SNP filtering; various groups have used cutoffs of 0.3 and 0.5, for example, but the right threshold for your analysis may differ. One way to assess different info thresholds is to see whether they produce sensible Q-Q plots, although we emphasize that Q-Q plots can look bad for many reasons besides your post-imputation filtering scheme.

We define our info metric and compare it against other metrics in a
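
As a rough illustration of post-imputation filtering on the info metric, the Python sketch below reads the lines of a SNP-wise information file, locates the `info` column by its header tag, and keeps SNPs at or above a chosen threshold. Several details are assumptions rather than documented behavior: the file is taken to be whitespace-delimited, the function name `filter_by_info` is hypothetical, the first column's header tag (`snp_id`) and all data rows are placeholders, and the default threshold of 0.3 is just one of the example cutoffs mentioned above, not a recommendation.

```python
def filter_by_info(lines, min_info=0.3):
    """Return rows (as dicts keyed by header tag) whose info value is >= min_info.

    SNPs with info == -1 are treated as undefined and always dropped.
    """
    rows = iter(lines)
    header = next(rows).split()
    info_col = header.index("info")  # find the column by tag, not by position
    kept = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        info = float(fields[info_col])
        if info == -1:
            continue  # metric undefined for this SNP
        if info >= min_info:
            kept.append(dict(zip(header, fields)))
    return kept


# Placeholder data in the column layout described for the information file.
sample_info_file = """snp_id rs_id position exp_freq_a1 info certainty type
--- rs1 1000 0.120 0.950 0.990 0
--- rs2 2000 0.050 0.210 0.800 0
--- rs3 3000 0.000 -1 1.000 0
--- rs4 4000 0.300 0.500 0.900 2
"""

for row in filter_by_info(sample_info_file.splitlines(), min_info=0.3):
    print(row["rs_id"], row["info"])
```

Running the same function at several thresholds and comparing the resulting Q-Q plots is one way to apply the assessment described above.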