Strand alignment options

In any imputation analysis, it is absolutely essential that all panels have their allele codings aligned to a fixed reference (usually the human genome reference sequence). The options in this table are meant to help align the allele codings in your input data files, but you should not assume that the program will do all the work for you. If you do not know exactly how your data were processed or what these options are doing, you should try to locate the original strand information or contact us for assistance.

NOTE: IMPUTE2 will automatically align the strand between panels whenever it can do so unambiguously, e.g., by flipping A/C in Panel 2 to match G/T in the reference. The options below are mainly needed for variants where this is not possible, such as A/T and C/G SNPs: because their two alleles are complementary bases, the allele labels read the same on both strands and cannot be aligned by label alone.
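To illustrate why some SNPs can be aligned by label and others cannot, here is a minimal Python sketch of label-based strand matching. It is not IMPUTE2's own code; the function names (complement, is_ambiguous, match_strand) are hypothetical, and the sketch assumes each SNP is described by two single-base allele labels.

```python
# Minimal sketch of label-based strand matching (not IMPUTE2's own code).
# Assumes each SNP is a pair of single-base allele labels, e.g. ("A", "C").

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Return the allele labels as they would read on the opposite strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def is_ambiguous(alleles):
    """True for A/T and C/G SNPs, whose labels are unchanged by a strand flip."""
    return set(alleles) == set(complement(alleles))


def match_strand(study, reference):
    """Compare study alleles with reference alleles, ignoring allele order.

    Returns "same", "flip", "ambiguous" or "mismatch".
    """
    same = set(study) == set(reference)
    if same and is_ambiguous(study):
        return "ambiguous"  # labels alone cannot tell the strand apart
    if same:
        return "same"
    if set(complement(study)) == set(reference):
        return "flip"  # complementing the study alleles matches the reference
    return "mismatch"
```

For example, match_strand(("A", "C"), ("G", "T")) returns "flip", while an A/T SNP compared with A/T in the reference returns "ambiguous", because complementing A/T gives T/A, the same pair of labels.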

NOTE: We currently assume that all phased reference files have already been aligned to the '+' strand of the human genome reference sequence, which is true of the files that we distribute; hence, the options here pertain only to study genotype files (like the -g and -known_haps_g files) and unphased reference files (i.e., a -g_ref file).

Flag Default Description
-strand_g <file>    none
File showing the strand orientation of the SNP allele codings in the -g file, relative to a fixed reference point. Each SNP occupies one line, and the file should have two columns separated by a single space: (i) the base pair position of the SNP and (ii) the strand orientation ('+' or '-') of the alleles in the genotype file.

The order of the SNPs in this file does not matter (in contrast to the -g file, which must be sorted by SNP position), and it is fine if some SNPs in the strand file are absent from the genotype file (e.g., because of filtering). We provide example strand files in the Example/ directory that comes with the software download.
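The strand file format described above is simple enough to parse and apply in a few lines. The Python sketch below assumes the study SNPs are held in a dictionary keyed by base pair position; the helper names are hypothetical, and complementing the alleles of '-' SNPs is our illustration of what the strand information implies, not a description of IMPUTE2's internals.

```python
# Sketch of reading a strand file and applying it (hypothetical helpers).
# Assumes study SNPs are a dict: position -> (allele_a, allele_b).

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parse_strand_file(lines):
    """Map base pair position -> '+' or '-'; line order is irrelevant."""
    strand = {}
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # documented separator is a single space
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2 or fields[1] not in ("+", "-"):
            raise ValueError(f"line {line_no}: expected '<position> <+|->'")
        strand[int(fields[0])] = fields[1]
    return strand


def orient_to_plus(snps, strand):
    """Complement the alleles of SNPs whose strand entry is '-'.

    SNPs without an entry are left unchanged; strand-file positions that
    are missing from `snps` (e.g. filtered out) are simply never used.
    """
    oriented = {}
    for pos, (a, b) in snps.items():
        if strand.get(pos) == "-":
            oriented[pos] = (COMPLEMENT[a], COMPLEMENT[b])
        else:
            oriented[pos] = (a, b)
    return oriented
```

Because the strand information is stored in a dictionary keyed by position, the line order of the file is irrelevant, and positions that were filtered out of the genotype file are never looked up, matching the two properties described above.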
-strand_g_ref <file>    none
Same as -strand_g, but applies to the -g_ref file.
-align_by_maf_g
Activates the program's internal strand alignment procedure for the -g file (a.k.a. Panel 2; for details about the panel nomenclature used here, see the overview). The strand is aligned to the alleles in reference Panel 0, if present, and otherwise to reference Panel 1. This option pertains only to A/T and C/G SNPs, which it aligns so that Panel 2 and the alignment reference (Panel 0 or 1) have the same minor allele.

NOTE: This flag can be used in conjunction with the -strand_g option. In that case, the information from the strand file takes precedence, i.e., the program will not try to align the strand of SNPs that already have explicit strand information. This is useful if you have strand information for some SNPs but not others.

NOTE: You should take care when using this option. In particular, it can get the alignment wrong at A/T and C/G SNPs with minor allele frequencies near 50%, which can hurt the inference by distorting the local haplotype patterns. The best way to get the correct alignment at these kinds of SNPs is to track down the original assay and determine which strand was measured.

This flag replaces -fix_strand_g as of IMPUTE v2.2.
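The minor-allele rule, the precedence of explicit strand information, and the caution about MAFs near 50% can be seen together in the following Python sketch. It is a hypothetical illustration, not IMPUTE2's implementation: it assumes each panel is summarized as a dictionary mapping position to (alleles, frequency of the second allele), and the 0.4 warning threshold is an arbitrary choice made here, not a value used by the program.

```python
# Hypothetical sketch of MAF-based alignment for A/T and C/G SNPs.
# Each panel is assumed to be a dict: position -> (alleles, freq of alleles[1]).
# This illustrates the rule described above; it is not IMPUTE2's code.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def minor_allele(alleles, freq_second):
    """Label of the minor allele, given the frequency of alleles[1].

    At exactly 0.5 there is no minor allele; this sketch arbitrarily
    returns alleles[0], which is one reason such SNPs are risky.
    """
    return alleles[1] if freq_second < 0.5 else alleles[0]


def align_by_maf(study, reference, explicit=frozenset(), warn_maf=0.4):
    """Return (aligned alleles by position, positions flagged for review).

    SNPs listed in `explicit` (e.g. covered by a strand file) are left
    untouched, mirroring the precedence rule for -strand_g.
    """
    aligned, flagged = {}, []
    for pos, (alleles, freq) in study.items():
        ambiguous = set(alleles) == {COMPLEMENT[a] for a in alleles}
        ref = reference.get(pos)
        if pos in explicit or ref is None or not ambiguous or set(ref[0]) != set(alleles):
            aligned[pos] = alleles  # not handled by the MAF rule
            continue
        ref_alleles, ref_freq = ref
        if minor_allele(alleles, freq) != minor_allele(ref_alleles, ref_freq):
            # Complementing swaps the labels so the minor alleles now agree.
            alleles = tuple(COMPLEMENT[a] for a in alleles)
        aligned[pos] = alleles
        # Flag SNPs whose MAF is close to 50% in either panel.
        if min(freq, 1 - freq) >= warn_maf or min(ref_freq, 1 - ref_freq) >= warn_maf:
            flagged.append(pos)
    return aligned, flagged
```

In a toy run with an A/T SNP whose second-allele frequency is 0.45 in the study panel and 0.55 in the reference, the sketch flips the study alleles and also flags the SNP: with frequencies this close to 50%, sampling noise alone could make the minor alleles disagree, which is exactly the situation the note above warns about.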
-align_by_maf_g_ref
Similar to -align_by_maf_g, but applies to the -g_ref file (Panel 1). In this case the strand is aligned to the alleles in Panel 0, so the flag does not work if Panel 0 was not provided (i.e., if you did not supply -l and -h files).

NOTE: Just as -align_by_maf_g can be used in conjunction with -strand_g, this flag can be used in conjunction with the -strand_g_ref option. As before, the strand file takes precedence over aligning the strand by MAF.

NOTE: As with -align_by_maf_g, you should be careful about using this option to align A/T and C/G SNPs with minor allele frequencies near 50%.

This flag replaces -fix_strand_g_ref as of IMPUTE v2.2.
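The rules for choosing the alignment reference under the two MAF-based flags can be summarized in a short Python sketch. The function name and its boolean arguments are hypothetical; the rules themselves follow the descriptions above: Panel 2 (-g) is aligned to Panel 0 if present and otherwise to Panel 1, while Panel 1 (-g_ref) can only be aligned to Panel 0.

```python
# Hypothetical summary of which panel each MAF-based flag aligns against.
# Panel numbering follows the text: 0 = -h/-l files, 1 = -g_ref, 2 = -g.


def alignment_reference(target, have_panel0, have_panel1):
    """Return the panel number used as the alignment reference.

    target is "g" for -align_by_maf_g or "g_ref" for -align_by_maf_g_ref.
    """
    if target == "g":
        if have_panel0:
            return 0  # Panel 0 is preferred when it was supplied
        if have_panel1:
            return 1
        raise ValueError("-align_by_maf_g needs Panel 0 or Panel 1")
    if target == "g_ref":
        if have_panel0:
            return 0
        raise ValueError("-align_by_maf_g_ref needs Panel 0 (-h and -l files)")
    raise ValueError(f"unknown target: {target!r}")
```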