Among human chromosomes, chromosome X is unique in that it is present in two copies in females but is hemizygous (one copy) in males. To handle chromosome X data, IMPUTE2 requires that you use the -chrX flag and make some small changes to the input file formats.
Genotypes file (-g): As in a standard -g file, each study individual should have three columns (genotype probabilities) per SNP. For females, these have the standard interpretation that columns 1, 2, and 3 represent P(G=0), P(G=1), and P(G=2), respectively, where G=1 is the heterozygous state. Males have only two possible genotypes on chromosome X, and we encode these in columns 1 and 3; column 2, which corresponds to P(G=1), should always be zero in this setting, and non-zero values in this column will automatically be truncated to zero for males when the -chrX flag is active.
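The male encoding can be checked before running IMPUTE2. The following Python sketch is not part of IMPUTE2; it only mimics the truncation described above by setting the middle probability to zero for males on a single genotype-file line. The function name truncate_male_het is hypothetical, the sketch assumes five leading SNP-description columns before the probabilities, and it does not renormalize the remaining values.

```python
def truncate_male_het(gen_line, sexes, n_lead=5):
    """Set P(G=1) to zero for males in one genotype-file line.

    sexes:  sex codes in sample order, 1 = male, 2 = female.
    n_lead: number of leading SNP-description columns (assumed to be 5).
    """
    fields = gen_line.split()
    lead, probs = fields[:n_lead], fields[n_lead:]
    if len(probs) != 3 * len(sexes):
        raise ValueError("expected three probability columns per individual")
    for i, sex in enumerate(sexes):
        if sex == 1:
            # Column 2 of this individual's triplet is P(G=1).
            probs[3 * i + 1] = "0"
    return " ".join(lead + probs)


# One SNP typed in a female followed by two males.
line = "snp1 rs1 1000 A G 1 0 0 0.9 0.1 0 0 0 1"
print(truncate_male_het(line, [2, 1, 1]))
```

Only the second individual's middle entry changes here, because the female's triplet is left untouched and the third individual's middle entry is already zero.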
Sample file (-sample_g): In order for the input genotype convention explained above to work, IMPUTE2 needs to know which study individuals are males and which are females. This is accomplished by adding an extra column named 'sex' to the -sample_g file, which is required when using the -chrX flag. This column should be coded as type 'D' (discrete covariate), where males are indicated by '1's and females are indicated by '2's. Here is an example snippet where the first individual is female and the second and third individuals are male:
ID_1 ID_2 missing sex
0 0 0 D
INDIV1 INDIV1 0.0 2
INDIV2 INDIV2 0.0 1
INDIV3 INDIV3 0.0 1
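As a rough illustration of how this column can be read, the Python sketch below parses a sample file of the form shown above and returns the sex of each individual. The function name read_sex_codes is hypothetical, and the checks reflect only the conventions stated here (a column named 'sex', typed 'D', coded 1 for males and 2 for females).

```python
def read_sex_codes(sample_text):
    """Return {ID_1: 'male' | 'female'} from the text of a sample file."""
    rows = [ln.split() for ln in sample_text.strip().splitlines()]
    header, types, data = rows[0], rows[1], rows[2:]
    if "sex" not in header:
        raise ValueError("sample file has no 'sex' column")
    col = header.index("sex")
    if types[col] != "D":
        raise ValueError("'sex' column should be of type 'D'")
    sexes = {}
    for row in data:
        code = row[col]
        if code not in ("1", "2"):
            raise ValueError(f"unexpected sex code {code!r} for {row[0]}")
        sexes[row[0]] = "male" if code == "1" else "female"
    return sexes


example = """ID_1 ID_2 missing sex
0 0 0 D
INDIV1 INDIV1 0.0 2
INDIV2 INDIV2 0.0 1
INDIV3 INDIV3 0.0 1
"""
print(read_sex_codes(example))
```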
Reference haplotypes file (-h): It does not usually matter which reference individuals are male or female when their genotypes have already been phased. However, it may sometimes be convenient to create a -h file with two columns per individual, so IMPUTE2 allows the presence of dummy columns made of '-' characters to represent the non-existent second haplotypes of males on chromosome X. For example, here is a small haplotypes file with 5 SNPs (one per row) typed in a female (columns 1-2) and two males (columns 3-4 and 5-6):
0 1 1 - 0 -
0 0 1 - 1 -
1 0 0 - 0 -
1 1 0 - 1 -
0 0 1 - 0 -
The dummy columns are optional -- the following would be an equally valid format for the same file:
0 1 1 0
0 0 1 1
1 0 0 1
1 1 0 1
0 0 1 0
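Converting between the two layouts described above is mechanical. The Python sketch below shows one possible way to do it for a single line of a haplotypes file; both function names are hypothetical, and the sex codes (1 = male, 2 = female) are borrowed from the sample file convention purely for illustration, since the -h file itself carries no sex information.

```python
def strip_dummy_columns(hap_line):
    """Drop '-' placeholder columns from one haplotype-file line."""
    return " ".join(a for a in hap_line.split() if a != "-")


def add_dummy_columns(hap_line, sexes):
    """Pad a compact line so that every individual has two columns.

    sexes: sex codes in column order, 1 = male, 2 = female.
    """
    alleles = iter(hap_line.split())
    out = []
    for sex in sexes:
        first = next(alleles)
        # Males contribute one real haplotype; the second is a placeholder.
        second = "-" if sex == 1 else next(alleles)
        out += [first, second]
    return " ".join(out)


# First SNP of the example: one female followed by two males.
print(strip_dummy_columns("0 1 1 - 0 -"))
print(add_dummy_columns("0 1 1 0", [2, 1, 1]))
```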
Output files (-o): The main output file will follow the same convention as the genotypes file described above: each individual has three entries per SNP, but the middle entry is set to zero for males. When IMPUTE2 produces haplotype output files for chromosome X, both males and females will have two columns per individual, although the second column for each male will be filled with dummy values of '-'.
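Downstream scripts may want to group the haplotype output by individual. The Python sketch below is one way to do that, assuming it is handed only the allele columns of a line (any leading SNP-description columns are assumed to have been removed already). The function name split_haplotypes is hypothetical; the '-' placeholder for males is returned as None.

```python
def split_haplotypes(allele_columns, n_individuals):
    """Group allele columns into one (first, second) pair per individual."""
    alleles = allele_columns.split()
    if len(alleles) != 2 * n_individuals:
        raise ValueError("expected two columns per individual")
    pairs = []
    for i in range(n_individuals):
        first, second = alleles[2 * i], alleles[2 * i + 1]
        # A '-' in the second column marks a male's missing haplotype.
        pairs.append((first, None if second == "-" else second))
    return pairs


print(split_haplotypes("0 1 1 - 0 -", 3))
```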