1000 Genomes Project low-coverage pilot indel calls

Indel genotypes were imputed from the genotype likelihood files produced by
Kees Albers using the Dindel algorithm.

Imputation was carried out using IMPUTE v2.1.2.2, which can take genotype
likelihoods as input. The indels were phased into a scaffold of SNP
haplotypes. The scaffold consisted of the low-coverage pilot haplotypes,
thinned down to only those SNPs in the HapMap3 project for computational
convenience. Each of the three cohorts (CEU, YRI, JPTCHB) was analyzed
separately. Both marginal genotype probabilities and best-guess haplotypes
were estimated.
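
As a rough illustration of how a genotype count relates to a pair of
best-guess haplotypes, the sketch below sums the non-reference alleles carried
on the two haplotypes of one individual. The 0/1 allele coding and the
function name are assumptions made for this example; they are not taken from
the IMPUTE output format.

```python
def genotype_count(hap1_allele, hap2_allele):
    """Return the number of non-reference alleles (0, 1 or 2).

    Assumes each haplotype allele is coded 0 (reference) or
    1 (non-reference); this coding is an assumption for illustration.
    """
    for allele in (hap1_allele, hap2_allele):
        if allele not in (0, 1):
            raise ValueError("haplotype alleles must be 0 or 1")
    return hap1_allele + hap2_allele


# An individual carrying the indel on one haplotype only is heterozygous.
het = genotype_count(0, 1)
# An individual carrying the indel on both haplotypes is homozygous.
hom = genotype_count(1, 1)
```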

The output files are named

COHORT.chr*.indels.txt

These files contain the best-guess genotypes derived from the best-guess
haplotypes. Each line of a file contains the genotype calls for one indel.
Column 1 contains the indel type relative to the reference genome (+C, -TG,
+CGTGA, etc.). Column 2 contains the start position of the indel. Subsequent
columns contain the genotype counts (i.e. 0, 1 or 2) of the non-reference
allele, one column per individual.
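
A minimal sketch of reading one line of an indels file is given below. It
assumes the columns are separated by whitespace and that a leading "+" marks
an insertion and a leading "-" a deletion; neither detail is stated explicitly
above. The function name and the example line are made up for illustration and
are not real data.

```python
def parse_indel_line(line):
    """Parse one line of a COHORT.chr*.indels.txt file.

    Assumes whitespace-separated columns: indel type, start position,
    then one genotype count (0, 1 or 2) per individual.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected indel type, position and genotypes")
    indel = fields[0]
    if indel[0] not in "+-":
        raise ValueError("indel type should start with '+' or '-'")
    genotypes = [int(g) for g in fields[2:]]
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotype counts must be 0, 1 or 2")
    return {
        # Assumed meaning of the sign: '+' insertion, '-' deletion.
        "kind": "insertion" if indel[0] == "+" else "deletion",
        "sequence": indel[1:],
        "position": int(fields[1]),
        "genotypes": genotypes,
    }


# Made-up example line (not taken from the real files).
record = parse_indel_line("-TG 12345 0 1 2 0")
```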

There are also sample files 

COHORT.samples

that contain the sample names of the individuals with indel genotype calls.
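
The sketch below shows one way to attach sample names to the genotype columns
of an indel line. It assumes that COHORT.samples lists one sample name per
line and that the names appear in the same order as the genotype columns;
both are assumptions and should be checked against the files. The function
names and the sample names in the example are hypothetical.

```python
def read_sample_names(path):
    """Read sample names, assuming one name per non-empty line."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def genotypes_by_sample(sample_names, genotype_counts):
    """Pair each sample name with its genotype count.

    Assumes the sample order matches the genotype column order.
    """
    if len(sample_names) != len(genotype_counts):
        raise ValueError("number of samples and genotype columns differ")
    return dict(zip(sample_names, genotype_counts))


# Hypothetical sample names and genotype counts, for illustration only.
calls = genotypes_by_sample(["SAMPLE_A", "SAMPLE_B", "SAMPLE_C"], [0, 1, 2])
```

The length check is a cheap guard against pairing a samples file with an
indels file from a different cohort.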

Jonathan Marchini 04/12/2010