IMPUTE v0.5

IMPUTE is a program for imputing unobserved genotypes in genome-wide case-control studies based on a set of known haplotypes (like the HapMap Phase II haplotypes [2]). The program is designed to work seamlessly with the output of both the genotype calling program CHIAMO [1] and HAPGEN, and to produce output that can be analyzed using the program SNPTEST [2]. This program was used to carry out genotype imputation as part of the analysis of the 7 genome-wide association studies analyzed by the Wellcome Trust Case-Control Consortium (WTCCC) [3].


Overview


Contributors

The following people have contributed to the development of the methodology and software for IMPUTE.

Jonathan Marchini, Bryan Howie

Download

Pre-compiled versions of the program and example files can be downloaded from the links below. We've supplied both static and dynamic versions of the Linux executables. If you intend to run IMPUTE on a machine running an old kernel then you probably want to use the dynamic version. If you have any problems getting the program to work on your machine please contact us.

Platform                                      File
Linux (x86_64) Static Executable              impute_v0.5.0_x86_64_static.tgz
Linux (x86_64) Static Executable (SuSE 9.3)   impute_v0.5.0_SuSE9.3_x86_64_static.tgz
Linux (x86_64) Dynamic Executable             impute_v0.5.0_x86_64_dynamic.tgz
Linux (i386) Static Executable                impute_v0.5.0_i386_static.tgz
Linux (i386) Dynamic Executable               impute_v0.5.0_i386_dynamic.tgz
Mac OS X 10.4 Tiger (Intel)                   impute_v0.5.0_MacOSX_10.4_Intel.tgz
Mac OS X 10.5.1 Leopard (Intel)               impute_v0.5.0_MacOSX_10.5_Intel.tgz
Mac OS X (PowerPC)                            impute_v0.5.0_MacOSX_PowerPC.tgz
Solaris 5.8 (Sun SPARC)                       impute_v0.5.0_Solaris5.8_SPARC.tgz
Solaris 5.10 (AMD Opteron)                    impute_v0.5.0_Solaris5.10_Opteron.tgz
SLES 10 (Intel Itanium2)                      impute_v0.5.0_Itanium2_SLES10.tgz
Windows MS-DOS (Intel)                        impute_v0.5.0_Windows_Intel.tgz

Please fill out the registration form to receive emails about updates to this software.

To unpack the files use a command like

tar zxvf impute_vX.X.X_i386.tgz

This will create an executable called impute and a directory /example that contains the example files.

Running IMPUTE

IMPUTE is a command line program. To illustrate its use we have included an example dataset in the directory /example.

If you are a new user we suggest you spend some time working with the example files to get used to the input and output file formats, the command line options and flags and the effect they have on the results.

To run the program on the example file use

./impute -h example/haplo.txt -l example/legend.txt -g example/geno.txt -m example/map.txt -s example/strand.txt -Ne 11400 -int 62000000 63000000

This command runs IMPUTE on the example files and specifies that imputation is carried out from position 62,000,000bp to 63,000,000bp i.e. 62Mb to 63Mb.

This will produce the following screen output. We have annotated this output with comments, each introduced by an arrow (<---).

bash$ ./impute -h example/haplo.txt -l example/legend.txt -g example/geno.txt -m example/map.txt -s example/strand.txt -Ne 11400 -int 62000000 63000000

IMPUTE v0.5.0
=============

Copyright 2006 Jonathan Marchini
Please see the LICENCE file included with this program for conditions of use.

haplotypes file : example/haplo.txt
    legend file : example/legend.txt
 genotypes file : example/geno.txt
       map file : example/map.txt
    strand file : example/strand.txt
    output file : ./out
      info file : ./info
              <----  list of input and output files

imputation interval : [62000000,63000000]      <---- specifies the region of imputation from -int option
reading genetic map...done
reading haplotypes
 # ind = 120
 # snps read in = 1129
reading genotypes
 # ind = 50
 # SNPs with genotypes read in = 250
reading strand file
 # SNPs in strand file = 250
 # SNPs in imputed region that have had strand assigned = 250

Summary :
122 SNPs in left-hand buffer region
223 SNPs in right-hand buffer region
662 type 1 SNPs will be in output file (type 1 = SNP in haplotype file only)
141 type 2 SNPs will be in output file (type 2 = SNP in haplotype file and genotype file)
27 type 3 SNPs will be in output file (type 3 = SNP in genotype file only)
830 SNPs will be in output file in total
1175 SNPs in total

-using strand file to orientate strand
 --flipped strand at 103 genotyped SNPs out of a total of 204   <---- details of strand alignment
-aligning allele labels of haplotypes and genotypes
-removing non-aligned genotyped SNPs
 --removing 0 genotyped SNPs out of a total of 204

setting weights...done
setting storage space...done
setting mutation matrices...done
setting switch rates...done

Estimated RAM required is 74.115Mb


      n_hap : 120
      n_gen : 50
       nind : 50
   interval : [62000000, 63000000]
     buffer : 250     <--- this is the buffer region (in kb) used on each end of the region to avoid edge effects
         Ne : 11400   <--- this is the Ne value used in the model
call_thresh : 0.900   <--- this is the threshold used to call genotypes from the input genotype file
      theta : 0.18655
      model : 4

 predicting individual [50/50] [forward sweep]  [backward sweep]  [predict]

Breakdown of impution accuracy at SNPs with genotypes in the input file
  This assessment only uses genotypes in input file that are called above threshold of 0.90
  There are 7024 such genotypes in total
  For each of these genotypes the maximum imputed genotype calls are distributed as follows
  Interval  #Genotypes %Concordance         Interval  %Called %Concordance
  [0.0-0.1]          0          0.0         [ >= 0.0]   100.0         95.9
  [0.1-0.2]          0          0.0         [ >= 0.1]   100.0         95.9
  [0.2-0.3]          0          0.0         [ >= 0.2]   100.0         95.9
  [0.3-0.4]          0          0.0         [ >= 0.3]   100.0         95.9
  [0.4-0.5]         32         40.6         [ >= 0.4]   100.0         95.9
  [0.5-0.6]        175         51.4         [ >= 0.5]    99.5         96.1
  [0.6-0.7]        155         65.8         [ >= 0.6]    97.1         97.3
  [0.7-0.8]        163         77.3         [ >= 0.7]    94.8         98.0
  [0.8-0.9]        305         82.3         [ >= 0.8]    92.5         98.5
  [0.9-1.0]       6194         99.3         [ >= 0.9]    88.2         99.3
       <--- imputation accuracy: for genotypes in the input file this says that using a calling threshold of 0.5, 99.5% of imputed genotypes would be called and 96.1% of those are concordant/correct.
finito   <--- this says 'I am finished' in Italian
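
The two halves of the accuracy table are related: the right-hand columns can be recomputed from the per-interval counts on the left. The sketch below is not part of IMPUTE. It assumes that a calling threshold keeps every interval at or above it and that concordance is a count-weighted average, and it copies the numbers from the example output above. Because the per-interval percentages are already rounded, agreement with the printed table is approximate in principle.

```python
# Per-interval rows copied from the example screen output above:
# (interval, number of genotypes, % concordance within the interval)
ROWS = [
    ("[0.0-0.1]", 0, 0.0),
    ("[0.1-0.2]", 0, 0.0),
    ("[0.2-0.3]", 0, 0.0),
    ("[0.3-0.4]", 0, 0.0),
    ("[0.4-0.5]", 32, 40.6),
    ("[0.5-0.6]", 175, 51.4),
    ("[0.6-0.7]", 155, 65.8),
    ("[0.7-0.8]", 163, 77.3),
    ("[0.8-0.9]", 305, 82.3),
    ("[0.9-1.0]", 6194, 99.3),
]


def cumulative_accuracy(rows):
    """Return (threshold, %called, %concordance) for each calling threshold."""
    total = sum(n for _, n, _ in rows)
    results = []
    for i in range(len(rows)):
        kept = rows[i:]  # intervals at or above this threshold
        n_called = sum(n for _, n, _ in kept)
        n_correct = sum(n * pct / 100.0 for _, n, pct in kept)
        pct_called = 100.0 * n_called / total
        pct_concordant = 100.0 * n_correct / n_called if n_called else 0.0
        results.append((i / 10.0, round(pct_called, 1), round(pct_concordant, 1)))
    return results


for threshold, called, concordant in cumulative_accuracy(ROWS):
    print(f"[ >= {threshold:.1f}]  {called:6.1f}  {concordant:6.1f}")
```

Reading the result for a threshold of 0.9 gives the trade-off directly: fewer genotypes are called, but those that are called are more often correct.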


Here are a few more examples that illustrate how various options and flags can modify the behaviour of IMPUTE.
See below for a full description of the options, input file formats and output file formats.


Example 1 This command uses the internal strand alignment (using the -fix_strand flag) rather than using a strand file (using the -s option). If you run this you should see that the accuracy is very similar to that obtained when using the strand file in the example above.
./impute -h example/haplo.txt -l example/legend.txt -g example/geno.txt -m example/map.txt -fix_strand -Ne 11400 -int 62000000 63000000

Example 2 This command differs from the first example in two ways. The -os 2 option specifies that only Type 2 SNPs (SNPs that occur in both the genotype file and the haplotype file) should occur in the output file. The -pgs flag specifies that these SNPs should have their genotypes overwritten with predictions in the output file.
./impute -h example/haplo.txt -l example/legend.txt -g example/geno.txt -m example/map.txt -s example/strand.txt -Ne 11400 -int 62000000 63000000 -pgs -os 2

Example 3 The -exclude_snps option specifies a file that lists SNPs to be excluded from the genotype file. Imputation will be carried out ignoring the data at these SNPs and these SNPs should not appear in the output. The -impute_excluded flag modifies the behaviour of the -exclude_snps option. It specifies that the excluded SNPs should be imputed i.e. these SNPs will appear in the output file but their genotypes will be overwritten with predictions.
./impute -h example/haplo.txt -l example/legend.txt -g example/geno.txt -m example/map.txt -s example/strand.txt -Ne 11400 -int 62000000 63000000 -exclude_snps example/exclude.txt -impute_excluded

Using IMPUTE with the HapMap Data

A main use of this program will be imputing genotypes based on the HapMap Phase II haplotypes. To facilitate this use we have prepared the HapMap Phase II haplotypes in the format required by IMPUTE for all 22 autosomes. Be careful to make sure your genotype data uses base-pair positions that are matched to the genome build used by the haplotype, rate and strand files. We recommend that genome-wide imputation of genotypes be carried out in relatively small chunks to avoid running out of RAM on your computer. For imputation of the WTCCC dataset we used a chunk size of 7Mb. The imputed chunks were then concatenated together to produce an imputed file for each chromosome. The chunk size can be specified using the -int option. The -buffer option should also be used to avoid edge effects of imputing in relatively small chunks.
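
As an illustration of the chunking advice, the following sketch generates one IMPUTE command per 7Mb chunk. It is not part of the IMPUTE distribution: the file names, the `chr22` prefix and the 1 to 20,000,000 range are hypothetical, and the choice of consecutive non-overlapping intervals is an assumption. Only flags described in the option list are used, with -o, -i and -r set per chunk so that the default ./out, ./info and ./summary files are not overwritten.

```python
def chunk_intervals(start_bp, end_bp, chunk_bp=7_000_000):
    """Split [start_bp, end_bp] into consecutive, non-overlapping intervals."""
    if chunk_bp <= 0 or start_bp > end_bp:
        raise ValueError("need chunk_bp > 0 and start_bp <= end_bp")
    intervals = []
    lower = start_bp
    while lower <= end_bp:
        upper = min(lower + chunk_bp - 1, end_bp)
        intervals.append((lower, upper))
        lower = upper + 1
    return intervals


def impute_command(prefix, lower, upper, ne=11418, buffer_kb=250):
    """Build one IMPUTE command line; all file names are hypothetical."""
    tag = f"{prefix}.{lower}_{upper}"
    args = [
        "./impute",
        "-h", f"{prefix}.haplo.txt",
        "-l", f"{prefix}.legend.txt",
        "-g", f"{prefix}.geno.txt",
        "-m", f"{prefix}.map.txt",
        "-s", f"{prefix}.strand.txt",
        "-Ne", str(ne),
        "-int", str(lower), str(upper),
        "-buffer", str(buffer_kb),
        # Per-chunk names so the defaults ./out, ./info and ./summary
        # are not overwritten by the next chunk.
        "-o", f"{tag}.out",
        "-i", f"{tag}.info",
        "-r", f"{tag}.summary",
    ]
    return " ".join(args)


if __name__ == "__main__":
    for lower, upper in chunk_intervals(1, 20_000_000):
        print(impute_command("chr22", lower, upper))
```

The per-chunk output files can then be concatenated in interval order to give one imputed file per chromosome.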

HapMap rel#24 - NCBI Build 36 (dbSNP b126)
Polymorphic files - these files contain SNPs polymorphic in each panel respectively
i.e. the CEU haplotypes only contain data at SNPs that are polymorphic in the CEU panel.
The files contain the haplotypes and associated legend files.
[CEU]
[YRI]
Recombination rate files (nb. these are the same as the rel#22 rates)
[CEU]   [YRI]   [COMBINED]
Strand files Affy500k [These were constructed using these Affymetrix annotation files - Nsp Sty]
Affy6.0 [These files were created using this Affymetrix annotation file - LINK]

HapMap rel#22 - NCBI Build 36 (dbSNP b126)
Polymorphic files - these files contain SNPs polymorphic in each panel respectively
i.e. the CEU haplotypes only contain data at SNPs that are polymorphic in the CEU panel.
The files contain the haplotypes and associated legend files.
[CEU]
[YRI]
[JPT+CHB]
Consensus files - these files contain SNPs that occur in all 3 of the HapMap panels.
There are also files for all combinations of the panels, which are useful for imputation
of admixed individuals.
Single panels: [CEU] [YRI] [JPT+CHB]
Pairs of panels: [CEU+CHB+JPT] [CEU+YRI] [CHB+JPT+YRI]
Combined panels: [CEU+YRI+CHB+JPT]
[Legend files]

Recombination rate files [CEU]   [YRI]   [COMBINED]
Strand files Affy500k [These were constructed using these Affymetrix annotation files - Nsp Sty]
Affy6.0 [These files were created using this Affymetrix annotation file - LINK]

HapMap rel#21 - NCBI Build 35 (dbSNP b125)
Polymorphic files - these files contain SNPs polymorphic in each panel respectively
i.e. the CEU haplotypes only contain data at SNPs that are polymorphic in the CEU panel.
The files contain the haplotypes and associated legend files.
[CEU]
[YRI]
[JPT+CHB]
Recombination rate files [CEU]    [YRI]    [JPT+CHB]    [COMBINED]
Strand files Affy500k [These files were constructed using these Affymetrix annotation files - Nsp Sty]

X Chromosome Imputation

IMPUTE can carry out imputation of genotypes on the X chromosome, but the procedure is slightly more complicated than for the autosomes.
There are 3 special flags associated with X chromosome imputation (-chrX, -Xpar and -sample). See the option list below for more details.
The pseudoautosomal (par) and non-pseudoautosomal (non-par) regions of chromosome X are dealt with in slightly different ways.

We have put together a set of files for X chromosome imputation [chrX_files.tgz]. See the included README file for a complete description of the files.

Here is an example of using these files for carrying out imputation in the non-pseudoautosomal region of the X chromosome. The output format is the same as running IMPUTE on the autosomes. Males are reported as having 3 posterior probabilities for each genotype but the heterozygote probability will always be 0. The AA and BB homozygote probabilities for males correspond to the posterior probabilities of carrying the two alleles A and B respectively.

./impute -chrX -h chrX_files/genotypes_chrX_CEU_r21_nr_fwd_non-par_phased_by_snp_no_mono -l chrX_files/genotypes_chrX_CEU_r21_nr_fwd_non-par_legend.txt -m chrX_files/genetic_map_chrX_non-par.txt -s chrX_files/Affy500k_chrX_non-par.strand -g chrX_files/chrX.example.gen -sample chrX_files/chrX.example.sample -Ne 11400 -int 4000000 4100000
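
To make the interpretation of male genotypes concrete, the sketch below relabels one triple of posterior probabilities from non-par output. It is an illustration only, not IMPUTE code. It assumes the three probabilities are ordered AA, AB, BB and uses the sex coding from the sample file (1 = male, 2 = female); the function name is hypothetical.

```python
def x_genotype_summary(probs, sex):
    """Relabel one (P(AA), P(AB), P(BB)) triple from non-par chrX output.

    sex follows the sample-file coding: 1 = male, 2 = female.
    """
    p_aa, p_ab, p_bb = probs
    if sex == 1:
        # Males carry one X chromosome, so the two homozygote
        # probabilities are read as allele probabilities.
        return {"A": p_aa, "B": p_bb}
    if sex == 2:
        # Females are reported as ordinary diploid genotypes.
        return {"AA": p_aa, "AB": p_ab, "BB": p_bb}
    raise ValueError("sex must be 1 (male) or 2 (female)")


print(x_genotype_summary((0.98, 0.0, 0.02), sex=1))
print(x_genotype_summary((0.10, 0.85, 0.05), sex=2))
```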

Options

Flag
Required/Optional
Default
Description
-h <file>
Required

File containing a set of known haplotypes for the region of interest. The alleles of the haplotypes should be coded as 0 and 1. The format of this input file is one line per SNP and one column per haplotype.
-l <file>
Required
Legend file for the haplotypes file, which gives the rs ID, position and the alleles that are coded as 0 and 1 in the haplotypes file. The alleles should be taken from A, C, G and T. Note that this file needs a header line (see the example file legend.txt for details).
-g <file>
Required
File containing a set of genotypes for the set of individuals. The file format is described in detail on the FILE FORMAT WEBPAGE. The file format is the same as the output format from our genotype calling program CHIAMO.
NOTE 1 : The SNPs MUST appear in base-pair position order (lowest to highest) i.e. the 3rd column of this file must be sorted.
NOTE 2 : Base-pair positions of SNPs must use the same genome build as that used in the haplotype file.
-g_gz
Optional

Specify that the genotype file is gzipped.
-m <file>
Required
Fine-scale recombination map covering the region in which imputation is required. There is one line for each position on the map. The first column contains the base-pair position, the 2nd column contains the recombination rate in cM/Mb to the next point on the map and the 3rd column contains the recombination map position in cM. Note that this file needs a header line (see the example file map.txt for details).
-Ne <int>
Required
Sets the effective population size that scales the fine-scale recombination map for the given population. For example, -Ne 11000 sets the effective population size to 11000. For autosomal chromosomes, we highly recommend the values 11418 for CEPH, 17469 for Yoruban and 14269 for Chinese and Japanese populations.
-int <lower> <upper>
Required

Lower and upper boundaries (in base-pair position) of the region in which imputation should be carried out.
-s <file>
Optional
File listing the strand orientation of the SNPs in the genotype file relative to the orientation of the alleles in the haplotypes file. This file is required if the orientation of alleles at SNPs in the haplotype and genotype files does not match up. The file should contain a line for each SNP in the genotype file with two entries: (i) the base-pair position of the SNP, and (ii) the strand (+ or -) of the alleles in the genotype file. SNPs do not have to be in the same order as in the genotypes file and the file can include SNPs that are not in the genotypes file e.g. if the genotypes file has had some SNPs filtered out. Take a look at the example files for an illustration of the required format.
NOTE : It is critical that the alleles used to code genotypes in the haplotype file and the genotype file match up. If not, then the quality of imputation may decrease substantially. Great care should be taken in constructing a strand file for your data.
NOTE : see the -fix_strand and -no_remove options below which control the internal strand alignment functions.
-fix_strand
Optional

This flag invokes an internal strand alignment at SNPs that occur in both the genotypes and haplotypes files. It is based on the allele labels (at SNPs that are not A/T or G/C), and discordant allele frequencies (at A/T and G/C SNPs).
-no_remove
Optional

This flag turns off the default removal of all SNPs in the genotype file that are not aligned. The removal of SNPs is carried out after any specified strand file has been applied and after the checks described in the previous option have been applied.
-o <file>
Optional
./out
Name of the main output file that will contain the imputed genotypes. The file has one line per SNP and has exactly the same format as the genotypes file. NB the program will estimate probabilities for all genotypes including those that are known in the genotypes file (this allows an assessment of genotyping errors and imputation of missing data at these SNPs).
-o_gz
Optional

Specify that the output file should be gzipped.
-i <file>
Optional
./info
Name of the file that contains information measures that describe the accuracy of imputation at each SNP. This file contains one line per SNP that contains SNP ID, rs ID, position, expected allele frequency of the SNP, a measure of the observed statistical information associated with the estimate of the allele frequency and an alternative confidence score for the SNP (calculated as the average of the maximum posterior probabilities of the imputed genotypes). The information measure and the confidence score will be 1 if the SNP is imputed with high confidence. Both measures decrease towards 0 as imputation confidence decreases.
-r <file>
Optional
./summary
Specify file where a copy of the screen output is written.
-buffer <int>
Optional
250
To avoid edge effects in the imputation the program includes genotypes either side of the interval specified by the -int option. This option specifies the length of the buffer region (in kb) at each end of the interval.
-call_thresh <double>
Optional
0.9
Threshold for calling genotypes in the genotype input file. The genotype with the maximum probability will be used if that probability is above the threshold. Otherwise the genotype will be treated as missing.
-nind <int>
Optional

Specify the number of individuals to impute e.g. to impute just the 1st individual use -nind 1.
-exclude_snps <file>
Optional
Exclude a set of genotyped SNPs (i.e. SNPs that occur in the file specified by the -g option) with ID equal to those listed in the file. The IDs can be either the rs ID or the alternate ID given in the first column of the genotype file. These SNPs will not be used for imputation and will not occur in the output files.
-impute_excluded
Optional

This flag modifies the behaviour of the -exclude_snps option. For Type 2 SNPs that have been excluded it places imputed genotypes in the output file.
-os <int>, -include_snps <file>
Optional (both)
1 2 3 (for -os)

The SNPs that are included in the output are controlled by the combination of the -os and -include_snps options.

The -os option controls which types of SNPs are included in the output. There are three types of SNPs
1 = SNPs that occur ONLY in the haplotypes file
2 = SNPs that occur in BOTH the haplotypes and genotypes file
3 = SNPs that occur ONLY in the genotypes file

You can specify more than one type of SNP using the -os option. For example, using -os 1 2 would output all SNPs that occur in the haplotypes file (types 1 and 2). The default setting is to produce output at all SNPs i.e. -os 1 2 3.

Using -os 2 is useful if all you require is an LD-based estimate of the genotypes at SNPs in the genotypes file, and can be substantially quicker than the default setting.

The -include_snps option specifies a list of SNPs to be included in the output BUT this list only applies to those SNPs that appear only in the haplotype file i.e. the SNPs specified by -os 1. The IDs should be the rs IDs given in the legend file that corresponds to the haplotypes file.
-pgs
Optional

For SNPs that occur in the genotype file the default is now to return these genotypes in the output file rather than their predictions (which was the old default). The -pgs flag (which stands for predict genotyped snps) can be used to specify that the predictions should be written to the output file.
-outdp <int>
Optional
2
Specify the number of decimal places used to report the genotype probabilities.
-chrX
Optional

Specify this flag if you want to impute genotypes on the X chromosome. The haplotype file, legend file, map file and strand file should be set appropriately. A sample file must also be supplied (see -sample below).
-Xpar
Optional

Controls whether you wish to do imputation in the pseudoautosomal or non-pseudoautosomal region of the X chromosome. If the flag is given it specifies that you are working in the pseudoautosomal region. If the flag is absent it specifies that you are working in the non-pseudoautosomal region. Only works when used in conjunction with the -chrX flag.
-sample <file>
Optional

Sample file (see FILE FORMAT WEBPAGE for more details) containing a covariate named 'sex' specifying the sex of all individuals in the genotype file. Males should be coded 1 and females coded 2.
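
The three SNP types used by the -os option can be pictured as set operations on base-pair positions, since SNPs in the haplotype and genotype files are matched on position. The sketch below is only an illustration with made-up positions and hypothetical function names; it does not model the -include_snps filter or anything else IMPUTE does internally.

```python
def classify_snps(haplotype_positions, genotype_positions):
    """Group SNP positions into the three types used by -os."""
    hap = set(haplotype_positions)
    gen = set(genotype_positions)
    return {
        1: sorted(hap - gen),  # haplotypes file only
        2: sorted(hap & gen),  # both files
        3: sorted(gen - hap),  # genotypes file only
    }


def output_positions(types, os_types=(1, 2, 3)):
    """Positions that would be written for a given -os setting."""
    selected = []
    for snp_type in os_types:
        selected.extend(types[snp_type])
    return sorted(selected)


# Made-up positions for illustration
types = classify_snps([100, 200, 300, 400], [200, 400, 500])
print(types)
print(output_positions(types, os_types=(2,)))
```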

FAQ

Q. How do I code missing genotypes in the genotype file?
A. Internally, IMPUTE turns the probabilities in the genotype file into a single genotype by choosing the genotype with the maximum probability if it is greater than the threshold value supplied by the -call_thresh option (default is 0.9). If the threshold isn't reached then the genotype is set to missing. So if you want to force a missing genotype then using (0 0 0) as the set of genotype probabilities will work with the default threshold.
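
The calling rule described in this answer can be sketched as follows. This is an illustration rather than IMPUTE's own code: the function name is hypothetical, the probabilities are assumed to be ordered AA, AB, BB, and a strict "greater than" comparison is assumed from the wording above.

```python
def call_genotype(probs, call_thresh=0.9):
    """Return 0, 1 or 2 (index of AA, AB, BB), or None if the genotype is missing."""
    best = max(range(3), key=lambda k: probs[k])
    # Keep the most probable genotype only if it clears the threshold
    return best if probs[best] > call_thresh else None


print(call_genotype((0.95, 0.05, 0.0)))  # confident AA call
print(call_genotype((0.5, 0.5, 0.0)))    # below threshold, treated as missing
print(call_genotype((0, 0, 0)))          # forced missing genotype
```

With the default threshold, (0 0 0) can never be called because no probability exceeds 0.9, which is why it works as a missing-data code.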

Q. How do I create a strand file?
A. The creation of strand files is difficult. You need to work out which strand of the human reference sequence the alleles for each SNP have been expressed against. This will depend on the genotyping chip/method used to measure the genotypes so you will need to refer to the appropriate annotation files for the platform you have used. We have supplied strand files for the Affy 500k chip that work with the build 35 release of the HapMap haplotypes (also available from this website in the correct format for IMPUTE). We are working on supplying strand files for other GWA chips and these will appear on the website. Finally, v0.3.0 introduced some internal checks that attempt to align the strands of the genotype and haplotype files (see above). These checks are particularly useful for the Illumina 300, 550 and 650 chips, which do not have any A/T and G/C SNPs on them, so that the strand of the genotype data can be aligned to the strand of the HapMap haplotypes using the allele labels alone.
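
To show why allele labels are enough at SNPs that are not A/T or G/C, here is a minimal sketch of label-based strand inference. It is an assumption about how such a check could work, not IMPUTE's actual implementation: the function names are hypothetical, and the allele-frequency comparison that the program uses at A/T and G/C SNPs is not modelled.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(allele_a, allele_b):
    """A/T and C/G SNPs read the same on both strands."""
    return COMPLEMENT[allele_a] == allele_b


def strand_from_labels(hap_alleles, geno_alleles):
    """Return '+', '-' or None (ambiguous or non-matching alleles)."""
    if is_ambiguous(*hap_alleles):
        return None  # labels alone cannot decide the strand
    hap = set(hap_alleles)
    geno = set(geno_alleles)
    if geno == hap:
        return "+"
    if {COMPLEMENT[x] for x in geno} == hap:
        return "-"
    return None  # alleles do not match on either strand


print(strand_from_labels(("A", "G"), ("A", "G")))  # same strand
print(strand_from_labels(("A", "G"), ("T", "C")))  # opposite strand
print(strand_from_labels(("A", "T"), ("A", "T")))  # ambiguous A/T SNP
```

A SNP for which this returns None would need either a strand file entry or some other evidence before it could be aligned.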

Q. Why do I get the message "rs numbers don't  agree"?
A. SNPs from the haplotype and genotype files are aligned on their base-pair position. Once aligned, IMPUTE checks to see if the rs id from the legend file matches the rs id from the genotype file. If they don't match, IMPUTE prints the message. The most likely explanation is that the rs ids in the legend file and genotype file were created from different sources i.e. different versions of dbSNP. For example, the legend files available from the IMPUTE website (above) were created from the HapMap project and used the rs ids of the SNPs from dbSNP at the time of that project i.e. over a year ago. The genotype file you use will probably have rs ids from some later version of dbSNP e.g. the annotation file from one of the Affymetrix or Illumina chips. In dbSNP, SNPs with different rs ids can be merged into one SNP if new information indicates that they are the same SNP, so it is possible that SNPs can have the same base-pair position but different rs ids. If you see some of these messages then it is worth querying the mis-matching rs ids in dbSNP to check that this is the cause. For example, querying rs7446851 in dbSNP shows that this id was merged with another rs id
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs7446851. This is the kind of thing you would be looking for.

Q. Why does IMPUTE terminate with the message "terminate called after throwing an instance of 'std::bad_alloc'" or something similar?
A. The most likely cause is that you have tried to run IMPUTE on a whole chromosome and run out of RAM on your computer. See the advice above on how to use IMPUTE on whole chromosomes.

Q. How do I know whether IMPUTE is working as it should and giving good predictions?
A. In v0.4.0 we introduced new screen output that attempts to gauge the accuracy of the imputation using the known genotype data you have supplied using the -g option. IMPUTE predicts all of the data at the genotyped SNPs in a leave-one-out fashion. These predictions are then compared to the supplied genotypes to assess accuracy. The level of accuracy that is obtained will be close to the accuracy obtained at other imputed SNPs that do not occur in the genotype file. This information appears at the end of the screen output. In the example given above (which is real data) the concordance rate and call rate when calling imputed genotypes at a threshold of 0.9 were 99.3% and 88.2% respectively (i.e. a missing data rate of 11.8%).

Version History

0.1.1 07-06-2007 First version made available
0.2.0 29-07-2007
  • Minor bug fix for imputation
  • Speed-up for computation
  • Addition of -outdp flag for controlling precision of output files.
  • Fixed bug in the screen output that counts the SNPs that will be in the output.
  • For SNPs with genotype data but no haplotype data the original genotypes are now returned. In version 0.1.1 a uniform distribution on all genotypes was returned.
0.2.1 22-10-2007
  • Added LICENCE
0.3.0 04-12-2007
New features
  • added -g_gz that specifies that the genotype file is gzipped.
  • added -o_gz that specifies that the output file should be gzipped.
  • for SNPs that occur in the genotype file the default is now to return these genotypes in the output file rather than their predictions (which was the old default). There is a new -pgs flag (which stands for predict genotyped snps) which can be used to specify that the predictions should be written to the output file.
  • added some new methods for strand alignment of the genotype and haplotype files. A check is now done to see if the strand can be determined from the allele labels and at A/T and G/C SNPs allele frequencies are checked to see which is the best alignment. Also, all non-alignable SNPs are removed. There are also -no_fix and -no_remove flags that turn off the allele label/frequency matching and the non-aligned SNP removal steps.
0.3.1 17-12-2007
  • Fixed small bug in reporting of the number of SNPs that have their strands flipped due to allele-mismatch. Was reporting 0 and now reports correct number.
  • Fixed bug in -g_gz option
0.3.2 18-01-2008
  • Fixed bug in feature added in v0.3.0 to align strand using allele labels and allele frequencies. The bug only affected A/T and G/C SNPs and only if the order of the two alleles differed between the legend file and the genotype file.
  • when using -o_gz option the extension .gz is now added to the output file name specified by -o
0.4.0 26-05-2008
New features
  • the -no_fix flag has been replaced by the -fix_strand flag. Now by default the internal strand alignment is not carried out and must be invoked using the -fix_strand flag.
  • the -impute_excluded flag has been added which causes the output to contain imputed genotypes at SNPs which were excluded using the -exclude_snps option.
  • the software comes with new (larger) example files
  • the info file now contains the expected allele frequency and a measure of the observed statistical information associated with the allele frequency estimate. The Brier's score measure has been removed.
  • The screen output now finishes by reporting a breakdown of imputation accuracy at SNPs which have genotypes in the input file. The imputed genotype probabilities are compared to genotype calls from the input genotype file. The distribution of the maximum imputed genotype calls is given. This allows the user to assess how well imputation is performing. The accuracy obtained on these SNPs is likely to be a good reflection of the accuracy of the imputation at other SNPs not in the genotype input file.
  • The screen output now reports a fuller breakdown of the SNPs that will be in the output file.
  • The -int option is now required.
0.4.1 28-05-2008
New feature
  • we have relaxed the constraints on the format of the strand file. Now the SNPs in the strand file do not have to occur in exactly the same order as the SNPs in the genotype file. Also, the strand file can contain information on SNPs not in the genotypes file. This is useful in the situation where you have a strand file for all the SNPs on a genotyping chip but have filtered some SNPs out of the genotype file.
0.4.2 16-06-2008
New features
  • added -r option to specify a file where screen output is copied (default ./summary)
  • added a check for duplicate SNPs in genotype file
Bug fix
  • fixed bug in the reading of the strand file. This bug only affected v0.4.1.
0.5.0 01-08-2008
New features
  • added support for X chromosome imputation with associated options -chrX, -Xpar and -sample.

References

[1] J. Marchini, C. Spencer, Y.Y. Teo and P. Donnelly (2007) A Bayesian Hierarchical Mixture Model for Genotype Calling in a multi-cohort study. (in preparation)
[2] J. Marchini, B. Howie, S. Myers, G. McVean and P. Donnelly (2007) A new multipoint method for genome-wide association studies via imputation of genotypes. Nature Genetics 39: 906-913
[3] The Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661-678. PMID: 17554300 DOI: 10.1038/nature05911

Contact Information

If you have any questions regarding the use of this program please send an email to both of the following people

Dr. Bryan Howie (howie <at> stats <dot> ox <dot> ac <dot> uk).
Dr. Jonathan Marchini (marchini <at> stats <dot> ox <dot> ac <dot> uk).

It is a good idea to include a copy of the screen output (in the ./summary file) with your email, as this helps us identify any problems.