CHIAMO is a program for calling genotypes from the Affymetrix 500K Mapping chip. The program allows for multiple cohorts which have potentially different intensity characteristics that can lead to elevated false-positive rates in genome-wide studies. The underlying model has a hierarchical structure that allows for correlation between the parameters of each cohort. For more details see [1].
The output files produced by CHIAMO feed directly into both the programs SNPTEST [2] and IMPUTE [2]. CHIAMO was used to call genotypes for the 7 genome-wide association studies carried out by the Wellcome Trust Case-Control Consortium (WTCCC) [3].
Platform | File |
Linux (x86_64) Static Executable | chiamo_v0.2.1_x86_64_static.tgz |
Linux (x86_64) Static Executable (SuSE 9.3) | chiamo_v0.2.1_SuSE9.3_x86_64_static.tgz |
Linux (x86_64) Dynamic Executable | chiamo_v0.2.1_x86_64_dynamic.tgz |
Linux (i386) Static Executable | chiamo_v0.2.1_i386_static.tgz |
Linux (i386) Dynamic Executable | chiamo_v0.2.1_i386_dynamic.tgz |
Mac OS X 10.4.11 (Intel) | chiamo_v0.2.1_MacOSX_10.4_Intel.tgz |
Mac OS X 10.5.1 (Intel) | chiamo_v0.2.1_MacOSX_10.5_Intel.tgz |
Mac OS X (PowerPC) | chiamo_v0.2.1_MacOSX_PowerPC.tgz |
Solaris 5.8 (Sun SPARC) | chiamo_v0.2.1_Solaris5.8_SPARC.tgz |
Solaris 5.10 (AMD Opteron) | chiamo_v0.2.1_Solaris5.10_Opteron.tgz |
tar zxvf chiamo_vX.X.X_i386.tgz |
Version | Date | Comments |
0.1.0 | 07-06-2007 | First version |
0.1.1 | 20-06-2007 | Ability to handle gzip'd and bzip2'd intensity files added |
0.2.0 | 07-09-2007 | Significant reductions in the run time of the algorithm through the addition of the -single option and improvements made to the -approx option. |
0.2.1 | 22-10-2007 | Addition of a LICENCE |
A substantial gain in speed can be achieved by using -single and -approx together, because the algorithm can then update the model more efficiently. CHIAMO runs in two stages, and each option can be applied to stage 1 only or to both stages. We advocate using the first stage to resolve the cluster centres and covariances, and the second to accurately estimate the posterior probability of each genotype call. Therefore, when appropriate, we suggest applying both approximations to the first stage and then dropping the single-cohort approximation for the second stage, which allows each cohort to adapt to possible shifts in cluster location, e.g. using the flags -single 1 -approx 1 20.
./chiamo -i ./example/cohort1.txt ./example/cohort2.txt ./example/cohort3.txt ./example/cohort4.txt ./example/cohort5.txt ./example/cohort6.txt ./example/cohort7.txt ./example/cohort8.txt ./example/cohort9.txt -f ./example/freq.txt -max1 -max2 -nmax 200 -n 0 -b 0 -o ./example/output |
./chiamo -i ./example/cohort1.txt ./example/cohort2.txt ./example/cohort3.txt ./example/cohort4.txt ./example/cohort5.txt ./example/cohort6.txt ./example/cohort7.txt ./example/cohort8.txt ./example/cohort9.txt -f ./example/freq.txt -max1 -max2 -nmax 200 -n 0 -b 0 -o ./example/output -single 1 -approx 1 20 |
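The options table below recommends processing each chromosome in relatively small chunks with the -snps flag. The following is a minimal sketch of how such chunked runs could be generated. It only prints the commands and does not run CHIAMO. The function name chiamo_chunk_cmds, the two input files, and the per-chunk output prefixes are illustrative choices, and the sketch assumes that -snps indices are 1-based and inclusive, which this page does not state.

```shell
#!/bin/sh
# Print one CHIAMO command per SNP chunk (dry run: nothing is executed).
# Assumptions (not taken from the CHIAMO documentation):
#   - -snps <n> <m> is 1-based and includes both end points
#   - you supply the total number of SNPs in the intensity files yourself
chiamo_chunk_cmds() {
    total_snps=$1    # number of SNPs in each intensity file
    chunk_size=$2    # number of SNPs per chunk
    # Illustrative input files; replace with your own cohort files.
    inputs="./example/cohort1.txt ./example/cohort2.txt"

    # Refuse chunk sizes that would never advance the loop.
    if [ "$chunk_size" -lt 1 ]; then
        echo "chunk size must be at least 1" >&2
        return 1
    fi

    start=1
    while [ "$start" -le "$total_snps" ]; do
        end=$((start + chunk_size - 1))
        if [ "$end" -gt "$total_snps" ]; then
            end=$total_snps
        fi
        # A separate -o prefix per chunk keeps runs from overwriting each other.
        printf './chiamo -i %s -snps %d %d -o ./example/output_%d_%d\n' \
            "$inputs" "$start" "$end" "$start" "$end"
        start=$((end + 1))
    done
}

chiamo_chunk_cmds 10 4
```

For 10 SNPs and a chunk size of 4, this prints three commands covering SNPs 1 to 4, 5 to 8, and 9 to 10. Printing rather than executing makes it easy to inspect the commands or hand them to a job scheduler.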
Flags | Required/Optional | Default | Description |
-i <file_1> ...... <file_n> | Required | | Specifies n input files that contain the normalized intensity data for the n cohorts. The normalized intensity files are created from the raw CEL file data using a program written by Hin-Tak Leung (hin-tak.leung@cimr.cam.ac.uk), which is available from the website http://www.wtccc.org.uk/info/software.shtml |
-gz | Optional | | Specifies that the input intensity files (specified by the -i flag) have been gzipped and have the .gz file extension. The intensity files can be very big so gzip-ing the files can save a lot of disk space. There is very little difference in run time if the files have been compressed. |
-bz2 | Optional | | Specifies that the input intensity files (specified by the -i flag) have been bzip2-ed and have the .bz2 file extension. The intensity files can be very big so bzip2-ing the files can save a lot of disk space. There is very little difference in run time if the files have been compressed. |
-f <freq_file> | Optional | | File containing allele frequency information for each Affy SNP, i.e. derived from the HapMap data. This information is used as a prior on allele frequency. There should be a line for each SNP, in the same order as the SNPs in the input file. Each line should have the following 5 entries: RS_ID, position, Affy_A_Allele, Affy_B_Allele, frequency of Affy_A_Allele. The allele frequency files that we have used for the Affy 500K chip are available as Affy500K_Allele_Frequency_Files.tgz |
-snps <n> <m> | Optional | | Run the program from the nth SNP to the mth SNP in the input files. Otherwise the program will run on each SNP sequentially. We recommend that each chromosome be processed in relatively small chunks. |
-max1 | Optional | | Attempt to maximize the posterior in stage 1. The default is to use MCMC to obtain a sample from the posterior. |
-max2 | Optional | | Attempt to maximize the posterior in stage 2. The default is to use MCMC to obtain a sample from the posterior. |
-nmax | Optional | 40 | Number of stage 1 iterations. |
-b <int> | Optional | 200 | Number of burn-in iterations in stage 2 |
-n <int> | Optional | 1000 | Number of sampling iterations in stage 2 |
-o <o_file> | Optional | ./out | The program will produce an output file for each cohort with names o_file_1_mcmc, ...., o_file_n_mcmc. Each output file contains a line for each SNP with sets of 3 posterior probabilities, one set per individual, in the same order as the individuals appear in the input files. It will also produce a file o_file_params, which contains the final parameter estimates produced by the program, and a file o_file_information, which contains information measures for the calls at each SNP. The format of the output files is designed to be the input file format for the programs SNPTEST and IMPUTE (see the FILE FORMAT WEBPAGE for more details). |
-chrX <file> | Optional | | For chromosome X data, the program takes a file indicating the sex of each individual, in the same order as the individuals appear in the input files. 1 = male, 2 = female. |
-no_null | Optional | | Do not run with a 4th NULL cluster. |
-single <int> | Optional | | When multiple cohorts are to be called, treat them as a single cohort for stage 1 (use -single 1) or for both stage 1 and stage 2 (use -single 2). |
-approx <stages> <grid> | Optional | | The first argument to this flag determines whether to apply the approximation to just stage 1 (set to 1), or to stages 1 and 2 (set to 2). The second argument specifies the density of the grid; the smaller the number, the cruder the approximation and the faster the program runs. Note that when this option is used, the algorithm is run for a single update without the approximation to provide the parameter estimates, based on the uncondensed data, that the approximation uses. |
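The -f entry above specifies five entries per line in the frequency file. The following is a minimal sketch of a sanity check for such a file before a long run. The function name check_freq_file is hypothetical, and the sketch assumes whitespace-separated fields, no header line, and a frequency written as a number between 0 and 1, none of which this page states explicitly.

```shell
#!/bin/sh
# Report lines of a frequency file that do not match the five-entry layout
# described for -f: RS_ID, position, Affy_A_Allele, Affy_B_Allele, frequency.
# Assumptions (not taken from the CHIAMO documentation):
#   - fields are separated by whitespace and there is no header line
#   - the frequency is a non-negative number no greater than 1
check_freq_file() {
    awk '
        # Wrong number of fields: report and skip the frequency check.
        NF != 5 {
            printf "line %d: expected 5 fields, found %d\n", NR, NF
            bad++
            next
        }
        # Fifth field must look like a number and must not exceed 1.
        $5 !~ /^[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ || $5 + 0 > 1 {
            printf "line %d: frequency \"%s\" is not a number in [0, 1]\n", NR, $5
            bad++
        }
        # Exit status 1 if any line was flagged, 0 otherwise.
        END { exit (bad > 0) }
    ' "$1"
}
```

It could be called as check_freq_file ./example/freq.txt, which returns a non-zero status and lists the offending lines if any line fails either check. The sketch does not verify that the SNP order matches the intensity files, which the -f description also requires.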