Flags
|
Required/Optional
|
Description |
-h <file>
|
Required
|
File of known haplotypes,
with one row per SNP and one column per haplotype. Every
haplotype file needs a corresponding legend file (see
below), and all alleles must be coded as 0 or 1 -- no
other values are allowed. See the following
section for links to the relevant HapMap and 1000
Genomes files.
|
-l <file>
|
Required |
A legend file for the SNP
markers. This file should have 4 columns, with one line for
each SNP. The columns should contain an ID for each SNP,
i.e. the rs ID of the marker, the base pair position of the
SNP, the base represented by 0 and the base represented by 1. The
first line of the legend file contains column labels (these are
not used by the program, but the file is required to
contain a header line). See the example file ex.leg. See
the following section for links to
the relevant HapMap and 1000 Genomes files.
|
-m <file>
|
Required
|
A file containing the
fine-scale recombination rate across the region. This file
should have 3 columns, with one line for each SNP. The
columns should contain the physical location, the rate in cM/Mb to
the right of the marker and the cumulative rate in cM to
the left of the marker. A header line containing
the column labels is required. See the example file
ex.map. See the following section
for links to the relevant HapMap and 1000 Genomes files. |
-dl <int> <a>
<rr1> <rr2> ... |
Required (optional from version 2.0.2) |
Sets the location, risk allele
and relative risks for each disease SNP. For each disease
SNP, four numbers are required in the following order:
- physical location of SNP, which must be in the
legend file supplied to the -l flag
- risk allele (0 or 1), the corresponding base can be
found in the legend file
- heterozygote disease risk
- homozygote disease risk
For example, -dl 1085679 1 1.5 2.25
2190692 0 2 4 specifies two disease SNPs at
positions 1085679 and 2190692, with heterozygote risks
1.5 and 2, homozygote risks 2.25 and 4, and risk alleles
1 and 0, respectively. There is no limit
on the number of disease SNPs. We simulate under a disease
model where the disease SNPs are independent, and the
haplotypes defined by the disease SNPs are in HWE.
This flag is optional for version 2.0.2 and above: if it is
not supplied, all haplotypes will be simulated under
the null. |
-n <int> <int> |
Recommended
|
Sets the number of control
and the number of case individuals to simulate. For
example -n 100 200
simulates 100 control and 200 case individuals. The
default is to generate 1 control and 1 case individual. |
-int <int>
<int>
|
Optional
|
Specify the lower and upper
boundaries of the region in which you wish to carry out
simulation. The default is set to 0 and 500000000. |
-o <file> |
Required |
Output file prefix. For
example -o ex.out[.gz]
creates the following files for the case data:
- ex.out.cases.haps[.gz]
- A file containing the simulated haplotype data in
the same format as the haplotype file supplied to
the -h flag.
- ex.out.legend (from
version 2.1.2 onwards) - A legend file with
information about the SNPs in the .haps files.
- ex.out.cases.gen[.gz]
- A file containing the simulated genotype data in the
file
format compatible with SNPTEST, SNPTEST2,
IMPUTE, IMPUTE2 and GTOOL.
- ex.out.cases.sample -
A sample file in the file
format compatible with SNPTEST2 for the
simulated genotype data.
- ex.out.cases.tags.gen[.gz]
- The genotype data limited to the subset of SNPs
specified by the file supplied to the -t flag (if
applicable).
A similar set of files will be produced for the control
data, with the same file names except that cases are replaced by controls.
A summary file, ex.out.[.gz]summary,
will also be produced, which summarises the simulation
parameters, input files and output files.
Note:
- If the output file prefix has a .gz extension then
the *.haps.gz, *.gen.gz and *.tags.gen.gz files
will be gzipped.
- It is possible to suppress some of the output files
using the flags -no_gens_output
and -no_haps_output
(see below).
|
-output_snp_summary
|
Optional |
Output the p-values and
effect size estimates (under a log-additive model test)
for each disease SNP and under a joint model for all of
the disease SNPs in the simulated genotype data. Note
that in version 2.1.x this option was always on by default
(with no way to switch it off), but this step turned out
to be very time consuming and has therefore been
made optional from version 2.2.0 onwards. |
-no_haps_output
|
Optional |
No haplotype data files, *.haps[.gz], will be
written for the case and control data. |
-no_gens_output
|
Optional |
No genotype data files, *.gen[.gz], will be
written for the case and control data. However, if you
have provided an input to the -t
flag then the *.tags.gen[.gz]
files will still be written. |
-t <file>
|
Optional |
SNP subset file. This
option allows the user to output data at only a subset of
the SNP markers in the simulated dataset, i.e. at a set of
tag SNPs. The file should contain the physical locations of the
markers that will be in the output, with one line per SNP.
The physical locations must match those in the legend
file. If this option is selected then a .tags.gen output file
will be produced that contains the positions of the SNPs
in the output file. |
-Ne <int> |
Optional |
Sets the effective population
size, which scales the fine-scale recombination map for the
given population. For example, -Ne
11000 sets the effective population size to
11000. For autosomal chromosomes, we highly recommend the
values 11418 for CEPH, 17469 for Yoruban and 14269 for
Chinese and Japanese populations. |
-theta <real>
|
Optional |
Sets the mutation rate in the
model. For example, -theta 10
sets the scaled mutation rate to 10. By default, the mutation rate is set
so that the expected number of mutations at a given SNP is
equal to 1. |
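
The rules for the legend file and the -dl flag described in the table can be checked before a run. The following Python sketch is only an illustration and is not part of the program: the helper names, the header labels, the rs IDs and the haplotype file name ex.haps are all hypothetical, and it assumes the legend columns are separated by whitespace. It reads a legend, splits the -dl values into groups of four, and assembles the argument list from flags documented above (the program name itself still has to be put in front).

```python
def parse_legend(text):
    """Return the base-pair positions listed in legend-file text.

    Assumes a header line followed by 4 whitespace-separated columns per
    SNP: ID, position, base coded as 0, base coded as 1.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("legend needs a header line and at least one SNP")
    positions = []
    for n, line in enumerate(lines[1:], start=2):  # skip the header line
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {n}: expected 4 columns, got {len(fields)}")
        positions.append(int(fields[1]))
    return positions


def parse_dl(values, legend_positions):
    """Split the values given to -dl into one record per disease SNP."""
    if len(values) % 4 != 0:
        raise ValueError("-dl takes four values per disease SNP")
    known = set(legend_positions)
    snps = []
    for i in range(0, len(values), 4):
        pos, allele = int(values[i]), int(values[i + 1])
        het, hom = float(values[i + 2]), float(values[i + 3])
        if pos not in known:
            raise ValueError(f"position {pos} is not in the legend file")
        if allele not in (0, 1):
            raise ValueError(f"risk allele must be 0 or 1, got {allele}")
        snps.append({"position": pos, "risk_allele": allele,
                     "het_risk": het, "hom_risk": hom})
    return snps


def build_args(haps, legend, rec_map, out_prefix, dl_values, n_controls, n_cases):
    """Assemble the flag list; -n takes controls first, then cases."""
    return ["-h", haps, "-l", legend, "-m", rec_map, "-o", out_prefix,
            "-dl", *[str(v) for v in dl_values],
            "-n", str(n_controls), str(n_cases)]


# Hypothetical two-SNP legend using the positions from the -dl example.
LEGEND_TEXT = "rs position a0 a1\nrs_a 1085679 A G\nrs_b 2190692 C T\n"
DL_VALUES = ["1085679", "1", "1.5", "2.25", "2190692", "0", "2", "4"]

positions = parse_legend(LEGEND_TEXT)
snps = parse_dl(DL_VALUES, positions)
args = build_args("ex.haps", "ex.leg", "ex.map", "ex.out", DL_VALUES, 100, 200)
print(" ".join(args))
```

Checking the -dl positions against the legend first catches the most likely input mistake, a disease SNP location that is not in the file supplied to the -l flag, before any simulation time is spent.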