META

META is a program for the meta-analysis of genome-wide association studies. It is designed to synthesize the evidence from different association studies and, in particular, works seamlessly with the output of SNPTEST. The program was used in the meta-analysis of genome-wide association studies of smoking-related traits [1].


Contributors

The following people have contributed to the development of the software for META:

Jason Liu, Jonathan Marchini

Download

Pre-compiled versions of the program and example files can be downloaded from the links below. If you intend to run META on a machine running an old kernel then you probably want to use the dynamic version. If you have any problems getting the program to work on your machine please contact me.

Platform                            File
Linux (x86_64) Dynamic Executable   meta_v1.7_x86_64_dynamic.tgz
Linux (x86_64) Static Executable    meta_v1.7_x86_64_static.tgz
Mac OSX                             meta_v1.7_Mac_OSX.tgz

Please fill out the registration form to receive emails about updates to this software.

To unpack the files use a command like

tar zxvf meta_vX.X_x86_64.tgz

This will create an executable called META and a directory /examples that contains example data files.

Input File Formats

META reads plain text files as input, with each line of each file representing the information for one SNP. Although the format is quite flexible, the following columns must be provided:

rsid       SNP id.
pos        Base-pair position of SNP.
allele_A   Non-coded allele (a.k.a. non-effect allele, non-reference allele).
allele_B   Coded allele (a.k.a. effect allele, reference allele).
info       Imputation quality score (RSQR_HAT column in MACH; INFO column in PLINK; PROPER_INFO column in SNPTEST).
P_value    P-value of each SNP.
beta       Effect size of each SNP.
se         Standard error of effect size.

Some other columns can also be provided, e.g. chr, the chromosome number (1-22), and coded_af, the coded allele frequency. If the chr column is specified, the input file can contain SNPs from different chromosomes; otherwise, SNPs are assumed to come from the same chromosome. Note that for --method 3 (z-statistics combination method), beta and se are not required; only the direction of the effect is needed. An example input file is given below (this is not a real data set):

chr rsid pos allele_A allele_B P_value info beta se
1 rs16969968 76669980 A G 0.027185 0.99025 -0.12571 0.056914
1 rs518425 76670868 A G 0.012406 0.98888 -0.15238 0.060954
2 rs514743 76671282 A T 0.91281 0.9997 0.0061483 0.056075
3 rs615470 76673043 C T 0.90384 0.99988 0.0067651 0.055996
6 rs12899226 76674493 G T 0.69283 0.99717 -0.050464 0.12883
6 rs660652 76674887 A G 0.90419 0.99943 -0.0067418 0.056007
6 rs472054 76675049 A G 0.90419 0.99943 -0.0067418 0.056007
15 rs8029939 76675404 C T 0.96537 0.91413 -0.027428 0.63172
15 rs578776 76675455 A G 0.013069 0.98698 0.1537 0.061882
15 rs6495307 76677376 C T 0.77301 0.99926 0.016023 0.055553
15 rs12910984 76678682 A G 0.0032279 0.98707 -0.19692 0.066864
15 rs1051730 76681394 A G 0.030083 0.96504 -0.12551 0.058043
...
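
As a rough illustration of the format described above, the following Python sketch parses a whitespace-delimited file with a header line and checks that the required columns are present. It is not part of META; the function names read_cohort and open_cohort are hypothetical, and the assumption that columns are separated by whitespace is taken from the example file only.

```python
import gzip
import io

# Column names listed as required in the format description above.
REQUIRED = {"rsid", "pos", "allele_A", "allele_B", "info", "P_value", "beta", "se"}


def open_cohort(path):
    """Open a plain or gzipped text file for reading (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_cohort(handle):
    """Parse a whitespace-delimited table with a header into a list of dicts."""
    header = handle.readline().split()
    missing = REQUIRED - set(header)
    if missing:
        raise ValueError("missing required columns: " + ", ".join(sorted(missing)))
    rows = []
    for number, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError("line %d has %d fields, expected %d"
                             % (number, len(fields), len(header)))
        rows.append(dict(zip(header, fields)))
    return rows


# Two rows copied from the example file above.
example = io.StringIO(
    "chr rsid pos allele_A allele_B P_value info beta se\n"
    "1 rs16969968 76669980 A G 0.027185 0.99025 -0.12571 0.056914\n"
    "1 rs518425 76670868 A G 0.012406 0.98888 -0.15238 0.060954\n"
)
rows = read_cohort(example)
print(len(rows), rows[0]["rsid"], float(rows[0]["beta"]))
```

Values are kept as strings so that the caller decides how to convert them; open_cohort can be used to pass a file on disk to read_cohort instead of the in-memory example.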


META can use the output files of SNPTEST as its input files because all the information mentioned above is already included in the output of SNPTEST. See SNPTEST Mode for how to read them, and the SNPTEST documentation for the details of its output.

Output Summaries

The output file of META contains one line for each SNP, preceded by a header line that names each column. The following table gives a description of each of the entries in this file.

chr                          Chromosome number (if you specified the chromosome in input files).
rsid                         SNP id (taken from input files).
pos                          Base-pair position of SNP (taken from input files).
allele_A                     Non-coded allele (taken from input files).
allele_B                     Coded allele (taken from input files).
P_value                      Combined p-value.
beta                         Combined effect size.
se                           Combined standard error of effect size.
Q                            Cochran's Q statistic.
P_heterogeneity              P-value for heterogeneity.
I2                           Percentage of total variation across studies that is due to heterogeneity.
P_cohort_1, ..., P_cohort_n  P-values of cohort 1, ..., cohort n.
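
For readers unfamiliar with the Q and I2 columns, the sketch below computes them with the textbook definitions: Q is the inverse-variance weighted sum of squared deviations from the pooled estimate, and I2 = max(0, (Q - df) / Q) x 100 with df equal to the number of cohorts minus one. This page does not state exactly how META computes these columns, so treat the function (heterogeneity is a hypothetical name) as an illustration of the standard formulas rather than a reproduction of the program.

```python
def heterogeneity(betas, ses):
    """Return (Q, I2 in percent) using the standard textbook definitions."""
    weights = [1.0 / (se * se) for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2 is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical effect sizes and standard errors for two cohorts.
q, i2 = heterogeneity([0.1, 0.3], [0.1, 0.1])
print(q, i2)
```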

Running META

To run META and see the parameters it requires, simply type:

./meta

META will read gzipped or non-compressed files as input. Output files will be gzipped if the main input data file is gzipped.

The following command shows the simplest use of META:

./meta \
--method 1 \
--cohort examples/example1.txt examples/example2.txt \
--output meta.txt

or if the input files are gzipped:

./meta \
--method 1 \
--cohort examples/example1.txt.gz examples/example2.txt.gz \
--output meta.txt

This will combine the data at each SNP in example1.txt and example2.txt, saving the results into the file meta.txt. The SNPs in the output file are the union of the SNPs in the input files, so the number of cohorts used to combine information at each SNP can differ, as some SNPs can be found only in some cohorts (due to different genotyping platforms, imputation quality, etc.).

The following is an example using input files that have a chr column:

./meta \
--method 1 \
--cohort examples/example3.txt examples/example4.txt \
--output meta.txt

Meta-analysis method

There are three different meta-analysis methods available, selected with the --method option.

--method 1 : inverse-variance method based on a fixed-effects model.

Let $\beta_i$, $\sigma_i^2$ and $\lambda_i$ be the β estimate, the variance of the β estimate, and the genomic control λ estimate for the $i$th cohort.

Let $V = \sum_i \frac{1}{\lambda_i \sigma_i^2}$. Then $\beta_{META} = \frac{1}{V} \sum_i \frac{\beta_i}{\lambda_i \sigma_i^2}$ and $\sigma_{META}^2 = \frac{1}{V}$.

The overall Z-statistic is then calculated as $Z_{META} = \beta_{META} / \sigma_{META}$ and is assumed to have a standard Normal distribution under the null.

The genomic control $\lambda_i$'s are specified using the --lambda option (see below).
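
The fixed-effects calculation can be sketched in a few lines of Python. The sketch assumes the standard inverse-variance form, in which each cohort is weighted by 1 / (λ σ²), the combined variance is the reciprocal of the summed weights, and the two-sided p-value comes from a standard Normal distribution. The function name fixed_effects and the second cohort's numbers are hypothetical; only the first cohort's values are taken from the example file (rs1051730).

```python
import math


def fixed_effects(betas, ses, lambdas=None):
    """Inverse-variance fixed-effects combination with genomic control."""
    if lambdas is None:
        lambdas = [1.0] * len(betas)  # no genomic-control adjustment
    # Each cohort is weighted by the inverse of its lambda-inflated variance.
    weights = [1.0 / (lam * se * se) for lam, se in zip(lambdas, ses)]
    v = sum(weights)
    beta_meta = sum(w * b for w, b in zip(weights, betas)) / v
    se_meta = 1.0 / math.sqrt(v)
    z = beta_meta / se_meta
    # Two-sided p-value under a standard Normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta_meta, se_meta, z, p


# Cohort 1: rs1051730 from the example file. Cohort 2: hypothetical values.
beta, se, z, p = fixed_effects([-0.12551, -0.10], [0.058043, 0.05],
                               lambdas=[1.05, 1.08])
print(beta, se, z, p)
```

Because λ multiplies the variance, a cohort with λ greater than 1 is down-weighted and contributes a wider standard error to the combined estimate.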

--method 2 : inverse-variance method based on a random-effects model.
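
This page does not say which between-study variance estimator META uses for the random-effects model. Purely as an illustration, the sketch below assumes the DerSimonian-Laird estimator, a common choice: the between-study variance tau² is estimated from Cochran's Q and added to each cohort's variance before the inverse-variance weights are recomputed. The function name random_effects and the input values are hypothetical.

```python
import math


def random_effects(betas, ses):
    """Random-effects combination, ASSUMING the DerSimonian-Laird estimator."""
    k = len(betas)
    w = [1.0 / (se * se) for se in ses]
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    # Method-of-moments estimate of the between-study variance tau^2.
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Recompute the weights with tau^2 added to each within-study variance.
    w_star = [1.0 / (se * se + tau2) for se in ses]
    sws = sum(w_star)
    beta_re = sum(wi * b for wi, b in zip(w_star, betas)) / sws
    se_re = 1.0 / math.sqrt(sws)
    return beta_re, se_re, tau2


# Hypothetical effect sizes and standard errors for two cohorts.
beta_re, se_re, tau2 = random_effects([0.1, 0.3], [0.1, 0.1])
print(beta_re, se_re, tau2)
```

When the cohorts agree (Q no larger than its degrees of freedom), tau² is zero and the result coincides with the fixed-effects estimate.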

--method 3 : Z-statistic based method

In this approach study-specific P-values and direction of effect are converted into a signed Z-statistic. These Z-statistics are then summed with weights proportional to the square root of the sample size for each study (see the --sample-size option below). The advantage of this approach is that it allows for incompatibility between phenotype units.
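
A minimal Python sketch of this idea follows. It assumes two-sided p-values, converts each one to a Z-statistic through the inverse Normal distribution, attaches the sign of the effect, and divides the weighted sum by the square root of the sum of squared weights so that the combined statistic is again standard Normal under the null. That normalisation is the usual weighted Z-score convention and is an assumption here; the function names are hypothetical.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def signed_z(p_value, direction):
    """Convert a two-sided p-value (0 < p <= 1) and an effect sign to a Z."""
    magnitude = -_NORMAL.inv_cdf(p_value / 2.0)
    return math.copysign(magnitude, direction)


def combine_z(p_values, directions, sample_sizes):
    """Weighted Z-score combination with weights sqrt(sample size)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    zs = [signed_z(p, d) for p, d in zip(p_values, directions)]
    z_meta = (sum(w * z for w, z in zip(weights, zs))
              / math.sqrt(sum(w * w for w in weights)))
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2.0))
    return z_meta, p_meta


# Hypothetical p-values and effect directions; sample sizes 100 and 120.
z_meta, p_meta = combine_z([0.03, 0.01], [-1, -1], [100, 120])
print(z_meta, p_meta)
```

Only the sign of the effect enters the calculation, which is why cohorts that measured the phenotype in different units can still be combined.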

SNP Filters

When combining data across cohorts it is crucial that the information about the alleles at each SNP is consistent. There are a few reasons why two (or more) cohorts may differ:
(a) There might be differences between cohorts in the strand of the human reference sequence that is used to define the alleles at a SNP.
(b) The order of the alleles at a SNP in the input files may differ between cohorts.
(c) There might be real inconsistencies in the alleles reported by each cohort.

META will try to align the allele information across cohorts at a SNP. If it finds inconsistencies at a SNP that cannot be rectified then the SNP will be removed. For example, SNP rs16969968 has alleles A and G in the example file example1.txt and alleles A and T in example2.txt. This means the SNP has inconsistent alleles and is removed. When you run the following command you will see that the screen output reports that 1 SNP has been removed.

./meta \
--method 1 \
--cohort examples/example1.txt examples/example2.txt \
--output meta.txt
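
The three cases (a) to (c) can be illustrated with a small Python sketch. It is not META's actual algorithm, which this page does not describe; it simply tries the four combinations of strand flip and allele swap against a reference cohort, negates beta when the coded allele is swapped, and reports None when nothing matches. The names flip_strand and align are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flip_strand(allele):
    """Reverse-complement an allele; unknown symbols are left unchanged."""
    return "".join(COMPLEMENT.get(base, base) for base in reversed(allele))


def align(ref_alleles, alleles, beta):
    """Express beta relative to ref_alleles, or return None if irreconcilable.

    Both arguments are (allele_A, allele_B) pairs. Strand-ambiguous pairs
    such as A/T or C/G cannot be resolved from the alleles alone; this
    sketch takes the first matching candidate for them.
    """
    a, b = alleles
    candidates = [
        ((a, b), beta),                             # same strand, same order
        ((b, a), -beta),                            # case (b): order swapped
        ((flip_strand(a), flip_strand(b)), beta),   # case (a): other strand
        ((flip_strand(b), flip_strand(a)), -beta),  # other strand and swapped
    ]
    for pair, adjusted in candidates:
        if pair == tuple(ref_alleles):
            return adjusted
    return None                                     # case (c): inconsistent


# rs16969968: A/G in example1.txt, A/T in example2.txt (beta is hypothetical).
print(align(("A", "G"), ("A", "T"), 0.1))
```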
 

Threshold Imputation Quality

Most meta-analyses of genome-wide association studies are only possible due to imputation of genotypes. Imputation is not a perfect process and some SNPs and indels can be hard to impute. It has become standard to measure the quality of imputation at each SNP or indel via an information measure; see ref [4] for more details on these measures. These measures lie in the range [0,1]. An information measure close to 1 means that the imputed genotypes are predicted with high confidence. Many studies have chosen to only use SNPs that have an information measure above some value, typically around 0.4-0.6. By default, META combines p-values at SNPs with imputation measure ≥ 0.5. This can be changed by setting --threshold. For example, to produce a result based on SNPs with imputation quality score ≥ 0.9, use the command:

./meta \
--method 1 \
--threshold 0.9 \
--cohort examples/example1.txt examples/example2.txt \
--output meta.txt
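
The effect of the threshold can be pictured with the following Python sketch. It assumes that the filter is applied per cohort and per SNP, so that a SNP may be combined from fewer cohorts when it is poorly imputed in some of them; this page does not spell out that detail, and the function name usable_cohorts and the info values are hypothetical.

```python
def usable_cohorts(cohorts, threshold=0.5):
    """Map each rsid to the indices of cohorts where info >= threshold.

    `cohorts` is a list of dicts, one per cohort, mapping rsid to info score.
    """
    usable = {}
    for index, cohort in enumerate(cohorts):
        for rsid, info in cohort.items():
            if info >= threshold:
                usable.setdefault(rsid, []).append(index)
    return usable


# Hypothetical info scores for two cohorts.
cohorts = [
    {"rs1051730": 0.97, "rs578776": 0.45},
    {"rs1051730": 0.62, "rs6495307": 0.99},
]
print(usable_cohorts(cohorts))
print(usable_cohorts(cohorts, threshold=0.9))
```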

Specify sample size of each cohort

To use the z-statistics combination method (--method 3), the sample size of each cohort is required. In our example, the sample sizes of example1.txt and example2.txt are 100 and 120 respectively. They are specified with the --sample-size option, as in the following command:

./meta \
--method 3 \
--sample-size 100 120 \
--cohort examples/example1.txt examples/example2.txt \
--output meta.txt

Specify genomic control lambda for each cohort

The test statistics of each cohort can be inflated due to population structure. Therefore, the genomic inflation lambda of each cohort should be checked prior to the meta-analysis, and these lambdas should be supplied to the meta-analysis procedure to adjust the standard error of the effect size. To achieve this the --lambda option is used. For example, the following command specifies the genomic control lambdas of example1.txt and example2.txt as 1.05 and 1.08:

./meta \
--method 1 \
--lambda 1.05 1.08 \
--cohort examples/example1.txt examples/example2.txt \
--output meta.txt
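
If a cohort's lambda is not already known, it can be estimated from that cohort's p-values before running META. The Python sketch below uses the usual definition, the median of the 1-df chi-square statistics divided by the median of the null distribution (about 0.455), and shows how a lambda inflates a standard error by a factor of sqrt(lambda). It assumes two-sided p-values from 1-df tests; the function names are hypothetical and the calculation is not performed by META itself.

```python
import math
from statistics import NormalDist, median

_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
MEDIAN_CHI2_1DF = _NORMAL.inv_cdf(0.75) ** 2


def genomic_lambda(p_values):
    """Estimate the genomic inflation factor from two-sided p-values."""
    # A two-sided p-value p corresponds to chi2 = (inverse Normal of p/2)^2.
    chi2 = [_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF


def adjust_se(se, lam):
    """Inflate a standard error so that its variance is multiplied by lam."""
    return se * math.sqrt(lam)


# Hypothetical p-values; a real estimate should use all SNPs in a cohort.
lam = genomic_lambda([0.8, 0.45, 0.5, 0.3, 0.6])
print(lam, adjust_se(0.056914, 1.05))
```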

Select SNPs of interest

With the --rsid option, the analysis can be restricted to specific SNPs or indels. For example:

./meta \
--method 1 \
--cohort examples/example1.txt examples/example2.txt \
--rsid rs1051730 rs16969968 \
--output meta.txt

will output the meta-analysis results for two SNPs only: rs1051730 and rs16969968.

Select a subregion of SNPs

The --interval option specifies the lower and upper boundaries of the region of interest, in terms of base-pair positions. See the following example:

./meta \
--method 1 \
--cohort examples/example1.txt examples/example2.txt \
--interval 76500000 77000000 \
--output meta.txt

The output file will contain the results of SNPs in the region [76500000, 77000000).

Select Best SNPs

META can also report the best SNPs in terms of the combined P_value, using the --top-snp option. For example,

./meta \
--method 1 \
--cohort examples/example1.txt examples/example2.txt \
--top-snp 5 \
--output meta.txt

will give the top 5 SNPs (in ascending order of combined p-value) in meta.txt.

SNPTEST Mode

META is able to directly read the output of SNPTEST as input using the --snptest option. The format of SNPTEST output files changed with the release of SNPTEST v2.5. META will read both the old and new format files.

An example using output files from SNPTEST v2.5 is


./meta \
--snptest \
--method 1 \
--cohort examples/example7.txt examples/example8.txt \
--output meta.txt

An example using the old SNPTEST output format is

./meta \
--snptest \
--method 1 \
--cohort examples/example5.txt examples/example6.txt \
--output meta.txt

When using the --snptest option it may be that SNPTEST (pre v2.5) was run using the -method expected option, which uses the genotype dosages at imputed SNPs and means that there will be no _info column containing the relative information measure for the test being carried out. There is always an info column that measures the relative information about the allele frequency at the SNP or indel. The option --use_info_col tells META to use the info column in the SNPTEST output files. For example,

./meta \
--snptest \
--use_info_col \
--method 1 \
--cohort examples/example5.txt examples/example6.txt \
--output meta.txt

NOTE that if you want to use the --snptest option, you can only specify one model with option --frequentist in SNPTEST. 

Options

A complete set of options is given in the following table :

Parameter        Type                       Description
--method         Number (1 to 3)            Method used to combine p-values:
                                            1 = inverse variance method (based on fixed-effects model);
                                            2 = inverse variance method (based on random-effects model);
                                            3 = z-statistics combination method (based on fixed-effects model).
--cohort         Files                      A vector of formatted files.
--output         File                       Output file.
--snptest        Flag                       Optional, use the output of SNPTEST as input files.
--use_info_col   Flag                       Optional, specify use of info column in SNPTEST output files (use with --snptest).
--sample-size    Numbers                    Optional, a vector of sample sizes for each cohort.
                                            To use the z-statistics combination method (--method 3), sample sizes have to be given.
--lambda         Numbers                    Optional, a vector of genomic control lambdas for each cohort.
--threshold      Number (between 0 and 1)   Optional, define a threshold of imputation quality score (between 0 and 1), default value = 0.5.
--rsid           RSIDs                      Optional, RSIDs of SNPs of interest.
--interval       Two numbers                Optional, define a subset of SNPs by position (in base pairs) in the range start ≤ position ≤ end.
--top-snp        Number                     Optional, define the number of most significant SNPs that will be output.

Version History


Version   Release Time   Description
1.0       11-3-2010      First version made available:
  • Gzipped input support
  • Input files fully compatible with SNPTEST output
  • Based on fixed-effects model
1.1       20-9-2010      Changes from META v1.0:
  • Random-effects model is available
  • Standard input files support
1.2       20-11-2010     Changes from META v1.1:
  • Optimized the program structure
1.3       07-06-2011     Changes from META v1.2:
  • Abandoned the boost library, but gzipped input is still supported.
1.3.1     16-08-2011     Changes from META v1.3:
  • Changed the way column information is read from SNPTEST output
1.3.2     22-08-2011     Changes from META v1.3.1:
  • When using a standard input file, an optional column named "chr" is now allowed, which means that SNPs in one input file can be from different chromosomes. If the "chr" column is not specified, META assumes all SNPs in the input file are from the same chromosome.
1.4       19-12-2011
  • Handles indels as well as SNPs, i.e. can read alleles that are of arbitrary length.
  • Added a --use_info_col option. When using the --snptest option it may be that SNPTEST was run using the -method expected option, which uses the genotype dosages at imputed SNPs and means that there will be no _info column containing the relative information measure for the test being carried out. There is always an info column that measures the relative information about the allele frequency at the SNP or indel. This new option tells META to use this column.
1.5       29-03-2013
  • Fixed bug in --top-snp option. This now works correctly.
  • Improvements to how chromosome information is reported in output. When using SNPTEST files the chromosome info is now reported in the META output file. When using standard input files the chromosome is now reported in a separate column and is no longer added to the front of the base-pair position.
  • Fixed some syntax errors in examples on webpage
1.6       16-09-2014
  • Added support for reading output files from SNPTEST v2.5
1.7       17-08-2014
  • Added support for reading output files from SNPTEST v2.5.2
References

[1] J. Z. Liu et al. (2010) Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nature Genetics, 42 : 436-440
[2] J. Marchini, B. Howie, S. Myers, G. McVean and P. Donnelly (2007) A new multipoint method for genome-wide association studies via imputation of genotypes. Nature Genetics, 39 : 906-913
[3] P. de Bakker et al. (2008) Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Human Molecular Genetics, 17 : R122-R128
[4] J. Marchini and B. Howie (2010) Genotype imputation for genome-wide association studies. Nature Reviews Genetics

Contact Information

If you have a question please send an email to our mailing list

http://www.jiscmail.ac.uk/OXSTATGEN

You will need to subscribe to the mailing list to do this.

IMPORTANT : If you are having a problem with one of the programs please include details of the following when you email.
(a) the version number of the program and the type of computer you are running the program on e.g. SNPTEST v2.1.0 Mac OSX 10.6
(b) include the precise command line(s) you have used
(c) include any log file and/or screen output from the program
(d) sometimes it may be necessary for us to obtain a copy of the data you have so please be prepared to supply this. Otherwise, we may not be able to diagnose the problem.