iSECA: (independent) SNP Effect Concordance Analysis
Nyholt DR (2014) SECA: SNP effect concordance analysis using genome-wide association summary results. Bioinformatics. 2014 Apr 1. [Epub ahead of print].
To run iSECA:Approach:
iSECA first aligns the independent single nucleotide polymorphism (SNP) effects in the first genome-wide association study (GWAS) summary results file (dataset1) with the SNP effects in the second GWAS results file (dataset2). All SNPs are assumed to be on the positive (forward) strand. iSECA's SNP effect alignment is based on the extensively used METAL (Willer et al 2010) source code. As iSECA requires both the effect allele (EA) and non-effect allele (NEA) for all SNPs, iSECA appropriately accounts for situations when different input files use different reference alleles. Alleles can be coded numerically (A=1,C=2,G=3,T=4) or alphabetically (A,C,G,T,a,c,g,t) and can be on either strand if not an A/T or C/G SNP. For A/T or C/G SNPs, iSECA assumes SNPs to be on a consistent (i.e., "positive" or "forward") strand in the two input files for the results to be interpretable. For other SNPs, iSECA can automatically identify and resolve strand inconsistencies.
Restricting to SNPs associated with P_{1} ≤ {0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} in dataset1, iSECA first performs exact binomial statistical tests using the R statistical package (R Development Core Team 2008) to determine whether there is an excess of SNPs associated in both datasets (dataset1 and dataset2) for the subset of SNPs associated with P_{2} ≤ {0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} in dataset2. For each of the 144 SNP subsets (generated using these 12 × 12 p-value threshold {P_{1}} ⋂ {P_{2}} combinations), binomial tests of associated SNPs in dataset1 (P_{1}) and dataset2 (P_{2}) are performed. A binomial test 'heatmap' plot is generated to graphically summarise the proportion of SNP subsets with an excess [observed (obs) ≥ expected (exp)] or deficit (obs < exp) number of associated SNPs, and an empirical (permuted) p-value is calculated for the observed number of subsets with nominally significant excess. That is, a permuted p-value (P_{BT-permuted}) for the observed number of binomial tests with obs ≥ exp and P_{BT} ≤ 0.05. We note that for polygenic traits, a well-powered GWAS should produce an excess of smaller p-values. Therefore, instead of specifying P_{2} as the expected proportion of SNPs with P_{1} and P_{2}, the proportion of SNPs associated in dataset2 (irrespective of its association in dataset1) is utilised as the 'overlap' (null) probability (i.e., 'expected proportion') in the binomial tests. For example, the observed proportion of total SNPs with P_{1} ≤ 1 and P_{2} ≤ 0.01 is used as the expected proportion of SNPs with P_{1} ≤ {0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} ⋂ P_{2} ≤ 0.01.
iSECA next performs Fisher's exact statistical tests to determine whether there is an excess of SNPs where the effect directions―regression coefficient (BETA) or odds ratio (OR) [OR = exp(BETA)]―are concordant across dataset1 and dataset2 for the subset of SNPs associated with P_{1} ≤ {0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} in dataset1 and P_{2} ≤ {0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} in dataset2. For each of the 144 SNP subsets (generated using these 12 × 12 p-value threshold {P_{1}} ⋂ {P_{2}} combinations), Fisher's exact statistical tests of SNP effects in dataset1 (BETA_{1} or OR_{1}) and dataset2 (BETA_{2} or OR_{2}) are performed on 2 × 2 tables containing the number of SNPs with both OR_{1} < 1 and OR_{2} < 1 [i.e., OR_{1} < 1 ⋂ OR_{2} < 1 (−−)], OR_{1} ≥ 1 ⋂ OR_{2} < 1 (+−), OR_{1} < 1 ⋂ OR_{2} ≥ 1 (−+), and OR_{1} ≥ 1 ⋂ OR_{2} ≥ 1 (++). A Fisher's test 'heatmap' plot is generated to graphically summarise the proportion of SNP subsets with concordant (Fisher's test odds ratio, OR_{FT} ≥ 1) and discordant (OR_{FT} < 1) SNP effects and an empirical (permuted) p-value is calculated for the observed number of subsets with nominally significant concordance. That is, a permuted p-value (P_{FT-permuted}) for the observed number of Fisher's tests with OR_{FT} ≥ 1 and P_{FT} ≤ 0.05.
An R script ('SECA_Ranalysis_HeatmapPerm.R') allowing users to increase the number of replicates (default = 1000) for the permutation of heatmap p-values using the "independent_aligned_effects.txt" file generated by iSECA can be downloaded here.
Restricting to SNPs nominally associated with P_{1} ≤ 0.05 in dataset1, iSECA next reports detailed results from binomial and Fisher's statistical tests to determine whether there is an excess of overlapping ('pleiotropic') SNPs and whether the effect directions are concordant (or discordant) across dataset1 and dataset2 for the subset of SNPs nominally associated with P_{2} ≤ 0.05 in dataset2.
iSECA next reports detailed binomial and Fisher's test results (where applicable) for two practical subsets: i) SNPs genome-wide significant (P_{1} ≤ 5 × 10^{-8}) in dataset1, and nominally associated (P_{2} ≤ 0.05) in dataset2; and ii) SNPs genome-wide suggestive (P_{1} ≤ 1 × 10^{-5}) in dataset1, and nominally associated (P_{2} ≤ 0.05) in dataset2
A 'grid' search is also performed across 150 p-value thresholds to identify the subset of SNPs overlapping dataset1 and dataset2 that produce the minimum binomial test ('pleiotropic') p-value and minimum Fisher's test (effect concordance) p-value. The search is performed at 5 equidistant p-value {P_{1}, P_{2}} increments for the interval [5 × 10^{-8}, 9 × 10^{-8}], i.e., {5 × 10^{-8}, 6 × 10^{-8}, 7 × 10^{-8}, 8 × 10^{-8}, 9 × 10^{-8}}; 9 equidistant increments for each of the intervals [1 × 10^{-7}, 9 × 10^{-7}], [1 × 10^{-6}, 9 × 10^{-6}], [1 × 10^{-5}, 9 × 10^{-5}], [1 × 10^{-4}, 9 × 10^{-4}], [1 × 10^{-3}, 9 × 10^{-3}]; and 100 increments of 0.01 between 0.01 and 1.0 [0.01, 1.0]. The results from all 150 × 150 = 22,500 SNP subsets are output to enable post-hoc examination of SNP overlap and concordance across the full range of statistical significance in dataset1 and dataset2.
Additionally, iSECA generates quantile-quantile (Q-Q) and true discovery rate (TDR) plots for dataset2 p-values (P_{2}) stratified (conditioned) on dataset1 p-values (P_{1} ≤ {0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0}) to visualise whether there is an excess of overlapping ('pleiotropic') SNPs. A leftward shift in the Q-Q or TDR curve corresponds to an excess of SNPs with smaller p-values. In the presence of pleiotropy, we expect the Q-Q and TDR curves for dataset2 SNP p-values (P_{2}) to deviate further left of the identity line as we condition on them having smaller p-values in dataset1 (P_{1}).
Finally, 'pleiotropy-informed' conditional false discover rate (FDR) results are output for dataset2 p-values (P_{2}) stratified on dataset1 p-values (P_{1} ≤ {1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001}). These results may identify SNPs associated in dataset2 after conditioning on their association significance (P_{1}) in dataset1 and/or prioritise SNPs for follow-up studies.
Input format:
The GWAS summary results should be uploaded as a text file with five
essential columns (tab or space-delimited).
Additional columns may be present, but they will not be used in the
analyses.
Each dataset file must contain the following five columns of data with
a header row listing (case insensitive) column names of 'SNP', 'EA',
'NEA', 'P', and 'BETA' or 'OR':
Column name | Description of data in column |
'SNP' | SNP name (i.e., rsID). Column names of 'MARKER', 'SNPID' or 'rs_number' are also accepted. |
'EA' | Effect allele. Column names of 'A1', 'REF' or 'reference_allele' are also accepted. |
'NEA' | Non-effect allele. Column names of 'A2', 'ALT' or 'other_allele' are also accepted. |
'P' | P-value from the association test. Column names of 'PVALUE', 'PVAL' or 'p-value' are also accepted. |
'BETA' or 'OR' | Regression coefficient (BETA) [BETA=log_{e}(OR)] or odds ratio (OR) [OR=exp(BETA)], for the effect allele (EA) relative to the non-effect allele (NEA). For example, a positive BETA (OR > 1) means EA increases risk relative to NEA. iSECA will automatically convert OR values to BETA values. Column name of 'ZSCORE' is also accepted. |
References:
Nyholt DR (2014) SECA: SNP effect concordance analysis using genome-wide association summary results. Bioinformatics. 2014 Apr 1. [Epub ahead of print].
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007 81(3):559-75.
R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
iSECA's SNP effect alignment is based on METAL (Willer et al 2010) source code (generic-metal-2011-03-25.tar.gz).
Willer CJ, Li Y,
Abecasis GR. METAL: fast and efficient meta-analysis
of genomewide association scans. Bioinformatics. 2010 26(17):2190-1.
Resources:
An R script ('SECA_Ranalysis_HeatmapPerm.R') allowing users to increase
the number of replicates (default = 1000) for the permutation of heatmap
p-values using the "independent_aligned_effects.txt" file generated by SECA
can be downloaded here.
iSECA is coded by Dale R. Nyholt.
Copyright (C) 2013, Dale R. Nyholt. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission.
4. We also request that use of this software be cited in publications as: Nyholt DR (2014) SECA: SNP effect concordance analysis using genome-wide association summary results. Bioinformatics. 2014 Apr 1. [Epub ahead of print].
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Page last updated April 16, 2014.The Neurogenetics Laboratory | The Queensland Institute of Medical Research | Contact Us |