HOME
FORGE is a tool for gene-based Genome-Wide Association Studies (GWAS). It combines the information from multiple genetic variants into a single gene-level statistic. We have shown that this provides additional power to detect true disease loci and that it is useful for pathway and network analyses (Pedroso et al., submitted). If you are interested in learning how to use FORGE, please refer to our Tutorial section; if you already know how to use it, go straight to the Downloads page. Check the references below for more details.
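To illustrate what combining several variants "into a single statistic" means, here is a minimal Python sketch of Fisher's method, one classical way to merge per-SNP p-values into a gene-level p-value. It assumes the SNPs are independent, which linkage disequilibrium (LD) breaks. FORGE's own statistics and its LD correction via simulations on HapMap genotypes are not reproduced here, so treat this as an illustration rather than the FORGE implementation.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom, where k is the number of p-values.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form survival function for a chi-square with even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)
```

For example, `fisher_combine([0.01, 0.04, 0.2])` returns a gene-level p-value below the smallest input p-value. Because correlated SNPs violate the independence assumption, this naive version is anti-conservative in LD-rich regions, which is why an LD-aware correction is needed.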
To run FORGE, download it and prepare the following:
- Make the SNP input file by extracting the SNP ID and p-value for each disease, keeping only SNPs with MAF >= 0.05 and info >= 0.80. The format is the SNP ID followed by the p-value, space separated.
- The gene-set files are the *.gmt files provided.
- You will also need the SNP-to-gene mapping files, either the ones provided (ENS64_snp2gene_500kb) or your own.
- You will need the HapMap PLINK binary files for the population most closely related to yours, so that FORGE can run simulations for more accurate LD correction.

There are 2 ways to run FORGE:
1) Two steps: generate a score for each gene, then use these scores as input for the gene-set analysis.
2) One step: each pathway is treated as one big gene. This is very computationally intensive!

FORGE 1.1 Gene p-values
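The -assoc file used in the commands below is the SNP input file described in the first requirement above. A minimal Python sketch for building it from a whitespace-delimited summary-statistics file, assuming a header row with columns named SNP, P, MAF and INFO (hypothetical names, not a FORGE convention; adjust them to your own files):

```python
def filter_snps(lines, min_maf=0.05, min_info=0.80,
                snp_col="SNP", p_col="P", maf_col="MAF", info_col="INFO"):
    """Return 'SNP_ID p-value' strings for SNPs passing the MAF and info filters.

    `lines` is any iterable of text lines whose first line is a header.
    The column names are hypothetical defaults.
    """
    rows = iter(lines)
    header = next(rows).split()
    snp_i, p_i, maf_i, info_i = (
        header.index(c) for c in (snp_col, p_col, maf_col, info_col)
    )
    kept = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        maf = float(fields[maf_i])
        info = float(fields[info_i])
        if maf >= min_maf and info >= min_info:
            # Keep the p-value string exactly as written in the input.
            kept.append(f"{fields[snp_i]} {fields[p_i]}")
    return kept
```

Writing the returned lines, newline-separated, to scz.info_80.snp.pvalue gives the file used in the example. If your summary statistics report the effect-allele frequency rather than the minor allele frequency, convert it with min(f, 1 - f) before filtering.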
This is an example command for generating gene scores for the scz dataset, restricted to chromosome 1 genes, with distance_five_prime=35kb and distance_three_prime=10kb:
perl forge.pl -bfile hapmap_CEU_r23a_filtered -assoc scz.info_80.snp.pvalue -out scz.gene_pvalues -gene_type protein_coding -chr 1 -snpmap ENS64_snp2gene.500kb.chr.1 -mnd -mnd_max 100000 -gc_correction -distance_five_prime 35 -distance_three_prime 10

Once this has been done for each chromosome (or however you chose to split the data and the SNP mapping files), concatenate all the output files into one results file per disease, containing the Ensembl gene ID followed by the p-value. This file is the input for the next step, the gene-set analysis.
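Because the per-gene step is run once per chromosome and the results are then concatenated, the bookkeeping can be scripted. The Python sketch below builds the forge.pl argument list for each chromosome using the flags from the example above, and merges the per-chromosome outputs into "Ensembl ID p-value" lines. The per-chromosome file names (scz.gene_pvalues.chrN for the output, ENS64_snp2gene.500kb.chr.N for the mapping), the column positions and the presence of a header line are assumptions; check them against your own files and the actual forge.pl output.

```python
def forge_gene_command(chrom, assoc="scz.info_80.snp.pvalue",
                       bfile="hapmap_CEU_r23a_filtered",
                       out_prefix="scz.gene_pvalues"):
    """Build the forge.pl argument list for one chromosome.

    Flags are copied from the example command; the per-chromosome
    output and SNP-map file names are a hypothetical naming scheme.
    """
    return [
        "perl", "forge.pl",
        "-bfile", bfile,
        "-assoc", assoc,
        "-out", f"{out_prefix}.chr{chrom}",
        "-gene_type", "protein_coding",
        "-chr", str(chrom),
        "-snpmap", f"ENS64_snp2gene.500kb.chr.{chrom}",
        "-mnd", "-mnd_max", "100000",
        "-gc_correction",
        "-distance_five_prime", "35",
        "-distance_three_prime", "10",
    ]


def concatenate_gene_pvalues(chunks, id_col=0, p_col=1, has_header=True):
    """Merge per-chromosome results into 'ENS_ID p-value' lines.

    `chunks` is an iterable of line lists, one per chromosome file.
    Column positions and header handling are assumptions.
    """
    merged = []
    for lines in chunks:
        body = lines[1:] if has_header else lines
        for line in body:
            fields = line.split()
            if fields:  # skip blank lines
                merged.append(f"{fields[id_col]} {fields[p_col]}")
    return merged
```

Each command list can be passed to `subprocess.run(cmd, check=True)` for chromosomes 1 to 22; after reading each output file into a list of lines, `concatenate_gene_pvalues` returns the lines of the single per-disease file.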
FORGE 1.2 Gene set analysis
Gene-set analysis of KEGG pathways for SCZ, using the gene p-values from step 1.1:
perl gsa.pl -gmt KEGG.gmt -file scz.gene_pvalues -out scz.KEGG -min_size 2 -max_size 200 -gc_correction -mnd -mnd_n 100000

FORGE 2.0 Gene set analysis in one step, treating each pathway as one big gene
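Both gsa.pl (-min_size/-max_size) and the one-step forge.pl run (-gmt_min_size/-gmt_max_size) restrict the analysis to gene sets within a size range. The Python sketch below shows the idea: it parses a GMT file (one gene set per line: name, description, then gene IDs, tab separated) and keeps the sets whose size falls in the range. Counting only genes that actually have a p-value is an assumption made for this sketch; the FORGE scripts may count set size differently.

```python
def read_gmt(lines):
    """Parse GMT lines of the form: name <tab> description <tab> gene1 <tab> gene2 ..."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        name, genes = fields[0], fields[2:]
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def filter_by_size(gene_sets, tested_genes, min_size=2, max_size=200):
    """Keep sets whose number of tested genes lies in [min_size, max_size]."""
    kept = {}
    for name, genes in gene_sets.items():
        present = genes & tested_genes  # genes that have a gene p-value
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept
```

The defaults mirror the values used in the example commands (2 and 200).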
# *.gmt corresponds to the file for KEGG
# -snpmap *file* NOTE THIS HAS TO BE THE ALL CHROMOSOMES FILE
perl forge.pl -bfile hapmap_CEU_r23a_filtered -assoc scz.info_80.snp.pvalue -gene_type gene_set -gmt KEGG.gmt -out scz.KEGG -snpmap ENS64_snp2gene.500kb.chr_all -mnd -mnd_max 100000 -gc_correction -gmt_min_size 2 -gmt_max_size 200 -distance_five_prime 35 -distance_three_prime 10

If you wish to contact us about bugs, questions on how to use FORGE, development of this software or collaborations, please send an e-mail to Inti Pedroso: intipedroso <at> gmail <dot> com or Gerome Breen: gerome.breen <at> kcl <dot> ac <dot> uk
Download the FORGE software as a [[.tar.gz|https://github.com/inti/FORGE/tarball/master]] or [[.zip|https://github.com/inti/FORGE/zipball/master]] file. You can also clone the software repository with Git. Assuming you have Git installed on your system, you can download the most up-to-date version of the code into a folder called FORGE with:
$ git clone -b master https://github.com/inti/FORGE.git
To update the code later, go to the same directory and run:
$ git pull
Download the SNP-to-Gene mapping based on the Ensembl human genome annotation:
- Download the Perl script to perform a new SNP-to-Gene mapping and check [[our tutorial|Tutorials-1:-Make-snp-to-gene-mapping]].
- [[Ensembl database version 59|https://compbio.brc.iop.kcl.ac.uk:8443/forgeweb/SNPtoGeneAnnotation.tar.gz]] (approx. 6.8 GB)
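The idea behind a SNP-to-gene mapping with 5' and 3' margins can be sketched as follows, assuming each gene is extended upstream by distance_five_prime and downstream by distance_three_prime (in kb, as in the forge.pl example), taking the strand into account, and that a SNP is assigned to every gene whose extended window contains it. This is a minimal Python illustration with hypothetical types and names, not the Perl mapping script distributed with FORGE; for genome-wide data an interval tree or a sorted sweep would replace the naive double loop.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    chrom: str
    start: int   # lowest coordinate of the gene, 1-based inclusive
    end: int     # highest coordinate of the gene, 1-based inclusive
    strand: int  # +1 for forward, -1 for reverse


def gene_window(gene, five_prime_kb, three_prime_kb):
    """Return the (start, end) window of a gene extended by its 5' and 3' margins."""
    up = five_prime_kb * 1000
    down = three_prime_kb * 1000
    if gene.strand >= 0:
        # Forward strand: the 5' end sits at the lower coordinate.
        return max(1, gene.start - up), gene.end + down
    # Reverse strand: the 5' end sits at the higher coordinate.
    return max(1, gene.start - down), gene.end + up


def map_snps_to_genes(snps, genes, five_prime_kb=35, three_prime_kb=10):
    """Map (snp_id, chrom, position) tuples to the IDs of genes whose window contains them."""
    windows = [
        (g.gene_id, g.chrom, *gene_window(g, five_prime_kb, three_prime_kb))
        for g in genes
    ]
    mapping = {}
    for snp_id, chrom, pos in snps:
        mapping[snp_id] = [
            gid for gid, gchrom, lo, hi in windows
            if gchrom == chrom and lo <= pos <= hi
        ]
    return mapping
```

A SNP that falls in several overlapping windows is assigned to all of the corresponding genes, and a SNP outside every window maps to an empty list.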
Download the gsa.pl script to perform the gene-set enrichment analysis. Files are available as [[.tar.gz|https://github.com/inti/GeneSetAnalysis/tarball/master]] or [[.zip|https://github.com/inti/GeneSetAnalysis/zipball/master]]. Read [[our tutorial|Tutorials-4:-Gene-Set-Analyses]] on GSA analysis with the FORGE gene p-values and check the [[GSA software wiki page|https://github.com/inti/GeneSetAnalysis/wiki]] for more complete documentation.
[[Calculate_add_LFDR_values.R|https://github.com/inti/FORGE/blob/master/Calculate_add_LFDR_values.R]]: R script to add local FDR values to the FORGE output. The script is distributed as part of the FORGE folder. See the corresponding section of our [[Running FORGE|Tutorials-2:-Running-FORGE]] tutorial for instructions.
Check out our [[development branch|https://github.com/inti/FORGE/tree/dev]], download it and start hacking. We highly encourage you to use Git for your FORGE development and to fully test your code before pushing it to the master branch. Please contact us if you wish to coordinate your development with us.

- Inti Pedroso, Anbarasu Lourdusamy, Marcella Rietschel, Markus M Nöthen, Sven Cichon, Peter McGuffin, Ammar Al-Chalabi, Michael R. Barnes, Gerome Breen. Common genetic variants and gene-expression changes associated with bipolar disorder are over-represented in brain signaling pathway genes. Biological Psychiatry (2012). [[Article|http://dx.doi.org/10.1016/j.biopsych.2011.12.031]]
- Inti Inal Pedroso, Michael R Barnes, Anbarasu Lourdusamy, Ammar Al-Chalabi, Gerome Breen. FORGE: multivariate calculation of gene-wide p-values from Genome-Wide Association Studies. bioRxiv (2015). [[Article|http://biorxiv.org/content/early/2015/07/31/023648]]
- Furney, S J, Simmons, A, Breen, G, Pedroso, I, Lunnon, K, Proitsi, P, Hodges, A, Powell, J, Wahlund, L-O, Kloszewska, I, Mecocci, P, Soininen, H, Tsolaki, M, Vellas, B, Spenger, C, Lathrop, M, Shen, L, Kim, S, Saykin, A J, Weiner, M W, Lovestone, S on behalf of the Alzheimer’s Disease Neuroimaging Initiative and the AddNeuroMed Consortium. Genome-wide association with MRI atrophy measures as a quantitative trait locus for Alzheimer’s disease. Mol Psychiatry (2010). [[Article|http://dx.doi.org/10.1038/mp.2010.123]]
- Wang K, Li M, Hakonarson H (2010) Analysing biological pathways in genome-wide association studies. Nat Rev Genet 11: 843-854.
- Pedroso I: Gaining a pathway insight into genetic association data. Edited by Breen G, Barnes MR. Clifton, N.J., USA, Humana Press, 2010, pp 373–382.
- Pedroso I, Breen G: Gene Set Analysis and Network Analysis for Genome-Wide Association Studies. Edited by Al-Chalabi A, Almasy L. Cold Spring Harbor Laboratory Press, 2009.
- Kim SY, Volsky DJ. PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics. 2005;6:144.
- Ideker T, Ozier O, Schwikowski B, Siegel AF: Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 2002, 18 Suppl 1:S233-40.
- Human Protein Reference Database: [[Website|http://www.hprd.org]]
- Keshava et al: Human Protein Reference Database--2009 update. Nucleic Acids Res 2009;37:D767–72.
- Ensembl API: [[Website|http://www.ensembl.org/info/docs/api/index.html]]
- Rios D, McLaren WM, Chen Y, Birney E, Stabenau A, Flicek P, Cunningham F: A database and API for variation, dense genotyping and resequencing data. BMC Bioinformatics 2010, 11(1):238.