-
Notifications
You must be signed in to change notification settings - Fork 5
Tutorials 3: Correction for multiple testing
[[Home|Home]] | [[Previous|Tutorials-2:-Running-FORGE]] | [[Next|Tutorials-4:-Gene-Set-Analyses]]
After you have doing a whole genome analysis you will need to correct for multiple testing. Correcting for multiple testing is a sensible issue and large body of literature is devoted to that. We will not attempt to go any close to detailed explanation of it. Go to our Tutorial on multiple testing correction for more information. We will show you how to use one of our utility scripts to obtain Local False Discovery Rates for the gene p-values which can be used to correct for multiple testing.
From our Downloads page obtain the R script Calculate_add_LFDR_values.R, I will assume that you have file with enough genes (may be at least ~ 100). For this you will need R and the module locfdr installed. Please check the R website (www.r-project.org) to get everything installed. Please do refer to the locfdr documentation or references for information on the statistics behind it.
The R script has 3 options: filein, fileout and header. You must provide the options as filein=filen_in_ame, fileout=file_out_name and header=T. Please do not leave blank spaces between the option the equal sign and the value and do not use hyphen (-) before this options. If the file in has no header just omit the header option.
To run the script type
$ R --no-save --slave filein=my_whole_genome_output.txt fileout=my_new_file_with_LFDR_values.txt < Calculate_add_LFDR_values.R ARGUMENT 'filein=my_whole_genome_output.txt' __ignored__ ARGUMENT 'fileout=my_new_file_with_LFDR_values.txt' __ignored__ [1] "/Library/Frameworks/R.framework/Resources/bin/exec/i386/R" [2] "--no-save" [3] "--slave" [4] "filein=my_whole_genome_output.txt" [5] "fileout=my_new_file_with_LFDR_values.txt" assigned /Library/Frameworks/R.framework/Resources/bin/exec/i386/R the value of TRUE assigned --no-save the value of TRUE assigned --slave the value of TRUE assigned filein the value of | my_whole_genome_output.txt | assigned fileout the value of | my_new_file_with_LFDR_values.txt | Loading required package: splines Warning message: In max(i) : no non-missing arguments to max; returning -Inf Warning message: In locfdr(qnorm(as.numeric(data_sidak), lower.tail = F), plot = 0) : f(z) misfit = 3. Rerun with increased df Warning message: In max(i) : no non-missing arguments to max; returning -Inf Warning message: In locfdr(qnorm(as.numeric(data_galwey), lower.tail = F), plot = 0) : f(z) misfit = 1.5. Rerun with increased df
NOTE: Warning messaged come from the locfdr package runs and can generally be ignored.
The fist two lines of the output file my_new_file_with_LFDR_values.txt are
$ head my_new_file_with_LFDR_values.txt V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 new_sidak FORGE_locfdr sidak_locfdr galwey_locfdr ENSG00000233547 AL158212.1 processed_transcript 10 114710405 114711634 5.675e-13 6.243e-12 8.62788643266832e-26 139.08583 8.57632 5.53276530304478e-45 253.64921 7.82026 16 11 6.243e-12 4.60742877444518e-22 2.79750261497982e-05 1.06161231147605e-22
4 new columns have been added to the right hand side of the file: new_sidak: if the Sidak corrected p-value is 0 I will use the min p-value on the gene instead. FORGE_locfdr: LFDR value for the FORGE p-value sidak_locfdr: LFDR value for the Sidak gene p-values galwey_locfdr: LFDR value for the galwey gene p-values
By now you have whole genome gene-based p-values and a multiple testing correction. Your are ready to run some pathway and network analyses.
[[Home|Home]] | [[Previous|Tutorials-2:-Running-FORGE]] | [[Next|Tutorials-4:-Gene-Set-Analyses]]