Skip to content

Tutorials 3: Correction for multiple testing

inti edited this page Nov 15, 2010 · 2 revisions

[[Home|Home]] | [[Previous|Tutorials-2:-Running-FORGE]] | [[Next|Tutorials-4:-Gene-Set-Analyses]]

After you have doing a whole genome analysis you will need to correct for multiple testing. Correcting for multiple testing is a sensible issue and large body of literature is devoted to that. We will not attempt to go any close to detailed explanation of it. Go to our Tutorial on multiple testing correction for more information. We will show you how to use one of our utility scripts to obtain Local False Discovery Rates for the gene p-values which can be used to correct for multiple testing.

From our Downloads page obtain the R script Calculate_add_LFDR_values.R, I will assume that you have file with enough genes (may be at least ~ 100). For this you will need R and the module locfdr installed. Please check the R website (www.r-project.org) to get everything installed. Please do refer to the locfdr documentation or references for information on the statistics behind it.

The R script has 3 options: filein, fileout and header. You must provide the options as filein=filen_in_ame, fileout=file_out_name and header=T. Please do not leave blank spaces between the option the equal sign and the value and do not use hyphen (-) before this options. If the file in has no header just omit the header option.

To run the script type

 $ R --no-save --slave filein=my_whole_genome_output.txt fileout=my_new_file_with_LFDR_values.txt < Calculate_add_LFDR_values.R
ARGUMENT 'filein=my_whole_genome_output.txt' __ignored__
ARGUMENT 'fileout=my_new_file_with_LFDR_values.txt' __ignored__
[1] "/Library/Frameworks/R.framework/Resources/bin/exec/i386/R"
[2] "--no-save"
[3] "--slave"
[4] "filein=my_whole_genome_output.txt"
[5] "fileout=my_new_file_with_LFDR_values.txt"
assigned  /Library/Frameworks/R.framework/Resources/bin/exec/i386/R  the value of TRUE
assigned  --no-save  the value of TRUE
assigned  --slave  the value of TRUE
assigned  filein  the value of | my_whole_genome_output.txt |
assigned  fileout  the value of | my_new_file_with_LFDR_values.txt |
Loading required package: splines
Warning message:
In max(i) : no non-missing arguments to max; returning -Inf
Warning message:
In locfdr(qnorm(as.numeric(data_sidak), lower.tail = F), plot = 0) :
  f(z) misfit = 3.  Rerun with increased df
Warning message:
In max(i) : no non-missing arguments to max; returning -Inf
Warning message:
In locfdr(qnorm(as.numeric(data_galwey), lower.tail = F), plot = 0) :
  f(z) misfit = 1.5.  Rerun with increased df
NOTE: Warning messaged come from the locfdr package runs and can generally be ignored.

The fist two lines of the output file my_new_file_with_LFDR_values.txt are

 $ head my_new_file_with_LFDR_values.txt
V1	V2	V3	V4	V5	V6	V7	V8	V9	V10	V11	V12	V13	V14	V15	V16	new_sidak	FORGE_locfdr	sidak_locfdr	galwey_locfdr
ENSG00000233547	AL158212.1	processed_transcript	10	114710405	114711634	5.675e-13	6.243e-12	8.62788643266832e-26	139.08583	8.57632	5.53276530304478e-45	253.64921	7.82026	16	11	6.243e-12	4.60742877444518e-22	2.79750261497982e-05	1.06161231147605e-22

4 new columns have been added to the right hand side of the file: new_sidak: if the Sidak corrected p-value is 0 I will use the min p-value on the gene instead. FORGE_locfdr: LFDR value for the FORGE p-value sidak_locfdr: LFDR value for the Sidak gene p-values galwey_locfdr: LFDR value for the galwey gene p-values

By now you have whole genome gene-based p-values and a multiple testing correction. Your are ready to run some pathway and network analyses.

[[Home|Home]] | [[Previous|Tutorials-2:-Running-FORGE]] | [[Next|Tutorials-4:-Gene-Set-Analyses]]

Clone this wiki locally