Correlations of Columns >> “Corr” + “Col” >> “Coracle”
Install coracle from the raykrajci/coracle repository on
GitHub with devtools or your package manager of
choice:
devtools::install_github("raykrajci/coracle")
library(coracle)

The main function of coracle is corr_col(), which computes pairwise
correlations between all numeric columns of two data frames. It returns a
named list containing tibbles of correlation coefficients (rho) and p values
(p), plus a list of metadata (meta).
For example:
df1 <- data.frame(i_1 = letters[1:5],
                  up_1 = 1:5,
                  down_1 = 5:1,
                  random_1 = runif(5))
df2 <- data.frame(i_2 = letters[1:5],
                  up_2 = 1:5,
                  down_2 = 5:1,
                  random_2 = runif(5))
results <- coracle::corr_col(x = df1, y = df2)
results$rho
#> # A tibble: 3 × 4
#> x up_2 down_2 random_2
#> <chr> <dbl> <dbl> <dbl>
#> 1 up_1 1 -1 -0.2
#> 2 down_1 -1 1 0.2
#> 3 random_1 -0.1 0.1 0.6
results$p
#> # A tibble: 3 × 4
#> x up_2 down_2 random_2
#> <chr> <dbl> <dbl> <dbl>
#> 1 up_1 3.97e-24 1.12e-23 0.747
#> 2 down_1 1.12e-23 3.97e-24 0.747
#> 3 random_1 8.73e- 1 8.73e- 1 0.285
results$meta
#> $version
#> [1] "3.2.1"
#>
#> $execution_time
#> Time difference of 0.03326797 secs
#>
#> $options
#> $options$x_join
#> NULL
#>
#> $options$y_join
#> NULL
#>
#> $options$x_labl
#> NULL
#>
#> $options$y_labl
#> NULL
#>
#> $options$progress
#> NULL

Observe:
- The function assumes that the column used to join the two inputs is in
the first position of each. This can be overridden with the options
parameter discussed below.
- The numeric columns of the first input (x) are listed as values in the
first column, while those of the second input (y) become the column names.
This convention may be useful to streamline subsequent correlations.
- The meta list records the package version, execution time, and options
used, which is useful for understanding and/or debugging the output.
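To make the pairing concrete, here is a minimal base-R sketch of a column-by-column correlation between two data frames. It assumes that the join key is the first column of each input (as described above), that the numeric column names of x and y are distinct, and that a Pearson test via cor.test() is appropriate; the README does not state which method corr_col() uses. The helper pairwise_col_cor() is hypothetical, not part of coracle, and returns plain matrices rather than tibbles.

```r
# Hypothetical helper (NOT part of coracle): a base-R sketch of a
# column-by-column correlation between two data frames.
# Assumptions: the join key is the first column of each input, the numeric
# column names of x and y are distinct, and a Pearson test is appropriate.
pairwise_col_cor <- function(x, y) {
  # Inner-join the inputs on their first columns so rows are paired by key.
  joined <- merge(x, y, by.x = names(x)[1], by.y = names(y)[1])

  # Numeric columns of each input, excluding the join keys.
  x_cols <- names(x)[-1][vapply(x[-1], is.numeric, logical(1))]
  y_cols <- names(y)[-1][vapply(y[-1], is.numeric, logical(1))]

  # x columns become rows, y columns become columns.
  rho <- matrix(NA_real_, nrow = length(x_cols), ncol = length(y_cols),
                dimnames = list(x_cols, y_cols))
  p <- rho

  for (i in x_cols) {
    for (j in y_cols) {
      test <- cor.test(joined[[i]], joined[[j]], method = "pearson")
      rho[i, j] <- unname(test$estimate)
      p[i, j] <- test$p.value
    }
  }
  list(rho = rho, p = p)
}

df1 <- data.frame(i_1 = letters[1:5], up_1 = 1:5, down_1 = 5:1,
                  random_1 = runif(5))
df2 <- data.frame(i_2 = letters[1:5], up_2 = 1:5, down_2 = 5:1,
                  random_2 = runif(5))
res <- pairwise_col_cor(df1, df2)
round(res$rho, 2)
```

Rows come from x and columns from y, mirroring the layout of results$rho. If x and y shared a column name, merge() would add .x/.y suffixes, which is why the sketch assumes distinct names.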
The corr_col() function also accepts a named list through its options
parameter. These values specify which columns of x and y to use for joining
(x_join, y_join), how to label the outputs (x_labl, y_labl), and whether to
report progress (progress).
For example:
results_w_options <- coracle::corr_col(x = df1,
                                       y = df2,
                                       options = list(
                                         x_join = "up_1",
                                         y_join = "down_2",
                                         x_labl = "x_data",
                                         y_labl = "y_data",
                                         progress = TRUE
                                       ))
results_w_options$rho
#> # A tibble: 3 × 4
#> x_data up_2 down_2 random_2
#> <chr> <dbl> <dbl> <dbl>
#> 1 up_1 -1 1 0.2
#> 2 down_1 1 -1 -0.2
#> 3 random_1 0.1 -0.1 -0.4
results_w_options$p
#> # A tibble: 3 × 4
#> x_data up_2 down_2 random_2
#> <chr> <dbl> <dbl> <dbl>
#> 1 up_1 1.12e-23 3.97e-24 0.747
#> 2 down_1 3.97e-24 1.12e-23 0.747
#> 3 random_1 8.73e- 1 8.73e- 1 0.505
results_w_options$meta
#> $version
#> [1] "3.2.1"
#>
#> $execution_time
#> Time difference of 0.0190711 secs
#>
#> $options
#> $options$x_join
#> [1] "up_1"
#>
#> $options$y_join
#> [1] "down_2"
#>
#> $options$x_labl
#> [1] "x_data"
#>
#> $options$y_labl
#> [1] "y_data"
#>
#> $options$progress
#> [1] TRUE

Observe:
- Joining the “up” column from x to the “down” column from y pairs the rows
of y in reverse order, so the correlation results are inverted (i.e. the
“up” columns now have a negative correlation with each other and a positive
correlation with the “down” columns). Because the rows are re-paired, the
correlations between the random columns also change.
- The first column is now named “x_data”, as specified in options. The label
for the y columns (“y_data”) can be found in the meta list.
- The meta list now contains the options input, which helps with
understanding the input and debugging.
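The inversion can be reproduced without coracle. The sketch below uses base R's merge() as a stand-in for the join; this is an assumption about how corr_col() pairs rows, not its actual implementation. The val_1 and val_2 columns hold fixed illustrative values in place of runif() output so the result is repeatable.

```r
# Illustration only: plain base-R merge(), not coracle's internal code.
# Assumption: corr_col() pairs rows the way an inner join on the chosen
# x_join / y_join columns would.
x <- data.frame(up_1 = 1:5, down_1 = 5:1, val_1 = c(0.3, 0.8, 0.1, 0.6, 0.4))
y <- data.frame(up_2 = 1:5, down_2 = 5:1, val_2 = c(0.5, 0.2, 0.9, 0.7, 0.4))

# Rows matched in their original order.
cor(x$up_1, y$up_2)              # 1
cor(x$val_1, y$val_2)

# Join x$up_1 to y$down_2: the x row with up_1 == 1 meets the y row with
# down_2 == 1 (the last row of y), so y's rows arrive in reverse order.
joined <- merge(x, y, by.x = "up_1", by.y = "down_2")
joined$up_2                      # 5 4 3 2 1
cor(joined$up_1, joined$up_2)    # -1
cor(joined$down_1, joined$up_2)  # 1
cor(joined$val_1, joined$val_2)  # differs from the unjoined value
```

Reversing the row order of y flips the sign of every correlation involving up_2 or down_2, because those columns are linear in the row index, while correlations with unrelated columns generally change to a different value. This matches random_1 and random_2 reporting different correlations in the two examples.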
