stepcount/README.Rmd at main · jhuwit/stepcount · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  cache = TRUE
)
stepcount::unset_reticulate_python()
stepcount::use_stepcount_condaenv()
library(stepcount)
library(reticulate)
options(digits.secs = 3)
```

# stepcount

<!-- badges: start -->
[![R-CMD-check](https://github.com/jhuwit/stepcount/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jhuwit/stepcount/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/jhuwit/stepcount/graph/badge.svg)](https://app.codecov.io/gh/jhuwit/stepcount)
<!-- badges: end -->

The goal of `stepcount` is to wrap up the https://github.com/OxWearables/stepcount algorithm.

# Installation

## Install `stepcount` Python Module

See https://github.com/OxWearables/stepcount?tab=readme-ov-file#install for how to install the `stepcount` python module.

In `R`, you can do this via:

```{r, eval = FALSE}
envname = "stepcountblah"
reticulate::conda_create(envname = envname, packages = c("python=3.9", "openjdk", "pip"))
Sys.unsetenv("RETICULATE_PYTHON")
reticulate::use_condaenv(envname)
reticulate::py_install("stepcount", envname = envname, method = "conda", pip = TRUE)
```

Once this is finished, you should be able to check this via:

```{r, eval = FALSE}
stepcount::have_stepcount_condaenv()
stepcount::have_stepcount()
```

In some cases, you ay want to set `RETICULATE_PYTHON` variable:

```{r, eval = FALSE}
stepcount::unset_reticulate_python()
clist = reticulate::conda_list()
Sys.setenv(RETICULATE_PYTHON = clist$python[clist$name == "stepcount"])
stepcount::use_stepcount_condaenv()
```


### Issues
If you are using the Random Forest model from `stepcount`, you may need `hmmlearn<0.3.0` due to some issues with its new implementation of its models as described https://github.com/OxWearables/stepcount/issues/62 (Feb 2024).

## Install `stepcount` R Package

You can install the development version of `stepcount` from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("jhuwit/stepcount")
```


# Usage

## Loading the `stepcount` conda environment
In order to use the `stepcount` conda environment that OxWearables recommends, you must run this command **before `reticulate`** is loaded:
```{r eval = FALSE}
stepcount::use_stepcount_condaenv()
```

## The `RETICULATE_PYTHON` environment variable
If you have the `RETICULATE_PYTHON` environment variable in your `.Renviron` or your `PATH`, then `reticulate` will still use that version of Python and the code will likely not work.  The `unset_reticulate_python()` function will unset that environment variable.  So the usage would start with something like:

```{r eval = FALSE}
stepcount::unset_reticulate_python()
stepcount::use_stepcount_condaenv()
```

and if you need `reticulate`, you would load it after

```{r eval = FALSE}
stepcount::unset_reticulate_python()
stepcount::use_stepcount_condaenv()
library(reticulate)
```

The `stepcount_check` function can determine if the `stepcount` module can be loaded
```{r stepcount_check}
stepcount::stepcount_check()
```

## Running `stepcount` (file)
The main function is `stepcount::stepcount`, which takes can take in a file directly:

```{r run_stepcount_file}
library(stepcount)
library(dplyr)
library(ggplot2)
library(tidyr)
file = system.file("extdata/P30_wrist100.csv.gz", package = "stepcount")
if (stepcount_check()) {
  out = stepcount(file = file)
}
```

Let's see inside the output, which is a list of values, namely a `data.frame` of `steps` with the time (in 10s increments) and the number of steps in those 10 seconds, a `data.frame` named `walking` which has indicators for if there is walking within that 10 second period:

```{r stepcount_output}
names(out)
str(out)
head(out$steps)
tail(out$steps)
tail(out$walking)
```

The `step_times` `data.frame` indicates which times are when steps occurred (at the original sample rate).  Make sure you have the `digits.secs` option set to see the sub-seconds for the times (esp for writing out files in `readr::write_csv`):

```{r stepcount_output_steptimes}
head(out$step_times)
options(digits.secs = 3)
head(out$step_times)
```

We can plot a portion of the tri-axial data and show where the steps were indicated:
```{r stepcount_plot}
df = readr::read_csv(file)
if (stepcount_check()) {
  st = out$step_times
    dat = df[10000:12000,] %>%
      dplyr::select(-annotation) %>%
      tidyr::gather(axis, value, -time)
    st = st %>%
      dplyr::mutate(time = lubridate::as_datetime(time)) %>%
      dplyr::as_tibble()
    st = st %>%
      dplyr::filter(time >= min(dat$time) & time <= max(dat$time))
    dat %>%
      ggplot2::ggplot(ggplot2::aes(x = time, y = value, colour = axis)) +
      ggplot2::geom_line() +
      ggplot2::geom_vline(data = st, ggplot2::aes(xintercept = time))
}
```

The main caveat is that `stepcount` is very precise in the format of the data, primarily it must have the columns `time`, `x`, `y`, and `z` in the data.

## Running `stepcount` (data frame)
Alternatively, you can pass out a `data.frame`, rename the columns to what you need them to be and then run `stepcount` on that:

```{r run_stepcount_df}
head(df)
out_df = stepcount(file = df)
```

Which gives same output for this data:

```{r stepcount_file_df_check, dependson=c("run_stepcount_file", "run_stepcount_df")}
all.equal(out[c("steps", "walking", "step_times")],
          out_df[c("steps", "walking", "step_times")])
```


## Running `stepcount` on multiple files
When you pass in multiple files, `stepcount` will run all of them, but it will only load the model once, which can have savings, but the results are still in memory:
```{r multifile}
if (stepcount_check()) {
  out2 = stepcount(file = c(file, file))
  length(out2)
  names(out2)
  # all.equal(out[c("steps", "walking", "step_times")],
  #           out2[[1]][c("steps", "walking", "step_times")])
}
```


## Stepcount Random Forest
The `model_type` parameter indicates the type of model being run, and the `rf` will provide the predictions from a random forest

```{r run_stepcount_rf, eval = FALSE}
if (stepcount_check()) {
  out_rf = stepcount(file = file, model_type = "rf")
}
```