-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathREADME.Rmd
More file actions
197 lines (137 loc) · 8.43 KB
/
README.Rmd
File metadata and controls
197 lines (137 loc) · 8.43 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
---
output: github_document
bibliography: "inst/REFERENCES.bib"
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
# Please put your title here to include it in the file below.
Title <- "Title of your paper goes here"
```
# pmird
<!-- [ -->
```{r pmird_icon, echo = FALSE, eval = FALSE}
knitr::include_graphics("./man/figures/pmird_icon.svg")
```
<!--]{style="float:right"} -->
**R interface to the peatland mid infrared database**
<!-- badges: start -->
[](https://www.tidyverse.org/lifecycle/#experimental)
<!-- badges: end -->
<!-- [](https://mybinder.org/v2/gh///master?urlpath=rstudio) -->
pmird is an interface to the peatland mid infrared spectra database (pmird database) [@Teickner.2025c]. 'pmird' allows to access the pmird database as [`dm`](https://dm.cynkra.com/reference/dm.html) object.
The pmird package can be installed as follows:
```{r, eval=FALSE}
remotes::install_github("henningte/pmird")
```
The pmird database can be downloaded from Zotero [@Teickner.2025c]. The downloaded database needs to be imported in a running MariaDB instance. In a linux terminal, the downloaded sql file can be imported like so:
```{bash, eval=FALSE}
mysql -u<user> -p<password> pmird < pmird-backup-2025-04-03.sql
```
Here, <user> and <password> are the respective database user name and password.
The database itself does not contain the infrared spectra. These data are in folder `pmird_prepared_data` which needs to be stored at any place in the file system.
Once the database is set up and runs in a 'MariaDB' instance, it can be accessed from within R, using the 'RMariaDB' package [@Muller.2021a]:
```{r out-pmird-r-package-connect-db, eval=TRUE, echo=TRUE}
library(pmird)
library(RMariaDB)
library(magrittr)
library(ir)
library(quantities)
# connect to database
con <-
RMariaDB::dbConnect(
drv = RMariaDB::MariaDB(),
dbname = "pmird",
default.file = "~/my.cnf",
groups = "rs-dbi"
)
```
Here, `my.cnf` is a text file that stores user and password information for the database server.
From this on, the 'pmird' R package can be used to access the database. The 'pmird' R package makes use of the R package 'dm' [@Schieferdecker.2022] to access and manipulate the database contents. `pmird::pm_get_dm()` creates a `dm` object which stores the database structure.
```{r out-pmird-r-package-pm-get-dm, eval=TRUE, echo=TRUE}
# create the dm object
dm_pmird <- pmird::pm_get_dm(con, learn_keys = TRUE)
```
The option `learn_keys = TRUE` means that information on the primary and foreign key are added to the dm object.
A `dm` object is a representation of the entire database and allows comfortable manipulation of the database from within R (e.g. addition of new rows to tables, addition of new tables, data queries) [@Schieferdecker.2022].
## Use case: obtaining general information on the datasets contained in the 'pmird' database {-}
General information on the datasets contained in the 'pmird' database are stored in table `datasets`. The 'pmird' R package provides a function to obtain this table from the `dm` object (`pmird::pm_get_table(.table_name = "datasets)`).
```{r out-pmird-r-package-pm-get-table-datasets-1, eval=TRUE, echo=TRUE}
# extract the datasets table
pmird_datasets <-
dm_pmird %>%
pmird::pm_get_table(.table_name = "datasets")
```
This table can, for example, be used to select studies for which to extract data from the 'pmird' database.
## Use case: extracting data for a specific dataset from the 'pmird' database {-}
Assume you are interested in viewing all measured data for one specific dataset, e.g. the dataset with `id_dataset == 8` in `pmird_datasets`. Using the 'dm' package and the `dm` object representing the 'pmird' database (`dm_pmird`), this data can be obtained as follows:
```{r out-pmird-r-package-extract-data-1, eval=TRUE, echo=TRUE}
# get data for the dataset with ID 8
d8 <-
dm_pmird %>%
dm::dm_zoom_to(datasets) %>%
dm::filter(id_dataset == 8) %>%
dm::left_join(samples, by = "id_dataset") %>%
dm::left_join(data_to_samples, by = "id_sample") %>%
dm::left_join(data, by = "id_measurement") %>%
dm::left_join(mir_metadata, by = "id_measurement") %>%
dm::left_join(macrofossils, by = "id_measurement") %>%
dm::pull_tbl() %>%
tibble::as_tibble()
```
The resulting data frame (`d8`) contains information on all samples and measurements for the dataset with `id_dataset == 8`. `d8` does not yet contain any spectra, but stores only information on the respective file paths to the downloaded spectra files within the downloaded folder `pmird_prepared_data`.
To load the spectra, you can use the function `pmird::pmird_load_spectra()`. In addition, you have to specify via argument `directory` in which folder `pmird_prepared_data` is stored (on the computer used for this tutorial, this was `".."`):
```{r out-pmird-r-package-pm-load-spectra, eval=TRUE, echo=TRUE, warning=FALSE}
# load the spectra
d8 <- pmird::pm_load_spectra(d8, directory = "..")
```
`pm_load_spectra()` is a wrapper function around functions from the package 'ir' [@Teickner.2022e] which in turn are wrappers around `read.spc()` [@Beleites.2021] and `read.csv()`. `pm_load_spectra()` therefore can load spectra both saved as spc and csv files. `pm_load_spectra()` also converts `d8` into an object of class `ir` from the 'ir' package. 'ir' provides functions for spectral preprocessing and manipulation [@Teickner.2022e] and is compatible with the 'irpeat' package which provides functions to analyze peat MIRS and spectral prediction models to predict peat properties from MIRS [@Teickner.2022d].
## Use case: Handling units and measurement errors {-}
The R package 'qauntities' [@Pebesma.2016; @Ucar.2018] can be used to add units and measurement errors to measured variables from the 'pmird' database. The 'pmird' R package allows batch unit and measurement error assignment:
```{r out-pmird-r-package-workflow-quantities-1, eval=TRUE, echo=TRUE}
# add information on units and errors with the 'quantities' package
d8 <-
d8 %>%
pmird::pm_add_quantities()
# show some values
head(d8$N, 3)
```
## Use case: Generating data citations {-}
Whoever uses data from the 'pmird' database must cite, in addition to the database, the original data sources for the used datasets. To make this straightforward, the 'pmird' R package contains the function `pm_get_citations()` to generate such a citation list for any extracted data subset:
```{r, out-pmird-r-package-workflow-citations-1, eval=TRUE, echo=TRUE}
# collect all citations for `d8`:
d8_citations <-
pm_get_citations(
con = con,
x = d8$id_measurement,
file = "d8_citations.bib"
)
# close connection to database
RMariaDB::dbDisconnect(con)
```
```{r, include=FALSE}
file.remove("d8_citations.bib")
```
The function takes column `id_measurement` of the extracted data and collects citations for all relevant data sources from the database. The results are exported to a bibtex file which is defined via argument `file`. This file can be imported to literature reference software. In this case, since the data have not been previously published, the created bibtex file is empty. `RMariaDB::dbDisconnect(con)` closes the connection to the database, as this is the final use case presented here.
<!-- ### How to cite
Please cite this compendium as:
> Authors, (`r format(Sys.Date(), "%Y")`). _Compendium of R code and data for `r Title`_. Accessed `r format(Sys.Date(), "%d %b %Y")`. Online at <https://doi.org/xxx/xxx>
### How to download or install
You can download the compendium as a zip from from this URL: </archive/master.zip>
Or you can install this compendium as an R package, pmird, from GitHub with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
remotes::install_github("/")
```
-->
### Licenses
**Text and figures :** [CC-BY-4.0](http://creativecommons.org/licenses/by/4.0/)
**Code :** See the [DESCRIPTION](https://github.com/henningte/pmird/blob/master/DESCRIPTION) file
**Data :** [CC-BY-4.0](http://creativecommons.org/licenses/by/4.0/)
### Contributions
We welcome contributions from everyone. Please note that the pmird project is released with a [Contributor Code of Conduct](https://henningte.github.io/pmird//CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
### References