-
Notifications
You must be signed in to change notification settings - Fork 10
Expand file tree
/
Copy pathinstructions.Rmd
More file actions
139 lines (92 loc) · 9.8 KB
/
instructions.Rmd
File metadata and controls
139 lines (92 loc) · 9.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
title: "Instructions for primer.tutorials"
output: html_document
---
If you have never worked on primer.data before, start by skimming [*R Packages*](https://r-pkgs.org/), but read closely the [data section](https://r-pkgs.org/data.html). Also, make sure to install the **usethis** and **pkgdown** packages.
### Creating New Data Sets
The basic process of creating new data sets is always the same. We get a raw data set, clean it up, and finally create a help page with details for users. Follow these steps:
* Get a raw data set and and put it into /data-raw. If it is too large to push
it to Github, compress the file.
* Create a .R file in /data_raw and name it "make_" + the name of the final
data set. This will be the script we use to prepare the raw data. Look at
make_nominate.R to see how these files should look like. The general structure
is
- Loading any required packages
- Comments on the data, including a link to the source
- Your code, which cleans up the data and saves the result in an object
- Some logical tests to make sure everything is OK after future changes
- Saving the data set using this code:
```{r}
name_of_final_dataset <- x
usethis::use_data(name_of_final_dataset, overwrite = TRUE)
```
The clean data set should have been saved in /data. Check if you see an .rda
file with the name of your final data set.
* Create a .R file in /R, naming it after your final data set. This will be
used to create the help page. We don't bother with learning the syntax, so
just use the structure of nominate.R as a template and adjust it to your
needs. Note that the last line must always be "name_of_final_dataset". If
you're satisfied, go to "Build"->"More" and click "Document". This command
will use your .R file to create an .Rd file within the /man directory. This
.Rd file is what actually generates the help page. If it's there, you're done.
A few things to consider when following the above steps:
* The name of the final data set should appear in the name of both key files
you're creating. For nominate, we would have make_nominate.R (in /data-raw)
and nominate.R (in /R). We also call any article which is associated with the
data, and placed in inst/papers, "nominate.pdf".
* Think hard about variable type.
- No ordered factors. If something has no natural ordering, make it character.
If it does have an ordering, make it a factor and give the levels the ordering
you want.
- Binary variables should generally be parsed as character variables, e.g.
"Yes"/"No". Only make it 0/1 if we expect people to use it as an outcome
variable.
- Document your reasoning in your make_dataset.R script. Examine the choices
made in make_mail.R.
### Setting Up Github Pages
Github allows you to create web pages with a username.github.io/repo URL. As an example, the website for this package is accessible at ppbds.github.io/primer.data. (Our "user" here is an organisation called PPBDS.)
The first step is to create a branch for your repo named "gh-pages". This cannot be done manually, but Github will do it for us if we activate automated checking with Github Actions. To do this, create a folder named ".github" in your root directory, and within this folder, create another one named "workflows". Go to "workflows" and create a file named "R-CMD-check.yaml". Go to https://github.com/ppbds/primer.data/blob/master/.github/workflows/R-CMD-check.yaml, copy the content you see, and paste into the empty .yaml file you just created. Save, commit, and push to Github. If everything went well, Github Actions should have been activated, and Github has automatically created a branch named "gh-pages" in your repo.
We can now activate Github Pages. First, go to your repo on Github. Go to "Settings" and scroll down to the "Github Pages" section. Click on the button that says "None" and select "gh-pages". Optionally, you can then select a folder within that repository. The options are either /(root), which is simply the root directory, or /docs, where Github will look for a folder called "docs" within the selected repo. If you don't already have a "docs" folder, just stick with /(root). Once you have selected a branch and folder, click the "Save" button. You should then see a grey bar in the Github Pages section with the message "Your site is ready to be published at https://username.github.io/repo". This means that the site has not yet been published. Wait a few minutes, refresh the page, and check again. The bar should now be green with the message "Your site is published at https://username.github.io". The page can now be accessed by anyone. Once we change files in the repo that affect the web page's appearance, Github Pages automatically updates the page. Until the page has been updated, which can indeed take some time, the settings will again show a grey bar with "Your site is ready to be published at https://username.github.io/repo" again. Once the bar turns green, the website will show the new version.
### Webpage Maintenance
Setting up Github Pages is only the first step. Since our webpage can only show what is in the repo, there are a few things to consider when trying to make changes.
The webpage consist of multiple tabs, each of which is generated from a different file. The homepage is created from README.md. If you want to update it, you must change something in README.Rmd and then knit the file to generate the corresponding .md file. The "References" tab is generated from the .Rd files that create the help pages. Any changes in the help pages are also reflected in "References". The "Changelog" tab comes from NEWS.md, and unlike for README.md, there is no .Rmd file behind it. While we could add one, there is not much content in there, so directly editing the .md is usually faster. There are a bunch of other files that impact how different parts of the page look like, but we typically do not edit them. The only exception is the package version, which can be changed through the DESCRIPTION file. This process is explained in the next section below.
Note that there is a function, `pkgdown::build_site()`, which uses the files in your local repo to create the complete webpage as it would look like. This is only to confirm that everything looks OK before you push it, but it is not necessary for the new page to show up on Github Pages. (Earlier, we though it is.)
### Package Versions
This section provides an overview of how to name package versions (see also https://r-pkgs.org/release.html#release-version). The development state of each package has a unique numbering such as "Version 1.2.3.9000". The version name usually consists of three or four digits, each of which stands for a specific type of change:
* "Major releases” are indicated by the first digit, or “1” in our example
above. This number is incremented if we make changes to the package that are
not backward compatible and may affect many people. An example would be if we
delete an entire data set or remove variables from an existing data set. In
other words, things that cause major problems in someone's code who is already
using our package - try to avoid such changes whenever possible!
* "Minor releases” are indicated by the second digit, or “2” in our example.
This number is incremented when we make bug fixes, add new features or make
backward-compatible changes. This is the most common type of change, for
instance when we add new data sets.
* "Patches” are indicated by the third digit, or “3” in our example. This
number is incremented if we fix bugs without adding relevant new features,
e.g. when making changes to help files.
* In-development packages, i.e. packages which are not yet released on CRAN
(like ours), also have a fourth digit. The default value is usually set to
9000, as in the example above. This number is only incremented when we add
important features that another in-development package should depend on. Since
this is unlikely to be the case, we can leave the fourth digit at 9000 as long
as primer.data has not been published on CRAN (from then on only the first
three digits will be used).
* No other digit, but a legitimate question is also: When do we change from
0.a.b to 1.0.0? The first digit should be set from 0 to 1 when our package is
"feature complete". This could be the case if our package contains all
necessary data sets a student needs in GOV 50. It could also mean that the
package is in a state to be released on CRAN (or both). In the end, it's up to
us to answer this question. Furthermore, note that as long as the first digit
is still 0, non-backward compatible changes will be recorded by incrementing
the second digit (and not the first, as described above).
Once we have made some changes and agreed on a proper name for the new version, we need to change the old name. First, we have to open the DESCRIPTION file and manually enter the correct version. Save the changes and run build_site() in the console the see how it looks like. The new name of the version should now show on top of the homepage. Next, update the changelog which is generated by the NEWS.md file. Simply open the file and orientate yourself on the existing content to add a new entry. If everything looks right, save the changes and use build_site() again to see the result. While there is no rule on this, most packages mention the fourth digit only in the DESCRIPTION file and not in NEWS.md. (I follow this convention as well.) Whenever new changes are made, simply repeat the described process.
# Setting up a dev branch
Follow these steps:
- Have a look at your branches: git branch -a
- create local dev branch: git checkout -b dev
- change something
- stage the change: git add .
- commit the change: git commit -m "test"
- create remote branch: git push -u origin develop