10  Adding data to an R package

We want examples - and we want to show our data! So let’s add the raw education 2014 data to the package.

To do this, we’re going to navigate back to the data-raw/DATASET.R file.

This file is effectively a data cleaning script - in our case, we’re just going to read in the data, then use usethis::use_data() on it to export it. Then we will document the data.

10.1 Reading and exporting the data

Currently, our DATASET.R file looks like this:

## code to prepare `DATASET` dataset goes here

usethis::use_data(DATASET, overwrite = TRUE)

Let’s read in the data and then export it, so it will look like this:

## 2015-04-10
library(readr)
raw_education_2014 <- read_csv("data-raw/raw_education_2014.csv")

usethis::use_data(raw_education_2014, overwrite = TRUE, compress = "xz")

(compress = "xz" is a very efficient storage format)

Once again, usethis gives us a nice chatty response:

✔ Setting active project to
  "/Users/nick/github/njtierney/learned".
✔ Adding R to Depends field in DESCRIPTION.
✔ Creating data/.
✔ Setting LazyData to "true" in DESCRIPTION.
✔ Saving "raw_education_2014" to
  "data/raw_education_2014.rda".
☐ Document your data (see
  <https://r-pkgs.org/data.html>).

With a little note about documenting your data!

Your turn
  1. Export the 2014 data, as described above

See this commit for a demonstration

To finish this off, we need to document the data. This looks like us creating a new R file, that contains some information like this:

#' ABS Education data
#'
#' Some education data used for practicing building R packages.
#'   Each row represents the number of people studying in a given age group in 
#'   a given state.
#'
#' @format ## `raw_education_2014`
#' A data frame with 72 rows and 6 columns:
#' \describe{
#'   \item{year}{The year of data - 2014}
#'   \item{state_territory}{One of 8 states or territories}
#'   \item{age_group}{age groups from 15 to 44}
#'   \item{n_studying}{how many people are studying}
#'   \item{population}{population in that state in that}
#'   \item{prop_studying}{proportion studying}
#' }
#' @source <https://www.who.int/teams/global-tuberculosis-programme/data>
"raw_education_2014"

I usually save this in R under something like data-<NAME>

Your turn
  1. Document your data
  2. Run document()

See this commit for what this looks like.

10.2 Improve our examples with our data

Now that we have some data easily accessible, we can go back and fix our examples in documentation!

For example, in boxplot_study_state.R we can change

#' @examples
#' # no example data yet

To

#' @examples
#' boxplot_study_state(raw_education_2014)

See this commit

Your turn
  1. Add the example data to all functions in learned
  2. run document()

See this commit to see what this looks like.

10.3 A note on writing documentation and examples

Good examples, and good documentation really help the user experience. Remember that the user includes you (You are always collaborating with your future self!). It’s worthwhile trying to put yourself in the shoes of a new learner, and err on the side of slightly over explaining things.

It’s also worth noting that adding examples to data documentation is a great idea as it helps people understand more about the data.

Extension

What would you change about the documentation to make it more friendly to new users?

10.4 Next up

We’re getting close now - let’s see if we can pass checks!