6  Motivation

We’ve gone through a lot of setup, and now we’re going to start building an R package. Soon. But we need to have some motivation, first. It involves a bit of a story, and a bit of imagination.

6.1 Overview

  • Teaching 30 minutes
  • Exercises 10 minutes

6.2 Questions

  • How to convert code into functions?

6.3 Objectives

  • Start to wrangle with a script to turn it into functions

6.4 How this works

One of my big goals with teaching functions, and with teaching R packages, is that I want the examples to be somewhat rooted in the familiar and the real. There are really useful toy examples of writing packages that deliver praise (e.g., ones I’ve used to teach R packages in the past: https://github.com/njtierney/praiseme), or do simple conversions between units (celcius to farenheit being a very common example).

These examples are useful because they teach you the tools, and the process. However in this course, I want to focus on a bit more than this and incorporate the process of turning code into functions. I think this is important, because it more closely represents other examples we come across in using R, and presents a bit of a richer learning journey, because in addition to learning about the tools and the process of R package building, you will also learn:

  • How to think about converting scripts to functions
  • How to write better functions

I have written up some example code, which starts as a quarto document. We are going to take this document, and then eventually turn it into an R package.

The structure of this exercise has taken inspiration from “The package within” chapter from R Packages.

6.5 The example: “learned”

We are going to be looking at a role-play situation where we imagine we are at some fictional workplace, where part of our job is to look at education data that we have acquired from some source. The overall goal of our job here is to produce some key outputs from some data.

You can see this example at: https://github.com/njtierney/learned

To download it, run the following code

library(usethis)
use_course("njtierney/learned")
Your turn
  1. Download the repository using the code above
  2. Render the document, “analysis2014.qmd”
  3. Read over the document, thinking about what we discussed in Why functions.
  4. Identify some potential problems with the code
  5. Think about what might happen if we want to read in data from 2015 (or later years), how would you like to do this?

6.6 Discussion of potential problems

After you’ve taken from time to think about some of the potential problems, open the box below

  • Copying and pasting a document could lead to errors
  • What if the data changes?
  • What if other people collaborate on this project? How do they have the source of truth?
  • Is there a way to formalise this all?

6.7 Identifying the report outputs

To get us started with some key things, let’s think about what the key outputs of this report are.

Your turn
  1. Identify the key outputs of the report
  2. Pick one of those key outputs and start to write out a function for it

They key outputs are related to the “Produce a …” steps of the document:

  1. Produce a plot of the proportion of people educated in each age group in each state
  2. Produce a box plot of proportion of people educated for each state.
  3. Produce a table of The 5 number summary (min, 1st quantile, median, 3rd quantile, max) of proportion of people educated for each state.

So now we know where we are headed - we want to write some functions that produce these plots and tables.

However, the main problem that we encountered was that there was actually a bit of data cleaning that needed to happen before we did this. Let’s focus on cleaning up and rearranging the quarto document first to identify the data cleaning steps required.

Your turn
  1. Open, “alpha-analysis2014.qmd”
  2. Move all the “data quality” checks into a new section called “data quality”
  3. Move all of the data cleaning code up to the top, so we just work with one data set, named tidy_age_state_education_2014
  4. Create two functions to clean the data:
    1. tidy the age groups
    2. remove the missing values
  5. Put these two functions into another function that does the data cleaning

6.8 Discussion of data cleaning function implementations

  • Discuss “solutions-alpha-analysis2014.qmd”

6.9 Applying functions to give the outputs

We’ve got some data cleaning functions! Now let’s see if we can capture the intention of the plots, and wrap these up into functions, too.

The overall idea here is that we can capture the overall intention of what we are doing in a concise way, that is easy to reason with. That is not to say that it is just smaller, but being easy to reason with is key here.

Your turn
  1. Open, “bravo-analysis2014.qmd”
  2. Write a function for each of the following:
    1. Bar plot of proportion of people educated by state and age
    2. Boxplot of proportion of people educated for each state
    3. Table of 5 number summary of proportion of people educated for each state
  3. Move all these functions to the top of the document into a single code chunk labelled “functions”
  4. Move the “data checks” to the end of the document. We will come back to this later
  5. Make a “libraries” code chunk and put the library call in there

6.10 Discussion of plot and table, rearranging functions

  • What do we notice now?

6.11 Moving towards a quasi-package.

We’re slowly isolating the parts of the code that we care about, and now we’re going to make an incremental change again - to move all of the functions out to an R folder, with one function per R file.

For those who are more familiar with R Packages, this might start to look a lot more like an R package - while this isn’t the “standard” process, the intention here is to demonstrate the key changes:

  • Functions: Identifying points of expression / abstraction
  • Clearly expressing functions
  • Using functions to clearly articulate your work
‘ergonomic’ / interactive helper packages

To do package development, we’re going to be using packages like devtools and usethis. These packages are things we use in the console - we use them interactively. They help us automate a lot of things, so we can focus on the task at hand.

You can be almost guaranteed that we will use any functions from devtools and usethis in the console. It’s OK for that to feel strange! It will feel better soon.

Your turn
  1. Open “charlie-analysis2014.qmd”
  2. Use use_r() from devtools/usethis to create a separate R file for each of the R functions:
    1. clean_age_groups
    2. clean_education_data
    3. plot_study_age_state
    4. boxplot_study_state
    5. summarise_prop_study
  3. load all of these R functions, by calling source at the top of the quarto document

6.12 Discussion

Now, let’s move to making the R package!