tl;dr: all workshop materials are available here:
https://rstd.io/conf20-tidy-tools
License: CC BY-SA 4.0
It was my pleasure to co-teach Building Tidy Tools
at rstudio::conf(2020)
with my brother Hadley.
This workshop was advertised for
… those who have embraced the tidyverse and now want to expand it to meet their own needs.
In practice, this means we talk a lot about package development, since packages are the easiest way to develop (and share) a set of related functions, but we also touch on workflow, design, and specific implementation details.
In this post, I’ll outline a few things that aren’t as obvious from just browsing the materials: the philosophy and themes that guide the entire workshop, as well as some of the details of delivering the materials.
Philosophy and Themes
This piece of advice shows up in the workshop when talking about
R CMD check
(which runs a set of automated package checks):
“If it hurts, do it more often” – Martin Fowler
But I feel like it sums up the philosophy of the entire workshop, with perhaps an addendum:
“If it hurts, there is probably a usethis or devtools function for it.
If it still hurts, do it more often”
In the case of R CMD check
,
the devtools check()
function is only a keyboard shortcut away,
so there is no excuse for not running it.
But, it still hurts,
because check()
is always pointing out what is wrong with your package.
Running check()
often, not only
reduces the chance you’ve introduced many new problems,
it also desensitizes you to any feelings of failure you might get
when you receive its report card on your package.
To help participants overcome these feelings we employ two strategies:
-
We demonstrate how to treat
check()
more like a todo list, than a report card. I think it’s useful to see even Hadley doesn’t rely only on his memory to build a package from scratch, he does what he remembers, then figures out what he forgot by runningcheck()
. -
We get participants used to running
check()
often, by setting the goal of acheck()
with no errors, warnings or notes in many of the your turn exercises.
Another example of something that can hurt
is getting a package started.
This is where usethis and devtools really shine!
In the first session of the workshop we
cover usethis::create_package()
, usethis::use_r()
,
devtools::load_all()
, devtools::check()
, devtools::document()
, usethis::use_mit_license()
and devtools::install()
.
And that’s enough to have a fully functioning package
before the morning coffee break!
It takes about 20mins to create a package from scratch using this workflow for the first time, but participants then practice this workflow, adding a few extra steps, throughout the two days. By the end of the workshop they have built 3-5 packages from scratch, and hopefully find it a pain-free experience.
These are both examples of one of the big themes in the workshop:
Workflow.
Embracing certain workflows when you switch to package development
leads to big payoffs.
Some of these payoffs are short term.
For example, following the package convention of
putting functions in script files
inside an R/
folder,
along with loading them with devtools::load_all()
,
is immediately much more streamlined and less error prone,
than sourcing functions individually with keyboard shortcuts, or source()
.
Other payoffs are long term, for example, writing tests adds work up front (although usethis helps a lot with the initial setup) but the long term payoff is huge: a package that is more reliable, easier to maintain, and easier to extend.
The other two big themes in the workshop are Interface and Implementation:
-
Interface describes the user facing side of your package. Your decisions about interface govern how easy users find your package to learn and use. One of the core values of the packages in the tidyverse is to “Design for humans”. In the workshop, we work through some case studies to elucidate what this value means and how to identify areas for improvement in your own packages.
-
Implementation describes the code inside your package. In the workshop we pick just a couple of topics to delve into, those we think are the most useful to know about:
- Dependencies: how and when to rely on code in other packages
- Using the tidyverse inside packages: the common problems that you run into when using tidyverse functions inside your own functions, and their solutions.
- The S3 object oriented system: why and how you might use it in your own packages.
Teaching Materials
We used the breaks to split the workshop into 8 sessions. Each session, of 1.5 hours, is a different topic, except for Interface which took up two sessions for a total of 3 hours.
Day | Session | Taught by | Theme | Materials |
---|---|---|---|---|
1 |
The Whole Game |
Charlotte |
Workflow |
|
1 |
Testing |
Charlotte |
Workflow |
|
1 |
Documentation / Sharing |
Charlotte |
Workflow |
|
1 |
Dependencies |
Hadley |
Implementation |
|
2 |
Using the Tidyverse |
Charlotte |
Implementation |
|
2 |
Interface |
Hadley |
Interface |
|
2 |
OO programming / S3 |
Charlotte |
Implementation |
You might notice some numbering inconsistencies —
3-dependencies.Rmd
is presented after 3-sharing.pdf
and there is no 6-
… 🤷.
The first three sections have slide decks (created in Apple Keynote),
but the other sections have an R Markdown file that acts as a script
(in script/
)
and another R Markdown file that records what was actually typed
(in notes/
).
Hadley and I refer to the script as we teach (usually open on another device like a phone or ipad), but create and update the notes version as the session progresses, and commit it to GitHub regularly. Participants would have their web browser pointing at the notes version on GitHub, refreshing when needed to grab any code required for the “Your Turn” exercises.
The script approach encourages live coding and allows for some digressions if participants raise specific questions. From an instructor perspective, it’s nice to have a record of both what you planned to teach and what you actually taught — you can look back and see how the timing worked, or questions participants raised.
The early sections (the “Whole Game”, Testing and Documentation) focus more on workflow and are harder to document in a live code script, since unlike code for scripts or functions, much of the code is run on the console and involves context switching — switching to a new RStudio session, switching to a script file that has opened in the Source Editor, or getting something running in the “Build” pane. So, I think keeping these initial sections as slides worked in this case.
Thanks
Our workshop wouldn’t have been as successful (or as fun!) without our fantastic teaching assistants: Julia Blum, Ingrid Rodriguez and François Michonneau.
These materials have been taught from and developed over a few years. A big thank you to everyone that has contributed to them! In particular, lots of credit belongs to Hadley and Jenny Bryan.