

Chapter 11 ■ Building an R Package

You cannot add documentation for data directly in the data file, so you need to put it in an R file in the R/ directory. I usually have a file called data.R that I use for documenting my package data.


For the bears data, my documentation looks like this:

#' Statistics for populations of bears
#'
#' Computed $f_4(W,X;Y,Z)$ statistics for different
#' populations of bears.
#'
#' @format A data frame with 19 rows and 6 variables:
#' \describe{
#'   \item{W}{The W population}
#'   \item{X}{The X population}
#'   \item{Y}{The Y population}
#'   \item{Z}{The Z population}
#'   \item{D}{The $D$ ($f_4(W,X;Y,Z)$) statistics}
#'   \item{Z.value}{The blocked jackknife Z values}
#' }
#'
#' @source \url{http://onlinelibrary.wiley.com/doi/10.1111/mec.13038/abstract}
#' @name bears
#' @docType data
#' @keywords data
NULL

The NULL after the documentation is needed because Roxygen wants an object after documentation comments, but it is the @name tag that tells it that this documentation is actually for the bears object. The @docType tells it that this is data that we are documenting.

The @source tag tells us where the data is from; if you have generated it yourself for your package, you don’t need this tag.

The @format tag is the only complicated tag here. It describes the data, which is a data frame, and it uses markup that looks very different from Roxygen markup text. The documentation used by R is actually closer to LaTeX than the formatting we have been using, and the data description reflects this.

You have to put your description inside curly brackets marked up with \describe{}, and inside it you have one item per data frame column. Each item has the format \item{column name}{column description}.
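Writing the \item lines by hand gets tedious for wide data frames. As a minimal sketch, the helper below prints a skeleton of the @format block for a data frame. The function name format_skeleton and the toy data are made up for illustration and are not part of Roxygen; you would still fill in the column descriptions yourself.

```r
# Hypothetical helper (not part of Roxygen): build a skeleton of the
# @format documentation for a data frame, one \item per column.
format_skeleton <- function(df) {
  header <- sprintf("#' @format A data frame with %d rows and %d variables:",
                    nrow(df), ncol(df))
  # One \item{column name}{column description} line per column
  items <- sprintf("#'   \\item{%s}{TODO: describe %s}", names(df), names(df))
  c(header, "#' \\describe{", items, "#' }")
}

# Toy data for illustration only, not the real bears data
toy <- data.frame(W = "a", X = "b", D = 0.1)
writeLines(format_skeleton(toy))
```

You can paste the printed lines into data.R and replace each TODO with a real description.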

Building an R Package

In the frame to the upper right in RStudio, you should have a tab that says Build. Select it.

Inside the tab there are three choices in the toolbar—Build & Reload, Check, and More. They all do just what they say: the first builds and (re)loads your package, the second checks it, meaning it runs unit tests if you have written any and then checks for consistency with CRAN rules, and the third gives you various other options in a drop-down menu.

You use Build & Reload to recompile your package when you make changes to it. It loads all your R code (and various other things) to build the package, then installs it and reloads it into your R session so you can test the new functionality.

A package you have built and installed this way can also be used in other projects afterward.

When you have to send a package to someone, you can make a source package in the More drop-down menu. It creates an archive file (.tar.gz).
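If you prefer to work from the R console, the same steps can be approximated with the devtools package. The sketch below assumes devtools is installed and that you point it at a package source directory; the wrapper name rebuild_package is made up for illustration, and the calls only roughly correspond to what the RStudio buttons do.

```r
# Sketch only: assumes the devtools package is installed and that pkg_dir
# is a package source directory (one containing a DESCRIPTION file).
# rebuild_package is a hypothetical wrapper, not an RStudio or devtools function.
rebuild_package <- function(pkg_dir = ".") {
  if (!file.exists(file.path(pkg_dir, "DESCRIPTION"))) {
    stop("No DESCRIPTION file found in ", pkg_dir)
  }
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("This sketch needs the devtools package")
  }
  devtools::document(pkg_dir)  # regenerate documentation from Roxygen comments
  devtools::install(pkg_dir)   # build and install the package
  devtools::check(pkg_dir)     # run the package checks, including any unit tests
  devtools::build(pkg_dir)     # create a source package archive (.tar.gz)
}
```

Calling rebuild_package() from the package's root directory runs all four steps in order; in day-to-day work you would usually call only the step you need.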


Exercises

In the last chapter, you wrote functions for working with shapes and polynomials. Now try to make a package for each with documentation and correct exporting of the functions. If you haven’t implemented all the functionality for those exercises, this is your chance to do so.


CHAPTER 12

Testing and Package Checking

Without testing, there is little guarantee that your code will work at all. You probably test your code when you write it by calling your functions with a couple of chosen parameters, but to build robust software you need to approach testing more rigorously. And to prevent bugs from creeping into your code over time, you should test often. Ideally, you should check all your code anytime you make any changes to it.

There are different ways of testing software—software testing is almost a science in itself—but the kind of testing we do when we want to make sure that the code we just wrote is working as intended is called unit testing. The testing we do when we want to ensure that changes to the code do not break anything is called regression testing.

Unit Testing

Unit testing is called that because it tests functional units; in R, that essentially means single functions or a few related functions. Whenever you write a new functional unit, you should write test code for that unit as well. The test code checks that the new code actually works as intended, and if you write the tests so that they can be run automatically later on, you have also made regression tests for the unit at the same time. Whenever you make any changes to your code, you can run all your automated tests, and that will check each unit and make sure that everything works as it did before.
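To make this concrete, here is a minimal sketch of a functional unit together with its test code, written in base R only. The function rectangle_area and its test are invented for illustration; a dedicated testing framework would give nicer reporting, but the principle is the same.

```r
# Hypothetical functional unit: the area of a rectangle
rectangle_area <- function(width, height) {
  if (width < 0 || height < 0) stop("sides must be non-negative")
  width * height
}

# Test code for the unit. Because it is a function, it can be rerun
# automatically after every change, so it doubles as a regression test.
test_rectangle_area <- function() {
  stopifnot(rectangle_area(2, 3) == 6)
  stopifnot(rectangle_area(0, 5) == 0)
  # Negative sides should raise an error rather than return a value
  result <- tryCatch(rectangle_area(-1, 2), error = function(e) "error")
  stopifnot(identical(result, "error"))
  invisible(TRUE)
}

test_rectangle_area()
```

If any expectation fails, stopifnot() raises an error that names the failing expression, so a silent run means the unit still behaves as specified.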

Most programmers do not like to write tests. It is exciting to write new functionality, but probing new features for errors is a lot less interesting. However, you really do need the tests, and you will be happy that you have them in the long run. Don’t delay writing tests until after you have written all your functions. That is leaving the worst for last, which is no way to motivate yourself to write the tests. Instead, you can write your unit tests while you write your functions; some even suggest writing them before you write your functions, something called test-driven programming. The idea here is that you write the tests that specify how your function should work, and you know that your function works as intended when it passes the tests you wrote for it.
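A small sketch of the test-first idea, under invented names: the tests for a hypothetical eval_poly(coefs, x) function are written before the function exists, a placeholder fails them, and the real implementation is done when they pass. The coefficient order (increasing degree) is an assumption made for this example.

```r
# Step 1: write the tests first. They specify how the hypothetical
# eval_poly(coefs, x) should behave, with coefs in increasing order of degree.
check_eval_poly <- function(eval_poly) {
  isTRUE(all.equal(eval_poly(c(1, 2), 3), 7)) &&        # 1 + 2 * 3
    isTRUE(all.equal(eval_poly(c(0, 0, 1), 4), 16)) &&  # 4^2
    isTRUE(all.equal(eval_poly(5, 10), 5))              # constant polynomial
}

# Step 2: a placeholder implementation does not satisfy the specification
stub <- function(coefs, x) 0
check_eval_poly(stub)

# Step 3: write the real function and rerun the same tests
eval_poly <- function(coefs, x) {
  sum(coefs * x^(seq_along(coefs) - 1))
}
check_eval_poly(eval_poly)
```

Passing the implementation in as an argument is just a convenience for this sketch; it lets the same specification be run against the stub and the finished function.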

I have never found test-driven programming that useful for myself. It doesn’t match the way I work, because I like to explore different interfaces and uses of a function while I am implementing it, but some prefer to work that way. I do, however, combine my testing with my programming in the sense that I write small scripts calling my functions and fitting them together while I experiment with the functions. I write that code in a way that makes it easy for me to take the experiments, and then use them as automated tests later.
