
Chapter 9 ■ Advanced R Programming

Function Composition

For two functions f and g, function composition creates a new function f ∘ g such that (f ∘ g)(x) = f(g(x)).

There isn’t an operator for this in R, but you can make your own. To avoid clashing with the outer product operator, %o%, call it %.%.

Implement this operator.

Using this operator, you should, for example, be able to combine Map and unlist once and for all to get a function for the unlist(Map(...)) pattern:

uMap <- unlist %.% Map

So this function works exactly like first calling Map and then unlist:

plus <- function(x, y) x + y
unlist(Map(plus, 0:3, 3:0))
## [1] 3 3 3 3
uMap(plus, 0:3, 3:0)
## [1] 3 3 3 3
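One way the operator could be written is sketched here. It is only one possible answer to the exercise, not necessarily the intended one, and it assumes the composed function should pass all of its arguments on to the right-hand function.

```r
# A possible definition of the composition operator (a sketch, not the
# only solution). The returned function hands all its arguments to g
# and then applies f to the result.
`%.%` <- function(f, g) function(...) f(g(...))

uMap <- unlist %.% Map
plus <- function(x, y) x + y
uMap(plus, 0:3, 3:0)
## [1] 3 3 3 3
```

Because the inner function takes `...`, the composed function accepts whatever arguments the right-hand function accepts, which is what lets uMap take a function and two vectors.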

With it, you can build functions by stringing together other functions (not unlike how you can create pipelines in magrittr—see https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html).

For example, you can compute the root mean square error function like this:

error <- function(truth) function(x) x - truth
square <- function(x) x^2

rmse <- function(truth) sqrt %.% mean %.% square %.% error(truth)

mu <- 0.4
x <- rnorm(10, mean = 0.4)
rmse(mu)(x)
## [1] 0.8976526

Combining a sequence of functions like this requires that you read the operations from right to left, so I personally prefer the approach in magrittr, but you can see the similarity.
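To make the right-to-left reading concrete, the following sketch compares the composed rmse with the same computation written as nested calls. The definition of %.% used here is an assumed solution to the exercise, and the input vector is fixed rather than random so the comparison is repeatable.

```r
# Assumed definition of the composition operator (one possible solution).
`%.%` <- function(f, g) function(...) f(g(...))

error <- function(truth) function(x) x - truth
square <- function(x) x^2
rmse <- function(truth) sqrt %.% mean %.% square %.% error(truth)

mu <- 0.4
x <- c(0.1, 0.9, 1.4)   # fixed, made-up values

# Right to left: subtract the truth, square, average, take the root.
composed <- rmse(mu)(x)
direct <- sqrt(mean((x - mu)^2))
all.equal(composed, direct)
## [1] TRUE
```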


CHAPTER 10 Object Oriented Programming

This chapter looks at R’s flavor of object oriented programming. Actually, R has three different systems for object oriented programming: S3, S4, and RC. We will only look at S3, which is the simplest and (I believe) the most widely used.

Immutable Objects and Polymorphic Functions

Object orientation in S3 is quite different from what you might have seen in Java or Python. Naturally so, since data in R is immutable, while the underlying model of OO in languages such as Java and Python is that you have objects with state, and you call methods to change that state. You don’t have state as such in S3; you have immutable objects, just like all other data in R.
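A small sketch of what immutability means in practice: a function that appears to modify its argument only changes a local copy, so the caller's value is untouched. The function name set_first is made up for this illustration.

```r
# Assigning into v inside the function modifies a local copy only.
set_first <- function(v) {
  v[1] <- 100
  v
}

x <- c(1, 2, 3)
y <- set_first(x)
x
## [1] 1 2 3
y
## [1] 100   2   3
```

To "update" an object you therefore return a new value and assign it, rather than calling a method that changes the object in place.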

What’s the point then, of having object orientation if we don’t have object states? What we get from the S3 system is polymorphic functions, called “generic” functions in R. These are functions whose functionality depends on the class of an object—similar to methods in Java or Python where methods defined in a class can be changed in a subclass to refine behavior.

You can define a function foo to be polymorphic and then define specialized functions, say foo.A and foo.B. Then calling foo(x) on an object x from class A will actually call foo.A(x), and calling it on an object from class B will actually call foo.B(x). The names foo.A and foo.B were not chosen at random here, as you will see, since it is precisely how you name functions that determines which function is called.
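As a preview, here is a minimal sketch of that naming scheme using R's standard UseMethod mechanism and the class attribute. The classes A and B and the returned strings are placeholders for illustration.

```r
# The generic function: UseMethod picks a method based on class(x).
foo <- function(x, ...) UseMethod("foo")

# Specialized versions, found by name: <generic>.<class>
foo.A <- function(x, ...) "called foo.A"
foo.B <- function(x, ...) "called foo.B"

# Two otherwise empty objects that differ only in their class.
a <- structure(list(), class = "A")
b <- structure(list(), class = "B")

foo(a)
## [1] "called foo.A"
foo(b)
## [1] "called foo.B"
```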

We do not have objects with states; we simply have a mechanism for enabling a function to depend on the class an object has. This is often called “dynamic dispatch” or “polymorphic methods”. Here of course, since we don’t have states, we can call it polymorphic functions.

Data Structures

Before we get to making actual classes and objects, though, we should look at data structures. We discussed the various built-in data structures in R in Chapters 1 and 8. Those built-in data types are the basic building blocks of data in R, but we never discussed how we can build something more complex from them.

More important than any object oriented system is the idea of keeping related data together so we can treat it as a whole. If we are working on several pieces of data that somehow belong together, we don’t want them scattered across several different variables, perhaps in different scopes, where we have little chance of keeping them consistent. Even with immutable data, keeping the data that different variables refer to consistent would be a nightmare.

For data we analyze, we therefore typically keep it in a data frame. This is a simple idea for keeping data together. All the data we are working on is in the same data frame, and we can call functions with the data frame and know that they are getting all the data in a consistent state. At least as consistent as we can guarantee with data frames; we cannot promise that the data itself is not messed up somehow, but we can write functions under the assumption that data frames behave a certain way.


What about something like a fitted model? If we fit a model to some data, that fit is stored in variables capturing the fit. We certainly would like to keep those together when we work with the model, because we would not want to accidentally use a mix of variables fitted to two different models. We might also want to keep other data together with the fitted model, e.g., some information about what was actually fitted, if we want to check that in the R shell later. Or the data it was fitted to.

The only option we have for collecting heterogeneous data together as a single object is a list. And that is how you do it in R.
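As a small illustration, a list can hold a formula, a numeric vector, and a data frame side by side. The component names and numbers in this sketch are made up; they do not come from a real fit.

```r
# Hypothetical example: bundle the pieces that belong to one fit.
d <- data.frame(x = c(1, 2, 3), y = c(2.1, 3.9, 6.2))
fit <- list(
  formula = y ~ x,          # a formula object
  coefficients = c(0, 2),   # a numeric vector (made-up numbers)
  data = d                  # a data frame
)

# The components are retrieved by name and keep their own types.
class(fit$formula)
## [1] "formula"
fit$coefficients
## [1] 0 2
nrow(fit$data)
## [1] 3
```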

Example: Bayesian Linear Model Fitting

Project two, described in the last chapter of the book, concerns Bayesian linear models. To represent such a model, we would wrap the data for it in a list. For fitting data, assume that you have a function like the one described here (refer to Chapter 9 for details of the mathematics).

It takes the model specification in the form of a formula as its parameter model, the prior precision alpha, and the “precision” of the data beta. It then computes the mean and the covariance matrix for the model fitted to the data. Finally, it wraps up the fitted model together with some related data (the formula used to fit the model and the data used in the model fit, here assumed to be in the variable frame) and puts them in a list, which the function returns.
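Written as formulas, the two quantities the blm function computes can be read off from its code as follows. The symbols are a notational assumption made here: \Phi stands for the model matrix (phi in the code), \mathbf{y} for the response (target), p for the number of columns of \Phi (no_params), \mathbf{S} for covar, and \mathbf{m} for mean.

```latex
\mathbf{S} = \left( \alpha \mathbf{I}_p + \beta\, \Phi^{\top} \Phi \right)^{-1},
\qquad
\mathbf{m} = \beta\, \mathbf{S}\, \Phi^{\top} \mathbf{y}
```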

blm <- function(model, alpha = 1, beta = 1, ...) {
  # Here goes the mathematics for computing the fit.
  frame <- model.frame(model, ...)
  phi <- model.matrix(frame)
  no_params <- ncol(phi)
  target <- model.response(frame)

  covar <- solve(diag(alpha, no_params) + beta * t(phi) %*% phi)
  mean <- beta * covar %*% t(phi) %*% target

  list(formula = model, frame = frame, mean = mean, covar = covar)
}

You can see it in action by simulating some data and calling the function:

# fake some data for our linear model
x <- rnorm(10)
a <- 1 ; b <- 1.3
w0 <- 0.2 ; w1 <- 3
y <- rnorm(10, mean = w0 + w1 * x, sd = sqrt(1/b))

# fit a model
model <- blm(y ~ x, alpha = a, beta = b)
model
## $formula
## y ~ x
##
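Rather than printing the whole list, you can pick out individual components by name. The sketch below restates blm so that it runs on its own, with one assumption: the model matrix is built from the formula together with the model frame. As a sanity check, it uses a nearly flat prior (a very small alpha), for which the posterior mean should be close to the least-squares estimate from lm. The seed and the data frame d are arbitrary choices for this illustration.

```r
# Restatement of blm so this sketch is self-contained.
blm <- function(model, alpha = 1, beta = 1, ...) {
  frame <- model.frame(model, ...)
  phi <- model.matrix(model, frame)  # assumption: formula + frame form
  no_params <- ncol(phi)
  target <- model.response(frame)
  covar <- solve(diag(alpha, no_params) + beta * t(phi) %*% phi)
  mean <- beta * covar %*% t(phi) %*% target
  list(formula = model, frame = frame, mean = mean, covar = covar)
}

set.seed(1)  # arbitrary seed, only to make the run repeatable
d <- data.frame(x = rnorm(10))
d$y <- rnorm(10, mean = 0.2 + 3 * d$x, sd = sqrt(1 / 1.3))

# Tiny alpha means the prior barely constrains the weights.
fit <- blm(y ~ x, alpha = 1e-8, beta = 1.3, data = d)

names(fit)                  # the components bundled in the list
drop(fit$mean)              # posterior mean as a plain vector
coef(lm(y ~ x, data = d))   # least-squares estimate for comparison
```

Because everything about the fit lives in the one list, passing fit to another function carries the formula, the data, and the estimates along together.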

This article is from:

Beginning of Data Science in R
