Chapter 9 ■ Advanced R Programming

Keep in mind that it is just shorthand for calling a function and then reassigning the result to a variable. It does not actually modify any data. This means that if you have two variables referring to the same object, only the one you call the replacement function on will be affected. The replacement function returns a copy that is assigned to the first variable, and the other variable still refers to the old object.

y <- x
foo(x) <- 5
x
## $foo
## [1] 5
##
## $bar
## [1] 3
y
## $foo
## [1] 3
##
## $bar
## [1] 3
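To make the "shorthand" point concrete, the following is a minimal sketch that spells out the call and reassignment by hand. The replacement function is redefined here so the block runs on its own, and the variable z is a hypothetical name introduced only for the comparison.

```r
# Replacement function, defined here so the sketch is self-contained.
`foo<-` <- function(x, value) {
  x$foo <- value
  x
}

x <- list(foo = 3, bar = 3)
y <- x  # y refers to the same object as x

# The shorthand form...
foo(x) <- 5

# ...does the same as calling the function and reassigning the result.
z <- list(foo = 3, bar = 3)
z <- `foo<-`(z, value = 5)

identical(x, z)
## [1] TRUE
y$foo  # y still sees the old value
## [1] 3
```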

Because replacement functions are just syntactic sugar for a function call followed by a reassignment, the first argument you give a replacement function must be something that can be assigned to. You cannot give it an arbitrary expression.
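As a small illustration of that restriction, the sketch below tries to use a replacement function on a value rather than a variable. The error is caught with tryCatch so the block runs to completion; the exact wording of R's error message is deliberately not shown, since only the fact that it fails matters here.

```r
`foo<-` <- function(x, value) {
  x$foo <- value
  x
}

# Works: x is a variable, so the result can be assigned back to it.
x <- list(foo = 1, bar = 2)
foo(x) <- 3

# Fails: list(foo = 1, bar = 2) is a value, not something R can assign to.
result <- tryCatch({
  foo(list(foo = 1, bar = 2)) <- 3
  "assigned"
}, error = function(e) "error")
result
## [1] "error"
```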

There are a few more rules regarding replacement functions. First, the parameter for the value you are assigning has to be called value. You cannot give it another name.

`foo<-` <- function(x, val) {
  x$foo <- val
  x
}

x <- list(foo=1, bar=2)
foo(x) <- 3
## Error in `foo<-`(`*tmp*`, value = 3): unused argument (value = 3)

The way R rewrites the expression assumes that you called the value parameter value, so do that. You don’t have to call the first parameter x, though:

`foo<-` <- function(y, value) {
  y$foo <- value
  y
}

x <- list(foo=1, bar=2)
foo(x) <- 3
x$foo
## [1] 3

A replacement function can also take more than two parameters, as long as the object you are modifying is the first parameter and the value parameter is the last:

`modify<-` <- function(x, variable, value) {
  x[variable] <- value
  x
}

x <- list(foo=1, bar=2)
modify(x, "foo") <- 3
modify(x, "bar") <- 4
x$foo
## [1] 3
x$bar
## [1] 4

How Mutable Is Data Anyway?

We just saw that a replacement function creates a new copy, so if we use it to modify an object, we are not actually changing that object at all. Other variables that refer to the same object will see the old value and not the updated one. So we can reasonably ask: what does it actually take to modify an object?

The short, and almost always correct, answer is that you cannot modify objects, ever.[2] Whenever you “modify” an object, you are creating a new copy and assigning that new copy back to the variable you used to refer to the old value.

This is also the case for assigning to an index in a vector or list. You will be creating a copy, and while it looks like you are modifying it, if you look at the old object through another reference, you will find that it hasn’t changed.

x <- 1:4
f <- function(x) {
  x[2] <- 5
  x
}
x
## [1] 1 2 3 4
f(x)
## [1] 1 5 3 4
x
## [1] 1 2 3 4

Unless you have redefined the index-assignment function [<- (which I urge you not to do), it is a so-called primitive function. This means that it is written in C, and from C you actually can modify an object. This matters for efficiency. If there is only one reference to a vector, then assigning to it will not make a new copy, and you will modify the vector in place as a constant-time operation. If you have two references to the vector, then the first time you assign to it, a copy is created that you can then modify in place. This approach, which keeps objects immutable from the programmer’s point of view while retaining some efficiency, is called copy on write.
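The visible effect of copy on write can be sketched with two variables that start out referring to the same vector. This only shows the semantics; whether and when R copies internally is not observable from this code, and the variable names are chosen just for illustration.

```r
x <- c(1, 2, 3, 4)
y <- x      # x and y now refer to the same vector
x[2] <- 5   # two references exist, so the change is made on a copy
x
## [1] 1 5 3 4
y           # the other reference still sees the old values
## [1] 1 2 3 4
```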

[2] Whether it is possible depends on what you consider an object. You can modify a closure by assigning to local variables inside a function scope, as you saw earlier. This is because environments (the namespaces that hold variables) are objects that can be changed. One of the object orientation systems in R, RC, also allows for mutable objects, but we won’t look at RC in this book. In general, you are better off thinking that every object is immutable and that any modification you make actually creates a new object, because generally that is what is happening.

To write correct programs, always keep in mind that you are not modifying objects but creating copies; other references to the value you “modify” will still see the old value. To write efficient programs, also keep in mind that for primitive functions you can do efficient updates (updates in constant time instead of time proportional to the size of the object you are modifying) as long as you only have one reference to that object.

Functional Programming

There are many definitions of what it means for a language to be a functional programming language, and there have been many language wars over whether any given feature is “pure” or not. I won’t go into such discussions, but some features, I think everyone would agree, are needed. You should be able to pass functions along as parameters to other functions, you should be able to create anonymous functions, and you should have closures.

Anonymous Functions

In R it is pretty easy to create anonymous functions: just don’t assign the function definition to a variable name.

Instead of doing this:

square <- function(x) x^2

You simply do this:

function(x) x^2

In other languages where function definitions have a different syntax than variable assignment, you will have a different syntax for anonymous functions, but in R it is really as simple as this.
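As a brief sketch of what "not assigning a name" looks like in practice, an anonymous function can be called on the spot or handed directly to another function. The variable four_squared is a hypothetical name used only to hold the result.

```r
# Wrap the definition in parentheses and call it right away.
four_squared <- (function(x) x^2)(4)
four_squared
## [1] 16

# Or pass the definition straight to another function, such as sapply.
squares <- sapply(1:3, function(x) x^2)
squares
## [1] 1 4 9
```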

Why would you want an anonymous function?

There are two common cases:

• You want to use a one-off function and don’t need to give it a name
• You want to create a closure

Both cases typically come up when a function is passed as an argument to another function or returned from a function. The first case is something you would use together with functions like apply. If you want to compute the sum of squares over the rows of a matrix, you can create a named function and apply it, as follows:

m <- matrix(1:6, nrow=3)
sum_of_squares <- function(x) sum(x^2)
apply(m, 1, sum_of_squares)
## [1] 17 29 45

If this is the only time you need this sum of squares function, there isn’t really any need to assign it to a variable; you can just use the function definition directly:

apply(m, 1, function(x) sum(x^2))
## [1] 17 29 45

Of course, in this example, you could do even better by just exploiting that ^ is vectorized and write this:

apply(m^2, 1, sum)
## [1] 17 29 45

Using anonymous functions to create closures is what you do when you write a function that returns a function (more about that next). You could name the function as follows:

f <- function(x) {
  g <- function(y) x + y
  g
}

But there really isn’t much point if you just want to return it:

f <- function(x) function(y) x + y
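A short usage sketch shows what the returned function does. The name add_two is hypothetical and only used for this illustration.

```r
f <- function(x) function(y) x + y

add_two <- f(2)   # add_two is a function that remembers x = 2
add_two(3)
## [1] 5

f(10)(1)          # the returned function can also be called immediately
## [1] 11
```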

Functions Taking Functions as Arguments

You already saw this in all the apply examples. You give apply a function as an argument, and it calls that function across a dimension of the matrix. In general, if some sub-computation of a function should be parameterized, then you do this by taking a function as one of its parameters.

Say you want to write a function that works like (s/v)apply but only applies an input function to elements that satisfy a predicate. You can implement such a function by taking the vector and two functions as input:

apply_if <- function(x, p, f) {
  result <- vector(length=length(x))
  n <- 0
  for (i in seq_along(x)) {
    if (p(x[i])) {
      n <- n + 1
      result[n] <- f(x[i])
    }
  }
  head(result, n)
}
apply_if(1:8, function(x) x %% 2 == 0, function(x) x^2)
## [1] 4 16 36 64

This isn’t the most elegant way to solve this particular problem (we will return to the example in the exercises), but it illustrates the use of functions as parameters.
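For comparison, here is one possible alternative built from base R's Filter and sapply, both of which themselves take functions as arguments. This is only a sketch and not necessarily the solution the exercises have in mind; the name apply_if_alt is hypothetical.

```r
apply_if_alt <- function(x, p, f) {
  kept <- Filter(p, x)   # keep only the elements where p returns TRUE
  sapply(kept, f)        # apply f to each remaining element
}

apply_if_alt(1:8, function(x) x %% 2 == 0, function(x) x^2)
## [1]  4 16 36 64
```

One difference to be aware of: if no element satisfies the predicate, sapply returns an empty list rather than an empty vector, so the two versions are not interchangeable in every case.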

Functions Returning Functions (and Closures)

You create closures when you create a function inside another function and return it. Because this inner function can refer to the parameters and local variables inside the surrounding function even after you have returned from it, you can use such inner functions to specialize generic functions. It can work as a template mechanism for describing a family of functions.
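The "template" idea can be sketched with a function that builds a family of power functions. The names power, square, and cube are hypothetical and chosen only for this illustration.

```r
# power() is the template; each call produces a specialized function
# that remembers its own value of n.
power <- function(n) function(x) x^n

square <- power(2)
cube <- power(3)

square(4)
## [1] 16
cube(2)
## [1] 8
```

Each returned function keeps a reference to the n it was created with, even though the call to power has already returned.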

This article is from:

Beginning of Data Science in R
