Functional Programming

Chapter 8 ■ More R Programming

Where the function g is defined—inside function f—it refers to variables y and z that are not defined yet. This doesn’t matter because we only create the function g; we do not invoke it. We then set the variable y inside the context of the invocation of f and return g. Outside of the function call, we name the return value of f() h. If we call h at this point it will remember that y was defined inside f—and it will remember its value at the point in time where we returned from f. There still isn’t a value set for z so calling h results in an error. Since z isn’t defined in the enclosing scopes of where the inner function refers to it, it must be defined at the outermost global scope, but it isn’t. If we do set it there, the error goes away because now R can find the variable by searching outward from where the function was created.
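The setup this paragraph describes can be sketched as follows. This is a reconstruction from the prose, not the original listing: the names f, g, h, y, and z follow the text, while the body of g and the values 2 and 3 are assumptions chosen for illustration.

```r
# Sketch reconstructed from the description above; the body of g and the
# values assigned to y and z are illustrative assumptions.
f <- function() {
  g <- function() c(y, z)  # y and z do not exist yet; g is only created here
  y <- 2                   # y is set in the scope of this call to f
  g
}

h <- f()

# z is not defined anywhere yet, so calling h fails
# (this assumes no variable named z already exists in an outer scope).
result <- tryCatch(h(), error = function(e) "error: z not found")

z <- 3        # define z in the outer scope
value <- h()  # y is found in the closure, z by searching further outward
```

After the last line, value holds both numbers: y remembered from the call to f, and z found in the outer scope.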

I shouldn’t really be telling you this because the feature I am about to tell you about is dangerous. I will show you a way of making functions have even more side-effects than they otherwise have, and functions really shouldn’t have side-effects at all. Anyway, this is a feature of the language—and if you are very careful with how you use it—it can be very useful when you just feel the need to make functions have side-effects.

This is the problem: what if you want to assign to a variable in a scope outside the function that performs the assignment? You cannot just assign to the variable, because an ordinary assignment to a variable that isn’t found in the current scope creates that variable in the current scope.

    f <- function() {
      x <- NULL
      set <- function(val) { x <- val }
      get <- function() x
      list(set = set, get = get)
    }

    x <- f()
    x$get()
    ## NULL
    x$set(5)
    x$get()
    ## NULL

In this code, which I urge you to read carefully because there are a few neat ideas in it, we have created a getter and a setter function; the getter tells us what the variable x is, and the setter is supposed to update it. But setting x in the body of set creates a local variable inside that function; it doesn’t assign to the x one level up. That is why the second call to get still returns NULL.

There is a separate assignment operator, <<-, you can use for that. It will not create a new local variable but instead search outward to find an existing variable and assign to that. If it gets all the way to the outermost global scope, though, it will create the variable there if it doesn’t already exist.

If we use that assignment operator in the previous example, we get the behavior we were aiming for.

    f <- function() {
      x <- NULL
      set <- function(val) { x <<- val }
      get <- function() x
      list(set = set, get = get)
    }

    x <- f()
    x$get()
    ## NULL
    x$set(5)
    x$get()
    ## [1] 5

If we hadn’t set the variable x inside the body of f in this example, both the getter and setter would be referring to a global variable, in case you are wondering, and the first call to get would cause an error if there was no global variable. While this example shows you how to create an object where functions have side effects, it is quite a bit better to let functions modify variables that are hidden away in a closure like this than it is to work with global variables.
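As a further illustration of state hidden in a closure, here is a minimal sketch of a counter. The name make_counter and the variables are hypothetical and not from the text; the point is only that each call to make_counter gets its own count, which nothing outside the closure can reach.

```r
# Hypothetical example: make_counter is not from the text.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates count in the enclosing scope, not a local copy
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

first  <- counter_a()  # 1
second <- counter_a()  # 2
other  <- counter_b()  # 1, because counter_b has its own count
```

With the ordinary <- operator instead of <<-, each call would create a fresh local count and the counter would always return 1.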

Function Names Are Different from Variable Names

One final note on scopes—which I am not sure should be considered a feature or a bug—is that if R sees something that looks like a function call, it is going to go searching for a function, even if searching outward from a function creation would get to a non-function first.

    n <- function(x) x
    f <- function(n) n(n)
    f(5)
    ## [1] 5

Under the scoping rule that says that you should search outward, both n’s inside the f function should refer to the parameter of f. But it is clear that the first n is a function call and the second is its argument, so when we call f, R sees that the parameter isn’t a function, searches further outward, and finds the function n. It calls that function with the parameter as its argument. So the two n’s inside f actually refer to different things.

Of course, if we call f with something that is actually a function, then it recognizes that n is a function and it calls that function with itself as the argument.

    f(function(x) 15)
    ## [1] 15

Interesting, right?
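One practical consequence of this lookup rule, sketched here with illustrative values that are not from the text: a variable that shares its name with a function does not hide the function when the name is used in a call.

```r
# Illustrative values; c here is an ordinary numeric variable that happens
# to share its name with the built-in function c().
c <- 10
total <- sum(c(1, 2), c)  # c(1, 2) calls the function; the bare c is the number
total
## [1] 13
```

Reusing function names for variables still makes code harder to read, so this is something to be aware of rather than something to rely on.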

Recursive Functions

The final topic we will cover in this chapter before we get to the exercises is recursive functions. Some people find this a difficult topic, but in a functional programming language, it is one of the most basic building blocks so it is really worth spending some time wrapping your head around, even though you are much less likely to need recursions in R than you are in most pure functional languages.

At the most basic level, though, it is just that we can define a function’s computations in terms of calls to the same function—we allow a function to call itself, just with new parameters.

Consider the factorial operator n! = n × (n − 1) × … × 3 × 2 × 1. We can rewrite the factorial of n in terms of n and a smaller factorial, the factorial of n − 1 and get n! = n × (n − 1)!. This is a classical case of where recursion is useful: we define the value for some n in terms of the calculations on some smaller value. As a function, we would write factorial(n) equals n * factorial(n-1).

There are two aspects to a recursive function, though. Solving a problem of size n involves breaking the problem down into something you can do right away and combining that with calls of the function on a smaller size, here n − 1. This part we call the “step” of the recursion. We cannot keep reducing the problem into smaller and smaller bits forever. That would be an infinite recursion, which is as bad as an infinite loop in that we never get anywhere. At some point we need to have reduced the problem to a size small enough that we can handle it directly. That is called the basis of the recursion.

For factorial, we have a natural basis in 1 since 1! = 1. So we can write a recursive implementation of the factorial function like this:

    factorial <- function(n) {
      if (n == 1) {
        1
      } else {
        n * factorial(n - 1)
      }
    }

It is actually a general algorithmic strategy, called divide and conquer, to break down a problem into sub-problems that you can handle recursively and then combine the results in some way.

Consider sorting a sequence of numbers. We could sort a sequence using this strategy by first noticing that we have a simple basis—it is easy to sort an empty sequence or a sequence with a single element since we don’t have to do anything there. For the step, we can break the sequence into two equally sized pieces and sort them recursively. Now we have two sorted sequences and if we merge these two we have combined them into a single sorted sequence.

Let’s get started.

We need to be able to merge two sequences so we can solve that problem first. This is something we should be able to do with a recursive function because if either sequence is empty, we have a basis case where we can just return the other sequence. If there are elements in both sequences, we can pick the sequence whose first element is smaller, take that element as the first element of our final result, and concatenate it with the result of merging the remaining numbers.

    merge <- function(x, y) {
      if (length(x) == 0) return(y)
      if (length(y) == 0) return(x)

      if (x[1] < y[1]) {
        c(x[1], merge(x[-1], y))
      } else {
        c(y[1], merge(x, y[-1]))
      }
    }

A quick disclaimer here: normally this algorithm would run in linear time but because of the way we call recursively we are actually copying vectors whenever we are removing the first element, making it a quadratic time algorithm. Implementing a linear time merge function is left as an exercise.

Using this function, we can implement a sorting function. This algorithm is called merge sort so that is what we call the function:

    merge_sort <- function(x) {
      if (length(x) < 2) return(x)

      n <- length(x)
      m <- n %/% 2

      merge(merge_sort(x[1:m]), merge_sort(x[(m+1):n]))
    }

So here, using two simple recursive functions, we solved a real algorithmic problem in a few lines of code. This is typically the way to go in a functional programming language like R. Of course, when things are easier done using loops you shouldn’t stick to the pure functional recursions. Use what is easiest in any situation you are in, unless you find that it is too slow. Only then do you start getting clever.
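To make the comparison with loops concrete, here is a minimal sketch of a loop-based counterpart to the recursive factorial above. The name factorial_loop is hypothetical and chosen so that it does not overwrite the recursive version.

```r
# Hypothetical loop-based counterpart to the recursive factorial above.
factorial_loop <- function(n) {
  result <- 1
  for (i in seq_len(n)) {  # seq_len(0) is empty, so n = 0 yields 1
    result <- result * i
  }
  result
}

factorial_loop(5)
## [1] 120
```

Neither version is the right one in general; for a computation this simple, which of the two reads more naturally is largely a matter of taste.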

Exercises

Try the following exercises to become more comfortable with the concepts discussed in this chapter.

Fibonacci Numbers

The Fibonacci numbers are defined as follows: The first two Fibonacci numbers are 1, F_1 = F_2 = 1. For larger Fibonacci numbers, they are defined as F_i = F_{i−1} + F_{i−2}.

Implement a recursive function that computes the n’th Fibonacci number.

The recursive function for Fibonacci numbers is usually quite inefficient because you are recomputing the same numbers several times in the recursive calls. So implement another version that computes the n’th Fibonacci number iteratively (that is, start from the bottom and compute the numbers up to n, without calling recursively).

Outer Product

The outer product of two vectors, v and w, is a matrix defined as follows:

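\[
v \otimes w = v w^{T} =
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
\begin{bmatrix} w_1 & w_2 & w_3 & w_4 \end{bmatrix}
=
\begin{bmatrix}
v_1 w_1 & v_1 w_2 & v_1 w_3 & v_1 w_4 \\
v_2 w_1 & v_2 w_2 & v_2 w_3 & v_2 w_4 \\
v_3 w_1 & v_3 w_2 & v_3 w_3 & v_3 w_4
\end{bmatrix}
\]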

Write a function that computes the outer product of two vectors.

There actually is a built-in function, outer, that you are overwriting here. You can get to it using the name base::outer even after you have overwritten it. You can use it to check that your own function is doing the right thing.
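A check against the built-in could look roughly like the sketch below. The vectors are illustrative, and the candidate result is produced by the built-in %o% operator purely as a stand-in; in practice you would put the result of your own function there.

```r
v <- 1:3
w <- 1:4

# The built-in, reached by its full name even if outer has been overwritten.
expected <- base::outer(v, w)
dim(expected)  # one row per element of v, one column per element of w

# Stand-in for the result of your own function (an assumption for this sketch).
candidate <- v %o% w

check <- isTRUE(all.equal(candidate, expected))
check
```

Using all.equal rather than == gives a single TRUE or FALSE answer and tolerates small numerical differences.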

Linear Time Merge

The merge function we used copies vectors in its recursive calls. This makes it slower than it has to be. Implement a linear time merge function.

Before you start, though, you should be aware of something. If you plan to append to a vector by writing something like this:

v <- c(v, element)

you will end up with a quadratic time algorithm again. This is because when you do this, you are actually creating a new vector where you first copy all the elements in the old v vector into the first elements and then add the element at the end. If you do this n times, you have spent time on the order of n^2 in total, an average of order n per operation. It is because people do something like this in loops, more than because of the R interpreter, that R has gotten its reputation for slow loops. You should never append to vectors unless there is no way to avoid it.
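The usual alternative is to allocate the full vector once and fill it in by index. The sketch below contrasts the two approaches on an illustrative task (doubling the numbers 1 to n); the variable names and the task are assumptions made for this example.

```r
n <- 1000

# Appending: each c(appended, ...) allocates a new vector and copies the old one.
appended <- c()
for (i in seq_len(n)) {
  appended <- c(appended, i * 2)
}

# Preallocating: the vector is created once and then filled in by index.
preallocated <- numeric(n)
for (i in seq_len(n)) {
  preallocated[i] <- i * 2
}

same <- identical(appended, preallocated)
same
## [1] TRUE
```

Both loops produce the same vector; only the second avoids the repeated copying. The same idea applies to the merge exercise, where the length of the result is known before you start.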
