

Chapter 9 ■ Advanced R Programming

If you write a function where you cannot write the body this way, but where you still want to be able to use it in vector expressions, you can typically get there using the Vectorize function.


As an example, say we have a table mapping keys to some values. We can imagine that we want to map names in a class to the roles the participants in the class have. In R, we would use a list to implement that kind of table, and we can easily write a function that uses such a table to map names to roles.

role_table <- list("Thomas" = "Instructor", "Henrik" = "Student", "Kristian" = "Student", "Randi" = "Student", "Heidi" = "Student", "Manfred" = "Student")

map_to_role <- function(name) role_table[[name]]

This works the way it is supposed to when we call it with a single name:

map_to_role("Thomas")
## [1] "Instructor"
map_to_role("Henrik")
## [1] "Student"

But it fails when we call the function with a vector because we cannot index the list with a vector in this way.

x <- c("Thomas", "Henrik", "Randi")
map_to_role(x)
## Error in role_table[[name]]: recursive indexing failed at level 2

So we have a function that maps a single value to a single value but doesn’t work for a vector. The easy way to make such a function work on vectors is to use the Vectorize function. This function will wrap your function so it can work on vectors and what it will do on those vectors is what you would expect: it will calculate its value for each of the elements in the vector, and the result will be the vector of all the results.

map_to_role <- Vectorize(map_to_role)
map_to_role(x)
##       Thomas       Henrik        Randi
## "Instructor"    "Student"    "Student"
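One way to picture what Vectorize does is to compare it with calling the scalar function once per element by hand. The sketch below assumes a shortened role table and uses the made-up names lookup_one and lookup_many; it illustrates the observable behavior only and is not meant as a description of the internals.

```r
# Shortened table, for illustration only
role_table <- list("Thomas" = "Instructor", "Henrik" = "Student", "Randi" = "Student")

# Scalar lookup: one name in, one role out
lookup_one <- function(name) role_table[[name]]

# Wrapped version that accepts a vector of names
lookup_many <- Vectorize(lookup_one)

x <- c("Thomas", "Randi")
via_vectorize <- lookup_many(x)

# Calling the scalar function once per element by hand
via_sapply <- sapply(x, lookup_one)

print(via_vectorize)
print(all(via_vectorize == via_sapply))
```

Both calls give one role per input name, labeled with the name it came from.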

In this particular example with a table, the reason it fails is that we are using the [[ index operator. Had we used the [ operator, we would be fine (except that the result would be a list rather than a vector).

role_table[c("Thomas", "Henrik", "Randi")]
## $Thomas
## [1] "Instructor"
##
## $Henrik
## [1] "Student"
##
## $Randi
## [1] "Student"


So we could also have handled vector input directly by indexing differently and then flattening the list:

map_to_role_2 <- function(names) unlist(role_table[names])

x <- c("Thomas", "Henrik", "Randi")
map_to_role_2(x)
##       Thomas       Henrik        Randi
## "Instructor"    "Student"    "Student"
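One thing to watch with the unlist approach is names that are not in the table. In the sketch below, which assumes a shortened table and a hypothetical helper map_to_role_safe that is not part of the original example, a missing name yields a NULL entry that unlist drops, so the result is shorter than the input. The vapply variant keeps one entry per name and fills in NA instead.

```r
# Shortened table, for illustration only
role_table <- list("Thomas" = "Instructor", "Henrik" = "Student")

map_to_role_2 <- function(names) unlist(role_table[names])

# "Nobody" is not in the table: its NULL entry disappears in unlist
short <- map_to_role_2(c("Thomas", "Nobody"))
print(length(short))

# Hypothetical alternative: always one result per input name
map_to_role_safe <- function(names) {
  vapply(
    names,
    function(name) {
      role <- role_table[[name]]   # NULL when the name is not in the table
      if (is.null(role)) NA_character_ else role
    },
    character(1)
  )
}

safe <- map_to_role_safe(c("Thomas", "Nobody"))
print(safe)
```

Whether a silent drop, an NA, or an error is the right response to an unknown name depends on the application; the point is only that the two approaches differ here.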

It’s not always that easy to rewrite a function to work on vector input, though, and when we cannot readily do that, the Vectorize function can be very helpful.

As a side note, the issue with using [[ with a vector of values isn’t that it doesn’t work. It actually does work, but it does something other than what we are trying to do here. If you give [[ a vector of indices, it does what is called recursive indexing. It is a shortcut for looking up in the list using the first index and pulling out the vector or list found there. It then takes that result and looks up using the next index, and so on. Take as an example the following code:

x <- list("first" = list("second" = "foo"), "third" = "bar")
x[[c("first", "second")]]
## [1] "foo"

Here we have a list of two elements, the first of which is a list with a single element. You can look up the index “first” in the first list and get the list stored at that index. This list we can then index with the “second” index to get “foo” out.

The result is analogous to this:

x[["first"]][["second"]]
## [1] "foo"
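The same shortcut extends to deeper nesting, and it also explains why the role lookup failed. A small sketch, using a made-up nested list and a shortened role table:

```r
# Made-up list nested three levels deep
nested <- list(a = list(b = list(c = 42)))

# One recursive lookup ...
deep <- nested[[c("a", "b", "c")]]
# ... matches the chain of single lookups
chained <- nested[["a"]][["b"]][["c"]]
print(identical(deep, chained))

# With the role table, the first name selects a string, and the second
# name then has nothing to index into, so the lookup raises an error.
role_table <- list("Thomas" = "Instructor", "Henrik" = "Student")
outcome <- tryCatch(
  role_table[[c("Thomas", "Henrik")]],
  error = function(e) "lookup failed"
)
print(outcome)
```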

This can be a useful feature—although, to be honest, I haven’t found much use for it in my own programming—but it is not the effect we wanted in our mapping to roles example.

The apply Family

Vectorizing a function makes it possible to use it implicitly on vectors. We simply give it a vector as input, and we get a vector back as output. Notice though that it isn’t really a vectorized function just because it takes a vector as input— many functions take vectors as input and return a single value as output, e.g., sum and mean—but we use those differently than vectorized functions. If you want that kind of function, you do have to handle explicitly how it deals with a sequence as input.

Vectorized functions can be used on vectors of data exactly the same way as on single values with exactly the same syntax. It is an implicit way of operating on vectors. But you can also make it more explicit when calling a function on all the elements in a vector, which gives you a bit more control of exactly how it is called. This, in turn, lets you work with those functions that do not just map from vectors to vectors but also from vectors to single values.
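To make the distinction concrete, here is a small sketch with made-up numbers. It uses sapply, one of the apply-style functions, only as a preview of the explicit style.

```r
x <- c(1, 4, 9)

# Implicit: sqrt is vectorized, so it is called on the vector directly
implicit <- sqrt(x)

# Explicit: call sqrt once per element and collect the results
explicit <- sapply(x, sqrt)
print(identical(implicit, explicit))

# sum maps a whole vector to a single value ...
total <- sum(x)

# ... but the explicit style lets us call it once per vector in a list
totals <- sapply(list(c(1, 2, 3), c(4, 5, 6)), sum)
print(totals)
```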

There are many ways of doing this—because it is a common thing to do in R—and you will see some general functions for working on sequences and calling functions on them in various ways later. In most of the code you will read, though, the functions that do this are named something with apply in their name and those functions are what we will look at here.

Let’s start with apply. This is a function for operating on vectors, matrices (two-dimensional vectors), or arrays (higher-dimensional vectors).
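As a first taste, the sketch below calls apply on a small made-up matrix. The second argument selects the dimension to work along: 1 for rows and 2 for columns.

```r
# 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# Call sum once per row
row_sums <- apply(m, 1, sum)
print(row_sums)

# Call sum once per column
col_sums <- apply(m, 2, sum)
print(col_sums)
```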

