
Quickly Reviewing Data

Chapter 1 ■ Introduction to R Programming

If you want to write functions that work well with pipelines, you should therefore make sure that the parameter most likely to come through a pipeline is the first parameter of your function. Write your functions so that the first parameter is the data they operate on, and you have done most of the work.


For example, if you wanted a function that samples n random rows of a data frame, you could write it so that it takes the data frame as its first argument and n as its second argument, and then you could simply pop it right into a pipeline:

library(magrittr)  # provides the %>% operator

# Sample n random rows from the data frame d
subsample_rows <- function(d, n) {
  rows <- sample(nrow(d), n)
  d[rows, ]
}

d <- data.frame(x = rnorm(100), y = rnorm(100))
d %>% subsample_rows(n = 3)
##             x          y
## 46  0.5622234 -0.4184033
## 17 -0.5973131 -1.5549958
## 38 -2.0004727 -1.0736909
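As a quick check that the pipeline really is just another way of writing an ordinary function call, the sketch below runs the same sampling both ways. The calls to set.seed() and the seed values are assumptions added here only to make the two random draws comparable; they are not part of the original example.

```r
library(magrittr)

# Same function as in the text: the data frame is the first parameter
subsample_rows <- function(d, n) {
  rows <- sample(nrow(d), n)
  d[rows, ]
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
d <- data.frame(x = rnorm(100), y = rnorm(100))

# Reset the seed before each call so both draw the same rows
set.seed(42)
piped <- d %>% subsample_rows(n = 3)

set.seed(42)
direct <- subsample_rows(d, n = 3)

# The piped call and the direct call should give the same three rows
identical(piped, direct)
```

Because the data frame is the first parameter, nothing special is needed on the right-hand side of %>%; the remaining arguments are simply given as usual.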

The magical “.” argument

Now, you cannot always be so lucky that all the functions you want to call in a pipeline take the left side of the %>% as their first parameter. You can still use such functions, though, because magrittr interprets . in a special way. If you use . as an argument in a function call in a pipeline, then that is where the left side of the %>% operation goes, instead of the default position as the first argument of the right side. So if you need the data to go as the second parameter, you put a . there, since x %>% f(y, .) is equivalent to f(y, x). The same goes when you need to provide the left side as a named parameter, since x %>% f(y, z = .) is equivalent to f(y, z = x), something that is particularly useful when the left side should be given to a model-fitting function. Functions that fit a model to data usually take a model specification as their first parameter and the data they are fitting as a named parameter called data.

d <- data.frame(x = rnorm(10), y = rnorm(10))
d %>% lm(y ~ x, data = .)
##
## Call:
## lm(formula = y ~ x, data = .)
##
## Coefficients:
## (Intercept)            x
##      0.0899       0.1469

We return to model fitting, and what an expression such as y ~ x means, in a later chapter, so don’t worry if it looks a little strange for now. If you are interested, you can always check the documentation of the lm() function.
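To make the two equivalences for . concrete, here is a minimal sketch that compares each piped call with its direct counterpart. The helper scale_by() and the small data frame are hypothetical and exist only for this illustration; lm() and coef() are the standard functions from R's stats package.

```r
library(magrittr)

# Hypothetical helper whose data argument is the SECOND parameter
scale_by <- function(factor, values) {
  factor * values
}

x <- c(1, 2, 3)

# Positional placeholder: x %>% f(y, .) is equivalent to f(y, x)
piped_pos  <- x %>% scale_by(10, .)
direct_pos <- scale_by(10, x)
identical(piped_pos, direct_pos)

# Named placeholder: x %>% f(y, z = .) is equivalent to f(y, z = x)
# (made-up data, chosen only so the fit is deterministic)
d <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))
fit_piped  <- d %>% lm(y ~ x, data = .)
fit_direct <- lm(y ~ x, data = d)

# The fitted coefficients should agree
all.equal(coef(fit_piped), coef(fit_direct))
```

The two fitted models differ only in the recorded call, which shows data = . for the piped version, just as in the output above.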

