
Chapter 10 ■ Object Oriented Programming

When you define your specialized function, you can add more parameters to your function, but you should define it such that you at least take the same parameters as the generic function does. R will not complain if you do not define it that way, but it is bound to lead to problems later when someone calls the function with assumptions about which parameters it takes based on the generic interface and then runs into your implementation of a specialized function that behaves a different way. Don’t do that. Implement your function so it takes the same parameters as the generic function. This includes using the same names for parameters. Someone might provide named parameters to the generic function, and that will work only if you call the parameters the same names as the generic function. That is why we used x as the input parameter for the print.blm function.
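To see why the parameter names matter, here is a small sketch. The generic describe and the classes good and bad are hypothetical names made up for this illustration; they are not part of the blm code.

```r
# Hypothetical generic: its first parameter is called x.
describe <- function(x, ...) UseMethod("describe")

# This method keeps the parameter name used by the generic.
describe.good <- function(x, ...) paste("good object with value", x$value)

# This method renames the parameter; it only works for positional calls.
describe.bad <- function(object, ...) paste("bad object with value", object$value)

good <- structure(list(value = 1), class = "good")
bad <- structure(list(value = 2), class = "bad")

named_good <- describe(x = good)   # named call works
positional_bad <- describe(bad)    # positional call works

# Named call: x does not match any named parameter of describe.bad, so it
# ends up in ... and object is missing when the method tries to use it.
named_bad <- tryCatch(describe(x = bad), error = function(e) "error")
```

The last call fails only because the caller used the name from the generic interface, which is exactly the kind of surprise you avoid by keeping the names identical.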

Defining Your Own Polymorphic Functions

To define a class-specific version of a polymorphic function, you just need to write a function with the right name. There is a little bit more to do if you want to define your very own polymorphic function. Then you also need to write the generic function—the function you will actually call with objects, and that is responsible for dispatching the function call to class-specific functions.

You do this using the UseMethod function. The generic function typically just does this and looks like this:

foo <- function(x, ...) UseMethod("foo")

You specify a function with the parameters the generic function should accept and then just call UseMethod with the name of the function to dispatch to. Then it does its magic and finds out which class-specific function to call and forwards the parameters to it.

When you write the generic function, it is also good style to define the default function as well.

foo.default <- function(x, ...) print("default foo")

With that, we can call the function with all types of objects. If you don’t want that to be possible, a safe default function would be one that throws an error.

foo("a string")
## [1] "default foo"
foo(12)
## [1] "default foo"
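If you would rather have the default refuse unknown objects, a minimal sketch could look like the following. The generic area and the class square are hypothetical names used only for this example.

```r
# Hypothetical generic used only for illustration.
area <- function(x, ...) UseMethod("area")

# Safe default: refuse objects that have no class-specific method.
area.default <- function(x, ...) {
  stop("area is not implemented for class ", class(x)[1])
}

# A class that does implement the generic.
area.square <- function(x, ...) x$side^2

sq <- structure(list(side = 3), class = "square")
square_area <- area(sq)

# Unsupported input is reported as an error instead of silently succeeding.
message_text <- tryCatch(
  area("a string"),
  error = function(e) conditionMessage(e)
)
```

With this design, a call on an unsupported object fails at the point of the call, with a message that names the offending class.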

And of course, with the generic function in place, we can define class-specific functions just like before.

foo.blm <- function(x, ...) print("blm foo")
foo(model)
## [1] "blm foo"

You can add more parameters to more specialized functions when the generic function takes ... as an argument. The generic function will just ignore the extra parameters, but the concrete function that is called might be able to do something with them.

foo.blm <- function(x, upper = FALSE, ...) {
  if (upper) {
    print("BLM FOO")
  } else {
    print("blm foo")
  }
}

foo("a string")
## [1] "default foo"
foo(model)
## [1] "blm foo"
foo("a string", upper = TRUE)
## [1] "default foo"
foo(model, upper = TRUE)
## [1] "BLM FOO"


Class Hierarchies

Polymorphic functions are one aspect of object oriented programming; another is inheritance. This is the mechanism used to build more specialized classes out of more general classes.

The best way to think about this is as levels of specialization. You have some general class of objects, say furniture, and within that class are more specific categories, say chairs, and within that class even more specific types of objects, say kitchen chairs. A kitchen chair is also a chair, and a chair is also furniture. If there is something you can do to all furniture, then you can definitely also do it to chairs. For example, you can set furniture on fire; you can set a chair on fire. It is not the case, however, that everything you can do to chairs you can do to all furniture. You can throw a chair at unwelcome guests, but you cannot throw a piano at them.

The way specialization like this works is that there are some operations you can do for the general classes. Those operations can be done on all instances of those classes, including those that are really objects of more specialized classes.

The operations might not do exactly the same thing—like we can specialize print, an operation we can call on all objects, to do something special for blm objects—but there is some meaningful way of doing the operation. Quite often the way a class is specialized is exactly by doing an operation that can be done by all objects from the general class, but just in a more specialized way.

The specialized classes, however, can potentially do more, so they might have more operations that are meaningful to do to them. That is fine, as long as we can treat all objects of a specialized class the same as we can treat objects of the more general class.

This kind of specialization is partly interface and partly implementation.

Specialization as Interface

The interface is which functions we can call on objects of a given class. It is a kind of protocol for how we interact with objects of the class. If we imagine some general class of “fitted models,” we might specify that for all models we should be able to get the fitted parameters and for all models we should be able to make predictions for new values. In R, such functions exist, called coef and predict, and any model is expected to implement them.

This means that I can write code that interacts with a fitted model through these general model functions, and as long as I stick to the interface they provide, I could be working on any kind of model. If, at some point, I find out that I want to replace a linear regression model with a decision tree regression model, I can just plug in a different fitted model and communicate with it through the same polymorphic functions. The actual functions that will be called when I call the generic functions coef and predict will, of course, be different, but the interface is the same.
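As a rough sketch of what this looks like in practice, the function below touches a model only through coef and predict. The helper name report_model is made up for this example, and a Poisson glm stands in for the "other kind of model" so that only base R and the built-in cars data are needed.

```r
# Works with any fitted model that implements coef() and predict().
report_model <- function(model, newdata) {
  list(
    coefficients = coef(model),
    predictions = predict(model, newdata = newdata)
  )
}

new_speeds <- data.frame(speed = c(10, 20))

# Two different kinds of fitted models from base R.
linear_fit <- lm(dist ~ speed, data = cars)
poisson_fit <- glm(dist ~ speed, family = poisson, data = cars)

# The same code handles both because it sticks to the shared interface.
linear_report <- report_model(linear_fit, new_speeds)
poisson_report <- report_model(poisson_fit, new_speeds)
```

Nothing in report_model knows which class of model it is given; the generic functions pick the right implementation.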

R will not enforce such interfaces for you. Classes in R are not typed in the same way as they are in, for example, Java, where it would be a type error to declare something as an object satisfying a certain interface if it in fact does not. R doesn’t care. Not until you call a function that isn’t there; then you might be in trouble, of course. But it is up to you to implement an interface to fit the kind of class or protocol you think your class should match.
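A small sketch of what "R doesn’t care" means: the class name fitted_model below is hypothetical and has no methods at all, yet R lets us create the object without complaint.

```r
# Claiming to be a model costs nothing: no check happens here.
fake <- structure(list(), class = "fitted_model")

# The problem only surfaces when a generic finds no method to dispatch to.
outcome <- tryCatch(predict(fake), error = function(e) e)

# outcome is an error condition; conditionMessage(outcome) gives its text.
failed <- inherits(outcome, "error")
```

The error appears only at the call to predict, not when the object is created.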


If you implement the functions that a certain interface expects (and these functions actually do something resembling what the interface expects the functions to do and are not just named the same things),[1] you have a specialization of that interface. You can do the same operations as every other class that implements the interface, but of course, your operations are uniquely fitted to your actual class.

You might implement more functions, making your class capable of more than the more general class of objects, but that is just fine. Other classes might implement those operations as well, so now you have more than one class with the more specialized operations—a new category that is more general and can be specialized further.

You have a hierarchy of classes defined by which functions they provide implementations of.

Specialization in Implementations

Specialization by providing general or more specialized interfaces—in the case of R by providing implementations of polymorphic functions—is the essential feature of the concept of class hierarchies in object oriented programming. It is what lets you treat objects of different kinds as a more general class.

There is another aspect of class hierarchies, though, that has to do with code reuse. You already get a lot of this just by providing interfaces to work with objects, of course, since you can write code that works on a general interface and then reuse it on all objects that implement this interface. But there is another type of reuse you get when you build a hierarchy of classes where you go from abstract, general classes to more specialized and concrete classes. When you are specializing a class, you are taking functionality that exists for the more abstract class and defining a new class that implements the same interface except for a few differences here and there.

When you refine a class in this way, you don’t want to implement new versions of all the polymorphic functions in its interface. Many of them will do exactly the same as the implementation for their more general class.

Let’s say you want to have a class of objects where you can call functions foo and bar. You can call that class A and define it as follows:

foo <- function(object, ...) UseMethod("foo")
foo.default <- function(object, ...) stop("foo not implemented")

bar <- function(object, ...) UseMethod("bar")
bar.default <- function(object, ...) stop("bar not implemented")

A <- function(f, b) structure(list(foo = f, bar = b), class = "A")
foo.A <- function(object, ...) paste("A::foo ->", object$foo)
bar.A <- function(object, ...) paste("A::bar ->", object$bar)

a <- A("qux", "qax")
foo(a)
## [1] "A::foo -> qux"
bar(a)
## [1] "A::bar -> qax"

For a refinement of that, you might want to change how bar works and add another function called baz:

baz <- function(object, ...) UseMethod("baz")
baz.default <- function(object, ...) stop("baz not implemented")

B <- function(f, b, bb) {
  a <- A(f, b)
  a$baz <- bb
  class(a) <- "B"
  a
}

bar.B <- function(object, ...) paste("B::bar ->", object$bar)
baz.B <- function(object, ...) paste("B::baz ->", object$baz)

[1] To draw means something very different when you are a gunslinger compared to when you are an artist, after all.

We want to leave the foo function just the way it is, but if we define the class B as shown, calling foo on a B object gives us an error because it will be calling the foo.default function.

b <- B("qux", "qax", "quux")
foo(b)
## Error in foo.default(b): foo not implemented

This is because we haven’t told R that we consider the class B a specialization of class A. We wrote the constructor function—the function where we make the object, the function B—such that all B objects contain the data that is also found in an A object. We never told R that we intended B objects also to be A objects.

We could, of course, make sure that foo called on a B object behaves the same way as when called on an A object by defining foo.B such that it calls foo.A. This wouldn’t be too much work for a single function, but if there are many polymorphic functions that work on A objects, we would have to implement B versions for all of them. Tedious and error-prone work.
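For reference, the manual forwarding approach would look roughly like this. It is a sketch that repeats the definitions from above so it can run on its own, with B still carrying only its own class name.

```r
foo <- function(object, ...) UseMethod("foo")
foo.default <- function(object, ...) stop("foo not implemented")

A <- function(f, b) structure(list(foo = f, bar = b), class = "A")
foo.A <- function(object, ...) paste("A::foo ->", object$foo)

# B carries only its own class name, as in the first version of the constructor.
B <- function(f, b, bb) {
  obj <- A(f, b)
  obj$baz <- bb
  class(obj) <- "B"
  obj
}

# Manual forwarding: one wrapper like this is needed for every generic.
foo.B <- function(object, ...) foo.A(object, ...)

b <- B("qux", "qax", "quux")
forwarded <- foo(b)
```

It works, but every new polymorphic function on A objects needs another wrapper of this kind for B.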

If only there were a way of telling R that the class B is really an extension of the class A. And there is. The class attribute of an object doesn’t have to be a string. It can be a vector of strings. If, for B objects, we say that the class is B first and A second, like this:

B <- function(f, b, bb) {
  a <- A(f, b)
  a$baz <- bb
  class(a) <- c("B", "A")
  a
}

then calling foo on a B object, where foo.B is not defined, will call foo.A as its second choice before falling back to foo.default:

b <- B("qux", "qax", "quux")
foo(b)
## [1] "A::foo -> qux"
bar(b)
## [1] "B::bar -> qax"
baz(b)
## [1] "B::baz -> quux"

The way the class attribute is used with polymorphic functions is that R will look for functions with the class names appended, in the order the names appear in the class attribute. The first it finds will be the one that is called, and if it finds no specialized version, it will go for the .default version. When we set the class of B objects to be the vector c("B", "A"), we are saying that R should call .B functions first, if it can find one, but otherwise call the .A function.
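The lookup rule can be imitated with a few lines of R. This is only a simplified sketch of the idea, not how R implements dispatch internally (the real mechanism also handles implicit classes and registered methods), and the helper name resolve_method is made up for this example.

```r
# Minimal methods to search among.
foo.default <- function(object, ...) "default foo"
foo.A <- function(object, ...) "A foo"
bar.default <- function(object, ...) "default bar"
bar.A <- function(object, ...) "A bar"
bar.B <- function(object, ...) "B bar"

# Try generic.<class> for each class name in order, then generic.default.
resolve_method <- function(generic, object) {
  candidates <- paste(generic, c(class(object), "default"), sep = ".")
  for (name in candidates) {
    if (exists(name, mode = "function")) return(name)
  }
  NULL
}

b <- structure(list(), class = c("B", "A"))
foo_choice <- resolve_method("foo", b)   # no foo.B, so the search moves on to A
bar_choice <- resolve_method("bar", b)   # bar.B is found first

# With a class vector that leaves out A, the search skips straight to default.
partial <- structure(list(), class = c("C", "B"))
partial_choice <- resolve_method("foo", partial)
```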


This is a very flexible system that lets you implement multiple inheritance from classes that are otherwise not related, but you do so at your own peril. Given the semantics of how this works (functions are searched for in the order of the class names in the vector), the actual code that will be run can be hard to work out if these vectors get too complicated.

Another quick word of caution is this: if you give an object a list of classes, you should include the classes all the way up the class hierarchy. If you define a new class, C, intended as a specialization of B, you cannot just say that it is an object of class c("C", "B") if you also want it to behave like an A object.

C <- function(f, b, bb) {
  b <- B(f, b, bb)
  class(b) <- c("C", "B")
  b
}

c <- C("foo", "bar", "baz")
foo(c)
## Error in foo.default(c): foo not implemented

When we call foo(c) here, R will try the functions, in turn: foo.C, foo.B, and foo.default. The only one that is defined is the last, and that throws an error if called.

So what we have defined here is an object that can behave like B but only in cases where B differs from A’s behavior! Our intention is that B is a special type of A, so every object that is a B object we should also be able to treat as an A object. Well, with C objects that doesn’t work.

We don’t have a real class hierarchy here like we would find in languages like Python, C++, or Java. We just have a mechanism for calling polymorphic functions, and the semantics here is just to look for them by appending the names of the classes found in the class attribute vector. Your intentions might very well be that you have a class hierarchy with A being the most general class, B a specialization of that, and C the most specialized class, but that is not what you are telling R. Because you cannot. You are telling R how it should look for polymorphic functions, and with the code, you told it to look for .C functions first, then .B functions, and you didn’t tell it any more, so the next step it will take is to look for .default functions. Not .A functions. It doesn’t know that this is where you want it to look.

If you add this to the class attribute it will work, though:

C <- function(f, b, bb) {
  b <- B(f, b, bb)
  class(b) <- c("C", "B", "A")
  b
}

c <- C("foo", "bar", "baz")
foo(c)
## [1] "A::foo -> foo"
bar(c)
## [1] "B::bar -> bar"
baz(c)
## [1] "B::baz -> baz"
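One way to avoid spelling out the whole chain by hand is to prepend the new class name to whatever the parent constructor already set. This is a sketch of that idiom, not the approach used in the text above, and it assumes that each constructor builds its object by calling the constructor of the class it specializes.

```r
foo <- function(object, ...) UseMethod("foo")
foo.default <- function(object, ...) stop("foo not implemented")

A <- function(f, b) structure(list(foo = f, bar = b), class = "A")
foo.A <- function(object, ...) paste("A::foo ->", object$foo)

# Each constructor adds its own name in front of the inherited class vector.
B <- function(f, b, bb) {
  obj <- A(f, b)
  obj$baz <- bb
  class(obj) <- c("B", class(obj))
  obj
}

C <- function(f, b, bb) {
  obj <- B(f, b, bb)
  class(obj) <- c("C", class(obj))
  obj
}

obj <- C("foo", "bar", "baz")
obj_classes <- class(obj)
obj_foo <- foo(obj)
```

If B later gains another parent class, C picks it up automatically because it never lists the ancestors itself.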


This article is from:

Beginning of Data Science in R
