Mitchell’s Musings 11-14-16: Do We Really Have All Our Marbles?
Daniel J.B. Mitchell

When I was in high school a long time ago, a teacher wanted to demonstrate the laws of probability and the concept of sampling. So he filled a large jar with marbles of four colors. Since he had filled the jar himself, he knew the proportions of each color, which he told the class. He then chose various students to come up and – without looking in the jar – randomly pick out a couple of marbles. The idea was to show that as the sample increased in size, it would come to reflect the actual proportions in the jar.

Now this demonstration took place in Stuyvesant High School in New York City, a special school – you had to take an SAT-type test to get in – which at the time was male-only. And the teacher was a patsy. So the nerdy teenage boys in the class managed to sabotage the demonstration. I won’t go into more details on what happened, but let’s suppose that the demonstration had proceeded as the teacher had planned. Indeed, as the sample size grew larger, the proportions in the sample would have come to resemble the actual proportions in the entire jar.

I thought of the marbles-in-the-jar story given the polling fiasco that occurred in the presidential election last week. Basically, pollsters (the polling industry) had a Dewey-Beats-Truman event in 2016, whatever rationalizations and defenses they offer. And there will be rationalizations offered in the weeks to come. Examples: Clinton won the popular vote, so really those who projected a Clinton win were “right.” Voters changed their minds at the last minute. Etc., etc. You can just hear the self-justifying stories now.

But there is a problem. Pollsters usually qualify their results by including a “margin of error.” Journalists take this margin to mean that there is zero probability of an error outside that range. But presumably “margin of error” actually has some relationship (what relationship exactly?) to the statistical concept of a confidence interval linked to sample size. A confidence interval, however it is defined, does not mean that there is zero probability of a result outside its boundaries. That issue is a comparatively minor part of the larger story.

So let’s go back to the marble story. There is no such thing as a “likely marble.” If you take a marble out of the jar, its status is clear; it was previously a marble in the jar. But there is the concept of a “likely voter.” When you poll people, you have to decide what filters to apply in order to count people who will actually vote. Whatever you do about that issue, you may be wrong. The fact that you may be wrong potentially adds to the actual “margin of error,” but not to the reported one; you can’t treat your sample as if it were marbles in a jar.

Marbles in a jar don’t change their colors. No matter how you ask the question about what proportions are in the jar, the marbles and their colors are unaltered. But we know from years of research that the way a question is framed when people – not marbles – are surveyed can influence the answer you obtain. So pollsters (hopefully) try to take account of that issue. But the way in which you take account potentially increases the “margin of error.” You might be wrong.

The same is true with regard to response bias. You don’t have to worry about marbles refusing to come out of the jar. But no one has to answer the phone. No one who does answer the phone has to volunteer to be questioned.
And there may be systematic links between responding to a pollster and political inclinations. Like the problem of question framing, you can add methodological steps to try to account for response bias. But you might be wrong and thus potentially add to your “margin of error.”
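To make the point about the “margin of error” concrete, here is a minimal sketch in Python of an idealized poll of the marble jar. The 52 percent true share, the sample size of 1,000, and the number of trials are assumptions chosen purely for illustration; the conventional 95 percent margin of error for a proportion is roughly 1.96 times the square root of p(1-p)/n. Even under these ideal, marbles-in-a-jar conditions, the estimate lands outside the reported margin about one time in twenty – and none of the complications discussed above (likely-voter filters, question framing, response bias) are reflected in that figure.

import random

# Illustrative assumptions, not figures from any actual poll:
TRUE_SHARE = 0.52   # true proportion of one "color" in the jar
SAMPLE_SIZE = 1000  # marbles drawn per simulated poll
TRIALS = 10_000     # number of simulated polls

misses = 0
for _ in range(TRIALS):
    # Ideal random sample: every "marble" responds, truthfully, exactly once.
    hits = sum(random.random() < TRUE_SHARE for _ in range(SAMPLE_SIZE))
    p_hat = hits / SAMPLE_SIZE
    # Conventional 95% margin of error: 1.96 * sqrt(p_hat * (1 - p_hat) / n)
    moe = 1.96 * (p_hat * (1 - p_hat) / SAMPLE_SIZE) ** 0.5
    if abs(p_hat - TRUE_SHARE) > moe:
        misses += 1

print(f"Estimate off by more than the 'margin of error' in "
      f"{misses / TRIALS:.1%} of simulated polls")  # roughly 5%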