Econometric Models for Forecasting Television Shows: The Dutch X-Factor Case
Martijn A. Boermans1
Utrecht University, School of Economics
Hogeschool Utrecht University of Applied Sciences, Faculty of Economics and Management
Working paper: Version 17 April 2009
Executive summary Forecasting television show results is usually done by experts familiar with the qualitative appeal of each candidate. Using data based on millions of votes, we show that jury judgements significantly influence the voting behaviour of the public audience. We forecast the winner of the Dutch television version of X-Factor to be Lisa, who is also picked by the judges. Yet our estimates demonstrate that the jury has only a little say in who wins. We find evidence that the starting position of the candidates significantly and negatively affects the number of votes they receive. Being black and receiving a good jury rating positively affect the chance of staying in the show. Our research employs a unique dataset and provides many forecasts that are not based on qualitative opinion but derived from statistical and objective data and theoretical models of voting behaviour over time. We find evidence that the voters’ preferences for the candidates are rather stable and that the voters of the losing candidate are a strong predictive force for the subsequent shows.
Key words: forecasting, television shows, panel data, X-Factor
1 Contact: martijnboermans@yahoo.com. Do not quote without permission. These results are based on preliminary research. In the future we hope to integrate the data of the upcoming shows, extending our time dimension and number of observations. Obviously, these yet-to-be-collected data may affect our results. Any comments or requests are welcome.
I. Introduction Our unique dataset consists of voting behaviour obtained from five episodes of the television show X-Factor. We recorded the shows broadcast in the Netherlands up to 18 April 2009. In total we have 60 observations of 12 candidates in five consecutive programs.2 We run pooled OLS regressions to forecast the likelihood that a particular candidate is voted out of the show or ends up on the chair. The chair indicates that a candidate has the least or second-least number of votes. A special feature of our analysis is that we extend the simple pooled and clustered OLS regressions with dynamics. We exploit the time-series component of the data using fixed- and random-effects models and between estimators on the panel data. In other words, we use a longitudinal panel to predict the probability that candidates receive sufficient votes to continue in the show or, conversely, the chance that they are voted out. First we calculate the expectations of the jury, based on their evaluation of the candidates in each show. We show empirically that the higher the scores of the jury, rated on a scale from 2 to 10, the smaller the chance that a candidate is voted out. After obtaining the jury predictions we forecast the probabilities that candidates are voted onto the chair and out of the show.3
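The chair definition and the lagged chair variable can be sketched as follows. This is a minimal illustration only: the vote counts are invented (not the actual X-Factor data) and the helper names `chair_indicator` and `build_panel` are hypothetical.

```python
# Illustrative only: vote counts are invented, not the real X-Factor data.
votes = {
    1: {"A": 500, "B": 300, "C": 200, "D": 400},
    2: {"A": 450, "B": 150, "C": 350, "D": 250},
}

def chair_indicator(episode_votes):
    """Return {candidate: 1 if least or second-least votes, else 0}."""
    ranked = sorted(episode_votes, key=episode_votes.get)  # fewest votes first
    on_chair = set(ranked[:2])
    return {name: int(name in on_chair) for name in episode_votes}

def build_panel(votes_by_episode):
    """Build rows (candidate, episode, chair, lagged chair) in episode order."""
    rows, previous = [], {}
    for episode in sorted(votes_by_episode):
        chair = chair_indicator(votes_by_episode[episode])
        for name, flag in chair.items():
            # lchair is None in the first episode, when no lag exists yet
            rows.append({"candidate": name, "episode": episode,
                         "chair": flag, "lchair": previous.get(name)})
        previous = chair
    return rows

panel = build_panel(votes)
for row in panel:
    print(row)
```

Rows without a lagged value would drop out of any regression that includes the lag, which is one plausible reason why specifications with a lagged chair term use fewer observations than those without.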
II. Jury rating and forecasted predictions The ratings of the candidates are used to predict the candidates’ jury scores. As can be seen, the average predicted ratings for the candidates are rather high (mean = 8.46). The reason for this is that the model also calculates the probabilities of the candidates that are already voted off, which as predicted get low ratings (not reported here). In table 1, according to the jury in episode 14, the most likely candidate to be voted out is Roan, who still has a BB+ rating. The candidate most likely to join Roan on the chair is Rachel, with a decent A- rating. The other candidates all received higher ratings, with a median of AA+. The three candidates with the highest ratings are, in descending order, Lisa, Whatever, and Jamal and Rev & Ross, who share third place with equal scores.
2 The 12 candidates include Hesther, Irma, Jamal, Jennifer, KLEM, Laurens, Liza, Luigiano, Rachel, Rev & Ross, Roan, and Whatever.
3 The variable jury total score has an average of 8.2 and a standard deviation of 1.6. The variable is not normally distributed. Although a logarithmic transformation would make this aggregate of the single jury evaluations normally distributed, we do not apply this log specification because we use disaggregated jury scores.
Table 1: Jury ratings
Candidates    Mean jury rating (Dutch rating)    S&P rating
Hesther       8.3                                AA-
Jamal         8.6                                AA+
Liza          9.0                                AAA
Luigiano      8.5                                AA
Rachel        7.9                                A-
Rev & Ross    8.6                                AA+
Roan          7.5                                BB+
Whatever      8.8                                AAA
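The way the mean jury ratings translate into a chair and exit forecast can be sketched as below. The ratings are copied from Table 1 as printed; treating the two lowest-rated candidates as the chair forecast is our reading of the text, and the variable names are illustrative.

```python
# Mean jury ratings as printed in Table 1 (Dutch 10-point scale).
mean_jury = {
    "Hesther": 8.3, "Jamal": 8.6, "Liza": 9.0, "Luigiano": 8.5,
    "Rachel": 7.9, "Rev & Ross": 8.6, "Roan": 7.5, "Whatever": 8.8,
}

# Sort from lowest to highest rating; ties keep the table order.
ranking = sorted(mean_jury, key=mean_jury.get)

out_forecast = ranking[0]      # lowest-rated candidate
chair_forecast = ranking[:2]   # two lowest-rated candidates
best_rated = ranking[-1]       # highest-rated candidate

print(out_forecast, chair_forecast, best_rated)
```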
In table 2 we show the forecasted jury ratings. Our estimates for episode 14 demonstrate that the candidate with the lowest score is Irma, who was already voted off in an earlier show (not reported here). We also see that the remaining candidates all receive high ratings (mean = 8.43).
The results of the forecasted jury ratings are in agreement with the mean jury judgement in the show.4 Our model forecasts that Roan and Hesther sit on the chair and that Roan is the most likely candidate to be voted out of the show. Roan has a predicted rating of B and Hesther an estimated rating of A. The three candidates with the highest ratings are, in descending order, Lisa, Whatever, and Luigiano. Both approaches place Lisa first. Based on these two approaches we select Lisa as the forecasted winner.5 Most interestingly, we forecast that Whatever or Luigiano makes it to the final show, in contrast to the current jury evaluations, which predict Jamal and Rev & Ross. Table 4 provides an overview of the forecasts of the several models. Based on the forecasted jury ratings in table 2 we correctly predicted that Irma would be voted off. Since this is exactly the candidate that left before, Roan has the lowest score of the remaining candidates. The candidates have an average rating of 8.43, which translates to an A rating and implies that the risk of being voted out is small. However, Roan’s score is 7.19, a B grade, which is a much lower rating.
4 Our estimated average score is 8.43 and the observed mean grade is 8.46. As expected, owing to the disturbance in the estimated error term, the standard deviation of the predicted mean jury rating is much larger, as can be derived from a comparison of the ratings in table 1 and table 2.
5 Hesther’s predicted rating is a Dutch score of 8.06 and for Rachel this is 8.12. In the S&P ratings, both receive an estimated A rating.
Table 2: Forecasted jury ratings
Candidates    Mean jury rating (Dutch rating)    S&P rating
Hesther       8.06                               A
Jamal         8.38                               AA
Liza          9.25                               AAA
Luigiano      8.81                               AAA
Rachel        8.12                               A
Rev & Ross    8.50                               AA
Roan          7.19                               B
Whatever      9.19                               AAA
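The difference in spread between the observed ratings (Table 1) and the forecasted ratings (Table 2) can be checked with a short sketch. The values are copied from the two tables as printed; using the population standard deviation is our assumption, since the text does not say which variant was used.

```python
from statistics import pstdev

# Candidate order: Hesther, Jamal, Liza, Luigiano, Rachel, Rev & Ross, Roan, Whatever
observed = [8.3, 8.6, 9.0, 8.5, 7.9, 8.6, 7.5, 8.8]          # Table 1
forecast = [8.06, 8.38, 9.25, 8.81, 8.12, 8.50, 7.19, 9.19]  # Table 2

sd_observed = pstdev(observed)   # spread of the observed mean jury ratings
sd_forecast = pstdev(forecast)   # spread of the forecasted jury ratings

print(f"sd observed: {sd_observed:.2f}, sd forecast: {sd_forecast:.2f}")
```

On these eight candidates the forecasted ratings are more dispersed than the observed ones, which is the pattern the text attributes to the estimated error term.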
III. Econometric model forecasts We have outlined the forecasts of voting behaviour based on jury judgements, using both actual and forecasted ratings. In this part we apply a longitudinal panel between-estimator approach. Based on several critical variables we run a multivariate time-series analysis with group-means OLS. The econometric model fits the data (F = 5.05; p < 0.05) and almost all of the independent variables are significant. Our model (1) is highly robust (see appendices) to various other estimation strategies and to the inclusion or omission of several variables.6
(1)  $Out_{it} = \alpha_{it} + Jury1_{it} + Jury2_{it} + StartPos_{it} + Black_{it} + Chair_{i,t-1} + u_i + \varepsilon_{it}$
In order to forecast which candidate will leave the show we collected unique data on the jury ratings and on which candidates had the least official number of public votes (which runs into the millions). Our predictors include the forecasted jury rating of the first judge (Eric) and of the second jury member (Stacy), and the starting position of the candidate during the show. We clearly find a pattern in which candidates who perform at the beginning of the show can expect more votes than candidates late in the show. We also note that in the Dutch X-Factor candidates with an African background (5 out of the 12) have a lower probability of being voted out of the show.
6 The model includes the ratings of the first judge (Eric) and the second jury member (Stacy), the candidate’s race and starting position in the show, and the number of times a candidate was on the chair before. The critical variables we selected are based on both statistical procedures and theoretical hypotheses not reported here; a document is available on request. We also do not show which other variables we applied to the forecasts in other significant models. The selection procedure is not reported but can be checked using the provided appendices, which show many different estimation specifications.
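The idea behind the between (group-means) estimator can be illustrated with a deliberately simplified sketch: average each variable within a candidate, then run OLS on those averages. The toy panel is hypothetical, only one regressor (starting position) is used for brevity instead of the full set in model (1), and the helper names are our own.

```python
from collections import defaultdict

# Hypothetical toy panel: (candidate, start position, chair indicator).
panel = [
    ("A", 1, 0), ("A", 2, 0), ("A", 1, 0),
    ("B", 4, 1), ("B", 3, 0), ("B", 5, 1),
    ("C", 6, 1), ("C", 7, 1), ("C", 5, 0),
]

def group_means(rows):
    """Average the regressor and the outcome within each candidate."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for name, x, y in rows:
        acc = sums[name]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {name: (sx / n, sy / n) for name, (sx, sy, n) in sums.items()}

def ols_line(points):
    """Simple OLS of y on x over (x, y) pairs: returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

means = group_means(panel)                       # one observation per candidate
intercept, slope = ols_line(list(means.values()))
print(f"intercept={intercept:.4f}, slope={slope:.4f}")
```

Because the regression runs on one averaged observation per candidate, the degrees of freedom are governed by the number of candidates rather than the number of candidate-episode rows.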
In sum, we have uncovered four interesting findings. Firstly, the model specification shows that the rating of the first judge (Eric) is decisive for predicting which candidates end up on the chair and out of the show. Secondly, our results indicate that candidates’ starting position significantly affects the forecast probability of receiving the least number of votes: candidates that appear late in the show can expect fewer votes. Thirdly, there are no signs of racism in the Dutch television show: black candidates have a smaller chance of receiving the least votes.
Most important is our fourth and final outcome of the analysis of our predictor variables. The group-mean estimates show that candidates who previously received the least votes (the lagged chair observation) have a higher probability of being voted out of the show this episode. We find no evidence of a “reversal of fortunes” effect and expect that voters show somewhat persistent preferences. The between estimators indicate that the voters of the previous loser have a strong impact on the current show results. The candidate with the least votes in the previous show, compared to the currently remaining candidates, is the most likely to be on the chair again (Brownian motion); see also table 4 (and appendices).
Our estimates in table 3 indicate that Rachel is now the most likely candidate to be voted out. This is rather surprising and is explained by the fact that the jury ratings of Rachel were relatively low, that she is not black, and that she performs as the last candidate in episode 14, which gives her a predicted disadvantage, since voters have less time. Our forecast demonstrates that Jamal will end up on the chair together with Rachel. The three candidates with the highest ratings are, in descending order, Lisa, Roan, and Hesther. Again, as in all other predictions, based on dynamic between estimation we select Lisa as the forecasted winner.
Table 3: Estimated ratings for candidates
Candidates    Score in proportions
Hesther        0.106
Jamal         -0.015
Liza           0.322
Luigiano      -0.012
Rachel        -0.071
Rev & Ross     0.032
Roan           0.107
Whatever       0.059
Further column labels in the original layout: Probability Chair; Hazard score.
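How the scores in Table 3 map onto the forecasts in the text can be sketched as follows. The scores are copied from Table 3 as printed; reading the two lowest scores as the chair forecast follows the surrounding discussion, and the variable names are illustrative.

```python
# Scores in proportions as printed in Table 3.
scores = {
    "Hesther": 0.106, "Jamal": -0.015, "Liza": 0.322, "Luigiano": -0.012,
    "Rachel": -0.071, "Rev & Ross": 0.032, "Roan": 0.107, "Whatever": 0.059,
}

ranking = sorted(scores, key=scores.get, reverse=True)  # best score first
top_three = ranking[:3]
chair_forecast = ranking[-2:]                 # two lowest scores
gap = scores["Rev & Ross"] - scores["Jamal"]  # difference between the two

print(top_three, chair_forecast, round(gap, 3))
```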
IV. Final Forecasting models The unanimously forecasted Dutch X-Factor winner is Lisa. All our model specifications predict that Lisa is the least likely to be on the chair or to leave the show. The estimation procedures provide somewhat mixed results for the top three candidates. As explained, Lisa is on top of all lists. The mean jury ratings place Jamal and Rev & Ross as the most likely final show candidates. We build a forecast rating model and predict that the most likely final competitors of Lisa are Whatever and Luigiano. However, our preferred longitudinal panel between estimations forecast that Roan and Hesther are top-three candidates. We thus end up with a final with Lisa and Jamal, Whatever, Roan, Rev & Ross, Luigiano or Hesther.
Most interestingly, only Rachel is not predicted to end up in the top three. The mean jury ratings forecast that Rachel will leave, and the between estimator model predicts that she ends up on the chair. This confirms the robustness of our results, as does the fact that Lisa is always the forecasted winner. However, the forecasted model estimation of ratings indicates that Hesther will leave. Roan is predicted to sit on the chair by both the mean jury and the forecasted jury ratings. In total, for episode 14 Rachel is the most likely to end up on the chair, but we predict in our models that she will not leave the show. We expect that either Hesther or Jamal leaves the show, while the jury’s ratings point to Roan. So we forecast that Hesther, Jamal, Rachel, and Roan are the candidates for the chair. Our preferred estimator shows that Jamal and Rachel sit on the chair, and Jamal leaves.
Table 4: Summary forecasts
Candidates           Mean jury        Forecast jury    Model 1: Between forecasts
Chair                Rachel           Hesther          Jamal
                     Roan             Roan             Rachel
Out                  Roan             Roan             Rachel
Top 3                1. Lisa          1. Lisa          1. Lisa
                     2. Jamal         2. Whatever      2. Roan
                     3. Rev & Ross    3. Luigiano      3. Hesther
Forecasted winner    Lisa             Lisa             Lisa
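The summary in Table 4 can be tallied with a short sketch that counts how often each candidate is placed on the chair and checks whether the winner forecast is unanimous. The entries are transcribed from Table 4; the dictionary layout is our own.

```python
from collections import Counter

# Chair, exit and winner forecasts per approach, transcribed from Table 4.
forecasts = {
    "Mean jury": {"chair": ["Rachel", "Roan"], "out": "Roan", "winner": "Lisa"},
    "Forecast jury": {"chair": ["Hesther", "Roan"], "out": "Roan", "winner": "Lisa"},
    "Model 1: Between": {"chair": ["Jamal", "Rachel"], "out": "Rachel", "winner": "Lisa"},
}

# How many of the three approaches put each candidate on the chair?
chair_counts = Counter(name for f in forecasts.values() for name in f["chair"])

# The winner is unanimous only if all approaches name the same candidate.
winners = {f["winner"] for f in forecasts.values()}
unanimous_winner = next(iter(winners)) if len(winners) == 1 else None

print(dict(chair_counts), unanimous_winner)
```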
V. Discussion and further predictions After performing the forecasts we looked at the new television show results, provided post hoc in table 5. Interestingly, our preferred model was the only model to predict correctly one of the candidates who came on the chair. We forecasted that Jamal and Rachel would be the two candidates with the least votes, while none of the other models predicted that Jamal would be sitting on the chair. Likewise, none of the forecasts put Rev & Ross on the chair.
Table 5: True observation (episode 14, post hoc)
Candidates    Mean jury
Chair         Rev & Ross, Jamal
Out           Rev & Ross
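The comparison between the chair forecasts and the observed outcome can be made explicit with a small sketch. The forecasts are transcribed from Table 4 and the observed chair from Table 5; counting overlapping names as "hits" is our own simple scoring rule, not a method stated in the paper.

```python
observed_chair = {"Rev & Ross", "Jamal"}  # Table 5

# Chair forecasts per approach, transcribed from Table 4.
chair_forecasts = {
    "Mean jury": {"Rachel", "Roan"},
    "Forecast jury": {"Hesther", "Roan"},
    "Model 1: Between": {"Jamal", "Rachel"},
}

# Candidates each approach placed on the chair who were actually there.
hits = {model: sorted(chair & observed_chair)
        for model, chair in chair_forecasts.items()}

for model, names in hits.items():
    print(f"{model}: {len(names)} of 2 correct {names}")
```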
A comparison reveals that our forecasts are correct. Given that we have two candidates with the least votes (and that the jury now decides who leaves), we now forecast which of the two, Jamal or Rev & Ross, will leave the show. Based on the current jury ratings in table 1, Jamal received an 8.6 or AA+, and Rev & Ross also got an 8.6 or AA+. In other words, the current judgements suggest the jury is actually indifferent, placing them in a very difficult situation.7 Our findings in table 2 again appear robust, since we also estimate similar ratings for the two candidates. Jamal has a forecasted rating of 8.38 and Rev & Ross have an 8.50, which are both equivalent to an AA rating. This, like the findings in table 1, shows that both candidates were unlikely to receive few votes: table 1 gives them AA+ and table 2 gives them AA, which imply very low chances of getting the least votes.
In contrast, our forecasting model correctly predicts that Jamal ends up on the chair. In table 3 we obtained the second-highest probability of exit for Jamal, after Rachel. Our prediction for Rev & Ross was that they end up towards the bottom of the votes, namely at place 5, just before Luigiano, Jamal and Rachel. Thus, we believe that we have found a robust and (very) good forecasting technique, using uniquely collected data and applying new quantitative prediction methods.
In table 6 we divide the jury ratings of the current show by judge. Here we clearly see that Jamal has the lower ratings of the two, averaging only 3.25, while Rev & Ross obtain a 4.75. However, as we have seen before in table 1, these jury scorings are not consistent estimators.
7 Actually (post hoc), during the final decision-making process the second judge (Stacy, coach of Jamal), who is the first to react, notes in frustration: “I really don’t understand anything of the voting behaviour. I cannot make anything out of this.” Nevertheless, she is in favour of her 16-year-old pupil Jamal. Unsurprisingly, judge one (Eric, coach of Rev & Ross), who reacts next, also blindly chooses his own team. The next judge is jury member four (Gordon), who again emphasises the big dilemma they are facing: “What a surprise”, he says.
The difference in the proportion score in our preferred model specification (1) is only 0.047, or less than 5%.8
Table 6: Candidate jury ratings per judge (current score)
Candidate     jury1    jury2    jury3    jury4
Hesther       5        4        5        5
Jamal         3        4        3        3
Liza          3        4        4        5
Luigiano      2        5        3        4
Rachel        3        3        5        3
Rev & Ross    5        5        4        5
Roan          5        5        5        5
Whatever      4        2        3        1
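The per-candidate averages behind the comparison of Jamal and Rev & Ross can be recomputed from Table 6 with a minimal sketch. The ratings are transcribed from the table as printed, in the order jury1 to jury4; the simple unweighted mean is our assumption.

```python
# Ratings per judge (jury1..jury4) as printed in Table 6.
ratings = {
    "Hesther": [5, 4, 5, 5],
    "Jamal": [3, 4, 3, 3],
    "Liza": [3, 4, 4, 5],
    "Luigiano": [2, 5, 3, 4],
    "Rachel": [3, 3, 5, 3],
    "Rev & Ross": [5, 5, 4, 5],
    "Roan": [5, 5, 5, 5],
    "Whatever": [4, 2, 3, 1],
}

# Unweighted mean over the four judges for each candidate.
averages = {name: sum(scores) / len(scores) for name, scores in ratings.items()}

print(averages["Jamal"], averages["Rev & Ross"])
```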
8 Although the jury ratings were similar, our between estimation results from the multivariate time-series panel models give Rev & Ross a small advantage over Jamal. This outcome is partially explained by the fact that the first judge (Eric) is the coach of the group category to which Rev & Ross belong; see especially model (1).
VI. Conclusions Forecasting of television shows employs the same scientific tools applied in economics. This research uses seminal techniques to predict the winning and losing candidates for each consecutive show. People who watch X-Factor tend to have an intuitive sense of who is likely to win. Furthermore, the public is the one that does the voting. A nice feature of the show is that there are experts involved, who are the ones that ultimately decide who can stay and who must go. Our predictions find that Lisa is the most likely candidate to win the show, according to both expert and public ruling.
We constructed a unique dataset which consists of voting behaviour obtained from five episodes of the Dutch version of the television show X-Factor. We recorded the shows up to 18 April 2009. We ran several estimation models to forecast the likelihood that a particular candidate is voted out of the show or ends up on the chair. The chair indicates that a candidate has the least or second-least number of votes. Post hoc supplemented data revealed that Rev & Ross had to leave the show and that Jamal only hit the chair.
According to the experts, in episode 14 the most likely candidate to be voted out is Roan, who has a BB+ rating, and Rachel may join him on the chair, but with a decent A- rating. This simple expert yardstick method strongly fails to predict the X-Factor show outcomes. We specified a judge rating forecast which provides us with a perfect forecast, namely that a candidate who was previously voted away (Irma) was most likely to leave. This clearly shows that our predictions in table 2 are robust. This basic OLS model forecasts that Roan and Hesther sit on the chair and that Roan is the most likely candidate to be voted out of the show. Based on these approaches we select Lisa as the forecasted winner. The three candidates with the highest ratings in our preferred forecast model are Lisa, Roan, and Hesther (in descending order). The unanimously forecasted Dutch X-Factor winner is Lisa. Most interestingly, based on the forecasted ratings (table 2) we predict that Lisa wins and that Whatever or Luigiano makes it to the final show, in contrast to the mean jury evaluations, which predict Jamal and Rev & Ross but also acclaim Lisa as winner. Nevertheless, this model neglects the important dynamics of voting behaviour. Where the mean jury scores give inconsistent results, the forecasted jury ratings put the wrong candidates on the chair, namely Hesther and Roan, which indicates that this simple OLS specification is not robust.
We finally obtained a robust longitudinal between panel estimator (see model (1) and table 3). Our preferred multivariate time-series analysis with group-means OLS fits the data, is highly significant and is robust to various changes. We show that the judge ratings do strongly affect the voting behaviour and help to explain both the expected candidates on the chair and the loser. Another outcome is that candidates who appear nearer the beginning of the show can expect more votes than candidates later in the show. The results confirm that there are no signs of racism among the voters in the Dutch case: black candidates have a smaller chance of receiving the least votes. In addition we demonstrate that candidates who previously received the least votes (the lagged chair observation) have a higher probability of being voted out of the show in new episodes. We find no evidence of a “reversal of fortunes” effect and expect that voters show persistent preferences.
The between estimators indicate that the voters of the previous loser have a strong impact on the current show results. The candidate with the least votes in the previous show, compared to the currently remaining candidates, is the most likely to be on the chair again, just as intuitive theoretical reasoning on these dynamics would indicate, which is a good robustness check. Our estimates forecasted that Rachel is the most likely candidate to be voted out and that Jamal will end up on the chair but can stay. The latter forecast was correct, and ours was the only model to make this prediction. Also, the individual experts’ ratings had indicated that Rev & Ross would stay, whereas our model found a total proportional scoring difference between the two of less than 5%. Note that we had already hypothesized that this model would yield consistent and robust predictions and be the best forecasting tool.
Forecasting television show results is usually done by experts familiar with the qualitative appeal of each candidate. Using data based on millions of votes, we show that jury judgements significantly influence the voting behaviour of the public audience. We forecast the winner of the Dutch television version of X-Factor to be Lisa, who is also picked by the judges. Yet our estimates demonstrate that the jury has only a little say in who wins. We find evidence that the starting position of the candidates significantly and negatively affects the number of votes they receive. Being black and receiving a good jury rating positively affect the chance of staying in the show. Our research employs a unique dataset and provides many dynamic forecasts that are not based on qualitative opinion but derived from statistical and objective data and theoretical models of voting behaviour over time. We find evidence that the voters’ preferences for the candidates are rather stable and that the voters of the losing candidate are a strong predictive force for the subsequent shows.
Appendices Appendix 1 presents the pooled (clustered) OLS regression results for several specifications. In Appendix 2 we provide random-effects time-series models, estimated by GLS, to capture the dynamics and persistence effects of candidates. Appendix 3 contains our preferred estimation technique, the between (group-mean) estimators, which are ultimately selected based on theoretical considerations and statistical significance.
Appendix 1: OLS regression results

Number of obs = 42; F(3, 38) = 2.47; Prob > F = 0.0767; R-squared = 0.1631; Adj R-squared = 0.0970; Root MSE = .28232
Source      SS            df    MS
Model       .590293422     3    .196764474
Residual    3.0287542     38    .079704058
Total       3.61904762    41    .088269454

out       Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
start     -.0261398    .0139867     -1.87    0.069    -.0544544    .0021749
black     -.1734346    .0985003     -1.76    0.086    -.3728381    .0259688
ladies    -.2175532    .1251368     -1.74    0.090    -.4708793    .035773
_cons      .3711244    .1128529      3.29    0.002     .1426656    .5995832

Number of obs = 42; F(5, 36) = 1.98; Prob > F = 0.1059; R-squared = 0.2153; Adj R-squared = 0.1064; Root MSE = .37571
Source      SS            df    MS
Model       1.39463631     5    .278927263
Residual    5.08155416    36    .141154282
Total       6.47619048    41    .157955865

chair     Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
start     -.0407487    .0186828     -2.18    0.036    -.0786391    -.0028582
black     -.0405255    .1333424     -0.30    0.763    -.3109564    .2299054
groups    -.2752075    .1645016     -1.67    0.103    -.6088321    .0584171
old       -.2736053    .1510454     -1.81    0.078    -.5799396    .0327291
ladies    -.2708406    .187657      -1.44    0.158    -.6514266    .1097453
_cons      .6352391    .1711321      3.71    0.001     .2881671    .982311

Number of obs = 30; F(6, 23) = 6.97; Prob > F = 0.0003; R-squared = 0.6452; Adj R-squared = 0.5527; Root MSE = .2721
Source      SS            df    MS
Model       3.09712074     6    .516186789
Residual    1.70287926    23    .074038229
Total       4.8           29    .165517241

chair     Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
start     -.0634325    .0184692     -3.43    0.002    -.101639     -.0252261
black     -.0916472    .1258579     -0.73    0.474    -.3520041    .1687098
groups    -.5790117    .1507015     -3.84    0.001    -.8907616    -.2672619
old       -.4808183    .1358662     -3.54    0.002    -.761879     -.1997577
ladies    -.5658386    .1631325     -3.47    0.002    -.9033039    -.2283734
lchair    -.3539802    .182905      -1.94    0.065    -.732348     .0243877
_cons     1.005431     .1649503      6.10    0.000     .6642049    1.346656

Number of obs = 30; F(5, 24) = 2.03; Prob > F = 0.1106; R-squared = 0.2971; Adj R-squared = 0.1506; Root MSE = .28121
Source      SS            df    MS
Model       .802051697     5    .160410339
Residual    1.8979483     24    .079081179
Total       2.7           29    .093103448

out       Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
start     -.0425645    .0188681     -2.26    0.033    -.0815062    -.0036227
black     -.2700392    .1293813     -2.09    0.048    -.5370691    -.0030093
groups    -.1804625    .1354463     -1.33    0.195    -.4600098    .0990849
ladies    -.3126397    .1574883     -1.99    0.059    -.6376794    .0124001
lchair    -.0170545    .1792481     -0.10    0.925    -.3870044    .3528954
_cons      .5708689    .1620726      3.52    0.002     .2363674    .9053704

Number of obs = 42; F(4, 37) = 2.64; Prob > F = 0.0492; R-squared = 0.2219; Adj R-squared = 0.1378; Root MSE = .27588
Source      SS            df    MS
Model       .80306408      4    .20076602
Residual    2.81598354    37    .076107663
Total       3.61904762    41    .088269454

out        Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
jurytot    -.0482016    .0288283     -1.67    0.103    -.1066133    .0102102
start      -.026996     .0136771     -1.97    0.056    -.0547085    .0007165
black      -.162889     .0964588     -1.69    0.100    -.3583332    .0325552
ladies     -.1845216    .1238665     -1.49    0.145    -.4354991    .0664558
_cons       .7618743    .2584114      2.95    0.006     .2382831    1.285466

Number of obs = 42; F(2, 39) = 3.64; Prob > F = 0.0356; R-squared = 0.1572; Adj R-squared = 0.1140; Root MSE = .27966
Source      SS            df    MS
Model       .568926383     2    .284463192
Residual    3.05012124    39    .078208237
Total       3.61904762    41    .088269454

out       Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
start     -.0266583    .01386       -1.92    0.062    -.0546928    .0013762
jury1     -.077699     .0388815     -2.00    0.053    -.1563443    .0009463
_cons      .5572061    .1827181      3.05    0.004     .1876239    .9267882

Number of obs = 30; F(7, 22) = 1.49; Prob > F = 0.2229; R-squared = 0.3213; Adj R-squared = 0.1054; Root MSE = .28861
Source      SS            df    MS
Model       .867533961     7    .123933423
Residual    1.83246604    22    .083293911
Total       2.7           29    .093103448

out        Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
jurytot    -.020421     .0354217     -0.58    0.570    -.0938812    .0530392
start      -.0401992    .0196001     -2.05    0.052    -.0808473    .0004488
black      -.2563869    .1337099     -1.92    0.068    -.5336843    .0209105
groups     -.207008     .1663762     -1.24    0.227    -.5520511    .1380352
old        -.1062015    .1449751     -0.73    0.472    -.4068614    .1944583
ladies     -.3409068    .1745699     -1.95    0.064    -.7029426    .0211291
lchair     -.0372415    .1974917     -0.19    0.852    -.4468141    .3723312
_cons       .7635894    .3224824      2.37    0.027     .0948018    1.432377

Linear regression
Number of obs = 42; F(3, 11) = 1.79; Prob > F = 0.2072; R-squared = 0.1631; Root MSE = .28232
(Std. Err. adjusted for 12 clusters in participant)

out       Coef.        Robust Std. Err.    t        P>|t|    [95% Conf. Interval]
start     -.0261398    .0125149            -2.09    0.061    -.0536848    .0014053
black     -.1734346    .124426             -1.39    0.191    -.4472944    .1004251
ladies    -.2175532    .1169382            -1.86    0.090    -.4749324    .039826
_cons      .3711244    .1650569             2.25    0.046     .0078365    .7344123
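The summary statistics in these OLS tables follow from the sums of squares and degrees of freedom. The sketch below recomputes them for the first regression (dependent variable out, 42 observations) using the standard textbook formulas; the function name `fit_statistics` is hypothetical, and the inputs are the values reported in the table.

```python
from math import sqrt

def fit_statistics(ss_model, ss_resid, df_model, df_resid):
    """Recompute F, R-squared, adjusted R-squared and root MSE from an ANOVA table."""
    ss_total = ss_model + ss_resid
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    r2 = ss_model / ss_total
    df_total = df_model + df_resid            # equals n - 1
    adj_r2 = 1 - (1 - r2) * df_total / df_resid
    return {"F": ms_model / ms_resid, "R2": r2,
            "adj_R2": adj_r2, "root_MSE": sqrt(ms_resid)}

# First regression in Appendix 1: Model SS, Residual SS, df 3 and 38.
stats = fit_statistics(0.590293422, 3.0287542, 3, 38)
for key, value in stats.items():
    print(f"{key}: {value:.4f}")
```

The same check can be applied to the other ANOVA blocks as a quick way to confirm that a table was transcribed consistently.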
Appendix 2: Regressions Random-Effects GLS

Random-effects GLS regression
Group variable: participant
Number of obs = 42; Number of groups = 12; Obs per group: min = 1, avg = 3.5, max = 4
R-sq: within = 0.0644, between = 0.3809, overall = 0.1526
Random effects u_i ~ Gaussian; corr(u_i, X) = 0 (assumed)
Wald chi2(3) = 6.12; Prob > chi2 = 0.1059

out       Coef.        Std. Err.    z        P>|z|    [95% Conf. Interval]
start     -.0213333    .0125897     -1.69    0.090    -.0460088    .0033421
black     -.2452139    .1539822     -1.59    0.111    -.5470135    .0565857
ladies    -.2904139    .2000318     -1.45    0.147    -.6824691    .1016413
_cons      .4157473    .1348709      3.08    0.002     .1514051    .6800894
sigma_u    .19800669
sigma_e    .24866477
rho        .38802784  (fraction of variance due to u_i)

Random-effects GLS regression
Group variable: participant
Number of obs = 42; Number of groups = 12; Obs per group: min = 1, avg = 3.5, max = 4
R-sq: within = 0.0644, between = 0.3809, overall = 0.1526
Random effects u_i ~ Gaussian; corr(u_i, X) = 0 (assumed)
Wald chi2(3) = 5.87; Prob > chi2 = 0.1183
(Std. Err. adjusted for 12 clusters in participant)

out       Coef.        Robust Std. Err.    z        P>|z|    [95% Conf. Interval]
black     -.2452139    .1500235            -1.63    0.102    -.5392546    .0488268
ladies    -.2904139    .1432488            -2.03    0.043    -.5711763    -.0096515
start     -.0213333    .0110318            -1.93    0.053    -.0429553    .0002886
_cons      .4157473    .1771866             2.35    0.019     .068468     .7630266
sigma_u    .19800669
sigma_e    .24866477
rho        .38802784  (fraction of variance due to u_i)

Random-effects GLS regression
Group variable: participant
Number of obs = 30; Number of groups = 11; Obs per group: min = 1, avg = 2.7, max = 3
R-sq: within = 0.3569, between = 0.5108, overall = 0.3561
Random effects u_i ~ Gaussian; corr(u_i, X) = 0 (assumed)
Wald chi2(4) = 14.92; Prob > chi2 = 0.0049

chair     Coef.        Std. Err.    z        P>|z|    [95% Conf. Interval]
start     -.0748859    .0233177     -3.21    0.001    -.1205877    -.0291841
black     -.1244319    .1908848     -0.65    0.514    -.4985592    .2496953
ladies    -.2797003    .233612      -1.20    0.231    -.7375714    .1781709
lchair    -.1967428    .2090127     -0.94    0.347    -.6064002    .2129146
_cons      .7618061    .2069365      3.68    0.000     .356218     1.167394
sigma_u    .18378202
sigma_e    .30296475
rho        .26899408  (fraction of variance due to u_i)

Random-effects GLS regression
Group variable: participant
Number of obs = 30; Number of groups = 11; Obs per group: min = 1, avg = 2.7, max = 3
R-sq: within = 0.0282, between = 0.6529, overall = 0.2413
Random effects u_i ~ Gaussian; corr(u_i, X) = 0 (assumed)
Wald chi2(4) = 7.07; Prob > chi2 = 0.1322

out       Coef.        Std. Err.    z        P>|z|    [95% Conf. Interval]
start     -.0408505    .0195749     -2.09    0.037    -.0792165    -.0024845
black     -.2775494    .1492183     -1.86    0.063    -.5700119    .0149132
ladies    -.2883158    .1805416     -1.60    0.110    -.6421709    .0655393
lchair     .038715     .1786953      0.22    0.828    -.3115214    .3889514
_cons      .5269665    .1693933      3.11    0.002     .1949618    .8589712
sigma_u    .11646308
sigma_e    .25809595
rho        .16917095  (fraction of variance due to u_i)

Random-effects GLS regression
Group variable: participant
Number of obs = 30; Number of groups = 11; Obs per group: min = 1, avg = 2.7, max = 3
R-sq: within = 0.0182, between = 0.6721, overall = 0.2660
Random effects u_i ~ Gaussian; corr(u_i, X) = 0 (assumed)
Wald chi2(5) = 7.09; Prob > chi2 = 0.2144

out        Coef.        Std. Err.    z        P>|z|    [95% Conf. Interval]
jurytot    -.0225879    .0352155     -0.64    0.521    -.0916089    .0464331
start      -.0401971    .0198845     -2.02    0.043    -.07917      -.0012241
black      -.2800682    .1554572     -1.80    0.072    -.5847586    .0246223
ladies     -.2831292    .189614      -1.49    0.135    -.6547658    .0885073
lchair      .062457     .1830014      0.34    0.733    -.2962191    .4211331
_cons       .7077817    .3315487      2.13    0.033     .0579581    1.357605
sigma_u     .13471595
sigma_e     .26466086
rho         .2057786  (fraction of variance due to u_i)
Appendix 3: Group Means Regressions (Between)

Between regression (regression on group means)  Number of obs      =        30
Group variable: participant                     Number of groups   =        11

R-sq:  within  = 0.0339                         Obs per group: min =         1
       between = 0.6394                                        avg =       2.7
       overall = 0.2111                                        max =         3

                                                F(4,6)             =      2.66
sd(u_i + avg(e_i.))=  .267068                   Prob > F           =    0.1368

       chair |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
      lchair |    .613245   .6202672     0.99    0.361    -.9044941    2.130984
       black |  -.3208456   .2012698    -1.59    0.162     -.813335    .1716438
      ladies |  -.4689324   .2535458    -1.85    0.114    -1.089337    .1514719
       start |   -.099301    .039865    -2.49    0.047    -.1968472   -.0017548
       _cons |   .9625308   .2710444     3.55    0.012     .2993091    1.625752
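The between regressions use a t distribution with few degrees of freedom, so their confidence intervals are wider than a normal approximation would give. As a rough check, and only as a sketch under stated assumptions, the interval for start in the chair regression can be recomputed from the coefficient and standard error. The residual degrees of freedom are assumed to be the number of groups minus the number of regressors minus one, which matches the F(4,6) reported above; the critical value is hard-coded from standard t tables rather than taken from the paper.

```python
# Recompute the 95% confidence interval for `start` in the between
# regression on chair. Coefficient and standard error come from the
# output above; T_CRIT_6DF is the standard two-sided 95% critical value
# of Student's t with 6 degrees of freedom (an assumption, not from the paper).

T_CRIT_6DF = 2.446912

def conf_interval(coef, se, t_crit):
    """Symmetric confidence interval: coef +/- t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width

groups, regressors = 11, 4
df_resid = groups - regressors - 1        # 6, as in F(4,6)

low, high = conf_interval(-0.099301, 0.039865, T_CRIT_6DF)
print(df_resid, round(low, 4), round(high, 4))
```

The recomputed bounds should agree with the printed interval for start up to rounding, which also illustrates why, with only 11 groups, a coefficient needs a t statistic of roughly 2.45 rather than 1.96 to be significant at the 5% level.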
Between regression (regression on group means)  Number of obs      =        42
Group variable: participant                     Number of groups   =        12

R-sq:  within  = 0.0644                         Obs per group: min =         1
       between = 0.5461                                        avg =       3.5
       overall = 0.1167                                        max =         4

                                                F(3,8)             =      3.21
sd(u_i + avg(e_i.))=  .2454535                  Prob > F           =    0.0832

         out |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
       black |  -.3194022   .1552441    -2.06    0.074    -.6773957    .0385914
      ladies |  -.3324729   .2060277    -1.61    0.145    -.8075735    .1426278
       start |  -.1641302   .0794818    -2.07    0.073    -.3474156    .0191552
       _cons |   1.296738   .4635806     2.80    0.023      .227719    2.365757
Between regression (regression on group means)  Number of obs      =        30
Group variable: participant                     Number of groups   =        11

R-sq:  within  = 0.0164                         Obs per group: min =         1
       between = 0.7680                                        avg =       2.7
       overall = 0.1922                                        max =         3

                                                F(4,6)             =      4.97
sd(u_i + avg(e_i.))=  .2020263                  Prob > F           =    0.0413

         out |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
      lchair |  -.4634652   .4692074    -0.99    0.361    -1.611575     .684644
       black |  -.3884055   .1522526    -2.55    0.043    -.7609541   -.0158569
      ladies |  -.2887768   .1917973    -1.51    0.183     -.758088    .1805343
       start |  -.1077475   .0301563    -3.57    0.012    -.1815373   -.0339577
       _cons |   1.012506   .2050343     4.94    0.003     .5108051    1.514207
Between regression (regression on group means)  Number of obs      =        30
Group variable: participant                     Number of groups   =        11

R-sq:  within  = 0.0145                         Obs per group: min =         1
       between = 0.7783                                        avg =       2.7
       overall = 0.2096                                        max =         3

                                                F(5,5)             =      3.51
sd(u_i + avg(e_i.))=  .2163407                  Prob > F           =    0.0972

         out |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
     jurytot |  -.0238543   .0494954    -0.48    0.650    -.1510863    .1033777
      lchair |  -.4048422   .5169666    -0.78    0.469    -1.733747    .9240626
       black |  -.3850348   .1631902    -2.36    0.065    -.8045287    .0344591
      ladies |  -.2813898   .2059581    -1.37    0.230     -.810822    .2480423
       start |  -.1049126   .0328243    -3.20    0.024    -.1892902   -.0205349
       _cons |   1.183088   .4165124     2.84    0.036      .112409    2.253767
Between regression (regression on group means)  Number of obs      =        42
Group variable: participant                     Number of groups   =        12

R-sq:  within  = 0.0481                         Obs per group: min =         1
       between = 0.9596                                        avg =       3.5
       overall = 0.1327                                        max =         4

                                                F(7,4)             =     13.56
sd(u_i + avg(e_i.))=  .1036109                  Prob > F           =    0.0120

         out |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
       jury1 |  -.1823672   .0791613    -2.30    0.083    -.4021543    .0374198
       jury2 |   .1716439   .0729209     2.35    0.078    -.0308171    .3741049
       jury3 |    .208704   .1136757     1.84    0.140    -.1069104    .5243184
       jury4 |  -.2096573   .0364129    -5.76    0.005    -.3107557   -.1085589
       black |  -.2309501   .0701709    -3.29    0.030    -.4257757   -.0361245
      ladies |  -.1363065   .0929156    -1.47    0.216    -.3942814    .1216685
       start |  -.1936452    .036096    -5.36    0.006    -.2938638   -.0934265
       _cons |   1.353335   .3500793     3.87    0.018     .3813592    2.325311