Memo
Date: August 16, 2013
To: Khalil Fuller, NBA Math Hoops
From: Juan D. Bonilla & Kirk Walters
Re: Preliminary Analysis of the NBA Math Hoops Program
This memo presents the results of the preliminary analysis of the NBA Math Hoops program. NBA Math Hoops is a competitive board game in which students learn and apply fundamental math skills using specially designed NBA and WNBA player cards. The program was implemented by more than 130 volunteer teachers and 4000 students (predominantly from grades 4-9) throughout the U.S. during the 2012-13 school year.

1. Data

For the analysis, we used the results of a 15-question math test that was administered to participating students before the beginning of the program (baseline test) and again at the end of the program (follow-up test). Both tests also collected information on students' attitudes and abilities towards math; the follow-up data also include students' perceptions of the NBA Math Hoops program itself. The baseline data include 4060 student observations assigned to more than 130 participating teachers. The follow-up data include 1490 student scores distributed over 60 teachers. That is, there are teachers and students with (1) both pre and post information, (2) only baseline information, and (3) only follow-up information.

To analyze the program results, student answers were graded and placed on a scale from 0 to 100. To facilitate interpretation, each student test score was standardized using the results from the baseline test. That is, from here on, all test results are expressed as a number of standard deviations relative to the baseline test.

Ideally, we would have pre and post test score data for all participating students in order to measure the effect of the program. Because student IDs were absent, pre and post student data were linked through teacher names and student names. While we made sure that teacher names were written the same way in both applications (pre and post), linking pre and post data on students using their names was substantially more challenging. To do so, a Levenshtein edit-distance algorithm was applied to find students who, despite minor differences in how their names were written, were essentially the same students.
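The name-linking step can be illustrated with a minimal Python sketch. The memo does not report the distance threshold, the name normalization, or the tie-breaking rule that was actually used, so `max_distance=2`, the `normalize` helper, the `link_students` function, and the example names below are all hypothetical choices made only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Assumed cleaning step: lowercase and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def link_students(pre, post, max_distance=2):
    """Link (teacher, student) records across waves within the same teacher.

    max_distance is a hypothetical threshold, not the one used in the memo.
    """
    links = []
    for teacher, pre_name in pre:
        candidates = [
            (levenshtein(normalize(pre_name), normalize(post_name)), post_name)
            for post_teacher, post_name in post
            if post_teacher == teacher
        ]
        if not candidates:
            continue
        distance, best = min(candidates)
        if distance <= max_distance:
            links.append((teacher, pre_name, best))
    return links


# Made-up records, for illustration only.
pre_records = [("Smith", "Jon Doe"), ("Smith", "Ann Lee")]
post_records = [("Smith", "John Doe"), ("Jones", "Ann Lee")]
linked = link_students(pre_records, post_records)
```

Restricting candidates to the same teacher keeps the comparison set small and avoids linking two different students who happen to share a similar name in different classrooms.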
1000 Thomas Jefferson Street NW, Washington, DC 20007-3835 | 202.403.5000 | TTY 877.334.3499 | www.air.org
2. Results on Test Scores

Table 1 compares average standardized scores for baseline and follow-up students. It shows that, on average, students who took the follow-up test scored 0.35 standard deviations higher than students at baseline. This increase may be due to multiple reasons, including:

• a true positive effect of the NBA Math Hoops program;
• using the same test at both baseline and follow-up; and/or
• the follow-up sample being a selected sample.

Regarding the selected sample, one hypothesis is that only those students/teachers who expected to get good results continued with the program and took the test. Thus, the 0.35sd effect is not necessarily the expected impact of the program on the average participating student.

Table 1. Pre and Post Score Comparison (All Observations)

Test    Baseline  Follow-Up  Total
Mean        0.00       0.35   0.09
SD          1.00       1.00   1.01
N           4060       1490   5550
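As a hedged illustration of the standardization described in the Data section, the sketch below rescales scores using the baseline mean and standard deviation, which is why the baseline row of Table 1 has mean 0.00 and SD 1.00 by construction. The raw scores are made up, the function names are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the memo does not say which estimator was used. The last lines only re-derive the "Total" mean in Table 1 as the N-weighted average of the two group means.

```python
from statistics import mean, pstdev


def standardize(scores, baseline_scores):
    """Express scores in baseline standard-deviation units."""
    mu = mean(baseline_scores)
    sigma = pstdev(baseline_scores)  # assumption: population SD
    return [(s - mu) / sigma for s in scores]


def pooled_mean(groups):
    """N-weighted mean of (group_mean, group_n) pairs."""
    total_n = sum(n for _, n in groups)
    return sum(m * n for m, n in groups) / total_n


# Made-up raw scores on the 0-100 scale, for illustration only.
baseline_raw = [40.0, 60.0, 80.0]
followup_raw = [60.0, 80.0]

z_baseline = standardize(baseline_raw, baseline_raw)
z_followup = standardize(followup_raw, baseline_raw)

# Consistency check on Table 1: the pooled mean of the two rows.
table1_total_mean = pooled_mean([(0.00, 4060), (0.35, 1490)])
print(round(table1_total_mean, 2))  # 0.09, matching the Total row
```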
To investigate the extent of the selected sample, Table 2 compares the average baseline score for students with pre and post data (i.e., linked students) and those not linked. The results show that linked students scored slightly higher at baseline on average (0.1sd = 0.08 + 0.02) than those not linked. This suggests that part of the 0.35sd estimated effect in Table 1 is due to having a selected sample, since students who took the follow-up (linked) were already better even before the program started.

Table 2. Baseline Score Comparison for Linked and Unlinked Students

Linked      No    Yes  Total
Mean     -0.02   0.08   0.00
SD        1.01   0.95   1.00
N         3147    913   4060
Another factor that may affect the overall estimate of the program effect is that the average follow-up score for linked students differs from the average score of unlinked students. Table 3 shows that students who did not take the baseline test but did take the follow-up test scored 0.16sd on average, about 0.3sd lower than the 0.47sd scored by those with linked data. This could happen, for example, if some students were only partially exposed to the program (i.e., started participating after the program started) and therefore were not present at baseline. If this were the reason, then the estimated effect on a student fully exposed to the program would be higher.

Table 3. Follow-Up Score Comparison for Linked and Unlinked Students

Linked      No    Yes  Total
Mean      0.16   0.47   0.35
SD        1.09   0.93   1.00
N          577    913   1490
The estimated effects may be biased given that not all teachers whose students participated at baseline continued participating. This is important because teacher participation is a choice, and we would expect that teachers who find the program useful are the ones who kept using it. As a result, the estimated effect of the program given in Table 1 cannot be interpreted as the average effect on the students of a random teacher. To see this more clearly, one can compare the baseline results for students of teachers who only participated at baseline with the scores of students of teachers who have at least one student with baseline and follow-up scores. As shown in Table 4, students of teachers with pre and post data scored 0.1sd (=0.06+0.04) higher than those of teachers who only participated at baseline. That is, students of teachers who maintained participation were better at baseline than those of teachers who did not.

Table 4. Baseline Score Comparison for Teachers with pre test data only vs. Teachers with pre and post data

Teacher in Both      No    Yes  Total
Mean              -0.04   0.06   0.00
SD                 1.02   0.97   1.00
N                  2453   1607   4060
These results together suggest that, to estimate the average effect of the program on those who decide to participate, we should compare the baseline results for linked and unlinked students using only those students who belong to teachers who have at least one student with baseline and follow-up scores. Table 5 shows that students with both pre and post results scored only 0.04sd higher at baseline than those who did not take the follow-up test.

Table 5. Baseline Score Comparison for Linked and Unlinked Students for Teachers with pre and post student scores

Linked      No    Yes  Total
Mean      0.04   0.08   0.06
SD        1.00   0.95   0.97
N          694    913   1607
Thus, if we use only linked students (students with pre and post data, who belong to teachers with pre and post students) to estimate the average performance at baseline and follow-up, we see a 0.39sd increase (=0.47-0.08) in test scores, as shown in Table 6.

Table 6. Pre and Post Score Comparison (Linked Students only)

Time    Baseline  Follow-Up  Total
Mean        0.08       0.47   0.28
SD          0.95       0.93   0.96
N            913        913   1826
One interesting question that can be examined with the data at hand is how test scores differ between grades. Table 7 compares baseline and follow-up average scores by grade for the sample of students who are linked and have a non-missing grade. For example, participating 4th graders show a score gain of about 0.8sd (=0.08+0.75). The results suggest that, although the baseline test was especially difficult for them (they scored 0.75sd below the baseline average), they improved substantially after the program. The results for 5th, 6th, and 7th graders also show large gains in scores after the program. The smallest gains are for 8th and 9th graders.

Table 7. Pre and Post Score Comparison by Grade (Linked Students only)
(standard deviations in parentheses)

                                  Grade
Test            4       5       6       7       8       9   Total
Baseline    -0.75   -0.05    0.03    0.14    0.59    0.27    0.08
           (0.66)  (0.88)  (0.91)  (0.87)  (0.97)  (0.90)  (0.95)
Follow-Up    0.08    0.58    0.39    0.53    0.67    0.44    0.48
           (0.85)  (0.90)  (0.88)  (0.95)  (0.94)  (1.02)  (0.93)
N              92     163     201     208     167      53     884
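The per-grade gains discussed above can be reproduced directly from the means in Table 7. The following sketch is only a worked-arithmetic aid; the dictionary name `table7_means` is hypothetical, and the values are copied from the table.

```python
# (baseline mean, follow-up mean) by grade, copied from Table 7.
table7_means = {
    "4": (-0.75, 0.08),
    "5": (-0.05, 0.58),
    "6": (0.03, 0.39),
    "7": (0.14, 0.53),
    "8": (0.59, 0.67),
    "9": (0.27, 0.44),
    "Total": (0.08, 0.48),
}

# Gain = follow-up mean minus baseline mean, in baseline SD units.
gains = {
    grade: round(followup - baseline, 2)
    for grade, (baseline, followup) in table7_means.items()
}

for grade, gain in gains.items():
    print(f"Grade {grade}: gain of {gain:.2f}sd")
```

Under these numbers the gain is 0.83sd for 4th graders and 0.08sd for 8th graders, which is the contrast between lower and higher grades that the text describes.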
To determine to what extent these results are due to having a selected sample, Table 8 compares the results at baseline for linked and unlinked students by grade.

Table 8. Baseline Score Comparison by grade for Linked and Unlinked Students
(standard deviations in parentheses)

                                       Grade
Linked               4       5       6       7       8       9   Total
No    Mean       -0.75   -0.22   -0.09    0.04    0.27    0.38    0.01
      (SD)      (0.85)  (0.96)  (0.95)  (0.93)  (1.01)  (1.12)  (1.00)
      N            196     361     544     939     716     161    2917
Yes   Mean       -0.75   -0.05    0.03    0.14    0.59    0.27    0.08
      (SD)      (0.66)  (0.88)  (0.91)  (0.87)  (0.97)  (0.90)  (0.95)
      N             92     163     201     208     167      53     884
Interestingly, the results suggest that, for 4th and 5th graders, students for whom we have pre and post data (i.e., linked students) were not better at baseline than unlinked students. For other grades, there are significant differences between the linked and unlinked samples at baseline in favor of the linked sample. Once again, this means that for 6th to 9th graders linked students are a selected sample, as they were better at baseline than those who did not take the follow-up test. On the other hand, the small differences at baseline for 4th and 5th graders suggest that the estimated effects for them may be less contaminated by sample selection.
3. Results on Attitudes and Perceptions

Attitudes towards NMH and relationship to Test Scores

• The 5 categories about program perception were collapsed into 3 categories.
• The table shows the fraction (%) of students who disagree, are neutral, or agree with each one of the statements.
• The sample used is the students with both baseline and follow-up tests.

Section 1 Post Test Answers and Post Test scores

               Question 1             Question 2             Question 3             Question 4             Question 5
              Pre   Post      %      Pre   Post      %      Pre   Post      %      Pre   Post      %      Pre   Post      %
Disagree    -0.26   0.14      6     0.10   0.44     17     0.20   0.48     17    -0.03   0.28      9     0.17   0.54     29
Neutral      0.04   0.42     17     0.14   0.54     29     0.27   0.62     36     0.17   0.58     25     0.24   0.66     30
Agree        0.13   0.55     77     0.07   0.51     54    -0.08   0.43     47     0.08   0.52     65    -0.07   0.37     41
Total        0.09   0.51  N=826     0.09   0.51  N=825     0.09   0.50  N=825     0.09   0.51  N=827     0.09   0.51  N=825
• Q1: 77% of the students in this sample like playing NMH. Those who either liked it or were neutral scored roughly 0.3sd or more higher (for example, 0.04+0.26 at pre-test) in both the pre- and post-test relative to those who did not like the program. Note also that the gain in scores between pre and post is around 0.4sd for all categories.
• Q2: 54% of the students in this sample say NMH made them like math more. They show a test score gain of about 0.4sd. Test scores are not statistically different between the categories.
• Q3: 47% of the students in this sample say NMH made them better at math. Test scores are not statistically different between the categories.
• Q4: 65% of the students in this sample say NMH helped them work well with others. Post-test scores for those in the neutral or agree categories are statistically different from those who disagree. That is, those who worked better with others are also the ones who did better.
• Q5: No clear pattern here for "NBA Math Hoops made me more physically active or excited to play outside".

Tabulation of Attitudes towards NMH: Section 1 Questions. Post self-assessment versus attitudes towards Math
• This analysis tabulates the answers for the sample of students with both pre and post test scores.
• Answers for attitudes towards math use the pre self-assessment data.
• Original categories were collapsed into 3 categories.

Question 1 Post-S1: I like playing NMH

Rows: Q1, Section 1 Post Self-Assessment. Columns: Section 1 Pre Self-Assessment, Questions 1-4.

                     Pre Question 1            Pre Question 2            Pre Question 3            Pre Question 4
                 Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot
Disagree  N          0     9    39    48       1     6    41    48       8    23    16    47       0     9    39    48
          % Row    0.0  18.8  81.3   100     2.1  12.5  85.4   100    17.0  48.9  34.0   100     0.0  18.8  81.3   100
Neutral   N          4    24   113   141       4    22   117   143      40    58    43   141       2    23   118   143
          % Row    2.8  17.0  80.1   100     2.8  15.4  81.8   100    28.4  41.1  30.5   100     1.4  16.1  82.5   100
Agree     N          7    76   549   632       5    67   557   629     257   280    93   630      11    77   538   626
          % Row    1.1  12.0  86.9   100     0.8  10.7  88.6   100    40.8  44.4  14.8   100     1.8  12.3  85.9   100
Total     N         11   109   701   821      10    95   715   820     305   361   152   818      13   109   695   817
          % Row    1.3  13.3  85.4   100     1.2  11.6  87.2   100    37.3  44.1  18.6   100     1.6  13.3  85.1   100
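The "% Row" entries in these cross-tabulations are each count divided by its row total. A minimal sketch follows, using the counts from the "Agree" row under Pre Question 1 of the table above; the function name `row_percentages` is hypothetical.

```python
def row_percentages(counts):
    """Return each count as a percentage of the row total, to one decimal."""
    total = sum(counts)
    if total == 0:
        raise ValueError("row has no observations")
    return [round(100 * c / total, 1) for c in counts]


# "Agree" row, Pre Question 1: Never, Some, Most (row total 632).
agree_pre_q1 = [7, 76, 549]
print(row_percentages(agree_pre_q1))  # [1.1, 12.0, 86.9]
```

Reading across a row therefore answers "among students who gave this post answer, how were pre answers distributed?", which is how the commentary on these tables uses the figures.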
• Q1-Pre-S1: Among the students who agree that they like playing NMH, 87% claim they try hard in math class most of the time. This figure is somewhat smaller for those who are either neutral or don't like the program as much. That is, those who like math tend to find the game more interesting.
• Q2-Pre-S1: Among the students who agree that they like playing NMH, 88% claim they pay attention in math class most of the time. This figure is again somewhat smaller for those who are either neutral or don't like the program as much.
• Q3-Pre-S1: Among the students who agree that they like playing NMH, 59% (=44.4+14.8) claim they feel bored in math class some or most of the time.
• Q4-Pre-S1: Among the students who agree that they like playing NMH, 86% claim they complete their math assignment most of the time. Once again, this figure is somewhat smaller for those who are either neutral or don't like the program as much.
• In general, this table indicates that students are very responsible with their math assignments, but a large fraction of them feel bored in class. In turn, they see the NMH program as a fun learning tool.
Question 2 Post-S1: NMH made them like math more

Rows: Q2, Section 1 Post Self-Assessment. Columns: Section 1 Pre Self-Assessment, Questions 1-4.

                     Pre Question 1            Pre Question 2            Pre Question 3            Pre Question 4
                 Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot
Disagree  N          2    20   121   143       6    23   114   143      32    62    46   140       3    25   114   142
          % Row    1.4  14.0  84.6   100     4.2  16.1  79.7   100    22.9  44.3  32.9   100     2.1  17.6  80.3   100
Neutral   N          3    37   195   235       0    32   203   235      81   113    42   236       5    34   195   234
          % Row    1.3  15.7  83.0   100     0.0  13.6  86.4   100    34.3  47.9  17.8   100     2.1  14.5  83.3   100
Agree     N          6    53   383   442       4    40   397   441     190   187    64   441       5    50   385   440
          % Row    1.4  12.0  86.7   100     0.9   9.1  90.0   100    43.1  42.4  14.5   100     1.1  11.4  87.5   100
Total     N         11   110   699   820      10    95   714   819     303   362   152   817      13   109   694   816
          % Row    1.3  13.4  85.2   100     1.2  11.6  87.2   100    37.1  44.3  18.6   100     1.6  13.4  85.1   100
• Q1-Pre-S1: Among the students who agree that NMH made them like math more, 87% claim they try hard in math class most of the time.
• Q2-Pre-S1: Among the students who agree that NMH made them like math more, 90% claim they pay attention in math class most of the time. This figure is somewhat smaller for those who are either neutral or disagree.
• Q3-Pre-S1: Among the students who agree that NMH made them like math more, 57% (=42.4+14.5) claim they feel bored in math class some or most of the time.
• Q4-Pre-S1: Among the students who agree that NMH made them like math more, 87% claim they complete their math assignment most of the time. Once again, this figure is somewhat smaller for those who are either neutral or disagree.
• In general, this table indicates that students are very responsible with their math assignments, but a large proportion feel bored in class. In turn, they report that the NMH program made them like math more.
Question 3 Post-S1: NMH helped them be better in math

Rows: Q3, Section 1 Post Self-Assessment. Columns: Section 1 Pre Self-Assessment, Questions 1-4.

                     Pre Question 1            Pre Question 2            Pre Question 3            Pre Question 4
                 Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot
Disagree  N          3    24   116   143       5    17   121   143      44    65    33   142       4    19   120   143
          % Row    2.1  16.8  81.1   100     3.5  11.9  84.6   100    31.0  45.8  23.2   100     2.8  13.3  83.9   100
Neutral   N          3    31   255   289       1    33   256   290      97   136    56   289       3    40   245   288
          % Row    1.0  10.7  88.2   100     0.3  11.4  88.3   100    33.6  47.1  19.4   100     1.0  13.9  85.1   100
Agree     N          5    54   329   388       4    44   338   386     164   159    63   386       6    49   330   385
          % Row    1.3  13.9  84.8   100     1.0  11.4  87.6   100    42.5  41.2  16.3   100     1.6  12.7  85.7   100
Total     N         11   109   700   820      10    94   715   819     305   360   152   817      13   108   695   816
          % Row    1.3  13.3  85.4   100     1.2  11.5  87.3   100    37.3  44.1  18.6   100     1.6  13.2  85.2   100
• Q1-Pre-S1: Among the students who agree that NMH helped them be better in math, 85% claim they try hard in math class most of the time.
• Note, however, that, unlike in the two preceding tables, the students who agree are a lower fraction of the total (388/820).
• In general, this table shows patterns similar to the ones found in the previous tables, except that there is less consensus about the program helping them be better at math. This is in spite of the better test results.
Question 4 Post-S1: NMH helped them work better with others

Rows: Q4, Section 1 Post Self-Assessment. Columns: Section 1 Pre Self-Assessment, Questions 1-4.

                     Pre Question 1            Pre Question 2            Pre Question 3            Pre Question 4
                 Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot
Disagree  N          2    14    61    77       2    12    63    77      21    36    18    75       1    13    62    76
          % Row    2.6  18.2  79.2   100     2.6  15.6  81.8   100    28.0  48.0  24.0   100     1.3  17.1  81.6   100
Neutral   N          3    40   164   207       3    29   177   209      69    90    50   209       4    29   175   208
          % Row    1.5  19.3  79.2   100     1.4  13.9  84.7   100    33.0  43.1  23.9   100     1.9  13.9  84.1   100
Agree     N          6    56   476   538       5    54   476   535     214   236    85   535       8    66   460   534
          % Row    1.1  10.4  88.5   100     0.9  10.1  89.0   100    40.0  44.1  15.9   100     1.5  12.4  86.1   100
Total     N         11   110   701   822      10    95   716   821     304   362   153   819      13   108   697   818
          % Row    1.3  13.4  85.3   100     1.2  11.6  87.2   100    37.1  44.2  18.7   100     1.6  13.2  85.2   100
This table shows patterns similar to the ones found in the first two tables of this section.
Perceptions on Math over time: pre and post comparisons

Rows: Pre Self-Assessment answer. Columns: Post Self-Assessment answer to the same question.

                                                        Post Self-Assessment
                       Question 1                Question 2                Question 3                Question 4
                 Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot   Never  Some  Most   Tot
Never     N          3     3     6    12       0     4     6    10     167   109    33   309       2     9     5    16
          % Row   25.0  25.0  50.0   100     0.0  40.0  60.0   100    54.1  35.3  10.7   100    12.5  56.3  31.3   100
Some      N          7    48    56   111       8    40    49    97      75   211    76   362       3    33    75   111
          % Row    6.3  43.2  50.5   100     8.3  41.2  50.5   100    20.7  58.3  21.0   100     2.7  29.7  67.6   100
Most      N          6    69   637   712      11    91   624   726      19    59    78   156       9    88   611   708
          % Row    0.8   9.7  89.5   100     1.5  12.5  86.0   100    12.2  37.8  50.0   100     1.3  12.4  86.3   100
Total     N         16   120   699   835      19   135   679   833     261   379   187   827      14   130   691   835
          % Row    1.9  14.4  83.7   100     2.3  16.2  81.5   100    31.6  45.8  22.6   100     1.7  15.6  82.8   100
• Note that some categories have too few observations to support a meaningful interpretation: the "Never" bins in Questions 1, 2, and 4.
• In this table, the changes in perception are especially worth noting.
• Few students saw their math motivation reduced due to the program.
• In addition to this:
  o Q1: How often do you try hard in math class? 50% of the students who at baseline answered that they try hard only some of the time answered that they were trying hard most of the time after the NMH program.
  o Q2: How often do you pay attention in math class? 50% of the students who at baseline answered that they paid attention in class only some of the time answered that they were paying attention most of the time after the NMH program.
  o Q4: How often do you complete your math assignment? 67% of the students who at baseline answered that they completed their assignment only some of the time answered that they were completing it most of the time after the NMH program.
4. Conclusions

In general, the results presented here suggest that the NBA Math Hoops program may have improved math skills for participating students, especially in the lower grades. At the same time, a significant proportion of participating students say they like the game and find it useful for learning and liking math. It is important, however, to emphasize that all the results presented in this analysis could vary substantially should the program be evaluated using a more rigorous setup. The main concerns one could have about the presented results are:
• Sample Selection: Teachers and students who participate in the program are the ones most likely to benefit from it. Therefore, the estimated results cannot be interpreted as the expected outcome for a randomly chosen student or teacher.

• Test Repetition: Sound evaluations of education programs require applying different yet comparable tests at baseline and follow-up. Otherwise, estimated comparisons between baseline and follow-up capture not only the effect of the program, but also the effect of taking the same test twice.