Difference-in-Differences estimator
• Difference-in-differences (DD) strategies offer a simple way to estimate causal effects in panel data when some groups of observations are exposed to the causing variable and others are not. They are particularly suited to estimating the effect of sharp changes in the economic environment or in government policy.
Examples
1. Card and Sullivan (1988) use a DD estimator to evaluate the effect of a training program on the probability of employment after training
2. Card (1990) examines the effect of immigration on the employment of natives, using the “natural experiment” generated by the sudden large-scale migration from Cuba to Miami known as the “Mariel Boatlift”
3. Attanasio and Brugiavini (QJE, 2003) study the effect of Italy’s 1992 pension reform on savings decisions
• …and many others
Identification strategy
• Take the Card example
• The identification strategy is based on comparing what happened in Miami with what happened in other comparable US cities, which are assumed to represent what would have happened in Miami absent the Mariel immigration
Effect of immigration on employment of natives (%)

                              T: employment rate    T+1: employment rate    Difference
                              before immigration    after immigration       (T+1 – T)
A) Miami                      85                    80                      -5
B) Other comparable cities    82                    84                       2
Difference (A – B)             3                    -4                      -7 (Dif-in-Dif)
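The arithmetic in the Miami table can be reproduced with a few lines of Python. This is only a sketch: the employment rates are the ones reported in the table, while the function name `did` and the dictionary layout are illustrative choices, not part of the original material.

```python
# Employment rates (%) taken from the Miami table.
rates = {
    "miami": {"before": 85.0, "after": 80.0},
    "other": {"before": 82.0, "after": 84.0},
}

def did(rates, treated="miami", control="other"):
    """Return (change in treated, change in control, difference-in-differences).

    Hypothetical helper written for this illustration.
    """
    d_treated = rates[treated]["after"] - rates[treated]["before"]
    d_control = rates[control]["after"] - rates[control]["before"]
    return d_treated, d_control, d_treated - d_control

d_miami, d_other, dd = did(rates)
print(d_miami, d_other, dd)  # -5.0 2.0 -7.0
```

The same -7 is obtained by differencing the other way round: the Miami-minus-others gap after the event (-4) minus the gap before it (3).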
Basic idea of Dif-in-Dif
• Identify a “treatment group”: those subject to the policy or change (people in Miami, in the example)
• Compare it with a “control group” (people in cities not exposed to the immigration)
• Ideally the treatment and control groups would be similar in every way except receipt of treatment
• This may be very difficult to achieve in practice
An alternative assumption
• Assume that, in the absence of treatment, the difference between the ‘treatment’ and ‘control’ groups is constant over time (the two groups would have evolved in the same way absent the policy)
• Under this assumption, observations on the treatment and control groups pre- and post-treatment can be used to estimate the causal effect
• Idea
– Difference pre-treatment is the ‘normal’ difference
– Difference post-treatment is the ‘normal’ difference + causal effect
– Difference-in-difference is the causal effect
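Identification strategy
[Figure: outcome paths over time for the “treated” group (group affected by the change) and the non-treated group (group not affected by the change), with T = year of the event. Labels: normal difference between the two groups; history of the treated group in the absence of the change (unobserved); effect of the change.]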
Key assumption
• Dif-in-dif assumes that trends in the outcome variable are the same for the treatment and control groups
• This assumption is in general not testable
• If only two periods are available, you cannot assess whether the assumption is plausible
• If you have several periods prior to the change, you can use them to see whether the assumption is plausible
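One simple way to use several pre-treatment periods is to look at the treated-minus-control gap in each of them and check that it shows no trend. The sketch below does this with NumPy. All numbers are HYPOTHETICAL and made up purely for illustration (they come from no study), and fitting a straight line to the gap is just one informal check, not a formal test.

```python
import numpy as np

# HYPOTHETICAL employment rates (%) in four pre-treatment years (not real data).
years = np.array([-4, -3, -2, -1])            # years relative to the event
treated = np.array([84.0, 84.5, 85.2, 85.0])  # treated group
control = np.array([81.1, 81.4, 82.3, 82.0])  # control group

# "Normal" difference between the groups in each pre-treatment year.
gap = treated - control

# Fit gap = intercept + slope * year. A slope close to zero is what the
# common-trends assumption would lead us to expect before treatment.
slope, intercept = np.polyfit(years, gap, 1)
print(gap)
print(slope)
```

A slope far from zero would indicate that the two groups were already diverging before the event, which would cast doubt on the dif-in-dif estimate.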
Dif-in-Dif formally
We can write the Dif-in-Dif model as follows. Let
$Y_{ict}$ = outcome variable (unemployment) for individual $i$ in city $c$ at time $t$ ($t = 0, 1$)
Then the Dif-in-Dif model can be written as:
\[ E(Y_{ict} \mid c = NM, t) = \beta_t + \gamma_c \]
if not treated (no immigration, NM)
\[ E(Y_{ict} \mid c = M, t) = \beta_t + \gamma_c + \delta \]
if treated (city exposed to immigration, M)
Note:
- $\beta_t$ is a time effect and is common to both types of cities
- $\gamma_c$ is a city effect, equal for all time periods
- $\delta$ is the policy effect (which appears only if and when the city is exposed to the policy)
- First differencing for each group over time eliminates the city fixed effect
- First differencing across groups eliminates the time effect, assuming it is the same in both groups:
\[ \Delta(Y_{ict} \mid M) - \Delta(Y_{ict} \mid NM) = \Delta\beta_t + \delta - \Delta\beta_t = \delta \]
This leaves the policy effect
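The two differencing steps can be written out explicitly. The lines below are a sketch under the assumption that treatment occurs between $t = 0$ and $t = 1$, so that $\delta$ enters only for the treated city at $t = 1$; the subscripts $\gamma_M$ and $\gamma_{NM}$ are shorthand introduced here for the city effects of the two groups, and $\Delta\beta_t = \beta_1 - \beta_0$.

\[
\begin{aligned}
\Delta(Y_{ict} \mid M) &= (\beta_1 + \gamma_M + \delta) - (\beta_0 + \gamma_M) = \Delta\beta_t + \delta \\
\Delta(Y_{ict} \mid NM) &= (\beta_1 + \gamma_{NM}) - (\beta_0 + \gamma_{NM}) = \Delta\beta_t \\
\Delta(Y_{ict} \mid M) - \Delta(Y_{ict} \mid NM) &= (\Delta\beta_t + \delta) - \Delta\beta_t = \delta
\end{aligned}
\]

With the Miami employment rates, the first line is $80 - 85 = -5$, the second is $84 - 82 = 2$, and the third is $-5 - 2 = -7$.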
Dif-in-Dif in regression form
All this can be put conveniently in regression form. Let
$D_{ict}$ = a dummy equal to 1 if the individual lives in a "treated" city and $t$ is after treatment, 0 otherwise
The Dif-in-Dif model can be written as
\[ Y_{ict} = \beta_t + \gamma_c + \delta D_{ict} + \varepsilon_{ict} \]
Note:
- the time effect $\beta_t$ is forced to be the same for both groups (same slopes)
- city effects can differ (different constants)
- In this format one can easily test the effect of the policy (a test on $\delta$)
- We can allow for other controls $X_{ict}$ to account for differences across individuals in the treatment and control groups (and so make the control group more similar to the treatment group) and estimate
\[ Y_{ict} = \beta_t + \gamma_c + \delta D_{ict} + X_{ict}\alpha + \varepsilon_{ict} \]
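To see that the regression form reproduces the simple table calculation, the sketch below runs least squares on the four cell means of the Miami example, using a constant, a post-period dummy (time effect), a treated-city dummy (city effect) and $D$. Variable names are illustrative. With only four cell means the model fits exactly, so this shows the point estimate only; standard errors and a test on $\delta$ would require individual-level data.

```python
import numpy as np

# Four cell means from the Miami table: (treated city, post period, rate %).
cells = [
    (1, 0, 85.0),  # Miami, before
    (1, 1, 80.0),  # Miami, after
    (0, 0, 82.0),  # other comparable cities, before
    (0, 1, 84.0),  # other comparable cities, after
]
treated = np.array([c[0] for c in cells], dtype=float)
post = np.array([c[1] for c in cells], dtype=float)
y = np.array([c[2] for c in cells], dtype=float)

# D = 1 only for the treated city in the post-treatment period.
D = treated * post

# Columns: constant, time effect (post), city effect (treated), policy dummy D.
X = np.column_stack([np.ones_like(y), post, treated, D])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
const, beta, gamma, delta = (float(v) for v in coef)
print(round(delta, 6))  # -7.0
```

The coefficient on $D$ equals the dif-in-dif from the table, while the coefficients on the post and treated dummies recover the common time change (2) and the pre-existing city gap (3).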
Dif-in-Dif with no time variation
• To run dif-in-dif you need more than one source of variation, not necessarily over time
• What matters is having a group exposed to the “treatment” and a group that is not (a control group)
• GSZ use Dif-in-Dif to check their results on the effect of history on social capital. Idea: the communal movement took place only in the North
• Treatment: northern cities that gained independence, and southern cities that would have gained independence
• Control: cities that did not gain independence
Dif-in-Dif and social capital

                 Independent city    Non-independent city    Diff
North            6.206***            3.479**                 2.727***
                 (1.761)             (1.439)                 (0.356)
South            5.1105***           3.201***                1.9095***
                 (1.979)             (1.358)                 (0.135)
Diff             1.0955***           0.278                   0.8175 (Diff-in-Diff)
                 (0.254)             (0.269)
Differences-in-Differences: Summary
• A very useful and widespread approach
• Validity depends on the assumption that trends would have been the same in the absence of treatment
• Other periods can be used to see whether this assumption is plausible
• Uses 2 observations on the same individual – the most rudimentary form of panel data