Difference-in-differences (DID) estimation uses four data points to deduce the impact of a policy change or some other shock (a.k.a. treatment) on the treated population: the effect of the treatment on the treated. The design assumes that the treatment group and control group have similar characteristics and would have followed the same trend over time in the absence of treatment. The counterfactual (the unobserved scenario) is therefore that, had the treated group not received treatment, its mean would sit the same distance from the control group's mean in the second period as it did in the first. See the diagram below; the four data points are the observed means (averages) of each group in each period. These are the only data points necessary to calculate the effect of the treatment on the treated. The dotted lines represent the trend that is not observed by the researcher. Notice that although the group means differ, both groups have the same time trend (i.e. slope).
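To make the arithmetic behind the four data points explicit, here is one way to write it down. The notation is my own assumption and not taken from the diagram: $\bar{y}_{g,t}$ is the observed mean for group $g$ ($T$ for treated, $C$ for control) in period $t$ ($0$ before, $1$ after), and $\bar{y}^{\mathrm{cf}}_{T,1}$ is the unobserved counterfactual mean of the treated group in the second period.

```latex
\begin{align*}
\bar{y}^{\mathrm{cf}}_{T,1} &= \bar{y}_{C,1} + \left(\bar{y}_{T,0} - \bar{y}_{C,0}\right) \\
\hat{\delta}_{\mathrm{DID}} &= \bar{y}_{T,1} - \bar{y}^{\mathrm{cf}}_{T,1} \\
  &= \left(\bar{y}_{T,1} - \bar{y}_{C,1}\right) - \left(\bar{y}_{T,0} - \bar{y}_{C,0}\right) \\
  &= \left(\bar{y}_{T,1} - \bar{y}_{T,0}\right) - \left(\bar{y}_{C,1} - \bar{y}_{C,0}\right)
\end{align*}
```

The last two lines are the same number: the change in the gap between groups equals the difference between the two groups' changes over time.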
For a more thorough walk-through of the effect of the Earned Income Tax Credit on female employment, see an earlier post of mine:
Calculate the D-I-D Estimate of the Treatment Effect

We will now use R and Stata to calculate the unconditional difference-in-differences estimate of the effect of the 1993 EITC expansion on employment of single women.
R:

    # Load the foreign package (provides read.dta for Stata files)
    require(foreign)

    # update: first download the file eitc.dta from this link:
    # https://docs.google.com/open?id=0B0iAUHM7ljQ1cUZvRWxjUmpfVXM
    # Then import from your hard drive:
    eitc = read.dta("C:/link/to/my/download/folder/eitc.dta")

    # Create two additional dummy variables to indicate before/after
    # and treatment/control groups.

    # the EITC went into effect in the year 1994
    eitc$post93 = as.numeric(eitc$year >= 1994)

    # The EITC only affects women with at least one child, so the
    # treatment group will be all women with children.
    eitc$anykids = as.numeric(eitc$children >= 1)

    # Compute the four data points needed in the DID calculation:
    a = sapply(subset(eitc, post93 == 0 & anykids == 0, select=work), mean)
    b = sapply(subset(eitc, post93 == 0 & anykids == 1, select=work), mean)
    c = sapply(subset(eitc, post93 == 1 & anykids == 0, select=work), mean)
    d = sapply(subset(eitc, post93 == 1 & anykids == 1, select=work), mean)

    # Compute the effect of the EITC on the employment of women with children:
    (d-c)-(b-a)

The result is the width of the “shift” shown in the diagram above.
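If you want to check the logic before downloading the data, the same calculation can be sketched on a tiny made-up data frame. Everything in `toy` is hypothetical and has nothing to do with the real EITC sample; only the column names `post93`, `anykids` and `work` are borrowed from the code above. This version also uses `tapply` to get all four cell means at once, which is just an alternative to the four `subset` calls, not a correction of them.

```r
# Hypothetical toy data (NOT the EITC sample): two women in each of the four cells.
toy <- data.frame(
  post93  = c(0, 0, 0, 0, 1, 1, 1, 1),
  anykids = c(0, 0, 1, 1, 0, 0, 1, 1),
  work    = c(1, 0, 0, 0, 1, 0, 1, 0)
)

# Mean of work in each cell; rows are post93 ("0"/"1"), columns are anykids ("0"/"1").
cell_means <- tapply(toy$work,
                     list(post93 = toy$post93, anykids = toy$anykids),
                     mean)

# (treated after - control after) - (treated before - control before)
did <- (cell_means["1", "1"] - cell_means["1", "0"]) -
       (cell_means["0", "1"] - cell_means["0", "0"])
did
```

In this toy data the control group's employment rate stays at 0.5 while the treated group's goes from 0 to 0.5, so the estimate is 0.5.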
Stata:

    cd "C:\DATA\Econ 562\homework"
    use eitc, clear

    gen anykids = (children >= 1)
    gen post93 = (year >= 1994)

    mean work if post93==0 & anykids==0 /* value 1 */
    mean work if post93==0 & anykids==1 /* value 2 */
    mean work if post93==1 & anykids==0 /* value 3 */
    mean work if post93==1 & anykids==1 /* value 4 */

Then you must do the calculation by hand (shown on the last line of the R code).
(value 4 – value 3) – (value 2 – value 1)