20  Difference-in-differences

20.1 Example: effect of minimum wage on employment

Suppose that we would like to estimate the effect of raising the minimum wage on employment. With enough money and power, we could run a randomized experiment by flipping a coin for each local labor market in a country. If it comes up heads, we raise the minimum wage there; if it comes up tails, we keep it the same.

Of course, this is just a thought experiment; such a randomized experiment is not feasible. Nonetheless, it is possible to estimate the treatment effect when we have before-after data on a pair of units: both are untreated before, but only one of them is treated after. This is what Card and Krueger (1993) did after seeing that New Jersey’s minimum wage was about to be raised from $4.25 to $5.05 in April 1992, while neighboring Pennsylvania’s minimum wage stayed the same at $4.25. They seized this opportunity and surveyed roughly 400 fast-food restaurants across the two states twice: first in February 1992, before the raise, and again in November 1992, after it.

Let \(\alpha\) and \(\beta\) be the employment in New Jersey and Pennsylvania before the raise, respectively. Let \(\delta\) be the effect of raising the minimum wage, and assume that all other factors had the same effect \(\gamma\) on both states (which is plausible since the two states are adjacent). In the absence of noise, the employment data obtained from the surveys would look like the following table:

| | February 1992 | November 1992 | Difference |
|---|---|---|---|
| New Jersey | \(\alpha\) | \(\alpha+\gamma+\delta\) | \(\gamma+\delta\) |
| Pennsylvania | \(\beta\) | \(\beta+\gamma\) | \(\gamma\) |
| Difference | | | \(\delta\) |

We see that the treatment effect \(\delta\) is the difference between the state-wise before-and-after differences, or the difference-in-differences. In real data, this quantity generally does not equal the true treatment effect exactly because of noise, so we call it the difference-in-differences (DID) estimate of the treatment effect.
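To make the role of noise concrete, here is a minimal simulation sketch of the table above. All numbers (`n`, `alpha`, `beta`, `gamma`, `delta`, and the noise level) are made up for illustration and have nothing to do with the Card and Krueger data. It also shows why a plain before-after comparison in the treated state is not enough: that comparison mixes the common change \(\gamma\) with the treatment effect \(\delta\).

```r
# Simulated illustration of the table above; every number here is hypothetical.
set.seed(1)
n     <- 5000   # restaurants per state
alpha <- 20     # baseline employment in the treated state
beta  <- 23     # baseline employment in the control state
gamma <- -2     # change common to both states
delta <- 2.5    # true treatment effect

# Each observation is the corresponding table cell plus independent noise.
nj_before <- alpha + rnorm(n, sd = 3)
nj_after  <- alpha + gamma + delta + rnorm(n, sd = 3)
pa_before <- beta + rnorm(n, sd = 3)
pa_after  <- beta + gamma + rnorm(n, sd = 3)

# Before-after change within each state, then the difference between the two.
diff_treated <- mean(nj_after) - mean(nj_before)
diff_control <- mean(pa_after) - mean(pa_before)
did_sim <- diff_treated - diff_control

# The naive before-after change in the treated state estimates gamma + delta.
naive <- diff_treated

round(c(naive = naive, did = did_sim, truth = delta), 2)
```

With these settings the DID value should land close to `delta` but not exactly on it, while the naive value should sit near `gamma + delta`.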

The raw data can be downloaded from Card’s personal website. Here, we will use the preprocessed data stored in wage92.csv.

wage92 <- read.csv("data/wage92.csv")
wage92 <- na.omit(wage92)  # remove NA rows

head(wage92[, c("d_nj", "y_ft_employment_before", "y_ft_employment_after")])
   d_nj y_ft_employment_before y_ft_employment_after
4     0                   34.0                  20.0
5     0                   24.0                  35.5
7     0                   70.5                  29.0
8     0                   23.5                  36.5
9     0                   11.0                  11.0
10    0                    9.0                   8.5

Below are descriptions of the relevant variables:

| Name | Description |
|---|---|
| d_nj | 1 if New Jersey; 0 if Pennsylvania (Treatment) |
| y_ft_employment_before | Full-time-equivalent employment before treatment (Outcome) |
| y_ft_employment_after | Full-time-equivalent employment after treatment (Outcome) |

Now we can compute the difference-in-differences estimate from the differences in mean employment.

wage_nj <- subset(wage92, d_nj == 1)
wage_pa <- subset(wage92, d_nj == 0)

before_nj <- mean(wage_nj$y_ft_employment_before)
after_nj <- mean(wage_nj$y_ft_employment_after)
diff_nj <- after_nj - before_nj

before_pa <- mean(wage_pa$y_ft_employment_before)
after_pa <- mean(wage_pa$y_ft_employment_after)
diff_pa <- after_pa - before_pa

did <- diff_nj - diff_pa

Let us summarize this in a table as shown above.

result <- data.frame(State = c("New Jersey", "Pennsylvania", "Difference"),
                     Before = c(before_nj, before_pa, NA),
                     After = c(after_nj, after_pa, NA),
                     Difference = c(diff_nj, diff_pa, did))
result

         State   Before    After Difference
1   New Jersey 20.65775 21.04842   0.390669
2 Pennsylvania 23.70455 21.82576  -1.878788
3   Difference       NA       NA   2.269457

The DID estimate suggests that raising the minimum wage from $4.25 to $5.05 increased employment by about 2.27 full-time-equivalent workers per restaurant on average.

20.2 Regression for the difference-in-differences estimate

We can also use a linear regression to obtain the DID estimate. Let \(y_{\text{before}}\) and \(y_{\text{after}}\) be the outcomes before and after the treatment time, and let \(z\) be the treatment assignment. We regress the before-after difference on the treatment variable:

\[ y_{\text{after}} - y_{\text{before}} = \beta_0 + \delta z + \varepsilon. \tag{20.1}\]

Then, the coefficient \(\delta\) of the treatment variable is the DID estimate. This is because

\[ \mathbb{E}[y_{\text{after}} - y_{\text{before}}\vert z=1]-\mathbb{E}[y_{\text{after}} - y_{\text{before}}\vert z=0] = (\beta_0+\delta)-\beta_0 = \delta. \]

Let us try this method on the employment data by regressing the change in employment on the treatment indicator.

library(rstanarm)

fit_1 <- stan_glm((y_ft_employment_after - y_ft_employment_before) ~ d_nj,
                  data=wage92, seed=0, refresh=0)

print(fit_1, digits=2)
 family:       gaussian [identity]
 formula:      (y_ft_employment_after - y_ft_employment_before) ~ d_nj
 observations: 350
 predictors:   2
            Median MAD_SD
(Intercept) -1.88   1.09 
d_nj         2.23   1.17 

Auxiliary parameter(s):
      Median MAD_SD
sigma 8.74   0.33  

* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

The DID estimate is 2.23 with a standard error of 1.17, which is close to the point estimate of 2.27 that we computed directly from the differences between the means.
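The small gap between 2.23 and 2.27 is not surprising: the printed value is a posterior median computed from simulation draws under the default priors, whereas the direct calculation is an exact difference of sample means. As a rough check of the algebra, the sketch below swaps in ordinary least squares via `lm()` instead of `stan_glm()` and uses simulated data (the data frame `sim` and all its numbers are hypothetical). With least squares, the coefficient of the treatment indicator matches the difference in mean changes up to floating-point error.

```r
# Simulated paired data; not the Card and Krueger survey.
set.seed(2)
n <- 400
sim <- data.frame(d = rbinom(n, 1, 0.8))   # hypothetical treatment indicator
sim$y_before <- 20 + rnorm(n, sd = 8)
sim$y_after  <- sim$y_before - 2 + 2.5 * sim$d + rnorm(n, sd = 8)

# Regress the before-after change on the treatment indicator (Equation 20.1).
sim$change <- sim$y_after - sim$y_before
ols <- lm(change ~ d, data = sim)
did_ols <- unname(coef(ols)["d"])

# The same estimate computed directly from group means of the change.
did_means <- mean(sim$change[sim$d == 1]) - mean(sim$change[sim$d == 0])

all.equal(did_ols, did_means)
```

The intercept of this fit plays the role of \(\beta_0\): it is the mean change in the control group.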

20.2.1 Different observations before and after the treatment time

Let \(P\) be a time indicator, where \(P=0\) and \(P=1\) denote the periods before and after the treatment took effect, respectively. If the units observed at \(P=0\) are different from those observed at \(P=1\), then we cannot compute \(y_{\text{after}} - y_{\text{before}}\) for each unit. Assuming that the observations within each of the treatment and control groups are drawn independently from the same distribution, we can instead fit the following regression with an interaction term:

\[ y = \beta_0 + \beta_1z+ \beta_2P +\delta zP + \varepsilon. \]

The DID estimate is the coefficient \(\delta\) of the interaction term, as it is the difference between the two coefficients of \(z\) from fitting \(y = a+bz\) on the data with \(P=1\) and \(P=0\), respectively. More explicitly,

\[\begin{align*} \mathbb{E}[y\vert z=1, P=1] - \mathbb{E}[y\vert z=0, P=1] &= (\beta_0 + \beta_1+\beta_2+\delta)- (\beta_0+\beta_2) \\ &= \beta_1+\delta \\ \mathbb{E}[y\vert z=1, P=0] - \mathbb{E}[y\vert z=0, P=0] &= (\beta_0 + \beta_1)- \beta_0 \\ &= \beta_1. \end{align*}\]

Subtracting the second equality from the first yields

\[ \text{DID} = (\beta_1+\delta) - \beta_1= \delta. \]
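The identity above can be checked numerically. The following is a minimal sketch on simulated repeated cross-sections, where every observation is a different unit; the data frame `cs`, the coefficient values, and the use of `lm()` rather than `stan_glm()` are all assumptions made for illustration. Because the model has one parameter per combination of \(z\) and \(P\), the least-squares interaction coefficient equals the DID computed from the four cell means.

```r
# Simulated repeated cross-sections; all values are hypothetical.
set.seed(3)
n <- 4000
cs <- data.frame(z = rbinom(n, 1, 0.5),   # treatment group indicator
                 P = rbinom(n, 1, 0.5))   # time period indicator
# Made-up coefficients: beta0 = 20, beta1 = -3, beta2 = -2, delta = 2.5
cs$y <- 20 - 3 * cs$z - 2 * cs$P + 2.5 * cs$z * cs$P + rnorm(n, sd = 4)

# y ~ z * P expands to z + P + z:P, matching the regression above.
fit_int <- lm(y ~ z * P, data = cs)
delta_hat <- unname(coef(fit_int)["z:P"])

# The same quantity from the four cell means.
cell <- function(zv, pv) mean(cs$y[cs$z == zv & cs$P == pv])
did_cells <- (cell(1, 1) - cell(0, 1)) - (cell(1, 0) - cell(0, 0))

all.equal(delta_hat, did_cells)
```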

20.2.2 Difference-in-differences by matching

Alternatively, we can use propensity score matching to match each unit that was observed before the treatment time to a unit in the same group that was observed after. Then, we treat each pair as a single observation with the observed values of \(y_{\text{before}}\) and \(y_{\text{after}}\). With these new observations, we can obtain the DID estimate by fitting the regression Equation 20.1.
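One possible way to carry this out is sketched below on simulated data. It is only an illustration under several assumptions that the text does not specify: a single hypothetical covariate `x`, a logistic regression for the probability of being observed after the treatment time, nearest-neighbor matching with replacement, and `lm()` for the final fit. The helper names `make_group` and `match_group` are invented for this example.

```r
set.seed(4)
n <- 1000   # units per group and per period (hypothetical)

# Simulate one group: different units are observed before and after.
make_group <- function(z) {
  P <- rep(c(0, 1), each = n)
  x <- rnorm(2 * n, mean = 0.3 * P)   # hypothetical covariate
  y <- 20 - 3 * z + 1.5 * x - 2 * P + 2.5 * z * P + rnorm(2 * n)
  data.frame(z = z, P = P, x = x, y = y)
}

# Pair each "before" unit with the "after" unit closest in propensity score.
match_group <- function(g) {
  ps <- fitted(glm(P ~ x, family = binomial, data = g))
  before <- which(g$P == 0)
  after  <- which(g$P == 1)
  partner <- sapply(before, function(i) {
    after[which.min(abs(ps[after] - ps[i]))]   # matching with replacement
  })
  data.frame(z = g$z[before], y_before = g$y[before], y_after = g$y[partner])
}

# Match within each group separately, then stack the pairs.
matched <- rbind(match_group(make_group(0)), match_group(make_group(1)))

# Fit Equation 20.1 on the matched pairs; the coefficient of z is the DID estimate.
fit_match <- lm(I(y_after - y_before) ~ z, data = matched)
coef(fit_match)["z"]
```

In this simulation the true effect is 2.5, so the coefficient of `z` should fall near that value. Matching with replacement reuses some "after" units, which a careful analysis would account for when computing standard errors.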

In all cases, we have made the strong assumption that, without the treatment, the changes in the outcomes would have been the same in New Jersey and Pennsylvania. We will discuss the assumptions behind the DID estimate further in the next section.


Card, David, and Alan B. Krueger. 1993. “Minimum Wages and Employment: A Case Study of the Fast Food Industry in New Jersey and Pennsylvania.” Working Paper. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w4509.