11  Difference-in-Differences

  • What is a natural experiment and how does it help with causal inference?
  • What is the parallel trends assumption and why is it crucial for DiD?
  • How do we estimate the average treatment effect on the treated (ATT) using DiD?
  • What is two-way fixed effects and how does it relate to DiD?
  • How do we choose good control units for a DiD design?

We have built up an increasingly powerful toolkit for causal inference. Randomized controlled trials remain the gold standard, but are often infeasible. Multivariate OLS allows us to control for observable confounders, and fixed effects extend this to control for unobserved time-invariant factors. But what if even fixed effects aren’t enough?

This chapter introduces difference-in-differences (DiD), a method that combines the logic of experiments with the practicality of observational data. Instead of relying solely on control variables, DiD uses control units—a comparison group that shows us what would have happened to the treated group in the absence of treatment.

11.1 Natural Experiments

11.1.1 The Problem with Observational Data

Suppose we want to estimate the effect of immigration on native workers’ wages. A naive approach might compare wages in cities with high immigration to wages in cities with low immigration. But this comparison is confounded: immigrants tend to move to cities with strong economies, creating a spurious positive correlation between immigration and wages.

Even with control variables, we may miss important unobserved factors like a city’s economic trajectory or its attractiveness to both immigrants and high-wage employers. Fixed effects help with time-invariant confounders, but cannot address factors that change differently across cities over time.

11.1.2 Natural Experiments as a Solution

A natural experiment occurs when some external force—nature, policy, or historical accident—assigns units to treatment and control groups in a way that resembles random assignment.

Natural Experiments

A natural experiment is a study based on observational data where “treatment” is assigned by forces outside the researcher’s control, such as policy changes, geographic boundaries, or historical events. Natural experiments create conditions that resemble random assignment, allowing us to make causal inferences without actually randomizing.

The key insight is that if treatment assignment is “as good as random,” then comparing treated and control units can identify causal effects—just like in an RCT.

11.2 The Mariel Boatlift: A Classic Natural Experiment

One of the most famous natural experiments in economics involves the Mariel Boatlift. In 1980, Cuba experienced severe economic turmoil, and Fidel Castro announced that anyone who wished to leave could do so. Between April and October, approximately 125,000 Cubans departed from Mariel Harbor and arrived in Miami.

This mass migration had nothing to do with labor market conditions in Miami. It was driven entirely by Cuban politics and Miami’s geographic proximity to Cuba. In other words, the “treatment” of a sudden large increase in labor supply was essentially randomly assigned to Miami.

David Card’s seminal 1990 study used this natural experiment to estimate the effect of immigration on native wages. But how exactly do we use such an event to identify causal effects?

11.3 The Difference-in-Differences Approach

11.3.1 Why Simple Before-After Comparisons Fail

Card first compared wages in Miami before the boatlift (1979) to wages after (1980-1985):

City     Pre (1979)   Post (1980-1985)   Difference
Miami    1.85         1.83               -0.02

Wages fell slightly. Can we interpret this as the causal effect of immigration?

No. Wages might have fallen everywhere during this period due to recession or other macroeconomic factors. Looking only at the treated unit’s change over time confuses the treatment effect with broader trends.

11.3.2 The Need for Control Units

To isolate the treatment effect, we need to know what would have happened to Miami’s wages if the boatlift had never occurred. This counterfactual is inherently unobservable—we cannot see the parallel universe where Cuba didn’t open its borders.

But we can approximate it using control units: cities similar to Miami that were not affected by the boatlift. Card chose Atlanta, Los Angeles, Houston, and Tampa-St. Petersburg as controls, arguing they had similar demographics and economic trajectories to Miami.

City       Pre (1979)   Post (1980-1985)   Difference
Miami      1.85         1.83               -0.02
Controls   1.93         1.91               -0.02
Difference-in-differences                   0.00

The control cities experienced the same wage decline as Miami. This suggests the wage drop in Miami was not caused by the boatlift but by broader economic trends affecting all cities.

The difference-in-differences estimate is the change in Miami minus the change in controls: \[ (-0.02) - (-0.02) = 0.00 \]

Card concluded that the Mariel Boatlift had essentially no effect on native wages—a surprising and influential finding.
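The same arithmetic takes only a few lines of R (a minimal sketch using the wage averages from the table above):

```r
# 2x2 DiD by hand, using the pre/post wage averages from the table above
miami_pre <- 1.85; miami_post <- 1.83
ctrl_pre  <- 1.93; ctrl_post  <- 1.91

change_miami <- miami_post - miami_pre   # change for the treated unit
change_ctrl  <- ctrl_post - ctrl_pre     # change for the control units
did <- change_miami - change_ctrl        # difference-in-differences
did                                      # essentially zero
```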

11.5 The DiD Regression

11.5.1 Setting Up the Model

We typically estimate DiD using regression. For the simple 2×2 case (two groups, two time periods), the model is:

\[ y_{it} = \beta_0 + \beta_1 D_i + \beta_2 P_t + \delta(D_i \times P_t) + \mu_{it} \tag{11.1}\]

where:

  • \(D_i\) is a treatment group indicator (1 if treated, 0 if control)
  • \(P_t\) is a post-period indicator (1 if post-treatment, 0 if pre-treatment)
  • \(D_i \times P_t\) is the interaction term (1 only for treated units in the post period)
  • \(\delta\) is the DiD estimate—our estimate of the ATT

11.5.2 How It Works

To see why \(\delta\) captures the DiD effect, consider the predicted outcomes for each group-period combination:

Group     Period   Predicted Outcome
Control   Pre      β0
Control   Post     β0 + β2
Treated   Pre      β0 + β1
Treated   Post     β0 + β1 + β2 + δ

Now compute the differences:

Change for treated group: \((\beta_0 + \beta_1 + \beta_2 + \delta) - (\beta_0 + \beta_1) = \beta_2 + \delta\)

Change for control group: \((\beta_0 + \beta_2) - \beta_0 = \beta_2\)

DiD estimate: \((\beta_2 + \delta) - \beta_2 = \delta\)

The coefficient on the interaction term exactly equals the difference-in-differences!
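A quick numeric check of this algebra, plugging in illustrative coefficient values (the numbers here are made up for the sketch):

```r
# Plug illustrative coefficients into the four cell predictions and verify
# that differencing the differences recovers delta
b0 <- 1.93; b1 <- -0.08; b2 <- -0.02; delta <- 0.5

control_pre  <- b0
control_post <- b0 + b2
treated_pre  <- b0 + b1
treated_post <- b0 + b1 + b2 + delta

did <- (treated_post - treated_pre) - (control_post - control_pre)
all.equal(did, delta)  # TRUE: the interaction coefficient is the DiD estimate
```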

11.5.3 Example: Estimating the Mariel Boatlift Effect

library(tibble)   # tibble()
library(dplyr)    # bind_rows(), mutate()
library(fixest)   # feols()

# Create the Card (1990) data
miami_wages <- tibble(
    city = "miami",
    year = 1979:1985,
    wages = c(1.85, 1.83, 1.85, 1.82, 1.82, 1.82, 1.82)
)

control_wages <- tibble(
    city = "control",
    year = 1979:1985,
    wages = c(1.93, 1.90, 1.91, 1.91, 1.90, 1.91, 1.92)
)

boatlift_data <- bind_rows(miami_wages, control_wages) |>
    mutate(
        post = ifelse(year >= 1980, 1, 0),
        treated = ifelse(city == "miami", 1, 0),
        post_treat = post * treated
    )

# Estimate DiD regression
did_reg <- feols(wages ~ treated + post + post_treat,
                 vcov = "HC1",
                 data = boatlift_data)

summary(did_reg)
OLS estimation, Dep. Var.: wages
Observations: 14
Standard-errors: Heteroskedasticity-robust 
             Estimate Std. Error        t value    Pr(>|t|)    
(Intercept)  1.930000 0.00000100 1930000.000000   < 2.2e-16 ***
treated     -0.080000 0.00000100  -80000.000000   < 2.2e-16 ***
post        -0.021667 0.00331942      -6.527254 0.000066615 ***
post_treat  -0.001667 0.00628785      -0.265062 0.796345671    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.008522   Adj. R2: 0.947329

The coefficient on post_treat (approximately -0.002, with a p-value of 0.80) is our DiD estimate. It suggests the Mariel Boatlift had essentially no effect on wages—consistent with Card’s original findings.

11.6 Two-Way Fixed Effects

11.6.1 Extending to Multiple Units and Time Periods

The simple 2×2 DiD works well for clean natural experiments with one treated unit and one control unit observed before and after. But what if we have many treated units, many control units, and many time periods?

The Two-Way Fixed Effects (TWFE) model extends DiD to these more complex settings:

\[ y_{it} = \delta D_{it} + \alpha_i + \tau_t + \mu_{it} \tag{11.2}\]

where:

  • \(D_{it}\) equals 1 if unit \(i\) is treated at time \(t\), and 0 otherwise
  • \(\alpha_i\) are unit fixed effects (one for each unit)
  • \(\tau_t\) are time fixed effects (one for each time period)
  • \(\delta\) is the treatment effect

This model subsumes the simple DiD: unit fixed effects capture permanent differences between groups (like \(\beta_1 D_i\)), and time fixed effects capture common shocks to all units (like \(\beta_2 P_t\)).

11.6.2 Estimating TWFE in R

The fixest package makes TWFE estimation straightforward. The syntax | unit + time specifies the fixed effects:

# TWFE estimation
twfe_reg <- feols(wages ~ post_treat | city + year,
                  vcov = "HC1",
                  data = boatlift_data)

summary(twfe_reg)
OLS estimation, Dep. Var.: wages
Observations: 14
Fixed-effects: city: 2,  year: 7
Standard-errors: Heteroskedasticity-robust 
            Estimate Std. Error   t value Pr(>|t|) 
post_treat -0.001667   0.006491 -0.256776  0.80758 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.00622     Adj. R2: 0.943875
                Within R2: 0.002193

The coefficient on post_treat is identical to our simple DiD regression—as expected, since TWFE and the 2×2 DiD model are equivalent in this simple case.
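One way to see what the fixed effects are doing is to apply the two-way within transformation by hand: for a balanced panel, subtracting unit means and time means (and adding back the grand mean) and then running plain OLS reproduces the TWFE coefficient. A minimal sketch, rebuilding the chapter’s data in base R:

```r
# Rebuild the balanced boatlift panel in base R
d <- data.frame(
    city  = rep(c("miami", "control"), each = 7),
    year  = rep(1979:1985, times = 2),
    wages = c(1.85, 1.83, 1.85, 1.82, 1.82, 1.82, 1.82,
              1.93, 1.90, 1.91, 1.91, 1.90, 1.91, 1.92)
)
d$post_treat <- as.integer(d$city == "miami" & d$year >= 1980)

# Two-way within transformation: subtract unit means and time means,
# then add back the grand mean (exact for balanced panels)
demean2 <- function(x, unit, time) x - ave(x, unit) - ave(x, time) + mean(x)

y_w <- demean2(d$wages, d$city, d$year)
x_w <- demean2(d$post_treat, d$city, d$year)

coef(lm(y_w ~ x_w))["x_w"]  # about -0.0017, matching the feols estimate
```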

11.7 Research Design: Choosing Good Controls

11.7.1 The Art of Causal Inference

Your DiD estimate is only as good as your research design. The goal is to find control units that make the parallel trends assumption credible.

Good Controls → Good Counterfactual → Credible DiD Estimate

Bad Controls → Bad Counterfactual → Misleading DiD Estimate

11.7.2 Example: Minimum Wages and Employment

Consider estimating the effect of minimum wage increases on employment. One approach might compare all states that raised their minimum wage to all states that didn’t.

But states that raise minimum wages are systematically different: they tend to be more politically liberal, have stronger labor markets, different industries, and many other characteristics that affect employment. These differences violate parallel trends—minimum-wage states may have been on different employment trajectories regardless of the policy.

Card and Krueger (1994) proposed a more compelling research design. When New Jersey raised its minimum wage in 1992 and neighboring Pennsylvania did not, they compared fast-food restaurants in counties right next to each other on opposite sides of the NJ-PA border.

The identifying assumption: restaurants separated only by a state border are similar in all relevant ways except the minimum wage policy. Geographic proximity makes parallel trends much more plausible than comparing NJ to distant states with very different economies.

11.7.3 Intuition-Driven vs. Data-Driven Selection

There are two broad approaches to choosing control units:

Intuition-driven selection relies on substantive knowledge to identify appropriate comparisons. Card’s choice of control cities for Miami was based on demographic and economic similarities. Card and Krueger’s border design was based on geographic proximity. The advantage is transparency and interpretability.

Data-driven selection uses algorithms to find control units that match the treated unit on pre-treatment outcomes. Methods like synthetic control construct a weighted average of potential controls that best reproduces the treated unit’s pre-treatment trajectory. The advantage is objectivity; the disadvantage is complexity and potential overfitting.
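To make the data-driven idea concrete, here is a toy sketch (all series invented for illustration) that grid-searches for convex weights on two candidate controls so their weighted average tracks the treated unit’s pre-treatment path; real synthetic control applications should use a dedicated package:

```r
# Invented pre-treatment outcome series for one treated unit and two candidates
treated_pre <- c(10.0, 10.5, 11.0)
ctrl_a      <- c( 9.0,  9.4,  9.8)
ctrl_b      <- c(12.0, 12.6, 13.2)

# Grid-search the weight w on ctrl_a (ctrl_b gets 1 - w); pick the w that
# minimizes the squared pre-treatment gap to the treated unit
w_grid <- seq(0, 1, by = 0.01)
sse <- sapply(w_grid, function(w)
    sum((treated_pre - (w * ctrl_a + (1 - w) * ctrl_b))^2))
w_best <- w_grid[which.min(sse)]
w_best  # the "synthetic" control is w_best * ctrl_a + (1 - w_best) * ctrl_b
```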

In practice, the best research designs often combine both approaches: use substantive knowledge to identify a reasonable set of potential controls, then verify that pre-treatment trends are indeed parallel.
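A common way to verify parallel pre-trends is to regress the outcome on a group-by-time interaction using only pre-treatment data; a small, statistically insignificant interaction supports the assumption. A minimal sketch on made-up panel data (all numbers invented):

```r
# Made-up pre-treatment panel: both groups trend upward at the same rate
set.seed(1)
pre <- data.frame(
    unit = rep(c("control", "treated"), each = 5),
    year = rep(2010:2014, times = 2),
    y    = c(12 + 0.5 * (0:4) + rnorm(5, sd = 0.1),
             10 + 0.5 * (0:4) + rnorm(5, sd = 0.1))
)

# The year:unit interaction measures the difference in pre-trends
fit <- lm(y ~ year * unit, data = pre)
coef(fit)["year:unittreated"]  # near zero here: no differential pre-trend
```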

11.8 What Can Go Wrong

11.9 Summary

Difference-in-differences is one of the most widely used methods for causal inference in economics and social science. It combines the intuition of experimental design with the practicality of observational data.

The key elements are:

  1. Natural experiments provide plausibly exogenous variation in treatment assignment
  2. Control units approximate what would have happened to treated units absent treatment
  3. The parallel trends assumption requires that treated and control groups would have followed the same trajectory without treatment
  4. The DiD estimator subtracts the control group’s change from the treated group’s change to isolate the treatment effect
  5. TWFE extends DiD to multiple units and time periods using unit and time fixed effects

The credibility of any DiD analysis rests entirely on the parallel trends assumption. Careful research design—choosing appropriate control units and verifying parallel pre-trends—is essential for producing convincing causal estimates.

11.10 Check Your Understanding

The explanations below answer this chapter’s guiding questions. Try to answer each one yourself before reading on.

  1. A natural experiment occurs when external forces—policy changes, geographic features, historical events—assign units to treatment in a way that resembles random assignment. This allows researchers to make causal inferences without actually conducting an experiment.

  2. A simple before-after comparison for Miami alone would confound the boatlift’s effect with any other changes happening at the same time (macroeconomic conditions, national trends, etc.). We need a control group to show what would have happened to Miami without the boatlift.

  3. Parallel trends is the core assumption of DiD. It requires that, in the hypothetical world where treatment never occurred, the treated group’s outcome would have changed by the same amount as the control group’s. This allows us to use the control group’s actual change as the counterfactual for the treated group.

  4. In the DiD regression, δ (delta) is the coefficient on the interaction term Di×Pt. This interaction equals 1 only for treated units in the post-period, so its coefficient captures the additional change for treated units beyond what control units experienced—exactly the DiD estimate.

  5. States that raise minimum wages differ systematically from states that don’t in many ways (politics, industries, labor markets). These differences make parallel trends implausible. Restaurants right next to each other across a state border share local economic conditions and differ mainly in which state’s minimum wage applies—a much more credible comparison.

  6. If the treated group was already trending upward faster than controls before treatment, the DiD will attribute that pre-existing differential trend to the treatment. The estimate will be biased upward—showing a more positive effect than the true causal impact (or showing a positive effect when the true effect is zero or negative).