11  Panel Data and Fixed Effects

  • What distinguishes panel data from cross-sectional data?
  • How can observing units over time help us address omitted variable bias?
  • What is a fixed effect and what does it control for?
  • How do we estimate fixed effects models in practice?
  • What types of confounders can fixed effects eliminate, and what can they not?
  • Wooldridge (2019), Ch. 13-14

Throughout this course, we have emphasized that omitted variable bias poses the central threat to causal inference. We can add control variables, but we can only control for what we can measure. What about unobserved factors like natural ability, motivation, local culture, or a city’s history?

Panel data—observing the same units over multiple time periods—offers a powerful solution. By tracking how outcomes change within the same unit over time, we can control for all characteristics of that unit that remain constant, even if we cannot observe or measure them directly. This chapter introduces panel data methods, with a focus on fixed effects estimation.

11.1 Cross-Sectional vs. Panel Data

11.1.1 Cross-Sectional Data

Most datasets we have examined so far are cross-sectional: each unit (person, firm, city) is observed at a single point in time. Examples include housing prices in Ames, Iowa in 2010; penguin measurements from a single expedition; or wage data from one year of the Current Population Survey.

With cross-sectional data, we can describe populations, identify correlations, and use multivariate regression to control for observable confounders. But we face a fundamental limitation: we cannot control for unobserved characteristics that differ across units.

For example, suppose we survey three cities in 2020:

City     Year   Unemployment (%)   Crime Rate
Boston   2020   6.2                312
Denver   2020   5.1                285
Miami    2020   7.8                410

We observe each city once. We can compare across cities, but we cannot separate the effect of unemployment from all the other ways these cities differ.

11.1.2 Panel Data

Panel data (also called longitudinal data) tracks the same units across multiple time periods. Examples include:

  • NBA players observed across multiple seasons
  • Employment and wages in states tracked quarterly over several years
  • Crime rates in cities measured annually for a decade
  • Stock prices for companies tracked daily

The key feature is that we observe the same unit at different times, allowing us to see how that unit changes when circumstances change.

Extending our example, suppose we observe those same three cities across three years:

City     Year   Unemployment (%)   Crime Rate
Boston   2018   4.8                298
Boston   2019   5.5                305
Boston   2020   6.2                312
Denver   2018   4.0                270
Denver   2019   4.5                278
Denver   2020   5.1                285
Miami    2018   6.5                390
Miami    2019   7.0                401
Miami    2020   7.8                410

Now we have 9 observations: 3 cities \(\times\) 3 years. Because we see each city multiple times, we can ask a different question: when unemployment rises within a given city, does crime in that same city tend to rise as well? This is the core idea behind fixed effects, which we turn to next.
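We can preview this within-city question directly from the table. The sketch below (using dplyr, with the nine observations typed in by hand) computes each city's year-over-year changes:

```r
library(dplyr)

# The nine observations from the table above
cities <- tibble(
    city  = rep(c("Boston", "Denver", "Miami"), each = 3),
    year  = rep(2018:2020, times = 3),
    unem  = c(4.8, 5.5, 6.2, 4.0, 4.5, 5.1, 6.5, 7.0, 7.8),
    crime = c(298, 305, 312, 270, 278, 285, 390, 401, 410)
)

# Year-over-year changes within each city
cities |>
    group_by(city) |>
    mutate(
        d_unem  = unem - lag(unem),    # change in unemployment
        d_crime = crime - lag(crime)   # change in crime
    ) |>
    filter(!is.na(d_unem))
```

In every city, each year-over-year rise in unemployment is accompanied by a rise in crime—exactly the within-unit variation a fixed effects regression exploits.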

11.2 Why Panel Data Helps: The Intuition

Consider estimating the effect of unemployment on crime. With cross-sectional data from a single year, we might compare cities with high unemployment to cities with low unemployment. But cities differ in countless ways: geography, demographics, policing strategies, local culture, economic history. Many of these factors affect both unemployment and crime, creating omitted variable bias.

Now imagine we observe the same cities over two years. Some cities experience rising unemployment; others see it fall. We can ask: when unemployment rises within a city, does crime also rise?

This within-unit comparison automatically controls for everything about that city that stays constant over time—its geography, its historical legacy, its baseline demographics. We don’t need to measure these factors; we just need them to be stable across the time periods we observe.

Important: The Key Insight

Panel data allows us to control for all time-invariant characteristics of each unit, whether observed or unobserved. The variation we use for identification comes from changes within units over time, not comparisons across different units.

11.3 The Fixed Effects Model

11.3.1 Setting Up the Model

Let’s formalize this intuition. Suppose we observe units \(i = 1, 2, ..., N\) over time periods \(t = 1, 2, ..., T\). Our outcome is \(y_{it}\) and our variable of interest is \(x_{it}\).

The panel data model is:

\[ y_{it} = \beta_0 + \beta_1 x_{it} + a_i + \tau_t + \mu_{it} \tag{11.1}\]

where:

  • \(y_{it}\) is the outcome for unit \(i\) at time \(t\)
  • \(x_{it}\) is the explanatory variable (which varies across units and time)
  • \(a_i\) is the unit fixed effect—capturing all time-invariant characteristics of unit \(i\)
  • \(\tau_t\) is the time fixed effect—capturing shocks that affect all units equally in period \(t\)
  • \(\mu_{it}\) is the idiosyncratic error term

The unit fixed effect \(a_i\) is the crucial element. It absorbs everything about unit \(i\) that doesn’t change over time: genetics, geography, institutional history, baseline culture. In our crime example, \(a_i\) captures each city’s fixed characteristics that affect crime rates.

11.3.2 What Does This Solve?

Recall that omitted variable bias occurs when an unobserved factor is correlated with both \(x\) and \(y\). In the standard cross-sectional regression:

\[ y_i = \beta_0 + \beta_1 x_i + \mu_i \]

any time-invariant unobserved factor correlated with \(x\) gets absorbed into the error term, biasing our estimate of \(\beta_1\).

With fixed effects, those time-invariant factors are captured by \(a_i\) and explicitly controlled for. As long as the confounders don’t change over time, they cannot bias our estimate.
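A quick simulation makes this concrete. All parameters below are made up for illustration: a time-invariant trait \(a_i\) raises both \(x\) and \(y\), so pooled OLS is biased upward, while the within (demeaned) regression recovers the true coefficient of 2:

```r
set.seed(42)
n_units <- 200
n_periods <- 5
n <- n_units * n_periods

a_i <- rnorm(n_units)  # time-invariant unit trait (unobserved in practice)
dat <- data.frame(
    id = rep(1:n_units, each = n_periods),
    a  = rep(a_i, each = n_periods)
)
dat$x <- 1.5 * dat$a + rnorm(n)            # x correlated with the trait
dat$y <- 2 * dat$x + 3 * dat$a + rnorm(n)  # true effect of x is 2

# Pooled OLS ignores a_i and is biased upward
coef(lm(y ~ x, data = dat))["x"]

# Within transformation: demean x and y by unit, and the bias disappears
dat$x_dm <- dat$x - ave(dat$x, dat$id)
dat$y_dm <- dat$y - ave(dat$y, dat$id)
coef(lm(y_dm ~ x_dm, data = dat))["x_dm"]  # close to the true value of 2
```

The demeaned estimate lands near 2 because subtracting unit means removes \(a_i\) entirely, even though we never measured it.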

11.4 Estimating Fixed Effects: The Within Transformation

How do we actually estimate Equation 11.1 when we don’t observe \(a_i\)?

One approach is the within transformation (also called the “demeaning” approach). For each unit \(i\), we compute the average of each variable over time:

\[ \bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + a_i + \bar{\tau} + \bar{\mu}_i \]

where \(\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}\) is unit \(i\)’s average outcome across all time periods.

Now subtract this time-averaged equation from the original:

\[ y_{it} - \bar{y}_i = (\beta_0 - \beta_0) + \beta_1(x_{it} - \bar{x}_i) + (a_i - a_i) + (\tau_t - \bar{\tau}) + (\mu_{it} - \bar{\mu}_i) \]

The intercept drops out: \(\beta_0 - \beta_0 = 0\). More importantly, the fixed effect drops out: since \(a_i\) is constant over time, its time average is just itself (\(\bar{a}_i = a_i\)), so \(a_i - a_i = 0\). We are left with:

\[ y_{it} - \bar{y}_i = \beta_1(x_{it} - \bar{x}_i) + (\tau_t - \bar{\tau}) + (\mu_{it} - \bar{\mu}_i) \]

We can write this more compactly as:

\[ \ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{\tau}_t + \ddot{\mu}_{it} \]

where the double-dots indicate “time-demeaned” variables (deviations from unit means).

Note: Why “Within” Transformation?

This estimator is called the “within” transformation because the variation used to identify \(\beta_1\) comes entirely from variation within each unit over time. Cross-sectional differences between units are eliminated by the demeaning process.

11.4.1 Example: Unemployment and Crime

Let’s see this in action. We have data on crime rates and unemployment for 46 cities observed in two years (1982 and 1987).

First, let’s see what happens with simple cross-sectional regression using just 1987 data:

crime_data <- wooldridge::crime2

# Cross-sectional regression using only 1987
reg_cross <- lm(crmrte ~ unem, data = filter(crime_data, year == 87))
summary(reg_cross)

Call:
lm(formula = crmrte ~ unem, data = filter(crime_data, year == 
    87))

Residuals:
   Min     1Q Median     3Q    Max 
-57.55 -27.01 -10.56  18.01  79.75 

Coefficients:
            Estimate Std. Error t value   Pr(>|t|)    
(Intercept)  128.378     20.757   6.185 0.00000018 ***
unem          -4.161      3.416  -1.218       0.23    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 34.6 on 44 degrees of freedom
Multiple R-squared:  0.03262,   Adjusted R-squared:  0.01063 
F-statistic: 1.483 on 1 and 44 DF,  p-value: 0.2297

The coefficient on unemployment is negative (-4.16), suggesting higher unemployment is associated with lower crime. This counterintuitive result likely reflects omitted variable bias: cities with low unemployment might also have other characteristics (wealth, strong institutions) that lead to lower crime.

Now let’s use both years and apply the within transformation:

# Create city identifier
crime_data <- crime_data |> 
    mutate(city_id = rep(1:46, each = 2))

# Compute city-level means
city_means <- crime_data |> 
    group_by(city_id) |> 
    summarize(
        mean_crmrte = mean(crmrte),
        mean_unem = mean(unem),
        mean_d87 = mean(d87)
    )

# Merge and demean
crime_data <- crime_data |> 
    left_join(city_means, by = "city_id") |> 
    mutate(
        crmrte_demean = crmrte - mean_crmrte,
        unem_demean = unem - mean_unem,
        time_demean = d87 - mean_d87
    )

# Estimate on demeaned data
reg_within <- lm(crmrte_demean ~ unem_demean + time_demean, data = crime_data)
summary(reg_within)

Call:
lm(formula = crmrte_demean ~ unem_demean + time_demean, data = crime_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-26.458  -6.384   0.000   6.384  26.458 

Coefficients:
                         Estimate            Std. Error t value  Pr(>|t|)    
(Intercept) -0.000000000000000027  1.039332417007331477   0.000  1.000000    
unem_demean  2.217999508245958484  0.617247710094903312   3.593  0.000535 ***
time_demean 15.402203621534523492  3.306166769286675411   4.659 0.0000111 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.969 on 89 degrees of freedom
Multiple R-squared:  0.1961,    Adjusted R-squared:  0.178 
F-statistic: 10.85 on 2 and 89 DF,  p-value: 0.00006059

Now the coefficient on unemployment is positive (2.22)! After controlling for all time-invariant city characteristics, we find that when unemployment rises within a city, crime rises too. This makes much more intuitive sense.

The within transformation eliminated the bias from unobserved city characteristics that were driving the negative cross-sectional correlation.

11.5 Estimating Fixed Effects: The Dummy Variable Approach

In practice, we rarely compute the within transformation manually. Instead, we use an equivalent approach: including dummy variables for each unit.

We estimate:

\[ y_{it} = \beta_0 + \beta_1 x_{it} + \sum_{j=2}^{N} \gamma_j D_j + \sum_{s=2}^{T} \delta_s T_s + \mu_{it} \]

where \(D_j\) is a dummy variable equal to 1 if the observation is from unit \(j\), and \(T_s\) is a dummy equal to 1 if the observation is from time period \(s\). We omit one unit and one time period to avoid the dummy variable trap.

This approach is mathematically equivalent to the within transformation but more flexible: it’s easier to add control variables, and we can see the estimated fixed effects if desired.

# Create factor variable for city and dummy for 1987
crime_data <- crime_data |>
    mutate(
        city_f = factor(city_id),
        year_87 = ifelse(year == 87, 1, 0)
    )

# Estimate with dummy variables
reg_fe <- lm(crmrte ~ unem + year_87 + city_f, data = crime_data)

# Show just the key coefficients (not all 46 city dummies)
summary(reg_fe)$coefficients[1:4, ]
            Estimate Std. Error  t value     Pr(>|t|)
(Intercept) 51.48923 12.3457756 4.170595 0.0001404348
unem         2.21800  0.8778658 2.526581 0.0151893177
year_87     15.40220  4.7021169 3.275589 0.0020604685
city_f2     17.29171 14.1954509 1.218116 0.2296714584

The coefficient on unem (2.22) is identical to our within transformation estimate! (The standard errors differ, though: running lm() on manually demeaned data overstates the degrees of freedom because it ignores the 46 city means we estimated along the way. The dummy variable approach gets the degrees of freedom—and hence the standard errors—right automatically.)

The dummy variable coefficients themselves are usually not of primary interest, but they can provide useful information. For instance, we can see which cities have unusually high or low crime rates after accounting for unemployment.

11.6 A Small Example: Police Spending and Property Crime

Before moving on, let’s work through a small example where we can examine every fixed effect individually. This will also introduce the feols() function from fixest.

Suppose we observe 5 cities over 4 years and record each city’s property crime rate (per 1,000 residents) and its per-capita police spending (in hundreds of dollars):

city_panel <- tibble(
    city = rep(c("Austin", "Baltimore", "Denver", "Portland", "Raleigh"), each = 4),
    year = rep(2017:2020, times = 5),
    police_spend = c(
        3.8, 4.0, 4.3, 4.1,   # Austin: modest increase then dip
        6.2, 6.5, 6.4, 6.0,   # Baltimore: high but declining
        4.5, 4.7, 5.0, 4.6,   # Denver: increase then cut
        3.5, 3.4, 3.2, 3.0,   # Portland: steady decline
        3.9, 4.2, 4.5, 4.3    # Raleigh: increase then dip
    ),
    property_crime = c(
        35.2, 34.1, 32.8, 33.5,   # Austin
        48.6, 47.3, 47.9, 49.2,   # Baltimore
        38.4, 37.1, 35.6, 37.8,   # Denver
        42.1, 43.5, 44.8, 46.2,   # Portland
        30.1, 29.2, 27.8, 28.9    # Raleigh
    )
)

city_panel
# A tibble: 20 × 4
   city       year police_spend property_crime
   <chr>     <int>        <dbl>          <dbl>
 1 Austin     2017          3.8           35.2
 2 Austin     2018          4             34.1
 3 Austin     2019          4.3           32.8
 4 Austin     2020          4.1           33.5
 5 Baltimore  2017          6.2           48.6
 6 Baltimore  2018          6.5           47.3
 7 Baltimore  2019          6.4           47.9
 8 Baltimore  2020          6             49.2
 9 Denver     2017          4.5           38.4
10 Denver     2018          4.7           37.1
11 Denver     2019          5             35.6
12 Denver     2020          4.6           37.8
13 Portland   2017          3.5           42.1
14 Portland   2018          3.4           43.5
15 Portland   2019          3.2           44.8
16 Portland   2020          3             46.2
17 Raleigh    2017          3.9           30.1
18 Raleigh    2018          4.2           29.2
19 Raleigh    2019          4.5           27.8
20 Raleigh    2020          4.3           28.9

A few things to notice in the raw data. Baltimore spends the most on policing and has the highest crime rate. Raleigh spends relatively little and has the lowest crime rate. A naive cross-sectional comparison would suggest that more police spending is associated with more crime—but this conflates the spending effect with all the other ways these cities differ. Cities with high crime invest more in policing precisely because they have high crime, and those same cities may have persistent characteristics (poverty, density, drug markets) that keep crime elevated regardless of spending levels.

11.6.1 Estimating with feols()

The feols() function from the fixest package uses a slightly different syntax than lm(). Fixed effects go after a vertical bar (|):

fe_reg <- feols(property_crime ~ police_spend | city + year, data = city_panel)
summary(fe_reg)
OLS estimation, Dep. Var.: property_crime
Observations: 20
Fixed-effects: city: 5,  year: 4
Standard-errors: IID 
             Estimate Std. Error  t value      Pr(>|t|)    
police_spend -5.64412   0.554777 -10.1737 0.00000062226 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.276304     Adj. R2: 0.997313
                 Within R2: 0.903933

Everything to the left of | is the standard regression formula. Everything to the right lists the fixed effects—here, city and year. Behind the scenes, feols() automatically creates a dummy variable for each unique value that city and year take and includes it in the regression. Unlike the dummy variable approach with lm(), feols() does not report the individual fixed effect coefficients in the main output. It assumes you are not interested in the estimated coefficients on the dummy variables themselves, so it reports only the coefficients on the variables to the left of | (here, police_spend).

11.6.2 Extracting and Interpreting the Fixed Effects

We can retrieve the estimated fixed effects using the fixef() function:

fe_estimates <- fixef(fe_reg)
fe_estimates$city
   Austin Baltimore    Denver  Portland   Raleigh 
 56.42819  83.33636  63.42187  62.30399  52.51591 
fe_estimates$year
     2017      2018      2019      2020 
0.0000000 0.3759424 0.5932373 0.3528825 

City fixed effects. Each value represents that city’s baseline property crime rate, after accounting for police spending and year effects. Baltimore has the highest city fixed effect—it has a high crime rate for reasons unrelated to how much it spends on policing in any given year (persistent poverty, population density, historical disinvestment, etc.). Raleigh has the lowest, reflecting characteristics that keep its crime rate low independent of police budgets.

These are the time-invariant unobservables \(a_i\) from our model. Absorbing them prevents cross-city differences from biasing our estimate of the effect of police spending on crime.

Year fixed effects. Each value captures shocks common to all cities in a given year. If 2020 shows a positive year effect, that would reflect a nationwide increase in property crime that affected all five cities—something we want to account for so it doesn’t get attributed to changes in police spending that happened to occur at the same time.

In practice, we rarely interpret the individual fixed effect estimates. With 5 cities we can read through each one, but a typical application has hundreds or thousands of units, and reporting each one would be unwieldy and beside the point. The fixed effects are there to absorb unobserved heterogeneity so that our estimate of \(\beta_1\) is not biased—they are not themselves the object of interest. This is why feols() omits them from the default output and why regression tables in published papers just note “City FE: Yes” and “Year FE: Yes” rather than reporting every coefficient. The fixef() function is available for diagnostics or the occasional case where the unit-level intercepts matter, but most of the time your attention should be on the treatment variable.
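As a sanity check on how the pieces fit together, we can rebuild the model's fitted values by hand from the slope and the two sets of fixed effects. A sketch, using the fe_reg model and city_panel data from above—fixef() returns named vectors, so we can index by city name and year:

```r
library(fixest)

fe <- fixef(fe_reg)

# Fitted value = city effect + year effect + slope * police spending
manual_fit <- fe$city[city_panel$city] +
    fe$year[as.character(city_panel$year)] +
    coef(fe_reg)["police_spend"] * city_panel$police_spend

# Matches the fitted values feols() computed itself
all.equal(unname(manual_fit), unname(as.numeric(fitted(fe_reg))))
```

This confirms the interpretation above: each observation's prediction is its city's baseline, shifted by the common year shock and the spending effect.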

Why is there no intercept? You may have noticed that the feols() output includes no intercept term, and that fixef() reported a value for every city and every year—not the \(N - 1\) or \(T - 1\) dummies you might expect from a standard regression with categorical variables. (One normalization does remain: with two sets of fixed effects, one reference is still needed, which is why the 2017 year effect reported above is exactly zero.) This is a deliberate modeling choice. In a typical OLS regression with a factor variable, we include an intercept and omit one category as the reference group. The intercept captures the baseline level for the omitted category, and each dummy coefficient is interpreted relative to that baseline.

feols() takes a different approach: it drops the intercept entirely and instead estimates a fixed effect for every group. Each fixed effect is then interpretable as the baseline level for that specific unit or time period, not as a deviation from some arbitrarily chosen reference category. When we looked at the city fixed effects above, Baltimore’s value was Baltimore’s baseline crime rate—not “how much higher Baltimore is than Austin” or whichever city happens to come first alphabetically. This makes the individual fixed effects more intuitive to read, because their interpretation does not depend on which category was omitted.

Mathematically the two approaches are equivalent. Including an intercept with \(N - 1\) dummies spans the same column space as including \(N\) dummies with no intercept; they produce identical fitted values, residuals, and coefficient estimates for the treatment variable. The only difference is how the fixed effects themselves are parameterized. Since we almost never interpret the individual fixed effects anyway, this distinction rarely matters for applied work—but it explains why your output looks the way it does.
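You can verify the equivalence with base lm() on the city panel from above: the default intercept-plus-reference-category parameterization and the no-intercept parameterization give the same slope and identical fitted values—only the dummies are coded differently.

```r
# R's default: intercept + reference categories dropped
m1 <- lm(property_crime ~ police_spend + factor(city) + factor(year),
         data = city_panel)

# No intercept: a dummy for every city (R still drops one year level)
m2 <- lm(property_crime ~ 0 + police_spend + factor(city) + factor(year),
         data = city_panel)

coef(m1)["police_spend"]  # same slope in both models
coef(m2)["police_spend"]
all.equal(fitted(m1), fitted(m2))
```

Both slopes also match the feols() estimate from earlier, since all three regressions span the same column space.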

11.7 What Fixed Effects Control For (and What They Don’t)

Fixed effects are useful, but they are not a cure-all. It is worth being precise about what they do and do not control for.

11.7.1 What Unit Fixed Effects Control For

Unit fixed effects (\(a_i\)) absorb all characteristics of unit \(i\) that are constant over time:

  • Geography and location
  • Historical legacy and founding conditions
  • Stable institutional features
  • Baseline demographics (to the extent they don’t change much)
  • Measurement differences (e.g., how crime is recorded)
  • “Culture” and other hard-to-measure factors

11.7.2 What Time Fixed Effects Control For

Time fixed effects (\(\tau_t\)) absorb all factors that affect all units equally in a given period:

  • Macroeconomic conditions (recessions, booms)
  • National policy changes
  • Seasonal patterns
  • Technological changes affecting everyone

11.7.3 What Fixed Effects Do NOT Control For

Fixed effects cannot eliminate bias from factors that vary both across units and over time in ways correlated with the treatment:

  • Time-varying confounders specific to certain units
  • Differential trends (some units changing faster than others)
  • Reverse causality (if \(y\) affects \(x\) within a period)
Warning: Fixed Effects ≠ No OVB

Fixed effects dramatically reduce omitted variable bias by controlling for time-invariant confounders. But they don’t eliminate all possible bias. Unit-specific, time-varying confounders can still cause problems. Always think carefully about what might be changing differently across your units over time.

For example, in our crime analysis, fixed effects control for each city’s baseline characteristics. But if some cities increased their police budgets while others cut them, and these budget changes are correlated with unemployment changes, we would still have omitted variable bias. We could address this by adding police spending as an explicit control variable.

11.8 Adding Control Variables

We can include additional time-varying control variables in fixed effects models:

\[ y_{it} = \beta_1 x_{it} + \gamma z_{it} + a_i + \tau_t + \mu_{it} \]

where \(z_{it}\) is a control variable that changes over time within units.

This addresses unit-specific, time-varying confounders—the one source of bias that fixed effects alone cannot eliminate. We can only control for variables we observe and measure, but adding them narrows the remaining scope for omitted variable bias.

Returning to our city crime panel, our earlier model estimated the effect of police spending on property crime with city and year fixed effects. But local economic conditions are a plausible time-varying confounder: when a city’s economy weakens, property crime may rise and the city may cut police budgets due to falling tax revenue. If we don’t account for this, the estimated effect of police spending could be biased.

We can add the unemployment rate as a control variable. In feols(), time-varying controls go on the left side of the |, just like in a standard regression formula:

# Add unemployment as a time-varying control
city_panel <- city_panel |>
    mutate(
        unem_rate = c(
            3.1, 2.8, 2.9, 5.0,   # Austin
            5.8, 5.5, 5.3, 7.6,   # Baltimore
            3.0, 2.7, 2.8, 5.4,   # Denver
            3.8, 3.6, 3.9, 6.2,   # Portland
            3.4, 3.1, 3.0, 5.1    # Raleigh
        )
    )

# Without the control
fe_no_control <- feols(property_crime ~ police_spend | city + year,
                       data = city_panel)

# With the control
fe_with_control <- feols(property_crime ~ police_spend + unem_rate | city + year,
                         data = city_panel)

modelsummary(list("No Control" = fe_no_control,
                  "With Unemployment" = fe_with_control),
             gof_map = c("adj.r.squared"),
             add_rows = tribble(
                 ~term, ~`No Control`, ~`With Unemployment`,
                 "City FE", "Yes", "Yes",
                 "Year FE", "Yes", "Yes"
             ))
               No Control   With Unemployment
police_spend   -5.644       -5.395
               (0.555)      (0.624)
unem_rate                    0.644
                            (0.713)
R2 Adj.         0.997        0.997
City FE         Yes          Yes
Year FE         Yes          Yes

Adding a control is no different from a standard regression—just include it on the left side of |. The fixed effects on the right side still absorb time-invariant city characteristics and common year shocks. The unemployment rate, because it varies across cities and over time, cannot be absorbed by fixed effects and must be included explicitly.

One restriction worth noting: we cannot include time-invariant variables (like a city’s region or founding year) as controls in a fixed effects model. Those are perfectly collinear with the city fixed effects—the fixed effects already capture them. Only variables that change within a unit over time belong on the left side of |.
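To see the collinearity for yourself, try adding a time-invariant variable to a dummy-variable regression on the city panel. Here founded is a hypothetical founding year for each city (the values are made up for illustration); lm() returns NA for its coefficient because the city dummies already span it:

```r
# Hypothetical founding years, constant within each city
city_panel$founded <- rep(c(1839, 1729, 1858, 1845, 1792), each = 4)

m <- lm(property_crime ~ police_spend + factor(city) + founded,
        data = city_panel)

coef(m)["founded"]  # NA: perfectly collinear with the city dummies
```

feols() behaves analogously, dropping time-invariant regressors that are absorbed by the unit fixed effects.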

11.9 Practical Considerations

11.9.1 When to Use Fixed Effects

Fixed effects are most valuable when:

  1. You have genuine panel data (same units observed multiple times)
  2. There is meaningful variation in \(x\) within units over time
  3. Time-invariant confounders are a major concern
  4. The treatment effect is expected to be relatively immediate

11.9.2 When Fixed Effects May Not Work Well

Fixed effects may be less useful when:

  1. Little within-unit variation exists (everyone’s \(x\) is stable over time)
  2. The treatment effect takes a long time to materialize
  3. Time-varying confounders are the main concern
  4. You have very few time periods (less precise estimates)

11.9.3 Standard Errors

With panel data, observations within the same unit are often correlated over time. Standard OLS standard errors assume independence and can be too small. In practice, researchers typically use clustered standard errors at the unit level to account for this correlation. We’ll discuss this more in later chapters.
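With fixest, clustering is one argument away. A sketch using the fe_reg model from above—summary() recomputes the standard errors without refitting, or the cluster argument can be supplied when estimating:

```r
library(fixest)

# Recompute standard errors, clustering by city
summary(fe_reg, cluster = ~city)

# Or specify clustering at estimation time
feols(property_crime ~ police_spend | city + year,
      data = city_panel, cluster = ~city)
```

(With only five cities this example has too few clusters for the asymptotics to be reliable; it is shown only to illustrate the syntax.)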

11.10 Summary

Panel data—observing the same units over time—offers a powerful tool for causal inference. By tracking how outcomes change within units when circumstances change, we can control for all time-invariant characteristics of each unit, whether observed or not.

The fixed effects estimator implements this idea. It can be computed via the within transformation (demeaning the data) or equivalently by including dummy variables for each unit. The coefficient of interest is identified purely from within-unit variation over time.

Fixed effects control for:

  • All time-invariant unit characteristics (observed and unobserved)
  • All time-specific shocks affecting all units equally (when time fixed effects are included)

Fixed effects do NOT control for:

  • Unit-specific, time-varying confounders
  • Differential trends across units
  • Reverse causality

Panel methods are not a complete solution to omitted variable bias, but they represent a major improvement over cross-sectional analysis when time-invariant confounders are a primary concern.

11.11 Check Your Understanding

The points below recap the answers to the questions this chapter set out to address.

  1. Panel data’s defining feature is that it follows the same units (people, firms, cities, etc.) across multiple time periods. This allows us to observe how each unit changes over time, rather than just comparing different units at one point in time.

  2. The term \(a_i\) is the unit fixed effect. It captures everything about unit \(i\) that is constant over time—observed characteristics like location, but also unobserved factors like institutional history, culture, or baseline conditions. This is what makes fixed effects so powerful for addressing omitted variable bias.

  3. The within transformation subtracts each unit’s time-average from its observations. Since \(a_i\) doesn’t vary over time, its average equals itself: \(\bar{a}_i = a_i\). So when we compute \(a_i - \bar{a}_i = a_i - a_i = 0\), the fixed effect is eliminated.

  4. The cross-sectional estimate was biased because unobserved city characteristics (wealth, institutions, etc.) were correlated with both unemployment and crime. Cities with “good” characteristics had both low unemployment AND low crime, creating a spurious negative correlation. Fixed effects removed these time-invariant confounders, revealing the true positive effect of unemployment on crime.

  5. Unit fixed effects only control for characteristics that are constant over time. A city’s police budget changes from year to year, so it’s a time-varying factor that fixed effects won’t automatically control for. We would need to include it as an explicit control variable.

  6. While mathematically equivalent for estimating \(\beta_1\), the dummy variable approach is more practical. It’s straightforward to add additional control variables, it works naturally with standard regression software, and we can examine the estimated fixed effects if they’re of substantive interest. The within transformation requires manual computation of demeaned variables.