Throughout this course, we have emphasized that omitted variable bias poses the central threat to causal inference. We can add control variables, but we can only control for what we can measure. What about unobserved factors like natural ability, motivation, local culture, or a city’s history?
Panel data—observing the same units over multiple time periods—offers a powerful solution. By tracking how outcomes change within the same unit over time, we can control for all characteristics of that unit that remain constant, even if we cannot observe or measure them directly. This chapter introduces panel data methods, with a focus on fixed effects estimation.
11.1 Cross-Sectional vs. Panel Data
11.1.1 Cross-Sectional Data
Most datasets we have examined so far are cross-sectional: each unit (person, firm, city) is observed at a single point in time. Examples include housing prices in Ames, Iowa in 2010; penguin measurements from a single expedition; or wage data from one year of the Current Population Survey.
With cross-sectional data, we can describe populations, identify correlations, and use multivariate regression to control for observable confounders. But we face a fundamental limitation: we cannot control for unobserved characteristics that differ across units.
For example, suppose we survey three cities in 2020:
| City   | Year | Unemployment (%) | Crime Rate |
|--------|------|------------------|------------|
| Boston | 2020 | 6.2              | 312        |
| Denver | 2020 | 5.1              | 285        |
| Miami  | 2020 | 7.8              | 410        |
We observe each city once. We can compare across cities, but we cannot separate the effect of unemployment from all the other ways these cities differ.
11.1.2 Panel Data
Panel data (also called longitudinal data) tracks the same units across multiple time periods. Examples include:
NBA players observed across multiple seasons
Employment and wages in states tracked quarterly over several years
Crime rates in cities measured annually for a decade
Stock prices for companies tracked daily
The key feature is that we observe the same unit at different times, allowing us to see how that unit changes when circumstances change.
Extending our example, suppose we observe those same three cities across three years:
| City   | Year | Unemployment (%) | Crime Rate |
|--------|------|------------------|------------|
| Boston | 2018 | 4.8              | 298        |
| Boston | 2019 | 5.5              | 305        |
| Boston | 2020 | 6.2              | 312        |
| Denver | 2018 | 4.0              | 270        |
| Denver | 2019 | 4.5              | 278        |
| Denver | 2020 | 5.1              | 285        |
| Miami  | 2018 | 6.5              | 390        |
| Miami  | 2019 | 7.0              | 401        |
| Miami  | 2020 | 7.8              | 410        |
Now we have 9 observations: 3 cities \(\times\) 3 years. Because we see each city multiple times, we can ask a different question: when unemployment rises within a given city, does crime in that same city tend to rise as well? This is the core idea behind fixed effects, which we turn to next.
11.2 Why Panel Data Helps: The Intuition
Consider estimating the effect of unemployment on crime. With cross-sectional data from a single year, we might compare cities with high unemployment to cities with low unemployment. But cities differ in countless ways: geography, demographics, policing strategies, local culture, economic history. Many of these factors affect both unemployment and crime, creating omitted variable bias.
Now imagine we observe the same cities over two years. Some cities experience rising unemployment; others see it fall. We can ask: when unemployment rises within a city, does crime also rise?
This within-unit comparison automatically controls for everything about that city that stays constant over time—its geography, its historical legacy, its baseline demographics. We don’t need to measure these factors; we just need them to be stable across the time periods we observe.
Important: The Key Insight
Panel data allows us to control for all time-invariant characteristics of each unit, whether observed or unobserved. The variation we use for identification comes from changes within units over time, not comparisons across different units.
11.3 The Fixed Effects Model
11.3.1 Setting Up the Model
Let’s formalize this intuition. Suppose we observe units \(i = 1, 2, ..., N\) over time periods \(t = 1, 2, ..., T\). Our outcome is \(y_{it}\) and our variable of interest is \(x_{it}\). The fixed effects model is:

\[
y_{it} = \beta_0 + \beta_1 x_{it} + a_i + \tau_t + \mu_{it} \tag{11.1}
\]

where:
\(y_{it}\) is the outcome for unit \(i\) at time \(t\)
\(x_{it}\) is the explanatory variable (which varies across units and time)
\(a_i\) is the unit fixed effect—capturing all time-invariant characteristics of unit \(i\)
\(\tau_t\) is the time fixed effect—capturing shocks that affect all units equally in period \(t\)
\(\mu_{it}\) is the idiosyncratic error term
The unit fixed effect \(a_i\) is the crucial element. It absorbs everything about unit \(i\) that doesn’t change over time: genetics, geography, institutional history, baseline culture. In our crime example, \(a_i\) captures each city’s fixed characteristics that affect crime rates.
11.3.2 What Does This Solve?
Recall that omitted variable bias occurs when an unobserved factor is correlated with both \(x\) and \(y\). In the standard cross-sectional regression:
\[
y_i = \beta_0 + \beta_1 x_i + \mu_i
\]
any time-invariant unobserved factor correlated with \(x\) gets absorbed into the error term, biasing our estimate of \(\beta_1\).
With fixed effects, those time-invariant factors are captured by \(a_i\) and explicitly controlled for. As long as the confounders don’t change over time, they cannot bias our estimate.
11.4 Estimating Fixed Effects: The Within Transformation
How do we actually estimate Equation 11.1 when we don’t observe \(a_i\)?
One approach is the within transformation (also called the “demeaning” approach). For each unit \(i\), we compute the average of each variable over time. Averaging Equation 11.1 over \(t\) gives:

\[
\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + a_i + \bar{\tau} + \bar{\mu}_i
\]

Subtracting this from the original equation, the intercept drops out: \(\beta_0 - \beta_0 = 0\). More importantly, the fixed effect drops out: since \(a_i\) is constant over time, its time average is just itself (\(\bar{a}_i = a_i\)), so \(a_i - a_i = 0\). We are left with:

\[
\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{\tau}_t + \ddot{\mu}_{it}
\]

where the double-dots indicate “time-demeaned” variables (deviations from unit means).
Note: Why “Within” Transformation?
This estimator is called the “within” transformation because the variation used to identify \(\beta_1\) comes entirely from variation within each unit over time. Cross-sectional differences between units are eliminated by the demeaning process.
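The demeaning logic is easy to verify in a short simulation. The data below are simulated, not from the chapter: a time-invariant unit effect raises both \(x\) and \(y\), so pooled OLS is biased, while regressing the demeaned outcome on the demeaned treatment recovers the true coefficient.

```r
# Simulated panel (hypothetical data): the unit effect a_i is correlated
# with x, so pooled OLS overstates beta_1; the within estimator does not.
set.seed(1)
N <- 200; T <- 5
a  <- rnorm(N, sd = 2)              # unobserved, time-invariant unit effect
id <- rep(1:N, each = T)
x  <- a[id] + rnorm(N * T)          # x is correlated with a_i
y  <- 1 * x + a[id] + rnorm(N * T)  # true beta_1 = 1

coef(lm(y ~ x))["x"]                # pooled OLS: biased upward

# Within transformation: subtract each unit's time average
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
coef(lm(y_dm ~ x_dm))["x_dm"]       # close to the true beta_1 of 1
```

Because \(a_i\) enters both \(x\) and \(y\), the pooled slope absorbs the confounding; demeaning removes \(a_i\) entirely, exactly as the algebra above shows.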
11.4.1 Example: Unemployment and Crime
Let’s see this in action. We have data on crime rates and unemployment for 46 cities observed in two years (1982 and 1987).
First, let’s see what happens with simple cross-sectional regression using just 1987 data:
```r
crime_data <- wooldridge::crime2

# Cross-sectional regression using only 1987
reg_cross <- lm(crmrte ~ unem, data = filter(crime_data, year == 87))
summary(reg_cross)
```
Call:
lm(formula = crmrte ~ unem, data = filter(crime_data, year ==
87))
Residuals:
Min 1Q Median 3Q Max
-57.55 -27.01 -10.56 18.01 79.75
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 128.378 20.757 6.185 0.00000018 ***
unem -4.161 3.416 -1.218 0.23
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 34.6 on 44 degrees of freedom
Multiple R-squared: 0.03262, Adjusted R-squared: 0.01063
F-statistic: 1.483 on 1 and 44 DF, p-value: 0.2297
The coefficient on unemployment is negative (-4.16), suggesting higher unemployment is associated with lower crime. This counterintuitive result likely reflects omitted variable bias: cities with low unemployment might also have other characteristics (wealth, strong institutions) that lead to lower crime.
Now let’s use both years and apply the within transformation:
```r
# Create city identifier
crime_data <- crime_data |>
  mutate(city_id = rep(1:46, each = 2))

# Compute city-level means
city_means <- crime_data |>
  group_by(city_id) |>
  summarize(
    mean_crmrte = mean(crmrte),
    mean_unem = mean(unem),
    mean_d87 = mean(d87)
  )

# Merge and demean
crime_data <- crime_data |>
  left_join(city_means, by = "city_id") |>
  mutate(
    crmrte_demean = crmrte - mean_crmrte,
    unem_demean = unem - mean_unem,
    time_demean = d87 - mean_d87
  )

# Estimate on demeaned data
reg_within <- lm(crmrte_demean ~ unem_demean + time_demean, data = crime_data)
summary(reg_within)
```
Call:
lm(formula = crmrte_demean ~ unem_demean + time_demean, data = crime_data)
Residuals:
Min 1Q Median 3Q Max
-26.458 -6.384 0.000 6.384 26.458
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.000000000000000027 1.039332417007331477 0.000 1.000000
unem_demean 2.217999508245958484 0.617247710094903312 3.593 0.000535 ***
time_demean 15.402203621534523492 3.306166769286675411 4.659 0.0000111 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.969 on 89 degrees of freedom
Multiple R-squared: 0.1961, Adjusted R-squared: 0.178
F-statistic: 10.85 on 2 and 89 DF, p-value: 0.00006059
Now the coefficient on unemployment is positive (2.22)! After controlling for all time-invariant city characteristics, we find that when unemployment rises within a city, crime rises too. This makes much more intuitive sense.
The within transformation eliminated the bias from unobserved city characteristics that were driving the negative cross-sectional correlation.
11.5 Estimating Fixed Effects: The Dummy Variable Approach
In practice, we rarely compute the within transformation manually. Instead, we use an equivalent approach: including dummy variables for each unit.
\[
y_{it} = \beta_0 + \beta_1 x_{it} + \sum_{j=2}^{N} \gamma_j D_j + \sum_{s=2}^{T} \delta_s T_s + \mu_{it}
\]

where \(D_j\) is a dummy variable equal to 1 if the observation is from unit \(j\), and \(T_s\) is a dummy equal to 1 if the observation is from time period \(s\). We omit one unit and one time period to avoid the dummy variable trap.
This approach is mathematically equivalent to the within transformation but more flexible: it’s easier to add control variables, and we can see the estimated fixed effects if desired.
```r
# Create factor variable for city and dummy for 1987
crime_data <- crime_data |>
  mutate(
    city_f = factor(city_id),
    year_87 = ifelse(year == 87, 1, 0)
  )

# Estimate with dummy variables
reg_fe <- lm(crmrte ~ unem + year_87 + city_f, data = crime_data)

# Show just the key coefficients (not all 46 city dummies)
summary(reg_fe)$coefficients[1:4, ]
```
The coefficient on unem (2.22) is identical to our within transformation estimate!
The dummy variable coefficients themselves are usually not of primary interest, but they can provide useful information. For instance, we can see which cities have unusually high or low crime rates after accounting for unemployment.
11.6 A Small Example: Police Spending and Property Crime
Before moving on, let’s work through a small example where we can examine every fixed effect individually. This will also introduce the feols() function from fixest.
Suppose we observe 5 cities over 4 years and record each city’s property crime rate (per 1,000 residents) and its per-capita police spending (in hundreds of dollars):
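The chapter’s city_panel data frame is built earlier in the book’s source code and is not reproduced here. The sketch below constructs an illustrative version with invented values, chosen only to match the patterns described next (Baltimore spending and experiencing the most, Raleigh the least); it will not reproduce the chapter’s exact estimates.

```r
# Hypothetical city_panel: 5 cities x 4 years. All values are invented
# for illustration and do not reproduce the chapter's estimates.
city_panel <- data.frame(
  city = rep(c("Austin", "Baltimore", "Denver", "Portland", "Raleigh"), each = 4),
  year = rep(2017:2020, times = 5),
  police_spend = c(
    4.0, 4.2, 4.4, 4.1,   # Austin
    7.5, 7.8, 8.0, 7.6,   # Baltimore
    4.5, 4.6, 4.8, 4.4,   # Denver
    5.0, 5.2, 5.3, 5.1,   # Portland
    3.0, 3.1, 3.2, 3.0    # Raleigh
  ),
  property_crime = c(
    30, 29, 28, 31,       # Austin
    62, 60, 59, 63,       # Baltimore
    33, 32, 31, 34,       # Denver
    38, 37, 36, 40,       # Portland
    21, 20, 19, 23        # Raleigh
  )
)
city_panel
```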
A few things to notice in the raw data. Baltimore spends the most on policing and has the highest crime rate. Raleigh spends relatively little and has the lowest crime rate. A naive cross-sectional comparison would suggest that more police spending is associated with more crime—but this conflates the spending effect with all the other ways these cities differ. Cities with high crime invest more in policing precisely because they have high crime, and those same cities may have persistent characteristics (poverty, density, drug markets) that keep crime elevated regardless of spending levels.
11.6.1 Estimating with feols()
The feols() function from the fixest package uses a slightly different syntax than lm(). Fixed effects go after a vertical bar (|):
```r
fe_reg <- feols(property_crime ~ police_spend | city + year, data = city_panel)
summary(fe_reg)
```
Everything to the left of | is the standard regression formula. Everything to the right lists the fixed effects—here, city and year. Behind the scenes, feols() creates a dummy variable for each unique value of city and year and includes it in the regression. Unlike the dummy variable approach with lm(), feols() does not report the individual fixed effect coefficients in the main output. It assumes you are not interested in the estimated coefficients on the dummy variables themselves, so it reports only the coefficients on variables to the left of | (here, police_spend).
11.6.2 Extracting and Interpreting the Fixed Effects
We can retrieve the estimated fixed effects using the fixef() function:
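The call is one line (this assumes the fe_reg model estimated with feols() above; fixef() is part of the fixest package):

```r
# Retrieve the estimated fixed effects from the model above
library(fixest)

city_year_fe <- fixef(fe_reg)
city_year_fe$city  # one baseline crime level per city
city_year_fe$year  # one common shock per year
```

fixef() returns a list with one named vector per fixed-effect dimension, so the city and year intercepts can be inspected separately.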
City fixed effects. Each value represents that city’s baseline property crime rate, after accounting for police spending and year effects. Baltimore has the highest city fixed effect—it has a high crime rate for reasons unrelated to how much it spends on policing in any given year (persistent poverty, population density, historical disinvestment, etc.). Raleigh has the lowest, reflecting characteristics that keep its crime rate low independent of police budgets.
These are the time-invariant unobservables \(a_i\) from our model. Absorbing them prevents cross-city differences from biasing our estimate of the effect of police spending on crime.
Year fixed effects. Each value captures shocks common to all cities in a given year. If 2020 shows a positive year effect, that would reflect a nationwide increase in property crime that affected all five cities—something we want to account for so it doesn’t get attributed to changes in police spending that happened to occur at the same time.
In practice, we rarely interpret the individual fixed effect estimates. With 5 cities we can read through each one, but a typical application has hundreds or thousands of units, and reporting each one would be unwieldy and beside the point. The fixed effects are there to absorb unobserved heterogeneity so that our estimate of \(\beta_1\) is not biased—they are not themselves the object of interest. This is why feols() omits them from the default output and why regression tables in published papers just note “City FE: Yes” and “Year FE: Yes” rather than reporting every coefficient. The fixef() function is available for diagnostics or the occasional case where the unit-level intercepts matter, but most of the time your attention should be on the treatment variable.
Why is there no intercept? You may have noticed that the feols() output includes no intercept term, and that there is a separate dummy variable (fixed effect) for every city and every year—not \(N - 1\) or \(T - 1\) dummies as you might expect from a standard regression with categorical variables. This is a deliberate modeling choice. In a typical OLS regression with a factor variable, we include an intercept and omit one category as the reference group. The intercept captures the baseline level for the omitted category, and each dummy coefficient is interpreted relative to that baseline.
feols() takes a different approach: it drops the intercept entirely and instead estimates a fixed effect for every group. Each fixed effect is then interpretable as the baseline level for that specific unit or time period, not as a deviation from some arbitrarily chosen reference category. When we looked at the city fixed effects above, Baltimore’s value was Baltimore’s baseline crime rate—not “how much higher Baltimore is than Austin” or whichever city happens to come first alphabetically. This makes the individual fixed effects more intuitive to read, because their interpretation does not depend on which category was omitted.
Mathematically the two approaches are equivalent. Including an intercept with \(N - 1\) dummies spans the same column space as including \(N\) dummies with no intercept; they produce identical fitted values, residuals, and coefficient estimates for the treatment variable. The only difference is how the fixed effects themselves are parameterized. Since we almost never interpret the individual fixed effects anyway, this distinction rarely matters for applied work—but it explains why your output looks the way it does.
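The equivalence is easy to check with lm() on a toy dataset (simulated values, not from the chapter): fit the same regression once with reference coding and once with a full set of group dummies and no intercept, then compare the slope on \(x\).

```r
# Same slope under both parameterizations of the group dummies
set.seed(2)
f <- factor(rep(1:4, each = 10))          # 4 groups
x <- rnorm(40)
y <- 2 * x + as.numeric(f) + rnorm(40)

b_ref  <- coef(lm(y ~ x + f))["x"]        # intercept + 3 dummies (reference coding)
b_full <- coef(lm(y ~ x + f - 1))["x"]    # no intercept, 4 dummies

all.equal(unname(b_ref), unname(b_full))  # identical slopes
```

Only the group intercepts are parameterized differently; the fitted values, residuals, and treatment coefficient are the same either way.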
11.7 What Fixed Effects Control For (and What They Don’t)
Fixed effects are useful, but they are not a cure-all. It is worth being precise about what they do and do not control for.
11.7.1 What Unit Fixed Effects Control For
Unit fixed effects (\(a_i\)) absorb all characteristics of unit \(i\) that are constant over time:
Geography and location
Historical legacy and founding conditions
Stable institutional features
Baseline demographics (to the extent they don’t change much)
Measurement differences (e.g., how crime is recorded)
“Culture” and other hard-to-measure factors
11.7.2 What Time Fixed Effects Control For
Time fixed effects (\(\tau_t\)) absorb all factors that affect all units equally in a given period:
Macroeconomic conditions (recessions, booms)
National policy changes
Seasonal patterns
Technological changes affecting everyone
11.7.3 What Fixed Effects Do NOT Control For
Fixed effects cannot eliminate bias from factors that vary both across units and over time in ways correlated with the treatment:
Time-varying confounders specific to certain units
Differential trends (some units changing faster than others)
Reverse causality (if \(y\) affects \(x\) within a period)
Warning: Fixed Effects ≠ No OVB
Fixed effects dramatically reduce omitted variable bias by controlling for time-invariant confounders. But they don’t eliminate all possible bias. Unit-specific, time-varying confounders can still cause problems. Always think carefully about what might be changing differently across your units over time.
For example, in our crime analysis, fixed effects control for each city’s baseline characteristics. But if some cities increased their police budgets while others cut them, and these budget changes are correlated with unemployment changes, we would still have omitted variable bias. We could address this by adding police spending as an explicit control variable.
11.8 Adding Control Variables
We can include additional time-varying control variables in fixed effects models:
\[
y_{it} = \beta_1 x_{it} + \beta_2 z_{it} + a_i + \tau_t + \mu_{it}
\]

where \(z_{it}\) is a control variable that changes over time within units.
This addresses unit-specific, time-varying confounders—the one source of bias that fixed effects alone cannot eliminate. We can only control for variables we observe and measure, but adding them narrows the remaining scope for omitted variable bias.
Returning to our city crime panel, our earlier model estimated the effect of police spending on property crime with city and year fixed effects. But local economic conditions are a plausible time-varying confounder: when a city’s economy weakens, property crime may rise and the city may cut police budgets due to falling tax revenue. If we don’t account for this, the estimated effect of police spending could be biased.
We can add the unemployment rate as a control variable. In feols(), time-varying controls go on the left side of the |, just like in a standard regression formula:
```r
# Add unemployment as a time-varying control
city_panel <- city_panel |>
  mutate(
    unem_rate = c(
      3.1, 2.8, 2.9, 5.0,  # Austin
      5.8, 5.5, 5.3, 7.6,  # Baltimore
      3.0, 2.7, 2.8, 5.4,  # Denver
      3.8, 3.6, 3.9, 6.2,  # Portland
      3.4, 3.1, 3.0, 5.1   # Raleigh
    )
  )

# Without the control
fe_no_control <- feols(property_crime ~ police_spend | city + year,
                       data = city_panel)

# With the control
fe_with_control <- feols(property_crime ~ police_spend + unem_rate | city + year,
                         data = city_panel)

modelsummary(
  list(
    "No Control" = fe_no_control,
    "With Unemployment" = fe_with_control
  ),
  gof_map = c("adj.r.squared"),
  add_rows = tribble(
    ~term, ~`No Control`, ~`With Unemployment`,
    "City FE", "Yes", "Yes",
    "Year FE", "Yes", "Yes"
  )
)
```
|              | No Control | With Unemployment |
|--------------|------------|-------------------|
| police_spend | -5.644     | -5.395            |
|              | (0.555)    | (0.624)           |
| unem_rate    |            | 0.644             |
|              |            | (0.713)           |
| R2 Adj.      | 0.997      | 0.997             |
| City FE      | Yes        | Yes               |
| Year FE      | Yes        | Yes               |
Adding a control is no different from a standard regression—just include it on the left side of |. The fixed effects on the right side still absorb time-invariant city characteristics and common year shocks. The unemployment rate, because it varies across cities and over time, cannot be absorbed by fixed effects and must be included explicitly.
One restriction worth noting: we cannot include time-invariant variables (like a city’s region or founding year) as controls in a fixed effects model. Those are perfectly collinear with the city fixed effects—the fixed effects already capture them. Only variables that change within a unit over time belong on the left side of |.
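You can see this collinearity directly by adding a hypothetical time-invariant column to the city panel used in this section (region below is invented for illustration); fixest detects that it never varies within a city and removes it with a collinearity note:

```r
library(fixest)

# 'region' is a hypothetical, time-invariant attribute of each city
city_panel$region <- ifelse(
  city_panel$city %in% c("Austin", "Baltimore", "Raleigh"), "South", "West"
)

# feols() drops region because it is collinear with the city fixed effects
fe_reg2 <- feols(property_crime ~ police_spend + region | city + year,
                 data = city_panel)
coef(fe_reg2)  # only police_spend survives
```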
11.9 Practical Considerations
11.9.1 When to Use Fixed Effects
Fixed effects are most valuable when:
You have genuine panel data (same units observed multiple times)
There is meaningful variation in \(x\) within units over time
Time-invariant confounders are a major concern
The treatment effect is expected to be relatively immediate
11.9.2 When Fixed Effects May Not Work Well
Fixed effects may be less useful when:
Little within-unit variation exists (everyone’s \(x\) is stable over time)
The treatment effect takes a long time to materialize
Time-varying confounders are the main concern
You have very few time periods (less precise estimates)
11.9.3 Standard Errors
With panel data, observations within the same unit are often correlated over time. Standard OLS standard errors assume independence and can be too small. In practice, researchers typically use clustered standard errors at the unit level to account for this correlation. We’ll discuss this more in later chapters.
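With feols(), clustering is a one-argument change (shown here on the city panel from earlier). In fact, when fixed effects are present, fixest clusters on the first fixed effect by default, so the explicit argument mainly documents intent:

```r
library(fixest)

# Cluster standard errors at the city level
fe_clustered <- feols(property_crime ~ police_spend | city + year,
                      data = city_panel, cluster = ~city)
summary(fe_clustered)
```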
11.10 Summary
Panel data—observing the same units over time—offers a powerful tool for causal inference. By tracking how outcomes change within units when circumstances change, we can control for all time-invariant characteristics of each unit, whether observed or not.
The fixed effects estimator implements this idea. It can be computed via the within transformation (demeaning the data) or equivalently by including dummy variables for each unit. The coefficient of interest is identified purely from within-unit variation over time.
Fixed effects control for:
All time-invariant unit characteristics (observed and unobserved)
All time-specific shocks affecting all units equally (when time fixed effects are included)
Fixed effects do NOT control for:
Unit-specific, time-varying confounders
Differential trends across units
Reverse causality
Panel methods are not a complete solution to omitted variable bias, but they represent a major improvement over cross-sectional analysis when time-invariant confounders are a primary concern.
11.11 Check Your Understanding
For each question below, select the best answer from the dropdown menu.
Panel data’s defining feature is that it follows the same units (people, firms, cities, etc.) across multiple time periods. This allows us to observe how each unit changes over time, rather than just comparing different units at one point in time.
The term \(a_i\) is the unit fixed effect. It captures everything about unit \(i\) that is constant over time—observed characteristics like location, but also unobserved factors like institutional history, culture, or baseline conditions. This is what makes fixed effects so powerful for addressing omitted variable bias.
The within transformation subtracts each unit’s time-average from its observations. Since \(a_i\) doesn’t vary over time, its average equals itself: \(\bar{a}_i = a_i\). So when we compute \(a_i - \bar{a}_i = a_i - a_i = 0\), the fixed effect is eliminated.
The cross-sectional estimate was biased because unobserved city characteristics (wealth, institutions, etc.) were correlated with both unemployment and crime. Cities with “good” characteristics had both low unemployment AND low crime, creating a spurious negative correlation. Fixed effects removed these time-invariant confounders, revealing the true positive effect of unemployment on crime.
Unit fixed effects only control for characteristics that are constant over time. A city’s police budget changes from year to year, so it’s a time-varying factor that fixed effects won’t automatically control for. We would need to include it as an explicit control variable.
While mathematically equivalent for estimating \(\beta_1\), the dummy variable approach is more practical. It’s straightforward to add additional control variables, it works naturally with standard regression software, and we can examine the estimated fixed effects if they’re of substantive interest. The within transformation requires manual computation of demeaned variables.
Wooldridge, Jeffrey M. 2019. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning.