**Implementing Fixed Effects Estimation**

Our purpose in writing this paper
was to examine the econometrics underlying the ad hoc estimation methods commonly
used to account for unobserved heterogeneity in the finance literature.
Our investigation found that these methods should not be used because they
typically provide inconsistent estimates. For example, *Adj*Y
estimation transforms only the dependent variable and does not remove
problematic correlations from the independent variables. Fixed effects
(FE) estimation, on the other hand, is consistent and should be used in place
of these other estimators. But it is not always obvious how to implement fixed
effects.

This website provides examples and corresponding code to illustrate how to implement fixed effects in these cases. We also provide suggestions on how to overcome computational hurdles that arise when estimating models with multiple high-dimensional fixed effects. The code we provide is for Stata and SAS. If you want to suggest ways to handle these issues in other languages, we are happy to post links.

If you use this information or code, please cite Gormley and
Matsa (forthcoming *RFS*). Our
paper, which provides deeper analysis of these ideas, is available here. Lecture slides used by Gormley to teach these
methods to PhD students are available here.

**Examples of how to run a
FE estimation in place of AdjY or AvgE**

A FE estimator correctly transforms both the dependent and
independent variables and should be used in place of *Adj*Y and *Avg*E
estimators. Commands for implementing the FE estimator in Stata are in
bold and the variable names, which the user must specify, are in italics.
Here are two examples: (1) industry-adjusting and (2)
characteristically-adjusting stock returns.
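Before turning to the examples, the difference between the two transformations can be illustrated with a small simulation. The sketch below is not taken from our paper; the data-generating process, variable names, and sample sizes are all assumptions chosen for illustration. The true coefficient on x is 1, and x is correlated with an unobserved industry effect. Because the *Adj*Y regression demeans only y, it omits the industry mean of x; under the variances assumed here its probability limit is roughly 0.5, while the FE estimate should be close to 1.

```stata
* Simulated data: 50 hypothetical industries with 100 observations each
clear
set seed 2024
set obs 50
gen industry = _n
gen industry_effect = rnormal()
expand 100

* x is correlated with the unobserved industry effect; true coefficient is 1
gen x = industry_effect + rnormal()
gen y = x + industry_effect + rnormal()

* AdjY: demean only the dependent variable, then run OLS
bysort industry: egen y_industry_mean = mean(y)
gen y_adj = y - y_industry_mean
regress y_adj x
scalar b_adjy = _b[x]

* FE: the within transformation is applied to both y and x
areg y x, absorb(industry)
scalar b_fe = _b[x]

display "AdjY estimate: " b_adjy
display "FE estimate:   " b_fe
```

Changing the variance of industry_effect in the sketch changes how far the *Adj*Y estimate falls from the true value, whereas the FE estimate is unaffected.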

__Example #1 –
Industry-adjusting__

Industry-adjusting, an *Adj*Y estimator, can take many
forms. A common form is to demean the dependent variable with respect to
the industry mean (or median) before estimating the model with OLS. However,
this estimate is inconsistent whenever there are within-industry correlations among
independent variables. Instead, a researcher should estimate a model with
industry FE. Any of the following four sets of estimation commands can be
used:

__Stata__

**reg** *dependent_variable independent_variables* i.*industry*;

**areg** *dependent_variable independent_variables*, a(*industry*);

**xtset** *industry*;
**xtreg** *dependent_variable independent_variables*, fe;

__SAS__

**proc sort** data=*dataset*;
**by** *industry*;

**proc glm** data=*dataset*;
**absorb** *industry*;
**model** *dependent_variable* = *independent_variables* / solution;
**run;**

**Note #1:** Unless you are interested in the individual
group means, AREG, XTREG, or PROC GLM are typically preferable, because of
shorter computation times.

**Note #2:** While these various methods yield identical
coefficients, the standard errors may differ when Stata’s
cluster option is used. When clustering, AREG reports
cluster-robust standard errors that reduce the degrees of freedom by the number
of fixed effects swept away in the within-group transformation; XTREG reports
smaller cluster-robust standard errors because it does not make such an
adjustment. XTREG’s approach of not
adjusting the degrees of freedom is appropriate when the fixed effects swept
away by the within-group transformation are nested within clusters (meaning all
the observations for any given group are in the same cluster), as is commonly
the case (e.g., firm fixed effects are nested within firm, industry, or state
clusters). See Wooldridge (2010, Chapter 20).
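Whether the fixed-effect groups are nested within clusters can be checked directly in the data. The following is a minimal sketch on a toy dataset; the variable names firm and state and the values entered are hypothetical. Firm 2 appears in two states, so it is flagged as not nested.

```stata
* Toy data: hypothetical firm and state identifiers (illustrative only)
clear
input firm state
1 1
1 1
2 1
2 2
3 2
3 2
end

* Sort by state within firm; a firm is nested in a single cluster
* if its first and last state values coincide
bysort firm (state): gen byte nested = (state[1] == state[_N])

* Count observations that belong to firms spanning more than one cluster
quietly count if !nested
display "Observations in non-nested firms: " r(N)
```

A count of zero would indicate that every firm lies within a single state cluster, which is the case in which XTREG's unadjusted degrees of freedom are appropriate.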

XTREG-clustered standard errors can be recovered from AREG as follows:

1. Run the AREG command *without* clustering

2. Then, construct two variables using the following code:

**gen** df_areg = e(N) - e(rank) - e(df_a);

**gen** df_xtreg = e(N) - e(rank);

3. Run the AREG command again *with* clustering

4. Multiply the reported cluster-robust standard errors by sqrt(df_areg / df_xtreg)
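The four steps can be sketched end to end as follows. This is only an illustration on simulated data: the panel dimensions, the variable names (firm, x, y), and the data-generating process are assumptions, and the sketch stores the degrees of freedom in scalars rather than variables, which is a small departure from step 2 that avoids creating a constant column in the dataset.

```stata
* Simulated firm panel: 200 hypothetical firms, 10 observations each
clear
set seed 12345
set obs 200
gen firm = _n
gen firm_effect = rnormal()
expand 10
gen x = firm_effect + rnormal()
gen y = 1 + 2*x + firm_effect + rnormal()

* Step 1: AREG without clustering
areg y x, absorb(firm)

* Step 2: store both degrees-of-freedom counts
scalar df_areg  = e(N) - e(rank) - e(df_a)
scalar df_xtreg = e(N) - e(rank)

* Step 3: AREG again, now clustering by firm
areg y x, absorb(firm) vce(cluster firm)

* Step 4: rescale the reported cluster-robust standard error
scalar se_areg     = _se[x]
scalar se_adjusted = se_areg * sqrt(df_areg / df_xtreg)

display "AREG clustered SE:         " se_areg
display "Rescaled (XTREG-style) SE: " se_adjusted
```

Because df_areg is smaller than df_xtreg, the rescaled standard error is always smaller than the one AREG reports.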

If the desired industry-adjusting is on a yearly basis, then
instead of using the mean or median of observations in the same industry-year
to adjust the dependent variable, estimate a model with *industry×year*
fixed effects:

__Stata__

**reg** *dependent_variable independent_variables* i.*industry*#i.*year*;

**egen** *industry_year* = group(*industry year*);
**areg** *dependent_variable independent_variables*, a(*industry_year*);

**egen** *industry_year* = group(*industry year*);
**xtset** *industry_year*;
**xtreg** *dependent_variable independent_variables*, fe;

__SAS__

**proc sort** data=*dataset*;
**by** *industry year*;

**proc glm** data=*dataset*;
**absorb** *industry year*;
**model** *dependent_variable* = *independent_variables* / solution;
**run;**

If you are interested in combining industry-year FE with another fixed effect, like firm FE, then absorb the fixed effect of highest dimension and control for the other(s) using indicator variables:

__Stata__

**areg** *dependent_variable independent_variables* i.*industry*#i.*year*, a(*firm*);

**xtset** *firm*;
**xtreg** *dependent_variable independent_variables* i.*industry*#i.*year*, fe;

__SAS__

**proc sort** data=*dataset*;
**by** *firm*;

**proc glm** data=*dataset*;
**absorb** *firm*;
**class** *industry year*;
**model** *dependent_variable* = *independent_variables* *industry*\**year* / solution;
**run;**

**Note:** The above specification may be computationally
difficult to estimate if the number of industry-year indicator variables is
large. To resolve this, please see the discussion below about Stata
programs that can be used to estimate models with multiple high-dimensional FE.

__Example #2 –
Characteristically-adjusted stock returns__

Although there are many ways to construct characteristically-adjusted stock returns, the basic idea is the same. Before analyzing stock returns, you first construct a set of benchmark portfolios based on various firm characteristics, and then “characteristically-adjust” the individual stock returns by subtracting the equal- or value-weighted average return of their corresponding benchmark portfolio for each period. For example, construct 25 size and value portfolios each period by first dividing stocks into quintiles based on their size and then further subdividing them into quintiles based on their market-to-book ratios. A firm’s size and market-to-book ratio in a given period then determine which benchmark portfolio is used to adjust the firm’s stock return in that period. After constructing these “characteristically-adjusted” returns, you then further sort the stocks based on an independent variable of interest to determine whether stock returns vary across this independent variable. Such analyses typically sort stocks into quintiles based on the independent variable and then compare returns across the top and bottom quintiles.

This method is equivalent to *Adj*Y in that it only
transforms the dependent variable (stock returns), and it doesn’t account for
correlations of the independent variable within groups (i.e.,
portfolios). To avoid potential biases that might occur because of such
correlations, one should instead estimate a model with fixed effects for each
of the portfolio-periods and indicators for each quintile of the independent
variable, excluding an indicator for the bottom quintile. The resulting estimates
indicate how the average stock return across each quintile differs from the
average stock return for the bottom quintile:

__Stata__

**reg** *stock_return ind_var_quintile2 ind_var_quintile3 ind_var_quintile4 ind_var_quintile5* i.*benchmark_portfolio*#i.*period*;

**egen** *portfolio_period* = group(*benchmark_portfolio period*);
**areg** *stock_return ind_var_quintile2 ind_var_quintile3 ind_var_quintile4 ind_var_quintile5*, a(*portfolio_period*);

**egen** *portfolio_period* = group(*benchmark_portfolio period*);
**xtset** *portfolio_period*;
**xtreg** *stock_return ind_var_quintile2 ind_var_quintile3 ind_var_quintile4 ind_var_quintile5*, fe;

__SAS__

**proc sort** data=*dataset*;
**by** *benchmark_portfolio period*;

**proc glm** data=*dataset*;
**absorb** *benchmark_portfolio period*;
**model** *stock_return* = *ind_var_quintile2 ind_var_quintile3 ind_var_quintile4 ind_var_quintile5*;
**run;**
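The commands above take the benchmark portfolios and quintile indicators as given. One way they might be constructed is sketched below on simulated data. Everything here is an assumption made for illustration: the variable names size, market_to_book, and ind_var, the six periods, and the simulated values. The sequential sort (size first, then market-to-book within each size quintile) follows the description of the 25 portfolios given earlier.

```stata
* Simulated cross-sections: 500 hypothetical firms over 6 periods
clear
set seed 4321
set obs 500
gen firm = _n
expand 6
bysort firm: gen period = _n
gen size = exp(rnormal())
gen market_to_book = exp(rnormal())
gen ind_var = rnormal()
gen stock_return = 0.01*ind_var + rnormal()

* Period-by-period quintile sorts
gen size_q = .
gen mtb_q = .
gen ind_var_q = .
forvalues t = 1/6 {
    * Size quintiles within the period
    xtile tmp = size if period == `t', nq(5)
    quietly replace size_q = tmp if period == `t'
    drop tmp
    * Market-to-book quintiles within each size quintile
    forvalues s = 1/5 {
        xtile tmp = market_to_book if period == `t' & size_q == `s', nq(5)
        quietly replace mtb_q = tmp if period == `t' & size_q == `s'
        drop tmp
    }
    * Quintiles of the independent variable of interest
    xtile tmp = ind_var if period == `t', nq(5)
    quietly replace ind_var_q = tmp if period == `t'
    drop tmp
}

* 25 benchmark portfolios, then one group per portfolio-period
egen benchmark_portfolio = group(size_q mtb_q)
egen portfolio_period = group(benchmark_portfolio period)

* Indicators for quintiles 2-5 (the bottom quintile is the omitted category)
forvalues q = 2/5 {
    gen ind_var_quintile`q' = (ind_var_q == `q')
}

* Portfolio-period fixed effects in place of adjusting returns
areg stock_return ind_var_quintile*, absorb(portfolio_period)
```

In real data, observations with missing characteristics would need to be handled before generating the indicators, since the simple comparison used here would code them as zero.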

**Stata programs that can be
used to estimate models with multiple high-dimensional FE**

Estimating fixed effects models with multiple sources of unobserved heterogeneity can be computationally difficult when there are a large number of FE that need to be estimated. As discussed in our paper, only one FE can typically be removed by transforming the data. The other fixed effects need to be estimated directly, which can cause computational problems. For example, to estimate a regression on Compustat data spanning 1970-2008 with both firm and 4-digit SIC industry-year fixed effects, Stata’s XTREG command requires nearly 40 gigabytes of RAM.
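Although a single within-group transformation removes only one set of fixed effects, applying the transformation alternately over two groupings and repeating until the variables stop changing removes both, without creating any indicator variables. The sketch below illustrates this idea (often called alternating projections) on simulated data and compares it with directly estimating the industry-year indicators. It is a hedged illustration only: the data, names, and the fixed count of 100 iterations are assumptions, and it is not necessarily the algorithm used by any of the user-written programs discussed on this page.

```stata
* Simulated unbalanced panel: 100 hypothetical firms in 10 industries, up to 8 years
clear
set seed 777
set obs 100
gen firm = _n
gen industry = ceil(firm / 10)
gen firm_effect = rnormal()
expand 8
bysort firm: gen year = _n
drop if runiform() < 0.2
egen industry_year = group(industry year)

* One random shock per industry-year, copied to every observation in the group
bysort industry_year: gen iy_effect = rnormal() if _n == 1
by industry_year: replace iy_effect = iy_effect[1]

* True coefficient on x is 2; x is correlated with both unobserved effects
gen double x = firm_effect + iy_effect + rnormal()
gen double y = 2*x + firm_effect + iy_effect + rnormal()

* Alternately demean by firm and by industry-year
gen double y_t = y
gen double x_t = x
forvalues iter = 1/100 {
    foreach v in y_t x_t {
        bysort firm: egen double m = mean(`v')
        quietly replace `v' = `v' - m
        drop m
        bysort industry_year: egen double m = mean(`v')
        quietly replace `v' = `v' - m
        drop m
    }
}
regress y_t x_t, noconstant
scalar b_iter = _b[x_t]

* Benchmark: absorb firm and estimate the industry-year indicators directly
areg y x i.industry_year, absorb(firm)
scalar b_lsdv = _b[x]

display "Alternating demeaning: " b_iter
display "AREG with indicators:  " b_lsdv
```

The standard errors reported by the final regression on the demeaned variables do not account for the degrees of freedom used by the fixed effects, so only the coefficient should be compared.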

__User-written commands
in Stata__

As noted in our paper, there are memory-saving and iteration
techniques that can be used to avoid these limitations. As of the writing
of our paper, there were two user-written Stata programs one could use to do
this: FELSDVREG and REG2HDFE. Both programs are capable of handling two
high-dimensional FE and are available from the Statistical Software Components
(SSC) archive. To download either program, simply type the following
command once in Stata (replacing *program_name*
with FELSDVREG or REG2HDFE):

**ssc install** *program_name*

This command will load everything associated with the program, including the help files.

Both commands can be used to estimate models with two high-dimensional fixed effects. For example, if one wanted to estimate a model with firm and industry-year fixed effects (as in example #1 above), the commands could be used as follows:

**egen** *industry_year* = group(*industry year*);
**felsdvreg** *dependent_variable independent_variables*, ivar(*firm*) jvar(*industry_year*) xb(xb) peff(peff) feff(feff) res(res) mover(mover) mnum(mnum) pobs(pobs) group(group);

**egen** *industry_year* = group(*industry year*);
**reg2hdfe** *dependent_variable independent_variables*, id1(*firm*) id2(*industry_year*);

Refer to the help files for more details on how to use these commands. Please address any questions you might have about these programs directly to their respective authors. Our personal experience is that REG2HDFE often executes much more quickly than FELSDVREG, but run time will depend on the specific application and data structure.

**Note:** These programs report
cluster-robust standard errors that reduce the degrees of freedom by the number of fixed
effects swept away in the within-group transformation. This is the same
adjustment applied by the AREG command. To recover the cluster-robust
standard errors one would get using the XTREG command, which does not reduce
the degrees of freedom by the number of fixed effects swept away in the
within-group transformation, you can apply the following adjustments:

· For FELSDVREG, use the noadji option built into the command

· For REG2HDFE, multiply the reported standard errors by sqrt([e(N) - e(df_r)] / [e(N) - (e(df_r) - (*G*₁ - 1))]), where *G*₁ is the number of distinct values of the id1 variable (*firm* in the code that follows):

**egen** *industry_year* = group(*industry year*);

**reg2hdfe** *dependent_variable ind_variable1 ind_variable2*, id1(*firm*) id2(*industry_year*) cluster(*firm*);

**matrix** varTemp = e(V);

**qui distinct** *firm* if *ind_variable1* != . & *ind_variable2* != . & *industry_year* != .;

**disp** "SE ind_variable1: " sqrt(varTemp[1,1]) * sqrt((e(N)-e(df_r))/(e(N)-(e(df_r)-(r(ndistinct)-1))));

**disp** "SE ind_variable2: " sqrt(varTemp[2,2]) * sqrt((e(N)-e(df_r))/(e(N)-(e(df_r)-(r(ndistinct)-1))));

As discussed above in the context of AREG vs. XTREG, this adjustment is appropriate only when the panel variable is nested within clusters. If you are ever unsure which standard errors are correct in a particular application, reporting the higher standard error is prudent.

As of the writing of our paper, we are not aware of any
user-written commands that estimate models with three or more high-dimensional
fixed effects. Such a command is necessary, for example, if you want to
estimate a model with firm, state-year, and industry-year fixed effects.
However, Paulo Guimaraes of the University of South Carolina (the author of
REG2HDFE) is currently working on a program, REGHDFE, which will do this.
Researchers interested in using this program should contact him at guimaraes@moore.sc.edu. An
earlier version of the program, called REG3HDFE, can also be obtained from the *American
Economic Journal: Macroeconomics* website
(click “Download Data Set”).

__User-written package for R__

Simen Gaure of the University of Oslo wrote an R-package, called LFE, that can handle multiple fixed effects. The method is described here. Questions can be directed to him at simen.gaure@frisch.uio.no.

If you find errors or have corrections, please e-mail us.

Todd A. Gormley and David A. Matsa