*This stata file outlines how to use the shale well data posted on my website with summary of deposit data
*the merged shale well and deposit data is loaded in the shale_dep.dta file
*the unit of observation in the data set is at the county-year
*c_wells = the cumulative number of fracking wells drilled through June 30 of a given year (June 30 is when the summary of deposit data is reported
*wells = the number of fracking wells drilled in a given year, from July 1 in one year to June 30 in the next
*specifically, the observations tied to 2005 are wells drilled between July 1, 2004 to June 30, 2005.
clear all
set more off
//Set data path here
use "C:\Users\gilje\Dropbox (Personal)\Data for Posting\shale_dep.dta"
//create county ids and state-year ids
egen stcntyid=group(stcntybr_txt)
egen state_yr=group(state year)
xtset stcntyid year
gen ln_dep = ln(sumdep + 1)
gen ln_c_wells = ln(c_wells + 1)
gen dep_growth_w = sumdep/L.sumdep
*Gilje 2019 identifies booms as county-years with greater than 17 producing wells
*in this sample 96.4% of wells are drilled in "boom" years.
gen boom = 0
replace boom = 1 if c_wells > 17 & c_wells ~= .
*Gilje 2019 - Table 4, specification (2) coefficient of 0.026 for log wells
*this data is the whole US though, so there are slight differences
reghdfe ln_dep ln_c_wells if year <= 2009, absorb(stcntyid year) cluster(stcntyid)
*Gilje 2019 - Table 4, specification (1) coefficient of 0.093 for boom dummy
*this data is the whole US though, so there are slight differences
reghdfe ln_dep boom if year <= 2009, absorb(stcntyid year) cluster(stcntyid)
*check for differences post-2009
reghdfe ln_dep ln_c_wells if year > 2009 & year ~= ., absorb(stcntyid year) cluster(stcntyid)
reghdfe ln_dep boom if year > 2009 & year ~= ., absorb(stcntyid year) cluster(stcntyid)
*report effect over the whole sample
reghdfe ln_dep ln_c_wells, absorb(stcntyid year) cluster(stcntyid)
reghdfe ln_dep boom, absorb(stcntyid year) cluster(stcntyid)
*Economically, booms effect both the level and growth of deposits over time
*with each well there is a one-time increase from lease bonus payments and initial royalty checks
*and then there are increased deposits over time from royalty checks in future years
*see Gilje (2019) for a detailed description on this
*In general the cumulative number of wells should relate to the level of deposits, while the number of new wells
*in a given year will relate to the growth in deposits. The "Boom" variable could relate to either the level or
*the growth rate. Whether looking at boom, number of wells, log wells, etc. these are all representations/proxies
*for the cash flows that local mineral owners are receiving from oil and gas development, as well as individuals
*that live locally and experience higher wages.
*In Gilje, Loutskina, Strahan (JF 2016) the authors use the Boom variable to proxy for growth in deposits
reghdfe dep_growth_w boom if year <= 2009, absorb(stcntyid year) cluster(stcntyid)
*level regressions full sample
reghdfe ln_dep ln_c_wells, absorb(stcntyid year) cluster(stcntyid)
reghdfe ln_dep boom, absorb(stcntyid year) cluster(stcntyid)
*growth regressions full sample
reghdfe dep_growth_w wells, absorb(stcntyid year) cluster(stcntyid)
reghdfe dep_growth_w boom, absorb(stcntyid year) cluster(stcntyid)
*tests to look at pre-trends
*create dummy variable for one year in advance of the first well being drilled
quietly: gen placebo = 0
quietly: replace placebo = 1 if F.c_wells > 0 & F.c_wells ~= . & c_wells == 0
*create dummy variable for 2 to 5 years in advance of the first well being drilled
quietly: gen placebo1 = 0
quietly: replace placebo1 = 1 if F.placebo == 1
quietly: gen placebo2 = 0
quietly: replace placebo2 = 1 if F.placebo1 == 1
quietly: gen placebo3 = 0
quietly: replace placebo3 = 1 if F.placebo2 == 1
quietly: gen placebo4 = 0
quietly: replace placebo4 = 1 if F.placebo3 == 1
*create dummy variable for one year in advance of the boom (cumulative wells greater than 17)
quietly: gen placebob = 0
quietly: replace placebob = 1 if F.boom > 0 & F.boom ~= . & boom == 0
*create dummy variable for two years in advance of the boom (cumulative wells greater than 17)
quietly: gen placebo1b = 0
quietly: replace placebo1b = 1 if F.placebob == 1
quietly: gen placebo2b = 0
quietly: replace placebo2b = 1 if F.placebo1b == 1
quietly: gen placebo3b = 0
quietly: replace placebo3b = 1 if F.placebo2b == 1
quietly: gen placebo4b = 0
quietly: replace placebo4b = 1 if F.placebo3b == 1
*normalize all coefficients by mean and standard deviation so that the interpretation
*of the coefficient is a one standard deviation increase relative to the mean
*Note: this does not affect the statistical significance, only the interpretation
quietly: summarize placebo
quietly: gen placebo_n = (placebo - r(mean))/r(sd)
quietly: summarize placebo1
quietly: gen placebo1_n = (placebo1 - r(mean))/r(sd)
quietly: summarize placebo2
quietly: gen placebo2_n = (placebo2 - r(mean))/r(sd)
quietly: summarize placebo3
quietly: gen placebo3_n = (placebo3 - r(mean))/r(sd)
quietly: summarize placebo4
quietly: gen placebo4_n = (placebo4 - r(mean))/r(sd)
quietly: summarize placebob
quietly: gen placebob_n = (placebob - r(mean))/r(sd)
quietly: summarize placebo1b
quietly: gen placebo1b_n = (placebo1b - r(mean))/r(sd)
quietly: summarize placebo2b
quietly: gen placebo2b_n = (placebo2b - r(mean))/r(sd)
quietly: summarize placebo3b
quietly: gen placebo3b_n = (placebo3b - r(mean))/r(sd)
quietly: summarize placebo4b
quietly: gen placebo4b_n = (placebo4b - r(mean))/r(sd)
quietly: summarize ln_c_wells
quietly: gen ln_c_wells_n = (ln_c_wells - r(mean))/r(sd)
quietly: summarize boom
quietly: gen boom_n = (boom - r(mean))/r(sd)
quietly: summarize wells
quietly: gen wells_n = (wells - r(mean))/r(sd)
*level regressions pre 2009 as in Gilje 2019
reghdfe ln_dep placebo_n placebo1_n ln_c_wells_n if year <= 2009, absorb(stcntyid year) cluster(stcntyid)
reghdfe ln_dep placebob_n placebo1b_n boom_n if year <= 2009, absorb(stcntyid year) cluster(stcntyid)
*growth regressions pre 2009 as in Gilje, Loutskina, Strahan (JF 2016)
reghdfe dep_growth_w placebob_n placebo1b_n boom_n if year <= 2009, absorb(stcntyid year) cluster(stcntyid)
*In later years, in sample after the time periods of the above published papers some pre-trends with deposits emerge depending on the specification
*when looking at deposit levels in boom counties. Ideally, one would like to observe the oil and gas bonus and royalty payments received by
*the mineral owners that live locally. The number of wells drilled in a given year is a rough proxy for this, as particularly later in the sample
*firms tend to do preliminary leasing 3 to 5 years in advance of drilling. Therefore, some pre-trends emerge, depending on the specificaiton. However,
*as can be seen with the normalized variables, the magnitude of these "pre-trends" tends to be significantly lower than once full drilling activity commences
*I point this issue out so that researchers can take appropriate care when using the data from later periods and estimating economic magnitudes.
*As depending on the exact economic question researchers are looking at being aware of this issue could be important.
*level regressions full sample
reghdfe ln_dep placebo_n placebo1_n placebo2_n placebo3_n placebo4_n ln_c_wells_n, absorb(stcntyid year) cluster(stcntyid)
reghdfe ln_dep placebob_n placebo1b_n placebo2b_n placebo3b_n placebo4b_n boom_n, absorb(stcntyid year) cluster(stcntyid)
*growth regressions full sample
reghdfe dep_growth_w placebob_n placebo1b_n placebo2b_n placebo3b_n placebo4b_n boom_n, absorb(stcntyid year) cluster(stcntyid)
reghdfe dep_growth_w placebo_n placebo1_n placebo2_n placebo3_n placebo4_n wells_n, absorb(stcntyid year) cluster(stcntyid)
*It is important to note that several places that experienced "booms" also experienced busts, where drilling activity has ceased
*To see this we can create a bust variable, note, this hasn't been used in research to my knowledge, but I think represents an interesting opportunity
gen bust = 0
*code a county-year as bust if it had a boom and there are not wells being currently drilled
replace bust = 1 if boom == 1 & wells == 0
quietly: summarize bust
quietly: gen bust_n = (bust - r(mean))/r(sd)
*level regressions full sample
reghdfe ln_dep boom bust, absorb(stcntyid year) cluster(stcntyid)
reghdfe ln_dep boom_n bust_n, absorb(stcntyid year) cluster(stcntyid)
*growth regressions full sample
reghdfe dep_growth_w boom bust, absorb(stcntyid year) cluster(stcntyid)
reghdfe dep_growth_w boom_n bust_n, absorb(stcntyid year) cluster(stcntyid)