25  Selection models

25.1 Replication of Jackson et al. (2009)

library(dplyr, warn.conflicts = FALSE)
library(DBI)
pg <- dbConnect(RPostgres::Postgres())
funda <- tbl(pg, sql("SELECT * FROM comp.funda"))
company <- tbl(pg, sql("SELECT * FROM comp.company"))
aco_pnfnda <- tbl(pg, sql("SELECT * FROM comp.aco_pnfnda"))
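# Standard Compustat filters: industrial format, standardized data,
# consolidated statements, domestic population source
funda_mod <-
  funda |>
  filter(indfmt == "INDL", datafmt == "STD",
         consol == "C", popsrc == "D") |>
  # Convert sich to character so it can be coalesced with sic from company
  mutate(sich = as.character(sich))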

sics <- 
  company |>
  select(gvkey, sic)
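pensions <-
  aco_pnfnda |>
  filter(indfmt == "INDL", datafmt == "STD",
         consol == "C", popsrc == "D") |>
  # dben is TRUE if pbpro is positive or pbarr is non-missing;
  # coalesce() maps an indeterminate (NA) result to FALSE
  mutate(dben = coalesce(pbpro > 0 | !is.na(pbarr), FALSE)) |>
  select(gvkey, datadate, dben)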

pensions
# Source:   SQL [?? x 3]
# Database: postgres  [igow@/tmp:5432/igow]
   gvkey  datadate   dben 
   <chr>  <date>     <lgl>
 1 001000 1973-12-31 FALSE
 2 001000 1974-12-31 FALSE
 3 001000 1975-12-31 FALSE
 4 001000 1976-12-31 FALSE
 5 001000 1977-12-31 FALSE
 6 001004 1982-05-31 TRUE 
 7 001004 1983-05-31 TRUE 
 8 001004 1984-05-31 TRUE 
 9 001004 1985-05-31 TRUE 
10 001004 1986-05-31 TRUE 
# ℹ more rows
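
The handling of missing values in the dben indicator can be checked on a small local data frame. The following is a minimal sketch, assuming made-up values for pbpro and pbarr (the data frames toy and toy_flags are hypothetical and are not drawn from Compustat). It shows that a row is flagged when pbpro is positive or pbarr is non-missing, and that a row with both values missing ends up FALSE rather than NA.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy values for illustration only (not Compustat data)
toy <- tibble(
  pbpro = c(10, 0, NA, NA),
  pbarr = c(NA, NA, 7.5, NA)
)

toy_flags <-
  toy |>
  # NA > 0 is NA; NA | TRUE is TRUE; NA | FALSE is NA, which
  # coalesce() then replaces with FALSE
  mutate(dben = coalesce(pbpro > 0 | !is.na(pbarr), FALSE))

toy_flags
```

The third row is the case where the `|` matters: pbpro is missing, but the non-missing pbarr is enough to set dben to TRUE.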
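raw_data <-
  funda_mod |>
  inner_join(sics, by = "gvkey") |>
  # Prefer the historical SIC code (sich); fall back to the header code (sic)
  mutate(sic = coalesce(sich, sic)) |>
  mutate(sic3 = substr(sic, 1, 3)) |>
  # Retain sic3 so that the three-digit industry code computed above is kept
  select(gvkey, datadate, fyear, sic3, xrd, at, ppegt, cogs, np, xad,
         dltt, dpact)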

25.2 Discussion questions

  1. Based on the evidence in Lennox et al. (2012), why do authors use selection models?
  2. How do you interpret the results in Tables 4 and 5 of Lennox et al. (2012)?
  3. What’s going on in Table 6 of Lennox et al. (2012)?
  4. Evaluate the “suggestions” in section VI of Lennox et al. (2012).
  5. Lennox et al. (2012) say:

For example, Bushee et al. (2003) stands out in terms of justifying the variables in the first and second stage models, reporting sensitivity tests to alternative model specifications, and reporting diagnostic tests for multicollinearity. Moreover, Bushee et al. (2003) clearly identify their exclusion restrictions and they state that they do not have good reason to expect that those Z variables would directly affect the dependent variable in the second stage model.

Discuss the details of what Bushee et al. (2003) do on each of the points mentioned by Lennox et al. (2012).

  6. Which table in Bushee et al. (2003) includes the main results of their second-stage model? What do they find there?
  7. How plausible do you find the idea studied in Jackson et al. (2009)? Suppose you could run a field experiment (e.g., you were appointed head of the relevant regulator and given authority to run experiments) to test these ideas. What choices might you make to get the best experiment?
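
When working through the questions on first-stage and second-stage models and exclusion restrictions, it may help to see the mechanics on simulated data. The following is a minimal sketch of a two-step selection correction (a first-stage probit followed by a second-stage OLS regression that includes the inverse Mills ratio). Everything here is an assumption made for illustration: the variable names (x, z, selected, imr), the parameter values, and the data-generating process are hypothetical and are not taken from any of the papers discussed above. The variable z plays the role of an exclusion restriction: it affects selection but, by construction, has no direct effect on the outcome.

```r
set.seed(2024)
n <- 5000

# Simulated covariates; z enters the selection equation only
x <- rnorm(n)
z <- rnorm(n)

# Errors of the selection and outcome equations, with correlation rho
rho <- 0.7
u <- rnorm(n)
e <- rho * u + sqrt(1 - rho^2) * rnorm(n)

df <- data.frame(x = x, z = z)
df$selected <- (0.5 + x + z + u) > 0
# True slope on x is 2; y is observed only for selected observations
df$y <- ifelse(df$selected, 1 + 2 * x + e, NA)

# Naive OLS on the selected sample, ignoring selection
naive <- lm(y ~ x, data = df, subset = selected)

# First stage: probit for selection using x and the excluded variable z
probit <- glm(selected ~ x + z, data = df,
              family = binomial(link = "probit"))

# Inverse Mills ratio evaluated at the fitted probit index
index <- predict(probit, newdata = df, type = "link")
df$imr <- dnorm(index) / pnorm(index)

# Second stage: OLS on the selected sample, adding the inverse Mills ratio
second_stage <- lm(y ~ x + imr, data = df, subset = selected)

coef(naive)
coef(second_stage)
```

In this simulation the naive slope on x should fall below the true value of 2, while the second-stage slope should be close to it. Note that the standard errors reported by lm() for the second stage do not account for the estimated first stage, so this sketch is useful for point estimates only. A natural experiment to try is dropping z from the first stage, in which case identification rests entirely on the nonlinearity of the inverse Mills ratio.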