Once, a colleague told me: "Using a particular dataset is not, by itself, a contribution." I mostly agree. But I also feel that in many papers the main contribution is access to a particular dataset — and it’s often framed that way. 1/7
The existence of great, easily accessible, pre-packaged data shapes how we spend our research time. If you have an amazing, cleaned dataset, you can pour energy into identification, modelling, and robustness. 2/7
If you don't, you pour months or years into scraping, merging, cleaning, and documentation. That work is essential but barely rewarded, and it crowds out time to build more "visible" econometric skills. 3/7
Nov 27, 2025 09:40
This creates a gap: data haves vs. data have-nots. Access to shiny, clean, cool data is often restricted and mediated by networks: location, supervisor, affiliation, language, etc. 4/7
When hiring and tenure committees evaluate us, they think they’re observing "research skill" via publications. But part of what they’re really seeing is "had access to privileged data." 5/7
Meanwhile, our training is skewed. We teach econometrics in excruciating detail over multiple courses. But we rarely teach how to cold-email a ministry, how to negotiate an NDA, or how to scrape when everything else fails. 6/7
For empirical applied micro, shouldn't "data acquisition" be a core methods course — on par with experimental methods in behavioral econ? If getting the data is half the battle, why do we treat it like an afterthought? 7/7