- Our paper on the effect of ChatGPT on activity on @stackoverflow.com.web.brid.gy is out: academic.oup.com/pnasnexus/ar... @maria-drc.bsky.social, Nadzeya Laurentsyeva & I find a 25% decrease in activity on SO within 6 months of #ChatGPT's release vs counterfactuals. Why does it matter? (Nov 15, 2024 09:28)
- SO is an incredibly valuable and unique source of information about programming. It contains millions of questions asked, answered, & edited by people with different perspectives, curated with votes and tags. Its posts are included in datasets used to train LLMs, such as the Pile.
- LLMs are a pretty decent substitute for many questions we might look up or ask on Stack Overflow. But other people don't see the questions you ask ChatGPT, or the answers you get. Only OpenAI can.
- In the paper we also find heterogeneities across programming languages and no change in post quality, and we present supporting evidence from the SO User Survey. We also observe a further decline in posting past the point where our counterfactuals are valid.
- But the most important takeaway is that a major public source of data is rapidly shrinking. Ironically, future AI systems will miss a valuable source of data to learn from. We discuss this and implications for competition in AI and search - have a look!