- We wrote a thing -- showing you don't need LLMs to model language production dynamics like the tendency for speakers to reduce predictable words. All you have to do is better model how speech rate varies depending on where a word falls in the utterance and how long the utterance is (toy sketch 1 below). arxiv.org/abs/2512.23659
- We explored this problem by re-analyzing two recent papers, showing that (1) small-scale language model surprisal outperforms long-range surprisal, and (2) bigram transition probabilities do better than LLMs (toy sketch 2 below). Then we conducted two new studies -- showing the same thing as (2).
- Beyond better capturing utterance structure, one of my favorite contributions of the work is that my students pushed me to expand the analyses to corpora outside Mainstream American English. We analyzed both Buckeye (a classic!) and all of the CORAAL data and show phrase-sized probabilistic reduction in both.
- And on a separate note, I will say I am quite tired of "predictive power" analyses that rely on model comparison but do not report coefficients or effect sizes. These probabilistic effects on language production are TINY. This is probably why there's so much variability in whether people find them at all (toy sketch 3 below shows what reporting both looks like).
- And since n-grams are all you need, this provides some nice converging evidence that speakers probably use very local representations to pace their utterances, say by retrieving phrases. We would love comments!
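Toy sketch 1 (a minimal illustration, not the paper's actual model): predicting word duration from surprisal while controlling for where the word falls in the utterance and how long the utterance is, as a mixed-effects regression. The data file, the column names (log_dur, surprisal, position, utt_len, speaker), and the use of statsmodels are all assumptions here.

```python
# Minimal sketch, not the paper's actual model: predict log word duration
# from surprisal while controlling for where the word falls in the utterance
# and how long that utterance is. File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("word_durations.csv")  # one row per word token (hypothetical)

# Hypothetical columns:
#   log_dur   - log word duration in seconds
#   surprisal - -log P(word | context), from whatever model you're testing
#   position  - ordinal position of the word within its utterance
#   utt_len   - number of words in the utterance
#   speaker   - speaker ID, used as the random-intercept grouping factor
model = smf.mixedlm(
    "log_dur ~ surprisal + position + utt_len + position:utt_len",
    data=df,
    groups=df["speaker"],
)
fit = model.fit()
print(fit.summary())  # coefficients and standard errors, not just model fit
```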
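Toy sketch 2: what the bigram baseline amounts to, conceptually. Not the paper's exact estimator; the tokenization, the utterance-boundary token, and the add-one smoothing are assumptions.

```python
# A sketch of bigram surprisal: -log2 P(w_i | w_{i-1}) estimated from corpus
# counts with add-one smoothing. Tokenization and smoothing are assumptions.
import math
from collections import Counter

def train_bigram(corpus):
    """corpus: a list of utterances, each a list of word strings."""
    unigrams, bigrams = Counter(), Counter()
    for utt in corpus:
        toks = ["<s>"] + utt          # utterance-initial context marker
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def bigram_surprisal(prev, word, unigrams, bigrams):
    vocab_size = len(unigrams)
    # add-one smoothing so unseen transitions get finite surprisal
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(p)

corpus = [["the", "dog", "barked"], ["the", "cat", "slept"]]
uni, bi = train_bigram(corpus)
print(bigram_surprisal("the", "dog", uni, bi))  # lower = more predictable word
```

Nothing in this predictor looks at more than one word of context, which is the whole point of the comparison.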
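Toy sketch 3: reporting the coefficient and an effect size alongside the usual model comparison. Hypothetical data and columns (same as sketch 1), not anyone's actual analysis.

```python
# Hypothetical illustration of the reporting point: a likelihood-ratio model
# comparison says *whether* surprisal helps; the coefficient says *how much*.
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

df = pd.read_csv("word_durations.csv")  # same hypothetical columns as sketch 1

base = smf.mixedlm("log_dur ~ position + utt_len", df,
                   groups=df["speaker"]).fit(reml=False)
full = smf.mixedlm("log_dur ~ surprisal + position + utt_len", df,
                   groups=df["speaker"]).fit(reml=False)

# The usual "predictive power" step: does adding surprisal improve fit?
lr = 2 * (full.llf - base.llf)
print("LRT p =", chi2.sf(lr, df=1))

# The part that often goes unreported: the coefficient and a rough effect size,
# here the change in log duration per 1-SD change in surprisal.
beta = full.params["surprisal"]
print("beta =", beta, "SE =", full.bse["surprisal"])
print("per-SD change in log duration:", beta * df["surprisal"].std())
```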