Kenny Peng
CS PhD student at Cornell Tech. Interested in interactions between algorithms and society. Princeton math '22.
kennypeng.me
- Reposted by Kenny Peng: Our paper “Inferring fine-grained migration patterns across the United States” is now out in @natcomms.nature.com! We released a new, highly granular migration dataset. 1/9
- Reposted by Kenny Peng: 🎙️ I had a great time joining the Data Skeptic podcast to talk about my work on recommender systems. If you're interested in embeddings, aligning group preferences, or music recommendations, check out the episode below 👇 open.spotify.com/episode/6IsP...
- Reposted by Kenny Peng: Check out our new paper at #AAAI 2026! I’ll be presenting in Singapore at Saturday’s poster session (12–2pm). This is joint work with @shuvoms.bsky.social, @bergerlab.bsky.social, @emmapierson.bsky.social, and @nkgarg.bsky.social. 1/9
- Reposted by Kenny Peng: Excited to present a new preprint with @nkgarg.bsky.social: usage statistics and observational findings from Paper Skygest in its first six months of deployment! 🎉📜 arxiv.org/abs/2601.04253
- Reposted by Kenny Peng: so so so excited to present our research + connect with the #ATScience community 🧪🎉
- Excited to announce that @sjgreenwood.bsky.social will be at #ATScience presenting their work on the beloved @paper-feed.bsky.social, experiments on self-hosted feeds (in collaboration with @graze.social @aendra.com) and observational analyses of social media on #atproto! Looking forward to it! 🎉
- Year 3 of spending many days making gingerbread — this year, featuring the gantries of Long Island City
- I had a lot of fun making this map of Manhattan’s grid (only the numbered streets and avenues). Learned that 4th Avenue doesn’t exist, then learned that it actually does, but only for a few blocks.
- For #30DayMapChallenge day 11, a minimal map from @kennypeng.bsky.social. Kenny extracts minimal elements from a not-as-minimal-as-it-seems object: the Manhattan street grid. "I show how Manhattan’s numbered grid of streets and avenues is more complicated than you might realize," he says.
- Map was 100% only possible due to @gsagostini.bsky.social’s tutelage
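A guess at how a map like this could be made (an assumption on my part, not necessarily Kenny's actual stack): pull Manhattan's street network with OSMnx and keep only the edges whose names look like numbered streets or avenues.

```python
import re
import osmnx as ox

# Download Manhattan's drivable street network from OpenStreetMap.
G = ox.graph_from_place("Manhattan, New York, USA", network_type="drive")
edges = ox.graph_to_gdfs(G, nodes=False)

def is_numbered(name):
    # OSM edge names can be a string, a list of strings, or missing (NaN).
    names = name if isinstance(name, list) else [name]
    pattern = r"(East |West )?\d+(st|nd|rd|th) (Street|Avenue)"
    return any(isinstance(n, str) and re.fullmatch(pattern, n) for n in names)

# Keep only the numbered grid and draw it.
grid = edges[edges["name"].apply(is_numbered)]
grid.plot(linewidth=0.5, color="black")
```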
- Reposted by Kenny Peng: New #NeurIPS2025 paper: how should we evaluate machine learning models without a large, labeled dataset? We introduce Semi-Supervised Model Evaluation (SSME), which uses labeled and unlabeled data to estimate performance! We find SSME is far more accurate than standard methods.
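A minimal sketch of the general idea, not the paper's SSME estimator: recalibrate the model's confidences on a small labeled set, then use those calibrated confidences to estimate accuracy over a large unlabeled pool.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_accuracy(conf_labeled, was_correct, conf_unlabeled):
    # Recalibrate raw confidences with Platt scaling on the labeled set.
    calibrator = LogisticRegression()
    calibrator.fit(conf_labeled.reshape(-1, 1), was_correct)
    # Estimated accuracy on the unlabeled pool: mean calibrated P(correct).
    p_correct = calibrator.predict_proba(conf_unlabeled.reshape(-1, 1))[:, 1]
    return p_correct.mean()

# Toy data: model confidence in its predicted class, and (on labeled data)
# whether that prediction was right.
rng = np.random.default_rng(0)
conf_l = rng.uniform(0.5, 1.0, 200)
correct_l = (rng.uniform(size=200) < conf_l).astype(int)
conf_u = rng.uniform(0.5, 1.0, 5000)
print(f"estimated accuracy: {estimate_accuracy(conf_l, correct_l, conf_u):.3f}")
```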
- Being Divya's labmate (and fellow ferry commuter) has been a real pleasure, and I've learned a ton from both her research itself and her approach to research (and also from the other random things she knows about).
- "those already relatively advantaged are, empirically, more able to pay time costs and navigate administrative burdens imposed by the mechanisms." This point by @nkgarg.bsky.social has greatly shaped my thinking about the role of computer science in public service settings.
- New piece, out in the SIGecom Exchanges! It's my first solo-author piece, and the closest thing I've written to being my "manifesto." #econsky #ecsky arxiv.org/abs/2507.03600
- How do we reconcile excitement about sparse autoencoders with negative results showing that they underperform simple baselines? Our new position paper makes a distinction: SAEs are very useful tools for discovering *unknown* concepts, but less good for acting on *known* concepts.
- One paragraph pitch for why sparse autoencoders are cool (they learn *interpretable* text embeddings)
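To make that pitch concrete, here is a bare-bones sparse autoencoder over text embeddings; the dimensions and sparsity weight are illustrative, not taken from any of the papers above.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed=768, d_latent=4096):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_latent)
        self.decoder = nn.Linear(d_latent, d_embed)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse, nonnegative latents
        return self.decoder(z), z

model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(64, 768)  # stand-in for a batch of text embeddings
opt.zero_grad()
x_hat, z = model(x)
loss = ((x_hat - x) ** 2).mean() + 1e-3 * z.abs().mean()  # reconstruction + L1
loss.backward()
opt.step()
# Each latent that fires on a coherent set of texts can be read off as an
# interpretable feature, e.g. by inspecting its top-activating examples.
```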
- We're presenting two papers Wednesday at #ICML2025, both at 11am. Come chat about "Sparse Autoencoders for Hypothesis Generation" (west-421), and "Correlated Errors in LLMs" (east-1102)! Short thread ⬇️
- Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using responses of >350 LLMs. We find substantial correlation. On one dataset, LLMs agree on the wrong answer ~2x more than they would at random. 🧵(1/7) arxiv.org/abs/2506.07962
- What explains error correlation? We found that models from the same company are more correlated. Strikingly, more accurate models also have more correlated errors, suggesting some level of convergence among newer models. (2/7)
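A rough reconstruction of the agreement statistic (my own simplification, not the paper's exact estimator): among questions where two models are both wrong, compare how often they pick the same wrong option against the uniform-chance baseline of 1/(k-1).

```python
import numpy as np

def wrong_answer_agreement(a, b, correct, k=4):
    """a, b: each model's chosen option; correct: the true option."""
    both_wrong = (a != correct) & (b != correct)
    observed = (a[both_wrong] == b[both_wrong]).mean()
    expected = 1.0 / (k - 1)  # uniform chance over the k-1 wrong options
    return observed, expected, observed / expected

# Toy data: model b copies model a half the time, so errors are correlated.
rng = np.random.default_rng(0)
correct = rng.integers(0, 4, 10_000)
a = rng.integers(0, 4, 10_000)
b = np.where(rng.uniform(size=10_000) < 0.5, a, rng.integers(0, 4, 10_000))
obs, exp, ratio = wrong_answer_agreement(a, b, correct)
print(f"agree on wrong answer {obs:.2f} vs chance {exp:.2f} ({ratio:.1f}x)")
```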
- Reposted by Kenny Peng: New work 🎉: conformal classifiers return sets of classes for each example, with a probabilistic guarantee the true class is included. But these sets can be too large to be useful. In our #CVPR2025 paper, we propose a method to make them more compact without sacrificing coverage.
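For context, the standard split-conformal recipe this builds on looks roughly like the following (a textbook sketch, not the paper's compaction method): calibrate a score threshold so prediction sets contain the true class with probability at least 1 - alpha.

```python
import numpy as np

def conformal_sets(probs_cal, y_cal, probs_test, alpha=0.1):
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - probs_cal[np.arange(len(y_cal)), y_cal]
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample-corrected level
    qhat = np.quantile(scores, q, method="higher")
    # Prediction set: every class whose probability clears the threshold.
    return probs_test >= 1.0 - qhat

rng = np.random.default_rng(0)
probs_cal = rng.dirichlet(np.ones(10), size=500)
y_cal = np.array([rng.choice(10, p=p) for p in probs_cal])
probs_test = rng.dirichlet(np.ones(10), size=3)
print(conformal_sets(probs_cal, y_cal, probs_test))  # boolean class membership
```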
- Reposted by Kenny Peng: I’m really excited to share the first paper of my PhD, “Learning Disease Progression Models That Capture Health Disparities” (accepted at #CHIL2025)! ✨ 1/ 📄: arxiv.org/abs/2412.16406
- Our lab had a #dogathon 🐕 yesterday where we analyzed NYC Open Data on dog licenses. We learned a lot of dog facts, which I’ll share in this thread 🧵 1) Geospatial trends: Cavalier King Charles Spaniels are common in Manhattan; the opposite is true for Yorkshire Terriers.
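The sort of query behind facts like this, sketched on a toy table; the real dataset is on NYC Open Data, but the column names here are my assumption about its schema.

```python
import pandas as pd

# Toy stand-in for the NYC Open Data dog-licensing extract; "Borough" and
# "BreedName" are assumed column names, not verified against the schema.
df = pd.DataFrame({
    "Borough":   ["Manhattan", "Manhattan", "Brooklyn", "Queens", "Manhattan"],
    "BreedName": ["Cavalier King Charles Spaniel", "Yorkshire Terrier",
                  "Yorkshire Terrier", "Yorkshire Terrier",
                  "Cavalier King Charles Spaniel"],
})

# Breed share in Manhattan vs. citywide: ratios > 1 mean the breed is
# overrepresented in Manhattan (like the Cavaliers above).
citywide = df["BreedName"].value_counts(normalize=True)
manhattan = df[df["Borough"] == "Manhattan"]["BreedName"].value_counts(normalize=True)
print((manhattan / citywide).sort_values(ascending=False))
```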
- (1/n) New paper/code! Sparse Autoencoders for Hypothesis Generation. HypotheSAEs generates interpretable features of text data that predict a target variable: What features predict clicks from headlines / party from congressional speech / rating from Yelp review? arxiv.org/abs/2502.04382
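A conceptual sketch of that pipeline, not the HypotheSAEs API (see the paper and repo for the real interface): fit an SAE on text embeddings, then select the latents that predict the target.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Toy stand-in: `z` is a matrix of SAE latent activations (n_texts x
# n_latents), e.g. from an SAE like the sketch further up; `y` is the target
# (clicks, party, rating). Features 3 and 17 carry the planted signal.
rng = np.random.default_rng(0)
z = np.abs(rng.normal(size=(1000, 256))) * (rng.uniform(size=(1000, 256)) < 0.05)
y = 2.0 * z[:, 3] - 1.5 * z[:, 17] + rng.normal(scale=0.1, size=1000)

# Select the sparse features most predictive of the target.
lasso = LassoCV(cv=5).fit(z, y)
top = np.argsort(-np.abs(lasso.coef_))[:5]
print("predictive latent features:", top)
# Each selected latent is then turned into a natural-language hypothesis,
# e.g. by showing its top-activating texts to an LLM and validating the label.
```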
- Reposted by Kenny Peng: Please repost to get the word out! @nkgarg.bsky.social and I are excited to present a personalized feed for academics! It shows posts about papers from accounts you’re following bsky.app/profile/pape...