Paul Röttger @ EMNLP
Postdoc @milanlp.bsky.social working on LLM safety and societal impacts. Previously PhD @oii.ox.ac.uk and CTO / co-founder of Rewire (acquired '23)
paulrottger.com
- There’s plenty of evidence for political bias in LLMs, but very few evals reflect realistic LLM use cases — which is where bias actually matters. IssueBench, our attempt to fix this, is accepted at TACL, and I will be at #EMNLP2025 next week to talk about it! New results 🧵
- Quick recap of our setup: For each of 212 political issues we prompt LLMs with thousands of realistic requests for writing assistance. Then we classify each model response for which stance it expresses on the issue at hand.
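- To make this concrete, here is a minimal Python sketch of the prompt-and-classify loop. The template wording, example issues, and the query_llm / classify_stance helpers are illustrative stand-ins, not our exact pipeline:

```python
# Illustrative sketch only: templates, issues, and helper names are assumptions.
TEMPLATES = [
    "Write a blog post about {issue}.",
    "Help me draft an essay on {issue}.",
    "I need a short speech on {issue}. Can you write one?",
]
ISSUES = ["military drones", "gender equality", "carbon emissions"]  # 3 of the 212

def query_llm(prompt: str) -> str:
    # Stand-in: replace with a call to the model under evaluation.
    return "..."

def classify_stance(issue: str, response: str) -> str:
    # Stand-in: replace with the stance classifier (e.g. an LLM judge).
    return "neutral"

results = []
for issue in ISSUES:
    for template in TEMPLATES:
        response = query_llm(template.format(issue=issue))
        results.append((issue, template, classify_stance(issue, response)))
```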
- For this final version of our paper, we added results for Grok and DeepSeek alongside GPT, Llama, Qwen, and OLMo. Surprisingly, despite being developed in quite different settings, all models are very similar in how they write about different political issues.
- For more details on IssueBench, check out our paper and dataset release. And if you have any questions, please get in touch with me or my amazing co-authors 🤗 Paper: arxiv.org/abs/2502.08395 Data: huggingface.co/datasets/Pau...
- LLMs are good at simulating human behaviours, but they are not going to be great unless we train them for it. We hope SimBench can be the foundation for more specialised development of LLM simulators. I really enjoyed working on this with @tiancheng.bsky.social et al. Many fun results 👇
- Reposted by Paul Röttger @ EMNLP: 🏆 Thrilled to share that our HateDay paper has received an Outstanding Paper Award at #ACL2025! Big thanks to my wonderful co-authors: @deeliu97.bsky.social, Niyati, @computermacgyver.bsky.social, Sam, Victor, and @paul-rottger.bsky.social! Thread 👇 and data available at huggingface.co/datasets/man...
- Can we detect #hatespeech at scale on social media? To answer this, we introduce 🤬HateDay🗓️, a global hate speech dataset representative of a day on Twitter. The answer: not really! Detection performance is low and overestimated by traditional eval methods. arxiv.org/abs/2411.15462 🧵
- Very excited about all these papers on sociotechnical alignment & the societal impacts of AI at #ACL2025. As is now tradition, I made some timetables to help me find my way around. Sharing here in case others find them useful too :) 🧵
- Measuring *social and political biases* in LLMs is more important than ever, now that >500 million people use LLMs. I am particularly excited to check out work on this by @kldivergence.bsky.social @1e0sun.bsky.social @jacyanthis.bsky.social @anjaliruban.bsky.social
- *pluralism* in human values & preferences (e.g. with personalisation) will also grow more important for a global diversity of users. @morlikow.bsky.social is presenting our poster today at 11:00. Also hyped for @michaelryan207.bsky.social's work and @verenarieser.bsky.social's keynote!
- Let me know if I missed anything in the timetables, and please say hi if you want to chat about sociotechnical alignment, safety, the societal impact of AI, or related topics :) Here is a link to the timetable sheet 👇 See you around! docs.google.com/spreadsheets...
- Reposted by Paul Röttger @ EMNLP: Can LLMs learn to simulate individuals' judgments based on their demographics? Not quite! In our new paper, we found that LLMs do not learn information about demographics, but instead learn individual annotators' patterns based on unique combinations of attributes! 🧵
- Reposted by Paul Röttger @ EMNLP: 📈 Out today in @PNASNews! 📈 In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of LLMs yields sharply diminishing persuasive returns for static political messages. 🧵
- Are LLMs biased when they write about political issues? We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before. Long 🧵with spicy results 👇
- Before we get to those results though, let me briefly explain our setup: We test for *issue bias* in LLMs by prompting models to write about an issue in many different ways and then classifying the stance of each response. Bias in this setting is when one stance dominates.
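- To illustrate what 'one stance dominates' could mean in code, here is a minimal sketch that tallies stance labels per issue. The function names and the 0.8 threshold are illustrative assumptions, not the paper's exact metric:

```python
from collections import Counter

def stance_distribution(labels: list[str]) -> dict[str, float]:
    # Share of responses expressing each stance for one issue.
    counts = Counter(labels)
    total = sum(counts.values())
    return {stance: n / total for stance, n in counts.items()}

def one_stance_dominates(labels: list[str], threshold: float = 0.8) -> bool:
    # Flag an issue as biased if a single stance exceeds the threshold.
    return max(stance_distribution(labels).values()) >= threshold

labels = ["supportive"] * 9 + ["neutral"]
print(stance_distribution(labels))   # {'supportive': 0.9, 'neutral': 0.1}
print(one_stance_dominates(labels))  # True
```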
- We cover 212 political issues from real user chats with LLMs. These issues are extremely varied, spanning tech (e.g. military drones), social justice (gender equality), the environment (carbon emissions) and many more policy areas.
- We are very excited for people to use and expand IssueBench. All links are below. Please get in touch if you have any questions 🤗 Paper: arxiv.org/abs/2502.08395 Data: huggingface.co/datasets/Pau... Code: github.com/paul-rottger...
- Reposted by Paul Röttger @ EMNLP: I’m thrilled to share that our paper on mitigating false refusal in language models has been accepted to ICLR 2025 @iclr-conf.bsky.social! arxiv.org/abs/2410.03415 Joint work with chengzhi, @paul-rottger.bsky.social, @barbaraplank.bsky.social.