What are the main issues discussed in a set of documents?
We’ve just released a step-by-step BERTopic tutorial.
We also launched a new page gathering various NLP tutorials for social scientists.
👉 www.css.cnrs.fr/tutorials-an...
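If you want the gist before diving into the tutorial: the core BERTopic workflow fits in a few lines. A minimal sketch, assuming the `bertopic` package; the corpus and settings below are illustrative placeholders, and the tutorial walks through the actual choices:

```python
# pip install bertopic scikit-learn
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

# Any list of strings works; 20 Newsgroups is just a convenient stand-in corpus.
docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data

# Default pipeline: sentence embeddings -> UMAP -> HDBSCAN -> c-TF-IDF keywords.
topic_model = BERTopic(language="english", verbose=True)
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info().head(10))  # topic sizes and top keywords
print(topic_model.get_topic(0))               # keyword/weight pairs for topic 0
```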
Citation is the foundation of academic promotion. It’s noisy, sure, but its integrity is worth fighting for. Hallucinated citations should be a desk reject.
NEW: NeurIPS, one of the world’s top academic AI conferences, accepted research papers with 100+ AI-hallucinated citations, a new report claims
fortune.com/2026/01/21/n...
⏳ Deadline approaching! We’re hiring 2 fully funded postdocs in #NLP.
Join the MilaNLP team and contribute to our upcoming research projects (SALMON & TOLD)
🔗 Details + how to apply: milanlproc.github.io/open_positio...
⏰ Deadline: Jan 31, 2026
🚨 (Software) Update:
In my PhD, I had a side project to fix an annoying problem: when you ask 5 people to label the same thing, you often get different answers. But in ML (and lots of other analyses), you still need a single aggregated answer. Using the majority vote is easy, but often wrong.
1/N
However, disagreement isn’t just noise—it’s information. It can mean an item is genuinely hard—or someone wasn’t paying attention. If only you knew whom to trust…
That summer, Taylor Berg-Kirkpatrick, Ashish Vaswani, and I built MACE (Multi-Annotator Competence Estimation).
2/N
MACE estimates:
1. Annotator reliability (who’s consistent?)
2. Item difficulty (which examples spark disagreement?)
3. The most likely aggregate label (the latent “best guess”)
That “side project” ended up powering hundreds of annotation projects over the years.
3/N
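To make the idea concrete, here is a minimal sketch of a MACE-style EM loop, assuming NumPy. The function name `mace_em`, the smoothing, and the initialization are illustrative choices for this toy version, not the released MACE implementation:

```python
import numpy as np

def mace_em(labels, n_iters=50, smooth=0.1, seed=0):
    """Toy EM aggregator in the spirit of MACE (Hovy et al., 2013).

    labels: (n_items, n_annotators) int array; -1 marks a missing label.
    Model: annotator j copies the true label with competence theta[j],
    otherwise "spams" from a personal label distribution xi[j].
    """
    rng = np.random.default_rng(seed)
    n_items, n_annot = labels.shape
    n_classes = int(labels.max()) + 1
    theta = 0.5 + 0.1 * rng.random(n_annot)               # per-annotator competence
    xi = np.full((n_annot, n_classes), 1.0 / n_classes)   # spamming distributions

    for _ in range(n_iters):
        # E-step: posterior over each item's true label.
        post = np.ones((n_items, n_classes))
        for j in range(n_annot):
            a = labels[:, j]
            seen = a >= 0
            spam = (1.0 - theta[j]) * xi[j, a[seen]]      # P(observed label | spamming)
            for c in range(n_classes):
                post[seen, c] *= theta[j] * (a[seen] == c) + spam
        post /= post.sum(axis=1, keepdims=True)

        # M-step: expected "copied the true label" events per annotation.
        for j in range(n_annot):
            a = labels[:, j]
            seen = a >= 0
            p_true = post[seen, a[seen]]                  # P(true label == observed one)
            copy = p_true * theta[j] / (theta[j] + (1.0 - theta[j]) * xi[j, a[seen]])
            theta[j] = (copy.sum() + smooth) / (seen.sum() + 2 * smooth)
            counts = np.full(n_classes, smooth)
            np.add.at(counts, a[seen], 1.0 - copy)        # labels produced while spamming
            xi[j] = counts / counts.sum()

    # Aggregated labels, full posterior (item difficulty), competence estimates.
    return post.argmax(axis=1), post, theta

# Toy usage: four items, three annotators (-1 = missing); annotator 2 always says 1.
votes = np.array([[0, 0, 1],
                  [1, 1, 1],
                  [0, 0, 1],
                  [0, -1, 1]])
agg, posterior, competence = mace_em(votes)
```

Majority voting weights everyone equally; the EM loop instead learns to discount annotators whose labels look like spam.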
If you are curious about the theoretical background, see
Hovy, D., Berg-Kirkpatrick, T., Vaswani, A., & Hovy, E. (2013). Learning Whom to Trust with MACE. In Proceedings of NAACL-HLT. ACL.
aclanthology.org/N13-1132.pdf
And for even more details:
aclanthology.org/Q18-1040.pdf
N/N
New year, new job? If that is your current mantra, check the open postdoc positions with Debora Nozza and me at our lab. Deadline is January 31st.
milanlproc.github.io/open_positio...
🚀 We’re opening 2 fully funded postdoc positions in #NLP!
Join the MilaNLP team and contribute to our upcoming research projects.
🔗 More details: milanlproc.github.io/open_positio...
⏰ Deadline: Jan 31, 2026
Thrilled to announce the Handbook of Computational Social Science is officially out! 956 pages, 118 authors, and truly global, interdisciplinary perspectives. Deep thanks to the contributors and anonymous reviewers who shaped this over 4 years. Buy your copy now!
@elgarpublishing.bsky.social
#MemoryMonday #NLProc "Countering Hateful and Offensive Speech Online - Open Challenges" by Plaza-Del-Arco, @debora_nozza, Guerini, Sorensen, Zampieri (2024) is a tutorial on the challenges and solutions for detecting and mitigating hate speech.
#MemoryMonday #NLProc Uma, A. N. et al. survey how to train AI models from annotator disagreement in 'Learning from Disagreement: A Survey'. Disagreement-handling methods' performance is shaped by evaluation methods & dataset traits.
#TBT #NLProc #MachineLearning #SafetyFirst 'Safety-Tuned LLaMAs: Improving LLM Safety' by Bianchi et al. explores training LLMs for safe refusals and warns of over-tuning.
🚀 We’re opening 2 fully funded postdoc positions in #NLP!
Join the MilaNLP team and contribute to our upcoming research projects.
🔗 More details: milanlproc.github.io/open_positio...
⏰ Deadline: Jan 31, 2026
We're hiring interns in the Computational Social Science group at Microsoft Research NYC!
If you're interested in designing AI‑based systems and understanding their impact at both individual and societal scales, apply here by Jan 9, 2026: apply.careers.microsoft.com/careers/job/...
After I shared “How to professor” last year, some people asked for a similar post on writing. Now I finally got around to typing up our lab's writing workshop slides.
It covers basic advice for research papers and grant applications.
Curious? Read it here: dirkhovy.com/post/2025_11...
#MemoryMonday #NLProc 'Leveraging Social Interactions to Detect Misinformation on Social Media' by Fornaciari et al. (2023) uses combined text and network analysis to spot unreliable threads.
#MemoryMonday #NLProc 'Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models' by @paul-rottger.bsky.social et al. (2022). A suite of tests for 10 languages.
#TBT #NLProc 'Compromesso! Italian Many-Shot Jailbreaks Undermine LLM Safety' by Pernisi, @dirkhovy.bsky.social, @paul-rottger.bsky.social (2024). The paper highlights LLM vulnerability to Italian many-shot demonstrations: more demos mean more attack chances.
📢 Postdoc position 📢
I’m recruiting a postdoc for my lab at NYU! Topics include LM reasoning, creativity, limitations of scaling, AI for science, & more! Apply by Feb 1.
(Different from NYU Faculty Fellows, which are also great but less connected to my lab.)
Link in 🧵
I will be at @euripsconf.bsky.social this week to present our paper as non-archival at the PAIG workshop (Beyond Regulation: Private Governance & Oversight Mechanisms for AI). Very much looking forward to the discussions!
If you are at #EurIPS and want to chat about LLMs' training data, reach out!
Another exhausting day in the lab… conducting very rigorous panettone analysis. Pandoro was evaluated too, because we believe in fair experimental design.
#TBT #NLProc @donyarn.bsky.social & @dirkhovy.bsky.social's 2024 paper 'Conversations as a Source for Teaching Scientific Concepts' turns video dialogues into effective teaching tools.
#MemoryMonday #NLProc 'Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers' by Nguyen & @dirkhovy.bsky.social mines smart-speaker reviews for user preferences using topic models; domain knowledge is needed for market analysis.
“Teacher Demonstrations in a BabyLM’s Zone of Proximal Development for Contingent Multi-Turn Interaction” selected for an Outstanding Paper Award at the BabyLM Challenge & Workshop!
#TBT #NLProc Attanasio et al. ask 'Is It Worth the (Environmental) Cost?', analyzing continuous training for language models and balancing benefits against environmental impact for responsible use. #Sustainability
#MemoryMonday #NLProc 'Universal Joy: A Data Set and Results for Classifying Emotions Across Languages' by Lamprinidis et al. (2021) presents a dataset and benchmark results for classifying emotions across languages.
#TBT#NLProc "Explaining Speech Classification Models" by Pastor et al. (2024) makes speech classification more transparent! 🔍 Their research reveals which words matter most and how tone and background noise impact decisions.
#TBT #NLProc Hessenthaler et al.'s 2022 work delves into AI's link with fairness & energy reduction in English NLP models, challenging bias reduction theories. #AI #Sustainability
Excited to head to Suzhou for the 30th edition of #EMNLP2025! 🎉 Had the great honor to serve as general chair this year. Looking forward to catching up with everyone and seeing some amazing #NLP research! 🤓📚
🗓️ Nov 5 – Main Conference Posters
Personalization up to a Point
🧠 In the context of content moderation, we show that fully personalized models can perpetuate hate speech, and propose a policy-based method to impose legal boundaries.
📍 Hall C | 11:00–12:30
🗓️ Nov 5 – Main Conference Posters
📘 Biased Tales
A dataset of 5k short LLM bedtime stories generated across sociocultural axes, with an evaluation taxonomy for character-centric and context-centric attributes.
📍 Hall C | 11:00–12:30
🗓️ Nov 5 - Demo
Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification
🧩 Co-DETECT – an iterative, human-LLM collaboration framework for surfacing edge cases and refining annotation codebooks in text classification.
📍 Demo Session 2 – Hall C3 | 14:30–16:00
🗓️ Nov 6 – Findings Posters
The “r” in “woman” stands for rights.
💬 We propose a taxonomy of social dynamics in implicit misogyny (EN, IT), auditing 9 LLMs — and they consistently fail. The more social knowledge a message requires, the worse they perform.
📍 Hall C | 12:30–13:30
🗓️ Nov 7 – Main Conference Posters
Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance
🧍 Discussing different applications for LLM persona prompting, and how to measure their success.
📍 Hall C | 10:30–12:00
🗓️ Nov 7 – Main Conference Posters
TrojanStego: Your Language Model Can Secretly Be a Steganographic Privacy-Leaking Agent
🔒 LLMs can be fine-tuned to leak secrets via token-based steganography!
📍 Hall C | 10:30–12:00
🗓️ Nov 8 – WiNLP Workshops
No for Some, Yes for Others
🤖 We investigate how sociodemographic persona prompts affect false refusal behaviors in LLMs. Model and task type are the dominant factors driving these refusals.
🗓️ Nov 8 – NLPerspectives Workshops
Balancing Quality and Variation
🧮 For datasets to represent diverse opinions, they must preserve variation while filtering out spam. We evaluate annotator filtering heuristics and show how they often remove genuine variation.
🗓️ Nov 8 – BabyLM Workshop
Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction
👶 ContingentChat, a Teacher–Student framework that benchmarks and improves multi-turn contingency in a BabyLM trained on 100M words.
🗓️ Nov 8 – STARSEM Workshop
Generalizability of Media Frames: Corpus Creation and Analysis Across Countries
📰 We investigate how well media frames generalize across different media landscapes. The 15 MFC frames remain broadly applicable, with minor revisions to the guidelines.
🗓️ Nov 6 – Oral Presentation (TACL)
IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance
⚖️ A foundation for measuring LLM political bias in realistic user conversations.
📍 A303 | 10:30–12:00