Marzieh Fadaee
seeks to understand language.
Head of Cohere Labs
@Cohere_Labs @Cohere
PhD from @UvA_Amsterdam
marziehf.github.io
- Reposted by Marzieh Fadaee: We’re not your average lab. We’re a hybrid research environment dedicated to revolutionizing the ML space. And we’re hiring a Senior Research Scientist to co-create with us. If you believe in research as a shared, global effort — this is your chance.
- I'm excited to share that I'll be stepping into the role of Head of @cohereforai.bsky.social. It's an honor and a responsibility to lead such an extraordinary group of researchers pushing the boundaries of AI research.
- Reposted by Marzieh Fadaee: While effective for chess♟️, Elo ratings struggle with LLM evaluation due to volatility and transitivity issues. New post in collaboration with AI Singapore explores why Elo falls short for AI leaderboards and how we can do better.
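A minimal sketch of the volatility issue mentioned above (not from the linked post; the player names, starting ratings, and K-factor below are illustrative assumptions): with standard Elo updates, the same multiset of pairwise outcomes can produce different final ratings depending on the order in which the battles are processed.

```python
# Illustrative Elo sketch: identical head-to-head records, different
# final ratings depending on match order (order-dependence/volatility).

def expected(r_a, r_b):
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, score_a, k=32):
    """One Elo update; score_a is 1 if A wins, 0 if A loses."""
    e_a = expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

def run(results):
    """Process (winner, loser) pairs sequentially from equal starting ratings."""
    ratings = {"A": 1000.0, "B": 1000.0}
    for winner, loser in results:
        ratings[winner], ratings[loser] = update(ratings[winner], ratings[loser], 1)
    return ratings

# Same outcomes (A beats B twice, B beats A twice), two different orders:
print(run([("A", "B"), ("A", "B"), ("B", "A"), ("B", "A")]))
print(run([("A", "B"), ("B", "A"), ("A", "B"), ("B", "A")]))
```

The two printouts differ even though both players won exactly half their games, which is one reason sequentially updated Elo leaderboards can be unstable for model comparisons.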
- Breaking into AI research is harder than ever, and early-career researchers face fewer chances to get started. Entry points matter. We started the Scholars Program 3 years ago to give new researchers a real shot — excited to open applications for year 4✨
- Over the years, I've watched scholars go from their very first project → to their first paper → to research careers they once thought were out of reach. It’s been incredible to see what can happen when someone gets their first real chance and works hard to make it count 🏅
- ACL day 2 ✨
- 🖼️ Most text-to-image models only really work in English. This limits who can use them and whose imagination they reflect. We asked: can we build a small, efficient model that understands prompts in multiple languages natively?
- Everyone talks about GEB (I agree, it's a gem) but Hofstadter's Analogy book is criminally underrated. If you're working on learning intelligence through language understanding, it’s a must-read.
- Reposted by Marzieh Fadaee: 🍋 Squeezing the most out of a few samples - check out our LLMonade recipe for few-sample test-time scaling in multitask environments. Turns out that standard methods miss out on gains on non-English languages. We propose more robust alternatives. Very proud of this work that our scholar Ammar led! 🚀
- London has me under its spell. every. single. visit.
- Reposted by Marzieh Fadaee: 🚨LLM safety research needs to be at least as multilingual as our models. What's the current state, and how do we progress from here? This work led by @yongzx.bsky.social has answers! 👇
- Reposted by Marzieh Fadaee: Over 7000 languages are spoken worldwide 🌐, but AI safety efforts focus on only a fraction of them. Our latest paper draws on our multi-year efforts with the wider research community to explore why this matters and how we can bridge the AI language gap.
- Reposted by Marzieh Fadaee: 📢 The Copenhagen NLP Symposium on June 20th! - Invited talks by @loubnabnl.hf.co (HF), @mziizm.bsky.social (Cohere), @najoung.bsky.social (BU), @kylelo.bsky.social (AI2), and Yohei Oseki (UTokyo) - Exciting posters by other participants. Register to attend and/or present your poster at cphnlp.github.io /1
- 1/ Science is only as strong as the benchmarks it relies on. So how fair—and scientifically rigorous—is today’s most widely used evaluation benchmark? We took a deep dive into Chatbot Arena to find out. 🧵
- 2/ 🧪 With theory, simulations, and real-world experiments, we stress-test Arena’s fairness and find: - Undisclosed private model testing warps results - Silent model deprecation undermines rank stability - Data access disparities between providers enable overfitting
- Not in Singapore for #ICLR2025 but our lab’s work is! In particular, I am very proud of these collaborations: ✨INCLUDE (spotlight) — models fail to grasp regional nuances across languages 💎To Code or Not to Code (poster) — code is key for generalizing beyond coding tasks
- 🚨 Excited to share our latest paper! Multilingual LLMs are getting really good. But the way we evaluate them? Not always the best. 🌟 We show how decades of lessons from Machine Translation can help us fix it.
- 📖 New preprint with Eleftheria Briakou @swetaagrawal.bsky.social @mziizm.bsky.social @kocmitom.bsky.social! arxiv.org/abs/2504.11829 🌍 It reflects experiences from my personal research journey: coming from MT into multilingual LLM research, I missed reliable evaluations and evaluation research…
- Very excited to release Kaleidoscope—a multilingual, multimodal evaluation set for VLMs, built as part of our open-science initiative! 🌍 18 languages (high-, mid-, and low-resource) 📚 21k questions (55% require image understanding) 🧪 STEM, social science, reasoning, and practical skills
- Reposted by Marzieh Fadaee: Big news from WMT! 🎉 We are expanding beyond MT and launching a new multilingual instruction shared task. Our goal is to foster truly multilingual LLM evaluation and best practices in automatic and human evaluation. Join us and build the winning multilingual system! www2.statmt.org/wmt25/multil...
- Reposted by Marzieh Fadaee: ☀️ Summer internship at Cohere! Are you excited about multilingual evaluation, human judgment, or meta-eval? Come help us explore what a rigorous eval really looks like while questioning the status quo in LLM evaluation. I’m looking for an intern (EU timezone preferred), are you interested? Ping me!
- Command🅰️ technical report is out. Information-dense. Detailed. Pretty. Simply A+! 💎: cohere.com/research/pap...
- I'm excited to share the tech report for our @cohere.com @cohereforai.bsky.social Command A and Command R7B models. We highlight our novel approach to model training including self-refinement algorithms and model merging techniques at scale. Read more below! ⬇️
- I am so proud of Cohere's dedication to open science and its impact on the community! ✨
- Good morning Paris
- ✨👓 Aya Vision is here 👓✨ A multilingual, multimodal model designed to understand across languages and modalities (text, images, etc.) to bridge the language gap and empower global users!