Quentin Gallouédec
PhD - Research @hf.co 🤗
TRL maintainer
- It started as a modest project to offer a free, open-source alternative to MuJoCo environments, and today, panda-gym is downloaded over 100k times, and cited in over 100 papers. 🦾
- just pip install trl
- How many of these 8 things did you know? huggingface.co/blog/qgallou...
- 🚀 TRL 0.14 – Featuring GRPO! 🚀 TRL 0.14 brings *GRPO*, the RL algorithm behind 🐳 DeepSeek-R1. ⚡ Blazing fast generation with vLLM integration. 📉 Optimized training with DeepSpeed ZeRO 1/2/3.
- Last moments of closed-source AI 🪦: Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration. 🫵 Let's go! github.com/huggingface/...
- The algorithm behind DeepSeek's R1 model (aka GRPO) now lives in TRL main branch! Go and test it!
- [Stonks] TRL is a Python library for training language models. It has seen impressive growth this year. Lots of new features, an improved codebase, and this has translated into increased usage. You can count on us to do even more in 2025.
- 🎅 Santa Claus has delivered the ultimate guide to understanding OOM errors (link in comment)
- Top 1 Python dev today. Third time since September 🫨
- 🚨 TRL 0.13 is out! 🤗 Featuring a Process-supervised Reward Model (PRM) Trainer 🏋️ PRMs empower LLMs to "think before answering", a key feature behind OpenAI's o1 launch just two weeks ago. 🚀
- Reposted by Quentin Gallouédec: We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥 How? By combining step-wise reward models with tree search algorithms :) We're open sourcing the full recipe and sharing a detailed blog post 👇
- The number of TRL models on the 🤗 Hub has risen 60x this year! 📈 How about doing the same next year?
- Join us at Hugging Face as an intern if you want to contribute to amazing open-source projects, and develop LLM's best finetuning library, aka TRL. 🧑💻 Full remote 🤯 Exciting subjects 🌍 Anywhere in the world 🤸🏻 Flexible working hours Link to apply in comment 👇
- Reposted by Quentin Gallouédec: We’re looking for an intern to join our SmolLM team! If you’re excited about training LLMs and building high-quality datasets, we’d love to hear from you. 🤗 US: apply.workable.com/huggingface/... EMEA: apply.workable.com/huggingface/...
- I'd love to! We have a lot of room for improvement here!
- How can you avoid the temptation to use a subprocess for sub-commands? This blog post from @muellerzr.bsky.social saved my day. muellerzr.github.io/til/argparse...
- Finetune SmolLM2 with TRL!