
Quentin Gallouédec

qgallouedec.hf.co


PhD - Research @hf.co 🤗 TRL maintainer

Joined November 2024


Quentin Gallouédec qgallouedec.hf.co · May 2, 2025
It started as a modest project to offer a free, open-source alternative to MuJoCo environments. Today, panda-gym has been downloaded over 100k times and cited in over 100 papers. 🦾


Quentin Gallouédec qgallouedec.hf.co · Apr 26, 2025
just pip install trl


Quentin Gallouédec qgallouedec.hf.co · Apr 20, 2025
How many of these 8 things did you know? huggingface.co/blog/qgallou...
Gotchas in Tokenizer Behavior Every Developer Should Know

A Blog post by Quentin Gallouédec on Hugging Face

huggingface.co


Quentin Gallouédec qgallouedec.hf.co · Jan 30, 2025
🚀 TRL 0.14 – Featuring GRPO! 🚀 TRL 0.14 brings *GRPO*, the RL algorithm behind 🐳 DeepSeek-R1. ⚡ Blazing fast generation with vLLM integration. 📉 Optimized training with DeepSpeed ZeRO 1/2/3.



Quentin Gallouédec qgallouedec.hf.co · Jan 25, 2025
Last moments of closed-source AI 🪦: Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration. 🫵 Let's go! github.com/huggingface/...
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1

Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.

github.com


Quentin Gallouédec qgallouedec.hf.co · Jan 22, 2025
The algorithm behind DeepSeek's R1 model (aka GRPO) now lives in the TRL main branch! Go and test it!


Quentin Gallouédec qgallouedec.hf.co · Jan 6, 2025
[Stonks] TRL is a Python library for training language models. It has seen impressive growth this year. Lots of new features, an improved codebase, and this has translated into increased usage. You can count on us to do even more in 2025.


Quentin Gallouédec qgallouedec.hf.co · Dec 24, 2024
🎅 Santa Claus has delivered the ultimate guide to understanding OOM errors (link in comment)


Quentin Gallouédec qgallouedec.hf.co · Dec 17, 2024
Top 1 Python dev today. Third time since September 🫨


Quentin Gallouédec qgallouedec.hf.co · Dec 17, 2024
🚨 TRL 0.13 is out! 🤗 Featuring a Process-supervised Reward Model (PRM) Trainer 🏋️ PRMs empower LLMs to "think before answering", a key feature behind OpenAI's o1 launch just two weeks ago. 🚀


Reposted by Quentin Gallouédec
Lewis Tunstall lewtun.bsky.social · Dec 16, 2024
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥 How? By combining step-wise reward models with tree search algorithms :) We're open sourcing the full recipe and sharing a detailed blog post 👇


Quentin Gallouédec qgallouedec.hf.co · Dec 3, 2024
The number of TRL models on the 🤗 Hub has grown 60x this year! 📈 How about doing the same next year?



Quentin Gallouédec qgallouedec.hf.co · Nov 27, 2024
Join us at Hugging Face as an intern if you want to contribute to amazing open-source projects and develop the best fine-tuning library for LLMs, aka TRL. 🧑‍💻 Full remote 🤯 Exciting subjects 🌍 Anywhere in the world 🤸🏻 Flexible working hours. Link to apply in comment 👇


Reposted by Quentin Gallouédec
Elie eliebak.hf.co · Nov 27, 2024
We’re looking for an intern to join our SmolLM team! If you’re excited about training LLMs and building high-quality datasets, we’d love to hear from you. 🤗 US: apply.workable.com/huggingface/... EMEA: apply.workable.com/huggingface/...
ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote - Hugging Face

Here at Hugging Face, we’re on a journey to advance good Machine Learning and make it more accessible. Along the way, we contribute to the development of technology for the better. We have built the fa...

apply.workable.com


Quentin Gallouédec qgallouedec.hf.co · Nov 25, 2024
I'd love to! We have a lot of room for improvement here!
- Ben Burtenshaw benburtenshaw.bsky.social · Nov 25, 2024




Quentin Gallouédec qgallouedec.hf.co · Nov 22, 2024
How can you avoid the temptation to use a subprocess for sub-commands? This blog post from @muellerzr.bsky.social saved my day. muellerzr.github.io/til/argparse...
Zach Mueller - Calling argparse without subprocess

How to use argparse without the CLI

muellerzr.github.io
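
One common way to avoid shelling out for sub-commands can be sketched as follows. This is a minimal illustration, not necessarily the approach taken in the linked post: it relies only on the standard behavior that `ArgumentParser.parse_args` accepts an explicit list of arguments. The names `build_parser`, `run_train`, `main`, and the `train` sub-command with its `--epochs` flag are hypothetical.

```python
import argparse


def run_train(args: argparse.Namespace) -> str:
    # Hypothetical sub-command handler; returns a string so it is easy to check.
    return f"training for {args.epochs} epoch(s)"


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI with a single "train" sub-command.
    parser = argparse.ArgumentParser(prog="tool")
    subparsers = parser.add_subparsers(dest="command", required=True)
    train = subparsers.add_parser("train")
    train.add_argument("--epochs", type=int, default=1)
    # Attach the handler so dispatch needs no if/elif chain.
    train.set_defaults(func=run_train)
    return parser


def main(argv=None) -> str:
    # argv=None makes argparse read sys.argv[1:], so the same function
    # serves both the real CLI and in-process calls with an explicit list.
    args = build_parser().parse_args(argv)
    return args.func(args)


if __name__ == "__main__":
    print(main())
```

Calling `main(["train", "--epochs", "3"])` from Python then runs the sub-command in the current process, with no `subprocess` call involved.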


Quentin Gallouédec qgallouedec.hf.co · Nov 21, 2024
Finetune SmolLM2 with TRL!
- Ben Burtenshaw benburtenshaw.bsky.social · Nov 21, 2024
