Elie
Training LLMs at huggingface | hf.co/science
- Reposted by Elie: Last moments of closed-source AI 🪦 : Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration. 🫵 Let's go! github.com/huggingface/...
- Reposted by Elie: We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret, we can do it together in the open! Follow along: github.com/huggingface/...
- WOW, Gemini Flash 2.0 is really impressive. Wondering about the size of this supposedly smol model. One odd thing is that the model seems to lose some ability with long contexts compared to Flash 1.5. If any Google friends could share insights, I'd love to hear them!
- Hey, I'll be at NeurIPS next week! My DMs are open if you want to meet and talk about pre-training/data/whatever you want 🫡
- Google patent on "Training of large neural network". 😮 I don't know if this gives much information, but from a quick read it seems that: - They are not only using "causal language modeling" as a pre-training task but also "span corruption" and "prefix modeling". (ref [0805]-[0091])
- - They use some kind of metadata tokens to give information about toxicity and data leakage, but also a "quality" token? - [0118] talks about using some kind of LoRAs during the finetuning/alignment phase to adapt to multiple downstream tasks - ~[0154] some memory evaluation technique?
- Link: www.freepatentsonline.com/y2024/037844... I've probably missed a lot, feel free to add more ⬇️
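The patent names "span corruption" without spelling it out, so here is a minimal sketch of the T5-style version of that objective (the `<extra_id_i>` sentinel names follow the T5 convention, and the `span_corrupt` helper is illustrative, not taken from the patent):

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption.

    tokens: list of input tokens.
    spans:  sorted, non-overlapping (start, length) pairs to mask out.

    Returns (inputs, targets): the input sequence with each masked span
    replaced by a sentinel token, and the target sequence containing each
    sentinel followed by the tokens it hides.
    """
    inputs, targets, prev = [], [], 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs += tokens[prev:start] + [sentinel]          # keep text up to the span, drop the span
        targets += [sentinel] + tokens[start:start + length]  # model must reconstruct the span
        prev = start + length
    inputs += tokens[prev:]                                # trailing unmasked text
    return inputs, targets
```

The model then learns to emit the masked spans given the sentinels, which complements plain causal language modeling as a pre-training task.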
- Reposted by Elie: 📬 Summarize and rewrite your text/emails faster, and offline! Check @andimara.bsky.social's Smol Tools for summarization and rewriting. It uses SmolLM2 to summarize text and make it more friendly or professional, all running locally thanks to llama.cpp github.com/huggingface/...
- What else should we log during LLM training? Right now, it's just loss, grad_norm, and evals, but I want to log more to have a better understanding of pre-training. Thinking about adding stuff like entropix metrics (agreement, varentropy?) Any thoughts or cool ideas?
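For reference, the entropy and varentropy of the next-token distribution (the quantities entropix builds on) could be logged with a small helper like this — the `entropy_varentropy` function and its signature are my own illustration, not from any particular codebase:

```python
import math

def entropy_varentropy(probs):
    """Entropy and varentropy of a next-token probability distribution
    (e.g. softmaxed logits).

    Entropy    H = -sum p*log(p)              : how uncertain the model is.
    Varentropy Var[-log p] = sum p*(log p+H)^2: how spread out the surprisal
    is across tokens (0 for any uniform distribution, large when the model
    hedges between a confident head and a long uncertain tail).
    """
    H = -sum(p * math.log(p) for p in probs if p > 0)
    varH = sum(p * (math.log(p) + H) ** 2 for p in probs if p > 0)
    return H, varH
```

Logged per token position and averaged over a batch, these give a cheap signal about model confidence that loss and grad_norm alone don't capture.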
- 10000% agree with Omar, this is totally disproportionate
- We’re looking for an intern to join our SmolLM team! If you’re excited about training LLMs and building high-quality datasets, we’d love to hear from you. 🤗 US: apply.workable.com/huggingface/... EMEA: apply.workable.com/huggingface/...
- Reposted by Elie: On the Xet team at @huggingface.bsky.social we're always looking for ways to move bytes to computers near you as fast as possible. To do this, we're redesigning the upload and download infrastructure on the Hub. This post describes how, check the thread for details 🧵 huggingface.co/blog/rearchi...
- The SmolLM series has a new member: say hi to SmolVLM! 🤏 It uses a preliminary 16k context version of SmolLM2 to tackle long-context vision documents and higher-res images. And yes, we’re cooking up versions with bigger context lengths. 👨‍🍳 Try it yourself here: huggingface.co/spaces/Huggi...
- Reposted by Elie: Small yet mighty! 💫 We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, immensely memory efficient 🤠 We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base huggingface.co/collections/...
- Reposted by Elie: Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput. SmolVLM can be fine-tuned in a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
- Hey babe, wake up, we just dropped a new SmolLM 🫡 Fully open-source. We’ll release a blog post soon detailing how we trained it. I'm also super excited about all the demos that will come in the next few days, and I'm especially looking forward to people testing it with entropix 🐸