Elie
Training LLMs at huggingface | hf.co/science
- Reposted by Elie: Last moments of closed-source AI 🪦 : Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration. 🫵 Let's go! github.com/huggingface/...
- Reposted by Elie: We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret, we can do it together in the open! Follow along: github.com/huggingface/...
- WOW, Gemini Flash 2.0 is really impressive. Wondering about the size of this supposedly smol model. One odd thing is that the model seems to lose some ability with long contexts compared to Flash 1.5. If any Google friends could share insights, I'd love to hear them!
- Hey, I'll be at NeurIPS next week! My DMs are open if you want to meet and talk about pre-training/data/whatever you want 🫡
- Google patent on "Training of large neural network". 😮 I don't know if this gives much information, but from a quick read it seems that: - They are not only using "causal language modeling" as a pre-training task but also "span corruption" and "prefix modeling". (ref [0805]-[0091])
- - They use some kind of metadata tokens to give information about toxicity and data leakage, but also a "quality" token? - [0118] talks about using some kind of LoRAs during the finetuning/alignment phase to adapt to multiple downstream tasks - ~[0154] some memory evaluation technique?
- Link: www.freepatentsonline.com/y2024/037844... I've probably missed a lot, feel free to add more ⬇️
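The patent names "span corruption" without spelling it out, so here is a minimal sketch of the T5-style version of that objective (the `<extra_id_i>` sentinel names follow the T5 convention, and the `span_corrupt` helper is illustrative, not taken from the patent):

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption.

    tokens: list of input tokens.
    spans:  sorted, non-overlapping (start, length) pairs to mask out.

    Returns (inputs, targets): the input sequence with each masked span
    replaced by a sentinel token, and the target sequence containing each
    sentinel followed by the tokens it hides.
    """
    inputs, targets, prev = [], [], 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs += tokens[prev:start] + [sentinel]          # keep text up to the span, drop the span
        targets += [sentinel] + tokens[start:start + length]  # model must reconstruct the span
        prev = start + length
    inputs += tokens[prev:]                                # trailing unmasked text
    return inputs, targets
```

The model then learns to emit the masked spans given the sentinels, which complements plain causal language modeling as a pre-training task.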
- Reposted by Elie: 📬 Summarize and rewrite your text/emails faster, and offline! Check @andimara.bsky.social's Smol Tools for summarization and rewriting. It uses SmolLM2 to summarize text and make it more friendly or professional, all running locally thanks to llama.cpp github.com/huggingface/...
- What else should we log during LLM training? Right now, it's just loss, grad_norm, and evals, but I want to log more to have a better understanding of pre-training. Thinking about adding stuff like entropix metrics (agreement, varentropy?) Any thoughts or cool ideas?
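For reference, the entropy and varentropy of the next-token distribution (the quantities entropix builds on) could be logged with a small helper like this — the `entropy_varentropy` function and its signature are my own illustration, not from any particular codebase:

```python
import math

def entropy_varentropy(probs):
    """Entropy and varentropy of a next-token probability distribution
    (e.g. softmaxed logits).

    Entropy    H = -sum p*log(p)              : how uncertain the model is.
    Varentropy Var[-log p] = sum p*(log p+H)^2: how spread out the surprisal
    is across tokens (0 for any uniform distribution, large when the model
    hedges between a confident head and a long uncertain tail).
    """
    H = -sum(p * math.log(p) for p in probs if p > 0)
    varH = sum(p * (math.log(p) + H) ** 2 for p in probs if p > 0)
    return H, varH
```

Logged per token position and averaged over a batch, these give a cheap signal about model confidence that loss and grad_norm alone don't capture.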
- 10000% agree with Omar, this is totally disproportionate
- We’re looking for an intern to join our SmolLM team! If you’re excited about training LLMs and building high-quality datasets, we’d love to hear from you. 🤗 US: apply.workable.com/huggingface/... EMEA: apply.workable.com/huggingface/...
- Reposted by Elie: On the Xet team at @huggingface.bsky.social we're always looking for ways to move bytes to computers near you as fast as possible. To do this, we're redesigning the upload and download infrastructure on the Hub. This post describes how, check the thread for details 🧵 huggingface.co/blog/rearchi...
- The SmolLM series has a new member: say hi to SmolVLM! 🤏 It uses a preliminary 16k context version of SmolLM2 to tackle long-context vision documents and higher-res images. And yes, we’re cooking up versions with bigger context lengths. 👨‍🍳 Try it yourself here: huggingface.co/spaces/Huggi...
- Reposted by Elie: Small yet mighty! 💫 We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, immensely memory efficient 🤠 We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base huggingface.co/collections/...
- Reposted by Elie: Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput. SmolVLM can be fine-tuned in a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
- Hey babe, wake up, we just dropped a new SmolLM 🫡 Fully open-source. We’ll release a blog post soon detailing how we trained it. I'm also super excited about all the demos that will come in the next few days, and I'm especially looking forward to people testing it with entropix 🐸