Andi
Multimodal research @huggingface
- New Blog📖✨: nanoVLM: The simplest way to train your own Vision-Language Model in pure PyTorch explained step-by-step! Easy to read, even easier to use. Train your first VLM today!
- Train your Vision-Language Model in just two commands:
> git clone github.com/huggingface/...
> python train.py
- Read the blog: huggingface.co/blog/nanovlm
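To give a feel for what the blog walks through, here is a minimal sketch of the standard VLM pattern (vision encoder → modality projector → language model) in plain PyTorch. All module sizes and names are illustrative assumptions, not nanoVLM's actual code:

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """Illustrative VLM skeleton: vision encoder -> projector -> language model.
    All sizes are made up for the sketch; nanoVLM's real modules differ."""

    def __init__(self, vision_dim=768, lm_dim=576, vocab_size=49152):
        super().__init__()
        # Stand-in for a SigLIP-style ViT over image patches
        self.vision_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=vision_dim, nhead=12, batch_first=True),
            num_layers=2,
        )
        # Modality projector: maps image tokens into the LM's embedding space
        self.projector = nn.Linear(vision_dim, lm_dim)
        self.embed = nn.Embedding(vocab_size, lm_dim)
        # Stand-in for a causal LM backbone (causal masking omitted for brevity)
        self.lm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=lm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(lm_dim, vocab_size)

    def forward(self, image_patches, input_ids):
        img_tokens = self.projector(self.vision_encoder(image_patches))
        txt_tokens = self.embed(input_ids)
        seq = torch.cat([img_tokens, txt_tokens], dim=1)  # image tokens first
        return self.lm_head(self.lm(seq))

model = TinyVLM()
logits = model(torch.randn(1, 196, 768), torch.randint(0, 49152, (1, 32)))
print(logits.shape)  # torch.Size([1, 228, 49152])
```

The real model swaps the stand-ins for pretrained components, but the wiring follows the same idea.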
- Real-time SmolVLM in a web browser with transformers.js. All running locally with no installs. Just open the website.
- Today, we share the tech report for SmolVLM: Redefining small and efficient multimodal models. 🔥 Explaining how to create a tiny 256M VLM that uses less than 1GB of RAM and outperforms our 80B model from 18 months ago! huggingface.co/papers/2504....
- Here are the coolest insights from our experiments: ✨ Longer context = Big wins: Increasing the context length from 2K to 16K gave our tiny VLMs a 60% performance boost!
- ✨ Smaller is smarter with SigLIP: Surprise! Smaller LLMs didn't benefit from the usual large SigLIP (400M). Instead, we use the 80M base SigLIP, which performs equally well at just 20% of the original size! (see the sketch below)
- If you’re into efficient multimodal models, you’ll love this one. Check out the paper: huggingface.co/papers/2504....
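To make the SigLIP insight concrete, here is a hedged sketch that loads a base-sized SigLIP vision tower with transformers and counts its parameters. The checkpoint name is an assumption; the report's "80M base SigLIP" may correspond to a different variant:

```python
from transformers import SiglipVisionModel

# Checkpoint name is an assumption; the paper's "80M base SigLIP"
# may refer to a different variant than the one loaded here.
vision = SiglipVisionModel.from_pretrained("google/siglip-base-patch16-512")

n_params = sum(p.numel() for p in vision.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # base-sized tower, vs ~400M for siglip-so400m
```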
- 🚀 We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR! 📄✨ Lightning fast: it processes a page in 0.35 sec on a consumer GPU using < 500MB VRAM ⚡ SOTA in document conversion, beating every competing model we tested (including models with 27× more params) 🤯 But how? 🧶⬇️
- How does SmolDocling beat models 27× bigger? SmolDocling transforms any document into structured metadata with DocTags, and it is SOTA in: ✅ Full-page conversion ✅ Layout identification ✅ OCR for equations, tables, charts, plots, and code
- What makes it unique? 📌 Handles everything a document has: tables, charts, code, equations, lists, and more 📌 Works beyond scientific papers—supports business docs, patents, and forms 📌 It runs with less than 1GB of RAM, so running at large batch sizes is super cheap!
- SmolDocling is available today 🏗️ 🔗 Model: huggingface.co/ds4sd/SmolDo... 📖 Paper: huggingface.co/papers/2503.... 🤗 Space: huggingface.co/spaces/ds4sd... Try it and let us know what you think! 💬
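For anyone who wants to try it from Python, here is a minimal inference sketch with transformers. The checkpoint id and the prompt wording are assumptions (the links above are truncated), so check the model card for the exact recipe:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Checkpoint id and prompt wording are assumptions; see the model card.
ckpt = "ds4sd/SmolDocling-256M-preview"
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForVision2Seq.from_pretrained(ckpt, torch_dtype=torch.bfloat16)

image = Image.open("page.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Convert this page to docling."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(out, skip_special_tokens=True)[0])  # DocTags markup
```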
- Extremely bullish on @CohereForAI's Aya Vision (8B & 32B) - new SOTA open-weight VLMs - 8B wins up to 81% of the time in its class, better than Gemini Flash - 32B beats Llama 3.2 90B! - Integrated on @hf.co from Day 0! Check out their blog! huggingface.co/blog/aya-vis...
- Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s 🔥 Inspired by our team's effort to open-source DeepSeek's R1, we are releasing the training and evaluation code on top of the weights 🫡 Now you can train any SmolVLM—or create your own custom VLMs!
- Launching the training for SmolVLM 256M is as simple as:
> ./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch.sh
Then we use wandb to track the losses. Check out the file for details!
- After training, you can run the evaluation on all of these tasks with:
> sbatch vision/experiments/evaluation/vloom/async_evals_tr_346/run_evals_0_shots_val_2048.slurm
- And that's why we didn't release this before. It's live research code: most of it gets rewritten fairly often, and some parts have been the same for years. It works, and it produces SOTA results at 256M and 80B sizes, but it's not production code. Go check it out: github.com/huggingface/...
- Introducing the smollest VLMs yet! 🤏 SmolVLM (256M & 500M) runs on <1GB GPU memory. Fine-tune it on your laptop and run it on your toaster. 🚀 Even the 256M model outperforms our Idefics 80B (Aug '23). How small can we go? 👀
- Smol but mighty: • 256M delivers 80% of the performance of our 2.2B model. • 500M hits 90%. Both beat our SOTA 80B model from 17 months ago! 🎉 Efficiency 🤝 Performance Explore the collection here: huggingface.co/collections/... Blog: huggingface.co/blog/smolervlm
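A quick way to sanity-check the <1GB claim yourself: load the model in bf16 and print its memory footprint. The checkpoint id here is an assumption, since the collection link above is truncated:

```python
import torch
from transformers import AutoModelForVision2Seq

# Checkpoint id is an assumption; see the collection linked above.
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-256M-Instruct", torch_dtype=torch.bfloat16
)
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")  # weights alone should land well under 1 GB
```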
- Our models are integrated into ColPali, delivering SOTA retrieval speeds with performance rivaling models 10x their size. 🏃‍♂️💨 SmolVLM makes it faster and cheaper to build searchable databases. Real-world impact, unlocked.
- Links :D Demo: huggingface.co/spaces/Huggi... Models: huggingface.co/collections/... Blog: huggingface.co/blog/smolervlm
- Reposted by Andi: We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥 How? By combining step-wise reward models with tree search algorithms :) We're open-sourcing the full recipe and sharing a detailed blog post 👇
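The selection logic behind this is simple to sketch: sample several candidate solutions, score each step with a process reward model (PRM), and keep the best one. The sketch below stubs out the generator and the PRM with hypothetical placeholders; it illustrates the idea, not the released recipe:

```python
import random

# Hypothetical stubs: a real setup calls Llama 3B for candidate solutions
# and a step-wise process reward model (PRM) for per-step scores.
def generate_candidates(problem: str, n: int) -> list[list[str]]:
    return [[f"step {j}" for j in range(3)] for _ in range(n)]

def score_steps(problem: str, steps: list[str]) -> list[float]:
    return [random.random() for _ in steps]

def best_of_n(problem: str, n: int = 8) -> list[str]:
    candidates = generate_candidates(problem, n)
    # Aggregate per-step PRM scores with min(): a solution is only as
    # strong as its weakest step (one common aggregation choice).
    scored = [(min(score_steps(problem, c)), c) for c in candidates]
    return max(scored, key=lambda x: x[0])[1]

print(best_of_n("Solve: 2x + 3 = 11"))
```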
- Do you want to try Llama 3.3? huggingface.co/chat/
- Reposted by Andi: Just been messing with SmolVLM - visual language model that is _Smol_. 2B params, does some amazing stuff, with little memory, quickly. I'll post a couple of examples below. Super cool stuff from @merve.bsky.social & @andimara.bsky.social!
- Reposted by Andi: 📬 Summarize and rewrite your text/emails faster, and offline! Check @andimara.bsky.social's Smol Tools for summarization and rewriting. It uses SmolLM2 to summarize text and make it more friendly or professional, all running locally thanks to llama.cpp github.com/huggingface/...
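Under the hood, this pattern is a few lines with llama-cpp-python. A minimal sketch, assuming a SmolLM2 GGUF repo id and quant filename (both unverified here):

```python
from llama_cpp import Llama

# Repo id and GGUF filename pattern are assumptions; any SmolLM2 GGUF should work.
llm = Llama.from_pretrained(
    repo_id="HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF",
    filename="*q4_k_m.gguf",
)

text = open("email.txt").read()
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this in two sentences:\n" + text}]
)
print(out["choices"][0]["message"]["content"])
```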
- Reposted by Andi: It only takes a single CLI command to kick off a Direct Preference Optimization fine-tuning run on SmolVLM huggingface.co/blog/smolvlm... You're welcome.
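The linked blog has the exact command; as a rough Python equivalent, here is a hedged sketch with TRL's DPOTrainer. The model id, dataset id, and preference-data format are all assumptions:

```python
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import DPOConfig, DPOTrainer

# Model and dataset ids are assumptions; the linked blog has the exact recipe.
ckpt = "HuggingFaceTB/SmolVLM-Instruct"
model = AutoModelForVision2Seq.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

# A preference dataset with chosen/rejected pairs (id is an assumption)
dataset = load_dataset("HuggingFaceH4/rlaif-v_formatted", split="train[:1%]")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="smolvlm-dpo", bf16=True, per_device_train_batch_size=1),
    train_dataset=dataset,
    processing_class=processor,
)
trainer.train()
```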
- Do you want to help us build SmolVLM2? We are hiring interns! Remote roles from EMEA or US. apply.workable.com/huggingface/...
- Reposted by Andi: Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput. SmolVLM can be fine-tuned on a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
- Reposted by Andi: The authors of ColPali trained a retrieval model based on SmolVLM 🤠 TL;DR: - ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks - ColSmolVLM is more memory efficient than ColQwen2 💗 Find the model here huggingface.co/vidore/colsm...
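Usage follows colpali-engine's late-interaction pattern: embed pages and queries separately, then score with MaxSim. A sketch under the assumption that ColSmolVLM loads via the ColIdefics3 classes and the checkpoint id shown:

```python
import torch
from PIL import Image
from colpali_engine.models import ColIdefics3, ColIdefics3Processor

# Class names and checkpoint id are assumptions based on colpali-engine's layout.
ckpt = "vidore/colsmolvlm-v0.1"
model = ColIdefics3.from_pretrained(ckpt, torch_dtype=torch.bfloat16).eval()
processor = ColIdefics3Processor.from_pretrained(ckpt)

pages = processor.process_images([Image.open("page.png")])
queries = processor.process_queries(["total revenue in 2023"])

with torch.no_grad():
    page_emb = model(**pages)
    query_emb = model(**queries)

# Late-interaction (MaxSim) scoring between query tokens and page patches
print(processor.score_multi_vector(query_emb, page_emb))
```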
- Reposted by Andi: It's pretty sad to see the negative sentiment towards Hugging Face on this platform due to a dataset posted by one of the employees. I want to write a small piece. 🧵 Hugging Face empowers everyone to use AI to create value and is against the monopolization of AI; it's a hosting platform above all.