Andi
Multimodal research @huggingface
- New Blog📖✨: nanoVLM: The simplest way to train your own Vision-Language Model in pure PyTorch explained step-by-step! Easy to read, even easier to use. Train your first VLM today!
- Train your Vision-Language Model in just two commands:
> git clone github.com/huggingface/...
> python train.py
- Read the blog: huggingface.co/blog/nanovlm
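To give a feel for what the blog walks through, here is a minimal sketch of the standard VLM pattern (vision encoder → modality projector → language model) in plain PyTorch. All module sizes and names are illustrative assumptions, not nanoVLM's actual code:

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """Illustrative VLM skeleton: vision encoder -> projector -> language model.
    All sizes are made up for the sketch; nanoVLM's real modules differ."""

    def __init__(self, vision_dim=768, lm_dim=576, vocab_size=49152):
        super().__init__()
        # Stand-in for a SigLIP-style ViT over image patches
        self.vision_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=vision_dim, nhead=12, batch_first=True),
            num_layers=2,
        )
        # Modality projector: maps image tokens into the LM's embedding space
        self.projector = nn.Linear(vision_dim, lm_dim)
        self.embed = nn.Embedding(vocab_size, lm_dim)
        # Stand-in for a causal LM backbone (causal masking omitted for brevity)
        self.lm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=lm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(lm_dim, vocab_size)

    def forward(self, image_patches, input_ids):
        img_tokens = self.projector(self.vision_encoder(image_patches))
        txt_tokens = self.embed(input_ids)
        seq = torch.cat([img_tokens, txt_tokens], dim=1)  # image tokens first
        return self.lm_head(self.lm(seq))

model = TinyVLM()
logits = model(torch.randn(1, 196, 768), torch.randint(0, 49152, (1, 32)))
print(logits.shape)  # torch.Size([1, 228, 49152])
```

The real model swaps the stand-ins for pretrained components, but the wiring follows the same idea.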
- Real-time SmolVLM in a web browser with transformers.js. All running locally with no installs. Just open the website.
- Today, we share the tech report for SmolVLM: Redefining small and efficient multimodal models. 🔥 Explaining how to create a tiny 256M VLM that uses less than 1GB of RAM and outperforms our 80B model from 18 months ago! huggingface.co/papers/2504....
- Here are the coolest insights from our experiments: ✨ Longer context = Big wins: Increasing the context length from 2K to 16K gave our tiny VLMs a 60% performance boost!
- ✨ Smaller is smarter with SigLIP: Surprise! Smaller LLMs didn't benefit from the usual large SigLIP (400M). Instead, we use the 80M base SigLIP, which performs equally well at just 20% of the original size! (see the sketch below)
- If you’re into efficient multimodal models, you’ll love this one. Check out the paper: huggingface.co/papers/2504....
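To make the SigLIP insight concrete, here is a hedged sketch that loads a base-sized SigLIP vision tower with transformers and counts its parameters. The checkpoint name is an assumption; the report's "80M base SigLIP" may correspond to a different variant:

```python
from transformers import SiglipVisionModel

# Checkpoint name is an assumption; the paper's "80M base SigLIP"
# may refer to a different variant than the one loaded here.
vision = SiglipVisionModel.from_pretrained("google/siglip-base-patch16-512")

n_params = sum(p.numel() for p in vision.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # base-sized tower, vs ~400M for siglip-so400m
```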
- 🚀 We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR! 📄✨ Lightning fast: it processes a page in 0.35 sec on a consumer GPU using < 500MB VRAM ⚡ SOTA in document conversion, beating every competing model we tested (including models with 27× more params) 🤯 But how? 🧶⬇️
- How does SmolDocling beat models 27× bigger? SmolDocling transforms any document into structured metadata with DocTags, and it is SOTA in: ✅ Full-page conversion ✅ Layout identification ✅ OCR for equations, tables, charts, plots, and code
- What makes it unique? 📌 Handles everything a document has: tables, charts, code, equations, lists, and more 📌 Works beyond scientific papers—supports business docs, patents, and forms 📌 It runs with less than 1GB of RAM, so running at large batch sizes is super cheap!
- SmolDocling is available today 🏗️ 🔗 Model: huggingface.co/ds4sd/SmolDo... 📖 Paper: huggingface.co/papers/2503.... 🤗 Space: huggingface.co/spaces/ds4sd... Try it and let us know what you think! 💬
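For anyone who wants to try it from Python, here is a minimal inference sketch with transformers. The checkpoint id and the prompt wording are assumptions (the links above are truncated), so check the model card for the exact recipe:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Checkpoint id and prompt wording are assumptions; see the model card.
ckpt = "ds4sd/SmolDocling-256M-preview"
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForVision2Seq.from_pretrained(ckpt, torch_dtype=torch.bfloat16)

image = Image.open("page.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Convert this page to docling."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(out, skip_special_tokens=True)[0])  # DocTags markup
```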
- Extremely bullish on @CohereForAI's Aya Vision (8B & 32B) - new SOTA open-weight VLMs - 8B wins up to 81% of the time in its class, better than Gemini Flash - 32B beats Llama 3.2 90B! - Integrated on @hf.co from Day 0! Check out their blog! huggingface.co/blog/aya-vis...
- Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s 🔥 Inspired by our team's effort to open-source DeepSeek's R1, we are releasing the training and evaluation code on top of the weights 🫡 Now you can train any SmolVLM—or create your own custom VLMs!
- Launching the training for SmolVLM 256M is as simple as:
> ./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch.sh
Then we use wandb to track the losses. Check out the file for details!
- After training, you can run the evaluation on all of these tasks with:
> sbatch vision/experiments/evaluation/vloom/async_evals_tr_346/run_evals_0_shots_val_2048.slurm
- And that's why we didn't release this before. It's live research code: most of it gets rewritten fairly often, and some parts have been the same for years. It works, and it produces SOTA results at 256M and 80B sizes, but it's not production code. Go check it out: github.com/huggingface/...
- Introducing the smollest VLMs yet! 🤏 SmolVLM (256M & 500M) runs on <1GB GPU memory. Fine-tune it on your laptop and run it on your toaster. 🚀 Even the 256M model outperforms our Idefics 80B (Aug '23). How small can we go? 👀
- Smol but mighty: • 256M delivers 80% of the performance of our 2.2B model. • 500M hits 90%. Both beat our SOTA 80B model from 17 months ago! 🎉 Efficiency 🤝 Performance Explore the collection here: huggingface.co/collections/... Blog: huggingface.co/blog/smolervlm
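A quick way to sanity-check the <1GB claim yourself: load the model in bf16 and print its memory footprint. The checkpoint id here is an assumption, since the collection link above is truncated:

```python
import torch
from transformers import AutoModelForVision2Seq

# Checkpoint id is an assumption; see the collection linked above.
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-256M-Instruct", torch_dtype=torch.bfloat16
)
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")  # weights alone should land well under 1 GB
```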
- Our models are integrated into ColPali, delivering SOTA retrieval speeds with performance rivaling models 10x their size. 🏃‍♂️💨 SmolVLM makes it faster and cheaper to build searchable databases. Real-world impact, unlocked.
- Links :D Demo: huggingface.co/spaces/Huggi... Models: huggingface.co/collections/... Blog: huggingface.co/blog/smolervlm
- Reposted by Andi: We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥 How? By combining step-wise reward models with tree search algorithms :) We're open-sourcing the full recipe and sharing a detailed blog post 👇
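The selection logic behind this is simple to sketch: sample several candidate solutions, score each step with a process reward model (PRM), and keep the best one. The sketch below stubs out the generator and the PRM with hypothetical placeholders; it illustrates the idea, not the released recipe:

```python
import random

# Hypothetical stubs: a real setup calls Llama 3B for candidate solutions
# and a step-wise process reward model (PRM) for per-step scores.
def generate_candidates(problem: str, n: int) -> list[list[str]]:
    return [[f"step {j}" for j in range(3)] for _ in range(n)]

def score_steps(problem: str, steps: list[str]) -> list[float]:
    return [random.random() for _ in steps]

def best_of_n(problem: str, n: int = 8) -> list[str]:
    candidates = generate_candidates(problem, n)
    # Aggregate per-step PRM scores with min(): a solution is only as
    # strong as its weakest step (one common aggregation choice).
    scored = [(min(score_steps(problem, c)), c) for c in candidates]
    return max(scored, key=lambda x: x[0])[1]

print(best_of_n("Solve: 2x + 3 = 11"))
```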
- Do you want to try Llama 3.3? huggingface.co/chat/
- Reposted by Andi: Just been messing with SmolVLM - visual language model that is _Smol_. 2B params, does some amazing stuff, with little memory, quickly. I'll post a couple of examples below. Super cool stuff from @merve.bsky.social & @andimara.bsky.social!
- Reposted by Andi: 📬 Summarize and rewrite your text/emails faster, and offline! Check @andimara.bsky.social's Smol Tools for summarization and rewriting. It uses SmolLM2 to summarize text and make it more friendly or professional, all running locally thanks to llama.cpp github.com/huggingface/...
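Under the hood, this pattern is a few lines with llama-cpp-python. A minimal sketch, assuming a SmolLM2 GGUF repo id and quant filename (both unverified here):

```python
from llama_cpp import Llama

# Repo id and GGUF filename pattern are assumptions; any SmolLM2 GGUF should work.
llm = Llama.from_pretrained(
    repo_id="HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF",
    filename="*q4_k_m.gguf",
)

text = open("email.txt").read()
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this in two sentences:\n" + text}]
)
print(out["choices"][0]["message"]["content"])
```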
- Reposted by Andi: It only takes a single CLI command to kick off a Direct Preference Optimization fine-tuning run on SmolVLM huggingface.co/blog/smolvlm... You're welcome.
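The linked blog has the exact command; as a rough Python equivalent, here is a hedged sketch with TRL's DPOTrainer. The model id, dataset id, and preference-data format are all assumptions:

```python
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import DPOConfig, DPOTrainer

# Model and dataset ids are assumptions; the linked blog has the exact recipe.
ckpt = "HuggingFaceTB/SmolVLM-Instruct"
model = AutoModelForVision2Seq.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

# A preference dataset with chosen/rejected pairs (id is an assumption)
dataset = load_dataset("HuggingFaceH4/rlaif-v_formatted", split="train[:1%]")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="smolvlm-dpo", bf16=True, per_device_train_batch_size=1),
    train_dataset=dataset,
    processing_class=processor,
)
trainer.train()
```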
- Do you want to help us build SmolVLM2? We are hiring interns! Remote roles from EMEA or US. apply.workable.com/huggingface/...
- Reposted by Andi: Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput. SmolVLM can be fine-tuned on a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
- Reposted by Andi: The authors of ColPali trained a retrieval model based on SmolVLM 🤠 TL;DR: - ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks - ColSmolVLM is more memory efficient than ColQwen2 💗 Find the model here huggingface.co/vidore/colsm...
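Usage follows colpali-engine's late-interaction pattern: embed pages and queries separately, then score with MaxSim. A sketch under the assumption that ColSmolVLM loads via the ColIdefics3 classes and the checkpoint id shown:

```python
import torch
from PIL import Image
from colpali_engine.models import ColIdefics3, ColIdefics3Processor

# Class names and checkpoint id are assumptions based on colpali-engine's layout.
ckpt = "vidore/colsmolvlm-v0.1"
model = ColIdefics3.from_pretrained(ckpt, torch_dtype=torch.bfloat16).eval()
processor = ColIdefics3Processor.from_pretrained(ckpt)

pages = processor.process_images([Image.open("page.png")])
queries = processor.process_queries(["total revenue in 2023"])

with torch.no_grad():
    page_emb = model(**pages)
    query_emb = model(**queries)

# Late-interaction (MaxSim) scoring between query tokens and page patches
print(processor.score_multi_vector(query_emb, page_emb))
```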
- Reposted by Andi: It's pretty sad to see the negative sentiment towards Hugging Face on this platform due to a dataset posted by one of the employees. I want to write a small piece. 🧵 Hugging Face empowers everyone to use AI to create value and is against the monopolization of AI; it's a hosting platform above all.