🚀 We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR! 📄✨
Lightning fast, process a page in 0.35 sec on consumer GPU using < 500MB VRAM ⚡
SOTA in document conversion, beating every competing model we tested (including models 27x more params) 🤯
But how? 🧶⬇️
How does SmolDocling beat models 27× bigger? SmolDocling transforms any document into structured metadata with DocTags, being SOTA in:
✅ Full-page conversion
✅ Layout identification
✅ Equations, tables, charts, plots, code OCR
What makes it unique?
📌 Handles everything a document has: tables, charts, code, equations, lists, and more
📌 Works beyond scientific papers—supports business docs, patents, and forms
📌 It runs with less than 1GB of RAM, so running at large batch sizes is super cheap!
Mar 17, 2025 15:53At only 256M parameters, SmolDocling outperforms much larger models on key document conversion tasks:
🖋️ Full-page transcription: Beats models 27× bigger!
📑 Equations: Matches or beats leading models like GOT
💻 Code recognition: We introduce the first benchmark for code OCR
SmolDocling is available today 🏗️
🔗 Model:
huggingface.co/ds4sd/SmolDo...
📖 Paper:
huggingface.co/papers/2503....
🤗 Space:
huggingface.co/spaces/ds4sd...
Try it and let us know what you think! 💬

ds4sd/SmolDocling-256M-preview · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.