🚀 We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR! 📄✨
Lightning fast, process a page in 0.35 sec on consumer GPU using < 500MB VRAM ⚡
SOTA in document conversion, beating every competing model we tested (including models 27x more params) 🤯
But how? 🧶⬇️
Mar 17, 2025 15:53How does SmolDocling beat models 27× bigger? SmolDocling transforms any document into structured metadata with DocTags, being SOTA in:
✅ Full-page conversion
✅ Layout identification
✅ Equations, tables, charts, plots, code OCR
What makes it unique?
📌 Handles everything a document has: tables, charts, code, equations, lists, and more
📌 Works beyond scientific papers—supports business docs, patents, and forms
📌 It runs with less than 1GB of RAM, so running at large batch sizes is super cheap!
At only 256M parameters, SmolDocling outperforms much larger models on key document conversion tasks:
🖋️ Full-page transcription: Beats models 27× bigger!
📑 Equations: Matches or beats leading models like GOT
💻 Code recognition: We introduce the first benchmark for code OCR