Xuan Son Nguyen
Software Engineer @ Hugging Face 🤗
- Very nice touch, Gmail 😅
- Part 2 of my journey building a smart home! 🚀 In this part: > ESPHome & custom component > RF433 receiver & transmitter > Hassio custom addon
- Link to article: blog.ngxson.com/building-my-...
- Just published a new article on my blog 🏃‍♂️ "Building My Smart Home - Part 1: Plan, Idea & Home Assistant". Check it out!
- Link to article: blog.ngxson.com/building-my-...
- Kudos to Google and the llama.cpp team! 🤝 GGUF support for Gemma 3 270M right from day-0
- Link here: huggingface.co/collections/...
- Reachy Mini and SmolLM3 are featured in GitHub's weekly news! 🚀 🚀
- Watch it here: www.youtube.com/watch?v=Qtzz...
- Gemma 3n has arrived in llama.cpp 👨🍳 🍰 Comes in 2 flavors: E2B and E4B (E means "effective/active parameters")
- See you this Sunday at AI Plumbers conference: 2nd edition! 📍 Where: GLS Event Campus Berlin, Kastanienallee 82 | 10435 Berlin 👉 Register here: lu.ma/vqx423ct
- ✨✨ AIFoundry is bringing you the AI Plumbers Conference: 2nd edition — an open source meetup for low-level AI builders to dive deep into "the plumbing" of modern AI 📍 Where: GLS Event Campus Berlin, Kastanienallee 82 | 10435 Berlin 📅 When: June 15, 2025 👉 Register now: lu.ma/vqx423ct
- Hugging Face Inference Endpoints now officially support deploying **vision** models via llama.cpp 👀 👀 Try it now: endpoints.huggingface.co/catalog
- Real-time webcam demo with @huggingface.bsky.social SmolVLM and llama.cpp server. All running locally on a MacBook M3
- Check it out: github.com/ngxson/smolv...
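Under the hood, the loop is simple: grab a webcam frame, base64-encode it, and POST it to the llama.cpp server's OpenAI-compatible endpoint. A minimal sketch of that loop, assuming llama-server is running locally with a SmolVLM GGUF and its --mmproj file (my own simplification, not the repo's exact code):

```typescript
// Sketch of the demo's core loop (assumption: llama-server is running on
// localhost:8080 with a SmolVLM model + --mmproj, so its OpenAI-compatible
// /v1/chat/completions endpoint accepts images).
async function describeFrame(video: HTMLVideoElement): Promise<string> {
  // Draw the current webcam frame onto a canvas, encode as a base64 data URL
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  const frame = canvas.toDataURL("image/jpeg");

  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      max_tokens: 100,
      messages: [{
        role: "user",
        content: [
          { type: "text", text: "Describe what you see in one sentence." },
          { type: "image_url", image_url: { url: frame } },
        ],
      }],
    }),
  });
  const json = await res.json();
  return json.choices[0].message.content;
}
```

Call describeFrame on a timer and you get a simple real-time captioner, with latency bounded mostly by the model's prefill speed.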
- We may have the A100, H200, M3 Ultra, etc., but they still can't match the power of that Casio FX 😆
- llama.cpp vision support just got much better! 🚀 Traditionally, models with complicated chat templates like MiniCPM-V or Gemma 3 required a dedicated binary to run. Now you can use all supported models via a single binary, "llama-mtmd-cli" 🔥 (Only Qwen2VL is not yet supported)
- Finally have time to write a blog post about ggml-easy! 😂 ggml-easy is a header-only wrapper for GGML that simplifies development with a cleaner API, easy debugging utilities, and native safetensors loading ✨ Great for rapid prototyping!
- Learn more: blog.ngxson.com/introducing-...
- Someone at Google definitely had a lot of fun making this 😆 And if you didn't know, it's available in the "Starter apps" section on AI Studio. The app is called "Gemini 95"
- Estimating an LLM's memory requirement WITHOUT a calculator? Just use your good old human brain 🧠 😎 Check out my 3-step estimation 🚀
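The full 3-step method is in the post itself, but for flavor, here's one common back-of-envelope approach (my own sketch with assumed constants, not necessarily the article's exact steps): weights plus KV cache, plus a bit of overhead.

```typescript
// Back-of-envelope memory estimate: weights + KV cache (a rough sketch;
// the constants are assumptions, not the article's exact method).
function estimateMemoryGB(opts: {
  paramsB: number;       // parameters, in billions
  bitsPerWeight: number; // ~16 for FP16, ~4.5 for Q4_K_M
  nLayers: number;       // transformer layers
  nKvHeads: number;      // KV heads (fewer than attention heads with GQA)
  headDim: number;       // dimension per head
  ctxLen: number;        // context length you plan to use
}): number {
  const { paramsB, bitsPerWeight, nLayers, nKvHeads, headDim, ctxLen } = opts;
  const weightBytes = (paramsB * 1e9 * bitsPerWeight) / 8;
  // K and V, one pair per layer, FP16 (2 bytes) per element:
  const kvBytes = 2 * nLayers * nKvHeads * headDim * ctxLen * 2;
  const overhead = 1.1; // ~10% for compute buffers etc. (rough guess)
  return ((weightBytes + kvBytes) * overhead) / 1e9;
}

// e.g. a typical 7B model at Q4 with 4k context:
// estimateMemoryGB({ paramsB: 7, bitsPerWeight: 4.5, nLayers: 32,
//                    nKvHeads: 8, headDim: 128, ctxLen: 4096 }) ≈ 5 GB
```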
- Google has quite a good sense of humor 😂 Jokes aside, a 1B model quantized to Q4 without performance degradation is sweet 🤏
- Cooking a fun thing today: I can now load safetensors files directly into GGML without having to convert them to GGUF! Why? Because this allows me to experiment faster, especially with models outside of llama.cpp 😆
- Where to try? ggml-easy --> github.com/ngxson/ggml-...
- No vibe coding. Just code it ✅ Visit my website --> ngxson.com
- On Monday, the 24th, I'm proud to be giving a talk at sota's webinar. The main talk will last an hour, deep-diving into the current state of on-device LLMs and exploring their advantages, trade-offs, and limitations. The session will end with a Q&A, where you can ask me anything about the subject.
- 📅 The Live Webinar will happen at 🕔 11 AM SF — 2 PM NYC — 6 PM London — 19h00 Paris 👉👉👉 Register here: app.getcontrast.io/register/sot... 👈👈👈
- Had a fantastic chat today with Georgi Gerganov, the brilliant mind behind ggml, llama.cpp, and whisper.cpp! We discussed: 🚀 The integration of vision models into llama.cpp 🚀 The challenges of maintaining a smooth UX/DX 🚀 The exciting future of llama.cpp Big things ahead - stay tuned!
- OK now you are the best, Gememe 2.0
- Wanna try Gemma 3 vision with llama.cpp? There is a playground for that! More in 🧵
- Follow the guide here: github.com/ggml-org/lla...
- Day-zero Gemma 3 support in llama.cpp 🤯 👉 4 model sizes: 1B, 4B, 12B, 27B 👉 Vision capability (except for 1B) with bidirectional attention 👉 Context size: 32k (1B) and 128k (4B, 12B, 27B) 👉 140+ languages supported (except for 1B) 👉 Day-zero support on many frameworks 🚀
- Huge thanks to Hugging Face and Google for supporting me with the llama.cpp implementation ❤️ More info: huggingface.co/blog/gemma3
- Aya Vision is now the number one trending OCR model on Hugging Face 🚀 👉 Comes in 2 sizes, 8B and 32B 👉 Supports 23 languages 👉 Day-zero support with HF Transformers
- Try via this space: huggingface.co/spaces/prith...
- Did you know? A number of 🤗 Hugging Face blog posts now feature AI-generated podcasts 🎙️ A nice alternative way to digest long, in-depth articles 🔍
- Qwen/QwQ-32B has just arrived on Hugging Chat! Try it now: huggingface.co/chat/models/...
- CogView-4 is out 🔥🚀 The SoTA OPEN text-to-image model by ZhipuAI Demo: huggingface.co/spaces/THUDM... ✨ 6B params with Apache 2.0 license ✨ Supports Chinese & English prompts of ANY length ✨ Generates Chinese characters within images ✨ Creates images at any resolution within a given range
- Wondering how much RAM is needed to run a given GGUF? Try: npx @huggingface/gguf [model].gguf This also works with remote files, for example: npx @huggingface/gguf https://huggingface.co/bartowski/Qwen_QwQ-32B-GGUF/resolve/main/Qwen_QwQ-32B-Q4_K_M.gguf
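The same package also works as a library, so you can script the check yourself. A small sketch (the gguf() parser is from @huggingface/gguf; the parameter-count math is my own addition, not part of the package's output):

```typescript
import { gguf } from "@huggingface/gguf"; // npm i @huggingface/gguf

// Parses only the GGUF header via HTTP range requests: no full download.
const url =
  "https://huggingface.co/bartowski/Qwen_QwQ-32B-GGUF/resolve/main/Qwen_QwQ-32B-Q4_K_M.gguf";
const { metadata, tensorInfos } = await gguf(url);

console.log(metadata["general.architecture"]); // e.g. "qwen2"

// Rough parameter count from the tensor shapes (shapes come back as bigints):
const totalParams = tensorInfos.reduce(
  (sum, t) => sum + t.shape.reduce((acc, dim) => acc * Number(dim), 1),
  0,
);
console.log(`~${(totalParams / 1e9).toFixed(1)}B parameters`);
```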
- Apple unveils the M3 Ultra chip, supporting up to 512GB of unified money, oops sorry, unified memory. Perfect for pro workflows and AI development 👀 Read more: www.apple.com/newsroom/202...
- Baby wake up! The Hugging Face Reasoning Course is out 🚀 Huge thanks to Maxime Labonne for building the first practical example in the reasoning course. Link: huggingface.co/reasoning-co...
- DiffRhythm's Revolutionary Music Generation 🎵🎵 🚀 Lightning-Fast Production: Create full-length songs with vocals in under 10 seconds! ⚡ Non-Autoregressive Structure: built on top of a variational autoencoder (VAE) 🤏 Small: VAE + base model combined are < 2.5GB 🌍 Open-source model code + weights
- With the new 🐸 JFrog model scanner on the 🤗 Hugging Face Hub, we're making running AI models even more secure for everyone!
- EgoLife: An AI-Powered Egocentric Life Assistant Key details: > Open-source dataset: 300+ hrs of egocentric, multimodal data > 3K long-context QAs for daily insights > Open-source models: EgoGPT & EgoRAG for smart recall Turning real-life moments into personalized AI help!
- More details: huggingface.co/collections/...
- Disassembling Phi-4-multimodal-instruct: > Vision/Audio encoders: 440M/460M respectively > Projector: 2-layer MLP for both modalities > Language model: Phi-4-mini, 3.3B parameters > LoRA adapters for Vision/Audio, applied on top of Phi-4-mini
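To make the wiring concrete, here's a schematic sketch of how those pieces compose (all names and types are illustrative stand-ins of mine, not Microsoft's actual code):

```typescript
// Schematic of the Phi-4-multimodal composition described above.
// Every type and function here is an illustrative stand-in, not a real API.
type Tensor = Float32Array;

interface Encoder { encode(input: Uint8Array): Tensor }    // vision or audio encoder
interface Projector { project(features: Tensor): Tensor }  // 2-layer MLP per modality
interface LanguageModel { generate(embeds: Tensor, prompt: string): string }

function answer(
  encoder: Encoder,          // ~440M (vision) / ~460M (audio)
  projector: Projector,      // maps encoder features into the LM's embedding space
  lmWithLora: LanguageModel, // Phi-4-mini + the modality-specific LoRA adapter
  input: Uint8Array,
  prompt: string,
): string {
  const features = encoder.encode(input);     // raw image/audio -> features
  const embeds = projector.project(features); // features -> LM embedding space
  return lmWithLora.generate(embeds, prompt); // frozen LM + LoRA does the rest
}
```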
- This weekend, I found a very fun project from the community: TinyLM 🤏 Built around transformers.js 🚀, this project aims to give developers a straightforward API to work with LLMs. And most importantly, it can run inference in-browser using WebGPU or WASM 🚀 No server needed! Quick sketch below 👇
- Try it here: tinylm.wizenheimer.dev
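TinyLM has its own API (see its docs), but since it builds on transformers.js, the underlying in-browser flow looks roughly like this (the model here is just an example of mine, not TinyLM's default):

```typescript
import { pipeline } from "@huggingface/transformers"; // transformers.js v3

// Everything below runs in the browser: weights are fetched once, cached,
// and executed on WebGPU (with WASM as the fallback backend).
const generator = await pipeline(
  "text-generation",
  "HuggingFaceTB/SmolLM2-135M-Instruct", // example model, not TinyLM's default
  { device: "webgpu" },
);

const out: any = await generator(
  [{ role: "user", content: "Explain WebGPU in one sentence." }],
  { max_new_tokens: 64 },
);
// For chat-style input, generated_text is the conversation incl. the reply:
console.log(out[0].generated_text.at(-1).content);
```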