Caleb Fahlgren
SWE @hf.co
- You can just ask things 🗣️ "show me messages in the coding category that are in the top 10% of reward model scores". Download really high-quality instructions from the Argilla Llama3.1 405B synthetic dataset 🔥
- The amazing, new Qwen2.5-Coder 32B model can now write SQL for any @hf.co dataset ✨
- It doesn't get easier than this. Why are you writing SQL by yourself when it's almost 2025?
- This is insane! Structured generation in the browser with the new @hf.co SmolLM2-1.7B model • Tiny 1.7B LLM running at 88 tokens / second ⚡ • Powered by MLC/WebLLM on WebGPU 🔥 • JSON Structured Generation entirely in the browser 🤏
- Here's the space by @reach-vb.hf.co huggingface.co/spaces/reach...
- The OpenLLM Leaderboard just passed 2k evals 🥳 Here's a look at the distribution of average scores for all those models! Great work by the @huggingface.bsky.social team to do these evals!
- Here's what the model licenses look like: Lots of great open licenses in there too! 💪
- You can literally do the histogram in one line in less than 10 seconds 💨 > from histogram(train, "Average ⬆️")
- observers 🔭 - automatically log all OpenAI compatible requests to a dataset 💽 • supports any OpenAI compatible endpoint 💪 • supports @duckdb.org, @huggingface.bsky.social datasets and Argilla as stores > pip install observers
- ** log and get out of the way **
- Automatically tracking all Ollama requests to a dataset with the new observers python library! With just a few lines of code all your requests can be sent to @huggingface.bsky.social datasets for annotating, analysis and observability 🔭
- SmolTalk is out 🗣️ Over 1M high quality instructions used for training SmolLM2, one of the best small language models in the industry. huggingface.co/datasets/Hug...
- Reposted by Caleb Fahlgren: Observers: A Lightweight SDK for AI Observability TLDR; - Track and record interactions with AI models - Store observations in multiple backends @huggingface.bsky.social, @duckdb.org or Argilla - Query and analyse your AI interactions with ease GitHub: github.com/cfahlgren1/o...
- Reposted by Caleb Fahlgren: Foursquare just open sourced their 100 million place point of interest dataset! Some notes on poking around with it using DuckDB (it's Parquet files on S3) simonwillison.net/2024/Nov/20/...
- Range requests + Parquet are what make it possible for the Hugging Face SQL Console to query datasets entirely in the browser
- Reposted by Caleb Fahlgren: duckdb-gsheets v0.0.3 is out, courtesy of @a13x.bsky.social. The power is terrifying! duckdb-gsheets.com
- Reposted by Caleb Fahlgren: When XetHub joined Hugging Face, we brainstormed how to share our tech with the community. The magic? Versioning chunks, not files, giving rise to: 🧠 Smarter storage ⏩ Faster uploads 🚀 Efficient downloads Curious? Read the blog and let us know how it could help your workflows!
- Life would be so easy if @duckdb.org had an LLMs.txt 🤩 llmstxt.org
- Now they do! t.co/T1WhhBIAqS quick to implement it too
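
For anyone curious why the "Range requests + Parquet" post works: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so a reader can fetch just the last 8 bytes, learn where the metadata footer sits, and then request only that range. Below is a minimal sketch of that first step. It is not the SQL Console's actual code; `fetch_range` is a hypothetical stand-in for an HTTP `Range: bytes=start-end` request, and `make_fake_parquet` builds a toy blob whose "footer" is placeholder bytes rather than real Thrift metadata.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def make_fake_parquet(row_data: bytes, footer: bytes) -> bytes:
    """Build a toy blob with the Parquet trailer layout (assumption: the
    footer here is placeholder bytes, not real Thrift-encoded metadata)."""
    return (
        PARQUET_MAGIC
        + row_data
        + footer
        + struct.pack("<I", len(footer))  # 4-byte little-endian footer length
        + PARQUET_MAGIC
    )


def fetch_range(blob: bytes, start: int, end: int) -> bytes:
    """Hypothetical stand-in for an HTTP GET with 'Range: bytes=start-end'.
    Both ends are inclusive, as in the HTTP Range header."""
    return blob[start : end + 1]


def footer_byte_range(file_size: int, fetch) -> tuple[int, int]:
    """Return the inclusive (start, end) byte range of the metadata footer,
    reading only the last 8 bytes of the file."""
    tail = fetch(file_size - 8, file_size - 1)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 trailer; not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[:4])
    footer_start = file_size - 8 - footer_len
    return footer_start, file_size - 9


if __name__ == "__main__":
    blob = make_fake_parquet(b"x" * 1000, b"fake-footer-metadata")
    fetch = lambda start, end: fetch_range(blob, start, end)
    start, end = footer_byte_range(len(blob), fetch)
    print(f"footer lives at bytes {start}-{end} of {len(blob)}")
    print(fetch(start, end))
```

In a real reader the footer then lists the byte offsets of each row group and column chunk, so later requests can fetch only the columns a query touches instead of the whole file.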