Daniel Vila
Everything datasets and human feedback for AI at Hugging Face.
Prev: co-founder and CEO of Argilla (acquired by Hugging Face)
- Reposted by Daniel Vila: New chapter in the Hugging Face NLP course! 🤗 🚀 We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub. Any feedback for improvements welcome!
- Reposted by Daniel Vila: High-quality data for fine-tuning language models, for free and at the click of a button! Prompt and wait for your dataset to push to Argilla or the Hub. Evaluate, review, and fine-tune a model. Blog:
- 💥 Ending 2024: A full data annotation journey on the Hugging Face Hub, from raw data to training-ready datasets! With Argilla 2.6.0, push your data to the Hub from the UI. Let’s make 2025 the year anyone can build more transparent and accountable AI, with no coding or model skills needed.
- Release notes: github.com/argilla-io/a...
- Get started: docs.argilla.io/latest/getti...
- Reposted by Daniel Vila: 🔥 We got great feedback on this: "Synthetic Data Generator", a no-code tool to create datasets with LLMs. It lets ANYONE create datasets and models in minutes, without any code. Blog: buff.ly/4gybyoT GitHub: buff.ly/49IDSmd Space: buff.ly/3Y1S99z
- Reposted by Daniel Vila: Desperate to contribute to the development of Scots language AI. I've just contributed 16 examples to this dataset: data-is-better-together-fineweb-c.hf.space/share-your-p...
- I've just contributed 156 examples to the FineWeb 2 Spanish dataset: data-is-better-together-fineweb-c.hf.space/share-your-p... If you want to contribute, sign in with @hf.co and find your language
- Help shape the future of multilingual Open Source AI! Join the FineWeb 2 Community Annotation Sprint to create an open training dataset with full transparency and human validation in many languages. Review datasets in your language and help identify the best sources for training.
- Join this Space, search for your language, and start contributing: huggingface.co/spaces/data-... Don't know how to start, want to discuss? Join: huggingface.co/spaces/Huggi...
- Reposted by Daniel Vila: 👐 Open Image Preferences is an Apache 2.0 licensed dataset for text-to-image generation by the @hf.co community. This dataset contains 10K text-to-image preference pairs across image generation categories, using different model families and prompt complexities. Blog: huggingface.co/blog/image-p...
- Announcing Global-MMLU - an improved MMLU Open dataset with evaluation coverage across 42 languages. The result of months of work with the goal of advancing Multilingual LLM evaluation. Built together with the community and amazing collaborators at Cohere4AI, MILA, MIT, and many more.
- Open dataset: huggingface.co/datasets/Coh... Paper: arxiv.org/pdf/2412.03304
- We're about to launch the biggest collaboration effort since Open Assistant. Let's get the highest quality data for open foundation models with all the nuances & diversity of each language, all with data provenance and transparency. Join us as language lead: docs.google.com/forms/d/10XI...
- Reposted by Daniel Vila: Next week we're launching a collaborative annotation effort to build a big multilingual dataset, so you can have high-quality data in your language. We are really close to getting leads for 100 languages! Can you help us cover the remaining 200?
- Reposted by Daniel Vila: For anyone interested in fine-tuning or aligning LLMs, I’m running this free and open course called smol course. It’s not a big deal, it’s just smol. 🧵>>
- Reposted by Daniel Vila: [SATURDAY THREAD] ☕️ 🧑🎓 In case you spent the week reading GDPR legislation and missed everything. It’s all about vision language models and image preference datasets. >> 🧵 Here are the models and datasets you can use in your projects.
- Reposted by Daniel Vila: The community has labelled over 3000 image preferences in a few hours. One open source image preferences dataset coming right up!
- Reposted by Daniel Vila: At @huggingface.bsky.social 🤗 we're preparing a collaborative annotation effort to build an open-source multilingual dataset. If you'd like to get high-quality open data for your language, check if yours is listed in this form and sign up! forms.gle/DHJdtvoSNxAA...
- Super excited to launch the Open Images Preferences @huggingface.bsky.social community sprint. Have fun browsing images generated with the latest OSS models while contributing to the future of Open Source AI 🧵
- Find all the details in the blog post. You just need to sign in and start choosing the images you prefer. huggingface.co/blog/burtens...
- Let's make AI more inclusive. At @huggingface.bsky.social we'll launch a huge community sprint soon to build high-quality training datasets for many languages. We're looking for Language Leads to help with outreach. Find your language and nominate yourself: forms.gle/iAJVauUQ3FN8...
- Contributing to the task itself will be easy as well, with no programming skills required: just read short documents in the language and rate them according to their educational quality for training AI models.
- Interested in open datasets for ML and AI? I've just created this feed with posts about @huggingface.bsky.social datasets! Don't miss the latest news and conversations about the secret sauce behind every AI model. bsky.app/profile/dvil...
- This is super useful for NLP lovers like myself
- Reposted by Daniel Vila: TRL is a cornerstone of LLM post-training and imo it's the default to learn. There are great alternatives like Unsloth, Axolotl, and AutoTrain. But if you want a daily driver that takes you from experimentation to production, it's TRL. 🧵 These community notebooks guide you through TRL's core:
- I am very excited to launch a new community initiative next week. Let's build the largest open community dataset to evaluate and improve image generation models. Follow: huggingface.co/data-is-bett... And stay tuned here
- Reposted by Daniel Vila: In case you passed out and woke up at Saturday lunch. Small models and high quality data are back! ... if they ever left 🤔
  - SmolTalk dataset from @huggingface.bsky.social
  - Tulu 3 models and datasets from @ai2.bsky.social
  - Nvidia Hymba model from @nvidiastudio.bsky.social
- 📣 @huggingface.bsky.social important 🦋 updates: 🚀 You can now add your handle on your HF profile for others to find you on 🦋 huggingface.co/settings/pro... ❤️ We have just updated the list of Hugging Face Folks: bsky.app/starter-pack...
- Reposted by Daniel Vila: Don't forget to set your Bluesky account in your @huggingface.bsky.social profile! Instructions in 🧵