Pedro Ortiz Suarez
Principal Research Scientist at the Common Crawl Foundation. Weird coffee person ☕️, runner 🏃🏻‍♂️. (he/him) 🇫🇷🇪🇺🇨🇴
- Reposted by Pedro Ortiz Suarez: The Common Crawl team presented a seminar at Stanford HAI entitled “Preserving Humanity's Knowledge and Making it Accessible: Addressing Challenges of Public Web Data”. commoncrawl.org/blog/common-...
- Reposted by Pedro Ortiz Suarez: If you were able to join us, let us know about your experience: docs.google.com/forms/d/e/1F...
- Reposted by Pedro Ortiz Suarez: Thank you everyone for coming to WMDQS (pronounced "whim ducks")!
- Reposted by Pedro Ortiz Suarez: WMDQS is underway! Come join us in Room 520A at @colmweb.org! #COLM2025
- Reposted by Pedro Ortiz Suarez: Looking forward to tomorrow's #COLM2025 workshop on multilingual data quality! 🤩
- In collaboration with @commoncrawl.bsky.social, MLCommons, and @eleutherai.bsky.social, the first edition of WMDQS at @colmweb.org starts tomorrow in Room 520A! We have an updated schedule on our website, including a list of all accepted papers.
- If you want to help us improve language and cultural coverage and build an open source LangID system, please register for our shared task on Language Identification! 💬 Registering is easy! All the details are on the shared task webpage: wmdqs.org/shared-task/ Deadline: July 23, 2025 (AoE) ⏰
- The Common Crawl Foundation, MLCommons, EleutherAI, and Johns Hopkins University's Center for Language and Speech Processing have the pleasure of inviting you to register for the 1st shared task on Language Identification for web data. commoncrawl.org/blog/wmdqs-s...
- Reposted by Pedro Ortiz Suarez: In June 2025 the Common Crawl Foundation, MLCommons, and EleutherAI had the pleasure of hosting a virtual hackathon in partnership with Masakhane to collect language identification annotations for African languages. commoncrawl.org/blog/the-fir...
- Reposted by Pedro Ortiz Suarez: The Common Crawl Foundation team took part in the United Nations Open Source Week in New York City this June, meeting with global developers, researchers, and policymakers to discuss all things open source and AI. commoncrawl.org/blog/common-...
- Reposted by Pedro Ortiz Suarez: The deadline for paper submissions has been extended! The new deadline is July 3, 2025 (AoE). For more information, please visit: wmdqs.org
- Call for papers! We are organising the 1st Workshop on Multilingual Data Quality Signals with @mlcommons.org and @eleutherai.bsky.social, held in tandem with @colmweb.org. Submit your research on multilingual data quality! The submission deadline is 23 June. More info: wmdqs.org
- Reposted by Pedro Ortiz Suarez: The Common Crawl Foundation, together with IBM, the AI Alliance, and BrightQuery, will be hosting an "UN Conference" at IBM's new flagship NYC HQ at One Madison Avenue on Friday, June 20, from 12:30-5pm. If you are in NYC, it would be great to see you there! lu.ma/p0a1scde
- I’ll be running the Paris Marathon this Sunday for cancer research and treatment 🏃🏻‍♂️ Please donate if you can! Every donation, no matter how small, helps immensely. marathon-paris.dossards-solidaires.org/fundraisers/...
- Reposted by Pedro Ortiz Suarez: We would like to welcome all of our attending members to Oslo, with a special welcome to two of our newest members, the Publications Office of the European Union and @commoncrawl.bsky.social! @nettarkivet.bsky.social | #iipcGA25 | #webarchiving
- I’ll be at the AI Action Summit in Paris today. If you’re attending and want to discuss @commoncrawl.bsky.social or open data, please DM me!
- We're very happy to release cc-downloader, a new CLI tool to download Common Crawl data 📚🚀🧑‍💻 cc-downloader is still under active development, so if you find any issues or would like to submit a feature request, please visit its GitHub repository at github.com/commoncrawl/....