Olivier Grisel
Software engineer at probabl, scikit-learn contributor.
Also at:
sigmoid.social/@ogrisel
github.com/ogrisel
- Reposted by Olivier Grisel: ⚡ Release 0.6.2 is out ⚡ github.com/skrub-data/s...
- I will speak about probabilistic regressions; @skrub-data.bsky.social and skore contributors will also present their libraries. Come join us!
- A bunch of scikit-learn core contributors will attend or speak at @pydataparis.bsky.social 2025 on Tuesday and Wednesday next week. Ticketing, practical information, and schedule at: pydata.org/paris2025
- scikit-learn 1.8 will be the first scikit-learn release with native extensions that are officially marked as free-threading compatible. github.com/scikit-learn...
- Looking forward to attending PyData Paris 2025! I will give a talk about probabilistic predictions for regression problems (I need to start working on my slides ;)
- Today at #EuroScipy2025, @glemaitre58.bsky.social and I presented a tutorial on pitfalls of machine learning for imbalanced classification problems. We discussed what (not) to do when fitting a classifier and obtaining degenerate precision or recall values. probabl-ai.github.io/calibration-...
- Attending the @skrub-data.bsky.social tutorial by @riccardocappuzzo.com and @glemaitre58.bsky.social at #EuroScipy2025. They introduce the new DataOps feature released in skrub 0.6. Here is the repo with the material for the tutorial: github.com/skrub-data/E...
- Reposted by Olivier Grisel: 🚨 What is SOTA on tabular data, really? We are excited to announce 𝗧𝗮𝗯𝗔𝗿𝗲𝗻𝗮, a living benchmark for machine learning on IID tabular data with: 📊 an online leaderboard (submit!) 📑 carefully curated datasets 📈 strong tree-based, deep learning, and foundation models 🧵
- Reposted by Olivier Grisel: 👨‍🎓🧾✨ #icml2025 Paper: TabICL, A Tabular Foundation Model for In-Context Learning on Large Data. With Jingang Qu, @dholzmueller.bsky.social, and Marine Le Morvan. TL;DR: a well-designed architecture and pretraining give the best tabular learner, and a more scalable one. On top of that, it's 100% open source. 1/9
- Reposted by Olivier Grisel: Excited to have co-contributed the SquashingScaler, which implements the robust numerical preprocessing from RealMLP!
- Reposted by Olivier Grisel: I got 3rd out of 691 in a tabular Kaggle competition, with only neural networks! 🥉 My solution is short (48 LOC) and relatively general-purpose: I used skrub to preprocess string and date columns, and pytabkit to create an ensemble of RealMLP and TabM models. Link below 👇
- Reposted by Olivier Grisel: The week of September 29th, Paris will become the epicenter of #opensource scientific computing, with a great series of events. This rare alignment creates the perfect opportunity to visit and join a vibrant community of developers, maintainers, and users! Check this out (links in thread) ⬇️
- Reposted by Olivier Grisel: 🔥🔥🔥 CV Folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th before CVPR called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷 Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa... Big 🧵👇 with details!
- Reposted by Olivier Grisel: 🎓 Paper time! ✨ #ICLR spotlight. Concluding 5 years of research on missing values handling for prediction: Beware of diminishing returns in imputation for prediction. 1/8
- Reposted by Olivier Grisel: we released olmo 32b today! ☺️ 🐟 our largest & best fully open model to date 🐠 right up there w/ similar-size weights-only models from big companies on popular benchmarks 🐡 but we used way less compute & all our data, ckpts, code, recipe are free & open. made a nice plot of our post-trained results! ✌️
- Loky 3.5.0 is out! Loky provides an extended version of Python's `concurrent.futures.ProcessPoolExecutor` that leverages cloudpickle to work within interactive Jupyter sessions on all platforms and reuses existing workers to hide the overhead of starting new workers each time.
- I have the intuition that TabPFN approximates amortized Bayesian inference with a Solomonoff prior via in-context learning. Perplexity agrees :) www.perplexity.ai/search/is-it... I wonder if this theoretical "universality" is one of the reasons for its empirical success.
- Recently merged in scikit-learn's main branch: display the maximum predicted class probability in 2D continuous feature spaces (mostly for didactic purposes): scikit-learn.org/dev/auto_exa... The linked example has been updated to include some conclusions we can draw from this plot.
- Credits go to @lucyleeow.bsky.social who is now also on Bluesky!
- Reposted by Olivier Grisel: 📢 PSA: #NeurIPS2024 recordings are now publicly available! The workshops always have tons of interesting things on at once, so the FOMO is real 😵‍💫 Luckily it's all recorded, so I've been catching up on what I missed. Thread below with some personal highlights 🧵
- Reposted by Olivier Grisel: We've built a simulated driving agent that we trained on 1.6 billion km of driving with no human data. It is SOTA on every planning benchmark we tried. In self-play, it goes 20 years between collisions.
- Reposted by Olivier Grisel: It is hard to overstate how cool and powerful flex attention is. @chhillee.bsky.social pytorch.org/blog/flexatten… TL;DR: it is an implementation of the attention operator in PyTorch that makes it possible, in particular, to efficiently "carve" the attention matrix. 1/3
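
To make the "carving" idea in the last post concrete, here is a minimal pure-Python sketch of attention restricted by a mask rule. It is not the PyTorch flex attention API and says nothing about how PyTorch makes this efficient; `causal_mask` and `masked_attention` are hypothetical names chosen for illustration, and the dense loops only show which (query, key) pairs a mask rule keeps or drops.

```python
import math


def causal_mask(q_idx, kv_idx):
    """Hypothetical mask rule: a query may only attend to keys at or before it."""
    return kv_idx <= q_idx


def masked_attention(queries, keys, values, mask_mod):
    """Reference softmax attention that ignores (query, key) pairs rejected by mask_mod."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q_idx, q in enumerate(queries):
        # Keep only the key positions that the mask rule allows for this query.
        allowed = [kv for kv in range(len(keys)) if mask_mod(q_idx, kv)]
        if not allowed:
            raise ValueError(f"query {q_idx} has no key to attend to")
        scores = [scale * sum(a * b for a, b in zip(q, keys[kv])) for kv in allowed]
        # Numerically stable softmax over the allowed positions only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * values[kv][d] for w, kv in zip(weights, allowed)) / total
             for d in range(dim)]
        )
    return outputs


# Toy inputs (made up for this sketch): 3 positions, 2-dimensional vectors.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = masked_attention(queries, keys, values, causal_mask)
```

With the causal rule, the first query can only see the first key, so its output is exactly the first value vector. Swapping in another rule (for example a sliding window) changes which part of the attention matrix is carved out without touching the attention code itself.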