Carl Boettiger
Ecology, theory, computers. https://carlboettiger.info
- Reposted by Carl Boettiger: 📣NEW in Mongabay: DSE's McKalee Steen & Magali de Bruyn share insights on how #TraditionalEcologicalKnowledge and Indigenous digital sovereignty are key to creating #ResponsibleAI for achieving #conservation and environmental goals. 📰 Read the story now: bit.ly/4bqbrfG
- We're hiring a full-time research software engineer for biodiversity & bioacoustics! Position offers a strong research component with independence + research pub opportunities and real-world impact. Expected salary range $111K - $116K. dse.berkeley.edu/news/were-hi...
- Compelling report from the ATOM project on the state of open models: www.atomproject.ai
- Rather applaud JOSS's revised approach to dealing with AI-based submissions (blog.joss.theoj.org/2026/01/prep...), particularly 'starting open' and at least 'six months developer history'. More of scholarly publishing could benefit from such policies, methinks.
- it is kinda delightful to see that the open-source Continue.dev plugin fully supports modern MCP (resources, prompts), while the GitHub Copilot plugin only supports tool docstrings. Now my continue.dev makes perfect tool calls using small/dumb open source LLMs, while Copilot fails w/ frontier LLMs 🙃
- Argh! @github.com announces it will start charging even for *self-hosted* runners 😥. resources.github.com/actions/2026... I have long relied on my self-hosted runners in my teaching to run automated 'reproducibility' checks. (The edu allocation is way too small for my large classes).
- Now @GitHub.com is postponing this to re-evaluate due to community outcry 😂 Have always considered moving to GitLab, whose k8s support is far better than GitHub's half-arsed ARC controller. At the same time, teaching mainstream platforms has value for students... Anyone have opinions on this?
- Reposted by Carl Boettiger: Excited to launch the new improved Reproducible Code guide from @britishecologicalsociety.org @methodsinecoevol.bsky.social FREE online here! www.britishecologicalsociety.org//wp-content/... Amazing work by some very talented ECRs. We hope it’s useful!
- Reposted by Carl Boettiger: we released olmo 32b today! ☺️ 🐟 our largest & best fully open model to-date 🐠 right up there w similar size weights-only models from big companies on popular benchmarks 🐡 but we used way less compute & all our data, ckpts, code, recipe are free & open. made a nice plot of our post-trained results!✌️
- Reposted by Carl Boettiger: Great to see you at #COP30, Governor @gavinnewsom.bsky.social! Magali de Bruyn (R) and McKalee Steen (L) are leading several events @ COP30, including providing recommendations for how Indigenous communities can leverage tech and #DataScience to advance #EnvironmentalStewardship and #Sovereignty.
- Reposted by Carl Boettiger: This seems important. Current AI models can't read graphs. They "see" what they expect to see, even if the data shows something else.
- Wow, rio-stac-io looks awesome! github.com/planetlabs/r... Anyone have a chance to compare this to the odc-stac approach? ( @mdsumner.bsky.social 👀 ?)
- sad to be missing #jupytercon this week! But small win, recently made my first PR to JupyterHub fancy-profiles to add support for ARM architecture. & now I have JupyterHub running on an NVIDIA DGX Spark (via k3s). Also have CUDA/RAPIDS based image in rocker for python+R envs, rocker/cuda:arm64
- Working with arm64, cuda drivers, and shared ram design can be a bit of a learning experience but overall was smoother than I expected. The Spark is a remarkably capable machine for the price. Not just for LLMs (like gpt-oss-120b), but cuda-accel polars, & even as a shared jupyterhub
- Reposted by Carl Boettiger: Raise your hand if you're going to #JupyterCon 🙋♀️ See you there @ucbids.bsky.social! #JupyterCon2025 #DataScience
- Reposted by Carl Boettiger: 🚀 anymap v0.6 is here! This release comes packed with major new features. These updates make interactive geospatial analysis in Python smoother and more powerful than ever. 🔗 GitHub: github.com/opengeos/any... 📘 Docs: anymap.dev #python #geospatial #jupyter #dataviz
- Reposted by Carl Boettiger: The 1.4 release of @duckdb.org supports using a DuckDB database to serve vector tiles! Of course, I had to try this out in R. Check it out: all 242,000 US Census block groups dynamically served as vector tiles from a DuckDB database, displayed on a MapLibre map from R in Positron.
- Reposted by Carl Boettiger: All 8.1 million US Census blocks. Visualized smoothly in 3D. Instant population and housing totals from a lasso selection. All running seamlessly in the browser, no traditional backend. While everyone’s talking about AI, it’s an incredible time for geospatial tech.
- Really thrilled to see the renewed momentum from @ucbids.bsky.social on open source infrastructure! A new partnership with @2i2c.org puts supporting tools, their developers, and their community at the center. cdss.berkeley.edu/news/berkele...
- mid-semester surveys are in! Is it terrible that I do a happy dance when reading that most students felt "the heavy use of AI was either not helpful or detrimental to learning?" YES dear students, you are smarter than the bots. and now you know it too.
- Like so many of us, I've tried just saying this in previous semesters to justify our not using it, with very limited success. And the students can use it well for simple, well scoped tasks. but outsourcing our thinking on open-ended assignments goes badly. this lesson we must learn by experience
- here we are in the last week of module 2. 120 students, majority new to coding, are set up with their ipynb's in VSCode+Copilot agent mode (w/ sonnet 4.5, GPT5-codex). They plug away in pairs, writing some of the best code & analysis this module has seen, with remarkably little use of the bots!
- What is happening? They are re-evaluating the global fisheries decline result from Worm et al.'s classic paper using the latest RAM Legacy stock data. I've encouraged them to use the LLMs instead of memorizing syntax.
- But as I wander the buzzing room, or peer over shoulders, they are talking only to each other. At my prompting, one team puts a complex query to the agent. But while it spins away, they start tapping out a solution by hand, pure ibis code, tight and elegant - it's done before GPT can reply.
- I don't know what clicked. I'm sure we will hit the walls again. With a large class for only 3 of us there's always a lot of variation. But it's the first time in some time I've witnessed the students know the tools are there when they are stuck, but they also know when they're better
- Media still largely misses the RL part of training LLMs. NY Times: > they’re computer models trained on massive amounts of text to predict the next word in a sentence. What feels like empathy or validation is really just the A.I. chatbot echoing back language patterns that it’s learned.
- sure, but sycophantic affirmation is hardly a pattern it got from memorizing the internet. Alignment is trained. RL is the smiley face in front of the monster. (from www.nytimes.com/2025/09/26/w...)
- this matters because 'AI', like any technology, is designed, owned, operated by companies that make choices. In other tech - iPhones, TikTok - we quickly attribute design choices to specific companies. But we still discuss "AI" as if it was some disembodied discovery, more uranium than software.
- Reposted by Carl Boettiger: NSF GRFP solicitation is finally up. Life Sci deadline extended to Nov 10 but 2nd year grad students no longer eligible www.nsf.gov/funding/oppo...
- Fantastic piece from @schmidtdse.bsky.social post-doc @lucialayr.bsky.social on handling the emotional side of doing a PhD in climate or ecological modeling blogs.egu.eu/geolog/2025/...
- How to overcome an "existential modeling crisis" as a science researcher, from DSE postdoc @lucialayr.bsky.social: ✅ Define modeling in your own words ✅ Embrace validation ✅ Find allies ✅ Trust in "the modeler's mindset" ✅ Draw from other experts 📸 @uofcalifornia.bsky.social by Mathew Burciaga
- Excited to be heading to join the summit @cu-esiil.bsky.social this week!
- Campus reminds us, "If classroom temperatures reach or exceed 82°F for more than 15 minutes:" our first step is: 1. Ensure instructors have taken the Heat Illness Prevention Training in the UC Learning Management System. teaching.berkeley.edu/resources/gu...
- Reposted by Carl Boettiger: RL102: From Tabular Q-Learning to Deep Q-Learning (DQN) - A Practical Introduction to (Deep) Reinforcement Learning araffin.github.io/post/rl102/
- Increasingly convinced that the advances we will see with 'AI' in the next few years will come not from bigger NN models replacing tasks done with conventional programming, but from greater tool use from models.
- This matters because very few organizations have the resources to train bigger models, but writing an MCP app, or opening new abilities by orchestrating tasks across LLM APIs, is much more accessible.
- It seems the leading AI companies are already doing this -- LLMs can't add, but they've all learned to call a calculator. They don't know current news, but have learned to google. Increasingly the value is not just in some raw model weights, but the platforms around them.
- these tools can be useful, even transformational or foundational, but I think more in the 'duct tape + PVC piping is foundational' sense than in the 'one ring to rule them all' view. If even the companies are proceeding with tool use, this is something we too can build for ourselves.
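A minimal sketch of the tool-use idea in the thread above, assuming a hypothetical `fake_model` that stands in for an LLM and simply emits a JSON tool call. The names `TOOLS`, `calculator`, and `run_tool_call` are illustrative only; they are not taken from MCP or from any particular framework.

```python
import ast
import json
import operator

# Arithmetic operators the calculator tool is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


# Registry of tools the harness exposes (hypothetical, for illustration).
TOOLS = {"calculator": calculator}


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: it does no math, it only asks for a tool."""
    return json.dumps({"tool": "calculator", "arguments": {"expression": prompt}})


def run_tool_call(message: str):
    """Parse the model's JSON tool call and dispatch it to the named tool."""
    call = json.loads(message)
    tool = TOOLS[call["tool"]]
    return tool(**call["arguments"])


result = run_tool_call(fake_model("12 * (3 + 4)"))
print(result)
```

The model never does the arithmetic itself; the value comes from the small, conventional program it is allowed to call, which is the 'duct tape' part anyone can build.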
- Reposted by Carl Boettiger: I reference data science at the singularity constantly as it's not just a model of why data science blew up, but also how any field or org can organize itself around data, code sharing, and benchmarks: arxiv.org/abs/2310.00865
- Okay, but Claude's commentary while helping me set up a ray cluster on National Research Platform (NRP) is pretty entertaining: Claude: (proposes 2 worker-config) Me: why two workers? Claude: (queries resources) Claude: "HOLY MOLY! Your cluster has nodes with 57TB of memory and 13,800 CPUs!"
- (aside: LLMs were never particularly good at counting... NRP currently has 29,878 cores and 1,434 GPUs of various sizes.)
- Reposted by Carl Boettiger: You still have 5 days to apply for this awesome postdoc opportunity in the Environmental Data Science Innovation & Impact Lab at the University of Colorado-Boulder! This could be your backyard!
- Like many, I've struggled with students merely pasting LLM outputs as their own work. But this year when I encourage them to use AI in generating code they need, now they ask all these critical questions like: 'but is this output ok? how do I know? how could it be better?'
- my course is on environmental data science -- for us code is a means to an end. But this is not an excuse for inefficient, unreadable, or verbose code. Data science thrives on concise, semantically meaningful code that expresses robust, powerful abstractions.
- tidy data (Codd's 3rd normal form & relational data for the non-#rstats crowd), grammar of graphics. These abstractions are powerful tools that cut across languages, but good libraries help us express them more easily.
- it's notable that cleaner syntax often means better performance too. Standardizing a data tables library like dplyr or ibis around abstractions like RDBs and lazy eval gives not only cleaner abstractions and syntax -- it lets us leverage major performance and scale improvements too.
- Also exciting to see the new fully open (training data, weights, training details) LLM from the Swiss, 'Apertus' www.swiss-ai.org/apertus ! Shipped in both 8B & 70B params on HF, over 40% non-English sources. Support for vllm + public web interface (via publicai.co). Impressive.
- Very interesting project from @ai2.bsky.social, a fully open, agent-based LLM for academic literature: asta.allen.ai. Nice overview at allenai.org/blog/asta
- Great post-doctoral opportunity at @cu-esiil.bsky.social jobs.colorado.edu/jobs/JobDeta...
- Excellent post from @beenwrekt.bsky.social here on failure of AI safety. While much is familiar, he makes an excellent pt that the solution is obvious and do-able: terminate the conversation (hilariously illustrated by Claude's refusal to answer bio sciences qs). www.argmin.net/p/the-banal-...
- Reposted by Carl Boettiger: Ethemblage is an online game developed by @leahgovia.bsky.social leahgovia.itch.io/ethemblage It's a story in which players interact with different scenarios where technologies such as acoustic monitors, drones, and GPS collars are applied by conservationists to monitor non-human species.
- Reposted by Carl Boettiger: 2/ Understanding where, when, how, and why wildlife is declining is key for effective conservation. By tapping into participatory science, we now have a (biased) complementary way to monitor mortality events in near-real time 👉: huggingface.co/spaces/diego... I’ll highlight four case studies ⬇️
- Really re-energized and inspired after the CA 30x30 Summit in San Diego! Amazing to see what this community -- from gov't agencies, land managers, tribal leaders, performing artists and everything in between -- can do together. sites.google.com/view/2025-30...