Nick Vincent
Studying people and computers (https://www.nickmvincent.com/)
Blogging about data and steering AI (https://dataleverage.substack.com/)
- OpenAI launching an Overleaf competitor seems like it could be a big deal, and particularly interesting in the wake of the NeurIPS hallucinations discourse (an important issue, though much of the back-and-forth I saw seemed to miss important factors): openai.com/index/introd...
- Here's the post: dataleverage.substack.com/p/the-coding...
- Writing a follow-up post on data aspects of coding agents. One thing that's really under-discussed, IMO: as far as I can tell, NO coding agent allows consumer users to trigger server-side deletion of transcripts or even metadata. Anyone seen anything to the contrary?
- It seems plausible that part of the motivation for labs to restrict usage of subscription auth tokens is the value of structured data from the official app, but it's unfortunate that current data controls for agents are so limited (30-day or 5-year retention, no individual deletions, etc.)
- Coding agents are (1) a big deal, (2) very relevant to data leverage, and (3) able to help build tools that support data leverage! dataleverage.substack.com/p/coding-age...
- Reposted by Nick Vincent: A bunch of us are working to advance #PublicAI: AI that is publicly accountable, accessible, and sustainable. A lot of us are interested in local-first, community-governed, and more open models of what this technology could be. We welcome allies in the @publicai.network! publicai.network/whitepaper/
- Reposted by Nick Vincent: Happening now! Join us in Upper Level Room 4 for our workshop on Algorithmic Collective Action #NeurIPS2025. We will have stellar talks to kick off the day, followed by contributed talks and posters from authors before the lunch break.
- Reposted by Nick Vincent: TODAY is the first-ever #NeurIPS position paper track! Come hear thoughtful arguments about “digital heroin,” the nature of innovation, protecting privacy, machine unlearning, & how we can do ML research better as a community. See you in Ballroom 20AB from 10-11a & 3:30-4:30p! #NeurIPS2025 #NeurIPSSD
- Longer blog post: AI companies and data creators actually have aligned incentives re: establishing clearer "Data Rules" (norms, rules, contracts that control use of both "fresh" data and of model outputs). Good Data Rules can also support commons! dataleverage.substack.com/p/almost-eve...
- Reposted by Nick Vincent: "There are many challenges to transforming the AI ecosystem and strong interests resisting change. But we know change is possible, and we believe we have more allies in this effort than it may seem. There is a rebel in every Death Star." 🗣️ @b-cavello.bsky.social in our #4DCongress
- Heading to AIES, excited to catch up with folks there!
- New blog (a recap post): "How collective bargaining for information, public AI, and HCI research all fit together." Connecting these ideas, plus a short summary of various recent posts (of which there are many, perhaps too many!). On Substack, but also posted to Leaflet.
- Substack: dataleverage.substack.com/p/how-collec... Leaflet: dataleverage.leaflet.pub/3m2wpj7l7c22w
- Reposted by Nick Vincent: Very interesting twist on MCP! “user data is often fragmented across services and locked into specific providers, reinforcing user lock-in” - enter the Human Context Protocol (HCP): “user-owned repositories of preferences designed for active, reflective control and consent-based sharing.” 1/
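To make the HCP idea concrete, here's a minimal sketch of a user-owned preference repository with consent-based sharing. This is my own illustration, not the actual Human Context Protocol spec; all class and method names here are hypothetical.

```python
# Toy illustration of a user-owned preference repository with
# consent-based sharing. All names are hypothetical; this is NOT
# the actual Human Context Protocol specification.
from dataclasses import dataclass, field


@dataclass
class PreferenceRecord:
    topic: str            # e.g. "news", "dietary"
    value: str            # the preference itself
    shared_with: set[str] = field(default_factory=set)  # services the user consented to


class PreferenceRepository:
    """User-owned store; services only see records the user consented to share."""

    def __init__(self) -> None:
        self.records: list[PreferenceRecord] = []

    def add(self, record: PreferenceRecord) -> None:
        self.records.append(record)

    def grant(self, service: str, topic: str) -> None:
        # Active, reflective control: consent is granted per topic, per service.
        for r in self.records:
            if r.topic == topic:
                r.shared_with.add(service)

    def revoke(self, service: str) -> None:
        for r in self.records:
            r.shared_with.discard(service)

    def export_for(self, service: str) -> list[tuple[str, str]]:
        return [(r.topic, r.value) for r in self.records if service in r.shared_with]


repo = PreferenceRepository()
repo.add(PreferenceRecord("news", "prefer long-form analysis"))
repo.add(PreferenceRecord("dietary", "vegetarian"))
repo.grant("assistant-app", "news")
print(repo.export_for("assistant-app"))  # only the consented "news" record
```

The point of the sketch: the user's data lives in one place they control, and lock-in is reduced because any service gets the same consent-filtered export.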
- Anyone compiling discussions/thoughts on emerging licensing schemes and preference signals? E.g., rslstandard.org and github.com/creativecomm... I'm externalizing some notes at datalicenses.org, but want to find where these discussions are happening!
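For readers unfamiliar with what a machine-readable preference signal might look like, here's a toy sketch. The key/value format below is invented purely for illustration; it is not RSL's actual syntax or any Creative Commons scheme.

```python
# Toy parser for a hypothetical machine-readable data-use preference file.
# The format is invented for illustration; it is NOT the actual RSL or
# Creative Commons preference-signal syntax.
HYPOTHETICAL_SIGNAL = """\
ai-training: disallow
ai-inference: allow
search-indexing: allow
"""


def parse_preferences(text: str) -> dict[str, bool]:
    """Map each declared use to True (allowed) or False (disallowed)."""
    prefs = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        use, _, decision = line.partition(":")
        prefs[use.strip()] = decision.strip() == "allow"
    return prefs


def may_use(prefs: dict[str, bool], purpose: str) -> bool:
    # Conservative default: an undeclared purpose is treated as disallowed.
    return prefs.get(purpose, False)


prefs = parse_preferences(HYPOTHETICAL_SIGNAL)
print(may_use(prefs, "ai-training"))      # False
print(may_use(prefs, "search-indexing"))  # True
```

The hard questions the real schemes face are exactly the ones this toy dodges: which purposes get standardized, and what defaults apply when a signal is absent.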
- Excited to be giving a talk on data leverage to the Singapore AI Safety Hub. Trying to capture updated thoughts from recent years, and have long wanted to better connect leverage/collective bargaining to the safety context.
- About a week away from the deadline to submit to the ✨ Workshop on Algorithmic Collective Action (ACA) ✨ acaworkshop.github.io at NeurIPS 2025!
- 🧵 In several recent posts, I speculated that dataset details may eventually become an important quality signal for consumers choosing AI products: "This model is good for asking health questions, because 10,000 doctors attested to supporting training and/or eval." Etc.
- It looks like some skepticism was warranted (not much progress towards this vision yet). I do think "dataset details as quality signals" is still possible though, and could play a key role in addressing looming information economics challenges.
- The core challenge: many inputs into AI are information, and it's hard to design efficient markets for information. Information is hard to exclude (pre-training data remains very hard to exclude, and even post-training data may be hard to exclude without sufficient effort).
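Here's a minimal sketch of what "dataset details as quality signals" could look like in practice. The record fields and the attester threshold are hypothetical illustrations, not any existing scheme.

```python
# Minimal sketch of "dataset details as quality signals". The record
# structure and the threshold are hypothetical illustrations only.
from dataclasses import dataclass


@dataclass
class DatasetAttestation:
    domain: str          # e.g. "health"
    attester_role: str   # e.g. "physician"
    n_attesters: int     # how many people attested to supporting training/eval
    covers_eval: bool    # whether the attestation covers evaluation data too


def quality_signal(attestations: list[DatasetAttestation], domain: str,
                   min_attesters: int = 1000) -> str:
    """A consumer-facing label derived from attestations for one domain."""
    relevant = [a for a in attestations if a.domain == domain]
    total = sum(a.n_attesters for a in relevant)
    if total >= min_attesters and any(a.covers_eval for a in relevant):
        return f"{domain}: {total} attesters, training + eval covered"
    if total >= min_attesters:
        return f"{domain}: {total} attesters, training only"
    return f"{domain}: insufficient attestation ({total} attesters)"


atts = [DatasetAttestation("health", "physician", 10_000, covers_eval=True)]
print(quality_signal(atts, "health"))  # health: 10000 attesters, training + eval covered
```

The excludability problem above is why the attestation (a scarce, verifiable act) rather than the data itself (easily copied) carries the signal.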
- Follow-up, tying together "AI as ranking chunks of human records" with "eval leverage" and "dataset details as quality signals": dataleverage.substack.com/p/how-do-we-... And related, "eval leverage": dataleverage.substack.com/p/evaluation...
- Around ICML with loose evening plans and an interest in "public AI", Canadian sovereign AI, or anything related? Swing by the Internet Archive Canada between 5p and 7p lu.ma/7rjoaxts
- [FAccT-related link round-up]: It was great to present on measuring Attentional Agency with Zachary Wojtowicz at FAccT. Here's our paper on ACM DL: dl.acm.org/doi/10.1145/... On Thursday, Aditya Karan will present on collective action (dl.acm.org/doi/10.1145/...) at 10:57 (New Stage A)
- These blog posts expand on attentional agency: - genAI as ranking chunks of info: dataleverage.substack.com/p/google-and... - utility of AI stems from people: dataleverage.substack.com/p/each-insta... - connection to evals: dataleverage.substack.com/p/how-do-we-...
- And we have a blog post on algorithmic collective action with multiple collectives! dataleverage.substack.com/p/algorithmi...
- Finally, I recently shared a preprint that relates deeply to the above ideas, on Collective Bargaining for Information: arxiv.org/abs/2506.10272, and have a blog post on this as well: dataleverage.substack.com/p/on-ai-driv...
- “Attentional agency”: talk in New Stage B at FAccT, in the session right now!
- Off to FAccT; excited to see faces old and new!
- Another blog post: a link roundup on AI's impact on jobs and power concentration, another proposal for Collective Bargaining for Information, and some additional thoughts on the topic: dataleverage.substack.com/p/on-ai-driv...
- New data leverage post: "Google and TikTok rank bundles of information; ChatGPT ranks grains." dataleverage.substack.com/p/google-and... This will be post 1/3 in a series about viewing many AI products as all competing around the same task: ranking bundles or grains of records made by people.
- This has implications for Internet policy, for understanding where the value in AI comes from, and for thinking about why we might even consider a certain model to be "good"! This first post leans heavily on recent work with Zachary Wojtowicz and Shrey Jain, to appear at the upcoming FAccT.
- arxiv.org/abs/2405.14614 Follow-ups coming very soon (already drafted): would love to discuss these ideas with folks. Is this all repetitive with past data labor/leverage work? Are some aspects obvious to you?
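For a rough feel of the bundles-vs-grains framing, here's a toy of my own (not anything from the paper): the same crude relevance function can rank whole documents, search-engine style, or individual sentences drawn from those documents, chatbot style.

```python
# Toy illustration of ranking "bundles" (whole documents) vs. "grains"
# (individual sentences). The scoring function is a deliberately crude
# word-overlap measure, just to show both products share one task.
def score(query: str, text: str) -> int:
    return len(set(query.lower().split()) & set(text.lower().split()))


docs = {
    "doc-a": "Data leverage gives creators bargaining power. Weather is nice today.",
    "doc-b": "Collective bargaining for information could reshape AI training.",
}

query = "collective bargaining and data leverage"

# Search-engine style: rank bundles (whole documents made by people).
bundles = sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)
print("bundle ranking:", bundles)

# Chatbot/agent style: rank grains (sentences) drawn from the same records.
grains = [(d, s.strip()) for d, text in docs.items()
          for s in text.split(".") if s.strip()]
grains.sort(key=lambda pair: score(query, pair[1]), reverse=True)
print("top grain:", grains[0])
```

Same records, same relevance task; the products differ mainly in the granularity of what gets surfaced, which is exactly why they compete over the same underlying human-made data.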
- Sharing a new paper (led by Aditya Karan): there's growing interest in algorithmic collective action, when a "collective" acts through data to impact a recommender system, classifier, or other model. But... what happens if two collectives act at the same time?
- Preprint now on arXiv and to appear at FAccT 2025: arxiv.org/abs/2505.00195 ("Algorithmic Collective Action with Two Collectives" by Aditya Karan, Nicholas Vincent, Karrie Karahalios, Hari Sundaram)
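For intuition on the setup, here's a toy sketch in the spirit of the signal-planting framing from the algorithmic collective action literature: two collectives each plant a distinct trigger token with a target label in shared training data, and we check whether a naive learned rule serves both. This is an illustration only, not the paper's actual experiments.

```python
# Toy sketch of two collectives planting signals in shared training data.
# Mirrors the general "signal planting" setup from the algorithmic
# collective action literature, not this paper's specific experiments.
from collections import Counter, defaultdict
import random

random.seed(0)


def make_example(trigger: str | None, label: str) -> tuple[list[str], str]:
    words = random.sample(["alpha", "beta", "gamma", "delta", "epsilon"], 3)
    if trigger:
        words.append(trigger)
    return words, label


# Background data plus two collectives, each planting its own trigger -> label.
data = [make_example(None, random.choice(["pos", "neg"])) for _ in range(500)]
data += [make_example("TRIG_A", "pos") for _ in range(50)]  # collective A
data += [make_example("TRIG_B", "neg") for _ in range(50)]  # collective B

# "Train" a naive model: per-token label counts, majority vote at inference.
token_labels: dict[str, Counter] = defaultdict(Counter)
for words, label in data:
    for w in words:
        token_labels[w][label] += 1


def predict(words: list[str]) -> str:
    votes: Counter = Counter()
    for w in words:
        votes += token_labels[w]
    return votes.most_common(1)[0][0]


# Each collective succeeds if triggered inputs get its target label.
print("A:", predict(["gamma", "TRIG_A"]))  # collective A hopes for "pos"
print("B:", predict(["gamma", "TRIG_B"]))  # collective B hopes for "neg"
```

With disjoint triggers both collectives can win here; the interesting regimes the paper asks about are when the collectives' signals overlap or their target labels conflict.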
- New early draft post: "Public AI, Data Appraisal, and Data Debates" "A consortium of Public AI labs can substantially improve data pricing, which may also help to concretize debates about the ethics and legality of training practices." dataleverage.substack.com/p/public-ai-...
- Reposted by Nick Vincent: “Algorithmic decision-making systems are ‘leviathans’, harmful not for their arbitrariness or opacity, but for the systematicity of their decisions” - @christinalu.bsky.social on the need for plural #AI model ontologies (sounds technical, but has big consequences for human #commons) www.combinationsmag.com/model-plural...
- New Data Leverage newsletter post. It's about... data leverage (specifically, evaluation-focused bargaining) and products du jour (deep research, agents). dataleverage.substack.com/p/evaluation...
- I have some new co-authored writing to share, along with a round-up of important articles for the "content ecosystems and AI" space. I'm doing an experiment with microblogging directly to a GitHub repo that I can share across platforms...
- Here's my round-up as a markdown file: github.com/nickmvincent... Here's the newsletter post, Tipping Points for Content Ecosystems: dataleverage.substack.com/p/tipping-po...
- Reposted by Nick Vincent: Global Dialogues has launched at the Paris #AIActionSummit. Watch @audreyt.org give the announcement via @projectsyndicate.bsky.social youtu.be/XkwqYQL6V4A?... (starts at 02:47:30)
- On Monday, I wrote a post on the live-by-the-sword, die-by-the-sword nature of the current data paradigm. On Wednesday, there was quite a development on this front: OpenAI came out with a statement that they have evidence DeepSeek "used" OpenAI models in some fashion (this was faster than I expected!)
- Given that data protection technologies (such as the techniques OpenAI used to gather this evidence) seem certain to play a role in the near term, I put together another post with a simple proposal that could reduce some of the tension in the current paradigm.
- AI labs and tech companies should open-source their data protection techniques so that content creators can benefit from new and old advances in this space: dataleverage.substack.com/p/ai-labs-co...
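As one example of the kind of technique that could be open-sourced, here's a generic canary-string sketch (the general idea only; not any lab's actual detection method): a creator embeds unique markers in published content, then checks whether a model reproduces them.

```python
# Generic canary-string sketch: a creator plants unique markers in their
# published content, then checks model outputs for them. This illustrates
# the general idea only; it is NOT any lab's actual detection method.
import hashlib


def make_canary(creator_id: str, doc_id: str) -> str:
    """Derive a unique, innocuous-looking marker for one document."""
    digest = hashlib.sha256(f"{creator_id}:{doc_id}".encode()).hexdigest()[:12]
    return f"ref-{digest}"


def embed(text: str, canary: str) -> str:
    return f"{text}\n(document id: {canary})"


def detect(model_output: str, canaries: set[str]) -> set[str]:
    """Return the canaries that the model reproduced verbatim."""
    return {c for c in canaries if c in model_output}


canaries = {make_canary("nickmvincent", f"post-{i}") for i in range(3)}
article = embed("Some blog post text.", next(iter(canaries)))
# If a model later emits one of these markers, that's (weak) evidence it
# trained on, or retrieved, the marked content.
print(detect(article, canaries))
```

Open-sourcing even simple tools like this would let individual creators run the same style of provenance check that labs currently reserve for themselves.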