Gabriel
Distinguished AI Research Scientist at SentinelOne. Former OpenAI, Apple infosec. Lecturer at Johns Hopkins SAIS Alperovitch Institute. Deceiver of hike length and difficulty.
- Everybody has a hard eval until gradient descent punches you in the face.
- New research from @silascutler.bsky.social and myself. We tracked 175k exposed Ollama endpoints for nearly a year. Collected and analyzed custom models, sizes, quantizations, system prompts, and more.
- 🔥 👀 New research from @morecoffeeplz.bsky.social and @silascutler.bsky.social on the "silent" AI network, a massive, unmanaged layer of open-source AI infrastructure operating in the shadows.
- This is an exposure dataset which means we are trying to study something by measuring the shadow that it casts. We can’t poll these systems directly, but we can understand the shape of the ecosystem.
- The top 10 Model Families control 85% of the market; the other families sit in the long tail.
- Accountability diffuses at the deployment layer, but dependency concentrates at the model supply layer. The dominant risk is not what the models can do, but how fast capability diffuses, how it gets wired, and whether misuse feedback loops are actioned post-release.
- *vague posts about upcoming research*
- Love getting malware under TLP:AMBER+S, when the S stands for “spite”. 🫖
- We about to have some Llama Drama :)
- Reposted by Gabriel: and of course it’s chatgpt slop with the rhetorical flourish of a remedial high school debate club. “from X to Y — or worse” “This Isn’t X it’s Y.” “Replace X with Y and it’s Z.” “The most sobering part? It’s X.” “your no longer dealing with X. You’re facing Y”
- “Wow this dude has a really strong opinion about code review” *scans posts* “Oh that’s his only opinion”
- --dangerously-skip-permissions is the only thing keeping claude code installed on my machine.
- Benchmarks for cybersecurity are everywhere and mostly measuring the wrong thing. We reviewed evals from Microsoft, Meta and academia and found they don't measure what matters for defenders in real IR situations. 🧵 s1.ai/benchmk1
- Most security evals reduce workflows to MCQs/static Q&A. That bakes in unrealistic assumptions: the “right question” is already asked, evidence is pre-packaged, wrong answers are cheap, and there’s no triage/queue pressure or escalation decisions.
- The best “agentic” benchmark we saw (ExCyTIn-Bench) still shows how far we are. Even in a curated Azure-style environment models struggled with multi-hop investigations over heterogeneous logs (data be confusing like that).
- A deeper problem is that nobody has time for anything but LLM-as-a-judge evaluations (often vendor-on-vendor), creating these Ouroboros loops that are easy to overfit and hard to trust. That’s a huge gap when we’re being asked to rely on them for SOC automations or enterprise security work.
- Reviewing AI cyber benchmarking and evaluations may break me. Y’all will really LLM-as-a-judge anything
- Reposted by Gabriel: Holy hell, what an obituary
- Timely presentation from my colleague Jim on the current landscape of Hacktivism and War. youtu.be/sNaORI-k-fY?...
- Reposted by Gabriel: ✅ #LLM literacy is table stakes for defenders, CTI analysts, and #cybersecurity professionals of all stripes now. Still looking for a way into this complex field? 🤔 LABS has got you covered! Start here: s1.ai/inside-llm-1 @sentinelone.com
- Great post from @philofishal.bsky.social on the initial stages of the LLM training pipeline! www.sentinelone.com/labs/inside-...
- Reposted by Gabriel: "this new chemical process operates at ambient temperature and pressure. It chemically dissolves the glue holding the blade together. The high-value carbon fiber can be recovered, cleaned, and reused in everything from new turbines to car parts." interestingengineering.com/energy/china...
- Reposted by Gabriel: ‘CURTAINS FOR OPSEC? T-SMOG AND FRATBOY CAUGHT FLIPPING A GOV’
- Reposted by Gabriel: Among the many reasons you don’t kidnap a foreign head of state at gunpoint, even if you have the capability, is that it sparks consequences you can neither control nor anticipate.
- Reposted by Gabriel: Bill Watterson could do Sin City but Frank Miller could not do Calvin and Hobbes
- Reposted by Gabriel: everyone thinks they’re a bayesian until they have to update their priors
- Reposted by Gabriel: I'm speaking at the @SANSInstitute #CTISummit on an operation against #Rhadamanthys years before #OperationEndgame. sans.org/u/1CtB
- More research and observations on LLMs and Ransomware from me and the team! www.sentinelone.com/labs/llms-ra...
- Tl;dr: LLMs are accelerating the ransomware lifecycle. Measurable gains in speed, volume, and multilingual reach; no step-change in novel TTPs. Self-hosted LLMs will likely be the go-to for top-tier actors. Defenders should prepare for incremental but rapid adversary efficiency gains.
- For anybody interested, my teammates and I wrote some predictions for next year: www.sentinelone.com/blog/cyberse... Thread below with some thoughts.
- Not breaking news, but most AI startups are just features that will be absorbed by incumbents. Generic “co-pilot for X” will die. Those that survive will be small, specialized teams with proprietary data and efficient training pipelines.
- AI is now a punchline for users and a magnet for capital. We are in a bubble. Pure-play AI valuations will crater; when they do, everybody will realize that the folks who were fired because of “AI efficiencies” were actually fired because money was cheap in the past and expensive in the present.
- On the cyber criminal and military front, dual use will eat alignment. The same tech that politely refuses uncomfortable questions in public will be tuned privately for offense, surveillance, and control.
- Scribbling in the margins of these LLM cyber capability evaluations: “horse difficulty has not been solved… absolutely nothing exists, which is scandalous!… unpreparedness disgraceful… horse question in disgraceful state!”
- What questions do folks have about the global use of open source models?
- Reposted by Gabriel: The kinds of databases: • Fancy files • 1970s stays winning • This would be so cool if it worked at scale • Oh, that's why google has a monopoly on search
- Reposted by Gabriel: so you’re telling me that olives nuzzi isn’t some sort of pasta dish?
- “Our model is very dangerous, but also not quite functional enough to cause damage. Also, there are no indicators that the community could use to identify more abuse.” C’mon folks, if you are serious about stopping AI abuse then let’s see less marketing and more actionable intel.
- Reposted by Gabriel: DepthAnything3 slam
- Reposted by Gabriel: Good things are possible and we don’t have to settle.
- Reposted by Gabriel: 2026 DBIR sneak peek: “Water plays an increasingly significant role in [ransomware] attacks. In 2024, 100% of recorded ransomware events were attributed to threat actors that drink water”
- Our presentation from LabsCon25 for those who missed it - LLM Enabled Malware In The Wild. www.sentinelone.com/labs/labscon...
- I miss when the internet was fun.
- Reposted by Gabriel: What if we did a single run and declared victory
- Reposted by Gabriel: "Sunset Dunes is a testament to what happens when San Francisco thinks big and invests in public spaces. [...] And it reminds us that we shouldn’t let fear of change keep us from imagining something better for our neighborhoods." www.sfchronicle.com/opinion/open...
- The only people I know that refer to ChatGPT as “Chat” are those in romantic relationships with it. nypost.com/2025/10/16/b...
- Which is to say that as the context window fills up it just acts as a mirror for how the individual wants to be treated. Yikes.
- Reposted by Gabriel: "I don't have anything to hide why should I care about privacy?"
- Reposted by Gabriel: Normal person: I asked AI and it told me-- Every AI researcher:
- Reposted by Gabriel: “What if you could fuck the singularity?” is the apotheosis of technofuturism (2025)
- Reposted by Gabriel: BREAKING: Friday night massacre underway at CDC. Dozens of "disease detectives," high-level scientists, entire Washington staff and editors of the MMWR (Morbidity and Mortality Weekly Report) have all been RIFed and received the following notice:
- Some research from my team!
- 🔎 Attackers are embedding LLMs directly into malware, creating code that can generate malicious logic at runtime rather than carrying it statically in the binary. 🔥 New @sentinellabs.bsky.social research by @alex.leetnoob.com, @vkamluk.bsky.social, and Gabriel Bernadett-Shapiro at #LABScon 2025. 🔥 s1.ai/llm-mw
- Reposted by Gabriel: james comey (2025)
- Not the BPO report we need, but definitely the one we deserve.
- We are releasing details on BRICKSTORM malware activity, a China-based threat hitting US tech to potentially target downstream customers and hunt for data on vulnerabilities in products. This actor is stealthy, and we've provided a tool to hunt for them. cloud.google.com/blog/topics/...