- Extracting and Reconstructing LLM Backdoor Triggers: "Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language models." arxiv.org/pdf/2602.03085 (Feb 6, 2026, 15:00)