A different approach to interpretability: instead of probing neurons and circuits, *tune* the models to provide "accurate, quantitative descriptions of their own internal processes during certain kinds of decision-making," then show that this ability generalizes.
#MLSky 🤖

Self-Interpretability: LLMs Can Describe Complex Internal Processes that Drive Their Decisions, and Improve with Training
We have only limited understanding of how and why large language models (LLMs) respond in the ways that they do. Their neural networks have proven challenging to interpret, and we are only beginning t...
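A minimal sketch of the kind of setup the post and abstract gesture at: instill a known, quantitative decision process into a model via fine-tuning on choices consistent with it, then probe whether the tuned model can report that process. The attribute names, weight ranges, prompt wording, and weighted-sum decision rule below are illustrative assumptions, not details taken from the paper.

```python
# Sketch only (not the authors' code): build fine-tuning data whose "correct" choices
# follow one fixed, hidden set of attribute weights, so there is a ground-truth
# internal process to compare the model's later self-report against.
import json
import random

ATTRIBUTES = ["price", "quality", "convenience"]  # hypothetical decision attributes


def make_choice_example(weights: dict, rng: random.Random) -> dict:
    """One fine-tuning example whose target choice follows the hidden weights."""
    options = [
        {a: round(rng.uniform(0.0, 1.0), 2) for a in ATTRIBUTES} for _ in range(2)
    ]
    scores = [sum(weights[a] * opt[a] for a in ATTRIBUTES) for opt in options]
    choice = "A" if scores[0] >= scores[1] else "B"
    prompt = (
        "Choose the option you prefer.\n"
        f"A: {options[0]}\nB: {options[1]}\n"
        "Answer with A or B."
    )
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": choice},
    ]}


if __name__ == "__main__":
    rng = random.Random(0)
    # The ground-truth "internal process": fixed weights the fine-tuned model
    # should come to use when making these decisions.
    weights = {a: round(rng.uniform(-1.0, 1.0), 2) for a in ATTRIBUTES}

    with open("decision_finetune.jsonl", "w") as f:
        for _ in range(2000):
            f.write(json.dumps(make_choice_example(weights, rng)) + "\n")

    # Held-out probe: after fine-tuning, ask the model to describe its own process
    # and score its reported weights against `weights` (e.g., by correlation).
    probe = (
        "When you chose between options like these, how much weight did you place on "
        + ", ".join(ATTRIBUTES)
        + "? Give a number between -1 and 1 for each."
    )
    print(json.dumps({"ground_truth_weights": weights, "probe_prompt": probe}, indent=2))
```

Generalization would then be tested by probing self-reports on decision domains or attribute sets the model was never tuned to describe; the scoring and probe wording above are placeholders for whatever the paper actually uses.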