- I've been shocked that a theory-driven method yields practical results this good, especially on attention approximation. I originally proposed my best new optimizer design as a dumb baseline; the fact that you can get these efficiency gains with a principled approach makes me a lil insecure.
- idk dude it's 30 pages of proofs and then they drop this
  Feb 20, 2025 15:53