Yoav Goldberg
- play this www.puzzlescript.net/play.html?p=...
- I've been complaining a lot about RL lately, and here we go again. The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly: the reward computation should be part of the agent, not part of the environment. More at length here: gist.github.com/yoavg/3eb3e7...
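To make the contrast concrete, here is a minimal sketch (all names hypothetical, not from the gist): the standard Gym-style loop has the environment return a reward alongside the observation; the alternative setup has the environment emit observations only, with the agent scoring them against its own internal reward function.

```python
# Hedged sketch: environment emits observations only;
# the agent owns the reward computation.

class ChainEnv:
    """Toy environment: a 1-D position, no reward in the interface."""
    def __init__(self):
        self.pos = 0

    def step(self, action):
        self.pos += 1 if action == "right" else -1
        return self.pos  # observation only, no (obs, reward, done) tuple

class Agent:
    """Agent with an internal reward function over observations."""
    def __init__(self, goal):
        self.goal = goal

    def reward(self, obs):
        # internal reward: negative distance to the agent's own goal
        return -abs(self.goal - obs)

    def act(self, obs):
        return "right" if obs < self.goal else "left"

env, agent = ChainEnv(), Agent(goal=3)
obs = env.pos
total = 0.0
for _ in range(5):
    obs = env.step(agent.act(obs))
    total += agent.reward(obs)  # reward computed inside the agent
```

The design point is only in the interface: `ChainEnv.step` returns no reward, so nothing about "what is good" lives in the environment.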
- the fascinating (to me) quality of hard-core RL researchers (e.g. Sutton) is the ability to hold an all-encompassing view of RL as the basis of intelligence, while at the same time working on super low-level stuff like tabular TD algorithms, and strongly believing these are actually the same thing
- what's the latest-and-greatest attempt to reverse-engineer and document the inner workings of claude-code?
- let's talk about "in-context learning". it is clearly NOT "learning", because it's ephemeral. it IS some form of generalization from examples, which is very cool. but we need a name. what do we call this skill of generalizing from examples?