Yoav Goldberg
- play this www.puzzlescript.net/play.html?p=...
- I've been complaining a lot about RL lately, and here we go again. The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly: the reward computation should be part of the agent, not part of the environment. More at length here: gist.github.com/yoavg/3eb3e7...
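To make the contrast concrete, here is a minimal sketch (all names hypothetical, not from the gist): the standard Gym-style loop has the environment return a reward alongside the observation; the alternative setup has the environment emit observations only, with the agent scoring them against its own internal reward function.

```python
# Hedged sketch: environment emits observations only;
# the agent owns the reward computation.

class ChainEnv:
    """Toy environment: a 1-D position, no reward in the interface."""
    def __init__(self):
        self.pos = 0

    def step(self, action):
        self.pos += 1 if action == "right" else -1
        return self.pos  # observation only, no (obs, reward, done) tuple

class Agent:
    """Agent with an internal reward function over observations."""
    def __init__(self, goal):
        self.goal = goal

    def reward(self, obs):
        # internal reward: negative distance to the agent's own goal
        return -abs(self.goal - obs)

    def act(self, obs):
        return "right" if obs < self.goal else "left"

env, agent = ChainEnv(), Agent(goal=3)
obs = env.pos
total = 0.0
for _ in range(5):
    obs = env.step(agent.act(obs))
    total += agent.reward(obs)  # reward computed inside the agent
```

The design point is only in the interface: `ChainEnv.step` returns no reward, so nothing about "what is good" lives in the environment.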
- the fascinating (to me) quality of hard-core RL researchers (e.g. Sutton) is the ability to hold an all-encompassing view of RL as the basis of intelligence, while at the same time working on super low-level stuff like tabular TD algorithms, and strongly believing these are actually the same thing
- what's the latest-and-greatest attempt to reverse-engineer and document the inner workings of claude-code?
- let's talk about "in-context learning". it is clearly NOT "learning", because it's ephemeral. it IS some form of generalization from examples, which is very cool. but we need a name. what do we call this skill of generalizing from examples?