xjdr
hot takes, linear Algebra, JAX apologist, Raconteur
- I have become radicalized
- so far the experience has been pretty good here, but the default feeds are _terrible_. feels like it's going to take a few weeks to whip these feeds into shape with mutes and "show less like these" plus lots of likes. Following feed is good, but i need to follow a lot more people
- very interesting work, and it reminds me a bit of this paper. Tokenizers and RoPE must die. after samplers, i am on to those next ... arxiv.org/abs/2407.036...
- i keep forgetting to include this because i always assume people do this by default. Any time there is an exponent or a norm, you should be working in the highest practical precision
- the BigVision repo is my current reference impl for gemma and ViT. such an underrated repo. @giffmana.bsky.social and team are doing the lord's work github.com/google-resea... github.com/google-resea...
- now that people are paying attention again, here is your periodic reminder. Always run in bf16. always apply RoPE and attention softmax in float32 (as shown here) github.com/xjdr-alt/ent...
- Reposted by xjdr: So first version of an ml anon starter pack. go.bsky.app/VgWL5L Kept half-anons (like me and Vic). Not all anime pfp, but generally drawn.
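
The precision advice in the posts above (run in bf16, but do the softmax and any norm in float32) can be sketched in JAX roughly as follows. This is a minimal illustration and not the code from the linked repos: the function names `softmax_fp32`, `rms_norm_fp32`, and `attention` are hypothetical, the choice of RMS norm is an assumption, and RoPE is left out, though the same upcast-then-downcast pattern would presumably apply to it.

```python
import jax
import jax.numpy as jnp


def softmax_fp32(scores):
    """Softmax with the exponent computed in float32, cast back to the input dtype."""
    probs = jax.nn.softmax(scores.astype(jnp.float32), axis=-1)
    return probs.astype(scores.dtype)


def rms_norm_fp32(x, weight, eps=1e-6):
    """RMS norm with the reduction done in float32 (hypothetical helper)."""
    x32 = x.astype(jnp.float32)
    inv_rms = jax.lax.rsqrt(jnp.mean(x32 * x32, axis=-1, keepdims=True) + eps)
    return (x32 * inv_rms * weight.astype(jnp.float32)).astype(x.dtype)


def attention(q, k, v):
    """Single-head attention over [seq, head_dim] arrays stored in bf16."""
    scale = q.shape[-1] ** -0.5
    # Upcast the logits before scaling and softmax; the matmuls stay in bf16.
    scores = (q @ k.T).astype(jnp.float32) * scale
    probs = jax.nn.softmax(scores, axis=-1)
    return probs.astype(v.dtype) @ v


key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (4, 8), dtype=jnp.bfloat16)
out = attention(q, q, q)
print(out.dtype, out.shape)
```

Weights and activations stay in bf16 for memory and throughput; only the numerically sensitive steps are upcast, and results are cast back so the rest of the model keeps running in bf16.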