jsulz
I like pretty things, functional things, funny things, food things, and computer things.
Former: Storage infra things 🤗 @hf.co, Devops things @lexblog.bsky.social and devex/cloud infra things at @pantheon.io
- Sometimes I am struck by the weirdness of our dystopian present.
- Great piece from @frimelle.bsky.social and @giadapistilli.com. Our use of AI is still in the early stages and none of what is to come is preordained. We can paint a different future for ourselves that is better than the dreary present of Web 2.0.
- Beautiful post from @yewjin.bsky.social: "Success is a lousy navigation system for the second half of life." Potentially a lousy system for any half of your life, but more pressing as time goes on.
- The Hub is 100% on Xet. 🚀 A little over a year ago, @hf.co acquired XetHub to unlock the next phase of growth in models and datasets. huggingface.co/blog/xethub-... In April, there were 1,000 Hugging Face repos on Xet. Now every repo (over 6M) on the Hub is on Xet.
- Today, we've finalized this first phase of migrating the Hub to a new, modern storage system. One that's built to scale with AI builders of today and tomorrow. huggingface.co/blog/from-fi... There's still a lot of work to do, but we're excited for what's next. 💪
- Just a reminder that this is a legit status code
- The rise of AI-generated workslop.
- Nice breakdown by the @anthropic.com team of a few recent infra bugs that led to the worst nightmare of any engineer: "random, inconsistent degradation." Good reminder that these genies still rely on solid infrastructure, good evaluations, and constant monitoring.
- Hard not to 🙄 at this section of Zuck's vision of "Personal Superintelligence": "Personal devices like glasses that understand our context because they can see what we see, hear what we hear, and interact with us throughout the day will become our primary computing devices."
- We just crossed 1 million repositories backed by Xet storage on @hf.co. I celebrated by reviving the early 2000s web design aesthetics that I love so much. Here's our dashboard showing our progress converting the Hub from Git LFS to Xet (and demonstrating my questionable design sensibilities).
- This also serves as a reminder to myself that I owe a round of "Thank you"s to all the talented designers I've worked with over the years.
- Perhaps the bitter lesson about all organizational design is that all you need is a garbage can of chaos.
- Loved this post from @henrikkarlsson.bsky.social: "There have been a series of experiences that have helped me realize more of my agency, but I think the most important one was becoming a father" 💯💯💯💯
- We've moved the first 20PB from Git LFS to Xet on @hf.co without any interruptions. Now we're migrating the rest of the Hub. We got this far by focusing on the community first. Here's a deep dive on the infra making this possible and what's next: huggingface.co/blog/migrati...
- The engine behind moving from Git LFS to Xet is our migration process. It's simple, powerful, and has moved well over a dozen PB just by itself. Here's a high level view of how it works.
- You can see over the past few months some of the biggest migrations show up in our cluster throughput. Each spike corresponds to a significant migration (where we download from LFS and upload to Xet) with the baseline steadily increasing to just shy of 100 Gb/s
- A sneaky part of making this all work is our backward compatibility with Git LFS. This allows us to roll out a significant protocol change without forcing workflow changes. We call this the Git LFS Bridge internally, and like our migration process, its power is in its simplicity.
- A look into monitoring/observability at @hf.co. Some fun tidbits in here, like how we use our NAT gateway as a cost sentinel. Cloud infra costs are no joke.
- "A close friend has used em-dashes since our days in college, and yet every time they include one in a text to me, I can't help but think, 'Did an LLM write this?'"
- Further proof that cute animals are the great distractors.
- On using AI for personal messages: “We want to just write a prompt and have it done. And there’s something that we are losing – it’s the process. And in the process, there’s many important aspects. It is the co-construction of ourselves with our activities”
- Privacy concerns are legitimate and need to be addressed, but a larger part of me is concerned about the social, cultural, and cognitive impacts of a "magic genie bot that is going to take care of the exigencies of life"
- What happens when you give an LLM a high-level objective to manage a small business and an incomplete toolset to achieve its aims? It makes questionable inventory and sales decisions, loses most of its money, and has an identity crisis. Not *so* far off from how I would perform.
- And so the march to the season of darkness begins.
- The sun will rise in #Seattle #Washington tomorrow at 5:13, 26 seconds later than the day before. It will set at 21:11, 2 seconds earlier than the day before.
- It's been a bit since I took a step back and looked at our progress to migrate @hf.co from Git LFS to Xet, but every time I do, it's mind-boggling. A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today? 🤗 700,000 users/orgs 📈 350,000 repos 🚀 15PB
- Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).
- These are hard numbers to put into context, but let's try. The latest run of Common Crawl was 471 TB. We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours. 🤯🤯🤯
- "All of these jobs are ultimately about trust and responsibility. Not only does the task need to be done, someone needs to take responsibility for what was delivered."
- Of course, both the above and this sentiment put forth by @petebuttigieg.bsky.social can be true at the same time.
- And BOTH of the above can be true and we can ALSO agree that @petebuttigieg.bsky.social having a Substack is weird.
- Stumbled across deepwiki.com last night. Great resource for anyone trying to get up to speed on an open source repo. Does a pretty good job of explaining the xet-core codebase and Xet deduplication tech on Hugging Face deepwiki.com/huggingface/... (probably better than I have 😅)
- A bit of reflection on being a new parent; mostly on my perception of time as a new father. TL;DR - I have less time 🤪 (but more purpose).
- Forest bathing. A common Washington state pastime.
- I've been working on cutting down my "I'm sorry, I meant to be more clear in my last prompt" messages too. There is a price to being polite. Good breakdown by Julien Delavande, @sashamtl.bsky.social, and Régis Pierrard.
- Just get the Windows box back up and running to troubleshoot some user reports about slow downloads in WSL. It'll be easy. I *won't* spend the entire morning downloading software and rebooting the machine.
- Learned GCP was out by seeing my dog door monitor was broken (backed by a cloud SQL instance). How am I going to replay/track the events of her going in and out the dog door during this outage? These are the important questions.
- A philosophy to live by (also, check out those beards 🧔🧔)
- "It started a few years ago when Peter was raised to the rank of department head and was careless enough to leave a portrait of himself floating around." How one man's face came to be plastered all over the Bell Labs campus (including on a watertower) during the early days of Unix development.
- Awk is for kids and adults alike, just like A&W's root beer floats.
- I misspelled avocado once while grocery shopping. Now I only buy avacardos.
- I've seen a few posts like these recently. It's disappointing. In general, I'm a social media lurker, but I'm here because there are some great minds posting interesting things. Without forums like these, there is no good way for me to hear these voices.
- I'm not so interested in the other site for personal reasons. I'll continue to carve out a space here, but this is a reminder to find other spaces too. That said, if you're looking for a good way to stay in touch with the AI/ML community here, this post from @nsaphra.bsky.social is a good primer
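
A quick check of the Common Crawl estimate from the Xet throughput thread above (471 TB moved at a peak of 577 Gb/s). This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits), a transfer sustained at peak speed, and no protocol overhead. The function name `transfer_hours` is made up for illustration.

```python
def transfer_hours(size_tb: float, rate_gbps: float) -> float:
    """Hours needed to move size_tb terabytes at rate_gbps gigabits per second.

    Assumes decimal units and a perfectly sustained rate (an idealization).
    """
    bits = size_tb * 1e12 * 8           # terabytes -> bytes -> bits
    seconds = bits / (rate_gbps * 1e9)  # gigabits per second -> bits per second
    return seconds / 3600


# Figures quoted in the posts: latest Common Crawl run and peak upload speed.
hours = transfer_hours(471, 577)
print(f"{hours:.2f} hours")  # roughly 1.8, i.e. "about two hours"
```

Under those assumptions the transfer takes a bit under two hours, which lines up with the figure in the post. Real migrations would not hold peak speed for the whole run, so treat this as a lower bound.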