Maria Khalusova
Always growing, she/her, RAG builder, LLM whisperer, tech generalist
- Really looking forward to food-induced coma naps
- What's going to be different for you in 2026?
- Just entered my next decade and I think it’ll be the best one yet.
- Once again, I completely forgot that I have this account. Oops
- Things move fast in AI. Every week brings new models, new capabilities, or new ideas to chase. It’s exciting, but also easy to get swept up in the pace and forget to pause, to touch grass, to zoom out and see the bigger picture. 🧵
- Next week, I’m stepping away for a couple of months to take a sabbatical and spend time with my kids. I’m not burnt out. I’m following my own advice: do the thing you’ll regret not doing when you’re old.
- Kids won’t be kids forever, and mine are getting ever so close to becoming teenagers. Now is time I know I’ll never get back. I’m incredibly grateful to be in a place, both professionally and personally, where this is possible.
- PS: That said, I’ll probably still keep an eye on what’s happening and may even share some posts every now and then. I’ve got a lot of thoughts on RAG, data processing, LLMs/VLMs, etc., so I likely won’t disappear fully.
- Asking “What is the best chunk size for RAG?” without any additional context is like asking, “What’s the best thing to wear?” Wear where? What’s the weather like? What size are you? Are you going to a wedding or hiking a trail? There’s no single answer that works for every situation. 🧵
- Same goes for chunking. The “best” chunk size depends on a range of factors, and without those, the question is incomplete. Here are some of the questions to ask instead: * What does your data look like? Financial statements, technical manuals, customer support transcripts are not the same.
- They all vary in structure, style, and length. * What is your use case? Are you trying to answer questions with specific facts? Are you gathering multiple documents to summarize for a report? Do you pull from transcripts and need to preserve speaker attribution?
- RAG exists to solve different problems across varied domains. Understand the problem you’re solving and look at your data.
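To make the "it depends" point concrete, here is a minimal sketch of two chunking strategies that suit different data. Everything in it is an assumption for illustration: the function names, the `Name: text` transcript format, and the sample transcript are hypothetical, not taken from any particular library.

```python
import re


def chunk_by_size(text: str, size: int, overlap: int = 0) -> list[str]:
    """Fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


# Assumed transcript convention: a new turn starts with "Name: ...".
_TURN_START = re.compile(r"^\w[\w ]{0,30}:(\s|$)")


def chunk_by_speaker(transcript: str) -> list[str]:
    """One chunk per speaker turn, so attribution stays attached to the words."""
    chunks: list[str] = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        if _TURN_START.match(line) or not chunks:
            chunks.append(line)          # a new speaker turn begins
        else:
            chunks[-1] += " " + line     # continuation of the current turn
    return chunks


transcript = (
    "Agent: Hi, how can I help?\n"
    "Customer: My invoice is wrong.\n"
    "It shows the wrong amount.\n"
    "Agent: Let me check."
)
turns = chunk_by_speaker(transcript)
windows = chunk_by_size(transcript, size=40, overlap=10)
```

Fixed-size windows can cut a turn in half and separate a statement from its speaker, which may be acceptable for a manual but not for a support transcript. Neither strategy is "best"; the data and the use case decide.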
- Do the thing that you will regret not doing when you're old.
- I went to check what new courses deeplearning.ai has, and was pleasantly surprised to see that the short course Marc Sun, Younes Belkada, and I built over a year ago is still featured as one of the Top Rated courses 😍
- I'm taking this whole developer becoming a farmer dream way too far, aren't I?
- At least I have interrupted your doomscrolling with some cuteness!
- If you've been prioritizing urgent work, make sure to prioritize important work.
- How anyone can like peanut butter is beyond me.
- Similar ≠ relevant
- I'm starting a series of blog posts on RAG beyond the basic setup. In the first part, we're setting the stage: why naive RAG is not enough, and how a lot of the issues can be traced back to data processing choices. Part 1: unstructured.io/blog/level-u...
- Part 2 is a high-level overview of advanced RAG techniques: unstructured.io/blog/level-u...
- Nothing starts a Wednesday morning quite like your dog getting sprayed by a skunk 🤢
- I have some epic plans for this summer and none of you’ll be able to guess what they are.
- What you're not changing, you're choosing. This is a gentle reminder for the next time you're prioritizing a cool new shiny thing over building the foundation or addressing tech debt.
- Word of the day seems to be "sycophantic". Thanks AI community for increasing my vocabulary :)
- 32 pages - is it still a blog or do I call it a book now?
- I don’t like the term “AI-assisted coding”. AI is just another tool in the box. We've had code completion and refactoring features in IDEs for a hot minute now, do we call it “IDE-assisted coding”? We just use these things to make our lives easier. 🧵
- Though there used to be people saying things along the lines of “You’re not a real programmer if you don't code without your IDE’s fancy features!" or "True devs only need Notepad and a terminal”. Surely, we’ve long moved past that. It doesn't matter what tools you are using - it's about writing
- good code. Are we now repeating the same sentiment but with LLMs? AI is just another tool - like a refactoring feature in the IDE or a debugger. A debugger doesn’t find the issue - you do, faster with its help. Likewise, AI helps you work faster or more efficiently, but at the end of the day,
- you’re doing the work. You’re still making decisions, solving problems, and putting together something useful. Coding with AI only becomes “vibe coding” if you’re not paying attention to, or don’t care about, what you or your tools are doing.
- Apropos of nothing, here's a tip. Some embedding models are robust against typos, and others are sensitive to them. That's a whole rabbit hole on its own 🤓, but long story short - this will matter if your users are fat-fingering the queries.
- E.g. `BAAI/bge-small-en-v1.5` vs `text-embedding-3-large`
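One way to check this for any embedding function is to embed a clean query and a typo'd copy, then compare their cosine similarity. The sketch below is only a test harness with two toy stand-ins (word counts and character trigram counts). It does not call either of the models named above, and the function names and sample queries are hypothetical.

```python
import math
from collections import Counter
from typing import Callable

Vector = Counter  # sparse vector: feature -> count


def word_embed(text: str) -> Vector:
    """Toy stand-in: bag of whole words. One typo changes the whole feature."""
    return Counter(text.lower().split())


def trigram_embed(text: str) -> Vector:
    """Toy stand-in: bag of character trigrams. A typo only changes a few features."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_under_typos(embed: Callable[[str], Vector], clean: str, noisy: str) -> float:
    """Higher means the embedding is less disturbed by the typos."""
    return cosine(embed(clean), embed(noisy))


clean_query = "best chunk size for retrieval"
noisy_query = "bset chunk szie for retreival"

word_score = similarity_under_typos(word_embed, clean_query, noisy_query)
trigram_score = similarity_under_typos(trigram_embed, clean_query, noisy_query)
```

To test real models, swap in an `embed` function that wraps them and returns dense vectors, and run the same comparison over a sample of your users' actual queries.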
- Today I learned that Postgres doesn't delete columns. Not from the catalog and not from data files. Deleted columns are only "soft deleted". But they still exist. And now I wonder if this is why the logo is an elephant. As in, it never forgets.
- Source: x.com/gwenshap/sta...
- I think I pissed off the Live Demo gods at some point; they were not on my side today.
- I may need to travel to the US soon, and this genuinely worries me.
- I'm looking for incredible people to join my team! If you're excited about AI and know that great AI starts with great data - If you love building, learning, and helping others do the same - DM me. Let's talk. And if this sounds like someone you know, feel free to RT!
- We use RAG to look up context for an LLM; the next step should be looking up available MCP servers and their tools. We need a middle layer between the LLM and the servers that looks up available tools based on the request, instead of having all of that info blow up the context window.
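A minimal sketch of what such a middle layer could look like, under loud assumptions: the `Tool` record, the registry contents, and `select_tools` are all hypothetical, and the scoring is plain token overlap (Jaccard) standing in for whatever retrieval you would really use. It does not use any MCP SDK.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    server: str        # hypothetical server identifier
    name: str          # hypothetical tool name
    description: str   # what would otherwise be pasted into the prompt


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(request: str, registry: list[Tool], k: int = 2) -> list[Tool]:
    """Return up to k tools whose name/description overlap most with the request."""
    query = _tokens(request)
    scored = []
    for tool in registry:
        doc = _tokens(f"{tool.name} {tool.description}")
        union = query | doc
        score = len(query & doc) / len(union) if union else 0.0
        scored.append((score, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Tools with no overlap at all are never offered to the model.
    return [tool for score, tool in scored[:k] if score > 0]


REGISTRY = [
    Tool("calendar", "create_event", "Create a calendar event with a title, date and attendees"),
    Tool("issues", "create_ticket", "Open a ticket in the issue tracker"),
    Tool("weather", "get_forecast", "Get the weather forecast for a city"),
]

request = "What is the weather forecast for Paris tomorrow?"
shortlist = select_tools(request, REGISTRY, k=1)
```

Only the shortlisted tool definitions would go into the context window. Token overlap is noisy (stopwords count as matches), so a real version would likely use embeddings or a reranker for the scoring step.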
- April appears to be a good month for hitting milestones.
- Years of breaking down complex concepts to a broad audience turn out to be rather useful in vibe coding.
- Figuring out Jira automations is not something I had envisioned for myself, but here we are 🤣