Tristan Bepler
Scientist and Group Leader of the Simons Machine Learning Center
@SEMC_NYSBC. Co-founder and CEO of http://OpenProtein.AI. Opinions are my own.
- New job openings @openprotein.bsky.social across protein foundation model research, computational protein design, and cloud platform engineering www.openprotein.ai/careers
- Our preprint on sequence-to-property learning and zero-shot fitness prediction with PoET-2 is live: arxiv.org/abs/2508.04724. PoET-2 is also open source on GitHub: github.com/OpenProteinA... Thanks to the @openprotein.bsky.social team!
- Reposted by Tristan Bepler: Boltz-1 & Boltz-2 now live via GUI & APIs! Predict protein, protein–RNA/DNA/ligand structures with confidence scores & binding affinity metrics for virtual screening. Compare finetuned models in the new overview page to find your best performer fast. www.openprotein.ai/early-access...
- Reposted by Tristan Bepler: Product update: Indel Analysis lets you score insertions/deletions across your sequence using PoET-2. You can now also compare multiple 3D structures in Mol* to evaluate design alternatives. Sign up now: www.openprotein.ai/early-access...
- Why does no one in AI protein engineering work on indels? We’re solving this at OpenProtein.AI. Check out our upcoming indel design tool! 🤩 1/4 @openprotein.bsky.social
- It supports screening deletions, insertion sites, and replacement sites. Explore viable shortened proteins, or insert new structural or functional sequences like localization signals or structural tags. 2/4
- Indels are still a major challenge for variant effect prediction and protein design. PoET-2 has significantly improved the state-of-the-art for functional and clinical indel variant effect prediction. 3/4
- How would you use a tool like this? Do you design or screen indels in your work? 4/4
- Reposted by Tristan Bepler: Have we hit a "scaling wall" for protein language models? 🤔 Our latest ProteinGym v1.3 release suggests that for zero-shot fitness prediction, simply making pLMs bigger isn't better beyond 1-4B parameters. The winning strategy? Combining MSAs & structure in multimodal models!
- Reposted by Tristan Bepler: Product update: PoET-2 now supports structure inputs for enhanced prediction and design via Python APIs. Check out our new inverse folding tutorial to see it in action. 🔗 docs.openprotein.ai/walkthroughs... Sign up for OpenProtein.AI: www.openprotein.ai/early-access...
- Generative protein sequence design, variant effect prediction, and fine-tuning are now fully supported for PoET-2 with structure and sequence prompts in the @openprotein.bsky.social Python client and APIs! Check out our new walkthrough on inverse folding: docs.openprotein.ai/walkthroughs...
- Sign up for OpenProtein.AI (free for academic use): www.openprotein.ai/early-access... and install the Python client to get started: github.com/OpenProteinA...
- Learn more about PoET-2 in our whitepaper: www.openprotein.ai/a-multimodal...
- Huge thanks to the @openprotein.bsky.social team! We've got more exciting PoET-2 updates to come 🚀
- Reposted by Tristan Bepler: 🧬 Protein Revolution: The Tiny Model Making a Massive Impact! PoET-2 is changing the game in computational protein design, slashing experimental data needs by 30x! 🚀 Learn more: www.synbiobeta.com/read/protein... #ProteinDesign #BiotechInnovation #AIRevolution
- Excited to share PoET-2, our next breakthrough in protein language modeling. It represents a fundamental shift in how AI learns from evolutionary sequences. 🧵 1/13
- Since our first protein language models in 2019 (in Bonnie's lab!), the field has focused on scale: building ever-larger models, up to 100B parameters, to extract information from natural sequence databases. 2/13
- But memorizing sequences isn't enough. The real challenge: can we build models that learn the fundamental principles that govern how proteins evolve and function? 3/13
- Huge thanks to our incredible team @openprotein.bsky.social, especially Tim Truong. This is just the beginning of AI systems that truly understand protein biology. I can’t wait to see what the community can do with these models! 13/13
- Reposted by Tristan Bepler: 🧬 Announcing PoET-2: A breakthrough protein language model that achieves trillion-parameter performance with just 182M parameters, transforming our ability to understand proteins.
- Reposted by Tristan Bepler: Can we learn protein biology from a language model? In new work led by @liambai.bsky.social and me, we explore how sparse autoencoders can help us understand biology, going from mechanistic interpretability to mechanistic biology.
- Reposted by Tristan Bepler: We’re excited about the potential of SAEs in biology and would love to hear your ideas. Our preprint: www.biorxiv.org/content/10.1... Visualizer: interprot.com GitHub: github.com/etowahadams/... HuggingFace: huggingface.co/liambai/Inte...
- A flexible framework for fast and accurate segmentation of filaments and membranes in tomograms and micrographs: the TARDIS manuscript is now live @biorxivpreprint.bsky.social! Thanks to hard work by Robert Kiewisz and our many collaborators! www.biorxiv.org/content/10.1...
- We provide pre-trained networks for semantic segmentation of microtubules and membranes, and for instance segmentation of general surface and linear-like structures.
- As a flexible framework, new semantic segmentation networks can be plugged in for new biomolecules and imaging modalities, which we show in application to actin and TIRF microscopy.
- TARDIS is open source on GitHub (github.com/SMLC-NYSBC/T...) and can also be installed via pip.
- I'll be talking about shrinking protein language models with #PoET and protein engineering at openprotein.ai tomorrow at A*STAR's Bioinformatics Institute. If you can't make it, I'll also be presenting at the Berger Lab seminar @mitofficial.bsky.social on Wednesday!
- Reposted by Tristan Bepler: Yet more evidence that transfer learning of sequence-only PLMs does not benefit from scale beyond 650M params 🧵
- The complete guide for transfer learning with the protein language model ESM-2. (In brief: Use ESM-2 650M and calculate mean embeddings across sites.) www.biorxiv.org/content/10.1...
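
For readers unfamiliar with "mean embeddings across sites", here is a minimal sketch of the pooling step only. It does not load ESM-2: the per-residue embeddings are assumed to have been computed already (one vector per sequence position, with any special start/end tokens removed), and the function name `mean_pool` and the toy values are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors (L x D) into a single D-dimensional vector.

    Hypothetical helper: the input stands in for the per-site output of a
    protein language model such as ESM-2, computed elsewhere.
    """
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    dim = len(residue_embeddings[0])
    if any(len(row) != dim for row in residue_embeddings):
        raise ValueError("all residue embeddings must have the same dimension")
    n_sites = len(residue_embeddings)
    # Average each embedding dimension over all sequence positions.
    return [sum(row[j] for row in residue_embeddings) / n_sites for j in range(dim)]


# Toy stand-in for a 3-residue protein with 2-dimensional embeddings.
toy_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(toy_embeddings)
```

The pooled vector has a fixed length regardless of sequence length, which is what lets it serve as the input to a downstream regression or classification model.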