Our latest protein-family-based GenAI collection of tools and datasets, ProFam, is out now. Everything is fully open source: the data, the training and inference code, and ProFam-1, a 215M-parameter Llama-based model.
🧵
With ProFam-1, we scaled learning from single sequences to protein families defined in several different ways. To do so, we curated a large protein family corpus, ProtFam-atlas. I'm particularly stoked about the idea of inference-time compute: this contribution lays out a very exciting path for future work.
In essence, we probed the model's ability to recapitulate family statistics, bootstrap protein structure prediction, and assess mutation effects. It performs strongly across all three tasks, especially when using test-time scaling via prompt conditioning.
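For readers wondering what "recapitulating family statistics" can mean in practice, here is a minimal sketch of one common check: compare per-column amino-acid frequencies of a generated family with those of the natural family and report their Pearson correlation. This is not ProFam's evaluation code. It assumes both sets of sequences are already aligned to the same columns, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues; gaps and others are ignored


def site_frequencies(alignment):
    """Flattened per-column amino-acid frequencies (20 values per column).

    Hypothetical helper: assumes equal-length, pre-aligned sequences.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    freqs = []
    for i in range(length):
        column = [seq[i] for seq in alignment if seq[i] in AMINO_ACIDS]
        counts = Counter(column)
        total = len(column)
        for aa in AMINO_ACIDS:
            freqs.append(counts[aa] / total if total else 0.0)
    return freqs


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Toy usage with made-up, pre-aligned sequences (illustrative only).
natural = ["ACDA", "ACDE", "GCDA"]
generated = ["ACDA", "GCDE", "ACDA"]
print(round(pearson(site_frequencies(natural), site_frequencies(generated)), 3))
```

Higher correlation means the generated family reproduces the natural family's site-wise composition more closely. Pairwise (covariation) statistics are a stricter check, which this sketch leaves out.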
Dec 22, 2025 14:57
You can use the model right now to freely generate families from single-sequence inputs (i.e., diversification conditioned on intrinsic representations of evolution), or to engineer proteins from family prompts (diversification conditioned on particular evolutionary trajectories).
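To make family-prompt conditioning and prompt-based test-time scaling a little more concrete, here is a toy sketch under loudly stated assumptions. A pseudocounted column profile built from the prompt sequences stands in for the model. A mutation is scored as a log-likelihood ratio (mutant vs. wild type), and that score is averaged over several randomly sampled subsets of the family used as prompts. All names here are hypothetical, and this is one generic way to spend extra inference-time compute, not ProFam's actual API or procedure. With a real model, `profile_log_prob` would be replaced by the model's conditional log-likelihood.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_log_prob(prompt, position, residue, pseudocount=1.0):
    """Toy stand-in for a family-conditioned model (hypothetical).

    Returns log P(residue at position | prompt) from pseudocounted column
    frequencies of the aligned prompt sequences.
    """
    column = [seq[position] for seq in prompt]
    count = column.count(residue)
    denom = len(column) + pseudocount * len(AMINO_ACIDS)
    return math.log((count + pseudocount) / denom)


def mutation_score(prompt, position, wt, mut):
    """Log-likelihood ratio; positive means the mutant is favoured under this prompt."""
    return profile_log_prob(prompt, position, mut) - profile_log_prob(prompt, position, wt)


def ensembled_score(family, position, wt, mut, n_prompts=8, prompt_size=4, seed=0):
    """Average the mutation score over several randomly sampled family prompts."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_prompts):
        prompt = rng.sample(family, k=min(prompt_size, len(family)))
        scores.append(mutation_score(prompt, position, wt, mut))
    return sum(scores) / len(scores)


# Toy usage: a made-up aligned family where position 2 is mostly V.
family = ["MKVL", "MKIL", "MRVL", "MKVL", "MKVI"]
print(ensembled_score(family, position=2, wt="V", mut="I"))
```

Raising `n_prompts` spends more compute per query in exchange for a score that depends less on which family members happened to be in any single prompt.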