Our latest protein family-based GenAI collection of tools and datasets, ProFam, is out now. Everything, from the data and the training and inference code to the 215M Llama-based ProFam-1 model, is fully open-sourced.
🧵
Built by CATH, TÜM and NVIDIA, ProFam-1 is our new open-source protein family language model (pfLM) designed to generate functional protein variants and predict fitness using in-context example sequences.
With ProFam-1, we scaled learning from single sequences to protein family definitions of different kinds, curating a large protein family corpus, ProtFam-atlas. I'm particularly stoked about the idea of inference-time compute. This contribution laid out a very exciting path for future work.
Dec 22, 2025 14:57
In essence, we probed the model's ability to recapitulate family statistics, bootstrap protein structure prediction, and assess mutation effects, demonstrating excellent performance across all tasks, especially when using test-time scaling via prompt conditioning.
You can use the model right now to freely generate families from single-sequence inputs (i.e., diversification conditioned on intrinsic representations of evolution), or to engineer proteins based on family prompts (diversification by conditioning on particular evolutionary trajectories).