- Excited for our new paper on a genome language model for viruses in @natcomms.nature.com: "Protein Set Transformer: a protein-based genome language model to power high-diversity viromics"! Led by PhD student Cody Martin in collaboration with @anthonygitter.bsky.social doi.org/10.1038/s414...
- Our research introduces a novel genome language model that represents each viral genome as a set of proteins, tackling the enormous diversity of viral proteins, much of which remains uncharacterized "viral dark matter." PST improves protein understanding and powers high-throughput viromics, revealing hidden ecological and functional insights in viral genomic datasets.
- Key findings include Improved Protein Function Prediction: leveraging the Protein Set Transformer, we can predict viral protein functions more accurately, even when no detectable homology to characterized proteins exists. (Dec 15, 2025 18:43)
- Deciphering Auxiliary Viral Genes: our model enables deeper exploration of auxiliary viral genes (AVGs), shedding light on their roles in viral life cycles and host interactions.
- Scalability for Viromics Research: the approach paves the way for high-throughput analysis, making it feasible to explore vast collections of previously uncharacterized viral genomes and, ultimately, to uncover their ecological roles.
- This research represents a significant step towards understanding the crucial roles of viruses in ecosystems. Excited to see where this will lead us in virology and microbial ecology!