Yun S. Song
Professor of EECS and Statistics at UC Berkeley. Mathematical and computational biologist.
- Published online on Jan 2, 2025 and just appeared in the December 2025 issue!
- A language model predicts the effects of genetic variants in the human genome go.nature.com/4gWppWg rdcu.be/eUK6Z
- Reposted by Yun S. Song: The registration deadline is fast approaching for probgen 2026! Abstracts due by January 15, registration by January 31 probgen2026.github.io
- Reposted by Yun S. Song: Over the past 5+ years I've had the honor of working with @wsdewitt.github.io @victora.bsky.social and many others on a project to "replay" affinity maturation evolution from a fixed starting point. matsen.group/general/2025...
- Reposted by Yun S. Song: Organisers - Shu Zhang | @gladstoneinst.bsky.social Invited Speaker - @yun-s-song.bsky.social | @ucberkeleyofficial.bsky.social
- Reposted by Yun S. Song: How to keep in step when your (protein) partner speeds up… Here we investigated the adaptive remodeling of a protein-protein interaction surface essential for telomere protection. Congrats to whole team! www.science.org/doi/10.1126/...
- Reposted by Yun S. Song: The last work of my PhD is finally out: www.pnas.org/doi/10.1073/...! This work is about accurately estimating branch length in the Ancestral Recombination Graph (ARG), which is achieved by a really simple framework with minimal assumptions. (1/n)
- An open-rank faculty search in AI + Engineering (Bioengineering included) at UC Berkeley. Due date: Monday, Nov 3, 2025 at 11:59pm (PT). Please help spread the news. aprecruit.berkeley.edu/JPF05144
- Reposted by Yun S. Song: This is truly an incredible breakthrough IMO. Really exemplifies what you get when deep domain expertise (popgen/evolution/disease genetics in this case) fuses with cleverly crafted ML. What u get r sleek, well thought out architectures that absolutely destroy the behemoths. Wow!! 1/
- We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics. www.biorxiv.org/content/10.1... (1/n)
- GPN-Star features a novel phylogeny-aware architecture that enables the model to explicitly capture evolutionary relationships encoded in whole-genome alignments and overcomes the key limitations of our earlier model GPN-MSA (doi.org/10.1038/s415...). (2/n)
- We also introduce a calibration method that removes the confounding effect of mutation rate variation from gLM predictions for the first time. This improves downstream performance and enables a more direct interpretation of model scores as estimates of selective constraint. (3/n)
- All in all, we believe that GPN-Star offers a scalable & flexible approach for training effective gLMs. This work was led by my talented students @czye.bsky.social and @gonzalobenegas.bsky.social, with contributions from other lab members, @peterdfields.bsky.social at Jax, & B. Clarke at DKFZ (n/n)
- SINGER, our ARG inference method, is finally published and freely available online: doi.org/10.1038/s415... It was a long journey – 16 months from initial submission to acceptance. Is it just me, or has peer review gotten more arduous lately? 4+ rounds of review isn't so unusual these days...
- Reposted by Yun S. Song: Hi Bluesky — Dedicating my first post to this work and software, led by the incredibly meticulous and capable @fandingzhou.bsky.social! An earlier version of this was shared at the 2022 Bioconductor Conference (bioc2022.bioconductor.org/schedule/).
- Gene expression changes aren’t just about mean shifts — variability shifts matter too, especially for aging. We're thrilled to introduce QRscore, a flexible non-parametric framework for detecting shifts in mean and variance across conditions. doi.org/10.1016/j.cr...
- Reposted by Yun S. Song: In a new preprint we use deep learning on lineage trees to infer the functional form of the relationship between affinity and fitness that controls antibody evolution in germinal centers: arxiv.org/abs/2508.09871 🧵
- Antibodies are highly diverse, but most possible sequences are unstable or polyreactive. In this work, just published in Cell Syst., we propose a new source of data for modeling constraints from these properties. Our models show clear improvements in predicting Ab dysfunction. (1/n) t.co/qCZERPUMPF
- Natural antibodies are generated in B cells and tested for function (sufficient expression, low autoreactivity). If either the heavy or light chain fails, the B cell can try to generate it again. We usually can only sequence B cells that have passed all checkpoints. (2/n)
- Most mature B cells express only the final, successful heavy and light chains (allelic exclusion). However, ~1% express two light chains (allelic inclusion). Previous work in mice has found that when this occurs, one of the light chains is dysfunctional. (3/n)
- This work was led by my talented student Milind Jagota @milindjagota.bsky.social in collaboration with colleagues at UC Berkeley, UCSF (the Ye Lab @yimmieg.bsky.social), and Fred Hutch (the Matsen Lab @matsen.bsky.social). We are grateful to all co-authors for their enthusiasm and hard work. (n/n)
- Reposted by Yun S. Song: (1/4) 🧬 Why Sequence the Genomes of Earth’s Biodiversity? The Earth BioGenome Project 🌍 is a global network of initiatives working together to create a complete genome library for all Eukaryotic life—from mushrooms 🍄 to mammals 🐘. #biodiversity #genomes #sequence #earthbiogenome #education #stem
- Reposted by Yun S. Song: Germinal center clonal diversity trees as a musical score, a great image to start @victora.bsky.social's CCII seminar, "Replaying germinal center evolution on a quantified affinity landscape" #GerminalCenter #Immunology www.ccii.med.kyoto-u.ac.jp/en/event/the...
- Reposted by Yun S. Song: In vivo mapping of mutagenesis sensitivity of human enhancers www.nature.com/articles/s41...
- The 2026 Probabilistic Modeling in Genomics (ProbGen) meeting will be held at UC Berkeley, March 25-28, 2026. We have an amazing list of keynote speakers and session chairs: probgen2026.github.io Please help spread the news.
- Reposted by Yun S. Song: Check out CRISPRpedia, our resource on all things #CRISPR! The latest chapter is on CRISPR & ethics: innovativegenomics.org/crisprpedia/... CRISPRpedia features 85+ original illustrations that are free to download & use for non-commercial purposes! #STEMeducation #STEMed #bioethics #SciArt
- Reposted by Yun S. Song: How well can deep learning models predict the effect of modifying chromatin on gene expression??? Our work -- led by Sanjit Batra and Alan Cabrera when they were in @yun-s-song.bsky.social’s and Isaac Hilton’s labs -- tries to answer this. 🧵🧬🧪 elifesciences.org/reviewed-pre...
- Reposted by Yun S. Song: New preprint in collaboration with @paulinanunezv.bsky.social supervised by @jonnyfrazer.bsky.social and Mafalda Dias – we propose a simple approach to improving zero-shot variant effect prediction in pre-existing protein and genome language models: 🧶 1/n www.biorxiv.org/content/10.1...
- How can one efficiently simulate phylodynamics for populations with billions of individuals, as is typical in many applications, e.g., viral evolution and cancer genomics? In this work with M. Celentano, @wsdewitt.github.io, & S. Prillo, we provide a solution. doi.org/10.1073/pnas... 1/n
- Recasting the perils of phylodynamic non-identifiability as a feature not a bug, we show that a unique forward-equivalent process enables exact and efficient simulation from arbitrarily large populations. With M Celentano, S Prillo, @yun-s-song.bsky.social www.pnas.org/doi/10.1073/pnas.2412978122
- Multi-type birth-death-mutation-sampling (BDMS) models – a general class of stochastic processes with birth, death, mutation, and incomplete sampling – have a wide variety of applications in evolutionary biology. Phylogenetic trees are central objects in these studies. 2/n
- Developing and benchmarking inference methods relies on extensive tree simulation, but due to death and incomplete sampling in BDMS models, the tree describing the ancestral relationship of an observed sample represents only a partial history of the full population. 3/n
- We believe that our work opens new avenues for testing existing inference methods as well as developing new ones (e.g., based on machine learning or Approximate Bayesian Computation, both of which require large amounts of training data). n/n
- Reposted by Yun S. Song: In a medical breakthrough, a team including IGI’s @urnov.bsky.social & @giannikopoulosp.bsky.social created an on-demand #CRISPR therapy for an infant with a deadly gene mutation — developed, approved, and delivered to the patient in just 6 months. Read more: ow.ly/G0Bg50VTonC #RareDisease 🧬
- Reposted by Yun S. Song: Jennifer Doudna @jenniferdoudna.bsky.social @doudna-lab.bsky.social speaks with Cleo Abrams on the history and future of #CRISPR 🧬. Watch here: youtu.be/0OXaanDHENI?...
- Reposted by Yun S. Song: Overfitting is among the conceptually most interesting problems in machine learning. I am happy about several new phenomena we began to understand with Pierfrancesco Urbani. Alert: mostly non-rigorous! (Celebrating Jorge Kurchan) web.stanford.edu/~montanar/OT...
- Reposted by Yun S. Song: If you want to check if a human gene has copy-number changes or lands in a complex region, try pangene.bioinweb.org. Recently updated with more and better assemblies.
- Thrilled to see my digital art on the cover of Trends Genet. The two binary strings represent reverse-complementary DNA sequences (00=A, 01=C, 10=G, 11=T) and the connecting rectangles represent “embeddings” learned by DNA language models. Pls check out our article as well: doi.org/10.1016/j.ti...
- In our updated TraitGym preprint (w/ @gonzalobenegas.bsky.social & Gökcen Eraslan), we evaluate Evo 2 on regulatory variants associated with human traits. We see marked performance gains with scale on Mendelian traits, although still a bit behind alignment-based methods. doi.org/10.1101/2025... 1/n
- It seems that scaling substantially improves the performance of Evo 2 for promoters and ncRNA but not for distal enhancer-like elements, which were not part of its data curation process. 2/n
- We inspected the Evo 2 logo at the famous ZRS enhancer with evidence of functionality from conservation, ENCODE biochem assays, and disease associations (polydactyly). While Evo 2 picks up a nearby exon, it does not predict constraint on ZRS. Browser: genome.ucsc.edu/s/gbenegas/e... 3/n
- An earlier thread on TraitGym by Gonzalo can be found here: bsky.app/profile/gonz... n/n
- Can DNA sequence models predict mutations affecting human traits? We introduce TraitGym, a curated benchmark of causal regulatory variants for 113 Mendelian & 83 complex traits, and evaluate functional genomics and DNA language models. Joint work w/ Gökcen Eraslan and @yun-s-song.bsky.social 🧵👇
- Benchmarking DNA Sequence Models for Causal Regulatory Variant Prediction in Human Genetics biorxiv.org/content/10.1101/202…
- Reposted by Yun S. Song: A month ago we @vevotherapeutics.bsky.social announced that we have generated the largest single-cell perturbation atlas in history, Tahoe-100M. Today, we announce that we will fully open-source Tahoe-100M in Feb, as part of a collaboration with NVidia health to train cell state models.
- Our work, which shows statistical issues with the previous claim of a severe ancient bottleneck in the ancestry of African populations, has been selected as a Featured article in Genetics. doi.org/10.1093/gene...
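
A side note on the Trends Genet. cover post above: with the 2-bit code it quotes (00=A, 01=C, 10=G, 11=T), complementing a base is the same as flipping both of its bits, so a reverse complement can be computed directly on the binary string. The sketch below illustrates only that encoding; the function names are hypothetical and are not taken from the article or from any DNA language model codebase.

```python
# 2-bit code quoted in the post: 00=A, 01=C, 10=G, 11=T.
ENCODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
DECODE = {bits: base for base, bits in ENCODE.items()}


def encode(seq: str) -> str:
    """Turn a DNA string into its binary string, two bits per base."""
    return "".join(ENCODE[base] for base in seq.upper())


def decode(bits: str) -> str:
    """Turn a binary string back into DNA, reading two bits at a time."""
    return "".join(DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def reverse_complement_bits(bits: str) -> str:
    """Reverse-complement a sequence given as a binary string.

    Flipping every bit swaps A<->T (00<->11) and C<->G (01<->10);
    the 2-bit symbols are then emitted in reverse order.
    """
    flipped = "".join("1" if bit == "0" else "0" for bit in bits)
    symbols = [flipped[i:i + 2] for i in range(0, len(flipped), 2)]
    return "".join(reversed(symbols))
```

For example, `decode(reverse_complement_bits(encode("GATTACA")))` gives `TGTAATC`, the reverse complement of `GATTACA`.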