Dongwook Kim dongwookkim.bsky.social
Can ever-increasing sequence databases improve phylogenetic reconstruction of a gene family? Our new preprint introduces AmpliPhy, a pipeline that automates homolog enrichment to improve gene tree inference, built on a robust phylogenomic benchmark scheme. 🧵1/n 📃 doi.org/10.64898/2026.01.26.701724
AmpliPhy improves gene trees by adding homologs without affecting alignments

In phylogenomics, gene tree reconstruction depends on multiple sequence alignment (MSA) and tree inference, and ongoing work continues to improve inference quality. Denser taxon sampling has been associated with improved gene tree inference, suggesting that adding homologs could be a practical route to higher accuracy as sequence databases continue to expand. However, adding sequences can influence multiple steps of typical inference pipelines, and little is known about its specific effects on the multiple sequence alignment, tree reconstruction, and rooting steps. We performed a large-scale empirical benchmark to quantify how homolog enrichment affects alignment and phylogenetic inference. Using an enrichment-impoverishment design and a measure of tree accuracy based on taxonomic congruence, we found that enrichment consistently improves tree inference quality, while effects on alignment quality are marginal. We show that this improvement is associated with accurate root placement on enriched trees when enrichment is accompanied by a sensitive homolog search. Notably, much of the benefit can be retained with relatively compact alignments produced by sequence addition. Building on these observations, we provide a tool, AmpliPhy, which efficiently improves phylogenetic reconstruction of protein families through homolog enrichment. The AmpliPhy open-source pipeline software is available at https://github.com/DessimozLab/ampliphy.

Competing Interest Statement: The authors have declared no competing interest.

Funding: Swiss National Science Foundation, https://ror.org/00yjd3n13, 216623, 10005715

Jan 28, 2026 06:10
Dongwook Kim dongwookkim.bsky.social · Jan 28
We devised a benchmark method to quantify the impact of homolog enrichment on phylogenetic inference, decomposing the effects on MSA quality, tree inference quality, and rooting. We show homolog enrichment improves tree inference, while effects on alignments remain marginal. 🧵2/n
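To make the idea of a tree accuracy measure based on taxonomic congruence more concrete, here is a minimal toy sketch. It is not the metric from the preprint: it simply assumes a rooted gene tree written as nested tuples and a hypothetical dictionary of taxonomic groups, and reports the fraction of groups that appear as a clade in the tree. All names (`collect_clades`, `taxonomic_congruence`, the example species) are made up for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A rooted tree is either a leaf name or a tuple of subtrees (illustrative encoding).
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Record the leaf set below every node of a rooted tree; return the root's leaf set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def taxonomic_congruence(tree: Tree, groups: Dict[str, Set[str]]) -> float:
    """Fraction of taxonomic groups (with at least 2 members) that form a clade in the tree.

    Assumes every group member is a leaf of the tree.
    """
    tree_clades: Set[FrozenSet[str]] = set()
    collect_clades(tree, tree_clades)
    testable = [frozenset(m) for m in groups.values() if len(m) >= 2]
    if not testable:
        raise ValueError("no taxonomic group with at least two members")
    hits = sum(1 for members in testable if members in tree_clades)
    return hits / len(testable)


# Toy gene tree in which the frog sequence is misplaced next to zebrafish.
gene_tree: Tree = ((("human", "mouse"), "chicken"), ("zebrafish", "frog"))
taxonomy = {
    "Mammalia": {"human", "mouse"},
    "Amniota": {"human", "mouse", "chicken"},
    "Tetrapoda": {"human", "mouse", "chicken", "frog"},
}
score = taxonomic_congruence(gene_tree, taxonomy)
print(f"{score:.2f}")  # Mammalia and Amniota are clades, Tetrapoda is not
```

Because the check works on clades of a rooted tree, the same topology can score differently depending on where the root is placed, which is one way rooting can enter such a measure.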

Dongwook Kim dongwookkim.bsky.social · Jan 28
At lower taxonomic levels (e.g., Amniotes), this improvement was associated with more precise root placement. This provides empirical evidence that denser taxon sampling can improve gene tree inference for closely related species by adding information for accurate rooting. 🧵3/n

Dongwook Kim dongwookkim.bsky.social · Jan 28
This effect is maintained when MAFFT adds homologs onto an existing MSA without disrupting column structure. Based on these findings, we developed AmpliPhy, a Nextflow pipeline that automates database-driven homolog enrichment for improved gene tree inference at scale. 🧵4/n
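For readers who want to try sequence addition by hand, here is a small sketch of how MAFFT's `--add` mode can be called from Python. The thread does not say which MAFFT options AmpliPhy uses, so the `--keeplength` flag (which keeps the existing alignment's columns unchanged) is an assumption here, and the file names and function names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def mafft_add_command(existing_msa: str, new_homologs: str, keep_length: bool = True) -> List[str]:
    """Build a MAFFT command that adds unaligned homologs to an existing alignment."""
    cmd = ["mafft", "--add", new_homologs]
    if keep_length:
        # Assumption: preserve the column structure of the existing MSA.
        cmd.append("--keeplength")
    cmd.append(existing_msa)
    return cmd


def run_mafft_add(existing_msa: str, new_homologs: str, output_path: str) -> None:
    """Run MAFFT (must be installed and on PATH) and write the enriched alignment to a file."""
    with open(output_path, "w") as handle:
        subprocess.run(mafft_add_command(existing_msa, new_homologs), stdout=handle, check=True)


if __name__ == "__main__":
    # Hypothetical file names; only prints the command, does not run MAFFT.
    print(shlex.join(mafft_add_command("family.aln.fasta", "homologs.fasta")))
```

Calling `run_mafft_add("family.aln.fasta", "homologs.fasta", "family.enriched.fasta")` would then produce the enriched alignment, which can be passed to any tree inference tool.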

Dongwook Kim dongwookkim.bsky.social · Jan 28
This work was a collaborative effort with Manuel Gil (ZHAW), Kazutaka Katoh (UOsaka), and @dessimoz.bsky.social (UNIL/SIB). Try AmpliPhy now; we appreciate your feedback! 🌐 github.com/dessimozlab/ampliphy
GitHub - DessimozLab/ampliphy: Improve phylogenetic inference by amplifying multiple sequence alignment with homologous sequences

