- New paper out in Bioinformatics! PanForest uses random forests to predict gene presence/absence in bacterial genomes based on other genes present. Joint work with Alan Beavan & Maria Rosa Domingo-Sananes. 🔗 doi.org/10.1093/bioinformatics/btag005
- The core insight: genes don't distribute randomly across genomes. Some genes "like" to co-occur, others avoid each other. PanForest learns these patterns and tells you which genes are predictable from their genomic context. (Jan 14, 2026, 22:25)
- We tested it on 1,000 E. coli genomes with ~12,700 accessory genes. Runs in ~5 hours on 8 processors. Scales to Network of Life pangenomes.
- Case study: antimicrobial resistance genes. Certain AMR genes reliably predict other AMR genes for the same drug. But we also found unexpected associations with genes NOT previously linked to resistance. New targets for investigation?
- Outputs both prediction accuracy (how predictable is each gene?) and feature importance (which genes matter most for each prediction?). Useful for understanding genome organisation, synthetic biology, and molecular ecology.
- Open source & freely available: github.com/alanbeavan/PanForest Congrats to Alan on leading this work! 🎉
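
To make the "co-occur vs. avoid" idea concrete, here is a minimal toy sketch. It is not PanForest's method (PanForest uses random forests). It swaps in a much simpler pairwise statistic, the phi coefficient, to show the kind of signal a presence/absence matrix carries. The gene names and the matrix are made up for illustration.

```python
from math import sqrt

# Hypothetical toy data: rows are genomes, columns are genes (1 = present, 0 = absent).
genes = ["geneA", "geneB", "geneC"]
genomes = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]


def phi(x, y):
    """Phi coefficient between two binary presence/absence vectors.

    Positive values mean the genes tend to co-occur, negative values
    mean they tend to avoid each other.
    """
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A gene present in all or none of the genomes carries no signal.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Transpose so each entry is one gene's presence/absence across genomes.
columns = list(zip(*genomes))

# Score every unordered gene pair.
scores = {
    (genes[i], genes[j]): phi(columns[i], columns[j])
    for i in range(len(genes))
    for j in range(i + 1, len(genes))
}

for pair, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(pair, round(score, 3))
```

In this toy matrix geneA and geneB tend to co-occur (positive score), while geneA and geneC never appear together (score of -1). A random forest goes further than pairwise scores like these, since it can combine many genes at once to predict a single target gene.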