New publication alert. :-) We published our new tool, MultiStageSearch: An Iterative Workflow for Unbiased Taxonomic Analysis of Pathogens Using Proteogenomics, in the Journal of Proteome Research.
pubs.acs.org/doi/10.1021/...
#TeamMassSpec #Bioinformatics #Proteomics #Proteogenomics #Pathogens
MultiStageSearch: An Iterative Workflow for Unbiased Taxonomic Analysis of Pathogens Using Proteogenomics
The global SARS-CoV-2 pandemic emphasized the need for accurate pathogen diagnostics. While genomics is the gold standard, integrating mass spectrometry-based proteomics offers additional benefits. However, current proteomic and genomic reference databases are often biased toward specific taxa, such as pathogenic strains or model organisms, and proteomic databases are less comprehensive. These biases and gaps can lead to inaccurate identifications. To address these issues, we introduce MultiStageSearch, a multistep database search method that combines proteome and genome databases for taxonomic analysis. Initially, a generalist proteome database is used to infer potential species. Then, MultiStageSearch generates a specialized proteogenomic database for precise identification. This database is preprocessed to filter duplicates and cluster identical open reading frames to reduce genomic database biases. The workflow operates independently of strain-level NCBI taxonomy, enabling the identification of strains not represented in existing taxonomies. We benchmarked the workflow on viral and bacterial samples, demonstrating its superior performance in strain-level taxonomic inference compared to existing methods. MultiStageSearch offers a flexible and accurate approach for pathogen research and diagnostics, overcoming incomplete search spaces and biases inherent in reference databases.
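To make the preprocessing step concrete, here is a minimal sketch of what "filter duplicates and cluster identical open reading frames" could look like. It is an illustration under stated assumptions, not MultiStageSearch's actual code: the function names `drop_duplicate_genomes` and `cluster_identical_orfs`, the `(genome_id, orf_sequence)` record layout, and the toy sequences are all hypothetical, and "duplicate" is assumed here to mean genomes with exactly the same set of ORF sequences.

```python
from collections import defaultdict


def drop_duplicate_genomes(records):
    """Keep one representative genome per distinct set of ORF sequences.

    Assumption: two genomes count as duplicates when their ORF sets are
    exactly equal. The alphabetically first genome ID is kept.
    """
    orfs_by_genome = defaultdict(set)
    for genome_id, seq in records:
        orfs_by_genome[genome_id].add(seq.upper())

    representative = {}
    for genome_id in sorted(orfs_by_genome):
        key = frozenset(orfs_by_genome[genome_id])
        representative.setdefault(key, genome_id)

    keep = set(representative.values())
    return [(g, s) for g, s in records if g in keep]


def cluster_identical_orfs(records):
    """Map each unique ORF sequence to the set of genomes that contain it."""
    clusters = defaultdict(set)
    for genome_id, seq in records:
        clusters[seq.upper()].add(genome_id)
    return dict(clusters)


# Toy input: hypothetical genome IDs and made-up sequences.
records = [
    ("genome_A", "MKTLLV"),
    ("genome_B", "MKTLLV"),
    ("genome_B", "MSDNQR"),
    ("genome_C", "MKTLLV"),  # same ORF content as genome_A
]

filtered = drop_duplicate_genomes(records)
clusters = cluster_identical_orfs(filtered)
```

In this toy input, `genome_C` is dropped because its ORF content matches `genome_A`, and the shared ORF is stored once with both remaining genomes attached. Collapsing redundant entries this way is one plausible route to keeping heavily sequenced strains from dominating the search space.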