Recently published in Nature Communications, PepQuery2 is a tool for fast, targeted identification of both novel and known peptides in proteomics datasets obtained from mass spectrometry (MS) experiments. It takes a peptide-centric approach: users search for specific peptide or protein sequences of interest within massive public datasets of MS/MS spectra. Researchers can thus access and analyze vast amounts of proteomics data directly, making fuller use of these resources for biomedical research.
The tool is available in two versions: a stand-alone version and a web version. The stand-alone version enables users to search through over one billion indexed MS/MS spectra stored in PepQueryDB or public datasets from various repositories like PRIDE, MassIVE, iProX, and jPOSTrepo. Additionally, it supports direct analysis of user-provided MS/MS data. The web version offers a user-friendly interface for querying datasets available in PepQueryDB, including datasets from flagship CPTAC studies.
PepQuery2 has proven effective in detecting proteomic evidence for novel peptides predicted from genomic data, prioritizing tumor-specific antigens for cancer research, identifying missing proteins, and selecting proteotypic peptides for targeted proteomics experiments. It also pairs well with spectrum-centric database searches: the peptide-centric approach complements these existing algorithms and can be used to validate their identifications. Stringent criteria for novel peptide validation, together with competitive filtering based on unrestricted modification searching, help ensure accurate results. In addition, PepQuery2's new MS/MS data indexing method allows quick retrieval of candidate spectra from large-scale proteomics datasets, enabling scalable and user-friendly querying of the more than one billion indexed MS/MS spectra in PepQueryDB from any computer with internet access. This indexing technology could also be applied more broadly to other MS-based omics technologies, such as metabolomics, for fast and targeted data analysis.
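To make the idea of candidate-spectrum retrieval concrete, here is a minimal Python sketch of a peptide-centric lookup. It is not PepQuery2's actual indexing method, which the text above does not describe in detail. It assumes that candidate spectra are selected by matching the neutral precursor mass to the target peptide's monoisotopic mass within a ppm tolerance, and it uses a plain sorted list with binary search as the index. The names `peptide_mass`, `PrecursorIndex`, and `candidates`, the spectrum identifiers, and the 10 ppm default are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added for the peptide termini


def peptide_mass(sequence):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER


class PrecursorIndex:
    """Toy index: spectra sorted by neutral precursor mass (an assumption,
    not PepQuery2's real on-disk index format)."""

    def __init__(self, spectra):
        # spectra: iterable of (spectrum_id, neutral_precursor_mass) pairs
        self._entries = sorted(spectra, key=lambda entry: entry[1])
        self._masses = [mass for _, mass in self._entries]

    def candidates(self, target_mass, tol_ppm=10.0):
        """Return IDs of spectra whose precursor mass is within tol_ppm
        of target_mass, found with two binary searches."""
        delta = target_mass * tol_ppm / 1e6
        lo = bisect_left(self._masses, target_mass - delta)
        hi = bisect_right(self._masses, target_mass + delta)
        return [spectrum_id for spectrum_id, _ in self._entries[lo:hi]]


if __name__ == "__main__":
    # Hypothetical spectra; real data would come from indexed MS/MS files.
    index = PrecursorIndex([("s1", 146.0691), ("s2", 146.2), ("s3", 500.0)])
    print(index.candidates(peptide_mass("GA")))
```

The point of the sketch is only that a target peptide narrows the search to a small mass window, so the cost of a query depends on the number of candidate spectra rather than on the size of the whole dataset. Scoring the candidates and the competitive filtering against modified reference peptides are separate steps that the sketch leaves out.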
Overall, PepQuery2 is useful across a range of applications. By democratizing access to public MS proteomics datasets, it holds promise for transforming raw data into meaningful information for the broader research community. The tool's efficiency, accuracy, and comprehensive consideration of protein modifications help reduce false discoveries, making it a valuable asset in the field of proteomics research.