In a recent issue of Nature Communications, Clinical Proteomic Tumor Analysis Consortium (CPTAC) researchers from Bing Zhang’s lab at Baylor College of Medicine describe a new bioinformatics software tool developed to evaluate the quality of variant peptide identifications. Using a deep learning algorithm, the software, AutoRT, accurately predicts a peptide’s retention time (that is, the time at which the peptide elutes from the liquid chromatography column) based on its sequence.
In mass spectrometry (MS), liquid chromatography (LC) enables more comprehensive peptide identification by separating peptides and reducing the number injected into the instrument at the same time. To identify peptides, scoring algorithms calculate peptide spectrum matches (PSMs) by comparing query spectra with theoretical spectra derived in silico from a database of possible proteins and selecting the most likely peptide candidates. But these algorithms sometimes produce incorrect PSMs, especially when working with large datasets and novel peptide sequences, and there are currently no established ways to evaluate the quality of such matches.
However, the degree to which peptides interact with the LC system before entering the MS chamber, measured as the ‘observed’ peptide retention time (RT), also provides valuable information that has not been widely used in the peptide identification process. CPTAC researchers found that if a peptide’s RT can be predicted from its sequence, the prediction can be compared with the observed RT to act as an internal PSM quality check and help reduce false peptide identifications. AutoRT was developed to accurately predict peptide RTs and compare them with observed RTs to assess PSM quality in a given experiment, giving researchers more confidence in their findings.
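The comparison of predicted and observed RTs can be illustrated with a minimal Python sketch. It does not call AutoRT; it assumes predicted RTs have already been obtained from some sequence-based predictor. The `PSM` class, the `split_by_rt` helper, the example peptides, and the 5-minute tolerance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """One peptide spectrum match with its two retention times (hypothetical structure)."""
    peptide: str
    observed_rt: float   # minutes, measured on the LC system
    predicted_rt: float  # minutes, supplied by any sequence-based RT predictor


def rt_delta(psm: PSM) -> float:
    """Absolute difference between observed and predicted RT."""
    return abs(psm.observed_rt - psm.predicted_rt)


def split_by_rt(psms, max_delta):
    """Separate PSMs whose RT deviation is within max_delta from those that exceed it."""
    consistent, suspect = [], []
    for psm in psms:
        if rt_delta(psm) <= max_delta:
            consistent.append(psm)
        else:
            suspect.append(psm)
    return consistent, suspect


# Made-up example values; the tolerance is an assumption, not a recommended setting.
psms = [
    PSM("PEPTIDEK", observed_rt=42.1, predicted_rt=41.5),
    PSM("VARIANTR", observed_rt=18.0, predicted_rt=55.3),
]
consistent, suspect = split_by_rt(psms, max_delta=5.0)
print([p.peptide for p in consistent])  # PSMs whose RTs agree
print([p.peptide for p in suspect])     # PSMs worth a closer look
```

A large gap between the two values does not prove a match is wrong, but it marks the PSM as one that deserves further scrutiny.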
Researchers applied AutoRT to systematically compare different quality control strategies for variant peptide identification in three large-scale experimental data sets generated on label-free, tandem mass tag (TMT), and isobaric tags for relative and absolute quantification (iTRAQ) platforms. Analysis of the resulting 57 million spectra demonstrated that global false discovery rate (FDR) control followed by PepQuery validation offered the highest sensitivity while still identifying high-quality variant peptides.
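For readers unfamiliar with FDR control, the sketch below shows one common way to estimate it, the target-decoy approach, in which matches to deliberately incorrect (decoy) sequences are used to estimate how many target matches are false. The article does not state which estimator the researchers used, so this is an assumption, and the `fdr_at_threshold` function and the scores are hypothetical. "Global" is taken here to mean that all PSMs are pooled into a single estimate.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (decoy hits) / (target hits) among PSMs scoring at or above threshold.

    psms is a list of (score, is_decoy) pairs pooled across all peptide classes.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    # With no target hits there is nothing to report, so return 0.0 by convention.
    return decoys / targets if targets else 0.0


# Made-up scores: higher means a better match; True marks a decoy hit.
scored_psms = [(90, False), (85, False), (80, True), (75, False), (60, True)]

print(fdr_at_threshold(scored_psms, 70))  # one decoy among three targets
print(fdr_at_threshold(scored_psms, 82))  # no decoys above this cutoff
```

Raising the score cutoff lowers the estimated FDR at the cost of accepting fewer PSMs, which is the sensitivity trade-off the comparison above was designed to measure.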
CPTAC researchers further applied the software and the optimized strategy to the discovery of tumor neoantigens in a streamlined computational workflow named NeoFlow. NeoFlow integrates whole exome sequencing (WES) and MS/MS proteomics data to discover possible neoantigens. Neoantigen discovery typically relies on genomic data and can be enhanced with proteomics; however, many questions remain about the best way to control the quality of variant peptide identification with this approach. Using AutoRT in this proteogenomic workflow, researchers were able to increase the sensitivity and reliability of putative neoantigen discovery.
The AutoRT software provides insights and practical information to guide method selection for future proteogenomic studies, demonstrating optimization of quality control strategies in large-scale data sets.