Best Performers Announced for the NCI-CPTAC DREAM Proteogenomics Computational Challenge

The National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC) is pleased to announce that teams led by Jaewoo Kang (Korea University) and by Yuanfang Guan with Hongyang Li (University of Michigan) are the best performers of the NCI-CPTAC DREAM Proteogenomics Computational Challenge. Over 500 participants from 20 countries registered for the Challenge, which offered $25,000 in cash awards contributed by the NVIDIA Foundation through its Compute the Cure initiative.

The Challenge used proteogenomic data sets generated by CPTAC to answer fundamental questions about how different levels of biological signal (DNA, RNA, protein) relate to one another. These questions matter because cancer is driven by genomic aberrations that manifest themselves as changes in the structure and abundance of proteins, the functional gene products. Characterizing alterations in the proteome promises to shed new light on cancer development and may support the development of biomarkers and therapeutics. However, measuring the proteome remains challenging, even with the recent rapid developments in mass spectrometry that are enabling deep proteomic analysis. It would be much cheaper and easier if protein levels could be predicted with high accuracy from simpler measurements such as mRNA levels.

The NCI-CPTAC DREAM Proteogenomics Challenge, a community-based collaborative competition, evaluated how well these predictions can be made in breast and ovarian cancer. To speed their progress, participating researchers used NVIDIA Tesla GPU accelerators through Google Cloud. These efforts highlighted the use of computational tools in cancer research to extract information from the cancer proteome and to understand the association between the genome, transcriptome and proteome in tumors.

Best Performing Teams

Jaewoo Kang, Professor of Computer Science at Korea University College of Informatics in South Korea, along with Sunkyu Kim and Heewon Lee, won the $5,000 cash award for the first sub-challenge, which focused on imputing missing protein levels from known protein abundances. Kang and his team factorized the given protein level matrix into a protein matrix and a sample matrix to capture the inherent characteristics of proteins and samples, respectively. The team then multiplied the two matrices to form a dense approximation matrix from which the missing protein levels were recovered.
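The factorize-then-multiply idea described above can be illustrated with a minimal sketch. This is not the winning team's implementation. The function name, the rank, the stochastic gradient descent fitting procedure, the regularization strength and the toy data are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import random


def impute_by_factorization(matrix, rank=2, epochs=2000, lr=0.01, reg=0.01, seed=0):
    """Fill None entries of a proteins-by-samples matrix using a low-rank fit.

    Illustrative only: the fitting procedure (SGD with L2 regularization)
    and all default values are assumptions, not the team's actual method.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])

    # One latent vector per protein (rows of P) and per sample (rows of S).
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    S = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]

    # Only the measured entries contribute to the fit.
    observed = [
        (i, j, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value is not None
    ]

    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, value in observed:
            pred = sum(P[i][k] * S[j][k] for k in range(rank))
            err = value - pred
            for k in range(rank):
                p, s = P[i][k], S[j][k]
                P[i][k] += lr * (err * s - reg * p)
                S[j][k] += lr * (err * p - reg * s)

    # Multiply the two factor matrices to get a dense approximation, then
    # keep measured values and use the approximation only for the gaps.
    filled = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            if matrix[i][j] is None:
                row.append(sum(P[i][k] * S[j][k] for k in range(rank)))
            else:
                row.append(matrix[i][j])
        filled.append(row)
    return filled


# Toy example (made-up numbers): 4 proteins x 3 samples, two missing values.
toy = [
    [1.0, 0.5, None],
    [2.0, 1.0, 4.0],
    [3.0, 1.5, 6.0],
    [4.0, None, 8.0],
]
for filled_row in impute_by_factorization(toy, rank=1):
    print([round(x, 2) for x in filled_row])
```

In this sketch the measured values are left untouched and only the gaps are taken from the product of the two factor matrices. Whether the team did the same, and how they chose the rank, is not stated in the announcement.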

Yuanfang Guan, Assistant Professor of Computational Medicine and Bioinformatics, and her student Hongyang Li from the University of Michigan won a $20,000 cash award for both the second and third sub-challenges. The second sub-challenge focused on predicting protein abundances from mRNA and genetic data. The third sub-challenge predicted phosphoprotein levels from mRNA, genetic and proteomic data. From the given matrices, Guan and Li reduced the effect of noise that can be generated when proteogenomic data sets are assembled from various sources like different measurement techniques and patient cohorts. Guan and Li synchronized the multi-source data and created a model inspired by biological principles. By harnessing the central dogma of biology and the universal interactions among biomolecules, Guan and Li integrated these ideas into machine learning models.

Top performers for the first sub-challenge also include Jeremy Jacobsen from the University of Colorado Boulder, and Jingyi Jessica Li and Xinzhou Ge from the University of California along with Kexin Li from Tsinghua University in China. Top performers for the second sub-challenge also include Jaewoo Kang, Sunkyu Kim, and Heewon Lee from Korea University, Han Yu from the State University of New York at Buffalo, Bora Lee from Deargen Incorporated in South Korea, and Eunji Heo and Seohui Bae of Korea Institute of Science and Technology. Top performers for the third sub-challenge also include Jan Kaczmarczyk, Piotr Stępniak, and Michał Warchoł of Ardigen in Poland.

Next Steps

The Challenge’s best performers will move into the community phase of the project, where they will collaborate with each other to assess their methods and devise a better solution. In this phase, Kang and his team plan to leverage existing biological knowledge about proteins to improve prediction performance. “We hope that our method can be a useful addition to the community to help bridge the gap between transcriptomics and proteomics,” Kang said.

As a professor, Guan intends to educate her students about her winning models. According to Guan, Challenges that are transparent and fair help students establish a correct value system by rewarding hard work and innovation. “I use these models to teach how things work and use the process of creating these models to teach how to make things work,” Guan said.

Funders and Sponsors

NCI CPTAC is the main sponsor of the NCI-CPTAC DREAM Proteogenomics Challenge. The NVIDIA Foundation provided a $25,000 contribution for cash awards.

The NVIDIA Foundation is the charitable arm of Silicon Valley-based NVIDIA Corporation. The foundation accelerates solutions to the world’s most pressing issues in health and education. Compute the Cure, its signature philanthropic initiative, aims to advance the fight against cancer by funding cancer researchers using innovative computing techniques to accelerate their search for a cure, supporting nonprofits providing patient care and support services, and engaging NVIDIA employees around the world in raising funds for cancer-focused organizations. Through grants and employee fundraising, the NVIDIA Foundation has donated more than $4 million to cancer causes.

About DREAM Challenges

First conceived by IBM in 2006, DREAM Challenges have addressed objectives that range from predictive models for disease progression to developing models for cell signaling networks. Designed and run by a community of researchers, DREAM Challenges invite participants to propose solutions, fostering collaboration and building communities in the process. The DREAM Challenges community shares a vision of open collaboration to leverage the “wisdom of the crowd” to improve human health and sciences.

DREAM Challenges and Epidemium are hosting a satellite event prior to the RECOMB 2018 conference in Paris, France. The event will focus on proteogenomics and data sharing, among other topics, and will share highlights from the NCI-CPTAC DREAM Proteogenomics Challenge.