Questions & Answers

Clinical Proteomic Technology Assessment for Cancer (CPTAC) program


1. What is "proteomics"?

2. How does studying the proteome compare to studying the genome?

3. What are some of the technologies used in proteomics research?

4. What are the challenges with using MS?

5. What does NCI hope will be accomplished at the end of this program?




  1. What is "proteomics"?

    The term "proteome" was first used in 1994 to refer to all the proteins in a cell, tissue, or organism, and proteomics is the study of the proteome. Because proteins take part in almost all biological activities, including the processes that go wrong in disease, the proteome is a critical target for understanding how disease arises and how to prevent it.

    Protein scientists pursue many avenues of inquiry about proteins, working to determine their function and amino acid sequence; their three-dimensional structure; how the addition of sugars, phosphates, or fats affects protein function; and how proteins interact with other molecules, including other proteins. Some researchers focus on the proteins present in particular parts of the cell such as the outer cell membrane, the nucleus, the cytoplasm (the region of the cell outside the nucleus), or the nuclear membrane; others analyze protein-protein interactions in a particular cell or organism; some study the differences between the proteins present in diseased vs. healthy cells.

  2. How does studying the proteome compare to studying the genome?

    The total number of proteins in human cells is estimated to be between 250,000 and one million, of which only a small percentage has been sequenced or identified. The complete proteome has not been characterized for any organism. In contrast, the genomes (entire sets of genes) of several organisms, including humans, have been sequenced. The human genome is estimated to contain about 21,000 protein-encoding genes.

    Besides the disparity between the numbers of proteins and genes, another important difference is that the genome is relatively static, changing little from day to day. Cellular proteins, on the other hand, are continually moving and changing: they bind to cell membranes, partner with other proteins, gain or lose chemical groups, or break into two or more pieces.

    Several other properties of proteins add to the complexity. Proteins and modified proteins can vary among individuals, between cell types, and even within the same cell under different conditions. One gene can encode more than one protein (even up to 1,000), and one protein can be modified in multiple ways, which may change its behavior. This can happen when the cell uses a single gene's DNA as a template to produce several different messenger RNAs (a process called alternative splicing), each of which then serves as a template for a different protein, or when a protein is modified by cellular processes after it is made. The quantities of different proteins also vary greatly. For example, in human blood the concentration of albumin is more than a billion times greater than that of interleukin-6. This makes it difficult to measure such "low-abundance" proteins, many of which may be directly related to cancer or other diseases. The wide spread in the relative abundance of different proteins in blood or other fluids is known as the "dynamic range."
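
    Dynamic range is usually expressed in orders of magnitude, that is, the base-10 logarithm of the ratio between the most and least abundant proteins; a billion-fold ratio spans nine orders of magnitude. The short Python sketch below only illustrates that arithmetic. The function name and the two concentrations are hypothetical round numbers chosen to reproduce the "billion times" figure, not measured blood values.

```python
import math

def orders_of_magnitude(high: float, low: float) -> float:
    """Return how many powers of ten separate two positive concentrations."""
    if high <= 0 or low <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(high / low)

# Hypothetical concentrations in the same (arbitrary) unit, picked only so that
# their ratio matches the "more than a billion times" comparison in the text.
abundant_protein = 1e9  # stand-in for albumin
rare_protein = 1.0      # stand-in for interleukin-6

span = orders_of_magnitude(abundant_protein, rare_protein)
print(f"Dynamic range: {span:.0f} orders of magnitude")
```

    Any real comparison would need both concentrations expressed in the same unit; the logarithm only summarizes the ratio.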

  3. What are some of the technologies used in proteomics research?

    Traditionally, proteomics experiments have been done using two-dimensional gel electrophoresis (2DGE), a process in which large mixtures of proteins are separated by electrical charge and by size. In the first dimension, the proteins migrate through a gel until they are separated by charge, each stopping where its net charge is zero (its isoelectric point); in the second dimension, they are transferred to another semi-solid gel and separated by size. The advantage of this method is that a large number of proteins (3,000 to 10,000) can be visually separated. The drawback is that certain kinds of proteins, such as membrane proteins, proteins present in very small amounts, and very large or very small proteins, are difficult or impossible to visualize by 2DGE.

    Over the last ten years, mass spectrometry (MS) has become the method of choice for analyzing complex protein samples. MS measures two properties of a mixture of ions (particles with an electric charge) held in the gas phase under vacuum: their mass-to-charge ratio (m/z), and the number of ions present at each m/z value. The end product is a mass spectrum, a chart with a series of spiked peaks, each representing an ion, such as a charged protein fragment, present in the sample. The height of each peak is related to the abundance of that fragment. The heights of the peaks and the distances between them form a "fingerprint" of the sample that provides clues to its identity.
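
    Because MS reports mass divided by charge, the same protein fragment can appear at several m/z values depending on how many charges it carries. The Python sketch below assumes positive-mode ionization in which each charge comes from an added proton (as in electrospray ionization), so a fragment of neutral mass M carrying z protons appears at m/z = (M + z × 1.007276) / z, where 1.007276 Da is the proton mass. The 1,500 Da peptide mass is a hypothetical example value.

```python
PROTON_MASS = 1.007276  # mass of a proton, in daltons (Da)

def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a molecule of the given neutral mass carrying `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

peptide_mass = 1500.0  # hypothetical peptide mass, in Da

# One peptide, three charge states, three different peaks in the spectrum.
for z in (1, 2, 3):
    print(f"charge {z}+: m/z = {mz(peptide_mass, z):.3f}")
```

    The three charge states land near 1501, 751, and 501 on the m/z axis, which is one reason a single protein fragment can contribute more than one peak to the sample's "fingerprint."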

    A potential advantage of MS over other technologies for detecting and monitoring subtle changes in substances in the body is its ability to measure thousands of components of a few drops of blood rapidly and inexpensively. Unlike the spots on a two-dimensional gel, the MS patterns generated from the thousands of proteins present in blood are difficult to analyze by eye. However, modern computers can analyze MS spectra rapidly and distinguish subtle differences in patterns between people with and without disease.
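
    As a rough illustration of how software can compare spectra that are hard to compare by eye, the Python sketch below groups peaks into fixed-width m/z bins and scores two spectra with cosine similarity (1.0 means identical peak patterns). This is one simple, generic technique chosen for illustration, not the method used by CPTAC or any particular study; the function names, the 1.0 bin width, and both peak lists are hypothetical.

```python
import math

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins; returns {bin_index: intensity}."""
    binned = {}
    for mz_value, intensity in peaks:
        idx = int(mz_value // bin_width)
        binned[idx] = binned.get(idx, 0.0) + intensity
    return binned

def cosine_similarity(a, b):
    """Cosine of the angle between two binned spectra (1.0 = identical pattern)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical (m/z, intensity) peak lists, invented for illustration only.
# sample_b has one extra peak that sample_a lacks.
sample_a = [(501.0, 20.0), (751.0, 80.0), (1501.0, 40.0)]
sample_b = [(501.0, 20.0), (751.0, 80.0), (1501.0, 40.0), (980.0, 60.0)]

score = cosine_similarity(bin_spectrum(sample_a), bin_spectrum(sample_b))
print(f"similarity = {score:.3f}")
```

    The single extra peak lowers the score below 1.0. Real spectra would need calibration, noise filtering, and normalization before such a score is meaningful, and a similarity score alone does not classify a sample as diseased or healthy.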

    Mass spectrometry-based proteomics analysis is extremely rapid. The entire process, from collecting blood to analyzing the MS spectrum, can occur in less than one minute. In addition, hundreds of samples can be analyzed sequentially, and extremely small amounts of protein can be detected.

  4. What are the challenges with using MS?

    Because MS has emerged so rapidly, proteomics data are being collected faster than researchers can validate, interpret, and integrate them with other known data. Despite many claims of the discovery of cancer-related proteins, or "biomarkers," it has proven very difficult to reproduce and validate results across laboratories, institutions, or technology platforms. It is therefore essential to assess the various MS platforms thoroughly, to understand their capabilities and limitations in identifying cancer-relevant proteins and peptides rigorously and reproducibly. In addition, new, more robust software tools are needed in all areas of data analysis, including data collection, storage, searching, analysis, classification, management, archiving, and retrieval. These needs are the basis for the CPTAC awards.

    The CPTAC program is also working toward a shared set of reference reagents and bioinformatics resources for use with the various technology platforms. To that end, NCI is working with several partners, including:

    National Institute of Standards and Technology (NIST). NCI has entered into an Interagency Agreement with NIST to develop mass spectrometry assessment materials to be used by the CPTAC teams. These materials, designed to assess the performance metrics of various instruments, will be the first of their kind developed by the NCI and will help to evaluate and compare existing proteomic technologies. Specifically, NIST is creating and validating a 20-protein mixture to be used as a reference for proteomic analyses. Proteins in this reference mixture span the dynamic range of the plasma proteome, and many have been associated with cancer processes. Each protein will be characterized for multiple properties, and the mixture will serve as a validated, reproducible resource to support cancer proteomics experiments.

    Argonne National Laboratory (ANL). NCI has formed an Interagency Agreement with ANL to produce about 100 well-characterized cancer-related proteins for use in antibody production, affinity capture technology development, and the creation of reagent protein standards.
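
    One common way to express how well laboratories agree when they measure the same reference material is the coefficient of variation (CV): the standard deviation of their results divided by the mean, usually given as a percentage, where a lower CV means closer agreement. The Python sketch below illustrates the calculation only. The lab names and abundance values are hypothetical, and the metrics CPTAC actually uses may differ.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * statistics.stdev(values) / mean

# Hypothetical abundance of one peptide in the same reference sample,
# as reported by four different laboratories (arbitrary units).
lab_results = {"lab_A": 102.0, "lab_B": 98.0, "lab_C": 95.0, "lab_D": 105.0}

cv = coefficient_of_variation(list(lab_results.values()))
print(f"inter-laboratory CV = {cv:.1f}%")
```

    A shared, well-characterized reference mixture matters here because the CV only reflects technical variability when every laboratory starts from the same sample.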

  5. What does NCI hope will be accomplished at the end of this program?

    The goal is comparable, reliable, and reproducible protein measurement, particularly of proteins and peptides that might be relevant to cancer. To do this, we want to reach a point where we know how the technologies work and where the sources of variability lie, so that they can be accounted for in future experiments. If we reach this point, we can be confident that the proteomic changes we see associated with different cancers are biological, not technical, in nature.


A Service of the National Cancer Institute