Scientific Bibliography


Research Articles

2006

Biomarkers for cancer screening, diagnosis, and treatment: a systems approach.
Hartwell L, Mankoff D, Paulovich A, Ramsey S, and Swisher E.
Nature Biotechnology.
2006 August.

Biomarkers measured in a variety of patient samples, including blood, tissue, urine and cerebrospinal fluid, are used in a diverse array of clinical settings. Although many successful biomarkers have been developed to date, advances in genetics and proteomics promise to usher in a new era of abundant, informative biomarkers that could transform the application of molecular biology to human disease. The application of biomarkers to cancer is leading the way because of the unique association of genomic changes in cancer cells with the disease process. Consequently, DNA-based biomarkers are already becoming incorporated into routine patient management and are providing lessons on the value added by appropriate diagnostic tests. Moreover, cancer management illustrates the complexity of the disease process, which can potentially be distinguished through appropriate biomarkers applied to different individuals, different types of disease, the progression of disease states and the multi-step nature of cancer treatment.

Scenarios for the use of biomarker-based diagnostics for cancer include the following: risk assessment, noninvasive screening for early-stage disease, detection and localization, disease stratification and prognosis, response to therapy and, for those in remission, screening for disease recurrence. Cost and potential morbidity increase as we progress along this continuum (Fig. 1). Our goals in applying diagnostic tests are (i) to identify persons harboring potentially life-threatening cancers at the earliest stage possible, (ii) to avoid false-positive tests and diagnosing of cancers that would otherwise not threaten a person's well-being to avoid psychological stress and unnecessary treatments, and (iii) to minimize the overall cost of the program. It is unlikely, however, that any single test will perfectly meet all of these goals.

A statistical method for chromatographic alignment of LC-MS data.
Wang P, Coram M, Tang H, Fitzgibbon M, Zhang H, Yi E, Aebersold R, and McIntosh M.
Biostatistics.
2006 August 2.

Integrated liquid-chromatography mass-spectrometry (LC-MS) is becoming a widely used approach for quantifying the protein composition of complex samples. The output of the LC-MS system measures the intensity of a peptide with a specific mass-to-charge ratio and retention time. In the last few years, this technology has been used to compare complex biological samples across multiple conditions. One challenge for comparative proteomic profiling with LC-MS is to match corresponding peptide features from different experiments.

In this paper, we propose a new method, Peptide Element Alignment (PETAL), that uses raw spectrum data and detected peaks to simultaneously align features from multiple LC-MS experiments. PETAL creates spectrum elements, each of which represents the mass spectrum of a single peptide in a single scan. Peptides detected in different LC-MS data are aligned if they can be represented by the same elements. By considering each peptide separately, PETAL enjoys greater flexibility than time-warping methods. While most existing methods process multiple data sets by sequentially aligning each data set to an arbitrarily chosen template data set, PETAL treats all experiments symmetrically and can analyze all experiments simultaneously. We illustrate the performance of PETAL on example data sets.

Adenomatous polyposis coli (APC) is required for normal development of skin and thymus.
Kuraguchi M, Wang X, Bronson R, Rothenberg R, Ohene-Baah N, Lund J, Kucherlapati M, Maas R, and Kucherlapati R.
PLOS Genetics.
2006 July 28.

The tumor suppressor gene Apc (adenomatous polyposis coli) is a member of the Wnt signaling pathway that is involved in development and tumorigenesis. Heterozygous knockout mice for Apc have a tumor predisposition phenotype and homozygosity leads to embryonic lethality. To understand the role of Apc in development we generated a floxed allele. These mice were mated with a strain carrying Cre recombinase under the control of the human Keratin 14 (K14) promoter, which is active in basal cells of epidermis and other stratified epithelia. Mice homozygous for the floxed allele that also carry the K14-cre transgene were viable but had stunted growth and died before weaning. Histological and immunochemical examinations revealed that K14-cre mediated Apc loss resulted in aberrant growth in many ectodermally derived squamous epithelia including hair follicles, teeth and oral and corneal epithelia. In addition, squamous metaplasia was observed in various epithelial-derived tissues including the thymus. The aberrant growth of hair follicles and other appendages as well as the thymic abnormalities in K14-cre; Apc(CKO/CKO) mice suggest that the Apc gene is crucial in embryonic cells to specify epithelial cell fates in organs that require epithelial-mesenchymal interactions for their development.

General framework for developing and evaluating database scoring algorithms using the TANDEM search engine.
MacLean B, Eng J, Beavis R, and McIntosh M.
Bioinformatics.
2006 July 28.

MOTIVATION: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity of peptide MS/MS spectra to a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on developing improved database scoring methods, we modified the software to allow users to redefine the scoring function and replace the native TANDEM scoring function while leaving the remaining core application intact. Redefinition is performed at run time so multiple scoring functions are available to be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and also provide implementations of two TANDEM-compatible scoring functions, one previously described scoring function compatible with PeptideProphet and one very simple scoring function that quantitative researchers may use to begin their development. This extension builds on the open-source TANDEM project and will facilitate research into and dissemination of novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++. AVAILABILITY: Supplementary materials, including source code for the scoring functions, are available from http://proteomics.fhcrc.org.

A reagent resource to identify proteins and peptides of interest to the cancer community: A workshop report.
Haab B, Paulovich A, Anderson N, Clark A, Downing G, Hermjakob H, Labaer J, and Uhlen M.
Molecular and Cellular Proteomics.
2006 July 24.

On the basis of discussions with representatives from all sectors of the cancer research community, the NCI recognizes the immense opportunities to apply proteomic technologies to further cancer research. Validated and well-characterized affinity capture reagents (e.g., antibodies, aptamers, affibodies) will play a key role in proteomic research platforms for the prevention, early detection, treatment, and monitoring of cancer. To discuss ways to develop new resources and optimize current opportunities in this area, the National Cancer Institute (NCI) convened the "Proteomic Technologies Reagents Resource Workshop" in Chicago, IL on December 12-13, 2005. The workshop brought together leading scientists in proteomic research to discuss model systems for evaluating and delivering resources for reagents to support mass spectrometry (MS) and affinity capture platforms. Speakers discussed issues and identified action items related to an overall vision for and proposed models for a shared proteomics reagents resource, applications of affinity capture methods in cancer research, quality control and validation of affinity capture reagents, considerations for target selection, and construction of a reagents database. The meeting also featured presentations and discussion from leading private-sector investigators on state-of-the-art technologies and capabilities to meet the user community's needs. This workshop was developed as a component of the NCI's Clinical Proteomics Technologies Initiative for Cancer (CPTI), a coordinated initiative that includes the establishment of reagent resources for the scientific community. This workshop report explores various approaches to develop a framework that will most effectively fulfill the needs of the NCI and the cancer research community.

Analysis of Acrylamide Labeled Serum Proteins by LC-MS/MS.
Faca V, Coram M, Phanstiel D, Glukhova V, Zhang Q, Fitzgibbon M, McIntosh M, and Hanash S.
Journal of Proteome Research.
2006 July 13.

Isotopic labeling of cysteine residues with acrylamide was previously utilized for relative quantitation of proteins by MALDI-TOF. Here, we explored and compared the application of deuterated and (13)C isotopes of acrylamide for quantitative proteomic analysis using LC-MS/MS and high-resolution FTICR mass spectrometry. The method was applied to human serum samples that were immunodepleted of abundant proteins. Our results show reliable quantitation of proteins across an abundance range that spans 5 orders of magnitude based on ion intensities and known protein concentration in plasma. The use of (13)C isotope of acrylamide had a slightly greater advantage relative to deuterated acrylamide, because of shifts in elution of deuterated acrylamide relative to its corresponding nondeuterated compound by reversed-phase chromatography. Overall, the use of acrylamide for differentially labeling intact proteins in complex mixtures, in combination with LC-MS/MS provides a robust method for quantitative analysis of complex proteomes.

Quality control metrics for LC-MS feature detection tools demonstrated on Saccharomyces cerevisiae proteomic profiles.
Piening B, Wang P, Bangur C, Whiteaker J, Zhang H, Feng L-C, Keane J, Eng J, Tang H, Prakash A, McIntosh M, and Paulovich A.
Journal of Proteome Research.
2006 July;5(7):1527-1534.

Quantitative proteomic profiling using liquid chromatography-mass spectrometry is emerging as an important tool for biomarker discovery, prompting development of algorithms for high-throughput peptide feature detection in complex samples. However, neither annotated standard data sets nor quality control metrics currently exist for assessing the validity of feature detection algorithms. We propose a quality control metric, Mass Deviance, for assessing the accuracy of feature detection tools. Because the Mass Deviance metric is derived from the natural distribution of peptide masses, it is machine- and proteome-independent and enables assessment of feature detection tools in the absence of completely annotated data sets. We validate the use of Mass Deviance with a second, independent metric that is based on isotopic distributions, demonstrating that we can use Mass Deviance to identify aberrant features with high accuracy. We then demonstrate the use of independent metrics in tandem as a robust way to evaluate the performance of peptide feature detection algorithms. This work is done on complex LC-MS profiles of Saccharomyces cerevisiae which present a significant challenge to peptide feature detection algorithms.

Mass Spectrometry-Based Study of the Plasma Proteome in a Mouse Intestinal Tumor Model.
Hung K, Kho A, Sarracino D, Georgeon R, Krastins B, Forrester S, Haab B, Kohane I, and Kucherlapati R.
Journal of Proteome Research.
2006 June 27.

Early detection of cancer can greatly improve prognosis. Identification of proteins or peptides in the circulation, at different stages of cancer, would greatly enhance treatment decisions. Mass spectrometry (MS) is emerging as a powerful tool to identify proteins from complex mixtures such as plasma that may help identify novel sets of markers that may be associated with the presence of tumors. To examine this feature we have used a genetically modified mouse model, Apc(Min), which develops intestinal tumors with 100% penetrance. Utilizing liquid chromatography-tandem mass spectrometry (LC-MS/MS), we identified total plasma proteome (TPP) and plasma glycoproteome (PGP) profiles in tumor-bearing mice. Principal component analysis (PCA) and agglomerative hierarchical clustering analysis revealed that these protein profiles can be used to distinguish between tumor-bearing Apc(Min) and wild-type control mice. Leave-one-out cross-validation analysis established that global TPP and global PGP profiles can be used to correctly predict tumor-bearing animals in 17/19 (89%) and 19/19 (100%) of cases, respectively. Furthermore, leave-one-out cross-validation analysis confirmed that the significant differentially expressed proteins from both the TPP and the PGP were able to correctly predict tumor-bearing animals in 19/19 (100%) of cases. A subset of these proteins was independently validated by antibody microarrays using detection by two color rolling circle amplification (TC-RCA). Analysis of the significant differentially expressed proteins indicated that some might derive from the stroma or the host response. These studies suggest that mass spectrometry-based approaches to examine the plasma proteome may prove to be a valuable method for determining the presence of intestinal tumors.
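
Illustrative note: the leave-one-out cross-validation mentioned in this abstract can be sketched as follows. This is not the authors' analysis. The nearest-centroid classifier, the function names and the toy two-feature profiles are all assumptions chosen only to show the hold-one-out, train-on-the-rest loop.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loocv_accuracy(profiles, labels):
    """Hold out each sample once, fit class centroids on the remaining
    samples, and count how often the held-out sample is classified correctly."""
    correct = 0
    for i, (held_out, truth) in enumerate(zip(profiles, labels)):
        train = [(p, l) for j, (p, l) in enumerate(zip(profiles, labels)) if j != i]
        centroids = {
            lab: centroid([p for p, l in train if l == lab])
            for lab in {l for _, l in train}
        }
        predicted = min(centroids, key=lambda lab: sq_dist(held_out, centroids[lab]))
        correct += predicted == truth
    return correct, len(profiles)


# Hypothetical toy profiles (two features per animal), not data from the paper.
profiles = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.2], [3.0, 3.1], [3.2, 2.9], [2.8, 3.0]]
labels = ["wild-type"] * 3 + ["tumor"] * 3
correct, total = loocv_accuracy(profiles, labels)
print(f"{correct}/{total} held-out samples predicted correctly")
```

Because every sample is excluded from the model that predicts it, the resulting fraction is an estimate of performance on unseen animals rather than a fit to the training data.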

Compression of LC/MS Proteomic Data.
Miguel A, Keane J, Whiteaker J, Zhang H, and Paulovich A.
Proceedings of the 19th IEEE International Symposium on Computer-Based Medical Systems.
Conference held 2006 June 22-23;925-930.

The unrelenting growth of mass spectrometry (MS) based proteomic data to gigabytes per sample and terabytes per experiment motivates this investigation into compression methods suited to MS signal sources. The data for this study was derived from peptides of hand-mixed protein samples passed through a high performance liquid chromatography system (HPLC) and an electrospray ionization time-of-flight (ESI-TOF) mass spectrometer. Several lossless data compression methods were applied and yielded up to a 25:1 compression ratio relative to the original files containing base64 encoding of the data.
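
Illustrative note: a compression ratio measured against base64-encoded data, as reported in this abstract, can be computed along the following lines. The abstract does not name the compression methods used, so zlib is only a stand-in here, and the sparse synthetic intensity list is a hypothetical example rather than real ESI-TOF data.

```python
import base64
import struct
import zlib

# Hypothetical sparse spectrum: mostly zero intensities with a few peaks.
intensities = [0.0] * 5000
for i in range(0, 5000, 250):
    intensities[i] = 1000.0 + i

# Pack as little-endian 32-bit floats, the kind of binary payload that
# XML-based MS formats typically store as base64 text.
fmt = f"<{len(intensities)}f"
raw = struct.pack(fmt, *intensities)
encoded = base64.b64encode(raw)

# Lossless compression of the raw bytes (zlib is an assumption, see above).
compressed = zlib.compress(raw, 9)
ratio = len(encoded) / len(compressed)

# Lossless means the original values are recovered exactly.
restored = list(struct.unpack(fmt, zlib.decompress(compressed)))
assert restored == intensities

print(f"base64 size: {len(encoded)} bytes")
print(f"compressed size: {len(compressed)} bytes")
print(f"compression ratio relative to base64: {ratio:.1f}:1")
```

The ratio obtained depends heavily on how sparse and repetitive the signal is, so the number printed for this toy input says nothing about the ratios achievable on real instrument data.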

A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS.
Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph T, Wang P, May D, Eng J, Fang R, Lin CW, Chen J, Goodlett D, Whiteaker J, Paulovich A, and McIntosh M.
Bioinformatics.
2006 June 9.

MOTIVATION: Comparing two or more complex protein mixtures using liquid chromatography mass spectrometry (LC-MS) requires multiple analysis steps to locate and quantitate natural peptides within a single experiment and to align and normalize findings across multiple experiments. RESULTS: We describe msInspect, an open-source application comprising algorithms and visualization tools for the analysis of multiple LC-MS experimental measurements. The platform integrates novel algorithms for detecting signatures of natural peptides within a single LC-MS measurement and combines multiple experimental measurements into a peptide array, which may then be mined using analysis tools traditionally applied to genomic array analysis. The platform supports quantitation by both label-free and isotopic labeling approaches. The software implementation has been designed so that many key components may be easily replaced, making it useful as a workbench for integrating other novel algorithms developed by a growing research community. AVAILABILITY: The msInspect software is distributed freely under an Apache 2.0 license. The software as well as a Zip file with all peptide feature files and scripts needed to generate the tables and figures in this article are available at http://proteomics.fhcrc.org/.

Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics.
Fermin D, Allen B, Blackwell T, Menon R, Adamski M, Xu Y, Ulintz P, Omenn GS, and States D.
Genome Biology.
2006 May;7(4):R35.

Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database. RESULTS: Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene. CONCLUSION: This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures.

Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study.
States D, Omenn G, Blackwell T, Fermin D, Eng J, Speicher D, and Hanash S.
Nature Biotechnology.
2006 March; 24(3): 333-338.

The Human Proteome Organization (HUPO) recently completed the first large-scale collaborative study to characterize the human serum and plasma proteomes. The study was carried out in different locations and used diverse methods and instruments to compare and integrate tandem mass spectrometry (MS/MS) data on aliquots of pooled serum and plasma from healthy subjects. Liquid chromatography (LC)-MS/MS data sets from 18 laboratories were matched to the International Protein Index database, and an initial integration exercise resulted in 9,504 proteins identified with one or more peptides, and 3,020 proteins identified with two or more peptides. This article uses a rigorous statistical approach to take into account the length of coding regions in genes, and multiple hypothesis-testing techniques. On this basis, we now present a reduced set of 889 proteins identified with a confidence level of at least 95%. We also discuss the importance of such an integrated analysis in providing an accurate representation of a proteome as well as the value such data sets contain for the high-confidence identification of protein matches to novel exons, some of which may be localized in alternatively spliced forms of known plasma proteins and some in previously non-annotated gene sequences.

Computational Proteomics Analysis System (CPAS): An extensible open source analytic system for evaluating and publishing proteomic data and high throughput biological experiments.
Rauch A, Bellew M, Eng J, Fitzgibbon M, Holzman T, Hussey P, Igra M, MacLean B, Lin C, Detter A, Fang R, Faca V, Gafken P, Zhang H, Whiteaker J, States D, Hanash S, Paulovich A, and McIntosh M.
Journal of Proteome Research.
2006 Jan-Feb; 5(1): 112-21.

The open-source Computational Proteomics Analysis System (CPAS) contains an entire data analysis and management pipeline for Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS) proteomics, including experiment annotation, protein database searching and sequence management, and mining LC-MS/MS peptide and protein identifications. CPAS architecture and features, such as a general experiment annotation component, installation software, and data security management, make it useful for collaborative projects across geographical locations and for proteomics laboratories without substantial computational support.

Normalization regarding non-random missing values in high-throughput mass spectrometry data.
Wang P, Tang H, Zhang H, Whiteaker J, Paulovich A, and McIntosh M.
Proceedings of the Pacific Symposium on Biocomputing.
Conference held 2006 January 3-7; 11: 315-326.

We propose a two-step normalization procedure for high-throughput mass spectrometry (MS) data, which is a necessary step in biomarker clustering or classification. First, a global normalization step is used to remove sources of systematic variation between MS profiles due to, for instance, varying amounts of sample degradation over time. A probability model is then used to investigate the intensity-dependent missing events and provides possible substitutions for the missing values. We illustrate the performance of the method with an LC-MS data set of synthetic protein mixtures.

2005

Two-dimensional electrophoresis database of fluorescence-labeled proteins of colon cancer cells.
Mori Y, Kondo T, Yamada T, Tsuchida A, Aoki T, Hirohashi S.
J Chromatogr B Analyt Technol Biomed Life Sci.
2005 Sep 5;823(2):82-97.

We constructed a novel database of the proteome of DLD-1 colon cancer cells by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) of fluorescence-labeled proteins followed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF) analysis. The database consists of 258 functionally categorized proteins corresponding to 314 protein spots. The majority of the proteins are oxidoreductases, cytoskeletal proteins and nucleic acid binding proteins. Phosphatase treatment showed that 28% of the protein spots on the gel are phosphorylated, and mass spectrometric analysis identified 21 of them. Proteins of DLD-1 cells and of laser-microdissected colon cancer tissues showed similar distribution on 2D gels, suggesting the utility of our database for clinical proteomics.

Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations.
Elias JE, Haas W, Faherty BK, Gygi SP.
Nat Methods.
2005 Sep;2(9):667-75.

Researchers have several options when designing proteomics experiments. Primary among these are choices of experimental method, instrumentation and spectral interpretation software. To evaluate these choices on a proteome scale, we compared triplicate measurements of the yeast proteome by liquid chromatography tandem mass spectrometry (LC-MS/MS) using linear ion trap (LTQ) and hybrid quadrupole time-of-flight (QqTOF; QSTAR) mass spectrometers. Acquired MS/MS spectra were interpreted with Mascot and SEQUEST algorithms with and without the requirement that all returned peptides be tryptic. Using a composite target decoy database strategy, we selected scoring criteria yielding 1% estimated false positive identifications at maximum sensitivity for all data sets, allowing reasonable comparisons between them. These comparisons indicate that Mascot and SEQUEST yield similar results for LTQ-acquired spectra but less so for QSTAR spectra. Furthermore, low reproducibility between replicate data acquisitions made on one or both instrument platforms can be exploited to increase sensitivity and confidence in large-scale protein identifications.
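
Illustrative note: selecting a score cutoff from a composite target-decoy search, as described in this abstract, can be sketched as below. The estimate 2 x (decoy hits) / (all hits) for the false positive rate is an assumption commonly paired with concatenated target-decoy databases; the function name and the synthetic scores are hypothetical and do not come from the paper.

```python
def score_threshold(psms, max_fp_rate=0.01):
    """Return the lowest score cutoff whose estimated false positive rate
    is at or below max_fp_rate, or None if no cutoff qualifies.

    psms: iterable of (score, is_decoy) pairs, where a higher score means
    a better match. Scores are assumed to be distinct.
    """
    best = None
    decoys = total = 0
    # Walk down the ranked list, accepting one more hit at each step.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        total += 1
        decoys += is_decoy
        # Each decoy hit is taken to imply one equally likely wrong target
        # hit, hence the factor of two (an assumption, see the note above).
        if 2 * decoys / total <= max_fp_rate:
            best = score
    return best


# Hypothetical search results: 300 target hits and 3 decoy hits.
hits = [(300 - i, False) for i in range(300)]
hits += [(149.5, True), (79.5, True), (19.5, True)]

cutoff = score_threshold(hits)
accepted = [h for h in hits if h[0] >= cutoff]
print(f"cutoff score: {cutoff}, accepted hits: {len(accepted)}")
```

Keeping the lowest qualifying cutoff rather than the first one encountered is what gives maximum sensitivity at the chosen error rate: the estimate can rise above the limit and fall back below it as more hits are accepted.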

A streamlined platform for high-content functional proteomics of primary human specimens.
Jessani N, Niessen S, Wei BQ, Nicolau M, Humphrey M, Ji Y, Han W, Noh DY, Yates JR, Jeffrey SS, Cravatt BF.
Nat Methods.
2005 Sep;2(9):691-697.

Achieving information content of satisfactory breadth and depth remains a formidable challenge for proteomics. This problem is particularly relevant to the study of primary human specimens, such as tumor biopsies, which are heterogeneous and of finite quantity. Here we present a functional proteomics strategy that unites the activity-based protein profiling and multidimensional protein identification technologies (ABPP-MudPIT) for the streamlined analysis of human samples. This convergent platform involves a rapid initial phase, in which enzyme activity signatures are generated for functional classification of samples, followed by in-depth analysis of representative members from each class. Using this two-tiered approach, we identified more than 50 enzyme activities in human breast tumors, nearly a third of which represent previously uncharacterized proteins. Comparison with cDNA microarrays revealed enzymes whose activity, but not mRNA expression, depicted tumor class, underscoring the power of ABPP-MudPIT for the discovery of new markers of human disease that may evade detection by other molecular profiling methods.

POET: Using proteomics to screen pools of open reading frames for protein expression.
Gillette WK, Esposito D, Frank PH, Zhou M, Yu LR, Jozwik C, Zhang X, McGowan B, Jacobowitz DM, Pollard HB, Hao T, Hill DE, Vidal M, Conrads TP, Veenstra TD, Hartley JL.
Mol Cell Proteomics.
2005 Aug 19; [Epub ahead of print].

We have developed a pooled ORF (open reading frame) expression technology, POET, that uses recombinational cloning and proteomics methods (two dimensional gel electrophoresis and mass spectrometry) to identify ORFs that when expressed are likely to yield high levels of soluble, purified protein. Because the method works on pools of ORFs, the procedures needed to subclone, express, purify, and assay protein expression for hundreds of clones are greatly simplified. From a pool of 688 C. elegans ORFs expressed in E. coli, small scale expression and purification of 12 positive clones identified by POET yielded on average 6 times as much protein as negative clones. Larger scale expression and purification of 6 of the positive clones yielded 47 to 374 mg of purified protein per liter. POET pools of ORFs can be constructed, and the pools of the resulting proteins can be analyzed and manipulated, to rapidly acquire information about the attributes of hundreds of proteins simultaneously.

Robust Accurate Identification of Peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics.
Alves G, Yu YK.
Bioinformatics.
2005 Aug 16; [Epub ahead of print].

MOTIVATION: The key to mass-spectrometry-based proteomics is peptide sequencing. The major challenge in peptide sequencing, whether library search or de novo, is to better infer statistical significance and better attain noise reduction. Because the noise in a spectrum depends on experimental conditions, the instrument used, and many other factors, it cannot be predicted even if the peptide sequence is known. The characteristics of the noise can only be uncovered once a spectrum is given. We wish to overcome such issues. RESULTS: We design RAId to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing followed by a search in a peptide library that we created. Through de novo sequencing, we establish the spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become potential candidates for new peptides that are not yet in the database. The use of spectrum-specific background statistics seems to enable RAId to perform well even when the spectral quality is marginal. Other important features of RAId include its potential in de novo sequencing alone and the ease of incorporating post-translational modifications. AVAILABILITY: Programs implementing the methods described are available from the authors upon request.

Two-dimensional gel isoelectric focusing.
Stastna M, Slais K.
Electrophoresis.
2005 Aug 15; [Epub ahead of print].

Two-dimensional gel isoelectric focusing (2-D gel IEF) is presented as the combination of the same separation method used consecutively in two directions of the same gel. In this new method, after completion of IEF process in the first dimension the gel was cut into the separate strips, each containing selected analytes together with the appropriate part of the original broad pH gradient, and the strips were rotated by 90 degrees (with regard to the first IEF) and left to diffuse overnight. After diffusion the strips were subjected to the second IEF. During the second IEF, the corresponding narrow part of pH gradient in each strip was restored again, however, now along the strip. The progress of the separation process can be monitored visually by using colored low-molecular-weight isoelectric point (pI) markers loaded into the gel simultaneously with proteins. The unique properties of IEF, focusing and resolution power were enhanced by using the same technique twice. Two forms of beta-lactoglobulin (pI values 5.14 and 5.31, respectively) nonseparated in the first IEF were successfully separated in the second dimension at relatively low voltage (330 V) with the resolution power comparable to the high-resolution gels requiring the high voltage during the run and long separation time. Glucose oxidase loaded as diluted solution into ten positions across the gel was finally focused into a single band during 2-D gel IEF. Since the first and second IEF are carried out on the same gel, no losses and contamination of analyte occur. The suggested method can be used for separation/fractionation of complex biological mixtures, similarly as other multidimensional separation techniques applied in proteomics, and can be followed by further processing, e.g., mass spectrometry analysis. The focusing properties of IEF could be useful especially in separation of mixtures, where components are at low concentration levels.

Sample handling for mass spectrometric proteomic investigations of human sera.
West-Nielsen M, Hogdall EV, Marchiori E, Hogdall CK, Schou C, Heegaard NH.
Anal Chem.
2005 Aug 15;77(16):5114-23.

Proteomic investigations of sera are potentially of value for diagnosis, prognosis, choice of therapy, and disease activity assessment by virtue of discovering new biomarkers and biomarker patterns. Much debate focuses on the biological relevance and the need for identification of such biomarkers while less effort has been invested in devising standard procedures for sample preparation and storage in relation to model building based on complex sets of mass spectrometric (MS) data. Thus, development of standardized methods for collection and storage of patient samples together with standards for transportation and handling of samples are needed. This requires knowledge about how sample processing affects MS-based proteome analyses and thereby how nonbiological biased classification errors are avoided. In this study, we characterize the effects of sample handling, including clotting conditions, storage temperature, storage time, and freeze/thaw cycles, on MS-based proteomics of human serum by using principal components analysis, support vector machine learning, and clustering methods based on genetic algorithms as class modeling and prediction methods. Using spiking to artificially create differentiable sample groups, this integrated approach yields data that--even when working with sample groups that differ more than may be expected in biological studies--clearly demonstrate the need for comparable sampling conditions for samples used for modeling and for the samples that are going into the test set group. Also, the study emphasizes the difference between class prediction and class comparison studies as well as the advantages and disadvantages of different modeling methods.

Investigating diversity in human plasma proteins.
Nedelkov D, Kiernan UA, Niederkofler EE, Tubbs KA, Nelson RW.
Proc Natl Acad Sci U S A.
2005 Aug 2;102(31):10852-7.

Plasma proteins represent an important part of the human proteome. Although recent proteomics research efforts focus largely on determining the overall number of proteins circulating in plasma, it is equally important to delineate protein variations among individuals, because they can signal the onset of diseases and be used as biological markers in diagnostics. To date, there has been no systematic proteomics effort to characterize the breadth of structural modifications in individual proteins in the general population. In this work, we have undertaken a population proteomics study to define gene- and protein-level diversity that is encountered in the general population. Twenty-five plasma proteins from a cohort of 96 healthy individuals were investigated through affinity-based mass spectrometric assays. A total of 76 structural forms/variants were observed for the 25 proteins within the samples cohort. Posttranslational modifications were detected in 18 proteins, and point mutations were observed in 4 proteins. The frequency of occurrence of these variations was wide-ranged, with some modifications being observed in only one sample, and others detected in all 96 samples. Even though a relatively small cohort of individuals was investigated, the results from this study illustrate the extent of protein diversity in the human population and can be of immediate aid in clinical proteomics/biomarker studies by laying a basal-level statistical foundation from which protein diversity relating to disease can be evaluated.

Proteomic characterization of the angiogenesis inhibitor SU6668 reveals multiple impacts on cellular kinase signaling.
Godl K, Gruss OJ, Eickhoff J, Wissing J, Blencke S, Weber M, Degen H, Brehmer D, Orfi L, Horvath Z, Keri G, Muller S, Cotten M, Ullrich A, Daub H.
Cancer Res.
2005 Aug 1;65(15):6919-26.

Knowledge about molecular drug action is critical for the development of protein kinase inhibitors for cancer therapy. Here, we establish a chemical proteomic approach to profile the anticancer drug SU6668, which was originally designed as a selective inhibitor of receptor tyrosine kinases involved in tumor vascularization. By employing immobilized SU6668 for the affinity capture of cellular drug targets in combination with mass spectrometry, we identified previously unknown targets of SU6668 including Aurora kinases and TANK-binding kinase 1. Importantly, a cell cycle block induced by SU6668 could be attributed to inhibition of Aurora kinase activity. Moreover, SU6668 potently suppressed antiviral and inflammatory responses by interfering with TANK-binding kinase 1-mediated signal transmission. These results show the potential of chemical proteomics to provide rationales for the development of potent kinase inhibitors, which combine rather unexpected biological modes of action by simultaneously targeting defined sets of both serine/threonine and tyrosine kinases involved in cancer progression.

Multiplexed absolute quantification in proteomics using artificial QCAT proteins of concatenated signature peptides.
Beynon RJ, Doherty MK, Pratt JM, Gaskell SJ.
Nat Methods.
2005 Aug;2(8):587-9.

Absolute quantification in proteomics usually involves simultaneous determination of representative proteolytic peptides and stable isotope-labeled analogs. The principal limitation to widespread implementation of this approach is the availability of standard signature peptides in accurately known amounts. We report the successful design and construction of an artificial gene encoding a concatenation of tryptic peptides (QCAT protein) from several chick (Gallus gallus) skeletal muscle proteins and features for quantification and purification.
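
The concatenation idea can be illustrated with a small in-silico digest. This is only a sketch under stated assumptions: the three peptide sequences are hypothetical placeholders (not the chick muscle peptides used by the authors), and the cleavage rule is the usual simplified trypsin rule (cut after K or R unless the next residue is P). Because every signature peptide ends in K or R, digesting the concatenated protein releases each peptide once, that is, in a 1:1 ratio.

```python
import re

def tryptic_digest(sequence):
    """Simplified trypsin rule: cleave after K or R unless followed by P.

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [frag for frag in re.split(r"(?<=[KR])(?!P)", sequence) if frag]

# Hypothetical signature peptides, each ending in K or R with no internal K/R.
signature_peptides = ["GFAGDDAPR", "LVNELTEFAK", "AVFPSIVGR"]

# A QCAT-style construct is modelled here as a plain concatenation.
qcat = "".join(signature_peptides)

# Digestion should return every signature peptide exactly once.
released = tryptic_digest(qcat)
print(released)
```

A real construct would also carry the purification and quantification features mentioned in the abstract; those are omitted from this sketch.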

Analysis of candidate genes through a proteomics-based approach in primary cell lines from malignant melanomas and their metastases.
Carta F, Demuro PP, Zanini C, Santona A, Castiglia D, D'atri S, Ascierto PA, Napolitano M, Cossu A, Tadolini B, Turrini F, Manca A, Sini MC, Palmieri G, Rozzo AC; on behalf of the Italian Melanoma Intergroup (IMI).
Melanoma Res.
2005 Aug;15(4):235-244.

Proteomics provides a powerful approach for screening alterations in protein expression and post-translational modification associated with particular human diseases. In this study, the analysis of protein expression was focused on malignant melanoma in order to determine the candidate genes involved in tumour progression. The proteomes of cultured melanocytes and of cell lines from primary and metastatic lesions of one malignant melanoma patient were profiled using two-dimensional electrophoresis (2-DE) and mass spectrometry. Differentially expressed proteins were confirmed by 2-DE and mass spectrometry on an additional four malignant melanoma cell lines. Total RNA from the first subset of cell lines was used for quantitative reverse transcriptase-polymerase chain reaction (RT-PCR) of the candidate genes identified after proteomics analysis. A very high similarity was observed in the 2-DE maps of two malignant melanoma cell lines derived from primary and secondary lesions of the same patient. Mass spectrometry identified 37 proteins which were found to be more abundant in tumour cells in comparison with control melanocytes (as confirmed on additional cell lines), with a relatively high prevalence of stress proteins. Eight candidate genes (PRDX2, HSP27, HSP60, HSPA8, HSP9B, STIP1, PDI and P4HB) were further characterized by evaluating their messenger RNA expression levels through real-time RT-PCR analysis. Overexpression of HSP27, HSP60 and HSPA8 and downregulation of PRDX2 were observed in cells from metastatic malignant melanoma in comparison with those from primary melanoma. Although further investigations with larger numbers of paired normal and tumour samples are needed, our findings strongly suggest that the dysregulation of stress pathways may be involved in melanoma progression.

PRIDE: The proteomics identifications database.
Martens L, Hermjakob H, Jones P, Adamski M, Taylor C, States D, Gevaert K, Vandekerckhove J, Apweiler R.
Proteomics.
2005 Aug;5(13):3537-45.

The advent of high-throughput proteomics has enabled the identification of ever increasing numbers of proteins. Correspondingly, the number of publications centered on these protein identifications has increased dramatically. With the first results of the HUPO Plasma Proteome Project being analyzed and many other large-scale proteomics projects about to disseminate their data, this trend is not likely to flatten out any time soon. However, the publication mechanism of these identified proteins has lagged behind in technical terms. Often very long lists of identifications are either published directly with the article, resulting in both a voluminous and rather tedious read, or are included on the publisher's website as supplementary information. In either case, these lists are typically only provided as portable document format documents with a custom-made layout, making it practically impossible for computer programs to interpret them, let alone efficiently query them. Here we propose the proteomics identifications (PRIDE) database (http://www.ebi.ac.uk/pride) as a means to finally turn publicly available data into publicly accessible data. PRIDE offers a web-based query interface, a user-friendly data upload facility, and a documented application programming interface for direct computational access. The complete PRIDE database, source code, data, and support tools are freely available for web access or download and local installation.

Plasma Proteome Database as a resource for proteomics research.
Muthusamy B, Hanumanthu G, Suresh S, Rekha B, Srinivas D, Karthick L, Vrushabendra BM, Sharma S, Mishra G, Chatterjee P, Mangala KS, Shivashankar HN, Chandrika KN, Deshpande N, Suresh M, Kannabiran N, Niranjan V, Nalli A, Prasad TS, Arun KS, Reddy R, Chandran S, Jadhav T, Julie D, Mahesh M, John SL, Palvankar K, Sudhir D, Bala P, Rashmi NS, Vishnupriya G, Dhar K, Reshma S, Chaerkady R, Gandhi TK, Harsha HC, Mohan SS, Deshpande KS, Sarker M, Pandey A.
Proteomics.
2005 Aug;5(13):3531-6.

Plasma is one of the best studied compartments in the human body and serves as an ideal body fluid for the diagnosis of diseases. This report provides a detailed functional annotation of all the plasma proteins identified to date. In all, gene products encoded by 3778 distinct genes were annotated based on proteins previously published in the literature as plasma proteins and the identification of multiple peptides from proteins under HUPO's Plasma Proteome Project. Our analysis revealed that 51% of these genes encoded more than one protein isoform. All single nucleotide polymorphisms involving protein-coding regions were mapped onto the protein sequences. We found a number of examples of isoform-specific subcellular localization as well as tissue expression. This database is an attempt at comprehensive annotation of a complex subproteome and is available on the web at http://www.plasmaproteomedatabase.org.

Overview of the HUPO Plasma Proteome Project: Results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database.
Omenn GS, States DJ, Adamski M, Blackwell TW, Menon R, Hermjakob H, Apweiler R, Haab BB, Simpson RJ, Eddes JS, Kapp EA, Moritz RL, Chan DW, Rai AJ, Admon A, Aebersold R, Eng J, Hancock WS, Hefta SA, Meyer H, Paik YK, Yoo JS, Ping P, Pounds J, Adkins J, Qian X, Wang R, Wasinger V, Wu CY, Zhao X, Zeng R, Archakov A, Tsugita A, Beer I, Pandey A, Pisano M, Andrews P, Tammen H, Speicher DW, Hanash SM.
Proteomics.
2005 Aug;5(13):3226-45.

HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA, heparin, and citrate-anti-coagulated plasma; and (3) created a publicly-available knowledge base (www.bioinformatics.med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS/MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS/MS datasets had 15 710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay-based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein to DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches. This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis plus many reports of supplementary work on various aspects of the PPP workplan.
These PPP results on complexity, dynamic range, incomplete sampling, false-positive matches, and integration of diverse datasets for plasma and serum proteins lay a foundation for development and validation of circulating protein biomarkers in health and disease.

Isoelectric focusing in serial immobilized pH gradient gels to improve protein separation in proteomic analysis.
Poznanovic S, Schwall G, Zengerling H, Cahill MA.
Electrophoresis.
2005 Aug;26(16):3185-90.

We previously demonstrated the separation of proteins by isoelectric focusing (IEF) over pH 4-8 immobilized pH gradients (IPGs) over 54 cm (Poland et al., Electrophoresis 2003, 24, 1271). Here we show that similar results can be conveniently achieved using commercially available IPGs of appropriate pH ranges positioned end-on-end in series during electrophoresis, which we term 'daisy chain IEF'. Proteins efficiently electrophorese from one IPG to another during IEF by traversing buffer-filled porous bridges between the serial IPGs. A variety of materials can function as bridges, including paper, polyacrylamide gels or even IPGs. The quality of two-dimensional (2-D) protein patterns is not apparently worse than that generated by conventional IEF using the same individual IPGs. A major advantage of this method is that sample is consumed efficiently, without the requirement for preliminary steps, such as chamber IEF. This advantage is pronounced when working with extremely limited sources of samples, such as with clinical biopsies or cellular subfractions. The present study was limited by the commercial availability of suitable pH gradients. Proteomics analyses could be further improved if commercial vendors would manufacture IPGs with suitable pH ranges to achieve high resolution (approximately 100 cm) IEF separation of proteins in one electrophoretic step over the pH range 2-12.

Web-based data warehouse on gene expression in human colorectal cancer.
Sagynaliev E, Steinert R, Nestler G, Lippert H, Knoch M, Reymond MA.
Proteomics.
2005 Aug;5(12):3066-78.

Based on biomedical literature databases, we tried a first step for constructing a gene expression "data warehouse" specific to human colorectal cancer (CRC). Results of genome-wide transcriptomic research were available from 12 studies, using various technologies, namely, SAGE, cDNA and oligonucleotide arrays, and adaptor-tagged amplification. Three studies analyzed CRC cell lines and nine studies of human samples. The total number of patients was 144. Out of 982 up- or down-regulated genes, 863 (88%) were found to be differentially expressed in a single study, 88 in two studies, 22 in three studies, 7 in four studies, and only 2 genes in six studies. Eight large-scale proteomics studies were published in CRC, using 2-D-, SDS- or free-flow electrophoresis, involving only 11 patients. Out of 408 differentially expressed proteins, 339 (83%) were found to be differentially expressed only in a single study, 16 in three studies, 10 in four studies, 3 in five, and 1 in eight studies. Confirmation at proteome level of results obtained with large-scale transcriptomics studies was possible in 25%. This proportion was higher (67%) for reproducing proteome results using transcriptomics technologies. Obviously, reproducibility and overlapping between published gene expression results at proteome and transcriptome level are low in human CRC. Thus, the development of standardized processes for collecting samples, storing, retrieving, and querying gene expression data obtained with different technologies is of central importance in translational research.
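
The transcriptomic reproducibility figures quoted above can be checked with a short tally. This sketch uses only the gene counts stated in the abstract; the variable names are illustrative. The protein-level breakdown is left out because the abstract does not list a two-study category for it.

```python
# Up- or down-regulated genes, keyed by the number of studies reporting them
# (values taken from the abstract, transcriptomic studies only).
genes_by_study_count = {1: 863, 2: 88, 3: 22, 4: 7, 6: 2}

total_genes = sum(genes_by_study_count.values())
single_study_share = genes_by_study_count[1] / total_genes
replicated = total_genes - genes_by_study_count[1]

print(total_genes)                  # 982
print(f"{single_study_share:.0%}")  # 88%
print(replicated)                   # 119 genes seen in two or more studies
```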

MAO: a Multiple Alignment Ontology for nucleic acid and protein sequences.
Thompson JD, Holbrook SR, Katoh K, Koehl P, Moras D, Westhof E, Poch O.
Nucleic Acids Res.
2005 Jul 25;33(13):4164-71.

The application of high-throughput techniques such as genomics, proteomics or transcriptomics means that vast amounts of heterogeneous data are now available in the public databases. Bioinformatics is responding to the challenge with new integrated management systems for data collection, validation and analysis. Multiple alignments of genomic and protein sequences provide an ideal environment for the integration of this mass of information. In the context of the sequence family, structural and functional data can be evaluated and propagated from known to unknown sequences. However, effective integration is being hindered by syntactic and semantic differences between the different data resources and the alignment techniques employed. One solution to this problem is the development of an ontology that systematically defines the terms used in a specific domain. Ontologies are used to share data from different resources, to automatically analyse information and to represent domain knowledge for non-experts. Here, we present MAO, a new ontology for multiple alignments of nucleic and protein sequences. MAO is designed to improve interoperation and data sharing between different alignment protocols for the construction of a high quality, reliable multiple alignment in order to facilitate knowledge extraction and the presentation of the most pertinent information to the biologist.

Processing methods for differential analysis of LC/MS profile data.
Katajamaa M, Oresic M.
BMC Bioinformatics.
2005 Jul 18;6:179.

BACKGROUND: Liquid chromatography coupled to mass spectrometry (LC/MS) has been widely used in proteomics and metabolomics research. In this context, the technology has been increasingly used for differential profiling, i.e. broad screening of biomolecular components across multiple samples in order to elucidate the observed phenotypes and discover biomarkers. One of the major challenges in this domain remains development of better solutions for processing of LC/MS data. RESULTS: We present a software package MZmine that enables differential LC/MS analysis of metabolomics data. This software is a toolbox containing methods for all data processing stages preceding differential analysis: spectral filtering, peak detection, alignment and normalization. Specifically, we developed and implemented a new recursive peak search algorithm and a secondary peak picking method for improving already aligned results, as well as a normalization tool that uses multiple internal standards. Visualization tools enable comparative viewing of data across multiple samples. Peak lists can be exported into other data analysis programs. The toolbox has already been utilized in a wide range of applications. We demonstrate its utility on an example of metabolic profiling of Catharanthus roseus cell cultures. CONCLUSION: The software is freely available under the GNU General Public License and it can be obtained from the project web page at: http://mzmine.sourceforge.net/.

BRIGEP--the BRIDGE-based genome-transcriptome-proteome browser.
Goesmann A, Linke B, Bartels D, Dondrup M, Krause L, Neuweger H, Oehm S, Paczian T, Wilke A, Meyer F.
Nucleic Acids Res.
2005 Jul 1;33(Web Server issue):W710-6.

The growing amount of information resulting from the increasing number of publicly available genomes and experimental results thereof necessitates the development of comprehensive systems for data processing and analysis. In this paper, we describe the current state and latest developments of our BRIGEP bioinformatics software system consisting of three web-based applications: GenDB, EMMA and ProDB. These applications facilitate the processing and analysis of bacterial genome, transcriptome and proteome data and are actively used by numerous international groups. We are currently in the process of extensively interconnecting these applications. BRIGEP was developed in the Bioinformatics Resource Facility of the Center for Biotechnology at Bielefeld University and is freely available. A demo project with sample data and access to all three tools is available at https://www.cebitec.uni-bielefeld.de/groups/brf/software/brigep/. Code bundles for these and other tools developed in our group are accessible on our FTP server at ftp.cebitec.uni-bielefeld.de/pub/software/.

DeNovoID: a web-based tool for identifying peptides from sequence and mass tags deduced from de novo peptide sequencing by mass spectroscopy.
Halligan BD, Ruotti V, Twigger SN, Greene AS.
Nucleic Acids Res.
2005 Jul 1;33(Web Server issue):W376-81.

One of the core activities of high-throughput proteomics is the identification of peptides from mass spectra. Some peptides can be identified using spectral matching programs like Sequest or Mascot, but many spectra do not produce high quality database matches. De novo peptide sequencing is an approach to determine partial peptide sequences for some of the unidentified spectra. A drawback of de novo peptide sequencing is that it produces a series of ordered and disordered sequence tags and mass tags rather than a complete, non-degenerate peptide amino acid sequence. This incomplete data is difficult to use in conventional search programs such as BLAST or FASTA. DeNovoID is a program that has been specifically designed to use degenerate amino acid sequence and mass data derived from MS experiments to search a peptide database. Since the algorithm employed depends on the amino acid composition of the peptide and not its sequence, DeNovoID does not have to consider all possible sequences, but rather a smaller number of compositions consistent with a spectrum. DeNovoID also uses a geometric indexing scheme that reduces the number of calculations required to determine the best peptide match in the database. DeNovoID is available at http://proteomics.mcw.edu/denovoid.
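
The composition-based matching described above can be sketched as follows. This is not the DeNovoID implementation: it is a brute-force illustration that assumes a tiny in-memory peptide list and plain Euclidean distance between 20-dimensional residue-count vectors, and it omits both the mass tags and the geometric indexing scheme mentioned in the abstract. All names in it are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_vector(peptide):
    """Count each of the 20 standard residues; residue order is ignored."""
    counts = Counter(peptide)
    return tuple(counts.get(aa, 0) for aa in AMINO_ACIDS)

def closest_by_composition(query, database):
    """Return the database peptide whose composition is nearest to the query's."""
    q = composition_vector(query)
    return min(database, key=lambda pep: math.dist(q, composition_vector(pep)))

# Hypothetical database and a query whose residue order is scrambled,
# standing in for a degenerate de novo result.
database = ["PEPTIDE", "SAMPLER", "PROTEIN"]
best = closest_by_composition("DEEIPPT", database)
print(best)
```

Because only residue counts are compared, a query with the right residues in the wrong order still lands on the correct database entry, which is the property the abstract highlights.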

Metabolic labeling of proteins for proteomics.
Beynon RJ, Pratt JM.
Mol Cell Proteomics.
2005 Jul;4(7):857-72.

Realization of the advantages of stable isotope labeling for proteomics has emerged gradually. However, many stable isotope label approaches rely on labeling in vitro using complex and sometimes expensive reagents. This review discusses strategies for labeling protein in vivo through metabolic incorporation of label into protein. This approach has many advantages, is particularly suited to single cells grown in culture (prokaryotic or eukaryotic), but is nonetheless subject to a number of complicating factors that must be controlled so that meaningful experiments can be conducted. Confounding issues include the metabolic lability of the amino acid precursor, incomplete labeling, and the role of protein turnover in labeling kinetics. All of these are controllable, provided that appropriate precautions are adopted.

Proteomic analysis of redox- and ErbB2-dependent changes in mammary luminal epithelial cells using cysteine- and lysine-labelling two-dimensional difference gel electrophoresis.
Chan HL, Gharbi S, Gaffney PR, Cramer R, Waterfield MD, Timms JF.
Proteomics.
2005 Jul;5(11):2908-26.

Differential protein expression analysis based on modification of selected amino acids with labelling reagents has become the major method of choice for quantitative proteomics. One such methodology, two-dimensional difference gel electrophoresis (2-D DIGE), uses a matched set of fluorescent N-hydroxysuccinimidyl (NHS) ester cyanine dyes to label lysine residues in different samples which can be run simultaneously on the same gels. Here we report the use of iodoacetylated cyanine (ICy) dyes (for labelling of cysteine thiols), for 2-D DIGE-based redox proteomics. Characterisation of ICy dye labelling in relation to its stoichiometry, sensitivity and specificity is described, as well as comparison of ICy dye with NHS-Cy dye labelling and several protein staining methods. We have optimised conditions for labelling of nonreduced, denatured samples and report increased sensitivity for a subset of thiol-containing proteins, allowing accurate monitoring of redox-dependent thiol modifications and expression changes. Cysteine labelling was then combined with lysine labelling in a multiplex 2-D DIGE proteomic study of redox-dependent and ErbB2-dependent changes in epithelial cells exposed to oxidative stress. This study identifies differentially modified proteins involved in cellular redox regulation, protein folding, proliferative suppression, glycolysis and cytoskeletal organisation, revealing the complexity of the response to oxidative stress and the impact that overexpression of ErbB2 has on this response.

Comparison of label free methods for quantifying human proteins by shotgun proteomics.
Old WM, Meyer-Arendt K, Aveline-Wolf L, Pierce KG, Mendoza A, Sevinsky JR, Resing KA, Ahn NG.
Mol Cell Proteomics.
2005 Jun 23; [Epub ahead of print].

Measurements of mass spectral peak intensities and spectral counts are promising methods for quantifying protein abundance changes in shotgun proteomics analyses. We describe SERAC, software developed to evaluate the ability of each method to quantify relative changes in protein abundance. Dynamic range and linearity using a three-dimensional ion trap were tested using standard proteins spiked into a complex sample. Linearity and good agreement between observed vs. expected protein ratios were obtained after normalization and background subtraction of peak area intensity measurements and correction of spectral counts to eliminate discontinuity in ratio estimates. Peak intensity values useful for protein quantitation ranged from 10^7 to 10^11 counts with no obvious saturation effect, and proteins in replicate samples showed variations of less than 2-fold within the 95% range (±2σ) when ≥3 peptides/protein were shared between samples. Protein ratios were determined with high confidence from spectral counts when maximum spectral counts were ≥4 spectra/protein, and replicates showed equivalent measurements well within 95% confidence limits. In further tests, complex samples were separated by gel exclusion chromatography, quantifying changes in protein abundance between different fractions. Linear behavior of peak area intensity measurements was obtained for peptides from proteins in different fractions. Protein ratios determined by spectral counting agreed well with those determined from peak area intensity measurements, and both agreed with independent measurements based on gel staining intensities. Overall, spectral counting proved to be a more sensitive method for detecting proteins that undergo changes in abundance, whereas peak area intensity measurements yielded more accurate estimates of protein ratios.
Finally, these methods were used to analyze differential changes in protein expression in human erythroleukemia K562 cells, stimulated under conditions that promote cell differentiation by MAP kinase pathway activation. Protein changes identified with p<0.1 showed good correlations with parallel measurements of changes in mRNA expression.
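
Why spectral counts need a correction before ratios are taken can be seen in a small sketch. The formula below is an assumption chosen for illustration, not the correction implemented in SERAC: each count is offset by a pseudocount and normalized to the run's total spectra, so a protein with zero spectra in one sample gives a finite log ratio instead of a discontinuity.

```python
import math

def log2_spectral_count_ratio(count_a, count_b, total_a, total_b, pseudocount=0.5):
    """Log2 ratio of normalized spectral counts (sample A over sample B).

    The pseudocount (an assumed value) keeps the ratio finite when a protein
    has no spectra in one of the samples.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("total spectral counts must be positive")
    freq_a = (count_a + pseudocount) / total_a
    freq_b = (count_b + pseudocount) / total_b
    return math.log2(freq_a / freq_b)

# Hypothetical protein: 8 spectra in sample A, 2 in sample B, 1000 spectra per run.
ratio = log2_spectral_count_ratio(8, 2, 1000, 1000)

# Zero spectra in sample A still yields a finite, negative log ratio.
absent_in_a = log2_spectral_count_ratio(0, 4, 1000, 1000)
print(ratio, absent_in_a)
```

Larger pseudocounts pull ratios for low-count proteins toward zero, so the value chosen trades sensitivity against stability.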

Cognate peptide-receptor ligand mapping by directed phage display.
Stratmann T, Kang AS.
Proteome Sci.
2005 Jun 17;3:7.

BACKGROUND: A rapid phage display method for the elucidation of cognate peptide specific ligand for receptors is described. The approach may be readily integrated into the interface of genomic and proteomic studies to identify biologically relevant ligands. METHODS: A gene fragment library from influenza coat protein haemagglutinin (HA) gene was constructed by treating HA cDNA with DNAse I to create 50-100 bp fragments. These fragments were cloned into plasmid pORFES IV and in-frame inserts were selected. These in-frame fragment inserts were subsequently cloned into a filamentous phage display vector JC-M13-88 for surface display as fusions to a synthetic copy of gene VIII. Two well characterized antibodies, mAb 12CA5 and pAb 07431, directed against distinct known regions of HA were used to pan the library. RESULTS: Two linear epitopes, HA peptide 112-126 and 162-173, recognized by mAb 12CA5 and pAb 07431, respectively, were identified as the cognate epitopes. CONCLUSION: This approach is a useful alternative to conventional methods such as screening of overlapping synthetic peptide libraries or gene fragment expression libraries when searching for precise peptide protein interactions, and may be applied to functional proteomics.

An aptamer-based protein biochip.
Stadtherr K, Wolf H, Lindner P.
Anal Chem.
2005 Jun 1;77(11):3437-43.

The establishment of an aptamer-based biochip for protein detection is described. Using a model system comprising human IgE as the analyte and single-stranded DNA aptamers specific for IgE or anti-IgE antibodies as immobilized ligands on chips, we could demonstrate that aptamers were equivalent or superior to antibodies in terms of specificity and sensitivity, respectively. Aptamer-based analyte detection on glass slides could clearly be demonstrated at minimum concentrations of 10 ng/mL IgE. In addition, we successfully showed specific analyte recognition in complex protein samples by the aptamer-based biochip system. Using DNA aptamers specific for human thrombin as an additional model receptor/ligand system, dual protein detection on a single slide could be proven. In conclusion, we could show the suitability of nucleic acid aptamers as low molecular weight receptors on biochips for sensitive and specific protein detection, representing an innovative tool for future proteomics.

Predicting functional gene links from phylogenetic-statistical analyses of whole genomes.
Barker D, Pagel M.
PLoS Comput Biol.
2005 Jun;1(1):e3.

An important element of the developing field of proteomics is to understand protein-protein interactions and other functional links amongst genes. Across-species correlation methods for detecting functional links work on the premise that functionally linked proteins will tend to show a common pattern of presence and absence across a range of genomes. We describe a maximum likelihood statistical model for predicting functional gene linkages. The method detects independent instances of the correlated gain or loss of pairs of proteins on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. We show, in a dataset of 10,551 protein pairs, that the phylogenetic method improves by up to 35% on across-species analyses at identifying known functionally linked proteins. The method shows that protein pairs with at least two to three correlated events of gain or loss are almost certainly functionally linked. Contingent evolution, in which one gene's presence or absence depends upon the presence of another, can also be detected phylogenetically, and may identify genes whose functional significance depends upon its interaction with other genes. Incorporating phylogenetic information improves the prediction of functional linkages. The improvement derives from having a lower rate of false positives and from detecting trends that across-species analyses miss. Phylogenetic methods can easily be incorporated into the screening of large-scale bioinformatics datasets to identify sets of protein links and to characterise gene networks.
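
The conventional across-species premise that the abstract takes as its starting point can be sketched in a few lines. This illustrates only that baseline, not the authors' maximum likelihood phylogenetic model: it scores two genes by how often their presence/absence calls agree across genomes and ignores the tree entirely, which is the source of the false positives the abstract describes. The profiles are hypothetical.

```python
def profile_agreement(profile_a, profile_b):
    """Fraction of genomes in which two genes are both present or both absent."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)

# Hypothetical presence (1) / absence (0) calls across six genomes.
gene_x = [1, 1, 0, 0, 1, 0]
gene_y = [1, 1, 0, 0, 1, 1]
gene_z = [0, 0, 1, 1, 0, 1]

print(profile_agreement(gene_x, gene_y))  # agree in five of six genomes
print(profile_agreement(gene_x, gene_z))  # never agree
```

Closely related genomes share patterns through common ancestry alone, so a high score here need not reflect independent correlated gains or losses; counting such events on a phylogeny is what the described method adds.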

Protein sequence tags: a novel solution for comparative proteomics.
Kuhn K, Prinz T, Schafer J, Baumann C, Scharfke M, Kienle S, Schwarz J, Steiner S, Hamon C.
Proteomics.
2005 Jun;5(9):2364-8.

Comparative proteome profiling using stable isotope peptide labelling and mass spectrometry has emerged as a promising strategy. Here, we show the broad potential of our proprietary protein sequence tag (PST) technology. A special feature of PST is its ability to detect a wide variety of proteins including the pharmaceutically relevant membrane and nuclear proteins. This procedure addresses a similar number of proteins, compared to the multidimensional protein identification technology approach, but offers additionally a quantitative analysis with its recently developed quantitative PST version.

Algorithms for protein interaction networks.
Lappe M, Holm L.
Biochem Soc Trans.
2005 Jun;33(Pt 3):530-4.

The functional characterization of all genes and their gene products is the main challenge of the postgenomic era. Recent experimental and computational techniques have enabled the study of interactions among all proteins on a large scale. In this paper, approaches will be presented to exploit interaction information for the inference of protein structure, function, signalling pathways and ultimately entire interactomes. Interaction networks can be modelled as graphs, showing the operation of gene function in terms of protein interactions. Since the architecture of biological networks differs distinctly from random networks, these functional maps contain a signal that can be used for predictive purposes. Protein function and structure can be predicted by matching interaction patterns, without the requirement of sequence similarity. Moving on to a higher level definition of protein function, the question arises how to decompose complex networks into meaningful subsets. An algorithm will be demonstrated, which extracts whole signal-transduction pathways from noisy graphs derived from text-mining the biological literature. Finally, an algorithmic strategy is formulated that enables the proteomics community to build a reliable scaffold of the interactome in a fraction of the time compared with uncoordinated efforts.
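
As a minimal sketch of the idea that interaction networks can be modelled as graphs and that proteins can be compared by their interaction patterns, the example below stores an undirected graph as a dictionary of partner sets and ranks proteins by the Jaccard overlap of their partners. The Jaccard score is an assumption chosen for illustration and is not the authors' algorithm. All protein names and function names are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two partner sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_graph(edges):
    """Undirected interaction graph as a dict mapping protein -> set of partners."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def most_similar(graph, query):
    """Rank all other proteins by how closely their partners match the query's."""
    scores = [
        (jaccard(graph[query], graph[other]), other)
        for other in graph
        if other != query
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda s: (-s[0], s[1]))

# Hypothetical interactions; P1 and P2 share the partners A, B and C.
edges = [
    ("P1", "A"), ("P1", "B"), ("P1", "C"),
    ("P2", "A"), ("P2", "B"), ("P2", "C"), ("P2", "D"),
    ("P3", "D"),
]
graph = build_graph(edges)
best_score, best_match = most_similar(graph, "P1")[0]
```

Here P1 and P2 never interact directly and no sequence information is used, yet P2 ranks first because three of its four partners are also partners of P1.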

Reverse-phase protein microarrays for tissue-based analysis.
Speer R, Wulfkuhle JD, Liotta LA.
Curr Opin Mol Ther.
2005 Jun;7(3):240-5.

The deciphering of the human genome has elucidated our biological structural design and has generated insights into disease development and pathogenesis. At the same time, knowledge of genetic changes during disease processes has demonstrated the need to move beyond genomics towards proteomics and a systems biology approach to science. Analyzing the proteome comprises more than just an enumeration of proteins. In fact, it characterizes proteins within cells in the context of their functional status and interactions in their physiological micro- and macroenvironments. As dysregulated signaling often underpins most human diseases, an overarching goal of proteomics is to profile the working state of signaling pathways, to develop 'circuit maps' of normal and diseased protein networks and identify hyperactive, defective or inoperable transduction pathways. Reverse-phase protein microarrays represent a new technology that can generate a multiplex readout of dozens of phosphorylated events simultaneously to profile the state of a signaling pathway target even after the cell is lysed and the contents denatured.

A novel approach to protein expression profiling using antibody microarrays combined with surface plasmon resonance technology.
Usui-Aoki K, Shimada K, Nagano M, Kawai M, Koga H.
Proteomics.
2005 Jun;5(9):2396-401.

We have previously described our systems for the high-throughput production of antibodies against mouse KIAA proteins and their validation (Proteomics 2004, 4, 1412-1416). Using our "libraries" of antibodies, we established a novel antibody microarray system in which surface plasmon resonance (SPR) technology is utilized for signal detection. Up to 400 real-time antibody-target bindings could be measured simultaneously within a single hour. This rapid detection was achieved by direct readout of the bindings using SPR technology. To evaluate our system, we assessed the reproducibility on crude protein samples and obtained satisfactorily reproducible results, exhibiting correlation values >0.92. Using this SPR-based antibody microarray system, we examined mKIAA protein expression in five different adult mouse tissues and identified the specific tissue expression patterns of several mKIAA proteins.

Target selection of soluble protein complexes for structural proteomics studies.
Shen W, Yun S, Tam B, Dalal K, Pio FF.
Proteome Sci.
2005 May 18;3(1):3.

BACKGROUND: Protein expression in E. coli is the most commonly used system to produce protein for structural studies, because it is fast and inexpensive and can produce large quantities of protein. However, when proteins from other species, such as mammals, are produced in this system, problems of protein expression and solubility arise [1]. Structural genomics projects are currently investigating proteomics pipelines that would produce sufficient quantities of recombinant proteins for structural studies of protein complexes. To investigate how the E. coli protein expression system could be used for this purpose, we purified apoptotic binary protein complexes formed between members of the Caspase Associated Recruitment Domain (CARD) family. RESULTS: A combinatorial approach to the generation of protein complexes was performed between members of the CARD domain protein family that have the ability to form hetero-dimers between each other. In our method, each gene coding for a specific protein partner is cloned in pET-28b (Novagen) and PGEX2T (Amersham) expression vectors. All combinations of protein complexes are then obtained by reconstituting complexes from purified components in native conditions, after denaturation-renaturation or co-expression. Our study, applied to 14 soluble CARD domain proteins, revealed that co-expression studies perform better than native and denaturation-renaturation methods. In this study, we confirm existing interactions obtained in vivo in mammalian cells and also predict new interactions. CONCLUSION: The simplicity of this screening method could be easily scaled up to identify soluble protein complexes for structural genomic projects. This study reports informative statistics on the solubility of human protein complexes expressed in E. coli belonging to the human CARD protein family.

HDBStat!: a platform-independent software suite for statistical analysis of high dimensional biology data.
Trivedi P, Edwards JW, Wang J, Gadbury GL, Srinivasasainagendra V, Zakharkin SO, Kim K, Mehta T, Brand JP, Patki A, Page GP, Allison DB.
BMC Bioinformatics.
2005 Apr 6;6(1):86.

BACKGROUND: Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for analysis of high dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists with a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. RESULTS: Results generated from data preprocessing methods, quality control analysis and hypothesis testing methods are output in the form of Excel CSV tables, graphs and an HTML report summarizing data analysis. CONCLUSION: HDBStat! is platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website http://www.soph.uab.edu/ssg_content.asp?id=1164.

High-resolution functional proteomics by active-site peptide profiling.
Okerberg ES, Wu J, Zhang B, Samii B, Blackford K, Winn DT, Shreder KR, Burbaum JJ, Patricelli MP.
Proc Natl Acad Sci U S A.
2005 Apr 5;102(14):4996-5001.

Characterization and functional annotation of the large number of proteins predicted from genome sequencing projects poses a major scientific challenge. Whereas several proteomics techniques have been developed to quantify the abundance of proteins, these methods provide little information regarding protein function. Here, we present a gel-free platform that permits ultrasensitive, quantitative, and high-resolution analyses of protein activities in proteomes, including highly problematic samples such as undiluted plasma. We demonstrate the value of this platform for the discovery of both disease-related enzyme activities and specific inhibitors that target these proteins.

An integrated approach utilizing proteomics and bioinformatics to detect ovarian cancer.
Yu JK, Zheng S, Tang Y, Li L.
J Zhejiang Univ Sci B.
2005 Apr;6(4):227-31.

OBJECTIVE: To find new potential biomarkers and establish the patterns for the detection of ovarian cancer. METHODS: Sixty-one serum samples, from 32 ovarian cancer patients and 29 healthy people, were analyzed by surface-enhanced laser desorption/ionization mass spectrometry (SELDI-MS). The protein fingerprint data were analyzed by bioinformatics tools. A support vector machine (SVM) with ten-fold cross-validation was used to establish the diagnostic pattern. RESULTS: Five potential biomarkers were found (2085 Da, 5881 Da, 7564 Da, 9422 Da, 6044 Da). The diagnostic pattern combining them separated the ovarian cancer samples from the healthy samples with a sensitivity of 96.7%, a specificity of 96.7% and a positive predictive value of 96.7%. CONCLUSIONS: The combination of SELDI with bioinformatics tools could find new biomarkers and establish patterns with high sensitivity and specificity for the detection of ovarian cancer.

Proteomic detection of prostate-specific antigen using a serum fractionation procedure: potential implication for new low-abundance cancer biomarkers detection.
Solassol J, Marin P, Demettre E, Rouanet P, Bockaert J, Maudelonde T, Mange A.
Anal Biochem.
2005 Mar 1;338(1):26-31.

One of the major obstacles in proteomic analysis of biological fluids is the presence of highly abundant proteins such as albumin and immunoglobulins, which can interfere with the resolution and sensitivity of the proteome profiling techniques used. In this paper, we describe an anion exchange fractionation procedure for serum using denaturing conditions that disrupt protein-protein interactions before analysis by surface-enhanced laser desorption/ionization and by two-dimensional electrophoresis. This method simplifies the serum proteome into subproteomes and markedly increases resolution and sensitivity without any loss of minor proteins. To confirm the applicability of this method, fractionated serum of a patient with prostate cancer was analyzed for the presence of the prostate-specific antigen (PSA), which is a low-abundance tumor marker protein. The results demonstrate that PSA can be detected by two-dimensional electrophoresis only in serum following fractionation. Hence, this procedure may facilitate the identification of other, so far unknown, tumor markers in patient sera.

SELDI-TOF MS profiling of serum for detection of the progression of chronic hepatitis C to hepatocellular carcinoma.
Schwegler EE, Cazares L, Steel LF, Adam BL, Johnson DA, Semmes OJ, Block TM, Marrero JA, Drake RR.
Hepatology.
2005 Mar;41(3):634-42.

Proteomic profiling of serum is an emerging technique to identify new biomarkers indicative of disease severity and progression. The objective of our study was to assess the use of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) to identify multiple serum protein biomarkers for detection of liver disease progression to hepatocellular carcinoma (HCC). A cohort of 170 serum samples obtained from subjects in the United States with no liver disease (n = 39), liver diseases not associated with cirrhosis (n = 36), cirrhosis (n = 38), or HCC (n = 57) were applied to metal affinity protein chips for protein profiling by SELDI-TOF MS. Across the four test groups, 38 differentially expressed proteins were used to generate multiple decision classification trees to distinguish the known disease states. Analysis of a subset of samples with only hepatitis C virus (HCV)-related disease was emphasized. The serum protein profiles of control patients were readily distinguished from each HCV-associated disease state. Two-way comparisons of chronic hepatitis C, HCV cirrhosis, or HCV-HCC versus healthy had a sensitivity/specificity range of 74% to 95%. For distinguishing chronic HCV from HCV-HCC, a sensitivity of 61% and a specificity of 76% were obtained. However, when the values of known serum markers alpha fetoprotein, des-gamma carboxyprothrombin, and GP73 were combined with the SELDI peak values, the sensitivity and specificity improved to 75% and 92%, respectively. In conclusion, SELDI-TOF MS serum profiling is able to distinguish HCC from liver disease before cirrhosis as well as cirrhosis, especially in patients with HCV infection compared with other etiologies.

Detection of bladder cancer using a point-of-care proteomic assay.
Grossman HB, Messing E, Soloway M, Tomera K, Katz G, Berger Y, Shen Y.
JAMA.
2005 Feb 16;293(7):810-6.

CONTEXT: A combination of methods is used for diagnosis of bladder cancer because no single procedure detects all malignancies. Urine tests are frequently part of an evaluation, but have either been nonspecific for cancer or required specialized analysis at a laboratory. OBJECTIVE: To investigate whether a point-of-care proteomic test that measures the nuclear matrix protein NMP22 in voided urine could enhance detection of malignancy in patients with risk factors or symptoms of bladder cancer. DESIGN, SETTING, AND PATIENTS: Twenty-three academic, private practice, and veterans' facilities in 10 states prospectively enrolled consecutive patients from September 2001 to May 2002. Participants included 1331 patients at elevated risk for bladder cancer due to factors such as history of smoking or symptoms including hematuria and dysuria. Patients at risk for malignancy of the urinary tract provided a voided urine sample for analysis of NMP22 protein and cytology prior to cystoscopy. MAIN OUTCOME MEASURES: The diagnosis of bladder cancer, based on cystoscopy with biopsy, was accepted as the reference standard. The performance of the NMP22 test was compared with voided urine cytology as an aid to cancer detection. Testing for the NMP22 tumor marker was conducted in a blinded manner. RESULTS: Bladder cancer was diagnosed in 79 patients. The NMP22 assay was positive in 44 of 79 patients with cancer (sensitivity, 55.7%; 95% confidence interval [CI], 44.1%-66.7%), whereas cytology test results were positive in 12 of 76 patients (sensitivity, 15.8%; 95% CI, 7.6%-24.0%). The specificity of the NMP22 assay was 85.7% (95% CI, 83.8%-87.6%) compared with 99.2% (95% CI, 98.7%-99.7%) for cytology. The proteomic marker detected 4 cancers that were not visualized during initial endoscopy, including 3 that were muscle invasive and 1 carcinoma in situ. 
CONCLUSION: The noninvasive point-of-care assay for elevated urinary NMP22 protein can increase the accuracy of cystoscopy, with test results available during the patient visit.
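
The sensitivity figures in this abstract follow directly from the reported counts, as the minimal sketch below shows. The confidence interval uses a normal (Wald) approximation, which is an assumption: the abstract does not say how its intervals were computed. The function names are hypothetical.

```python
from math import sqrt

def proportion(positives, total):
    """Fraction of cases with a positive test, e.g. sensitivity among cancers."""
    return positives / total

def wald_interval(positives, total, z=1.96):
    """Normal-approximation 95% interval for a proportion (assumed method)."""
    p = positives / total
    half_width = z * sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Counts reported in the abstract.
nmp22_sensitivity = proportion(44, 79)     # NMP22 positive in 44 of 79 cancers
cytology_sensitivity = proportion(12, 76)  # cytology positive in 12 of 76

cytology_low, cytology_high = wald_interval(12, 76)

print(f"NMP22 sensitivity:    {nmp22_sensitivity:.1%}")
print(f"Cytology sensitivity: {cytology_sensitivity:.1%} "
      f"({cytology_low:.1%} to {cytology_high:.1%})")
```

Under this assumption the cytology interval rounds to 7.6% to 24.0%. Other interval methods (exact or Wilson, for instance) give slightly different bounds, so agreement with a published interval should not be taken as confirmation of the method used.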

Protein expression profiling identifies subclasses of breast cancer and predicts prognosis.
Jacquemier J, Ginestier C, Rougemont J, Bardou VJ, Charafe-Jauffret E, Geneix J, Adelaide J, Koki A, Houvenaeghel G, Hassoun J, Maraninchi D, Viens P, Birnbaum D, Bertucci F.
Cancer Res.
2005 Feb 1;65(3):767-79.

Breast cancer is a heterogeneous disease whose evolution is difficult to predict by using classic histoclinical prognostic factors. Prognostic classification can benefit from molecular analyses such as large-scale expression profiling. Using immunohistochemistry on tissue microarrays, we have monitored the expression of 26 selected proteins in more than 1,600 cancer samples from 552 consecutive patients with early breast cancer. Both an unsupervised approach and a new supervised method were used to analyze these profiles. Hierarchical clustering identified relevant clusters of coexpressed proteins and clusters of tumors. We delineated protein clusters associated with the estrogen receptor and with proliferation. Tumor clusters correlated with several histoclinical features of samples, including 5-year metastasis-free survival (MFS), and with the recently proposed pathophysiologic taxonomy of disease. The supervised method identified a set of 21 proteins whose combined expression significantly correlated to MFS in a learning set of 368 patients (P < 0.0001) and in a validation set of 184 patients (P < 0.0001). Among the 552 patients, the 5-year MFS was 90% for patients classified in the "good-prognosis class" and 61% for those classified in the "poor-prognosis class" (P < 0.0001). This difference remained significant when the molecular grouping was applied according to lymph node or estrogen receptor status, as well as the type of adjuvant systemic therapy. In multivariate analysis, the 21-protein set was the strongest independent predictor of clinical outcome. These results show that protein expression profiling may be a clinically useful approach to assess breast cancer heterogeneity and prognosis in stage I, II, or III disease.

Liquid ultraviolet matrix-assisted laser desorption/ionization -- mass spectrometry for automated proteomic analysis.
Cramer R, Corless S.
Proteomics.
2005 Feb;5(2):360-70.

We have combined several key sample preparation steps for the use of a liquid matrix system to provide high analytical sensitivity in automated ultraviolet -- matrix-assisted laser desorption/ionisation -- mass spectrometry (UV-MALDI-MS). This new sample preparation protocol employs a matrix-mixture which is based on the glycerol matrix-mixture described by Sze et al. The low-femtomole sensitivity that is achievable with this new preparation protocol enables proteomic analysis of protein digests comparable to solid-state matrix systems. For automated data acquisition and analysis, the MALDI performance of this liquid matrix surpasses the conventional solid-state MALDI matrices. Besides the inherent general advantages of liquid samples for automated sample preparation and data acquisition, the use of the presented liquid matrix significantly reduces the extent of unspecific ion signals in peptide mass fingerprints compared to typically used solid matrices, such as 2,5-dihydroxybenzoic acid (DHB) or alpha-cyano-hydroxycinnamic acid (CHCA). In particular, matrix and low-mass ion signals and ion signals resulting from cation adduct formation are dramatically reduced. Consequently, the confidence level of protein identification by peptide mass mapping of in-solution and in-gel digests is generally higher.

High throughput proteome screening for biomarker detection.
Pan S, Zhang H, Rush J, Eng J, Zhang N, Patterson D, Comb MJ, Aebersold R.
Mol Cell Proteomics.
2005 Feb;4(2):182-90.

Mass spectrometry-based quantitative proteomics has become an important component of biological and clinical research. Current methods, while highly developed and powerful, are falling short of their goal of routinely analyzing whole proteomes mainly because the wealth of proteomic information accumulated from prior studies is not used for the planning or interpretation of present experiments. The consequence of this situation is that in every proteomic experiment the proteome is rediscovered. In this report we describe an approach for quantitative proteomics that builds on the extensive prior knowledge of proteomes and a platform for the implementation of the method. The method is based on the selection and chemical synthesis of isotopically labeled reference peptides that uniquely identify a particular protein and the addition of a panel of such peptides to the sample mixture consisting of tryptic peptides from the proteome in question. The platform consists of a peptide separation module for the generation of ordered peptide arrays from the combined peptide sample on the sample plate of a MALDI mass spectrometer, a high throughput MALDI-TOF/TOF mass spectrometer, and a suite of software tools for the selective analysis of the targeted peptides and the interpretation of the results. Applying the method to the analysis of the human blood serum proteome we demonstrate the feasibility of using mass spectrometry-based proteomics as a high throughput screening technology for the detection and quantification of targeted proteins in a complex system.
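
The quantification step implied by spiking isotopically labeled reference peptides can be sketched as a simple ratio calculation. This is a minimal illustration, assuming the labeled and native forms of a peptide give the same signal per unit amount; the abstract does not give the authors' formula or software interface. The peptide names, intensities, spiked amounts and the function name `quantify` are all hypothetical.

```python
def quantify(native_intensity, labeled_intensity, spiked_amount_fmol):
    """Estimate the native peptide amount from the native/labeled signal ratio.

    Assumes equal response of the native and isotopically labeled forms,
    so the intensity ratio equals the amount ratio.
    """
    if labeled_intensity <= 0:
        raise ValueError("labeled reference peptide was not detected")
    return native_intensity / labeled_intensity * spiked_amount_fmol

# Hypothetical peak pairs: (native intensity, labeled intensity, fmol spiked).
peak_pairs = {
    "PEPTIDE_A": (2000.0, 1000.0, 50.0),
    "PEPTIDE_B": (500.0, 2000.0, 100.0),
}

amounts = {
    peptide: quantify(native, labeled, spiked)
    for peptide, (native, labeled, spiked) in peak_pairs.items()
}
```

Because each reference peptide is chosen to identify one protein uniquely, the amount estimated for the peptide serves as a proxy for the amount of that protein in the sample.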

Proteins as biomarkers of oxidative/nitrosative stress in diseases: the contribution of redox proteomics.
Dalle-Donne I, Scaloni A, Giustarini D, Cavarra E, Tell G, Lungarella G, Colombo R, Rossi R, Milzani A.
Mass Spectrom Rev.
2005 Jan-Feb;24(1):55-99.

Reactive oxygen species (ROS) and reactive nitrogen species (RNS) contribute to the pathogenesis and/or progression of several human diseases. Proteins are important molecular signposts of oxidative/nitrosative damage. However, it is generally unresolved whether the presence of oxidatively/nitrosatively modified proteins has a causal role or simply reflects secondary epiphenomena. Only direct identification and characterization of the modified protein(s) in a given pathophysiological condition can decipher the potential roles played by ROS/RNS-induced protein modifications. During the last few years, mass spectrometry (MS)-based technologies have contributed in a significant way to foster a better understanding of disease processes. The study of oxidative/nitrosative modifications, investigated by redox proteomics, is contributing to establish a relationship between pathological hallmarks of disease and protein structural and functional abnormalities. MS-based technologies promise a contribution in a new era of molecular medicine, especially in the discovery of diagnostic biomarkers of oxidative/nitrosative stress, enabling early detection of diseases. Indeed, identification and characterization of oxidatively/nitrosatively modified proteins in human diseases has just begun.

The European Bioinformatics Institute's data resources: towards systems biology.
Brooksbank C, Cameron G, Thornton J.
Nucleic Acids Res.
2005 Jan 1;33(Database issue):D46-53.

Genomic and post-genomic