Scientific Bibliography
Research Articles
2006
Biomarkers for cancer screening, diagnosis, and treatment: a systems approach.
Hartwell L, Mankoff D, Paulovich A, Ramsey S, and Swisher E.
Nature Biotechnology. 2006 August.
Biomarkers measured in a variety of patient samples, including blood, tissue, urine and cerebrospinal fluid, are used in a diverse array of clinical settings. Although many successful biomarkers have been developed to date, advances in genetics and proteomics promise to usher in a new era of abundant, informative biomarkers that could transform the application of molecular biology to human disease. The application of biomarkers to cancer is leading the way because of the unique association of genomic changes in cancer cells with the disease process. Consequently, DNA-based biomarkers are already becoming incorporated into routine patient management and are providing lessons on the value added by appropriate diagnostic tests. Moreover, cancer management illustrates the complexity of the disease process, which can potentially be distinguished through appropriate biomarkers applied to different individuals, different types of disease, the progression of disease states and the multi-step nature of cancer treatment.
Scenarios for the use of biomarker-based diagnostics for cancer include the following: risk assessment, noninvasive screening for early-stage disease, detection and localization, disease stratification and prognosis, response to therapy and, for those in remission, screening for disease recurrence. Cost and potential morbidity increase as we progress along this continuum (Fig. 1). Our goals in applying diagnostic tests are (i) to identify persons harboring potentially life-threatening cancers at the earliest stage possible, (ii) to avoid false-positive tests and diagnosing of cancers that would otherwise not threaten a person's well-being to avoid psychological stress and unnecessary treatments, and (iii) to minimize the overall cost of the program. It is unlikely, however, that any single test will perfectly meet all of these goals.
A statistical method for chromatographic alignment of LC-MS data.
Wang P, Coram M, Tang H, Fitzgibbon M, Zhang H, Yi E, Aebersold R, and McIntosh M.
Biostatistics. 2006 August 2.
Integrated liquid-chromatography mass-spectrometry (LC-MS) is becoming a widely used approach for quantifying the protein composition of complex samples. The output of the LC-MS system measures the intensity of a peptide with a specific mass-to-charge ratio and retention time. In the last few years, this technology has been used to compare complex biological samples across multiple conditions. One challenge for comparative proteomic profiling with LC-MS is to match corresponding peptide features from different experiments.
In this paper, we propose a new method, Peptide Element Alignment (PETAL), that uses raw spectrum data and detected peaks to simultaneously align features from multiple LC-MS experiments. PETAL creates spectrum elements, each of which represents the mass spectrum of a single peptide in a single scan. Peptides detected in different LC-MS data sets are aligned if they can be represented by the same elements. By considering each peptide separately, PETAL enjoys greater flexibility than time-warping methods. While most existing methods process multiple data sets by sequentially aligning each data set to an arbitrarily chosen template data set, PETAL treats all experiments symmetrically and can analyze all experiments simultaneously. We illustrate the performance of PETAL on example data sets.
Adenomatous polyposis coli (APC) is required for normal development of skin and thymus.
Kuraguchi M, Wang X, Bronson R, Rothenberg R, Ohene-Baah N, Lund J, Kucherlapati M, Maas R, and Kucherlapati R.
PLOS Genetics. 2006 July 28.
The tumor suppressor gene Apc (adenomatous polyposis coli) is a member of the Wnt signaling pathway that is involved in development and tumorigenesis. Heterozygous knockout mice for Apc have a tumor predisposition phenotype and homozygosity leads to embryonic lethality. To understand the role of Apc in development we generated a floxed allele. These mice were mated with a strain carrying Cre recombinase under the control of the human Keratin 14 (K14) promoter, which is active in basal cells of epidermis and other stratified epithelia. Mice homozygous for the floxed allele that also carry the K14-cre transgene were viable but had stunted growth and died before weaning. Histological and immunochemical examinations revealed that K14-cre mediated Apc loss resulted in aberrant growth in many ectodermally derived squamous epithelia including hair follicles, teeth and oral and corneal epithelia. In addition, squamous metaplasia was observed in various epithelial-derived tissues including the thymus. The aberrant growth of hair follicles and other appendages as well as the thymic abnormalities in K14-cre; Apc(CKO/CKO) mice suggest that the Apc gene is crucial in embryonic cells to specify epithelial cell fates in organs that require epithelial-mesenchymal interactions for their development.
General framework for developing and evaluating database scoring algorithms using the TANDEM search engine.
MacLean B, Eng J, Beavis R, and McIntosh M.
Bioinformatics. 2006 July 28.
MOTIVATION: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on developing improved database scoring methods, we modified the software to allow users to redefine the scoring function and replace the native TANDEM scoring function while leaving the remaining core application intact. Redefinition is performed at run time so multiple scoring functions are available to be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and also provide implementations of two TANDEM-compatible scoring functions, one previously described scoring function compatible with PeptideProphet and one very simple scoring function that quantitative researchers may use to begin their development. This extension builds on the open-source TANDEM project and will facilitate research into and dissemination of novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++. AVAILABILITY: Supplementary materials, including source code for the scoring functions, are available from http://proteomics.fhcrc.org.
A reagent resource to identify proteins and peptides of interest to the cancer community: A workshop report.
Haab B, Paulovich A, Anderson N, Clark A, Downing G, Hermjakob H, Labaer J, and Uhlen M.
Molecular and Cellular Proteomics. 2006 Jul 24.
On the basis of discussions with representatives from all sectors of the cancer research community, the NCI recognizes the immense opportunities to apply proteomic technologies to further cancer research. Validated and well-characterized affinity capture reagents (e.g., antibodies, aptamers, affibodies) will play a key role in proteomic research platforms for the prevention, early detection, treatment, and monitoring of cancer. To discuss ways to develop new resources and optimize current opportunities in this area, the National Cancer Institute (NCI) convened the "Proteomic Technologies Reagents Resource Workshop" in Chicago, IL on December 12-13, 2005. The workshop brought together leading scientists in proteomic research to discuss model systems for evaluating and delivering resources for reagents to support mass spectrometry (MS) and affinity capture platforms. Speakers discussed issues and identified action items related to an overall vision for and proposed models for a shared proteomics reagents resource, applications of affinity capture methods in cancer research, quality control and validation of affinity capture reagents, considerations for target selection, and construction of a reagents database. The meeting also featured presentations and discussion from leading private-sector investigators on state-of-the-art technologies and capabilities to meet the user community's needs. This workshop was developed as a component of the NCI's Clinical Proteomics Technologies Initiative for Cancer (CPTI), a coordinated initiative that includes the establishment of reagent resources for the scientific community. This workshop report explores various approaches to develop a framework that will most effectively fulfill the needs of the NCI and the cancer research community.
Analysis of Acrylamide Labeled Serum Proteins by LC-MS/MS.
Faca V, Coram M, Phanstiel D, Glukhova V, Zhang Q, Fitzgibbon M, McIntosh M, and Hanash S.
Journal of Proteome Research. 2006 July 13.
Isotopic labeling of cysteine residues with acrylamide was previously utilized for relative quantitation of proteins by MALDI-TOF. Here, we explored and compared the application of deuterated and (13)C isotopes of acrylamide for quantitative proteomic analysis using LC-MS/MS and high-resolution FTICR mass spectrometry. The method was applied to human serum samples that were immunodepleted of abundant proteins. Our results show reliable quantitation of proteins across an abundance range that spans 5 orders of magnitude based on ion intensities and known protein concentration in plasma. The use of the (13)C isotope of acrylamide had a slightly greater advantage relative to deuterated acrylamide, because of shifts in elution of deuterated acrylamide relative to its corresponding nondeuterated compound by reversed-phase chromatography. Overall, the use of acrylamide for differentially labeling intact proteins in complex mixtures, in combination with LC-MS/MS, provides a robust method for quantitative analysis of complex proteomes.
Quality control metrics for LC-MS feature detection tools demonstrated on Saccharomyces cerevisiae proteomic profiles.
Piening B, Wang P, Bangur C, Whiteaker J, Zhang H, Feng L-C, Keane J, Eng J, Tang H, Prakash A, McIntosh M, and Paulovich A.
Journal of Proteome Research. 2006 July;5(7):1527-1534.
Quantitative proteomic profiling using liquid chromatography-mass spectrometry is emerging as an important tool for biomarker discovery, prompting development of algorithms for high-throughput peptide feature detection in complex samples. However, neither annotated standard data sets nor quality control metrics currently exist for assessing the validity of feature detection algorithms. We propose a quality control metric, Mass Deviance, for assessing the accuracy of feature detection tools. Because the Mass Deviance metric is derived from the natural distribution of peptide masses, it is machine- and proteome-independent and enables assessment of feature detection tools in the absence of completely annotated data sets. We validate the use of Mass Deviance with a second, independent metric that is based on isotopic distributions, demonstrating that we can use Mass Deviance to identify aberrant features with high accuracy. We then demonstrate the use of independent metrics in tandem as a robust way to evaluate the performance of peptide feature detection algorithms. This work is done on complex LC-MS profiles of Saccharomyces cerevisiae, which present a significant challenge to peptide feature detection algorithms.
Mass Spectrometry-Based Study of the Plasma Proteome in a Mouse Intestinal Tumor Model.
Hung K, Kho A, Sarracino D, Georgeon R, Krastins B, Forrester S, Haab B, Kohane I, and Kucherlapati R.
Journal of Proteome Research. 2006 June 27.
Early detection of cancer can greatly improve prognosis. Identification of proteins or peptides in the circulation, at different stages of cancer, would greatly enhance treatment decisions. Mass spectrometry (MS) is emerging as a powerful tool to identify proteins from complex mixtures such as plasma that may help identify novel sets of markers that may be associated with the presence of tumors. To examine this feature we have used a genetically modified mouse model, Apc(Min), which develops intestinal tumors with 100% penetrance. Utilizing liquid chromatography-tandem mass spectrometry (LC-MS/MS), we identified total plasma proteome (TPP) and plasma glycoproteome (PGP) profiles in tumor-bearing mice. Principal component analysis (PCA) and agglomerative hierarchical clustering analysis revealed that these protein profiles can be used to distinguish between tumor-bearing Apc(Min) and wild-type control mice. Leave-one-out cross-validation analysis established that global TPP and global PGP profiles can be used to correctly predict tumor-bearing animals in 17/19 (89%) and 19/19 (100%) of cases, respectively. Furthermore, leave-one-out cross-validation analysis confirmed that the significant differentially expressed proteins from both the TPP and the PGP were able to correctly predict tumor-bearing animals in 19/19 (100%) of cases. A subset of these proteins was independently validated by antibody microarrays using detection by two color rolling circle amplification (TC-RCA). Analysis of the significant differentially expressed proteins indicated that some might derive from the stroma or the host response. These studies suggest that mass spectrometry-based approaches to examine the plasma proteome may prove to be a valuable method for determining the presence of intestinal tumors.
Compression of LC/MS Proteomic Data.
Miguel A, Keane J, Whiteaker J, Zhang H, and Paulovich A.
Proceedings of the 19th IEEE International Symposium on Computer-Based Medical Systems. Conference held 2006 June 22-23;925-930.
The unrelenting growth of mass spectrometry (MS) based proteomic data to gigabytes per sample and terabytes per experiment motivates this investigation into compression methods suited to MS signal sources. The data for this study was derived from peptides of hand-mixed protein samples passed through a high performance liquid chromatography system (HPLC) and an electrospray ionization time-of-flight (ESI-TOF) mass spectrometer. Several lossless data compression methods were applied and yielded up to a 25:1 compression ratio relative to the original files containing base64 encoding of the data.
A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS.
Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph T, Wang P, May D, Eng J, Fang R, Lin CW, Chen J, Goodlett D, Whiteaker J, Paulovich A, and McIntosh M.
Bioinformatics. 2006 June 9.
MOTIVATION: Comparing two or more complex protein mixtures using liquid chromatography mass spectrometry (LC-MS) requires multiple analysis steps to locate and quantitate natural peptides within a single experiment and to align and normalize findings across multiple experiments. RESULTS: We describe msInspect, an open-source application comprising algorithms and visualization tools for the analysis of multiple LC-MS experimental measurements. The platform integrates novel algorithms for detecting signatures of natural peptides within a single LC-MS measurement and combines multiple experimental measurements into a peptide array, which may then be mined using analysis tools traditionally applied to genomic array analysis. The platform supports quantitation by both label-free and isotopic labeling approaches. The software implementation has been designed so that many key components may be easily replaced, making it useful as a workbench for integrating other novel algorithms developed by a growing research community. AVAILABILITY: The msInspect software is distributed freely under an Apache 2.0 license. The software as well as a Zip file with all peptide feature files and scripts needed to generate the tables and figures in this article are available at http://proteomics.fhcrc.org/.
Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics.
Fermin D, Allen B, Blackwell T, Menon R, Adamski M, Xu Y, Ulintz P, Omenn GS, and States D.
Genome Biology. 2006 May;7(4):R35.
Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database. RESULTS: Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene. CONCLUSION: This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures.
Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study.
States D, Omenn G, Blackwell T, Fermin D, Eng J, Speicher D, and Hanash S.
Nature Biotechnology. 2006 March; 24(3): 333-338.
The Human Proteome Organization (HUPO) recently completed the first large-scale collaborative study to characterize the human serum and plasma proteomes. The study was carried out in different locations and used diverse methods and instruments to compare and integrate tandem mass spectrometry (MS/MS) data on aliquots of pooled serum and plasma from healthy subjects. Liquid chromatography (LC)-MS/MS data sets from 18 laboratories were matched to the International Protein Index database, and an initial integration exercise resulted in 9,504 proteins identified with one or more peptides, and 3,020 proteins identified with two or more peptides. This article uses a rigorous statistical approach to take into account the length of coding regions in genes, and multiple hypothesis-testing techniques. On this basis, we now present a reduced set of 889 proteins identified with a confidence level of at least 95%. We also discuss the importance of such an integrated analysis in providing an accurate representation of a proteome as well as the value such data sets contain for the high-confidence identification of protein matches to novel exons, some of which may be localized in alternatively spliced forms of known plasma proteins and some in previously non-annotated gene sequences.
Computational Proteomics Analysis System (CPAS): An extensible open source analytic system for evaluating and publishing proteomic data and high throughput biological experiments.
Rauch A, Bellew M, Eng J, Fitzgibbon M, Holzman T, Hussey P, Igra M, MacLean B, Lin C, Detter A, Fang R, Faca V, Gafken P, Zhang H, Whiteaker J, States D, Hanash S, Paulovich A, and McIntosh M.
Journal of Proteome Research. 2006 Jan-Feb; 5(1): 112-21.
The open-source Computational Proteomics Analysis System (CPAS) contains an entire data analysis and management pipeline for Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS) proteomics, including experiment annotation, protein database searching and sequence management, and mining LC-MS/MS peptide and protein identifications. CPAS architecture and features, such as a general experiment annotation component, installation software, and data security management, make it useful for collaborative projects across geographical locations and for proteomics laboratories without substantial computational support.
Normalization regarding non-random missing values in high-throughput mass spectrometry data.
Wang P, Tang H, Zhang H, Whiteaker J, Paulovich A, and McIntosh M.
Proceedings of the Pacific Symposium on Biocomputing. Conference held 2006 Jan 3-7; 11: 315-326.
We propose a two-step normalization procedure for high-throughput mass spectrometry (MS) data, which is a necessary step in biomarker clustering or classification. First, a global normalization step is used to remove sources of systematic variation between MS profiles due to, for instance, varying amounts of sample degradation over time. A probability model is then used to investigate the intensity-dependent missing events and provides possible substitutions for the missing values. We illustrate the performance of the method with an LC-MS data set of synthetic protein mixtures.
2005
Two-dimensional electrophoresis database of fluorescence-labeled proteins of colon cancer cells.
Mori Y, Kondo T, Yamada T, Tsuchida A, Aoki T, Hirohashi S.
J Chromatogr B Analyt Technol Biomed Life Sci. 2005 Sep 5;823(2):82-97.
We constructed a novel database of the proteome of DLD-1 colon cancer cells by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) of fluorescence-labeled proteins followed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF) analysis. The database consists of 258 functionally categorized proteins corresponding to 314 protein spots. The majority of the proteins are oxidoreductases, cytoskeletal proteins and nucleic acid binding proteins. Phosphatase treatment showed that 28% of the protein spots on the gel are phosphorylated, and mass spectrometric analysis identified 21 of them. Proteins of DLD-1 cells and of laser-microdissected colon cancer tissues showed similar distribution on 2D gels, suggesting the utility of our database for clinical proteomics.
Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations.
Elias JE, Haas W, Faherty BK, Gygi SP.
Nat Methods. 2005 Sep;2(9):667-75.
Researchers have several options when designing proteomics experiments. Primary among these are choices of experimental method, instrumentation and spectral interpretation software. To evaluate these choices on a proteome scale, we compared triplicate measurements of the yeast proteome by liquid chromatography tandem mass spectrometry (LC-MS/MS) using linear ion trap (LTQ) and hybrid quadrupole time-of-flight (QqTOF; QSTAR) mass spectrometers. Acquired MS/MS spectra were interpreted with Mascot and SEQUEST algorithms with and without the requirement that all returned peptides be tryptic. Using a composite target decoy database strategy, we selected scoring criteria yielding 1% estimated false positive identifications at maximum sensitivity for all data sets, allowing reasonable comparisons between them. These comparisons indicate that Mascot and SEQUEST yield similar results for LTQ-acquired spectra but less so for QSTAR spectra. Furthermore, low reproducibility between replicate data acquisitions made on one or both instrument platforms can be exploited to increase sensitivity and confidence in large-scale protein identifications.
A streamlined platform for high-content functional proteomics of primary human specimens.
Jessani N, Niessen S, Wei BQ, Nicolau M, Humphrey M, Ji Y, Han W, Noh DY, Yates JR, Jeffrey SS, Cravatt BF.
Nat Methods. 2005 Sep;2(9):691-697.
Achieving information content of satisfactory breadth and depth remains a formidable challenge for proteomics. This problem is particularly relevant to the study of primary human specimens, such as tumor biopsies, which are heterogeneous and of finite quantity. Here we present a functional proteomics strategy that unites the activity-based protein profiling and multidimensional protein identification technologies (ABPP-MudPIT) for the streamlined analysis of human samples. This convergent platform involves a rapid initial phase, in which enzyme activity signatures are generated for functional classification of samples, followed by in-depth analysis of representative members from each class. Using this two-tiered approach, we identified more than 50 enzyme activities in human breast tumors, nearly a third of which represent previously uncharacterized proteins. Comparison with cDNA microarrays revealed enzymes whose activity, but not mRNA expression, depicted tumor class, underscoring the power of ABPP-MudPIT for the discovery of new markers of human disease that may evade detection by other molecular profiling methods.
POET: Using proteomics to screen pools of open reading frames for protein expression.
Gillette WK, Esposito D, Frank PH, Zhou M, Yu LR, Jozwik C, Zhang X, McGowan B, Jacobowitz DM, Pollard HB, Hao T, Hill DE, Vidal M, Conrads TP, Veenstra TD, Hartley JL.
Mol Cell Proteomics. 2005 Aug 19; [Epub ahead of print].
We have developed a pooled ORF (open reading frame) expression technology, POET, that uses recombinational cloning and proteomics methods (two dimensional gel electrophoresis and mass spectrometry) to identify ORFs that when expressed are likely to yield high levels of soluble, purified protein. Because the method works on pools of ORFs, the procedures needed to subclone, express, purify, and assay protein expression for hundreds of clones are greatly simplified. From a pool of 688 C. elegans ORFs expressed in E. coli, small scale expression and purification of 12 positive clones identified by POET yielded on average 6 times as much protein as negative clones. Larger scale expression and purification of 6 of the positive clones yielded 47 to 374 mg of purified protein per liter. POET pools of ORFs can be constructed, and the pools of the resulting proteins can be analyzed and manipulated, to rapidly acquire information about the attributes of hundreds of proteins simultaneously.
Robust Accurate Identification of Peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics.
Alves G, Yu YK.
Bioinformatics. 2005 Aug 16; [Epub ahead of print].
MOTIVATION: The key to mass-spectrometry-based proteomics is peptide sequencing. The major challenge in peptide sequencing, whether library search or de novo, is to better infer statistical significance and better attain noise reduction. Because the noise in a spectrum depends on experimental conditions, the instrument used, and many other factors, it cannot be predicted even if the peptide sequence is known. The characteristics of the noise can only be uncovered once a spectrum is given. We wish to overcome such issues. RESULTS: We design RAId to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing followed by a search in a peptide library that we created. Through de novo sequencing, we establish the spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become potential candidates for new peptides that are not yet in the database. The use of spectrum-specific background statistics seems to enable RAId to perform well even when the spectral quality is marginal. Other important features of RAId include its potential in de novo sequencing alone and the ease of incorporating post-translational modifications. AVAILABILITY: Programs implementing the methods described are available from the authors upon request.
Two-dimensional gel isoelectric focusing.
Stastna M, Slais K.
Electrophoresis. 2005 Aug 15; [Epub ahead of print].
Two-dimensional gel isoelectric focusing (2-D gel IEF) is presented as the combination of the same separation method used consecutively in two directions of the same gel. In this new method, after completion of IEF process in the first dimension the gel was cut into the separate strips, each containing selected analytes together with the appropriate part of the original broad pH gradient, and the strips were rotated by 90 degrees (with regard to the first IEF) and left to diffuse overnight. After diffusion the strips were subjected to the second IEF. During the second IEF, the corresponding narrow part of pH gradient in each strip was restored again, however, now along the strip. The progress of the separation process can be monitored visually by using colored low-molecular-weight isoelectric point (pI) markers loaded into the gel simultaneously with proteins. The unique properties of IEF, focusing and resolution power were enhanced by using the same technique twice. Two forms of beta-lactoglobulin (pI values 5.14 and 5.31, respectively) nonseparated in the first IEF were successfully separated in the second dimension at relatively low voltage (330 V) with the resolution power comparable to the high-resolution gels requiring the high voltage during the run and long separation time. Glucose oxidase loaded as diluted solution into ten positions across the gel was finally focused into a single band during 2-D gel IEF. Since the first and second IEF are carried out on the same gel, no losses and contamination of analyte occur. The suggested method can be used for separation/fractionation of complex biological mixtures, similarly as other multidimensional separation techniques applied in proteomics, and can be followed by further processing, e.g., mass spectrometry analysis. The focusing properties of IEF could be useful especially in separation of mixtures, where components are at low concentration levels.
Sample handling for mass spectrometric proteomic investigations of human sera.
West-Nielsen M, Hogdall EV, Marchiori E, Hogdall CK, Schou C, Heegaard NH.
Anal Chem. 2005 Aug 15;77(16):5114-23.
Proteomic investigations of sera are potentially of value for diagnosis, prognosis, choice of therapy, and disease activity assessment by virtue of discovering new biomarkers and biomarker patterns. Much debate focuses on the biological relevance and the need for identification of such biomarkers while less effort has been invested in devising standard procedures for sample preparation and storage in relation to model building based on complex sets of mass spectrometric (MS) data. Thus, development of standardized methods for collection and storage of patient samples together with standards for transportation and handling of samples are needed. This requires knowledge about how sample processing affects MS-based proteome analyses and thereby how nonbiological biased classification errors are avoided. In this study, we characterize the effects of sample handling, including clotting conditions, storage temperature, storage time, and freeze/thaw cycles, on MS-based proteomics of human serum by using principal components analysis, support vector machine learning, and clustering methods based on genetic algorithms as class modeling and prediction methods. Using spiking to artificially create differentiable sample groups, this integrated approach yields data that--even when working with sample groups that differ more than may be expected in biological studies--clearly demonstrate the need for comparable sampling conditions for samples used for modeling and for the samples that are going into the test set group. Also, the study emphasizes the difference between class prediction and class comparison studies as well as the advantages and disadvantages of different modeling methods.
Investigating diversity in human plasma proteins.
Nedelkov D, Kiernan UA, Niederkofler EE, Tubbs KA, Nelson RW.
Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):10852-7.
Plasma proteins represent an important part of the human proteome. Although recent proteomics research efforts focus largely on determining the overall number of proteins circulating in plasma, it is equally important to delineate protein variations among individuals, because they can signal the onset of diseases and be used as biological markers in diagnostics. To date, there has been no systematic proteomics effort to characterize the breadth of structural modifications in individual proteins in the general population. In this work, we have undertaken a population proteomics study to define gene- and protein-level diversity that is encountered in the general population. Twenty-five plasma proteins from a cohort of 96 healthy individuals were investigated through affinity-based mass spectrometric assays. A total of 76 structural forms/variants were observed for the 25 proteins within the samples cohort. Posttranslational modifications were detected in 18 proteins, and point mutations were observed in 4 proteins. The frequency of occurrence of these variations was wide-ranged, with some modifications being observed in only one sample, and others detected in all 96 samples. Even though a relatively small cohort of individuals was investigated, the results from this study illustrate the extent of protein diversity in the human population and can be of immediate aid in clinical proteomics/biomarker studies by laying a basal-level statistical foundation from which protein diversity relating to disease can be evaluated.
Proteomic characterization of the angiogenesis inhibitor SU6668 reveals multiple impacts on cellular kinase signaling.
Godl K, Gruss OJ, Eickhoff J, Wissing J, Blencke S, Weber M, Degen H, Brehmer D, Orfi L, Horvath Z, Keri G, Muller S, Cotten M, Ullrich A, Daub H.
Cancer Res. 2005 Aug 1;65(15):6919-26.
Knowledge about molecular drug action is critical for the development of protein kinase inhibitors for cancer therapy. Here, we establish a chemical proteomic approach to profile the anticancer drug SU6668, which was originally designed as a selective inhibitor of receptor tyrosine kinases involved in tumor vascularization. By employing immobilized SU6668 for the affinity capture of cellular drug targets in combination with mass spectrometry, we identified previously unknown targets of SU6668 including Aurora kinases and TANK-binding kinase 1. Importantly, a cell cycle block induced by SU6668 could be attributed to inhibition of Aurora kinase activity. Moreover, SU6668 potently suppressed antiviral and inflammatory responses by interfering with TANK-binding kinase 1-mediated signal transmission. These results show the potential of chemical proteomics to provide rationales for the development of potent kinase inhibitors, which combine rather unexpected biological modes of action by simultaneously targeting defined sets of both serine/threonine and tyrosine kinases involved in cancer progression.
Multiplexed absolute quantification in proteomics using artificial QCAT proteins of concatenated signature peptides.
Beynon RJ, Doherty MK, Pratt JM, Gaskell SJ.
Nat Methods. 2005 Aug;2(8):587-9.
Absolute quantification in proteomics usually involves simultaneous determination of representative proteolytic peptides and stable isotope-labeled analogs. The principal limitation to widespread implementation of this approach is the availability of standard signature peptides in accurately known amounts. We report the successful design and construction of an artificial gene encoding a concatenation of tryptic peptides (QCAT protein) from several chick (Gallus gallus) skeletal muscle proteins and features for quantification and purification.
Analysis of candidate genes through a proteomics-based approach in primary cell lines from malignant melanomas and their metastases.
Carta F, Demuro PP, Zanini C, Santona A, Castiglia D, D'atri S, Ascierto PA, Napolitano M, Cossu A, Tadolini B, Turrini F, Manca A, Sini MC, Palmieri G, Rozzo AC; on behalf of the Italian Melanoma Intergroup (IMI).
Melanoma Res. 2005 Aug;15(4):235-244.
Proteomics provides a powerful approach for screening alterations in protein expression and post-translational modification associated with particular human diseases. In this study, the analysis of protein expression was focused on malignant melanoma in order to determine the candidate genes involved in tumour progression. The proteomes of cultured melanocytes and of cell lines from primary and metastatic lesions of one malignant melanoma patient were profiled using two-dimensional electrophoresis (2-DE) and mass spectrometry. Differentially expressed proteins were confirmed by 2-DE and mass spectrometry on an additional four malignant melanoma cell lines. Total RNA from the first subset of cell lines was used for quantitative reverse transcriptase-polymerase chain reaction (RT-PCR) of the candidate genes identified after proteomics analysis. A very high similarity was observed in the 2-DE maps of two malignant melanoma cell lines derived from primary and secondary lesions of the same patient. Mass spectrometry identified 37 proteins which were found to be more abundant in tumour cells in comparison with control melanocytes (as confirmed on additional cell lines), with a relatively high prevalence of stress proteins. Eight candidate genes (PRDX2, HSP27, HSP60, HSPA8, HSP9B, STIP1, PDI and P4HB) were further characterized by evaluating their messenger RNA expression levels through real-time RT-PCR analysis. Overexpression of HSP27, HSP60 and HSPA8 and downregulation of PRDX2 were observed in cells from metastatic malignant melanoma in comparison with those from primary melanoma. Although further investigations with larger numbers of paired normal and tumour samples are needed, our findings strongly suggest that the dysregulation of stress pathways may be involved in melanoma progression.
PRIDE: The proteomics identifications database.
Martens L, Hermjakob H, Jones P, Adamski M, Taylor C, States D, Gevaert K, Vandekerckhove J, Apweiler R.
Proteomics. 2005 Aug;5(13):3537-45.
The advent of high-throughput proteomics has enabled the identification of ever increasing numbers of proteins. Correspondingly, the number of publications centered on these protein identifications has increased dramatically. With the first results of the HUPO Plasma Proteome Project being analyzed and many other large-scale proteomics projects about to disseminate their data, this trend is not likely to flatten out any time soon. However, the publication mechanism of these identified proteins has lagged behind in technical terms. Often very long lists of identifications are either published directly with the article, resulting in both a voluminous and rather tedious read, or are included on the publisher's website as supplementary information. In either case, these lists are typically only provided as portable document format documents with a custom-made layout, making it practically impossible for computer programs to interpret them, let alone efficiently query them. Here we propose the proteomics identifications (PRIDE) database (http://www.ebi.ac.uk/pride) as a means to finally turn publicly available data into publicly accessible data. PRIDE offers a web-based query interface, a user-friendly data upload facility, and a documented application programming interface for direct computational access. The complete PRIDE database, source code, data, and support tools are freely available for web access or download and local installation.
Plasma Proteome Database as a resource for proteomics research.
Muthusamy B, Hanumanthu G, Suresh S, Rekha B, Srinivas D, Karthick L, Vrushabendra BM, Sharma S, Mishra G, Chatterjee P, Mangala KS, Shivashankar HN, Chandrika KN, Deshpande N, Suresh M, Kannabiran N, Niranjan V, Nalli A, Prasad TS, Arun KS, Reddy R, Chandran S, Jadhav T, Julie D, Mahesh M, John SL, Palvankar K, Sudhir D, Bala P, Rashmi NS, Vishnupriya G, Dhar K, Reshma S, Chaerkady R, Gandhi TK, Harsha HC, Mohan SS, Deshpande KS, Sarker M, Pandey A.
Proteomics. 2005 Aug;5(13):3531-6.
Plasma is one of the best studied compartments in the human body and serves as an ideal body fluid for the diagnosis of diseases. This report provides a detailed functional annotation of all the plasma proteins identified to date. In all, gene products encoded by 3778 distinct genes were annotated based on proteins previously published in the literature as plasma proteins and the identification of multiple peptides from proteins under HUPO's Plasma Proteome Project. Our analysis revealed that 51% of these genes encoded more than one protein isoform. All single nucleotide polymorphisms involving protein-coding regions were mapped onto the protein sequences. We found a number of examples of isoform-specific subcellular localization as well as tissue expression. This database is an attempt at comprehensive annotation of a complex subproteome and is available on the web at http://www.plasmaproteomedatabase.org.
Overview of the HUPO Plasma Proteome Project: Results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database.
Omenn GS, States DJ, Adamski M, Blackwell TW, Menon R, Hermjakob H, Apweiler R, Haab BB, Simpson RJ, Eddes JS, Kapp EA, Moritz RL, Chan DW, Rai AJ, Admon A, Aebersold R, Eng J, Hancock WS, Hefta SA, Meyer H, Paik YK, Yoo JS, Ping P, Pounds J, Adkins J, Qian X, Wang R, Wasinger V, Wu CY, Zhao X, Zeng R, Archakov A, Tsugita A, Beer I, Pandey A, Pisano M, Andrews P, Tammen H, Speicher DW, Hanash SM.
Proteomics. 2005 Aug;5(13):3226-45.
HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA, heparin, and citrate-anti-coagulated plasma; and (3) created a publicly-available knowledge base (www.bioinformatics.med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS-MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS-MS datasets had 15 710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay-based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein to DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches. This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis plus many reports of supplementary work on various aspects of the PPP workplan.
These PPP results on complexity, dynamic range, incomplete sampling, false-positive matches, and integration of diverse datasets for plasma and serum proteins lay a foundation for development and validation of circulating protein biomarkers in health and disease.
Isoelectric focusing in serial immobilized pH gradient gels to improve protein separation in proteomic analysis.
Poznanovic S, Schwall G, Zengerling H, Cahill MA.
Electrophoresis. 2005 Aug;26(16):3185-90.
We previously demonstrated the separation of proteins by isoelectric focusing (IEF) over pH 4-8 immobilized pH gradients (IPGs) over 54 cm (Poland et al., Electrophoresis 2003, 24, 1271). Here we show that similar results can be conveniently achieved using commercially available IPGs of appropriate pH ranges positioned end-on-end in series during electrophoresis, which we term 'daisy chain IEF'. Proteins efficiently electrophorese from one IPG to another during IEF by traversing buffer-filled porous bridges between the serial IPGs. A variety of materials can function as bridges, including paper, polyacrylamide gels or even IPGs. The quality of two-dimensional (2-D) protein patterns is not apparently worse than that generated by conventional IEF using the same individual IPGs. A major advantage of this method is that sample is consumed efficiently, without the requirement for preliminary steps, such as chamber IEF. This advantage is pronounced when working with extremely limited sources of samples, such as with clinical biopsies or cellular subfractions. The present study was limited by the commercial availability of suitable pH gradients. Proteomics analyses could be further improved if commercial vendors would manufacture IPGs with suitable pH ranges to achieve high resolution (approximately 100 cm) IEF separation of proteins in one electrophoretic step over the pH range 2-12.
Web-based data warehouse on gene expression in human colorectal cancer.
Sagynaliev E, Steinert R, Nestler G, Lippert H, Knoch M, Reymond MA.
Proteomics. 2005 Aug;5(12):3066-78.
Based on biomedical literature databases, we took a first step toward constructing a gene expression "data warehouse" specific to human colorectal cancer (CRC). Results of genome-wide transcriptomic research were available from 12 studies, using various technologies, namely, SAGE, cDNA and oligonucleotide arrays, and adaptor-tagged amplification. Three studies analyzed CRC cell lines and nine analyzed human samples. The total number of patients was 144. Out of 982 up- or down-regulated genes, 863 (88%) were found to be differentially expressed in a single study, 88 in two studies, 22 in three studies, 7 in four studies, and only 2 genes in six studies. Eight large-scale proteomics studies were published in CRC, using 2-D-, SDS- or free-flow electrophoresis, involving only 11 patients. Out of 408 differentially expressed proteins, 339 (83%) were found to be differentially expressed only in a single study, 16 in three studies, 10 in four studies, 3 in five, and 1 in eight studies. Confirmation at proteome level of results obtained with large-scale transcriptomics studies was possible in 25%. This proportion was higher (67%) for reproducing proteome results using transcriptomics technologies. Obviously, reproducibility and overlapping between published gene expression results at proteome and transcriptome level are low in human CRC. Thus, the development of standardized processes for collecting samples, storing, retrieving, and querying gene expression data obtained with different technologies is of central importance in translational research.
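The reproducibility figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only a reader's aid, not part of the cited work: the variable names are invented for illustration, and the protein tally is left as a single ratio because the abstract does not give a complete per-study breakdown for proteins.

```python
# Gene counts quoted in the abstract, keyed by the number of studies
# in which a gene was reported as differentially expressed.
gene_counts_by_study_number = {1: 863, 2: 88, 3: 22, 4: 7, 6: 2}

total_genes = sum(gene_counts_by_study_number.values())
single_study_gene_share = gene_counts_by_study_number[1] / total_genes

# Protein figures quoted in the abstract (total and single-study count only).
total_proteins = 408
single_study_proteins = 339
single_study_protein_share = single_study_proteins / total_proteins


def percent(fraction):
    """Convert a fraction to a whole-number percentage."""
    return round(100 * fraction)


print(total_genes, percent(single_study_gene_share), percent(single_study_protein_share))
```

The gene counts sum to the stated total of 982, and both single-study shares round to the percentages given in the abstract.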
MAO: a Multiple Alignment Ontology for nucleic acid and protein sequences.
Thompson JD, Holbrook SR, Katoh K, Koehl P, Moras D, Westhof E, Poch O.
Nucleic Acids Res. 2005 Jul 25;33(13):4164-71.
The application of high-throughput techniques such as genomics, proteomics or transcriptomics means that vast amounts of heterogeneous data are now available in the public databases. Bioinformatics is responding to the challenge with new integrated management systems for data collection, validation and analysis. Multiple alignments of genomic and protein sequences provide an ideal environment for the integration of this mass of information. In the context of the sequence family, structural and functional data can be evaluated and propagated from known to unknown sequences. However, effective integration is being hindered by syntactic and semantic differences between the different data resources and the alignment techniques employed. One solution to this problem is the development of an ontology that systematically defines the terms used in a specific domain. Ontologies are used to share data from different resources, to automatically analyse information and to represent domain knowledge for non-experts. Here, we present MAO, a new ontology for multiple alignments of nucleic and protein sequences. MAO is designed to improve interoperation and data sharing between different alignment protocols for the construction of a high quality, reliable multiple alignment in order to facilitate knowledge extraction and the presentation of the most pertinent information to the biologist.
Processing methods for differential analysis of LC/MS profile data.
Katajamaa M, Oresic M.
BMC Bioinformatics. 2005 Jul 18;6:179.
BACKGROUND: Liquid chromatography coupled to mass spectrometry (LC/MS) has been widely used in proteomics and metabolomics research. In this context, the technology has been increasingly used for differential profiling, i.e. broad screening of biomolecular components across multiple samples in order to elucidate the observed phenotypes and discover biomarkers. One of the major challenges in this domain remains development of better solutions for processing of LC/MS data. RESULTS: We present a software package MZmine that enables differential LC/MS analysis of metabolomics data. This software is a toolbox containing methods for all data processing stages preceding differential analysis: spectral filtering, peak detection, alignment and normalization. Specifically, we developed and implemented a new recursive peak search algorithm and a secondary peak picking method for improving already aligned results, as well as a normalization tool that uses multiple internal standards. Visualization tools enable comparative viewing of data across multiple samples. Peak lists can be exported into other data analysis programs. The toolbox has already been utilized in a wide range of applications. We demonstrate its utility on an example of metabolic profiling of Catharanthus roseus cell cultures. CONCLUSION: The software is freely available under the GNU General Public License and it can be obtained from the project web page at: http://mzmine.sourceforge.net/.
BRIGEP--the BRIDGE-based genome-transcriptome-proteome browser.
Goesmann A, Linke B, Bartels D, Dondrup M, Krause L, Neuweger H, Oehm S, Paczian T, Wilke A, Meyer F.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W710-6.
The growing amount of information resulting from the increasing number of publicly available genomes and experimental results thereof necessitates the development of comprehensive systems for data processing and analysis. In this paper, we describe the current state and latest developments of our BRIGEP bioinformatics software system consisting of three web-based applications: GenDB, EMMA and ProDB. These applications facilitate the processing and analysis of bacterial genome, transcriptome and proteome data and are actively used by numerous international groups. We are currently in the process of extensively interconnecting these applications. BRIGEP was developed in the Bioinformatics Resource Facility of the Center for Biotechnology at Bielefeld University and is freely available. A demo project with sample data and access to all three tools is available at https://www.cebitec.uni-bielefeld.de/groups/brf/software/brigep/. Code bundles for these and other tools developed in our group are accessible on our FTP server at ftp.cebitec.uni-bielefeld.de/pub/software/.
DeNovoID: a web-based tool for identifying peptides from sequence and mass tags deduced from de novo peptide sequencing by mass spectroscopy.
Halligan BD, Ruotti V, Twigger SN, Greene AS.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W376-81.
One of the core activities of high-throughput proteomics is the identification of peptides from mass spectra. Some peptides can be identified using spectral matching programs like Sequest or Mascot, but many spectra do not produce high quality database matches. De novo peptide sequencing is an approach to determine partial peptide sequences for some of the unidentified spectra. A drawback of de novo peptide sequencing is that it produces a series of ordered and disordered sequence tags and mass tags rather than a complete, non-degenerate peptide amino acid sequence. This incomplete data is difficult to use in conventional search programs such as BLAST or FASTA. DeNovoID is a program that has been specifically designed to use degenerate amino acid sequence and mass data derived from MS experiments to search a peptide database. Since the algorithm employed depends on the amino acid composition of the peptide and not its sequence, DeNovoID does not have to consider all possible sequences, but rather a smaller number of compositions consistent with a spectrum. DeNovoID also uses a geometric indexing scheme that reduces the number of calculations required to determine the best peptide match in the database. DeNovoID is available at http://proteomics.mcw.edu/denovoid.
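The idea of matching on amino acid composition rather than sequence can be illustrated with a small sketch. This is not DeNovoID's algorithm: the geometric indexing scheme and the use of mass tags are not described in enough detail here to reproduce, so the example assumes a plain dictionary index, exact composition matches, and hypothetical function names and peptide strings.

```python
from collections import Counter, defaultdict


def composition_key(residues):
    """Order-independent key: sorted (residue, count) pairs."""
    return tuple(sorted(Counter(residues.upper()).items()))


def build_index(peptides):
    """Group database peptides by their amino acid composition."""
    index = defaultdict(list)
    for peptide in peptides:
        index[composition_key(peptide)].append(peptide)
    return index


def lookup(index, residues):
    """Return all peptides whose composition matches the query residues."""
    return index.get(composition_key(residues), [])


# Hypothetical database; the query residues are deliberately out of order.
index = build_index(["PEPTIDE", "DIETPEP", "PROTEIN"])
print(lookup(index, "EDITPEP"))
```

Because the key ignores residue order, a disordered tag retrieves every database peptide with the same composition in a single lookup, instead of requiring all permutations of the tag to be enumerated.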
Metabolic labeling of proteins for proteomics.
Beynon RJ, Pratt JM.
Mol Cell Proteomics. 2005 Jul;4(7):857-72.
Realization of the advantages of stable isotope labeling for proteomics has emerged gradually. However, many stable isotope label approaches rely on labeling in vitro using complex and sometimes expensive reagents. This review discusses strategies for labeling protein in vivo through metabolic incorporation of label into protein. This approach has many advantages, is particularly suited to single cells grown in culture (prokaryotic or eukaryotic), but is nonetheless subject to a number of complicating factors that must be controlled so that meaningful experiments can be conducted. Confounding issues include the metabolic lability of the amino acid precursor, incomplete labeling, and the role of protein turnover in labeling kinetics. All of these are controllable, provided that appropriate precautions are adopted.
Proteomic analysis of redox- and ErbB2-dependent changes in mammary luminal epithelial cells using cysteine- and lysine-labelling two-dimensional difference gel electrophoresis.
Chan HL, Gharbi S, Gaffney PR, Cramer R, Waterfield MD, Timms JF.
Proteomics. 2005 Jul;5(11):2908-26.
Differential protein expression analysis based on modification of selected amino acids with labelling reagents has become the major method of choice for quantitative proteomics. One such methodology, two-dimensional difference gel electrophoresis (2-D DIGE), uses a matched set of fluorescent N-hydroxysuccinimidyl (NHS) ester cyanine dyes to label lysine residues in different samples which can be run simultaneously on the same gels. Here we report the use of iodoacetylated cyanine (ICy) dyes for labelling of cysteine thiols, for 2-D DIGE-based redox proteomics. Characterisation of ICy dye labelling in relation to its stoichiometry, sensitivity and specificity is described, as well as comparison of ICy dye with NHS-Cy dye labelling and several protein staining methods. We have optimised conditions for labelling of nonreduced, denatured samples and report increased sensitivity for a subset of thiol-containing proteins, allowing accurate monitoring of redox-dependent thiol modifications and expression changes. Cysteine labelling was then combined with lysine labelling in a multiplex 2-D DIGE proteomic study of redox-dependent and ErbB2-dependent changes in epithelial cells exposed to oxidative stress. This study identifies differentially modified proteins involved in cellular redox regulation, protein folding, proliferative suppression, glycolysis and cytoskeletal organisation, revealing the complexity of the response to oxidative stress and the impact that overexpression of ErbB2 has on this response.
Comparison of label free methods for quantifying human proteins by shotgun proteomics.
Old WM, Meyer-Arendt K, Aveline-Wolf L, Pierce KG, Mendoza A, Sevinsky JR, Resing KA, Ahn NG.
Mol Cell Proteomics. 2005 Jun 23; [Epub ahead of print].
Measurements of mass spectral peak intensities and spectral counts are promising methods for quantifying protein abundance changes in shotgun proteomics analyses. We describe SERAC, software developed to evaluate the ability of each method to quantify relative changes in protein abundance. Dynamic range and linearity using a three-dimensional ion trap were tested using standard proteins spiked into a complex sample. Linearity and good agreement between observed vs. expected protein ratios were obtained after normalization and background subtraction of peak area intensity measurements and correction of spectral counts to eliminate discontinuity in ratio estimates. Peak intensity values useful for protein quantitation ranged from 10^7 to 10^11 counts with no obvious saturation effect, and proteins in replicate samples showed variations of less than 2-fold within the 95% range (±2σ) when ≥3 peptides/protein were shared between samples. Protein ratios were determined with high confidence from spectral counts when maximum spectral counts were ≥4 spectra/protein, and replicates showed equivalent measurements well within 95% confidence limits. In further tests, complex samples were separated by gel exclusion chromatography, quantifying changes in protein abundance between different fractions. Linear behavior of peak area intensity measurements was obtained for peptides from proteins in different fractions. Protein ratios determined by spectral counting agreed well with those determined from peak area intensity measurements, and both agreed with independent measurements based on gel staining intensities. Overall, spectral counting proved to be a more sensitive method for detecting proteins that undergo changes in abundance, whereas peak area intensity measurements yielded more accurate estimates of protein ratios.
Finally, these methods were used to analyze differential changes in protein expression in human erythroleukemia K562 cells, stimulated under conditions that promote cell differentiation by MAP kinase pathway activation. Protein changes identified with p<0.1 showed good correlations with parallel measurements of changes in mRNA expression.
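The abstract mentions correcting spectral counts to remove discontinuities in ratio estimates but does not give the formula. As a rough illustration only, the sketch below assumes a simple additive pseudocount and normalization by the total spectra per run; the function name, parameters, and the default pseudocount of 0.5 are assumptions for this example and are not taken from SERAC.

```python
import math


def log2_spectral_count_ratio(count_a, count_b, total_a, total_b, pseudocount=0.5):
    """log2 ratio (sample B over sample A) of normalized spectral counts.

    The pseudocount keeps the ratio finite when a protein has zero
    spectra in one sample (an assumed correction, not the paper's formula).
    """
    if count_a < 0 or count_b < 0:
        raise ValueError("spectral counts must be non-negative")
    if total_a <= 0 or total_b <= 0 or pseudocount <= 0:
        raise ValueError("totals and pseudocount must be positive")
    normalized_a = (count_a + pseudocount) / total_a
    normalized_b = (count_b + pseudocount) / total_b
    return math.log2(normalized_b / normalized_a)


# Hypothetical protein seen in 0 spectra in run A and 8 spectra in run B.
print(log2_spectral_count_ratio(0, 8, total_a=1000, total_b=1000))
```

Without the pseudocount, the zero count in run A would make the ratio undefined, which is the kind of discontinuity a correction of this sort is meant to avoid.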
Cognate peptide-receptor ligand mapping by directed phage display.
Stratmann T, Kang AS.
Proteome Sci. 2005 Jun 17;3:7.
BACKGROUND: A rapid phage display method for identifying the cognate peptide ligands of receptors is described. The approach may be readily integrated into the interface of genomic and proteomic studies to identify biologically relevant ligands. METHODS: A gene fragment library from the influenza coat protein haemagglutinin (HA) gene was constructed by treating HA cDNA with DNase I to create 50-100 bp fragments. These fragments were cloned into plasmid pORFES IV and in-frame inserts were selected. These in-frame fragment inserts were subsequently cloned into a filamentous phage display vector JC-M13-88 for surface display as fusions to a synthetic copy of gene VIII. Two well characterized antibodies, mAb 12CA5 and pAb 07431, directed against distinct known regions of HA were used to pan the library. RESULTS: Two linear epitopes, HA peptides 112-126 and 162-173, recognized by mAb 12CA5 and pAb 07431, respectively, were identified as the cognate epitopes. CONCLUSION: This approach is a useful alternative to conventional methods such as screening of overlapping synthetic peptide libraries or gene fragment expression libraries when searching for precise peptide-protein interactions, and may be applied to functional proteomics.
An aptamer-based protein biochip.
Stadtherr K, Wolf H, Lindner P.
Anal Chem. 2005 Jun 1;77(11):3437-43.
The establishment of an aptamer-based biochip for protein detection is described. Using a model system comprising human IgE as the analyte and single-stranded DNA aptamers specific for IgE or anti-IgE antibodies as immobilized ligands on chips, we could demonstrate that aptamers were equivalent or superior to antibodies in terms of specificity and sensitivity, respectively. Aptamer-based analyte detection on glass slides could clearly be demonstrated at minimum concentrations of 10 ng/mL IgE. In addition, we successfully showed specific analyte recognition in complex protein samples by the aptamer-based biochip system. Using DNA aptamers specific for human thrombin as an additional model receptor/ligand system, dual protein detection on a single slide could be proven. In conclusion, we could show the suitability of nucleic acid aptamers as low molecular weight receptors on biochips for sensitive and specific protein detection, representing an innovative tool for future proteomics.
Predicting functional gene links from phylogenetic-statistical analyses of whole genomes.
Barker D, Pagel M.
PLoS Comput Biol. 2005 Jun;1(1):e3.
An important element of the developing field of proteomics is to understand protein-protein interactions and other functional links amongst genes. Across-species correlation methods for detecting functional links work on the premise that functionally linked proteins will tend to show a common pattern of presence and absence across a range of genomes. We describe a maximum likelihood statistical model for predicting functional gene linkages. The method detects independent instances of the correlated gain or loss of pairs of proteins on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. We show, in a dataset of 10,551 protein pairs, that the phylogenetic method improves by up to 35% on across-species analyses at identifying known functionally linked proteins. The method shows that protein pairs with at least two to three correlated events of gain or loss are almost certainly functionally linked. Contingent evolution, in which one gene's presence or absence depends upon the presence of another, can also be detected phylogenetically, and may identify genes whose functional significance depends upon their interaction with other genes. Incorporating phylogenetic information improves the prediction of functional linkages. The improvement derives from having a lower rate of false positives and from detecting trends that across-species analyses miss. Phylogenetic methods can easily be incorporated into the screening of large-scale bioinformatics datasets to identify sets of protein links and to characterise gene networks.
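To make the contrast concrete, the conventional across-species baseline that this abstract argues against can be sketched as a correlation between presence/absence profiles. The code below is an illustration of that baseline only, not the authors' maximum likelihood model; the function name and the example profiles are hypothetical.

```python
import math


def profile_correlation(profile_a, profile_b):
    """Pearson correlation of two presence/absence (1/0) profiles.

    Each position is one genome. Returns 0.0 when either profile is
    constant, since the correlation is undefined in that case.
    """
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    variance_a = sum((a - mean_a) ** 2 for a in profile_a)
    variance_b = sum((b - mean_b) ** 2 for b in profile_b)
    if variance_a == 0 or variance_b == 0:
        return 0.0
    return covariance / math.sqrt(variance_a * variance_b)


# Hypothetical profiles of two proteins across six genomes.
protein_x = [1, 1, 0, 0, 1, 0]
protein_y = [1, 1, 0, 0, 1, 0]
print(profile_correlation(protein_x, protein_y))
```

This baseline treats every genome as an independent observation, so a single gain or loss inherited by several closely related species is counted several times. That inflation is the source of false positives that the phylogenetic method addresses by counting independent gain and loss events on the tree instead.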
Protein sequence tags: a novel solution for comparative proteomics.
Kuhn K, Prinz T, Schafer J, Baumann C, Scharfke M, Kienle S, Schwarz J, Steiner S, Hamon C.
Proteomics. 2005 Jun;5(9):2364-8.
Comparative proteome profiling using stable isotope peptide labelling and mass spectrometry has emerged as a promising strategy. Here, we show the broad potential of our proprietary protein sequence tag (PST) technology. A special feature of PST is its ability to detect a wide variety of proteins including the pharmaceutically relevant membrane and nuclear proteins. This procedure addresses a similar number of proteins, compared to the multidimensional protein identification technology approach, but offers additionally a quantitative analysis with its recently developed quantitative PST version.
Algorithms for protein interaction networks.
Lappe M, Holm L.
Biochem Soc Trans. 2005 Jun;33(Pt 3):530-4.
The functional characterization of all genes and their gene products is the main challenge of the postgenomic era. Recent experimental and computational techniques have enabled the study of interactions among all proteins on a large scale. In this paper, approaches will be presented to exploit interaction information for the inference of protein structure, function, signalling pathways and ultimately entire interactomes. Interaction networks can be modelled as graphs, showing the operation of gene function in terms of protein interactions. Since the architecture of biological networks differs distinctly from random networks, these functional maps contain a signal that can be used for predictive purposes. Protein function and structure can be predicted by matching interaction patterns, without the requirement of sequence similarity. Moving on to a higher level definition of protein function, the question arises how to decompose complex networks into meaningful subsets. An algorithm will be demonstrated, which extracts whole signal-transduction pathways from noisy graphs derived from text-mining the biological literature. Finally, an algorithmic strategy is formulated that enables the proteomics community to build a reliable scaffold of the interactome in a fraction of the time compared with uncoordinated efforts.
Reverse-phase protein microarrays for tissue-based analysis.
Speer R, Wulfkuhle JD, Liotta LA.
Curr Opin Mol Ther. 2005 Jun;7(3):240-5.
The deciphering of the human genome has elucidated our biological structural design and has generated insights into disease development and pathogenesis. At the same time, knowledge of genetic changes during disease processes has demonstrated the need to move beyond genomics towards proteomics and a systems biology approach to science. Analyzing the proteome comprises more than just an enumeration of proteins. In fact, it characterizes proteins within cells in the context of their functional status and interactions in their physiological micro- and macroenvironments. As dysregulated signaling often underpins most human diseases, an overarching goal of proteomics is to profile the working state of signaling pathways, to develop 'circuit maps' of normal and diseased protein networks and identify hyperactive, defective or inoperable transduction pathways. Reverse-phase protein microarrays represent a new technology that can generate a multiplex readout of dozens of phosphorylated events simultaneously to profile the state of a signaling pathway target even after the cell is lysed and the contents denatured.
A novel approach to protein expression profiling using antibody microarrays combined with surface plasmon resonance technology.
Usui-Aoki K, Shimada K, Nagano M, Kawai M, Koga H.
Proteomics. 2005 Jun;5(9):2396-401.
We have previously described our systems for the high-throughput production of antibodies against mouse KIAA proteins and their validation (Proteomics 2004, 4, 1412-1416). Using our "libraries" of antibodies, we established a novel antibody microarray system in which surface plasmon resonance (SPR) technology is utilized for signal detection. Up to 400 real-time antibody-target bindings could be measured simultaneously within a single hour. This rapid detection was achieved by direct readout of the bindings using SPR technology. To evaluate our system, we assessed the reproducibility on crude protein samples and obtained satisfactorily reproducible results, exhibiting correlation values >0.92. Using this SPR-based antibody microarray system, we examined mKIAA protein expression in five different adult mouse tissues and identified the specific tissue expression patterns of several mKIAA proteins.
Target selection of soluble protein complexes for structural proteomics studies.
Shen W, Yun S, Tam B, Dalal K, Pio FF.
Proteome Sci. 2005 May 18;3(1):3.
BACKGROUND: Protein expression in E. coli is the most commonly used system to produce protein for structural studies, because it is fast and inexpensive and can produce large quantities of protein. However, when proteins from other species such as mammals are produced in this system, problems of protein expression and solubility arise. Structural genomics projects are currently investigating proteomics pipelines that would produce sufficient quantities of recombinant proteins for structural studies of protein complexes. To investigate how the E. coli protein expression system could be used for this purpose, we purified apoptotic binary protein complexes formed between members of the Caspase Associated Recruitment Domain (CARD) family. RESULTS: A combinatorial approach to the generation of protein complexes was performed between members of the CARD domain protein family that have the ability to form hetero-dimers between each other. In our method, each gene coding for a specific protein partner is cloned in pET-28b (Novagen) and PGEX2T (Amersham) expression vectors. All combinations of protein complexes are then obtained by reconstituting complexes from purified components in native conditions, after denaturation-renaturation or co-expression. Our study applied to 14 soluble CARD domain proteins revealed that co-expression studies perform better than native and denaturation-renaturation methods. In this study, we confirm existing interactions obtained in vivo in mammalian cells and also predict new interactions. CONCLUSION: This simple screening method could easily be scaled up to identify soluble protein complexes for structural genomic projects. This study reports informative statistics on the solubility of human protein complexes expressed in E. coli belonging to the human CARD protein family.
HDBStat!: a platform-independent software suite for statistical analysis of high dimensional biology data.
Trivedi P, Edwards JW, Wang J, Gadbury GL, Srinivasasainagendra V, Zakharkin SO, Kim K, Mehta T, Brand JP, Patki A, Page GP, Allison DB.
BMC Bioinformatics. 2005 Apr 6;6(1):86.
BACKGROUND: Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for analysis of high dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. RESULTS: Results generated from data preprocessing methods, quality control analysis and hypothesis testing methods are output in the form of Excel CSV tables, graphs and an HTML report summarizing data analysis. CONCLUSION: HDBStat! is platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website http://www.soph.uab.edu/ssg_content.asp?id=1164.
High-resolution functional proteomics by active-site peptide profiling.
Okerberg ES, Wu J, Zhang B, Samii B, Blackford K, Winn DT, Shreder KR, Burbaum JJ, Patricelli MP.
Proc Natl Acad Sci U S A. 2005 Apr 5;102(14):4996-5001.
Characterization and functional annotation of the large number of proteins predicted from genome sequencing projects poses a major scientific challenge. Whereas several proteomics techniques have been developed to quantify the abundance of proteins, these methods provide little information regarding protein function. Here, we present a gel-free platform that permits ultrasensitive, quantitative, and high-resolution analyses of protein activities in proteomes, including highly problematic samples such as undiluted plasma. We demonstrate the value of this platform for the discovery of both disease-related enzyme activities and specific inhibitors that target these proteins.
An integrated approach utilizing proteomics and bioinformatics to detect ovarian cancer.
Yu JK, Zheng S, Tang Y, Li L.
J Zhejiang Univ Sci B. 2005 Apr;6(4):227-31.
OBJECTIVE: To find new potential biomarkers and establish the patterns for the detection of ovarian cancer. METHODS: Sixty-one serum samples, from 32 ovarian cancer patients and 29 healthy people, were analyzed by surface-enhanced laser desorption/ionization mass spectrometry (SELDI-MS). The protein fingerprint data were analyzed by bioinformatics tools. A support vector machine (SVM) with ten-fold cross-validation was used to establish the diagnostic pattern. RESULTS: Five potential biomarkers were found (2085 Da, 5881 Da, 7564 Da, 9422 Da, 6044 Da), combined with which the diagnostic pattern separated the ovarian cancer from the healthy samples with a sensitivity of 96.7%, a specificity of 96.7% and a positive predictive value of 96.7%. CONCLUSIONS: The combination of SELDI with bioinformatics tools could find new biomarkers and establish patterns with high sensitivity and specificity for the detection of ovarian cancer.
Proteomic detection of prostate-specific antigen using a serum fractionation procedure: potential implication for new low-abundance cancer biomarkers detection.
Solassol J, Marin P, Demettre E, Rouanet P, Bockaert J, Maudelonde T, Mange A.
Anal Biochem. 2005 Mar 1;338(1):26-31.
One of the major obstacles in proteomic analysis of biological fluids is the presence of highly abundant proteins such as albumin and immunoglobulins, which can interfere with the resolution and sensitivity of the proteome profiling techniques used. In this paper, we describe an anion exchange fractionation procedure for serum using denaturing conditions allowing protein-protein interaction disruption before analysis by surface-enhanced laser desorption/ionization and by two-dimensional electrophoresis. This method simplifies the serum proteome into subproteomes and markedly increases resolution and sensitivity without any loss of minor proteins. To confirm the applicability of this method, fractionated serum of a patient with prostate cancer was analyzed for the presence of the prostate-specific antigen (PSA) which is a low-abundance tumor marker protein. The results demonstrate that PSA can be detected by two-dimensional electrophoresis only in serum following fractionation. Hence, this procedure may facilitate the identification of other, so far unknown, tumor markers in patient sera.
SELDI-TOF MS profiling of serum for detection of the progression of chronic hepatitis C to hepatocellular carcinoma.
Schwegler EE, Cazares L, Steel LF, Adam BL, Johnson DA, Semmes OJ, Block TM, Marrero JA, Drake RR.
Hepatology. 2005 Mar;41(3):634-42.
Proteomic profiling of serum is an emerging technique to identify new biomarkers indicative of disease severity and progression. The objective of our study was to assess the use of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) to identify multiple serum protein biomarkers for detection of liver disease progression to hepatocellular carcinoma (HCC). A cohort of 170 serum samples obtained from subjects in the United States with no liver disease (n = 39), liver diseases not associated with cirrhosis (n = 36), cirrhosis (n = 38), or HCC (n = 57) were applied to metal affinity protein chips for protein profiling by SELDI-TOF MS. Across the four test groups, 38 differentially expressed proteins were used to generate multiple decision classification trees to distinguish the known disease states. Analysis of a subset of samples with only hepatitis C virus (HCV)-related disease was emphasized. The serum protein profiles of control patients were readily distinguished from each HCV-associated disease state. Two-way comparisons of chronic hepatitis C, HCV cirrhosis, or HCV-HCC versus healthy had a sensitivity/specificity range of 74% to 95%. For distinguishing chronic HCV from HCV-HCC, a sensitivity of 61% and a specificity of 76% were obtained. However, when the values of known serum markers alpha fetoprotein, des-gamma carboxyprothrombin, and GP73 were combined with the SELDI peak values, the sensitivity and specificity improved to 75% and 92%, respectively. In conclusion, SELDI-TOF MS serum profiling is able to distinguish HCC from liver disease before cirrhosis as well as cirrhosis, especially in patients with HCV infection compared with other etiologies.
Detection of bladder cancer using a point-of-care proteomic assay.
Grossman HB, Messing E, Soloway M, Tomera K, Katz G, Berger Y, Shen Y.
JAMA. 2005 Feb 16;293(7):810-6.
CONTEXT: A combination of methods is used for diagnosis of bladder cancer because no single procedure detects all malignancies. Urine tests are frequently part of an evaluation, but have either been nonspecific for cancer or required specialized analysis at a laboratory. OBJECTIVE: To investigate whether a point-of-care proteomic test that measures the nuclear matrix protein NMP22 in voided urine could enhance detection of malignancy in patients with risk factors or symptoms of bladder cancer. DESIGN, SETTING, AND PATIENTS: Twenty-three academic, private practice, and veterans' facilities in 10 states prospectively enrolled consecutive patients from September 2001 to May 2002. Participants included 1331 patients at elevated risk for bladder cancer due to factors such as history of smoking or symptoms including hematuria and dysuria. Patients at risk for malignancy of the urinary tract provided a voided urine sample for analysis of NMP22 protein and cytology prior to cystoscopy. MAIN OUTCOME MEASURES: The diagnosis of bladder cancer, based on cystoscopy with biopsy, was accepted as the reference standard. The performance of the NMP22 test was compared with voided urine cytology as an aid to cancer detection. Testing for the NMP22 tumor marker was conducted in a blinded manner. RESULTS: Bladder cancer was diagnosed in 79 patients. The NMP22 assay was positive in 44 of 79 patients with cancer (sensitivity, 55.7%; 95% confidence interval [CI], 44.1%-66.7%), whereas cytology test results were positive in 12 of 76 patients (sensitivity, 15.8%; 95% CI, 7.6%-24.0%). The specificity of the NMP22 assay was 85.7% (95% CI, 83.8%-87.6%) compared with 99.2% (95% CI, 98.7%-99.7%) for cytology. The proteomic marker detected 4 cancers that were not visualized during initial endoscopy, including 3 that were muscle invasive and 1 carcinoma in situ. 
CONCLUSION: The noninvasive point-of-care assay for elevated urinary NMP22 protein can increase the accuracy of cystoscopy, with test results available during the patient visit.
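As a worked illustration of the figures in the RESULTS above, sensitivity is the fraction of confirmed cancers that the test flags, and a 95% interval can be approximated with the normal (Wald) formula. The abstract does not state which interval method the authors used, so this is a sketch under that assumption and its output need not match the published intervals exactly. The function name is hypothetical; the counts (44 of 79 and 12 of 76) come from the abstract.

```python
from math import sqrt


def sensitivity_with_ci(true_positives, diseased, z=1.96):
    """Return (sensitivity, lower, upper) using a normal-approximation interval."""
    if diseased <= 0 or not 0 <= true_positives <= diseased:
        raise ValueError("need 0 <= true_positives <= diseased and diseased > 0")
    p = true_positives / diseased
    half_width = z * sqrt(p * (1 - p) / diseased)
    # Clamp to [0, 1] because the approximation can overshoot near the edges.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken from the abstract: positives among patients with cancer.
for label, positives, total in [("NMP22", 44, 79), ("cytology", 12, 76)]:
    p, lower, upper = sensitivity_with_ci(positives, total)
    print(f"{label}: {100 * p:.1f}% ({100 * lower:.1f}%-{100 * upper:.1f}%)")
```

The Wald interval is the simplest choice and is used here only to make the arithmetic visible. Intervals such as Wilson or exact binomial behave better for small samples or proportions near 0 or 1.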
Protein expression profiling identifies subclasses of breast cancer and predicts prognosis.
Jacquemier J, Ginestier C, Rougemont J, Bardou VJ, Charafe-Jauffret E, Geneix J, Adelaide J, Koki A, Houvenaeghel G, Hassoun J, Maraninchi D, Viens P, Birnbaum D, Bertucci F.
Cancer Res. 2005 Feb 1;65(3):767-79.
Breast cancer is a heterogeneous disease whose evolution is difficult to predict by using classic histoclinical prognostic factors. Prognostic classification can benefit from molecular analyses such as large-scale expression profiling. Using immunohistochemistry on tissue microarrays, we have monitored the expression of 26 selected proteins in more than 1,600 cancer samples from 552 consecutive patients with early breast cancer. Both an unsupervised approach and a new supervised method were used to analyze these profiles. Hierarchical clustering identified relevant clusters of coexpressed proteins and clusters of tumors. We delineated protein clusters associated with the estrogen receptor and with proliferation. Tumor clusters correlated with several histoclinical features of samples, including 5-year metastasis-free survival (MFS), and with the recently proposed pathophysiologic taxonomy of disease. The supervised method identified a set of 21 proteins whose combined expression significantly correlated to MFS in a learning set of 368 patients (P < 0.0001) and in a validation set of 184 patients (P < 0.0001). Among the 552 patients, the 5-year MFS was 90% for patients classified in the "good-prognosis class" and 61% for those classified in the "poor-prognosis class" (P < 0.0001). This difference remained significant when the molecular grouping was applied according to lymph node or estrogen receptor status, as well as the type of adjuvant systemic therapy. In multivariate analysis, the 21-protein set was the strongest independent predictor of clinical outcome. These results show that protein expression profiling may be a clinically useful approach to assess breast cancer heterogeneity and prognosis in stage I, II, or III disease.
Liquid ultraviolet matrix-assisted laser desorption/ionization -- mass spectrometry for automated proteomic analysis.
Cramer R, Corless S.
Proteomics. 2005 Feb;5(2):360-70.
We have combined several key sample preparation steps for the use of a liquid matrix system to provide high analytical sensitivity in automated ultraviolet -- matrix-assisted laser desorption/ionisation -- mass spectrometry (UV-MALDI-MS). This new sample preparation protocol employs a matrix-mixture which is based on the glycerol matrix-mixture described by Sze et al. The low-femtomole sensitivity that is achievable with this new preparation protocol enables proteomic analysis of protein digests comparable to solid-state matrix systems. For automated data acquisition and analysis, the MALDI performance of this liquid matrix surpasses the conventional solid-state MALDI matrices. Besides the inherent general advantages of liquid samples for automated sample preparation and data acquisition the use of the presented liquid matrix significantly reduces the extent of unspecific ion signals in peptide mass fingerprints compared to typically used solid matrices, such as 2,5-dihydroxybenzoic acid (DHB) or alpha-cyano-hydroxycinnamic acid (CHCA). In particular, matrix and low-mass ion signals and ion signals resulting from cation adduct formation are dramatically reduced. Consequently, the confidence level of protein identification by peptide mass mapping of in-solution and in-gel digests is generally higher.
High throughput proteome screening for biomarker detection.
Pan S, Zhang H, Rush J, Eng J, Zhang N, Patterson D, Comb MJ, Aebersold R.
Mol Cell Proteomics. 2005 Feb;4(2):182-90.
Mass spectrometry-based quantitative proteomics has become an important component of biological and clinical research. Current methods, while highly developed and powerful, are falling short of their goal of routinely analyzing whole proteomes mainly because the wealth of proteomic information accumulated from prior studies is not used for the planning or interpretation of present experiments. The consequence of this situation is that in every proteomic experiment the proteome is rediscovered. In this report we describe an approach for quantitative proteomics that builds on the extensive prior knowledge of proteomes and a platform for the implementation of the method. The method is based on the selection and chemical synthesis of isotopically labeled reference peptides that uniquely identify a particular protein and the addition of a panel of such peptides to the sample mixture consisting of tryptic peptides from the proteome in question. The platform consists of a peptide separation module for the generation of ordered peptide arrays from the combined peptide sample on the sample plate of a MALDI mass spectrometer, a high throughput MALDI-TOF/TOF mass spectrometer, and a suite of software tools for the selective analysis of the targeted peptides and the interpretation of the results. Applying the method to the analysis of the human blood serum proteome we demonstrate the feasibility of using mass spectrometry-based proteomics as a high throughput screening technology for the detection and quantification of targeted proteins in a complex system.
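The quantification principle behind isotopically labeled reference peptides can be sketched as follows. Assuming the native peptide and its labeled reference ionize equally, the amount of native peptide is the ratio of the two peak intensities multiplied by the known amount of reference added. This is a generic isotope-dilution calculation, not the authors' software; the function name, intensities and spiked amount are hypothetical.

```python
def quantify_from_reference(native_intensity, reference_intensity, reference_amount_fmol):
    """Estimate the native peptide amount (fmol) from a spiked labeled reference."""
    if reference_intensity <= 0:
        raise ValueError("reference peak must have positive intensity")
    if native_intensity < 0 or reference_amount_fmol < 0:
        raise ValueError("intensity and amount must be non-negative")
    # Assumes equal ionization efficiency for the native and labeled peptide.
    return native_intensity / reference_intensity * reference_amount_fmol


# Hypothetical peak intensities for one targeted peptide pair,
# with 50 fmol of labeled reference added to the sample.
estimate = quantify_from_reference(
    native_intensity=3000.0,
    reference_intensity=1500.0,
    reference_amount_fmol=50.0,
)
print(f"estimated native peptide: {estimate:.1f} fmol")
```

Because each reference peptide is chosen to identify one protein uniquely, a panel of such pairs gives one estimate per targeted protein.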
Proteins as biomarkers of oxidative/nitrosative stress in diseases: the contribution of redox proteomics.
Dalle-Donne I, Scaloni A, Giustarini D, Cavarra E, Tell G, Lungarella G, Colombo R, Rossi R, Milzani A.
Mass Spectrom Rev. 2005 Jan-Feb;24(1):55-99.
Reactive oxygen species (ROS) and reactive nitrogen species (RNS) contribute to the pathogenesis and/or progression of several human diseases. Proteins are important molecular signposts of oxidative/nitrosative damage. However, it is generally unresolved whether the presence of oxidatively/nitrosatively modified proteins has a causal role or simply reflects secondary epiphenomena. Only direct identification and characterization of the modified protein(s) in a given pathophysiological condition can decipher the potential roles played by ROS/RNS-induced protein modifications. During the last few years, mass spectrometry (MS)-based technologies have contributed in a significant way to foster a better understanding of disease processes. The study of oxidative/nitrosative modifications, investigated by redox proteomics, is contributing to establish a relationship between pathological hallmarks of disease and protein structural and functional abnormalities. MS-based technologies promise a contribution in a new era of molecular medicine, especially in the discovery of diagnostic biomarkers of oxidative/nitrosative stress, enabling early detection of diseases. Indeed, identification and characterization of oxidatively/nitrosatively modified proteins in human diseases has just begun.
The European Bioinformatics Institute's data resources: towards systems biology.
Brooksbank C, Cameron G, Thornton J.
Nucleic Acids Res. 2005 Jan 1;33(Database issue):D46-53.
Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein-protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk.
DynaProt 2D: an advanced proteomic database for dynamic online access to proteomes and two-dimensional electrophoresis gels.
Drews O, Gorg A.
Nucleic Acids Res. 2005 Jan 1;33(Database issue):D583-7.
DynaProt 2D presents an advanced online database for dynamic access to proteomes and two-dimensional (2D) gels. The database was designed to administer complete in silico proteomes and links them with experimental proteomic data in the manner of 2D electrophoresis gels (IPG-Dalt). The 2D gels serve as reference maps in 2D gel analysis as well as tools for navigation of the database to switch between experimental and predicted data. Therefore, all identified spots in the gels are clickable and linked with summarized protein information. The protein information tables contain calculated characteristics, which are often used in proteomics, such as the molecular weight, isoelectric point, codon adaptation index, grand average of hydropathicity, etc. The design of the database permits online extension of gel data and protein attributes without knowledge of any software language. Besides navigation via 2D gels, the clear graphical user interface permits quick and intuitive searching throughout complete proteomes and supports, e.g. the search for proteins with isoelectric points within pH ranges of interest or protein classes (e.g. ribosomal proteins or transporters). The first organism implemented in the database is Lactococcus lactis. The database is available at www.wzw.tum.de/proteomik/lactis.
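One of the calculated characteristics named above, the grand average of hydropathicity (GRAVY), is the mean of per-residue hydropathy values over the sequence. The sketch below uses the Kyte-Doolittle scale, the scale conventionally used for GRAVY; whether DynaProt 2D uses exactly this scale is an assumption, and the function name and example peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    unknown = sorted(set(residues) - set(KYTE_DOOLITTLE))
    if unknown:
        raise ValueError(f"unsupported residue(s): {', '.join(unknown)}")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


# Hypothetical short peptide; positive values indicate a more hydrophobic sequence.
print(round(gravy("MKVLAAGIVG"), 3))
```

A database table could store this value per protein alongside molecular weight and isoelectric point, which is how such attributes support the range searches the abstract describes.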
The UCSC Proteome Browser.
Hsu F, Pringle TH, Kuhn RM, Karolchik D, Diekhans M, Haussler D, Kent WJ.
Nucleic Acids Res. 2005 Jan 1;33(Database issue):D454-8.
The University of California Santa Cruz (UCSC) Proteome Browser provides a wealth of protein information presented in graphical images and with links to other protein-related Internet sites. The Proteome Browser is tightly integrated with the UCSC Genome Browser. For the first time, Genome Browser users have both the genome and proteome worlds at their fingertips simultaneously. The Proteome Browser displays tracks of protein and genomic sequences, exon structure, polarity, hydrophobicity, locations of cysteine and glycosylation potential, Superfamily domains and amino acids that deviate from normal abundance. Histograms show genome-wide distribution of protein properties, including isoelectric point, molecular weight, number of exons, InterPro domains and cysteine locations, together with specific property values of the selected protein. The Proteome Browser also provides links to gene annotations in the Genome Browser, the Known Genes details page and the Gene Sorter; domain information from Superfamily, InterPro and Pfam; three-dimensional structures at the Protein Data Bank and ModBase; and pathway data at KEGG, BioCarta/CGAP and BioCyc. As of August 2004, the Proteome Browser is available for human, mouse and rat proteomes. The browser may be accessed from any Known Genes details page of the Genome Browser at http://genome.ucsc.edu. A user's guide is also available on this website.
Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein.
Cabantous S, Terwilliger TC, Waldo GS.
Nat Biotechnol. 2005 Jan;23(1):102-7.
Existing protein tagging and detection methods are powerful but have drawbacks. Split protein tags can perturb protein solubility or may not work in living cells. Green fluorescent protein (GFP) fusions can misfold or exhibit altered processing. Fluorogenic biarsenical FLaSH or ReASH substrates overcome many of these limitations but require a polycysteine tag motif, a reducing environment and cell transfection or permeabilization. An ideal protein tag would be genetically encoded, would work both in vivo and in vitro, would provide a sensitive analytical signal and would not require external chemical reagents or substrates. One way to accomplish this might be with a split GFP, but the GFP fragments reported thus far are large and fold poorly, require chemical ligation or fused interacting partners to force their association, or require coexpression or co-refolding to produce detectable folded and fluorescent GFP. We have engineered soluble, self-associating fragments of GFP that can be used to tag and detect either soluble or insoluble proteins in living cells or cell lysates. The split GFP system is simple and does not change fusion protein solubility.
Peptide mass fingerprinting by matrix-assisted laser desorption ionization mass spectrometry of proteins detected by immunostaining on nitrocellulose.
Dufresne-Martin G, Lemay JF, Lavigne P, Klarskov K.
Proteomics. 2005 Jan;5(1):55-66.
We have developed an approach that allows peptide mass mapping by matrix-assisted laser desorption ionization-mass spectrometry of proteins visualized on a nitrocellulose membrane by immunochemical detection. Proteins are separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), electroblotted onto a nitrocellulose membrane and after blocking with a nonprotein-containing polymer such as polyvinylpyrrolidone 40 (PVP-40) or Tween 20, the proteins are stained with fount India ink. After incubation with primary and, if required, secondary peroxidase-coupled antibodies, immunochemically reactive proteins can be visualized using conventional enhanced chemiluminescence detection and assigned to the India ink-stained membrane by simple superposition. The proteins of interest are excised, submitted to "on-membrane" cleavage and the peptides are analyzed by mass spectrometry. Protein-based blocking reagents normally used in standard immunodetection protocols, such as skimmed milk, can be employed. We have obtained high-quality mass spectra of bovine serum albumin (BSA) detected on an immunoblot with an estimated amount of 100 fmol applied onto the gel, indicating the sensitivity of the present method. In addition, the approach is demonstrated with two other commercially available proteins, a serum protein, the successful identification of a tyrosine phosphorylated protein from total rat liver homogenate and serine-phosphorylated proteins from an EcR 293 nuclear extract separated by two-dimensional (2-D) SDS-PAGE.
A proteomic approach to tumour target identification using phage display, affinity purification and mass spectrometry.
Geuijen CA, Bijl N, Smit RC, Cox F, Throsby M, Visser TJ, Jongeneelen MA, Bakker AB, Kruisbeek AM, Goudsmit J, de Kruif J.
Eur J Cancer. 2005 Jan;41(1):178-87.
Tumour-associated cell surface markers are potential targets for antibody-based therapies. We have obtained a panel of myeloid cell binding single chain variable fragments (scFv) by applying phage display selection on myeloid cell lines followed by a selection round on freshly isolated acute myeloid leukaemia (AML) blasts using flow cytometry. To identify the target antigens, the scFv were recloned and expressed in an IgG(1) format and tested for their ability to immunoprecipitate cell surface proteins. The IgGs that reacted with distinct cell membrane extractable proteins were used in large-scale affinity purification of the target antigen followed by mass-spectrometry-based identification. Well-characterised cell surface antigens, such as leukocyte antigen-related receptor protein tyrosine phosphatase (LAR PTP) and activated leukocyte adhesion molecule (ALCAM) in addition to several unknown proteins, like ATAD3A, were identified. These experiments demonstrate that phage antibody selection in combination with affinity chromatography and mass spectrometry can be exploited successfully to identify novel antibody target molecules on malignant cells.
Significant differences in nipple aspirate fluid protein expression between healthy women and those with breast cancer demonstrated by time-of-flight mass spectrometry.
Pawlik TM, Fritsche H, Coombes KR, Xiao L, Krishnamurthy S, Hunt KK, Pusztai L, Chen JN, Clarke CH, Arun B, Hung MC, Kuerer HM.
Breast Cancer Res Treat. 2005 Jan;89(2):149-57.
New approaches are needed for the early detection of breast cancer. Proteomic profiling technologies, such as surface-enhanced laser desorption ionization mass spectrometry (SELDI-MS), may be able to identify tumor markers in biological fluids. The objective of this study was to determine whether there are differences in protein expression patterns in nipple aspirate fluid (NAF) from the cancerous and noncancerous breasts of patients with unilateral breast cancer and the breasts of healthy volunteers. Paired NAF samples were obtained from 23 women with stage I or II unilateral invasive breast carcinoma and five healthy female volunteers. Aliquots of the samples were applied to SELDI Protein-chip arrays (WCX2 and IMAC3-Cu++), and protein expression was analyzed using time-of-flight MS. A total of 463 distinct peaks were detected and analyzed. In breast cancer patients, no differences in protein expression were identified between the breast with the intact primary carcinoma and the contralateral noncancerous breast. Seventeen peaks were overexpressed in cancer-bearing breasts compared to breasts of healthy volunteers (p < 0.0005). When spectra from the nontumor-bearing breasts of breast cancer patients were compared with spectra from breasts of healthy volunteers, two peaks that were overexpressed in breast cancer patients and one peak that was underexpressed in breast cancer patients were detected (p < 0.0027). SELDI-MS was able to identify differences in the phenotypic proteomic profile of NAF samples obtained from patients with early-stage breast cancer and healthy women. Proteomic screening techniques such as SELDI-MS analysis of NAF may be useful for breast cancer screening and diagnosis.
A novel strategy for quantitative proteomics using isotope-coded protein labels.
Schmidt A, Kellermann J, Lottspeich F.
Proteomics. 2005 Jan;5(1):4-15.
Stable isotope labelling in combination with mass spectrometry has emerged as a powerful tool to identify and relatively quantify thousands of proteins within complex protein mixtures. Here we describe a novel method, termed isotope-coded protein label (ICPL), which is capable of high-throughput quantitative proteome profiling on a global scale. Since ICPL is based on stable isotope tagging at the frequent free amino groups of isolated intact proteins, it is applicable to any protein sample, including extracts from tissues or body fluids, and compatible to all separation methods currently employed in proteome studies. The method showed highly accurate and reproducible quantification of proteins and yielded high sequence coverage, indispensable for the detection of post-translational modifications and protein isoforms. The efficiency (e.g. accuracy, dynamic range, sensitivity, speed) of the approach is demonstrated by comparative analysis of two differentially spiked proteomes.
Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry.
Desiere F, Deutsch EW, Nesvizhskii AI, Mallick P, King NL, Eng JK, Aderem A, Boyle R, Brunner E, Donohoe S, Fausto N, Hafen E, Hood L, Katze MG, Kennedy KA, Kregenow F, Lee H, Lin B, Martin D, Ranish JA, Rawlings DJ, Samelson LE, Shiio Y, Watts JD, Wollscheid B, Wright ME, Yan W, Yang L, Yi EC, Zhang H, Aebersold R.
Genome Biol. 2005;6(1):R9.
A crucial aim upon the completion of the human genome is the verification and functional annotation of all predicted genes and their protein products. Here we describe the mapping of peptides derived from accurate interpretations of protein tandem mass spectrometry (MS) data to eukaryotic genomes and the generation of an expandable resource for integration of data from many diverse proteomics experiments. Furthermore, we demonstrate that peptide identifications obtained from high-throughput proteomics can be integrated on a large scale with the human genome. This resource could serve as an expandable repository for MS-derived proteome information.
2004
Splicing factors are differentially expressed in tumors.
Kirschbaum-Slager N, Lopes GM, Galante PA, Riggins GJ, de Souza SJ.
Genet Mol Res. 2004 Dec 30;3(4):512-20.
Although alternative splicing of many genes has been found associated with different stages of tumorigenesis and splicing variants have been characterized as tumor markers, it is still not known whether these examples are sporadic or whether there is a broader association between the two phenomena. In this report we evaluated, through a bioinformatics approach, the expression of splicing factors in both normal and tumor tissues. This was possible by integrating data produced by proteomics, serial analysis of gene expression (SAGE) and microarray experiments. We observed a significant shift in the expression of splicing factors in tumors in both SAGE and microarray data, resulting from a large number of experiments. We argue that this supports the notion of a broader association between alternative splicing and cell transformation, and that splicing factors may be involved in oncogenic pathways.
Development and standardization of multiplexed antibody microarrays for use in quantitative proteomics.
Perlee L, Christiansen J, Dondero R, Grimwade B, Lejnine S, Mullenix M, Shao W, Sorette M, Tchernev V, Patel D, Kingsmore S.
Proteome Sci. 2004 Dec 15;2(1):9.
BACKGROUND: Quantitative proteomics is an emerging field that encompasses multiplexed measurement of many known proteins in groups of experimental samples in order to identify differences between groups. Antibody arrays are a novel technology that is increasingly being used for quantitative proteomics studies due to highly multiplexed content, scalability, matrix flexibility and economy of sample consumption. Key applications of antibody arrays in quantitative proteomics studies are identification of novel diagnostic assays, biomarker discovery in trials of new drugs, and validation of qualitative proteomics discoveries. These applications require performance benchmarking, standardization and specification. RESULTS: Six dual-antibody, sandwich immunoassay arrays that measure 170 serum or plasma proteins were developed and experimental procedures refined in more than thirty quantitative proteomics studies. This report provides detailed information and specification for manufacture, qualification, assay automation, performance, assay validation and data processing for antibody arrays in large scale quantitative proteomics studies. CONCLUSION: The present report describes development of first generation standards for antibody arrays in quantitative proteomics. Specifically, it describes the requirements of a comprehensive validation program to identify and minimize antibody cross reaction under highly multiplexed conditions; provides the rationale for the application of standardized statistical approaches to manage the data output of highly replicated assays; defines design requirements for controls to normalize sample replicate measurements; emphasizes the importance of stringent quality control testing of reagents and antibody microarrays; recommends the use of real-time monitors to evaluate sensitivity, dynamic range and platform precision; and presents survey procedures to reveal the significance of biomarker findings.
ProteomeGRID: towards a high-throughput proteomics pipeline through opportunistic cluster image computing for two-dimensional gel electrophoresis.
Dowsey AW, Dunn MJ, Yang GZ.
Proteomics. 2004 Dec;4(12):3800-12.
The quest for high-throughput proteomics has revealed a number of critical issues. Whilst improved two-dimensional gel electrophoresis (2-DE) sample preparation, staining and imaging issues are being actively pursued by industry, reliable high-throughput spot matching and quantification remains a significant bottleneck in the bioinformatics pipeline, thus restricting the flow of data to mass spectrometry through robotic spot excision and protein digestion. To this end, it is important to establish a full multi-site Grid infrastructure for the processing, archival, standardisation and retrieval of proteomic data and metadata. Particular emphasis needs to be placed on large-scale image mining and statistical cross-validation for reliable, fully automated differential expression analysis, and the development of a statistical 2-DE object model and ontology that underpins the emerging HUPO PSI GPS (Human Proteome Organization Proteomics Standards Initiative General Proteomics Standards). The first step towards this goal is to overcome the computational and communications burden entailed by the image analysis of 2-DE gels with Grid enabled cluster computing. This paper presents the proTurbo framework as part of the ProteomeGRID, which utilises Condor cluster management combined with CORBA communications and JPEG-LS lossless image compression for task farming. A novel probabilistic eager scheduler has been developed to minimise make-span, where tasks are duplicated in response to the likelihood of the Condor machines' owners evicting them. A 60 gel experiment was pair-wise image registered (3540 tasks) on a 40 machine Linux cluster. Real-world performance and network overhead was gauged, and Poisson distributed worker evictions were simulated. Our results show a 4:1 lossless and 9:1 near lossless image compression ratio and so network overhead did not affect other users. 
With 40 workers a 32x speed-up was seen (80% resource efficiency), and the eager scheduler reduced the impact of evictions by 58%.
Depletion of the high-abundance plasma proteins.
Fountoulakis M, Juranville JF, Jiang L, Avila D, Roder D, Jakob P, Berndt P, Evers S, Langen H.
Amino Acids. 2004 Dec;27(3-4):249-59.
Body fluids, like plasma and urine, are comparatively easy to obtain and are useful for the detection of novel diagnostic markers by applying new technologies, like proteomics. However, in plasma, several high-abundance proteins are dominant and repress the signals of the lower-abundance proteins, which then become undetectable either by two-dimensional gels or chromatography. Therefore, depletion of the abundant proteins is a prerequisite for the detection of the low-abundance components. We applied affinity chromatography on blue matrix and Protein G and removed the most abundant human plasma proteins, albumin and the immunoglobulin chains. The plasma proteins, prior to albumin and immunoglobulin depletion, as well as the eluates from the two chromatography steps were analyzed by two-dimensional electrophoresis and the proteins were identified by MALDI-TOF-MS. The analysis resulted in the identification of 83 different gene products in the untreated plasma. Removal of the high-abundance proteins resulted in the visualization of new protein signals. In the eluate of the two affinity steps, mostly albumin and immunoglobulin spots were detected but also spots representing several other abundant plasma proteins. The methodology is easy to perform and is useful as a first step in the detection of diagnostic markers in body fluids by applying proteomics technologies.
AMASS: software for automatically validating the quality of MS/MS spectrum from SEQUEST results.
Sun W, Li F, Wang J, Zheng D, Gao Y.
Mol Cell Proteomics. 2004 Dec;3(12):1194-9.
Time-consuming and experience-dependent manual validations of tandem mass spectra are usually applied to SEQUEST results. This inefficient method has become a significant bottleneck for MS/MS data processing. Here we introduce a program AMASS (advanced mass spectrum screener), which can filter the tandem mass spectra of SEQUEST results by measuring the match percentage of high-abundant ions and the continuity of matched fragment ions in b, y series. Compared with Xcorr and DeltaCn filter, AMASS can increase the number of positives and reduce the number of negatives in 22 datasets generated from 18 known protein mixtures. It effectively removed most noisy spectra, false interpretations, and about half of poor fragmentation spectra, and AMASS can work synergistically with Rscore filter. We believe the use of AMASS and Rscore can result in a more accurate identification of peptide MS/MS spectra and reduce the time and energy for manual validation.
Phylomat: an automated protein motif analysis tool for phylogenomics.
Graham WV, Tcheng DK, Shirk AL, Attene-Ramos MS, Welge ME, Gaskins HR.
J Proteome Res. 2004 Nov-Dec;3(6):1289-91.
Recent progress in genomics, proteomics, and bioinformatics enables unprecedented opportunities to examine the evolutionary history of molecular, cellular, and developmental pathways through phylogenomics. Accordingly, we have developed a motif analysis tool for phylogenomics (Phylomat, http://alg.ncsa.uiuc.edu/pmat) that scans predicted proteome sets for proteins containing highly conserved amino acid motifs or domains for in silico analysis of the evolutionary history of these motifs/domains. Phylomat enables the user to download results as full protein or extracted motif/domain sequences from each protein. Tables containing the percent distribution of a motif/domain in organisms normalized to proteome size are displayed. Phylomat can also align the set of full protein or extracted motif/domain sequences and predict a neighbor-joining tree from relative sequence similarity. Together, Phylomat serves as a user-friendly data-mining tool for the phylogenomic analysis of conserved sequence motifs/domains in annotated proteomes from the three domains of life.
Anti-sulfonylbenzoate antibodies as a tool for the detection of nucleotide-binding proteins for functional proteomics.
Moore LL, Fulton AM, Harrison ML, Geahlen RL.
J Proteome Res. 2004 Nov-Dec;3(6):1184-90.
Proteins that bind ATP and GTP are important cellular components. We developed an immunological approach to selectively tag nucleotide-binding proteins based on the use of 5'-[4-(fluorosulfonyl)benzoyl]adenosine and 5'-[4-(fluorosulfonyl)benzoyl]guanosine affinity tags and an antibody against 4-(sulfonyl)benzoate. Detection follows affinity labeling, gel electrophoresis, and ester bond cleavage to expose the epitope. Trial analyses of labeled proteins from lymphoid cells identified multiple ATP-binding proteins, including chaperones, actin, kinases, an RNA splicing factor, a membrane ATPase, and ATP synthase.
Integration of Jacobson's pellicle method into proteomic strategies for plasma membrane proteins.
Rahbar AM, Fenselau C.
J Proteome Res. 2004 Nov-Dec;3(6):1267-77.
A modified form of the cationic colloidal silica technique for plasma membrane isolation has been combined with SDS-PAGE, mass spectrometry, and bioinformatics for evaluation as a proteomics strategy with human multiple myeloma cells and human breast cancer cells. On the basis of Western blots, half of the protein isolated is estimated to come from the plasma membrane. Forty-three percent of the 366 proteins identified by mass spectrometry had been previously classified as plasma membrane proteins. Thirty proteins previously categorized as hypothetical membrane proteins are now reported to be expressed.
Finding new components of the target of rapamycin (TOR) signaling network through chemical genetics and proteome chips.
Huang J, Zhu H, Haggarty SJ, Spring DR, Hwang H, Jin F, Snyder M, Schreiber SL.
Proc Natl Acad Sci U S A. 2004 Nov 23;101(47):16594-9.
The TOR (target of rapamycin) proteins play important roles in nutrient signaling in eukaryotic cells. Rapamycin treatment induces a state reminiscent of the nutrient starvation response, often resulting in growth inhibition. Using a chemical genetic modifier screen, we identified two classes of small molecules, small-molecule inhibitors of rapamycin (SMIRs) and small-molecule enhancers of rapamycin (SMERs), that suppress and augment, respectively, rapamycin's effect in the yeast Saccharomyces cerevisiae. Probing proteome chips with biotinylated SMIRs revealed putative intracellular target proteins, including Tep1p, a homolog of the mammalian PTEN (phosphatase and tensin homologue deleted on chromosome 10) tumor suppressor, and Ybr077cp (Nir1p), a protein of previously unknown function that we show to be a component of the TOR signaling network. Both SMIR target proteins are associated with PI(3,4)P2, suggesting a mechanism of regulation of the TOR pathway involving phosphatidylinositides. Our results illustrate the combined use of chemical genetics and proteomics in biological discovery and map a path for creating useful therapeutics for treating human diseases involving the TOR pathway, such as diabetes and cancer.
An integrated approach to the detection of colorectal cancer utilizing proteomics and bioinformatics.
Yu JK, Chen YD, Zheng S.
World J Gastroenterol. 2004 Nov 1;10(21):3127-31.
AIM: To find new potential biomarkers and to establish patterns for early detection of colorectal cancer. METHODS: One hundred and eighty-two serum samples including 55 from colorectal cancer (CRC) patients, 35 from colorectal adenoma (CRA) patients and 92 from healthy persons (HP) were detected by surface-enhanced laser desorption/ionization mass spectrometry (SELDI-MS). The data of spectra were analyzed by bioinformatics tools like artificial neural network (ANN) and support vector machine (SVM). RESULTS: The diagnostic pattern combined with 7 potential biomarkers could differentiate CRC patients from CRA patients with a specificity of 83%, sensitivity of 89% and positive predictive value of 89%. The diagnostic pattern combined with 4 potential biomarkers could differentiate CRC patients from HP with a specificity of 92%, sensitivity of 89% and positive predictive value of 86%. CONCLUSION: The combination of SELDI with bioinformatics tools could help find new biomarkers and establish patterns with high sensitivity and specificity for the detection of CRC.
A common open representation of mass spectrometry data and its application to proteomics research.
Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti RH, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R.
Nat Biotechnol. 2004 Nov;22(11):1459-66.
A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.
Extractor for ESI quadrupole TOF tandem MS data enabled for high throughput batch processing.
Boehm AM, Galvin RP, Sickmann A.
BMC Bioinformatics. 2004 Oct 26;5:162.
BACKGROUND: Mass spectrometry-based proteomics results in huge amounts of data that have to be processed in real time in order to efficiently feed identification algorithms and to integrate easily into automated environments. We present wiff2dta, a tool created to convert MS/MS data obtained using Applied Biosystems' QStar and QTrap 2000 and 4000 series. RESULTS: Comparing the performance of wiff2dta with the standard tools, we find wiff2dta to be the fastest solution for extracting spectrum data from ABI's raw file format. wiff2dta is at least 10% faster than the standard tools. It is also capable of batch processing and can be easily integrated into high-throughput environments. The program is freely available via http://www.protein-ms.de, http://sourceforge.net/projects/protms/ and is also available from Applied Biosystems. CONCLUSIONS: wiff2dta can run as a stand-alone application or within a batch process as a command-line tool integrated into automation and high-throughput environments. It is more efficient than the state-of-the-art tools provided.
The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes.
Ruepp A, Zollner A, Maier D, Albermann K, Hani J, Mokrejs M, Tetko I, Guldener U, Mannhaupt G, Munsterkotter M, Mewes HW.
Nucleic Acids Res. 2004 Oct 14;32(18):5539-45.
In this paper, we present the Functional Catalogue (FunCat), a hierarchically structured, organism-independent, flexible and scalable controlled classification system enabling the functional description of proteins from any organism. FunCat has been applied for the manual annotation of prokaryotes, fungi, plants and animals. We describe how FunCat is implemented as a highly efficient and robust tool for the manual and automatic annotation of genomic sequences. Owing to its hierarchical architecture, FunCat has also proved to be useful for many subsequent downstream bioinformatic applications. This is illustrated by the analysis of large-scale experiments from various investigations in transcriptomics and proteomics, where FunCat was used to project experimental data into functional units, as 'gold standard' for functional classification methods, and also served to compare the significance of different experimental methods. Over the last decade, the FunCat has been established as a robust and stable annotation scheme that offers both, meaningful and manageable functional classification as well as ease of perception.
Multi-layered representation for cell signaling pathways.
Paek E, Park J, Lee KJ.
Mol Cell Proteomics. 2004 Oct;3(10):1009-22.
To understand complex signaling pathways and networks, it is necessary to develop a formal and structured representation of the available information in a format suitable for analysis by software tools. Due to the complexity and incompleteness of the current biological knowledge about cell signaling, such a device must be able to represent cellular pathways at differing levels of details, one level of information abstract enough to convey an essential signaling flow while hiding its details and another level of information detailed enough to explain the underlying mechanisms that account for the signaling flow described at a more abstract level. We have defined a formal ontology for cell-signaling events that allows us to describe these cellular pathways at various levels of abstraction. Using this formal representation, ROSPath (reactive oxygen species-mediated signaling pathway) database system has been implemented and made available on the web (rospath.ewha.ac.kr). ROSPath is a database system for reactive oxygen species (ROS)-mediated cell signaling pathways and signaling processes in molecular detail, which facilitates a comprehensive understanding of the regulatory mechanisms in signaling pathways. ROSPath includes growth factor-, stress-, and cytokine-induced signaling pathways containing about 500 unique proteins (mostly mammalian) and their related protein states, protein complexes, protein complex states, signaling interactions, signaling steps, and pathways. It is a web-based structured repository of information on the signaling pathways of interest and provides a means for managing data produced by large-scale and high-throughput techniques such as proteomics. Also, software tools are provided for querying, displaying, and analyzing pathways, thus furnishing an integrated web environment for visualizing and manipulating ROS-mediated cell-signaling events.
Diagnosis of pancreatic cancer using serum proteomic profiling.
Bhattacharyya S, Siegel ER, Petersen GM, Chari ST, Suva LJ, Haun RS.
Neoplasia. 2004 Sep-Oct;6(5):674-86.
In the United States, mortality rates from pancreatic cancer (PCa) have not changed significantly over the past 50 years. This is due, in part, to the lack of early detection methods for this particularly aggressive form of cancer. The objective of this study was to use high-throughput protein profiling technology to identify biomarkers in the serum proteome for the early detection of resectable PCa. Using surface-enhanced laser desorption/ionization mass spectrometry, protein profiles were generated from sera of 49 PCa patients and 54 unaffected individuals after fractionation on an anion exchange resin. The samples were randomly divided into a training set (69 samples) and test set (34 samples), and two multivariate analysis procedures, classification and regression tree and logistic regression, were used to develop classification models from these spectral data that could distinguish PCa from control serum samples. In the test set, both models correctly classified all of the PCa patient serum samples (100% sensitivity). Using the decision tree algorithm, a specificity of 93.5% was obtained, whereas the logistic regression model produced a specificity of 100%. These results suggest that high-throughput proteomics profiling has the capacity to provide new biomarkers for the early detection and diagnosis of PCa.
Sequence optimization as an alternative to de novo analysis of tandem mass spectrometry data.
Heredia-Langner A, Cannon WR, Jarman KD, Jarman KH.
Bioinformatics. 2004 Sep 22;20(14):2296-304.
MOTIVATION: Peptide identification following tandem mass spectrometry (MS/MS) is usually achieved by searching for the best match between the mass spectrum of an unidentified peptide and model spectra generated from peptides in a sequence database. This methodology will be successful only if the peptide under investigation belongs to an available database. Our objective is to develop and test the performance of a heuristic optimization algorithm capable of dealing with some features commonly found in actual MS/MS spectra that tend to stop simpler deterministic solution approaches. RESULTS: We present the implementation of a Genetic Algorithm (GA) in the reconstruction of amino acid sequences using only spectral features, discuss some of the problems associated with this approach and compare its performance to a de novo sequencing method. The GA can potentially overcome some of the most problematic aspects associated with de novo analysis of real MS/MS data such as missing or unclearly defined peaks and may prove to be a valuable tool in the proteomics field. We assess the performance of our algorithm under conditions of perfect spectral information, in situations where key spectral features are missing, and using real MS/MS spectral data.
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger M, Nieselt K, Kaufmann M, Morgenstern B.
BMC Bioinformatics. 2004 Sep 9;5:128.
BACKGROUND: Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. RESULTS: Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. CONCLUSIONS: By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
Disposable chromatography for a high-throughput nano-ESI/MS and nano-ESI/MS-MS platform.
Williams JG, Tomer KB.
J Am Soc Mass Spectrom. 2004 Sep;15(9):1333-40.
High-throughput proteomics has typically relied on protein identification based on MALDI-MS peptide maps of proteolytic digests of 2D-gel-separated proteins. This technique, however, requires significant sequence coverage in order to achieve a high level of confidence in the identification. Tandem MS data have the advantage of requiring fewer peptides (2) for high confidence identification, assuming adequate MS/MS sequence coverage. MALDI-MS/MS techniques are becoming available, but can still be problematic because of the difficulty of inducing fragment ions of a singly charged parent ion. Electrospray ionization, however, has the advantage of generating multiply charged species that are more readily fragmented during MS/MS analysis. Two electrospray/tandem mass spectrometry-based approaches, nanovial-ESI-MS/MS and LC-MS/MS, are used for high throughput proteomics, but much less often than MALDI-MS and peptide mass fingerprinting. Nanovial introduction entails extensive manual manipulation and often shows significant chemical background from the in-gel digest. LC-MS has the advantages that the chemical background can be removed prior to analysis and the analytes are concentrated during the separation, resulting in more abundant analyte signals. On the other hand, LC-MS can often be time intensive. Here, we report the incorporation of on-line sample clean-up and analyte concentration with a high-throughput, chip-based, robotic nano-ESI-MS platform for proteomics studies.
A tagging-via-substrate technology for detection and proteomics of farnesylated proteins.
Kho Y, Kim SC, Jiang C, Barma D, Kwon SW, Cheng J, Jaunbergs J, Weinbaum C, Tamanoi F, Falck J, Zhao Y.
Proc Natl Acad Sci U S A. 2004 Aug 24;101(34):12479-84.
A recently developed proteomics strategy, designated tagging-via-substrate (TAS) approach, is described for the detection and proteomic analysis of farnesylated proteins. TAS technology involves metabolic incorporation of a synthetic azido-farnesyl analog and chemoselective derivatization of azido-farnesyl-modified proteins by an elegant version of Staudinger reaction, pioneered by the Bertozzi group, using a biotinylated phosphine capture reagent. The resulting protein conjugates can be specifically detected and/or affinity-purified by streptavidin-linked horseradish peroxidase or agarose beads, respectively. Thus, the technology enables global profiling of farnesylated proteins by enriching farnesylated proteins and reducing the complexity of farnesylation subproteome. Azido-farnesylated proteins maintain the properties of protein farnesylation, including promoting membrane association, Ras-dependent mitogen-activated protein kinase kinase activation, and inhibition of lovastatin-induced apoptosis. A proteomic analysis of farnesylated proteins by TAS technology revealed 18 farnesylated proteins, including those with potentially novel farnesylation motifs, suggesting that future use of this method is likely to yield novel insight into protein farnesylation. TAS technology can be extended to other posttranslational modifications, such as geranylgeranylation and myristoylation, thus providing powerful tools for detection, quantification, and proteomic analysis of posttranslationally modified proteins.
Electrowetting-based microfluidics for analysis of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry.
Wheeler AR, Moon H, Kim CJ, Loo JA, Garrell RL.
Anal Chem. 2004 Aug 15;76(16):4833-8.
A new technique for preparing samples for matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is reported. The technique relies on electrowetting-on-dielectric (EWOD) to move droplets containing proteins or peptides and matrix to specific locations on an array of electrodes for analysis. Standard MALDI-MS reagents, analytes, concentrations, and recipes are demonstrated to be compatible with the technique. Mass spectra are comparable to those collected by conventional methods. Nonspecific adsorption of analytes to device surfaces is demonstrated to be negligible. The results suggest that EWOD may be a useful tool for automating sample preparation for high-throughput proteomics and other applications of MALDI-MS.
Towards developing a protein infrared spectra databank (PISD) for proteomics research.
Hering JA, Innocent PR, Haris PI.
Proteomics. 2004 Aug;4(8):2310-9.
Fourier transform infrared (FTIR) spectroscopy is an attractive tool for proteomics research as it can be used to rapidly characterize protein secondary structure in aqueous solution. Recently, a number of secondary structure prediction methods based on reference sets of FTIR spectra from proteins with known structure from X-ray crystallography have been suggested. These prediction methods, often referred to as pattern recognition based approaches, demonstrated good prediction accuracy using some error measure, e.g., the standard error of prediction (SEP). However, to avoid possible adverse effects from differences in recording, the analysis has been mostly based on reference sets of FTIR spectra from proteins recorded in one laboratory only. As a result, these studies were based on reference sets of FTIR spectra from a limited number of proteins. Pattern recognition based approaches, however, rely on reference sets of FTIR spectra from as many proteins as possible representing all possible band shape variation to be related to the diversity of protein structural classes. Hence, if we want to build reliable pattern recognition based systems to support proteomics research, which are capable of making good predictions from spectral data of any unknown protein, one common goal should be to build a comprehensive protein infrared spectra databank (PISD) containing FTIR spectra of proteins of known structure. We have started the process of developing a comprehensive PISD composed of spectra recorded in different laboratories. As part of this work, here we investigate possible effects on prediction accuracy achieved by a neural network analysis when using reference sets composed of FTIR spectra from different laboratories. 
The surprisingly low magnitude of the differences in SEPs across all our experiments suggests that FTIR spectra recorded in different laboratories may be safely combined into one reference set with only minor deterioration of prediction accuracy in the worst case.
Use of high-throughput protein array for profiling of differentially expressed proteins in normal and malignant breast tissue.
Hudelist G, Pacher-Zavisin M, Singer CF, Holper T, Kubista E, Schreiber M, Manavi M, Bilban M, Czerwenka K.
Breast Cancer Res Treat. 2004 Aug;86(3):281-91.
cDNA arrays provide a powerful tool to identify gene expression patterns that are potentially associated with tumor invasion and metastasis. However, genes work at the protein level and, since the transcriptional activity of a gene does not necessarily reflect cellular protein expression, the identification and quantification of proteins is essential for the understanding of molecular events leading to malignant transformation. We have therefore employed a high-throughput protein microarray system which contains 378 well-characterized monoclonal antibodies in order to compare the gene expression pattern of malignant and adjacent normal breast tissue in a patient with primary breast cancer. Using this technique, we have identified a number of proteins that show increased expression levels in malignant breast tissues, such as casein kinase Iε, p53, annexin XI, CDC25C, eIF-4E and MAP kinase 7. The expression of other proteins, such as the multifunctional regulator 14-3-3ε, was found to be decreased in malignant breast tissue, whereas the majority of proteins remained unchanged when compared to the corresponding non-malignant samples. The protein expression pattern was confirmed by immunohistochemistry, in which antibodies against 8 representative proteins known to be involved in carcinogenesis were employed in paraffin-embedded normal and malignant tissue sections deriving from the same patient. In each case, the results obtained by IHC matched the data obtained by the antibody microarray system. Taken together, we have described for the first time a tumor cell-specific protein expression pattern by use of a novel commercially available antibody microarray system. We have thus demonstrated the feasibility of high-throughput protein arrays in the proteomic analysis of human breast tissue.
We hypothesize that the use of protein arrays will not only increase our understanding of the molecular events, but could prove useful in evaluating prognosis and in determining optimal antineoplastic therapy.
Liquid chromatography MALDI MS/MS for membrane proteome analysis.
Zhang N, Li N, Li L.
J Proteome Res. 2004 Jul-Aug;3(4):719-27.
Membrane proteins play critical roles in many biological functions and are often the molecular targets for drug discovery. However, their analysis presents a special challenge largely due to their highly hydrophobic nature. We present a surfactant-aided shotgun proteomics approach for membrane proteome analysis. In this approach, membrane proteins were solubilized and digested in the presence of SDS followed by newly developed auto-offline liquid chromatography/matrix-assisted laser desorption ionization (LC/MALDI) tandem MS analysis. Because of high tolerance of MALDI to SDS, one-dimensional (1D) LC separation can be combined with MALDI for direct analysis of protein digests containing SDS, without the need for extensive sample cleanup. In addition, the heated droplet interface used in LC/MALDI can work with high flow LC separations, allowing a relatively large amount of protein digest to be used for 1D LC/MALDI which facilitates the detection of low abundance proteins. The proteome identification results obtained by LC/MALDI are compared to the gel electrophoresis/MS method as well as the shotgun proteomics method using 2D LC/electrospray ionization MS. It is demonstrated that, while LC/MALDI provides more extensive proteome coverage compared to the other two methods, these three methods are complementary to each other and a combination of these methods should provide a more comprehensive membrane proteome analysis.
Activity-based probes for the proteomic profiling of metalloproteases.
Saghatelian A, Jessani N, Joseph A, Humphrey M, Cravatt BF.
Proc Natl Acad Sci U S A. 2004 Jul 6;101(27):10000-5.
Metalloproteases (MPs) are a large and diverse class of enzymes implicated in numerous physiological and pathological processes, including tissue remodeling, peptide hormone processing, and cancer. MPs are tightly regulated by multiple posttranslational mechanisms in vivo, hindering their functional analysis by conventional genomic and proteomic methods. Here we describe a general strategy for creating activity-based proteomic probes for MPs by coupling a zinc-chelating hydroxamate to a benzophenone photocrosslinker, which promote selective binding and modification of MP active sites, respectively. These probes labeled active MPs but not their zymogen or inhibitor-bound counterparts and were used to identify members of this enzyme class up-regulated in invasive cancer cells and to evaluate the selectivity of MP inhibitors in whole proteomes. Interestingly, the matrix metalloproteinase inhibitor GM6001 (ilomastat), which is currently in clinical development, was found to also target the neprilysin, aminopeptidase, and dipeptidylpeptidase clans of MPs. These results demonstrate that MPs can display overlapping inhibitor sensitivities despite lacking sequence homology and stress the need to evaluate MP inhibitors broadly across this enzyme class to develop agents with suitable target selectivities in vivo. Activity-based profiling offers a powerful means for conducting such screens, as this approach can be carried out directly in whole proteomes, thereby facilitating the discovery of disease-associated MPs concurrently with inhibitors that selectively target these proteins.
Immobilization of oriented protein molecules on poly(ethylene glycol)-coated Si(111).
Cha T, Guo A, Jun Y, Pei D, Zhu XY.
Proteomics. 2004 Jul;4(7):1965-76.
A high-density poly(ethylene glycol) (PEG)-coated Si(111) surface is used for the immobilization of polyhistidine-tagged protein molecules. This process features a number of properties that are highly desirable for protein microarray technology: (i) minimal nonspecific protein adsorption; (ii) highly uniform surface functionality; (iii) controlled protein orientation; and (iv) highly specific immobilization reaction without the need of protein purification. The high-density PEG-coated silicon surface is obtained from the reaction of a multi-arm PEG (mPEG) molecule with a chlorine terminated Si(111) surface to give a mPEG film with thickness of 5.2 nm. Four out of the eight arms on each immobilized mPEG molecule are accessible for linking to the chelating iminodiacetic acid (IDA) groups for the binding of Cu(2+) ions. The resulting Cu(2+)-IDA-mPEG Si(111) surface is shown to specifically bind 6x histidine-tagged protein molecules, including green fluorescent protein (GFP) and sulfotransferase (ST), but otherwise retains its inertness towards nonspecific protein adsorption. We demonstrate a particular advantage of this strategy: the possibility of protein immobilization without the need of prepurification. Surface concentrations of relevant chemical species are quantitatively characterized at each reaction step by X-ray photoelectron spectroscopy (XPS). This kind of quantitative analysis is essential in tuning surface concentration and chemical environment for optimal sensitivity in probe-target interaction.
Depth of proteome issues: a yeast isotope-coded affinity tag reagent study.
Parker KC, Patterson D, Williamson B, Marchese J, Graber A, He F, Jacobson A, Juhasz P, Martin S.
Mol Cell Proteomics. 2004 Jul;3(7):625-59.
As a test case for optimizing how to perform proteomics experiments, we chose a yeast model system in which the UPF1 gene, which encodes a protein involved in nonsense-mediated mRNA decay, was knocked out by homologous recombination. The results from five complete isotope-coded affinity tag (ICAT) experiments were combined, two using matrix-assisted laser desorption/ionization (MALDI) tandem mass spectrometry (MS/MS) and three using electrospray MS/MS. We sought to assess the reproducibility of peptide identification and to develop an informatics structure that characterizes the identification process as well as possible, especially with regard to tenuous identifications. The cleavable form of the ICAT reagent system was used for quantification. Most proteins did not change significantly in expression as a consequence of the upf1 knockout. As expected, the Upf1 protein itself was down-regulated, and there were reproducible increases in expression of proteins involved in arginine biosynthesis. Initially, it seemed that about 10% of the proteins had changed in expression level, but after more thorough examination of the data it turned out that most of these apparent changes could be explained by artifacts of quantification caused by overlapping heavy/light pairs. About 700 proteins altogether were identified with high confidence and quantified. Many peptides with chemical modifications were identified, as well as peptides with noncanonical tryptic termini. Nearly all of these modified peptides corresponded to the most abundant yeast proteins, and some would otherwise have been attributed to "single hit" proteins at low confidence. To improve our confidence in the identifications, in MALDI experiments, the parent masses for the peptides were calibrated against nearby components. Furthermore, five novel parameters reflecting different aspects of identification were collected for each spectrum alongside the Mascot score that was originally used.
The interrelationship between these scoring parameters and confidence in protein identification is discussed.
TANDEM: matching proteins with tandem mass spectra.
Craig R, Beavis RC.
Bioinformatics. 2004 Jun 12;20(9):1466-7.
SUMMARY: Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching. AVAILABILITY: The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.
Subtractive proteomic mapping of the endothelial surface in lung and solid tumours for tissue-specific therapy.
Oh P, Li Y, Yu J, Durr E, Krasinska KM, Carver LA, Testa JE, Schnitzer JE.
Nature. 2004 Jun 10;429(6992):629-35.
The molecular complexity of tissues and the inaccessibility of most cells within a tissue limit the discovery of key targets for tissue-specific delivery of therapeutic and imaging agents in vivo. Here, we describe a hypothesis-driven, systems biology approach to identifying a small subset of proteins induced at the tissue-blood interface that are inherently accessible to antibodies injected intravenously. We use subcellular fractionation, subtractive proteomics and bioinformatics to identify endothelial cell surface proteins exhibiting restricted tissue distribution and apparent tissue modulation. Expression profiling and gamma-scintigraphic imaging with antibodies establishes two of these proteins, aminopeptidase-P and annexin A1, as selective in vivo targets for antibodies in lungs and solid tumours, respectively. Radio-immunotherapy to annexin A1 destroys tumours and increases animal survival. This analytical strategy can map tissue- and disease-specific expression of endothelial cell surface proteins to uncover novel accessible targets useful for imaging and therapy.
A novel cell-free protein synthesis system.
Sitaraman K, Esposito D, Klarmann G, Le Grice SF, Hartley JL, Chatterjee DK.
J Biotechnol. 2004 Jun 10;110(3):257-63.
An efficient cell-free protein synthesis system has been developed using a novel energy-regenerating source. Using the new energy source, 3-phosphoglycerate (3-PGA), protein synthesis continues beyond 2 h. In contrast, the reaction rate slowed down considerably within 30-45 min using a conventional energy source, phosphoenolpyruvate (PEP), under identical reaction conditions. This improvement results in the production of twice the amount of protein obtained with PEP as an energy source. We have also shown that Gam protein of phage lambda, an inhibitor of RecBCD (ExoV), protects linear PCR DNA templates from degradation in vitro. Furthermore, addition of purified Gam protein to extracts of Escherichia coli BL21 improves protein synthesis from PCR templates to a level comparable to that from plasmid DNA templates. Therefore, the combination of these improvements should be amenable to rapid expression of proteins in a high-throughput manner for proteomics and structural genomics applications.
PCR primer selection tool optimized for high-throughput proteomics and structural genomics.
Canaves JM, Morse A, West B.
Biotechniques. 2004 Jun;36(6):1040-2.
(no abstract)
High-resolution serum proteomic features for ovarian cancer detection.
Conrads TP, Fusaro VA, Ross S, Johann D, Rajapakse V, Hitt BA, Steinberg SM, Kohn EC, Fishman DA, Whitely G, Barrett JC, Liotta LA, Petricoin EF 3rd, Veenstra TD.
Endocr Relat Cancer. 2004 Jun;11(2):163-78.
Serum proteomic pattern diagnostics is an emerging paradigm employing low-resolution mass spectrometry (MS) to generate a set of biomarker classifiers. In the present study, we utilized a well-controlled ovarian cancer serum study set to compare the sensitivity and specificity of serum proteomic diagnostic patterns acquired using a high-resolution versus a low-resolution MS platform. In blinded testing sets, the high-resolution mass spectral data contained multiple diagnostic signatures that were superior to the low-resolution spectra in terms of sensitivity and specificity (P<0.00001) throughout the range of modeling conditions. Four mass spectral feature set patterns acquired from data obtained exclusively with the high-resolution mass spectrometer were 100% specific and sensitive in their diagnosis of serum samples as being acquired from either unaffected patients or those suffering from ovarian cancer. Important to the future of proteomic pattern diagnostics is the ability to recognize inferior spectra statistically, so that those resulting from a specific process error are recognized prior to their potentially incorrect (and damaging) diagnosis. To meet this need, we have developed a series of quality-assurance and in-process control procedures to (a) globally evaluate sources of sample variability, (b) identify outlying mass spectra, and (c) develop quality-control release specifications. From these quality-assurance and control (QA/QC) specifications, we identified 32 mass spectra out of the total 248 that showed statistically significant differences from the norm. Hence, 216 of the initial 248 high-resolution mass spectra were determined to be of high quality and were remodeled by pattern-recognition analysis. Again, we obtained four mass spectral feature set patterns that also exhibited 100% sensitivity and specificity in blinded validation tests (68/68 cancer: including 18/18 stage I, and 43/43 healthy). 
We conclude that (a) the use of high-resolution MS yields superior classification patterns as compared with those obtained with lower resolution instrumentation; (b) although the process error that we discovered did not have a deleterious impact on the present results obtained from proteomic pattern analysis, the major source of spectral variability emanated from mass spectral acquisition, and not bias at the clinical collection site; (c) this variability can be reduced and monitored through the use of QA/QC statistical procedures; (d) multiple and distinct proteomic patterns, comprising low molecular weight biomarkers, detected by high-resolution MS achieve accuracies surpassing individual biomarkers, warranting validation in a large clinical study.
Multimembrane dot-blotting: a cost-effective tool for proteome analysis.
Galperin MM, Traicoff JL, Ramesh A, Freebern WJ, Haggerty CM, Hartmann DP, Emmert-Buck MR, Gardner K, Knezevic V.
Biotechniques. 2004 Jun;36(6):1046-51.
The molecular profiles of protein expression from hundreds of cell lysates can be determined in a high-throughput manner by using fluorescent bead technologies, enzyme-linked immunosorbent assays (ELISAs), and protein microarrays. Although powerful, these tools are costly and technically challenging and thus have limited accessibility for many research groups. We propose a modification of traditional dot blotting that increases throughput of this approach and provides a simple and cost-effective technique for profiling multiple samples. In contrast to traditional blotting that uses a single membrane, we introduce blotting onto a stack of novel, thin, sieve-like membranes. These membranes have a high affinity for binding proteins, but have a lower capacity of protein binding compared to traditional (nitrocellulose) membranes. We compare the linear binding capacity and variability of these novel membranes with nitrocellulose membranes. Also, we describe the use of these membranes in a multilayer dot blot format for profiling mitogen-mediated signal transduction pathways in T cells.
Activity-based probes for protein tyrosine phosphatases.
Kumar S, Zhou B, Liang F, Wang WQ, Huang Z, Zhang ZY.
Proc Natl Acad Sci U S A. 2004 May 25;101(21):7943-8.
Protein tyrosine phosphatases (PTPs) are involved in the regulation of many aspects of cellular activity including proliferation, differentiation, metabolism, migration, and survival. Given the large number and complexity of PTPs in cell signaling, new strategies are needed for the integrated analysis of PTPs in the whole proteome. Unfortunately, the activities of many PTPs are tightly regulated by posttranslational mechanisms, limiting the utility of standard genomics and proteomics methods for functional characterization of these enzymes. To facilitate the global analysis of PTPs, we designed and synthesized two activity-based probes that consist of alpha-bromobenzylphosphonate as a PTP-specific trapping device and a linker that connects the trapping device with a biotin tag for visualization and purification. We showed that these probes are active site-directed irreversible inactivators of PTPs and form a covalent adduct with PTPs involving the active site Cys residue. Additionally, we demonstrated that the probes are extremely specific toward PTPs while remaining inert to other proteins, including the whole proteome from Escherichia coli. Consequently, these activity-based PTP probes can be used to profile PTP activity in complex proteomes. The ability to interrogate the entire PTP family on the basis of changes in their activity should greatly accelerate both the assignment of PTP function and the identification of potential therapeutic targets.
A high-throughput approach for subcellular proteome: identification of rat liver proteins using subcellular fractionation coupled with two-dimensional liquid chromatography tandem mass spectrometry and bioinformatic analysis.
Jiang XS, Zhou H, Zhang L, Sheng QH, Li SJ, Li L, Hao P, Li YX, Xia QC, Wu JR, Zeng R.
Mol Cell Proteomics. 2004 May;3(5):441-55.
Four fractions from rat liver (a crude mitochondria (CM) and cytosol (C) fraction obtained with differential centrifugation, a purified mitochondrial (PM) fraction obtained with Nycodenz density gradient centrifugation, and a total liver (TL) fraction) were analyzed with two-dimensional liquid chromatography tandem mass spectrometry. A total of 564 rat proteins were identified and were bioinformatically annotated according to their physicochemical characteristics and functions. While most extreme alkaline ribosomal proteins were identified in the TL fraction, the C fraction mainly included neutral enzymes and the PM fraction was enriched in alkaline proteins and proteins with electron transfer activity or oxygen binding activity. Such characteristics were more apparent in proteins identified only in the TL, C, or PM fraction. The Swiss-Prot annotation and the bioinformatic prediction results proved that the C and PM fractions were enriched in cytoplasmic and mitochondrial proteins, respectively. The combined use of subcellular fractionation with two-dimensional liquid chromatography tandem mass spectrometry proved to be a high-throughput, sensitive, and effective analytical approach for subcellular proteomics research. Using such a strategy, we have constructed the largest proteome database to date for rat liver (564 rat proteins) and its cytosol (222 rat proteins) and mitochondrial fractions (227 rat proteins). Moreover, the 352 proteins with Swiss-Prot subcellular location annotation in the 564 identified proteins were used as an actual subcellular proteome dataset to evaluate the widely used bioinformatics tools such as PSORT, TargetP, TMHMM, and GRAVY.
Detection of prostate cancer using serum proteomics pattern in a histologically confirmed population.
Li J, White N, Zhang Z, Rosenzweig J, Mangold LA, Partin AW, Chan DW.
J Urol. 2004 May;171(5):1782-7.
PURPOSE: We retrospectively identified a panel of serum proteins that can discriminate between men with prostate cancer (clinically organ confined) and men with benign prostate disease. MATERIALS AND METHODS: A contemporary set of 345 men who had an archival serum sample available were included in this study. The cancer group consisted of 246 men who underwent radical prostatectomy at the Johns Hopkins Hospital between March 1999 and April 2001. The noncancer group included 99 men with no histological evidence of prostate cancer on biopsy between April 1997 and April 2001 at the same institution. Serum proteomics mass spectra of these patients were generated using ProteinChip arrays and a ProteinChip Biomarker System II surface enhanced laser desorption/ionization time of flight mass spectrometer (Ciphergen Biosystems, Inc., Fremont, California). The cases and controls were randomly split into training and testing groups by a stratified sampling procedure. A combination of bioinformatics tools including ProPeak (3Z Informatics, Charleston, South Carolina) was used to reveal the optimal panel of biomarkers for maximum separation of the prostate cancer and the benign prostate disease cohorts. RESULTS: A panel of 3 proteins (PC-1, PC-2 and PC-3) was selected using the training data. Performance of each of the protein markers and a linear regression derived composite index (PC-com3) were evaluated on the testing data. The area under the curve for prostate specific antigen (PSA), PC-1, PC-2, PC-3 and PC-com3 was 0.542, 0.585, 0.600, 0.636 and 0.643, respectively. Improvement of PC-com3 compared to PSA is observed at specificity range 30% to 80%. At a selected specificity of 45% the sensitivity of PC-com3 is 76%, significantly better than the PSA sensitivity of 57% (p <0.0001). CONCLUSIONS: Serum proteomics patterns may potentially aid in the early detection of prostate cancer.
Improving large-scale proteomics by clustering of mass spectrometry data.
Beer I, Barnea E, Ziv T, Admon A.
Proteomics. 2004 Apr;4(4):950-60.
Tandem mass spectrometry (MS/MS), coupled with liquid chromatography (LC), is a powerful tool for the analysis and comparison of complex protein and peptide mixtures. However, the extremely large amounts of data that result from the process are very complex and difficult to analyze. We show how the clustering of similar spectra from multiple LC-MS/MS runs can help in data management and improve the analysis of complex peptide mixtures. The major effect of spectrum clustering is the reduction of the huge amounts of data to a manageable size. As a result, analysis time is shorter and more data can be stored for further analysis. Furthermore, spectrum quality improvement allows the identification of more peptides with greater confidence, the comparison of complex peptide mixtures is facilitated, and the entire proteomics project is presented in concise form. Pep-Miner is an advanced software tool that implements these clustering-based applications. It proved useful in several comparative proteomics projects involving lung cancer cells and various other cell types. In one of these projects, Pep-Miner reduced 517 000 spectra to 20 900 clusters and identified 2518 peptides derived from 830 proteins. Clustering and identification lasted less than two hours on an IBM Thinkpad T23 computer (laptop). Pep-Miner's unique properties make it a very useful tool for large-scale shotgun proteomics projects.
Proteomic characterization of the interstitial fluid perfusing the breast tumor microenvironment: a novel resource for biomarker and therapeutic target discovery.
Celis JE, Gromov P, Cabezon T, Moreira JM, Ambartsumian N, Sandelin K, Rank F, Gromova I.
Mol Cell Proteomics. 2004 Apr;3(4):327-44.
Clinical cancer proteomics aims at the identification of markers for early detection and predictive purposes, as well as to provide novel targets for drug discovery and therapeutic intervention. Proteomics-based analysis of traditional sources of biomarkers, such as serum, plasma, or tissue lysates, has resulted in a wealth of information and the finding of several potential tumor biomarkers. However, many of these markers have shown limited usefulness in a clinical setting, underscoring the need for new clinically relevant sources. Here we present a novel and highly promising source of biomarkers, the tumor interstitial fluid (TIF) that perfuses the breast tumor microenvironment. We collected TIFs from small pieces of freshly dissected invasive breast carcinomas and analyzed them by two-dimensional polyacrylamide gel electrophoresis in combination with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, Western immunoblotting, as well as by cytokine-specific antibody arrays. This approach provided for the first time a snapshot of the protein components of the TIF, which we show consists of more than one thousand proteins--either secreted, shed by membrane vesicles, or externalized due to cell death--produced by the complex network of cell types that make up the tumor microenvironment. So far, we have identified 267 primary translation products including, but not limited to, proteins involved in cell proliferation, invasion, angiogenesis, metastasis, inflammation, protein synthesis, energy metabolism, oxidative stress, the actin cytoskeleton assembly, protein folding, and transport. As expected, the TIF contained several classical serum proteins. Considering that the protein composition of the TIF reflects the physiological and pathological state of the tissue, it should provide a new and potentially rich resource for diagnostic biomarker discovery and for identifying more selective targets for therapeutic intervention.
Accurate qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma using laser capture microdissection coupled with isotope-coded affinity tag and two-dimensional liquid chromatography mass spectrometry.
Li C, Hong Y, Tan YX, Zhou H, Ai JH, Li SJ, Zhang L, Xia QC, Wu JR, Wang HY, Zeng R.
Mol Cell Proteomics. 2004 Apr;3(4):399-409.
Laser capture microdissection (LCM) is a powerful tool that enables the isolation of specific cell types from tissue sections, overcoming the problem of tissue heterogeneity and contamination. This study combined LCM with isotope-coded affinity tag (ICAT) technology and two-dimensional liquid chromatography to investigate the qualitative and quantitative proteomes of hepatocellular carcinoma (HCC). The effects of three different histochemical stains on tissue sections were compared, and toluidine blue proved the most suitable stain for LCM followed by proteomic analysis. The solubilized proteins from microdissected HCC and non-HCC hepatocytes were qualitatively and quantitatively analyzed with two-dimensional liquid chromatography tandem mass spectrometry (2D-LC-MS/MS) alone or coupled with cleavable ICAT labeling technology. A total of 644 proteins were qualitatively identified, and 261 proteins were unambiguously quantitated. These results show that the clinical proteomic method using LCM coupled with ICAT and 2D-LC-MS/MS can carry out not only large-scale but also accurate qualitative and quantitative analysis.
Mining disease susceptibility genes through SNP analyses and expression profiling using MALDI-TOF mass spectrometry.
Tang K, Oeth P, Kammerer S, Denissenko MF, Ekblom J, Jurinke C, van den Boom D, Braun A, Cantor CR.
J Proteome Res. 2004 Mar-Apr;3(2):218-27.
To find genes that underlie disease susceptibilities, genome-wide single nucleotide polymorphisms (SNPs) have been analyzed using high-throughput matrix assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectrometry (MS). As a proof-of-concept for this approach, gene regions have been identified that were previously associated by others with certain diseases or traits. On the same technology platform, accurate and absolute transcriptional profiling can be performed and applied to allele expression analysis. Here, we provide a brief review of the technology and its applications to disease gene discovery.
High-throughput comprehensive analysis of human plasma proteins: a step toward population proteomics.
Nedelkov D, Tubbs KA, Niederkofler EE, Kiernan UA, Nelson RW.
Anal Chem. 2004 Mar 15;76(6):1733-7.
A high-throughput (HT) comprehensive analysis approach was developed for assaying proteins directly from human plasma. Proteins were selectively retrieved, by utilizing antibodies immobilized within affinity pipet tips, and eluted onto enzymatically active mass spectrometer targets for subsequent digestion and structural characterization. Several parameters, including uniform parallel protein elution from 96 affinity pipet tips, proper buffering for on-target digestion, termination of the digestion, and MALDI matrix (re)introduction, were evaluated and optimized. The approach was validated via parallel, high-throughput analysis of transthyretin (TTR) and transferrin (TRFE) from 96 identical plasma samples. The 96 parallel analyses for each protein were completed in less than 90 min, measured from protein extraction to insertion in the mass spectrometer. Virtually identical mass spectra were obtained from the 96 TTR analyses, characterized by the presence of 14 tryptic fragments that allowed TTR sequence mapping with 100% coverage. Database search returned TTR as the best match for all 96 data sets. In regard to the TRFE analyses, database searching using data from the 96 spectra returned TRFE as the best match for all but 1 of the spectra. TRFE was mapped with 47-69% sequence coverage, with gaps in the sequence coverage corresponding to the carbohydrate-containing peptide fragments and large and small trypsin fragments that fell outside the window of mass analysis. Overall, the combined high-throughput affinity capture-protein digestion approach showed high reproducibility and speed and yielded an exceptional level of protein characterization, suggesting its use in future population proteomics endeavors.
A novel proteomic screen for peptide-protein interactions.
Schulze WX, Mann M.
J Biol Chem. 2004 Mar 12;279(11):10756-64.
Regulated interactions between short, unstructured amino acid sequences and modular protein domains are central to cell signaling. Here we use synthetic peptides in "active" (e.g. phosphorylated) and "control" (e.g. non-phosphorylated) forms as baits in affinity pull-down experiments to determine such interactions by quantitative proteomics. Stable isotope labeling by amino acids in cell culture distinguishes specific binders directly by the isotope ratios determined by mass spectrometry (Blagoev, B., Kratchmarova, I., Ong, S.-E., Nielsen, M., Foster, L. J., and Mann, M. (2003) Nat. Biotechnol. 21, 315-318). A tyrosine-phosphorylated peptide of the epidermal growth factor receptor specifically retrieved the Src homology domain (SH) 2- and SH3 domain-containing adapter protein Grb2. A proline-rich sequence of Son of Sevenless also specifically bound Grb2, demonstrating that the screen maintains specificity with low affinity interactions. The proline-rich Sos peptide retrieved only SH3 domain containing proteins as specific binding partners. Two of these, Pacsin 3 and Sorting Nexin 9, were confirmed by immunoprecipitation. Our data are consistent with a change in the role of Sos from Ras-dependent signaling to actin remodeling/endocytic signaling events by a proline-SH3 domain switch.
Profiling the activity of G proteins in patient-derived tissues by rapid affinity-capture of signal transduction proteins (GRASP).
Berman DM, Shih IeM, Burke LA, Veenstra TD, Zhao Y, Conrads TP, Kwon SW, Hoang V, Yu LR, Zhou M, Kurman RJ, Petricoin EF, Liotta LA.
Proteomics. 2004 Mar;4(3):812-8.
The next phase in molecular medicine will require the ability to identify signal transduction events inside a cell, in the biologic context of the disease-host interface and at a given point in time. New technologies are needed to profile the activity of these signaling pathways in patient tissue rather than cultured cell lines since the tumor-host microenvironment influences the cellular proteome. We introduce such a technology, rapid affinity capture of signaling proteins (GRASP), to investigate the activity of signaling pathways from patient-derived carcinomas and benign epithelial surfaces and apply it to studying important signaling events in ovarian carcinoma. During the progression from benign ovarian epithelium to invasive carcinoma, there is loss of repression of Rho A as evidenced by its dissociation from its inhibitor, Rho Guanine Nucleotide Dissociation Inhibitor (RhoGDI). GRASP is more informative than simply profiling transcript or protein levels. Furthermore, GRASP coupled with mass spectrometry allowed us to identify a protein-binding partner of RhoGDI, demonstrating the power of this technology in the discovery of potentially novel protein-protein interactions. GRASP represents an advance in the field of proteomics as it detects protein interactions present in cells as they exist in their native tissue microenvironment.
Effect of collagen substrates on proteomic modulation of breast cancer cells.
Fontana S, Pucci-Minafra I, Becchi M, Freyria AM, Minafra S.
Proteomics. 2004 Mar;4(3):849-60.
We have previously described the occurrence, in breast and colon cancer extra-cellular matrix, of an oncofoetal form of collagen, OF/LB, able to induce an increase in cell proliferation and motility in the breast cancer cell line 8701-BC. It also caused an increased amount of type V collagen which appears to exert an anti-proliferative effect on the same cells. The aim of the present study was to investigate, at the proteomic level, the effect of OF/LB and type V collagens used as substrates for neoplastic cell growth. Due to the complexity of a whole proteomic profile, a subset of significant protein classes was used to assess variations in protein expression levels. For this study we adopted a multivariate statistical procedure that allows a global view of the variations induced by different growth conditions, when several variables have to be analyzed simultaneously. The results of this research indicate that in response to different growth substrates, chaperons and heat shock proteins contributed most to the dissimilarity in levels of expression of the selected protein spots. Moreover, we observed that different isoforms of the same protein showed independent levels of expression from one another in relation to the different collagen treatments.
Proteome analysis of human colon cancer by two-dimensional difference gel electrophoresis and mass spectrometry.
Friedman DB, Hill S, Keller JW, Merchant NB, Levy SE, Coffey RJ, Caprioli RM.
Proteomics. 2004 Mar;4(3):793-811.
Two-dimensional difference gel electrophoresis (2-D DIGE) coupled with mass spectrometry (MS) was used to investigate tumor-specific changes in the proteome of human colorectal cancers and adjacent normal mucosa. For each of six patients with different stages of colon cancer, Cy5-labeled proteins isolated from tumor tissue were combined with Cy3-labeled proteins isolated from neighboring normal mucosa and separated on the same 2-D gel along with a Cy2-labeled mixture of all 12 normal/tumor samples as an internal standard. Over 1500 protein spot-features were analyzed in each paired normal/tumor comparison, and using DIGE technology with the mixed-sample internal standard, statistically significant quantitative comparisons of each protein abundance change could be made across multiple samples simultaneously without interference due to gel-to-gel variation. Matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) and tandem (TOF/TOF) MS provided sensitive and accurate mass spectral data for database interrogation, resulting in the identification of 52 unique proteins (including redundancies due to proteolysis and post-translationally modified isoforms) that were changing in abundance across the cohort. Without the benefit of the Cy2-labeled 12 sample mixture internal standard, 42 of these proteins would have been overlooked due to the large degree of variation inherent between normal and tumor samples.
Activity-based ubiquitin-specific protease (USP) profiling of virus-infected and malignant human cells.
Ovaa H, Kessler BM, Rolen U, Galardy PJ, Ploegh HL, Masucci MG.
Proc Natl Acad Sci U S A. 2004 Feb 24;101(8):2253-8.
The family of ubiquitin (Ub)-specific proteases (USP) removes Ub from Ub conjugates and regulates a variety of cellular processes. The human genome contains many putative USP-encoding genes, but little is known about USP tissue distribution, pattern of expression, activity, and substrate specificity. We have used a chemistry-based functional proteomics approach to identify active USPs in normal, virus-infected, and tumor-derived human cells. Depending on tissue origin and stage of activation/differentiation, different USP activity profiles were revealed. The activity of specific USPs, including USP5, -7, -9, -13, -15, and -22, was up-regulated by mitogen activation or virus infection in normal T and B lymphocytes. UCH-L1 was highly expressed in tumor cell lines of epithelial and hematopoietic cell origin but was not detected in freshly isolated and mitogen-activated cells. Up-regulation of this USP was a late event in the establishment of Epstein-Barr virus-immortalized lymphoblastoid cell lines and correlated with enhanced proliferation, suggesting a possible role in growth transformation.
Proteomic evaluation of core biopsy specimens from breast lesions.
Bisca A, D'Ambrosio C, Scaloni A, Puglisi F, Aprile G, Piga A, Zuiani C, Bazzocchi M, Di Loreto C, Paron I, Tell G, Damante G.
Cancer Lett. 2004 Feb 10;204(1):79-86.
Analysis of tumour samples by a proteomic technology, which combines two-dimensional gel electrophoresis and mass spectrometry analysis, is a promising approach for molecular characterization of cancer. Proteomic analysis of neoplasms is usually performed on surgical material. The possibility to perform proteomic analysis on pre-operative samples might be useful for diagnostic purposes or for determination of tumour sensitivity to therapy. In this study, we report how tissues from core biopsy of breast lesions can be routinely used to obtain accurate protein expression profiles by proteomic analysis. Protein profiles from fibroadenomas were compared to those from ductal infiltrating carcinomas. By using mass spectrometry, identification of proteins overexpressed in carcinomas with respect to fibroadenomas was obtained. Thus, our study provides a methodology to perform proteomic analysis on pre-operative samples of breast lesions.
Method for analyzing signaling networks in complex cellular systems.
Plavec I, Sirenko O, Privat S, Wang Y, Dajee M, Melrose J, Nakao B, Hytopoulos E, Berg EL, Butcher EC.
Proc Natl Acad Sci U S A. 2004 Feb 3;101(5):1223-8.
Now that the human genome has been sequenced, the challenge of assigning function to human genes has become acute. Existing approaches using microarrays or proteomics frequently generate very large volumes of data not directly related to biological function, making interpretation difficult. Here, we describe a technique for integrative systems biology in which: (i) primary cells are cultured under biologically meaningful conditions; (ii) a limited number of biologically meaningful readouts are measured; and (iii) the results obtained under several different conditions are combined for analysis. Studies of human endothelial cells overexpressing different signaling molecules under multiple inflammatory conditions show that this system can capture a remarkable range of functions by a relatively small number of simple measurements. In particular, measurement of seven different protein levels by ELISA under four different conditions is capable of reconstructing pathway associations of 25 different proteins representing four known signaling pathways, implicating additional participants in the NF-kappaB or RAS/mitogen-activated protein kinase pathways and defining additional interactions between these pathways.
An automated high performance capillary liquid chromatography-Fourier transform ion cyclotron resonance mass spectrometer for high-throughput proteomics.
Belov ME, Anderson GA, Wingerd MA, Udseth HR, Tang K, Prior DC, Swanson KR, Buschbach MA, Strittmatter EF, Moore RJ, Smith RD.
J Am Soc Mass Spectrom. 2004 Feb;15(2):212-32.
We describe a fully automated high performance liquid chromatography 9.4 tesla Fourier transform ion cyclotron resonance (FTICR) mass spectrometer system designed for proteomics research. A synergistic suite of ion introduction and manipulation technologies were developed and integrated as a high-performance front-end to a commercial Bruker Daltonics FTICR instrument. The developments incorporated included a dual-ESI-emitter ion source; a dual-channel electrodynamic ion funnel; tandem quadrupoles for collisional cooling and focusing, ion selection, and ion accumulation, and served to significantly improve the sensitivity, dynamic range, and mass measurement accuracy of the mass spectrometer. In addition, a novel technique for accumulating ions in the ICR cell was developed that improved both resolution and mass measurement accuracy. A new calibration methodology is also described where calibrant ions are introduced and controlled via a separate channel of the dual-channel ion funnel, allowing calibrant species to be introduced to sample spectra on a real-time basis, if needed. We also report on overall instrument automation developments that facilitate high-throughput and unattended operation. These included an automated version of the previously reported very high resolution, high pressure reversed phase gradient capillary liquid chromatography (LC) system as the separations component. A commercial autosampler was integrated to facilitate 24 h/day operation. Unattended operation of the instrument revealed exceptional overall performance: Reproducibility (1-5% deviation in uncorrected elution times), repeatability (<20% deviation in detected abundances for more abundant peptides from the same aliquot analyzed a few weeks apart), and robustness (high-throughput operation for 5 months without significant downtime). When combined with modulated-ion-energy gated trapping, the dynamic calibration of FTICR mass spectra provided decreased mass measurement errors for peptide identifications in conjunction with high resolution capillary LC separations over a dynamic range of peptide peak intensities for each spectrum of 10(3), and >10(5) for peptide abundances in the overall separation.
Two-dimensional Blue native/SDS gel electrophoresis of multi-protein complexes from whole cellular lysates: a proteomics approach.
Camacho-Carvajal MM, Wollscheid B, Aebersold R, Steimle V, Schamel WW.
Mol Cell Proteomics. 2004 Feb;3(2):176-82.
Identification and characterization of multi-protein complexes is an important step toward an integrative view of protein-protein interaction networks that determine protein function and cell behavior. The limiting factor for identifying protein complexes is the method for their separation. Blue native PAGE (BN-PAGE) permits a high-resolution separation of multi-protein complexes under native conditions. To date, BN-PAGE has only been applicable to purified material. Here, we show that dialysis permits the analysis of multi-protein complexes of whole cellular lysates by BN-PAGE. We visualized different multi-protein complexes by immunoblotting including forms of the eukaryotic proteasome. Complex dynamics after gamma interferon stimulation of cells was studied, and an antibody shift assay was used to detect protein-protein interactions in BN-PAGE. Furthermore, we identified defined protein complexes of various proteins including the tumor suppressor p53 and c-Myc. Finally, we identified multi-protein complexes via mass spectrometry, showing that the method has a wide potential for functional proteomics.
Intensity-based protein identification by machine learning from a library of tandem mass spectra.
Elias JE, Gibbons FD, King OD, Roth FP, Gygi SP.
Nat Biotechnol. 2004 Feb;22(2):214-9.
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. Widely used algorithms do not fully exploit the intensity patterns present in mass spectra. Here, we demonstrate that intensity pattern modeling improves peptide and protein identification from MS/MS spectra. We modeled fragment ion intensities using a machine-learning approach that estimates the likelihood of observed intensities given peptide and fragment attributes. From 1,000,000 spectra, we chose 27,000 with high-quality, nonredundant matches as training data. Using the same 27,000 spectra, intensity was similarly modeled with mismatched peptides. We used these two probabilistic models to compute the relative likelihood of an observed spectrum given that a candidate peptide is matched or mismatched. We used a 'decoy' proteome approach to estimate incorrect match frequency, and demonstrated that an intensity-based method reduces peptide identification error by 50-96% without any loss in sensitivity.
The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data.
Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C, Roechert B, Poux S, Jung E, Mersch H, Kersey P, Lappe M, Li Y, Zeng R, Rana D, Nikolski M, Husi H, Brun C, Shanker K, Grant SG, Sander C, Bork P, Zhu W, Pandey A, Brazma A, Jacq B, Vidal M, Sherman D, Legrain P, Cesareni G, Xenarios I, Eisenberg D, Steipe B, Hogue C, Apweiler R.
Nat Biotechnol. 2004 Feb;22(2):177-83.
A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).
SELDI-TOF-based serum proteomic pattern diagnostics for early detection of cancer.
Petricoin EF, Liotta LA.
Curr Opin Biotechnol. 2004 Feb;15(1):24-30.
Proteomics is more than just generating lists of proteins that increase or decrease in expression as a cause or consequence of pathology. The goal should be to characterize the information flow through the intercellular protein circuitry that communicates with the extracellular microenvironment and then ultimately to the serum/plasma macroenvironment. The nature of this information can be a cause, or a consequence, of disease and toxicity-based processes. Serum proteomic pattern diagnostics is a new type of proteomic platform in which patterns of proteomic signatures from high dimensional mass spectrometry data are used as a diagnostic classifier. This approach has recently shown tremendous promise in the detection of early-stage cancers. The biomarkers found by SELDI-TOF-based pattern recognition analysis are mostly low molecular weight fragments produced at the specific tumor microenvironment.
Nanoscale proteomics.
Shen Y, Tolic N, Masselon C, Pasa-Tolic L, Camp DG 2nd, Lipton MS, Anderson GA, Smith RD.
Anal Bioanal Chem. 2004 Feb;378(4):1037-45.
Efforts to develop a liquid chromatography (LC)/mass spectrometry (MS) technology for ultra-sensitive proteomics studies (i.e., nanoscale proteomics) are described. The approach combines high-efficiency nanoscale LC (separation peak capacity of approximately 10(3); 15-microm-i.d. packed capillaries with flow rates of 20 nL min(-1), the optimal separation linear velocity) with advanced MS, including high-sensitivity and high-resolution Fourier transform ion cyclotron resonance MS, to perform both single-stage MS and tandem MS (MS/MS) proteomic analyses. The technology enables broad protein identification from nanogram-size proteomics samples and allows the characterization of more abundant proteins from sub-picogram-size samples. Protein identification in such studies using MS is demonstrated from <75 zeptomole of a protein. The average proteome measurement throughput is approximately 50 proteins h(-1) using MS/MS during separations, presently requiring approximately 3 h sample(-1). Greater throughput (approximately 300 proteins h(-1)) and improved detection limits providing more comprehensive proteome coverage can be obtained by using the "accurate mass and time" tag approach developed in our laboratory. This approach provides a dynamic range of at least 10(6) for protein relative abundances and an improved basis for quantitation. These capabilities lay the foundation for studies from single or limited numbers of cells.
Factors that affect ion trap data-dependent MS/MS in proteomics.
Wenner BR, Lynn BC.
J Am Soc Mass Spectrom. 2004 Feb;15(2):150-7.
Quadrupole ion trap scanning parameters for performing bottom-up proteomics in a data-dependent fashion were evaluated on a Finnigan LCQ Deca mass spectrometer. Evaluation of parameters such as the number of averaged full scans, the number of averaged MS/MS scans, and ion injection times were necessary for acquiring high quality MS/MS spectra that yield favorable b and y ion coverage and high correlation to proteins using database searching algorithms. In this study, we demonstrated how the duty cycle of the mass spectrometer affects the number of peptides that can be successfully identified by SEQUEST using a model system of tryptic BSA peptides to mimic a typical complex mixture associated with bottom-up proteomics. The number of averaged scans and the duration of ion accumulation in the trap had a significant effect on the quality of acquired MS/MS spectra. For example, by increasing the ion injection time from 500 ms to 600 ms, peptide HLVDEPQNLIK improved from being improperly identified to being correctly identified with a SEQUEST cross-correlation score of 3.60. As a result of these experiments, we have devised the following set of ion trap parameters for performing bottom-up proteomics analysis in our laboratory: Three averaged full scans, five averaged MS/MS scans, and a maximum ion injection time of 600 ms.
Proteomics approach to identifying ATP-covalently modified proteins.
Besant PG, Lasker MV, Bui CD, Tan E, Attwood PV, Turck CW.
J Proteome Res. 2004 Jan-Feb;3(1):120-5.
This study aims to investigate functionally similar proteins based on their capacity to remain bound to ATP under stringent resolving conditions. Using two-dimensional gel electrophoresis and capillary liquid chromatography on-line mass spectrometry, we have identified several mammalian and E. coli proteins that appear to covalently bind ATP. To validate this approach, we obtained commercially purified forms of proteins identified from two-dimensional protein maps and tested their capacity to bind alpha 32P phosphate labeled ATP. This proteomics approach provides an initial screening method of identifying functionally similar proteins for further scrutiny by a more traditional analysis.
Gel based isoelectric focusing of peptides and the utility of isoelectric point in protein identification.
Cargile BJ, Bundy JL, Freeman TW, Stephenson JL Jr.
J Proteome Res. 2004 Jan-Feb;3(1):112-9.
Here we present the theoretical and experimental evaluation of peptide isoelectric point as a method to aid in the identification of peptides from complex mixtures. Predicted pI values were found to match closely the experimentally obtained data, resulting in the development of a unique filter that lowers the effective false positive rate for peptide identification. Due to the reduction of the false positive rate, the cross-correlation parameters Xcorr and deltaCn from the SEQUEST program can be lowered resulting in 25% more peptide identifications. This approach was successfully applied to analysis of the soluble fraction of the E. coli proteome, where 417 proteins were identified from 1022 peptides using just 20 microg of material.
Multidimensional proteome analysis of human mammary epithelial cells.
Jacobs JM, Mottaz HM, Yu LR, Anderson DJ, Moore RJ, Chen WN, Auberry KJ, Strittmatter EF, Monroe ME, Thrall BD, Camp DG 2nd, Smith RD.
J Proteome Res. 2004 Jan-Feb;3(1):68-75.
Recent multidimensional liquid chromatography MS/MS studies have contributed to the identification of large numbers of expressed proteins for numerous species. The present study couples size exclusion chromatography of intact proteins with the separation of tryptically digested peptides using a combination of strong cation exchange and high resolution, reversed phase capillary chromatography to identify proteins extracted from human mammary epithelial cells (HMECs). In addition to conventional conservative criteria for protein identifications, the confidence levels were additionally increased through the use of peptide normalized elution times (NET) for the liquid chromatographic separation step. The combined approach resulted in a total of 5838 unique peptides identified covering 1574 different proteins with an estimated 4% gene coverage of the human genome, as annotated by the National Center for Biotechnology Information (NCBI). This database provides a baseline for comparison against variations in other genetically and environmentally perturbed systems. Proteins identified were categorized based upon intracellular location and biological process with the identification of numerous receptors, regulatory proteins, and extracellular proteins, demonstrating the usefulness of this application in the global analysis of human cells for future comparative studies.
A simple solid phase mass tagging approach for quantitative proteomics.
Shi Y, Xiang R, Crawford JK, Colangelo CM, Horvath C, Wilkins JA.
J Proteome Res. 2004 Jan-Feb;3(1):104-11.
New mass-tagging reagents for quantitative proteomics measurements have been designed using solid phase peptide synthesis technology. The solid phase mass tags have been used to accurately measure the relative amounts of cysteine-containing peptides in model peptide mixtures as well as in mixtures of tryptic digests in the femtomol range. Measurements were made using both matrix-assisted laser desorption ionization-time-of-flight mass spectrometry (MALDI-TOF MS) and online reversed-phase capillary liquid chromatography coupled through a nanoelectrospray interface to an ion trap mass spectrometer (capillary LC/ESI-MS). Results of mass-tagging experiments obtained from these two mass spectrometry techniques and their relative advantages and disadvantages for identification and quantitation of mass tagged peptides are compared. These reagents provide a simple, rapid and cost-effective alternative to currently available mass tagging technologies.
Multiple site-directed mutagenesis of more than 10 sites simultaneously and in a single round.
Seyfang A, Jin JH.
Anal Biochem. 2004 Jan 15;324(2):285-91.
Site-directed mutagenesis is a powerful tool to explore the structure-function relationship of proteins, but most traditional methods rely on the mutation of only one site at a time and efficiencies drop drastically when more than three sites are targeted simultaneously. Many applications in functional proteomics and genetic engineering, including codon optimization for heterologous expression, generation of cysteine-less proteins, and alanine-scanning mutagenesis, would greatly benefit from a multiple-site mutagenesis method with high efficiency. Here we describe the development of a simple and rapid method for site-directed mutagenesis of more than 10 sites simultaneously with up to 100% efficiency. The method uses two terminal tailed primers with a unique 25-nucleotide tail each that are simultaneously annealed to template DNA together with the set of mutagenic primers in between. Following synthesis of the mutant strand by primer extension and ligation with T4 DNA polymerase and ligase, the unique mutant strand-specific tails of the terminal primers are used as anchors to specifically amplify the mutant strand by high-fidelity polymerase chain reaction. We have employed this novel method to mutate simultaneously all 9 and 11 CUG leucine codons of the Hyg and Neo resistance genes, respectively, to the Candida albicans-friendly UUG leucine codon at 100% efficiency.
TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics.
Yu H, Zhu X, Greenbaum D, Karro J, Gerstein M.
Nucleic Acids Res. 2004 Jan 14;32(1):328-37.
Biological networks are a topic of great current interest, particularly with the publication of a number of large genome-wide interaction datasets. They are globally characterized by a variety of graph-theoretic statistics, such as the degree distribution, clustering coefficient, characteristic path length and diameter. Moreover, real protein networks are quite complex and can often be divided into many sub-networks through systematic selection of different nodes and edges. For instance, proteins can be sub-divided by expression level, length, amino-acid composition, solubility, secondary structure and function. A challenging research question is to compare the topologies of sub-networks, looking for global differences associated with different types of proteins. TopNet is an automated web tool designed to address this question, calculating and comparing topological characteristics for different sub-networks derived from any given protein network. It provides reasonable solutions to the calculation of network statistics for sub-networks embedded within a larger network and gives simplified views of a sub-network of interest, allowing one to navigate through it. After constructing TopNet, we applied it to the interaction networks and protein classes currently available for yeast. We were able to find a number of potential biological correlations. In particular, we found that soluble proteins had more interactions than membrane proteins. Moreover, amongst soluble proteins, those that were highly expressed, had many polar amino acids, and had many alpha helices, tended to have the most interaction partners. Interestingly, TopNet also turned up some systematic biases in the current yeast interaction network: on average, proteins with a known functional classification had many more interaction partners than those without. This phenomenon may reflect the incompleteness of the experimentally determined yeast interaction network.
UniProt: the Universal Protein knowledgebase.
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D115-9.
To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.
GELBANK: a database of annotated two-dimensional gel electrophoresis patterns of biological systems with completed genomes.
Babnigg G, Giometti CS.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D582-5.
GELBANK is a publicly available database of two-dimensional gel electrophoresis (2DE) gel patterns of proteomes from organisms with known genome information (available at http://gelbank.anl.gov and ftp://bioinformatics.anl.gov/gelbank/). Currently it includes 131 completed, mostly microbial proteomes available from the National Center for Biotechnology Information. A web interface allows the upload of 2D gel patterns and their annotation for registered users. The images are organized by species, tissue type, separation method, sample type and staining method. The database can be queried based on protein or 2DE-pattern attributes. A web interface allows registered users to assign molecular weight and pH gradient profiles to their own 2D gel patterns as well as to link protein identifications to a given spot on the pattern. The website presents all of the submitted 2D gel patterns where the end-user can dynamically display the images or parts of images along with molecular weight, pH profile information and linked protein identification. A collection of images can be selected for the creation of animations from which the user can select sub-regions of interest and unlimited 2D gel patterns for visualization. The website currently presents 233 identifications for 81 gel patterns for Homo sapiens, Methanococcus jannaschii, Pyrococcus furiosus, Shewanella oneidensis, Escherichia coli and Deinococcus radiodurans.
TrSDB: a proteome database of transcription factors.
Hermoso A, Aguilar D, Aviles FX, Querol E.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D171-3.
TrSDB (TranScout Database; http://ibb.uab.es/trsdb) is a proteome database of eukaryotic transcription factors based upon predicted motifs by TranScout and data sources such as InterPro and Gene Ontology Annotation. Nine eukaryotic proteomes are included in the current version. Extensive and diverse information for each database entry, different analyses considering TranScout classification and similarity relationships are offered for research on transcription factors or gene expression.
SMART 4.0: towards genomic data integration.
Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P.
Nucleic Acids Res.
SMART (Simple Modular Architecture Research Tool) is a web tool (http://smart.embl.de/) for the identification and annotation of protein domains, and provides a platform for the comparative study of complex domain architectures in genes and proteins. The January 2004 release of SMART contains 685 protein domains. New developments in SMART are centred on the integration of data from completed metazoan genomes. SMART now uses predicted proteins from complete genomes in its source sequence databases, and integrates these with predictions of orthology. New visualization tools have been developed to allow analysis of gene intron-exon structure within the context of protein domain structure, and to align these displays to provide schematic comparisons of orthologous genes, or multiple transcripts from the same gene. Other improvements include the ability to query SMART by Gene Ontology terms, improved structure database searching and batch retrieval of multiple entries.
MIPS: analysis and annotation of proteins from whole genomes.
Mewes HW, Amid C, Arnold R, Frishman D, Guldener U, Mannhaupt G, Munsterkotter M, Pagel P, Strack N, Stumpflen V, Warfsmann J, Ruepp A.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D41-4.
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Human protein reference database as a discovery resource for proteomics.
Peri S, Navarro JD, Kristiansen TZ, Amanchy R, Surendranath V, Muthusamy B, Gandhi TK, Chandrika KN, Deshpande N, Suresh S, Rashmi BP, Shanker K, Padma N, Niranjan V, Harsha HC, Talreja N, Vrushabendra BM, Ramya MA, Yatish AJ, Joy M, Shivashankar HN, Kavitha MP, Menezes M, Choudhury DR, Ghosh N, Saravana R, Chandran S, Mohan S, Jonnalagadda CK, Prasad CK, Kumar-Sinha C, Deshpande KS, Pandey A.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D497-501.
The rapid pace at which genomic and proteomic data is being generated necessitates the development of tools and resources for managing data that allow integration of information from disparate sources. The Human Protein Reference Database (http://www.hprd.org) is a web-based resource based on open source technologies for protein information about several aspects of human proteins including protein-protein interactions, post-translational modifications, enzyme-substrate relationships and disease associations. This information was derived manually by a critical reading of the published literature by expert biologists and through bioinformatics analyses of the protein sequence. This database will assist in biomedical discoveries by serving as a resource of genomic and proteomic information and providing an integrated view of sequence, structure, function and protein networks in health and disease.
Dissecting the human spliceosome through bioinformatics and proteomics approaches.
Peres Lopes GM, de Souza SJ.
J Bioinform Comput Biol. 2004 Jan;1(4):743-50.
The precise excision of introns from mRNAs is executed by the spliceosome, a cellular machinery composed by five small nuclear RNAs and hundreds of proteins. In the last few years, several groups have used proteomics and computational biology tools to characterize the components of the human spliceosome. These reports have identified basically all known splicing factors and several new proteins. The composition of the human spliceosome confirms the link between splicing and other steps in gene expression. Here we comment on these reports and discuss the perspectives for the coming years.
Prolinks: a database of protein functional linkages derived from coevolution.
Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D.
Genome Biol. 2004;5(5):R35.
The advent of whole-genome sequencing has led to methods that infer protein function and linkages. We have combined four such algorithms (phylogenetic profile, Rosetta Stone, gene neighbor and gene cluster) in a single database--Prolinks--that spans 83 organisms and includes 10 million high-confidence links. The Proteome Navigator tool allows users to browse predicted linkage networks interactively, providing accompanying annotation from public databases. The Prolinks database and the Proteome Navigator tool are available for use online at http://dip.doe-mbi.ucla.edu/pronav.
Improved method for peak picking in matrix-assisted laser desorption/ionization time-of-flight mass spectrometry.
Kempka M, Sjodahl J, Bjork A, Roeraade J.
Rapid Commun Mass Spectrom. 2004;18(11):1208-12.
A method for peak picking for matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOFMS) is described. The method is based on the assumption that two sets of ions are formed during the ionization stage, which have Gaussian distributions but different velocity profiles. This gives rise to a certain degree of peak skewness. Our algorithm deconvolutes the peak and utilizes the fast velocity, bulk ion distribution for peak picking. Evaluation of the performance of the new method was conducted using peptide peaks from a bovine serum albumin (BSA) digest, and compared with the commercial peak-picking algorithms Centroid and SNAP. When using the new two-Gaussian algorithm, for strong signals the mass accuracy was equal to or marginally better than the results obtained from the commercial algorithms. However, for weak, distorted peaks, considerable improvement in both mass accuracy and precision was obtained. This improvement should be particularly useful in proteomics, where a lack of signal strength is often encountered when dealing with weakly expressed proteins. Finally, since the new peak-picking method uses information from the entire signal, no adjustments of parameters related to peak height have to be made, which simplifies its practical use.
