Proteomic Data Sharing
The Amsterdam Principles: CPTC Leads Proteomic Data Sharing Efforts
As part of its mission to advance the application of clinical proteomics to personalize cancer care, the NCI’s Clinical Proteomic Technologies for Cancer (CPTC) is leading international efforts to address a considerable obstacle to the field: the lack of widely followed policies governing the rapid release of large-scale proteomic data into the public domain. Just as the Human Genome Project (HGP) recognized the power of having community resources of high-quality data, a parallel opportunity exists today in the emerging field of protein biomarker discovery (proteomics). To pursue this opportunity, on August 14, 2008, CPTC sponsored an international summit in the Netherlands that included members from the research community, funding agencies (e.g., NIH), policy makers, and industry to define what it will take to have proteomics data released into the public domain as soon as they are produced.
From climatology to particle physics, advancements in science and health care are made possible through widespread access to results from cutting-edge research, enabling scientists to use and build on this knowledge. This principle was clearly demonstrated in the HGP, where researchers built upon the work of others to create an armamentarium of data resources for the community. These collective resources have paid dividends beyond what anyone could have conceived. This practice has been a driver of the rapid pace of genomic discovery, and it was made possible by universally endorsed policies governing the standards for and the availability of data in the public domain, as well as by centralized repositories and portals for depositing and accessing such data. The proteomics community will benefit greatly from adopting an appropriately similar practice.
The outcome of this summit, the Amsterdam Principles, provides recommendations for rapid proteomics data release and sharing policies that are similar to the Bermuda Principles, a series of standardized data sharing policies that served as a catalyst in the world of genomics. It was agreed that, at a minimum, what the community both wants and needs is high-quality, well-annotated raw data. Access to these data will require the proper infrastructure: community-supported standardized formats, controlled vocabularies and ontologies, minimal reporting requirements, and publicly available online repositories. The release of such data would put the pace of proteomic research on a trajectory similar to that seen in genomics research.
The Amsterdam Principles include guidelines for the following:
- Timing
- Comprehensiveness
- Format
- Deposition to repositories
- Quality metrics
- Responsibility for proteomics data release
Proteomics researchers, however, face a number of obstacles to completely open data sharing, including technical, infrastructural, and policy challenges. Read below to learn more about CPTC activities to overcome these obstacles and recent advances in data sharing throughout the proteomics community.
The Amsterdam Principles White Paper
Recommendations from the 2008 International Summit on Proteomics Data Release and Sharing Policy: The Amsterdam Principles. Journal of Proteome Research 2009; 8: 3689-3692.
Data Sharing Resources
ProteomeCommons.org
ProteomeCommons is a public proteomics database for annotations and other information linked to the Tranche data repository and to other resources. The ProteomeCommons Tranche network is the first network built on Tranche, a free and open source file sharing tool that encourages widespread data sharing. The network is a cloud of computers to which researchers can upload files and from which they can download them. As the first implementation of the Tranche software, ProteomeCommons also serves as a test bed for it.
Tranche Project
Tranche is a free and open source file sharing tool, developed at the University of Michigan, that enables collections of computers to easily share data sets. Designed and built with scientists and researchers in mind, Tranche handles very large data sets, is secure and scalable, and makes all data sets citable in scientific journals. This project's goal is to solve the problems commonly associated with sharing scientific data, letting you and your collaborators focus on using the data. Tranche has served as the repository for the CPTAC network, hosting all inter-laboratory data and metadata. In 2009, Tranche and its associated annotation tool became caBIG®-silver compliant, making CPTC data accessible to the broader biomedical research community.
Additional Data Sharing Publications
Prepublication data sharing
Nature. 2009 Sep 10;461(7261):168-70.
Sharing The Wealth Of Data
Scientific American worldVIEW. 2008 May.
Opening up Rosetta
SciBX; 2(14); doi:10.1038/scibx.2009.561. Published online April 9 2009.
International Summit on Proteomics Data Release and Sharing Policy.
J. Proteome Res.; 2008 Nov; 7(11) pp 4609 - 4609; (Editorial) [Epub 2008 Oct 7]
