The Cancer Imaging Archive (TCIA), managed by the National Cancer Institute's Cancer Imaging Program (CIP), collects, curates, and hosts digital histopathology and standard-of-care radiology imaging from CPTAC-enrolled patients to provide data for imaging-omics research. Algorithmic analysis of radiology imaging often requires that tumors first be identified and segmented in the images. To facilitate this research, CIP has funded and released on TCIA comprehensive annotations for four CPTAC imaging collections (Pancreatic Ductal Adenocarcinoma, Clear Cell Renal Cell Carcinoma, Uterine Corpus Endometrial Carcinoma, and Head and Neck Squamous Cell Carcinoma).
Each annotation dataset includes radiologist-reviewed 3D tumor segmentations and seed points identifying the tumors. Depending on the histology, the annotations follow RECIST 1.1 or PERCIST criteria, ensuring accurate and consistent lesion identification across all datasets. Tumor volume calculations are included for each segmented tumor, and many cases include annotated scans from multiple patient care time-points. The annotations are all linked to the comprehensive imaging data, including PET/CT, MRI, and CT scans, and are identified to facilitate correlation with the CPTAC-generated analysis data posted on the proteomic and genomic data commons.
The annotated datasets are publicly accessible through The Cancer Imaging Archive website and through the TCIABrowser extension of 3D Slicer, a popular open-source application for medical image analysis. Researchers and developers can explore the annotations, supplementary information, and sample code provided to help them use these resources efficiently. The full CPTAC imaging collections (over 500 radiology cases and 1600 digital histopathology) are available here.
"The release of these four datasets is a significant step in our mission to promote open-access complex data initiatives for cancer research," said Dr. Lalitha K. Shankar, MD, PhD, Chief of Clinical Trials Branch at the CIP. "We are pleased to contribute these valuable resources to the scientific community, enabling researchers and AI experts to make noteworthy advancements in cancer imaging analysis as well as multi-omics assessments, to, ultimately, improve patient outcomes."
Annotation Protocol
For each patient, all scans were reviewed to identify and annotate the clinically relevant time points and sequences/series. In a typical patient, all available time points were annotated. The following annotation rules were followed:
- RECIST 1.1 was generally followed for solid tumors (MRI and CT imaging). A maximum of 5 lesions were annotated per patient scan (time point), with no more than 2 lesions per organ. The same 5 lesions were annotated at each time point. Lymph nodes were annotated if > 1 cm in short axis. Other lesions were annotated if > 1 cm.
- Lesions were annotated in the axial plane. If no axial plane was available, lesions were annotated in the available plane.
- MRIs were annotated using all axial T1-weighted post contrast sequences.
- CTs were annotated using all axial post contrast series.
- Lesions were labeled separately.
- Seed points were automatically generated but reviewed by a radiologist.
- A “negative” annotation was created for any exam without findings.
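The lesion-count limits in the rules above can be sketched in a few lines of Python. This is only an illustration, not the radiologists' actual procedure: the `Lesion` record, the `select_lesions` helper, and the "largest first" ordering are assumptions introduced here, since the protocol does not state how lesions were prioritized when more than five qualified.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_PER_SCAN = 5    # at most 5 lesions per patient scan (time point)
MAX_PER_ORGAN = 2   # at most 2 lesions per organ
MIN_SIZE_CM = 1.0   # lesions must measure more than 1 cm


@dataclass(frozen=True)
class Lesion:
    """Hypothetical record for one candidate lesion."""
    lesion_id: str
    organ: str
    size_cm: float  # short axis for lymph nodes, measured size otherwise


def select_lesions(candidates):
    """Return the lesions to annotate under the per-scan and per-organ caps.

    Assumption: larger lesions are preferred. The source protocol does not
    specify a priority order.
    """
    eligible = [c for c in candidates if c.size_cm > MIN_SIZE_CM]
    eligible.sort(key=lambda c: c.size_cm, reverse=True)

    per_organ = defaultdict(int)
    chosen = []
    for lesion in eligible:
        if len(chosen) == MAX_PER_SCAN:
            break
        if per_organ[lesion.organ] >= MAX_PER_ORGAN:
            continue  # organ already has its 2 lesions
        per_organ[lesion.organ] += 1
        chosen.append(lesion)
    return chosen
```

Because the same lesions are annotated at every time point, a selection like this would be made once per patient and then reused, rather than recomputed for each scan.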
At each time point:
- A seed point (kernel) was created for each segmented structure. The seed points for each segmentation are provided in a separate DICOM RTSTRUCT file.
- SNOMED-CT codes were inserted in the “Anatomic Region Sequence” and “Segmented Property Category Code Sequence” for all segmented structures.
- “Tracking ID” and “Tracking UID” tags were inserted for each segmented structure to enable longitudinal lesion tracking.
- Imaging time point codes were inserted to help identify each annotation in the context of the clinical trial assessment protocol.
  - “Clinical Trial Time Point ID” was used to encode the time point type using one of the following strings, as applicable: “pre-dose” or “post-chemotherapy”.
  - A content item containing “Time Point Type” was added to the “Acquisition Context Sequence”, using a Concept Code Sequence (0040,A168) value selected from:
    - (255235001, SCT, “Pre-dose”)
    - (719864002, SCT, “Post-cancer treatment monitoring”)
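As a rough illustration of how the tracking identifiers and time point codes above could be combined, the sketch below groups per-lesion records by tracking UID and reports the percent change in tumor volume between the earliest and latest time point. The record layout (plain dictionaries with `tracking_uid`, `code_value`, `coding_scheme`, and `volume_ml` keys) and the `volume_change_by_lesion` helper are hypothetical. It assumes these values have already been extracted from the DICOM objects, and that pre-dose precedes post-treatment monitoring.

```python
from collections import defaultdict

# Time point codes listed in the protocol, mapped to a sort position.
# Assumption: "Pre-dose" comes before "Post-cancer treatment monitoring".
TIME_POINT_ORDER = {
    ("255235001", "SCT"): 0,  # Pre-dose
    ("719864002", "SCT"): 1,  # Post-cancer treatment monitoring
}


def volume_change_by_lesion(records):
    """Map each tracking UID to its percent volume change, first to last.

    `records` is an iterable of dicts (hypothetical layout) with the keys
    tracking_uid, code_value, coding_scheme, and volume_ml. Lesions with a
    single time point or a zero baseline volume are skipped.
    """
    by_uid = defaultdict(list)
    for rec in records:
        by_uid[rec["tracking_uid"]].append(rec)

    changes = {}
    for uid, recs in by_uid.items():
        # Stable sort: records sharing a time point type keep input order.
        recs.sort(
            key=lambda r: TIME_POINT_ORDER[(r["code_value"], r["coding_scheme"])]
        )
        first, last = recs[0], recs[-1]
        if len(recs) < 2 or first["volume_ml"] == 0:
            continue
        delta = last["volume_ml"] - first["volume_ml"]
        changes[uid] = 100.0 * delta / first["volume_ml"]
    return changes
```

Keying on the tracking UID rather than on a lesion label is what makes the comparison longitudinal: the same lesion carries the same UID at every annotated time point.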