The MBCproject has released the first of many publicly available data sets in cBioPortal, a web-based platform for exploring genomic data. This data is being released as it is generated, and more data will be added as the project continues to enroll patients.
Our efforts to date have focused on generating, cleaning, and releasing this data so that the entire research community can begin to analyze it and develop new research directions. As we release more data over time, we anticipate that this will accelerate discoveries in metastatic breast cancer.
To everybody who is taking part in the MBCproject, we would like to extend our most sincere gratitude for your partnership, which makes the generation of this data possible.
Below, you will find the following information:
- Link to the MBCproject data
- Additional information about the data release
- List of common terms used in this research
- Link to the methods used to generate this data
Please feel free to contact us at email@example.com with any questions and include “Data Release” in your email subject line. This page will be updated regularly with additional information.
The MBCproject Data on cBioPortal
Additional Information About the Genomic Data Release
What is the goal of this data release?
The goal of this public data release is to share clinical, genomic, molecular, and patient-reported data on metastatic breast cancer to accelerate discoveries and the development of new treatment strategies. Our goal is to add new data to the collection every 6 months.
How have patients been involved in this data release?
Patients and patient advocates within the metastatic breast cancer community have been involved since the inception of the Metastatic Breast Cancer Project. We engaged patients and advocates over the course of 15 hours to help guide the presentation and development of this data release. Patients provided extensive feedback on many aspects of this data, including data element names, descriptions of data elements, and how the data is displayed in the portal.
What makes this data different from what is found in existing studies?
The collection of data in this release is different for several reasons:
- All patients included in the study have metastatic breast cancer
- For many patients, multiple tumor samples are included (including primary and metastatic biopsies)
- Data for every patient includes:
  - Whole Exome Sequencing (WES) on all samples (as compared to a targeted sequencing panel or other more limited testing)
  - Samples annotated with demographic, diagnostic, and pathology data
  - Samples annotated with treatment data, including all reported drugs given in the metastatic setting and the duration of therapy
  - Samples annotated with patient-reported data from the MBCproject enrollment survey
- All of this data has been generated through direct partnership with patients
Will the MBCproject generate and release additional data?
Yes, the MBCproject is an ongoing research initiative. The project will continue to enroll patients and request saliva, blood, and tissue samples indefinitely. Our goal is to release additional data every 6 months as it is generated.
What data elements are included?
Data is included for the following categories:
- Genomic Information
- Medical Record (MedR)
- Pathology Report (PATH)
- Patient-reported (PRD)
Whole Exome Sequencing (WES) information is included for each tumor sample, including mutations, small insertions/deletions, amplifications, and deletions.
Information about diagnosis and treatment was abstracted from patients’ medical records. All information that was derived from a medical record is denoted with the prefix MedR.
Detailed information was abstracted from pathology reports associated with each sample. All information derived from a pathology report is denoted with the prefix PATH.
Every patient who registered for the study had the option of completing an 18-question survey about their experiences with metastatic breast cancer. Ten questions from this survey have been abstracted for inclusion. These responses have been modified and formatted to prevent individual identification without altering scientific utility. All patient-reported data elements are denoted with the prefix PRD.
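For researchers working with an exported copy of the data, the prefix convention described above can be used to sort attributes by where they came from. The following is a minimal sketch, not part of the MBCproject or cBioPortal tooling. The attribute names in the example are hypothetical, and the exact spelling of attribute names in the portal may differ.

```python
from collections import defaultdict

# Prefixes described in this data release, mapped to the source they denote.
SOURCE_PREFIXES = {
    "MedR": "Medical Record",
    "PATH": "Pathology Report",
    "PRD": "Patient-reported",
}


def group_by_source(attribute_names):
    """Group attribute names by the source that their prefix denotes.

    Names without a recognized prefix are collected under "Other".
    Matching is case-sensitive so that a name such as "Pathway" is not
    mistaken for a PATH attribute.
    """
    groups = defaultdict(list)
    for name in attribute_names:
        for prefix, label in SOURCE_PREFIXES.items():
            if name.startswith(prefix):
                groups[label].append(name)
                break
        else:
            # No prefix matched this attribute name.
            groups["Other"].append(name)
    return dict(groups)


# Hypothetical attribute names, for illustration only.
example_attributes = [
    "MedR Example Diagnosis Field",
    "PATH Example Histology Field",
    "PRD Example Survey Field",
    "Example Genomic Field",
]
grouped = group_by_source(example_attributes)
```

Grouping attributes this way makes it easy to see which fields were abstracted by study staff and which were reported by patients themselves.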
Are there resources to define the terminology used in this data release?
The bottom of this page provides the National Cancer Institute’s comprehensive dictionary of clinical terms, as well as definitions of some common genomic terms.
Why are some attributes “Unknown” for some cases and not for others?
All of the clinical information that is gathered comes directly from medical records or pathology reports. If the complete report or record that was received did not explicitly contain the information, the field was marked as “Unknown” in order to ensure accuracy.
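Because “Unknown” marks information that was absent from the record rather than a real category, analyses may want to count it separately from known values. The sketch below shows one way to do that. The function name and the example values are hypothetical and are not drawn from the data set.

```python
from collections import Counter


def summarize_attribute(values, unknown_label="Unknown"):
    """Tally known values separately from entries marked as unknown.

    Returns a tuple of (counts of known values, number of unknown entries).
    """
    known_counts = Counter(v for v in values if v != unknown_label)
    unknown_total = sum(1 for v in values if v == unknown_label)
    return known_counts, unknown_total


# Hypothetical values for a single clinical attribute.
example_values = ["Positive", "Unknown", "Negative", "Positive", "Unknown"]
known_counts, unknown_total = summarize_attribute(example_values)
```

Reporting the number of “Unknown” entries alongside the known values keeps it clear how many cases each summary is actually based on.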
Why are there more samples than patients?
For several patients, we were able to obtain multiple biopsy samples. Multiple biopsy samples are requested when they can help answer an important scientific question and when there is ample tissue available for research while still ensuring that tissue is left behind for future clinical use.
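When analyzing the data, it helps to keep track of which samples belong to the same patient so that one patient is not counted several times. A minimal sketch of that bookkeeping follows. The patient and sample identifiers are made up for illustration and do not reflect the identifier format used in cBioPortal.

```python
from collections import Counter


def samples_per_patient(sample_records):
    """Count samples per patient from (patient_id, sample_id) pairs."""
    return Counter(patient_id for patient_id, _sample_id in sample_records)


# Hypothetical identifiers: one patient with two biopsies, one with a single biopsy.
example_records = [
    ("PATIENT-A", "PATIENT-A-SAMPLE-1"),
    ("PATIENT-A", "PATIENT-A-SAMPLE-2"),
    ("PATIENT-B", "PATIENT-B-SAMPLE-1"),
]
counts = samples_per_patient(example_records)

# Patients who contributed more than one sample.
patients_with_multiple = sorted(p for p, n in counts.items() if n > 1)
```

Here the total number of samples (3) exceeds the number of patients (2), which is the same pattern seen in the full data release.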
Will everyone who registers for this study have their tumor sequenced?
As of October 2017, over 4000 women and men with metastatic breast cancer have registered for the project. Of these registrants, over 2000 have provided consent. Approximately 1400 patients have sent in saliva kits. Medical records and tumor samples are being requested for everyone who has sent in saliva. Obtaining medical records and tissue samples is a manual process involving faxes, phone calls, and additional follow-up. The ultimate goal is to obtain medical records and tumor samples from as many consenting patients as possible. This is the first batch of abstracted medical records and sequenced tumors, and our goal is to release new data at 6-month intervals as it is generated.
Can I use this data to inform my clinical care?
This data is intended to be used for research purposes only. Genomic data included in this study is generated in a research lab, not in a clinical lab. All annotations have been de-identified. Therefore, this data cannot be used to inform clinical decision-making. Some of the data has been intentionally altered to protect confidentiality in a way that does not affect data integrity (for example, Age at Diagnosis has been grouped).
Does this data contain any biases?
Yes. Biases will occur based on who learns about the project and enrolls via the website. Known biases in this data set include disproportionate numbers of young participants and white participants. We hope that the study becomes more representative over time, and we encourage anyone with outreach ideas to contact us at firstname.lastname@example.org.
Detailed methods for how these data were generated and analyzed are available here.
For reference to common terms used in cancer care and research, use the National Cancer Institute Dictionary of Cancer Terms.
Patients have requested that some genomics terms seen on cBioPortal be defined here. If there are additional terms that you would like defined that are not included here or in the National Cancer Institute Dictionary, please send an email to email@example.com and include “New Glossary Term:” in your email subject line.