One key goal of the MBCproject is to enable scientific discoveries and the development of new treatment strategies for metastatic breast cancer by widely sharing clinical, genomic, molecular, and patient-reported data.
To this end, the MBCproject has first released its available data through cBioPortal, a web-based platform for exploring cancer genomic data. Much effort has gone into generating, cleaning, and releasing this data so that the global research community can analyze it and pursue new research directions. The MBCproject data is being released as it is generated, and more data will be added as the project continues to enroll patients. We anticipate that these growing releases will help accelerate discoveries in metastatic breast cancer.
To everyone taking part in the MBCproject: we extend our most sincere gratitude for the partnership that makes the generation of this data possible.
Below, you will find the following information:
- Link to the MBCproject data
- Additional information about the data release
- List of common terms used in this research
- Link to the methods used to generate this data
Questions about these data, including how to cite, can be directed to email@example.com.
The MBCproject Data on cBioPortal
Additional Information About the Genomic Data Release
Will the MBCproject generate and release additional data?
Yes. The MBCproject is an ongoing research initiative that will continue to enroll patients and request saliva, blood, and tissue samples indefinitely. Our goal is to release new data every 6 months as it is generated.
How have patients been involved in this data release?
Patients and patient advocates within the metastatic breast cancer community have been involved since the inception of the Metastatic Breast Cancer Project. Over the course of 15 hours of engagement, patients and advocates helped guide the development and presentation of this data release. They provided extensive feedback on many aspects of the data, including data element names, descriptions of data elements, and how the data is displayed in the portal.
What makes this data different from what is found in existing studies?
This data release differs from existing studies in several ways:
- All patients included in the study have metastatic breast cancer
- Many patients have multiple tumor samples included, such as both primary and metastatic biopsies
- Data elements include:
  - Whole Exome Sequencing (WES) on all samples (rather than a targeted sequencing panel or other more limited testing)
  - Samples annotated with demographic, diagnostic, and pathology data
  - Samples annotated with treatment data, including all reported drugs given in the metastatic setting and the duration of therapy
  - Samples annotated with patient-reported data from the MBCproject enrollment survey
- All of this data has been generated through direct partnership with patients
What data elements are included?
Data is included for the following categories:
- Genomic Information
- Medical Record (MedR)
- Pathology Report (PATH)
- Patient-reported (PRD)
Whole Exome Sequencing (WES) information is included for each tumor sample, including point mutations, small insertions/deletions, and copy number amplifications and deletions.
Information about diagnosis and treatment was abstracted from patients’ medical records. All information that was derived from a medical record is denoted with the prefix MedR.
Detailed information was abstracted from pathology reports associated with each sample. All information derived from a pathology report is denoted with the prefix PATH.
Every patient who registered for the study had the option of completing an 18-question survey about their experiences with metastatic breast cancer. Responses to eight of these questions are included in this release. These responses have been modified and formatted to prevent identification of individuals without reducing their scientific utility. All patient-reported data elements are denoted with the prefix PRD.
Are there resources to define the terminology used in this data release?
Yes. A link to the National Cancer Institute's comprehensive dictionary of cancer terms is provided at the bottom of this page.
Why are some attributes “Unknown” for some cases and not for others?
All of the clinical information that is gathered comes directly from medical records or pathology reports. If the complete report or record that was received did not explicitly contain the information, the field was marked as “Unknown” in order to ensure accuracy.
Why are there more samples than patients?
For several patients, we were able to obtain multiple biopsy samples. We request multiple samples only when they can help answer an important scientific question and when there is ample tissue available for research while still leaving tissue behind for the patient's future clinical use.
Will everyone who registers for this study have their tumor sequenced?
Not necessarily. As of November 2018, over 5,000 women and men with metastatic breast cancer have registered for the project. Of these registrants, over 2,900 have provided consent, and approximately 1,800 have sent in saliva kits. Medical records and tumor samples are being requested for everyone who has sent in saliva. Obtaining medical records and tissue samples is a manual process involving faxes, phone calls, and additional follow-up. The ultimate goal is to obtain medical records and tumor samples from as many consenting patients as possible. Our goal is to release new data at regular intervals as it is generated.
Can I use this data to inform my clinical care?
No. This data is intended for research purposes only. The genomic data in this study was generated in a research laboratory, not a clinical laboratory, and all annotations have been de-identified. Therefore, this data cannot be used to inform clinical decision-making. In addition, some of the data has been intentionally altered to protect confidentiality in ways that do not affect data integrity (for example, Age at Diagnosis has been grouped into ranges).
Does this data contain any biases?
Yes. Because participants enroll through the website, the data reflects who learns about the project and chooses to enroll. For example, young participants and white participants are overrepresented in this data set. We hope the study becomes more representative over time, and we encourage anyone with outreach ideas to contact us at firstname.lastname@example.org.
For definitions of common terms used in cancer care and research, see the National Cancer Institute Dictionary of Cancer Terms.