The annotated, tiled data used to train our machine learning model to classify breast tissue as metastatic or benign. The dataset is too large to upload: it contains over 300,000 images totaling over 15GB across the training, validation and test sets. It can be downloaded from the PCAM GitHub repository. We have added the annotation data for each set as a .csv file in each directory. Each file gives a binary malignant or benign label for each tile.
The dataset of 74 whole-slide images (WSIs) used to extract features that serve as covariates in survival modeling. Each WSI is over 1.5GB in size, totaling over 80GB for all 74 image files, so they could not be uploaded. They can be downloaded from the TCGA repository. The IDs of the WSIs used are listed in the /TCGA data/TCGA_data_ID.csv file.
The corresponding clinical data for each of the 74 TCGA samples was retrieved from cBioPortal and is recorded in /Survival data/survival_data.csv. The survival duration in months is under OS_MONTHS, survival status (deceased, alive) under OS_STATUS and patient ID under Patient ID.
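As a rough illustration of how these three columns might be read for survival modeling, the sketch below parses them into (patient ID, months, event) tuples using only the Python standard library. The function name `load_survival`, the sample patient IDs and the rule that a status containing the word "deceased" counts as an event are assumptions made for this example. The exact status strings in the file may differ.

```python
import csv
import io


def load_survival(csv_file):
    """Return a list of (patient_id, months, event) tuples from an open CSV file."""
    records = []
    for row in csv.DictReader(csv_file):
        months = row["OS_MONTHS"].strip()
        if not months:
            continue  # skip patients with no recorded survival time
        # event = 1 for a death, 0 for a patient still alive (censored).
        # Assumption: the status text contains "deceased" for deaths.
        event = 1 if "deceased" in row["OS_STATUS"].lower() else 0
        records.append((row["Patient ID"], float(months), event))
    return records


# Made-up rows that only mimic the column layout described above.
sample = io.StringIO(
    "Patient ID,OS_MONTHS,OS_STATUS\n"
    "TCGA-XX-0001,12.5,deceased\n"
    "TCGA-XX-0002,40.0,alive\n"
)
records = load_survival(sample)
```

With the real file, the same function could be called on `open("Survival data/survival_data.csv", newline="")` from the repository root.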
/Prediction results/binary_predictions contains a .csv file for each of the 74 WSIs. Each .csv file contains the discrete tile-level predictions generated by our model, as a 0 (benign) or 1 (malignant) class label. These are generated by the code in generate predictions. Each has an associated .png image showing the binary segmentation map of malignant and benign regions according to the binary predictions. These images are generated using the code in visualize tiles. All binary prediction data can be viewed and downloaded from Google Drive.
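One simple summary that could be computed from a binary prediction file is the fraction of tiles labeled malignant. The sketch below is illustrative only and is not necessarily a feature used in this project. The function name `malignant_fraction` and the column name `prediction` are hypothetical, since the column layout of the .csv files is not described here.

```python
import csv
import io


def malignant_fraction(csv_file, label_column="prediction"):
    """Return the share of tiles whose class label is 1 (malignant)."""
    # label_column is a hypothetical name; adjust it to the real header.
    labels = [int(float(row[label_column])) for row in csv.DictReader(csv_file)]
    if not labels:
        raise ValueError("no tile predictions found")
    return sum(labels) / len(labels)


# Made-up tile labels in the assumed layout: two benign, two malignant.
sample = io.StringIO("tile,prediction\n0,0\n1,1\n2,1\n3,0\n")
fraction = malignant_fraction(sample)
```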
/Prediction results/probability_predictions contains similar files for each of the 74 WSIs. Instead of discrete, binary class labels, this directory contains tile-level malignant probability predictions (the probability of a tile belonging to the malignant class) for each WSI. These are generated by the code in generate predictions. Each has an associated .png image showing a continuous heatmap of the malignant probability distribution of the WSI. These images are generated using the code in visualize tiles. All probability prediction data can be accessed and downloaded from Google Drive.
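The probability and binary outputs are related by a decision threshold. The sketch below shows one way to turn tile-level probabilities into 0/1 labels. The function name `to_binary_labels` and the 0.5 cut-off are assumptions for illustration. The threshold actually used to produce the binary predictions is not stated here.

```python
def to_binary_labels(probabilities, threshold=0.5):
    """Map malignant probabilities to 1 (malignant) or 0 (benign)."""
    # threshold=0.5 is an assumed default, not a documented project setting.
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up probabilities for four tiles.
labels = to_binary_labels([0.02, 0.49, 0.5, 0.97])
```

Raising the threshold trades sensitivity for specificity, so fewer tiles are marked malignant.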