Data used in the project

PCAM data

The annotated, tiled data used to train our machine learning model to classify tissue as breast metastatic or benign. The dataset is too large to upload: it contains over 300,000 images totaling over 15GB across the training, validation and test sets. It can be downloaded from the PCAM GitHub repository. We have added the annotation data for each set as a .csv file in the corresponding directory. Each file gives a binary malignant or benign label for each tile.

TCGA data

The dataset of 74 whole-slide images (WSIs) used to extract features that serve as covariates in survival modeling. Each WSI is over 1.5GB in size, totaling over 80GB for all 74 image files, so they could not be uploaded. They can be downloaded from the TCGA repository. The IDs of the WSIs used are listed in the /TCGA data/TCGA_data_ID.csv file.

Survival data

The corresponding clinical data for each of the 74 TCGA samples was retrieved from cBioPortal and is recorded in /Survival data/survival_data.csv. The survival duration in months is under OS_MONTHS, the survival status (deceased, alive) under OS_STATUS and the patient ID under Patient ID.
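As a minimal sketch of how these three columns might be read for survival modeling, the snippet below parses a survival CSV with the Python standard library and converts OS_STATUS into a 0/1 event indicator. The function name `load_survival`, the sample rows, and the rule that any status containing "deceased" counts as an event are assumptions made for illustration; check them against the actual values in survival_data.csv.

```python
import csv
import io


def load_survival(csv_file):
    """Return a list of (patient_id, months, event) tuples from a survival CSV."""
    records = []
    for row in csv.DictReader(csv_file):
        status = row["OS_STATUS"].strip().lower()
        # Assumption: a status containing "deceased" is an observed event (1);
        # anything else is treated as censored (0).
        event = 1 if "deceased" in status else 0
        records.append((row["Patient ID"], float(row["OS_MONTHS"]), event))
    return records


# Made-up rows for illustration only; these are not real patient records.
sample = io.StringIO(
    "Patient ID,OS_MONTHS,OS_STATUS\n"
    "EXAMPLE-01,12.5,deceased\n"
    "EXAMPLE-02,40.0,alive\n"
)
records = load_survival(sample)
```

To read the real file, an open file handle can be passed in place of the sample, for example `open("Survival data/survival_data.csv", newline="")` from the repository root. Rows with a missing OS_MONTHS value would need to be handled before the `float` conversion.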

Prediction results

/Prediction results/binary_predictions contains a .csv file for each of the 74 WSIs. Each .csv file contains the discrete tile-level predictions generated by our model, as a class label of 0 (benign) or 1 (malignant). These are generated by the code in generate predictions. Each file has an associated .png image showing the binary segmentation map of malignant and benign regions according to the binary predictions; these images are generated by the code in visualize tiles. All binary prediction data can be viewed and downloaded from Google Drive.
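One simple way to summarize a binary prediction file is the fraction of tiles labeled malignant. The sketch below is illustrative and not necessarily a feature used in this project. The column name `prediction`, the `tile` column, and the function name `malignant_fraction` are hypothetical, since the layout of the .csv files is not described here.

```python
import csv
import io


def malignant_fraction(csv_file, label_column="prediction"):
    """Fraction of tiles labeled 1 (malignant) in one WSI's prediction CSV."""
    # Assumption: each row is one tile and `label_column` holds 0 or 1.
    labels = [int(row[label_column]) for row in csv.DictReader(csv_file)]
    if not labels:
        raise ValueError("no tile predictions found")
    return sum(labels) / len(labels)


# Made-up tile predictions for illustration only.
sample = io.StringIO("tile,prediction\nt0,0\nt1,1\nt2,1\nt3,0\n")
fraction = malignant_fraction(sample)  # 2 malignant tiles out of 4
```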

/Prediction results/probability_predictions contains similar files for each of the 74 WSIs. Instead of discrete binary class labels, this directory contains the tile-level malignant probability (the probability of a tile belonging to the malignant class) for each WSI. These are generated by the code in generate predictions. Each file has an associated .png image showing a continuous heatmap of the malignant probability distribution across the WSI; these images are generated by the code in visualize tiles. All probability prediction data can be accessed and downloaded from Google Drive.
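Tile-level probabilities can be turned into discrete class labels by thresholding. The sketch below shows the idea with a cutoff of 0.5. Both the cutoff and the function name `to_binary_labels` are assumptions, as the rule used to derive the binary predictions in this project is not stated here.

```python
def to_binary_labels(probabilities, threshold=0.5):
    """Map malignant probabilities to 0 (benign) or 1 (malignant)."""
    # Assumption: a tile is called malignant when its probability is at or
    # above the threshold; the project's actual cutoff may differ.
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up probabilities for illustration only.
labels = to_binary_labels([0.02, 0.49, 0.5, 0.97])
```

Raising the threshold trades fewer false malignant calls for more missed malignant tiles, so the value is worth choosing on validation data.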