Data and code associated with pancreatic cancer drug screening and modeling

gomezlab/PDACperturbations


Predicting Pancreatic Cancer Cell Line Response to Kinase Inhibitors

Data and code associated with kinase activation state and cell viability modeling. Most of the code consists of Rmarkdown documents, with the model testing code saved as R script files. The code requires several packages and is organized into sequential steps:

Required Packages

I've written a script that checks for and installs all of the packages required in the repository. It uses the pacman package for this purpose and installs pacman itself if it is missing. There are also two GitHub-based packages:

  • BerginskiRmisc for my custom theme and helper scripts
    • The helper script I use calls the "convert" command from ImageMagick to trim whitespace around figures, but this functionality isn't critical for the rest of the results
  • DarkKinaseTools for kinase lists
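The installer itself is an R script built on pacman. The Python sketch below is a language-neutral illustration of its check-then-install pattern, not the repository's code, and it covers only the "check" half. It reports which modules from a placeholder requirement list cannot be found. The module names are hypothetical, and installing whatever is missing is left to your package manager.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    # Hypothetical requirement list; the last entry stands in for a missing package.
    required = ["json", "csv", "some_missing_plotting_pkg"]
    missing = missing_packages(required)
    if missing:
        # A real installer would now hand each name to its package manager.
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```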

Data Cleaning and Organization

There are two primary data sets in the repository: the results of a screening assay and the data downloaded from the supplement of Klaeger et al. These first scripts take each data set and produce R data files that are used in downstream processing.

  • Screen Data: here
    • Reads in and organizes the screen data
  • Klaeger Data: here
    • Organizes the Klaeger data for downstream processing
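The cleaning steps themselves are Rmarkdown documents that write R data files. The Python sketch below shows the same pattern under a hypothetical wide CSV layout, with one row per compound and one column per kinase. It reshapes the table into tidy (compound, kinase, value) records and serializes them once, so downstream steps can reload them without re-parsing. The column names, the layout, and the use of pickle in place of R data files are assumptions for illustration.

```python
import csv
import io
import pickle

def tidy_wide_table(csv_text, id_column):
    """Turn a wide table (one row per compound, one column per kinase) into
    long (compound, kinase, value) records, skipping empty cells."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        compound = row[id_column]
        for column, value in row.items():
            if column == id_column or value in ("", None):
                continue
            records.append((compound, column, float(value)))
    return records

def save_records(records, path):
    """Serialize the tidy records for downstream steps (the role R data files play)."""
    with open(path, "wb") as handle:
        pickle.dump(records, handle)

def load_records(path):
    """Reload records written by save_records."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

if __name__ == "__main__":
    # Hypothetical input: DrugX has no value for KINASE_B.
    raw = "drug,KINASE_A,KINASE_B\nDrugX,0.5,\nDrugY,1.0,0.25\n"
    print(tidy_wide_table(raw, "drug"))
```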

Compound Matching

Most of the compound names in the screen and Klaeger collections don't match up exactly, so we had to manually match most of the shared compounds. This report contains the code used to simplify this search.
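The final matching was manual, but the candidate search can be narrowed programmatically. The sketch below is one possible approach, not the code in the report. It first normalizes names (lower-case, letters and digits only) and accepts any exact match after normalization. Otherwise it uses Python's difflib to list the closest candidates for a person to confirm. The function names, the 0.6 similarity cutoff, and the example compound names are illustrative assumptions.

```python
import difflib
import re

def normalize_name(name):
    """Lower-case a compound name and keep only letters and digits,
    so that 'AZD-1480' and 'azd1480' compare as equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def suggest_matches(screen_names, klaeger_names, n=3, cutoff=0.6):
    """Map each screen compound to exact matches after normalization or,
    failing that, to the closest Klaeger names for manual review."""
    lookup = {}
    for name in klaeger_names:
        lookup.setdefault(normalize_name(name), []).append(name)
    suggestions = {}
    for name in screen_names:
        key = normalize_name(name)
        if key in lookup:
            suggestions[name] = lookup[key]
        else:
            close = difflib.get_close_matches(key, list(lookup), n=n, cutoff=cutoff)
            suggestions[name] = [original for k in close for original in lookup[k]]
    return suggestions

if __name__ == "__main__":
    print(suggest_matches(["AZD-1480", "Dasatinib"],
                          ["azd1480", "Dasatinib (BMS-354825)", "Erlotinib"]))
```

Suggestions still need a human check. A similarity score cannot tell a salt form or a code name apart from a genuinely different compound.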

Preparation for Machine Learning

With the compound/drug names matched, I prepared the data sets for machine learning (both regression and binary classification of viability as above or below 90). This code also removes any genes that don't vary in the Klaeger collection after the compounds have been filtered down to the matched set. The primary outputs are machine-learning-ready data sets (deposited in the results) and cross-validation splits for both regression and binary classification.
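The three preparation steps described above can be sketched in plain Python: dropping features that no longer vary, labelling viability relative to 90 for the binary task, and building cross-validation splits. This illustrates the steps and is not the repository's R code. The data layout (a list of dicts of features), the fold count, the seed, and the handling of a viability of exactly 90 are assumptions.

```python
import random

def drop_constant_columns(rows):
    """Remove features whose value is the same in every row, e.g. genes that
    stop varying once the compounds are filtered to the matched set."""
    if not rows:
        return rows
    keep = [key for key in rows[0] if len({row[key] for row in rows}) > 1]
    return [{key: row[key] for key in keep} for row in rows]

def label_viability(viability, threshold=90):
    """Binary class for the classification task. A value exactly at the
    threshold is counted as 'above' here, which is an arbitrary choice."""
    return "above" if viability >= threshold else "below"

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices with a fixed seed, deal them into k folds, and
    return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test)))
    return splits
```

Fixing the seed keeps the splits identical between runs, so every model in the later scan is scored on the same folds.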

Model Testing

The model testing code is saved as single, self-contained scripts that fully implement and run the hyperparameter scanning and model testing. The code is organized this way to make it simpler to run the modeling code on the UNC computing cluster, but it should also be compatible with any computing environment. This code takes a long time to run. All of the regression model scripts are available here and the binary above/below 90 classification scripts are available here. There are also scripts (search for run_all_models.R) that build out the directory infrastructure and run all the models sequentially.
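Hyperparameter scanning fits every candidate setting on each cross-validation fold and keeps the one with the best held-out score. The cost therefore grows as (number of candidates) × (number of folds) × (cost of one fit), which is why these scripts run for hours. The sketch below shows that loop with a toy one-dimensional k-nearest-neighbours regressor scored by RMSE. The model, the data, and the candidate grid are stand-ins, not the models or settings used in the repository.

```python
import math
import random

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to `query` (1-D)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_rmse(x, y, k, n_folds=5, seed=0):
    """Cross-validated RMSE of the k-NN regressor for one hyperparameter value.
    The same seed is used for every candidate so all share identical folds."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    squared_errors = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            prediction = knn_predict(train_x, train_y, x[i], k)
            squared_errors.append((prediction - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def scan_hyperparameter(x, y, candidates, n_folds=5):
    """Score every candidate k and return (best_k, {k: rmse})."""
    scores = {k: cv_rmse(x, y, k, n_folds) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    x = list(range(20))
    y = [2 * v for v in x]
    best, scores = scan_hyperparameter(x, y, [1, 3, 10])
    print("CV RMSE by k:", scores, "best k:", best)
```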

Prediction Results

The code that uses the random forest model to predict cell line responses to the rest of the Klaeger compounds is here.
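Predicting the remaining Klaeger compounds amounts to applying the fitted model to every compound that was not part of the screen. The sketch below makes two hypothetical assumptions about interfaces: features are stored per compound, and the fitted model can be called directly on a feature vector. The repository itself uses a random forest fitted in R.

```python
def predict_unscreened(features_by_compound, screened, model):
    """Apply a fitted model to every compound that was not in the screen.

    features_by_compound: dict mapping compound name -> feature vector
    screened: names of the compounds used for training
    model: any callable that takes a feature vector and returns a prediction
    """
    screened = set(screened)
    return {
        compound: model(features)
        for compound, features in sorted(features_by_compound.items())
        if compound not in screened
    }

if __name__ == "__main__":
    # Toy stand-in for a fitted model: sum of the features.
    features = {"A": [1, 2], "B": [3, 4], "C": [0, 0]}
    print(predict_unscreened(features, ["A"], sum))
```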

Reproducibility Script

I've attempted to write a single script that runs all the Rmarkdown files and scripts to completely reproduce the models and figures from the paper. I've only tested the code on Linux, but I see no reason why it wouldn't work on other platforms. Let me know if you attempt to run this script and it fails on your platform.

This script takes a long time to run (7 hours on a Ryzen 7 5800X), mostly because of the hyperparameter scanning for the regression and binary models. RAM usage also gets fairly high during parameter scanning, so you should probably have 64 GB of RAM.
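A reproduce-everything script is essentially an ordered list of steps that must all succeed. The Python sketch below is a hypothetical stand-in for the repository's own script. It runs each command in order, records how long it took, and stops at the first failure, because check=True makes subprocess.run raise CalledProcessError on a non-zero exit. For this repository each step would be an R invocation such as `Rscript -e "rmarkdown::render('step.Rmd')"`, where the file name is a placeholder.

```python
import subprocess
import sys
import time

def run_pipeline(steps):
    """Run (label, argv) steps in order and return [(label, seconds), ...].

    subprocess.run(..., check=True) raises CalledProcessError on a non-zero
    exit status, so the pipeline stops at the first failing step.
    """
    timings = []
    for label, argv in steps:
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        timings.append((label, time.perf_counter() - start))
    return timings

if __name__ == "__main__":
    # Placeholder steps; a real run would list Rscript commands instead.
    demo = [
        ("step 1", [sys.executable, "-c", "print('cleaning data')"]),
        ("step 2", [sys.executable, "-c", "print('fitting models')"]),
    ]
    for label, seconds in run_pipeline(demo):
        print(f"{label}: {seconds:.2f} s")
```

Per-step timings make it easy to confirm that the hyperparameter scans dominate the total run time.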

Data Deposition

The following information was supplied regarding data availability. The code is available at GitHub and Zenodo:

  • https://github.com/gomezlab/PDACperturbations
  • https://doi.org/10.5281/zenodo.11623371
  • Berginski and Jenner (2024). Kinome state is predictive of cell viability in pancreatic cancer tumor and cancer-associated fibroblast cell lines.
