Critical Assessment of Artificial Intelligence Methods for Prediction of hERG Channel Inhibition in the “Big Data” Era
This repository contains the code and data for our recent article, which compares classical ML approaches with newer AI techniques for predicting hERG channel inhibition. A brief description of the repository contents is provided below.
- data/: datasets used to build and validate the models
- ext_models/: prospective validation results from external hERG models (StarDrop and Pred-hERG)
- notebooks/: a Jupyter notebook (and model dependencies) that allows consensus prediction on a test dataset
- scripts/: scripts used to build the hERG models
The Jupyter notebook in notebooks/ can be used to perform a consensus prediction based on the best individual models developed in this study.
The code requires two types of molecular descriptors to be calculated beforehand: RDKit descriptors and Morgan fingerprints. The models were built using RDKit features (119 descriptors in total) and Morgan fingerprints (1024 bits; radius 2), both calculated in KNIME. The first column of the input file must be SMILES, followed by the RDKit descriptors and then the Morgan fingerprints, in that order. An example test set is included in the repository: notebooks/blockers_sampled.csv.
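Before running the notebook, it may help to confirm that an input file has the layout described above. The following is a minimal sketch, not part of the repository: the function name `check_layout`, the comma delimiter, and the presence of a header row are assumptions. The expected column count (1 + 119 + 1024 = 1144) follows from the descriptor counts stated above.

```python
import csv
import io

N_RDKIT = 119      # RDKit descriptors, as stated in this README
N_MORGAN = 1024    # Morgan fingerprint bits (radius 2), as stated in this README
EXPECTED_COLUMNS = 1 + N_RDKIT + N_MORGAN  # SMILES + descriptors + bits = 1144


def check_layout(handle, has_header=True):
    """Return a list of layout problems found in a descriptor CSV (empty if none).

    Hypothetical helper: assumes a comma-delimited file and, by default,
    a single header row. It checks only the column count and that the
    SMILES cell is non-empty; it does not validate the chemistry.
    """
    problems = []
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the (assumed) header row
    for row_number, row in enumerate(reader, start=1):
        if len(row) != EXPECTED_COLUMNS:
            problems.append(
                f"row {row_number}: {len(row)} columns, expected {EXPECTED_COLUMNS}"
            )
        elif not row[0].strip():
            problems.append(f"row {row_number}: empty SMILES in first column")
    return problems


# In-memory demonstration with a placeholder row (all descriptor values are dummies).
demo_row = ",".join(["CCO"] + ["0"] * (N_RDKIT + N_MORGAN))
print(check_layout(io.StringIO(demo_row + "\n"), has_header=False))
```

To check a real file, pass an open handle (for example, one opened on notebooks/blockers_sampled.csv with `newline=""`) and set `has_header` to match the file.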
- Clone the repository and fetch all files (some files are large and must be fetched with Git LFS):
    git clone https://github.com/ncats/herg-ml.git
    cd herg-ml
    git lfs install
    git lfs fetch
    git lfs pull
- Create and activate a conda environment:
    conda create -n herg-ml python=3.6
    conda activate herg-ml
    bash install.sh
- Launch Jupyter (opens the default web browser at http://localhost:8888/tree):
    jupyter notebook
- Open the file notebooks/consensus_model.ipynb and execute the notebook, following the inline instructions.
- To end the notebook session, press Ctrl+C and choose y when prompted to shut down the notebook server.
- Deactivate the conda environment when finished:
    conda deactivate
Note: These instructions were tested on macOS with Python 3.6.
We compared our consensus model (using the prospective validation set) against previous hERG models proposed by Braga et al. (Pred-hERG 4.2) and Ryu et al. (DeepHIT).
Model | Balanced Accuracy | Sensitivity | Specificity |
---|---|---|---|
Our Consensus | 0.80 | 0.74 | 0.86 |
Pred-hERG 4.2a | 0.77 | 0.74 | 0.81 |
DeepHIT | 0.75 | 0.73 | 0.77 |
a Pred-hERG 4.2 returned predictions for 835 out of 839 validation set compounds.
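For readers less familiar with the metrics in the table, the sketch below shows the standard definitions of sensitivity, specificity, and balanced accuracy in terms of confusion-matrix counts. It is illustrative only and is not taken from the repository's scripts. The function name is hypothetical, the counts in the example are made up, and treating hERG blockers as the positive class is an assumption. Under these definitions, balanced accuracy is the mean of sensitivity and specificity, which is consistent with the consensus row above: (0.74 + 0.86) / 2 = 0.80.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and balanced accuracy.

    tp/fn: positives (assumed here to be hERG blockers) predicted correctly/incorrectly.
    tn/fp: negatives predicted correctly/incorrectly.
    Assumes both classes are present, so neither denominator is zero.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical counts for illustration; these are NOT the study's results.
sens, spec, bacc = classification_metrics(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} balanced_accuracy={bacc:.2f}")
```

Balanced accuracy is often preferred over plain accuracy when the two classes are unequally represented, because it weights the two error rates equally.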
A web-based prediction service will be made available in the future. If you have trouble using the currently available models, please contact us.