Large-scale Modeling of Multi-Species Acute Toxicity Endpoints using Consensus of Multi-Task Deep Learning Methods
This repository contains multi-task deep learning models trained on acute toxicity data, focusing primarily on three endpoint types: lethal dose fifty (LD50), lethal dose low (LDLO), and toxic dose low (TDLO). The data were obtained from ChemIDplus.
Our best models are consensus models built from the best-performing individual models. We compared them against the multi-task deep learning models of Sosnin et al. While they report models for 29 toxicity endpoints, our models cover 59 endpoints, of which 18 LD50 endpoints are common to both studies. The results for these 18 endpoints are listed below. The performance measure reported is root mean squared error (RMSE; lower is better).
species | route | compounds (ours) | compounds (Sosnin et al.) | score (ours) | score<sup>a</sup> (Sosnin et al.) |
---|---|---|---|---|---|
mouse | intraperitoneal | 36295 | 37202 | 0.41 | 0.41 |
mouse | oral | 23373 | 24355 | 0.39 | 0.42 |
mouse | intravenous | 16978 | 17742 | 0.43 | 0.43 |
rat | oral | 10190 | 10743 | 0.52 | 0.53 |
mouse | subcutaneous | 6769 | 7221 | 0.51 | 0.51 |
rat | intraperitoneal | 5021 | 5041 | 0.52 | 0.55 |
rat | intravenous | 2472 | 2538 | 0.52 | 0.54 |
rat | subcutaneous | 1896 | 2014 | 0.63 | 0.64 |
mouse | unreported | 1739 | 1804 | 0.47 | 0.51 |
rabbit | skin | 1495 | 1734 | 0.53 | 0.56 |
mammal<sup>b</sup> | unreported | 1129 | 1121 | 0.42 | 0.40 |
rabbit | oral | 894 | 910 | 0.58 | 0.58 |
rat | skin | 835 | 930 | 0.61 | 0.63 |
rat | unreported | 806 | 838 | 0.58 | 0.60 |
rabbit | intravenous | 792 | 764 | 0.59 | 0.68 |
guinea pig | oral | 793 | 799 | 0.66 | 0.70 |
rat | oral | 322 | 966 | 0.63 | 0.61 |
rat | intraperitoneal | 318 | 1029 | 0.52 | 0.43 |
<sup>a</sup> Scores are taken from the supplementary information of the original article. <sup>b</sup> The mammalian species and route are unspecified.
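
To make the reported metric concrete, here is a minimal sketch of how a consensus prediction and its RMSE could be computed. It assumes the consensus is a plain arithmetic mean of the individual models' predictions, which this README does not confirm. The function names and toy values are illustrative and are not taken from the repository.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over entries that have a measured value.

    Multi-task labels are usually sparse, so missing labels (None) are skipped.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t is not None]
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))

def consensus(predictions):
    """Average per-compound predictions across models (assumed consensus rule)."""
    return [sum(values) / len(values) for values in zip(*predictions)]

# Toy values for illustration only, not data from this repository.
y_true = [2.0, 3.0, None, 4.0]
model_a = [2.5, 2.5, 1.0, 4.5]
model_b = [1.5, 3.5, 1.2, 3.5]

y_cons = consensus([model_a, model_b])
print(rmse(y_true, model_a), rmse(y_true, model_b), rmse(y_true, y_cons))
```

In this toy case the two models err in opposite directions, so the averaged prediction has a lower RMSE than either model alone. Real gains from a consensus depend on how correlated the individual models' errors are.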
We also report single-task models using baseline methods: random forest and deep neural networks. The scripts used for modeling can be found under `scripts/`. An example notebook, `notebooks/create_fold_data.ipynb`, shows how to create the training and test sets by joining the descriptors and task details for the different cross-validation folds.
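
The joining step can be pictured with the following sketch, which is not the notebook's actual code. It assumes descriptors are keyed by a compound identifier and that each task record already carries a fold index. All names (`descriptors`, `tasks`, `cid`, `fold`, `make_fold`) and values are hypothetical.

```python
# Hypothetical inputs: names and values are illustrative, not the repository's files.
descriptors = {  # compound id -> descriptor vector
    "c1": [0.1, 1.2],
    "c2": [0.4, 0.8],
    "c3": [0.9, 0.3],
}
tasks = [  # one record per (compound, endpoint) measurement, with an assigned fold
    {"cid": "c1", "task": "mouse_oral_LD50", "value": 2.1, "fold": 0},
    {"cid": "c2", "task": "mouse_oral_LD50", "value": 3.0, "fold": 1},
    {"cid": "c3", "task": "rat_oral_LD50", "value": 2.7, "fold": 1},
    {"cid": "c9", "task": "rat_oral_LD50", "value": 1.9, "fold": 0},  # no descriptors
]

def make_fold(descriptors, tasks, test_fold):
    """Join descriptors onto task records and split them by fold index."""
    train, test = [], []
    for record in tasks:
        features = descriptors.get(record["cid"])
        if features is None:
            continue  # inner join: drop measurements without descriptors
        joined = {**record, "x": features}
        (test if record["fold"] == test_fold else train).append(joined)
    return train, test

train, test = make_fold(descriptors, tasks, test_fold=0)
print(len(train), len(test))
```

Calling `make_fold` once per fold index yields one training/test pair for each round of cross-validation.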