Repository for reproducing the results of the DeepGRP publication (currently under review).
You can install all required packages using poetry with:
```bash
git clone https://github.com/fhausmann/deepgrp_reproducibility
cd deepgrp_reproducibility
poetry install
```
**Note:** To fully reproduce the results from the DeepGRP paper, you need a version of RepeatMasker with cross_match and a version of Repbase, which cannot be provided here due to licensing restrictions.
To use the packages installed via poetry, you have to activate the poetry environment via:
```bash
poetry shell
```
or run your command using:
```bash
poetry run <your command>
```
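For example, to start Jupyter for the notebooks used below (assuming Jupyter is part of the poetry environment):

```bash
poetry run jupyter notebook
```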
You can download all required training/testing data and required programs with make:
```bash
poetry run make
```
**Warning:** this can take a while, depending on your connection.
All results in the paper were generated with the hyperparameters in `best_model.toml`.
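As a quick sanity check, the chosen values can be inspected with Python's `toml` package (a minimal sketch; the exact layout of `best_model.toml` may differ):

```python
# Minimal sketch: print the tuned hyperparameters from best_model.toml.
# Assumes the toml package is available in the poetry environment.
import toml

hyperparameters = toml.load("best_model.toml")
print(hyperparameters)
```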
These hyperparameters were found using the following search space:
| Parameter | Parameter Name | Distribution |
|---|---|---|
| Window size | `vecsize` | q-normal(μ = 200, σ = 20, q = 2) |
| Recurrent units | `units` | q-normal(μ = 32, σ = 5, q = 2) |
| Dropout | `dropout` | Uniform(low = 0, high = 0.4) |
| RMSprop momentum | `momentum` | Uniform(low = 0, high = 1.0) |
| RMSprop decay | `rho` | Uniform(low = 0, high = 1.0) |
| Learning rate | `learning_rate` | Lognormal(μ = -7, σ = 0.5) |
| Repeat probability per batch | `repeat_probability` | Uniform(low = 0, high = 0.49) |
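For illustration, this search space could be written with `hyperopt` roughly as follows (a sketch under the assumption of a hyperopt-style tuner; the actual tuning code may differ):

```python
# Sketch of the search space from the table above, expressed with hyperopt.
# Parameter names match the table; everything else is an assumption.
from hyperopt import hp

search_space = {
    "vecsize": hp.qnormal("vecsize", 200, 20, 2),    # window size
    "units": hp.qnormal("units", 32, 5, 2),          # recurrent units
    "dropout": hp.uniform("dropout", 0.0, 0.4),
    "momentum": hp.uniform("momentum", 0.0, 1.0),    # RMSprop momentum
    "rho": hp.uniform("rho", 0.0, 1.0),              # RMSprop decay
    "learning_rate": hp.lognormal("learning_rate", -7, 0.5),
    "repeat_probability": hp.uniform("repeat_probability", 0.0, 0.49),
}
```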
The performance and benchmarking results can be downloaded as JSON files from the `results` directory. All trained models can be found in `models`.
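To work with the downloaded results programmatically, something like the following should suffice (the individual file names under `results` are not documented here, so the glob pattern is an assumption):

```python
# Load every JSON result file found in the results directory.
import json
from pathlib import Path

for path in Path("results").glob("*.json"):
    with path.open() as handle:
        results = json.load(handle)
    print(path.name, "->", type(results).__name__)
```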
DeepGRP can be trained with the Jupyter notebook `Training_deepgrp.ipynb`, and dna-nn with `Training_dnabrnn.ipynb`.
Benchmarking can be done with `Benchmark.ipynb`.
To evaluate the results from the benchmark experiments, use `Evaluation.ipynb`.
All figures of the paper can be generated with `Figures.ipynb`; they will be saved in the `figures` subfolder.
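If you prefer to run the notebooks non-interactively, `jupyter nbconvert` can execute them from the command line (assuming Jupyter and nbconvert are available in the poetry environment):

```bash
# Execute a notebook in place; repeat for the other notebooks.
poetry run jupyter nbconvert --to notebook --execute --inplace Figures.ipynb
```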