RL Reach is a platform for running reproducible reinforcement learning experiments. Training environments are provided to solve the reaching task with the WidowX MK-II robotic arm. The Gym environments and training scripts are adapted from Replab and Stable Baselines Zoo, respectively.
Please read the documentation to get started with RL Reach. More details can be found in the associated journal publication or in the arXiv preprint.
# Clone the repository
git clone https://github.com/PierreExeter/rl_reach.git && cd rl_reach/code/
# Install and activate the Conda environment
conda env create -f environment.yml
conda activate rl_reach
Note: this Conda environment assumes that you have CUDA 11.1 installed. If you are using another version of CUDA, you will have to install PyTorch manually, as indicated here.
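For example, installing a build that matches CUDA 10.2 with pip might look like the command below (the version numbers and CUDA tag are purely illustrative; check the official PyTorch install instructions for your setup):
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 -f https://download.pytorch.org/whl/torch_stable.html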
Pull the Docker image (CPU or GPU)
docker pull rlreach/rlreach-cpu:latest
docker pull rlreach/rlreach-gpu:latest
or build the image from the Dockerfile
docker build -t rlreach/rlreach-cpu:latest . -f docker/Dockerfile_cpu
docker build -t rlreach/rlreach-gpu:latest . -f docker/Dockerfile_gpu
Run commands inside the Docker container with run_docker_cpu.sh and run_docker_gpu.sh.
Example:
./docker/run_docker_cpu.sh python run_experiments.py --exp-id 999 --algo ppo --env widowx_reacher-v1 --n-timesteps 30000 --n-seeds 2
./docker/run_docker_cpu.sh python evaluate_policy.py --exp-id 999 --n-eval-steps 1000 --log-info 0 --plot-dim 0 --render 0
Note: the GPU image requires nvidia-docker.
A reproducible capsule is available on CodeOcean.
Manual tests
python tests/manual/1_test_widowx_env.py
python tests/manual/2_test_train.py
python tests/manual/3_test_enjoy.py
python tests/manual/4_test_pytorch.py
Automated tests
pytest tests/auto/all_tests.py -v
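For orientation, here is a minimal sketch of the kind of smoke test such a suite might contain. It is illustrative only (not the repository's actual test code) and assumes that importing the local gym_envs package registers the custom environments:

```python
import gym

import gym_envs  # noqa: F401  # assumption: this import registers the custom environments


def test_widowx_reacher_smoke():
    """Smoke test: the environment can be created, reset and stepped."""
    env = gym.make("widowx_reacher-v1")
    obs = env.reset()
    assert obs is not None
    obs, reward, done, info = env.step(env.action_space.sample())
    assert obs is not None
    env.close()
```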
RL experiments can be launched with the script run_experiments.py.
Usage:
Flag | Description | Type | Example |
---|---|---|---|
--exp-id | Unique experiment ID | int | 999 |
--algo | RL algorithm | str | a2c, ddpg, her, ppo, sac, td3 |
--env | Training environment ID | str | widowx_reacher-v1 |
--n-timesteps | Number of training timesteps | int | 10^3 to 10^12 |
--n-seeds | Number of runs with different initialisation seeds | int | 2 to 10 |
Example:
python run_experiments.py --exp-id 999 --algo ppo --env widowx_reacher-v1 --n-timesteps 10000 --n-seeds 3
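For context, the training scripts are adapted from the Stable Baselines Zoo; a rough, minimal sketch of what training PPO on one of the environments could look like with Stable Baselines3 is shown below (assuming Stable Baselines3 is the RL library in use; run_experiments.py additionally handles seeding, logging and hyperparameter files, and the save path here is arbitrary):

```python
import gym
from stable_baselines3 import PPO

import gym_envs  # noqa: F401  # assumption: this import registers the custom environments

# Create the training environment and a PPO agent with default hyperparameters.
env = gym.make("widowx_reacher-v1")
model = PPO("MlpPolicy", env, verbose=1)

# Train for a small number of timesteps and save the resulting policy.
model.learn(total_timesteps=10_000)
model.save("ppo_widowx_reacher_v1")
```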
A Bash script that launches multiple experiments is provided for convenience:
./run_all_exp.sh
Trained models can be evaluated and the results can be saved with the script evaluate_policy.py.
Usage:
Flag | Description | Type | Example |
---|---|---|---|
--exp-id | Unique experiment ID | int | 999 |
--n-eval-steps | Number of evaluation timesteps | int | 1000 |
--log-info | Enable information logging at each evaluation step | bool | 0 (default) or 1 |
--plot-dim | Live rendering of end-effector and goal positions | int | 0: do not plot (default), 2: 2D or 3: 3D |
--render | Render the environment during evaluation | bool | 0 (default) or 1 |
Example:
python evaluate_policy.py --exp-id 999 --n-eval-steps 1000 --log-info 0 --plot-dim 0 --render 0
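For comparison, a bare-bones evaluation of a saved agent with Stable Baselines3's evaluate_policy helper might look like the sketch below (it assumes the agent was saved as a PPO model and that Stable Baselines3 is the RL library in use; the repository script additionally logs evaluation metrics and environment variables):

```python
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

import gym_envs  # noqa: F401  # assumption: this import registers the custom environments

env = gym.make("widowx_reacher-v1")
model = PPO.load("ppo_widowx_reacher_v1", env=env)  # illustrative path

# Mean and standard deviation of the episodic return over 10 evaluation episodes.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```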
If --log-info was enabled during evaluation, it is possible to plot some useful information as shown in the plot below.
python scripts/plot_episode_eval_log.py --exp-id 999
The plots are generated in the associated experiment folder, e.g. logs/exp_999/ppo/.
Example of environment evaluation plot:
Example of experiment learning curves:
The evaluation metrics, the environment's variables, the hyperparameters used during training and the parameters used to evaluate the environments are logged for each experiment in the file benchmark/benchmark_results.csv. The evaluation metrics of selected experiment IDs can be plotted with the script scripts/plot_benchmark.py. The plots are generated in the folder benchmark/plots/.
Usage:
Flag | Description | Type | Example |
---|---|---|---|
--exp-list | List of experiment IDs to consider for plotting | list of int | 26 27 28 29 |
--col | Name of the hyperparameter for the X axis (see column names here) | str | n_timesteps |
Example:
python scripts/plot_benchmark.py --exp-list 26 27 28 29 --col n_timesteps
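Since all results are aggregated in benchmark/benchmark_results.csv, they can also be inspected directly with pandas; the short sketch below plots one metric against n_timesteps (the exp_id and mean_reward column names are assumptions here, check the actual column names in the CSV):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("benchmark/benchmark_results.csv")

# Keep only the experiments of interest (column name 'exp_id' is an assumption).
exp_list = [26, 27, 28, 29]
subset = df[df["exp_id"].isin(exp_list)]

# Plot an evaluation metric against the number of training timesteps
# ('mean_reward' is a placeholder for whichever metric column you want to plot).
subset.plot(x="n_timesteps", y="mean_reward", kind="scatter")
plt.xlabel("n_timesteps")
plt.ylabel("mean_reward")
plt.savefig("benchmark/plots/benchmark_sketch.png")
```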
Example of benchmark plot:
Hyperparameters can be tuned automatically with the optimisation framework Optuna, using the script train.py -optimize.
Usage:
Flag | Description | Type | Example |
---|---|---|---|
--algo | RL algorithm | str | a2c, ddpg, her, ppo, sac, td3 |
--env | Training environment ID | str | widowx_reacher-v1 |
--n-timesteps | Number of training timesteps | int | 10^3 to 10^12 |
--n-trials | Number of optimisation trials | int | 2 to 100 |
--n-jobs | Number of parallel jobs | int | 2 to 16 |
--sampler | Sampler for the optimisation search | str | random, tpe, skopt |
--pruner | Pruner to kill unpromising trials early | str | halving, median, none |
--n-startup-trials | Number of trials before using the Optuna sampler | int | 2 to 10 |
--n-evaluations | Number of episodes used to evaluate a trial | int | 10 to 20 |
--log-folder | Log folder for the results | str | logs/opti |
Example:
python train.py -optimize --algo ppo --env widowx_reacher-v1 --n-timesteps 100000 --n-trials 100 --n-jobs 8 --sampler tpe --pruner median --n-startup-trials 10 --n-evaluations 10 --log-folder logs/opti
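For illustration, the flags above map onto a plain Optuna study roughly as follows. This is a sketch with a toy objective only; the real train.py objective trains and evaluates an agent for each trial:

```python
import optuna
from optuna.pruners import MedianPruner
from optuna.samplers import TPESampler


def objective(trial):
    # In rl_reach this would train an agent with the sampled hyperparameters
    # and return an evaluation score; a toy quadratic stands in for it here.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    return (lr - 1e-3) ** 2


study = optuna.create_study(
    direction="minimize",
    sampler=TPESampler(n_startup_trials=10),  # --sampler tpe, --n-startup-trials 10
    pruner=MedianPruner(),                    # --pruner median
)
study.optimize(objective, n_trials=100, n_jobs=8)  # --n-trials 100, --n-jobs 8
print(study.best_params)
```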
A Bash script that launches multiple hyperparameter optimisation runs is provided for convenience:
./opti_all.sh
All results and log files can be cleaned with the script below. Warning: this cannot be undone!
./cleanAll.sh
A number of custom Gym environments are available in the gym_envs directory. They simulate the WidowX MK-II robotic manipulator with the PyBullet physics engine. The objective is to bring the end-effector as close as possible to a target position.
Each implemented environment is described here. The action, observation and reward functions are given in this table. Some environment renderings can be found below.
Reaching task | Rendering |
---|---|
Fixed position, no orientation | |
Random position, no orientation | |
Fixed position, fixed orientation | |
Fixed position, random orientation | |
Moving position, no orientation | |
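The environments follow the standard Gym interface, so they can also be explored interactively; a minimal rollout sketch (again assuming that importing the gym_envs package registers the environments):

```python
import gym

import gym_envs  # noqa: F401  # assumption: this import registers the custom environments

env = gym.make("widowx_reacher-v1")
obs = env.reset()

# Roll out one short episode with random actions and print the last reward.
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        break
print("last reward:", reward)
env.close()
```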
The code was developed and tested with:
- Ubuntu 18.04
- Python 3.7.9
- Conda 4.9.2
- CUDA 11.1
Please cite this work as:
@article{aumjaud2021a,
author = {Aumjaud, Pierre and McAuliffe, David and Rodriguez-Lera, Francisco J and Cardiff, Philip},
journal = {Software Impacts},
pages = {100061},
volume = {8},
title = {{rl{\_}reach: Reproducible reinforcement learning experiments for robotic reaching tasks}},
archivePrefix = {arXiv},
arxivId = {2102.04916},
doi = {10.1016/j.simpa.2021.100061},
year = {2021}
}