Implemented for TensorFlow 2.0+
- DDPG with prioritized replay (a sketch of a prioritized buffer follows this list)
- Primal-Dual DDPG for constrained MDPs (CMDP)
- SAC Discrete
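
The prioritized replay used with DDPG samples transitions in proportion to their TD error rather than uniformly. Below is a minimal sketch of a proportional prioritized buffer; the class and method names are my own for illustration, not the repo's actual implementation:

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (illustrative sketch).

    Priorities are (|td_error| + eps) ** alpha; transitions are drawn
    with probability p_i / sum(p), and importance-sampling weights
    (N * P(i)) ** -beta correct the bias this sampling introduces.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.buffer = []                                   # stored transitions
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0                                       # ring-buffer write index

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios / prios.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        weights = (len(self.buffer) * probs[idxs]) ** -beta
        weights /= weights.max()                           # normalize for stability
        batch = [self.buffer[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # Called after a learning step with the fresh TD errors.
        self.priorities[idxs] = (np.abs(td_errors) + self.eps) ** self.alpha
```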
- Install the imported dependencies (my TF2 conda env is included as a reference)
- Each file contains example code that runs training on the CartPole env (see the loop sketch after these steps)
- Training: `python3 TF2_DDPG_LSTM.py`
- Tensorboard: `tensorboard --logdir=DDPG/logs`
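
Each runnable file follows roughly the pattern below. This is a generic sketch of a CartPole training loop using the classic Gym API (pre-0.26 `reset`/`step` signatures), with a random-action placeholder where the repo's agents would act; it is not the repo's actual code:

```python
import gym

# Generic CartPole training loop (classic Gym API, pre-0.26).
# The random action stands in for agent.act(state); the repo's
# agents wrap this pattern with their own replay and update logic.
env = gym.make("CartPole-v1")

for episode in range(10):
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = env.action_space.sample()  # placeholder for agent.act(state)
        next_state, reward, done, _ = env.step(action)
        # an agent would store (state, action, reward, next_state, done)
        # in its replay buffer and run a learning step here
        state = next_state
        total_reward += reward
    print(f"episode {episode}: reward {total_reward:.0f}")
```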
- Install [hyperopt](https://github.com/hyperopt/hyperopt)
- Optional: switch the agent used and configure the parameter space in `hyperparam_tune.py` (a hedged hyperopt example follows this list)
- Run: `python3 hyperparam_tune.py`
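
Defining a search space in hyperopt looks roughly like the following. The parameter names and ranges here are illustrative assumptions, not the repo's actual search space, and `train_agent` is a hypothetical stand-in for the repo's training call:

```python
from hyperopt import fmin, tpe, hp, Trials

# Illustrative search space; the parameters actually tuned in
# hyperparam_tune.py may differ.
space = {
    "actor_lr": hp.loguniform("actor_lr", -10, -4),    # ~4.5e-5 to 1.8e-2
    "critic_lr": hp.loguniform("critic_lr", -10, -4),
    "gamma": hp.uniform("gamma", 0.9, 0.999),
    "tau": hp.uniform("tau", 0.001, 0.1),
}

def objective(params):
    # Hypothetical stand-in for training the chosen agent with `params`;
    # hyperopt minimizes, so return the negative mean episode reward.
    mean_reward = train_agent(params)
    return -mean_reward

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=Trials())
print("best hyperparameters:", best)
```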
Agents were tested using the CartPole env.

Name | On/off policy | Model | Action space support |
---|---|---|---|
DQN | off-policy | Dense, LSTM | discrete |
DDPG | off-policy | Dense, LSTM | discrete, continuous |
AE-DDPG | off-policy | Dense | discrete, continuous |
SAC :bug: | off-policy | Dense | continuous |
PPO | on-policy | Dense | discrete, continuous |
CMDP agent:

Name | On/off policy | Model | Action space support |
---|---|---|---|
Primal-Dual DDPG | off-policy | Dense | discrete, continuous |
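
Primal-Dual DDPG enforces the CMDP cost constraint with a Lagrange multiplier updated by dual ascent. Below is a minimal sketch of that dual update, assuming a single cost constraint with threshold `d`; all names and values are illustrative, not the repo's implementation:

```python
# Dual ascent on the Lagrange multiplier for a single CMDP
# constraint E[episode cost] <= d. Illustrative names only.
DUAL_LR = 0.01      # dual step size (assumed)
COST_LIMIT = 25.0   # constraint threshold d (assumed)

def dual_update(lambda_, episode_cost):
    # Raise lambda when the constraint is violated; let it shrink,
    # but never below zero, when the constraint is satisfied.
    return max(0.0, lambda_ + DUAL_LR * (episode_cost - COST_LIMIT))

# The primal (DDPG) step then optimizes reward - lambda_ * cost,
# e.g. by using r - lambda_ * c in place of r in the critic target.
```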
Models used to generate the demos are included in the repo; you can also find Q-value, reward, and/or loss graphs there.
Demos:
- DQN Basic, time step = 4, 500 reward
- DQN LSTM, time step = 4, 500 reward
- DDPG Basic, 500 reward
- DDPG LSTM, time step = 5, 500 reward
- AE-DDPG Basic, 500 reward
- PPO Basic, 500 reward