Deeper and Larger Network Design for Continuous Control in RL

Implementation of deeper and larger network designs for continuous control in RL, with an easy switch between toy tasks and challenging games. The code mainly follows three recent papers:

In the code, we denote the method from *Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?* as ofe, the method from *D2RL: Deep Dense Architectures in Reinforcement Learning* as d2rl, and the method from *Training Larger Networks for Deep Reinforcement Learning* as ofe_dense. It is worth noting that we only implement the single-machine approach for ofe_dense, and we observe an overfitting phenomenon; we speculate that this is because the single-machine version is not as stable as the distributed approach.
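
As a rough illustration of the d2rl idea, the sketch below shows a dense MLP in which the raw network input is re-concatenated to every hidden layer's output, following the D2RL paper. The class name, hidden size, layer count, and ReLU activation are illustrative assumptions, not the settings used in this repository. A similar sketch of the ofe-style feature extractor follows the table of supported algorithms below.

```python
import torch
import torch.nn as nn

class D2RLMLPSketch(nn.Module):
    """Minimal sketch of a d2rl-style dense MLP (hypothetical names/sizes).

    The defining trick: the raw input (state, or state-action pair for a
    critic) is concatenated to the output of every hidden layer before the
    next linear layer, which makes deeper networks easier to optimize.
    """

    def __init__(self, in_dim, out_dim, hidden=256, n_hidden=4):
        super().__init__()
        self.first = nn.Linear(in_dim, hidden)
        # Every subsequent hidden layer sees [previous hidden output, raw input].
        self.hidden = nn.ModuleList(
            [nn.Linear(hidden + in_dim, hidden) for _ in range(n_hidden - 1)]
        )
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = torch.relu(self.first(x))
        for layer in self.hidden:
            h = torch.relu(layer(torch.cat([h, x], dim=-1)))
        return self.out(h)
```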

Supported algorithms

| algorithm | continuous control | on-policy / off-policy |
| --- | --- | --- |
| Proximal Policy Optimization (PPO) coupled with d2rl | ✔ | on-policy |
| Deep Deterministic Policy Gradients (DDPG) coupled with d2rl | ✔ | off-policy |
| Deep Deterministic Policy Gradients (DDPG) coupled with ofe | ✔ | off-policy |
| Deep Deterministic Policy Gradients (DDPG) coupled with ofe_dense | ✔ | off-policy |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with d2rl | ✔ | off-policy |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe | ✔ | off-policy |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe_dense | ✔ | off-policy |
| Soft Actor-Critic (SAC) coupled with d2rl | ✔ | off-policy |
| Soft Actor-Critic (SAC) coupled with ofe | ✔ | off-policy |
| Soft Actor-Critic (SAC) coupled with ofe_dense | ✔ | off-policy |
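
To make the "coupled with ofe" rows above concrete, here is a rough sketch of the ofe idea from *Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?*: a DenseNet-style feature extractor appends learned features to the raw observation, so the representation handed to the actor and critic has higher dimensionality than the observation itself (in the paper the extractor is also trained with an auxiliary next-state prediction loss, omitted here). The class name, unit count, and number of blocks are illustrative assumptions, not this repository's actual configuration.

```python
import torch
import torch.nn as nn

class OFEFeatureSketch(nn.Module):
    """Minimal sketch of ofe-style feature extraction (hypothetical names/sizes).

    Each block appends its activations to its own input, so the output fed to
    the RL agent is [obs, f1, f2, ...] and is *larger* than the raw observation.
    """

    def __init__(self, obs_dim, units=40, n_blocks=2):
        super().__init__()
        in_dims = [obs_dim + i * units for i in range(n_blocks)]
        self.blocks = nn.ModuleList([nn.Linear(d, units) for d in in_dims])

    def forward(self, obs):
        h = obs
        for block in self.blocks:
            h = torch.cat([h, torch.relu(block(h))], dim=-1)
        return h  # dimensionality: obs_dim + n_blocks * units
```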

Instructions

Recommended: run with Docker

```bash
# python        3.6    (apt)
# pytorch       1.4.0  (pip)
# tensorflow    1.14.0 (pip)
# DMC Control Suite and MuJoCo
cd dockerfiles
docker build . -t rl-docker
```

For other dockerfiles, see RL Dockerfiles.

Launch experiments

Run with the scripts batch_run_main_d2rl_4seed_cuda.sh / batch_run_main_ofe_4seed_cuda.sh / batch_run_main_ofe_dense_4seed_cuda.sh / batch_run_ppo_d2rl_4seed_cuda.sh:

```bash
# e.g.
bash batch_run_main_ofe_4seed_cuda.sh Ant-v2 TD3_ofe 0 True  # env_name: Ant-v2, algorithm: TD3_ofe, CUDA_Num: 0, layer_norm: True

bash batch_run_ppo_d2rl_4seed_cuda.sh Ant-v2 PPO_d2rl 0  # env_name: Ant-v2, algorithm: PPO_d2rl, CUDA_Num: 0
```

Plot results

```bash
# e.g. Notice: `-l` denotes the label, `data/DDPG_ofe-Hopper-v2/` is the directory of collected results,
# and `-s` is the smoothing value.
python spinupUtils/plot.py \
    data/DDPG_ofe-Hopper-v2/ \
    -l DDPG_ofe -s 10
```
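
For intuition about the smoothing value, the snippet below shows one common way such a window average is computed; it is modeled on the OpenAI Spinning Up plotter, and the exact behaviour of `spinupUtils/plot.py` may differ.

```python
import numpy as np

def smooth_curve(returns, window=10):
    """Window-average a return curve, roughly what a smoothing value of 10 does."""
    returns = np.asarray(returns, dtype=float)
    kernel = np.ones(window)
    # Normalize by the number of points actually inside the window so the
    # edges of the curve are not dragged toward zero.
    return np.convolve(returns, kernel, mode="same") / np.convolve(
        np.ones_like(returns), kernel, mode="same"
    )
```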

Performance on MuJoCo

Environments include Ant-v2, HalfCheetah-v2, Hopper-v2, Humanoid-v2, and Walker2d-v2.

  • DDPG and its variants

  • TD3 and its variants

  • SAC and its variants

  • PPO and its variants

Citation

```
@misc{QingLi2021larger,
  author = {Qing Li},
  title = {Deeper and Larger Network Design for Continuous Control in RL},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LQNew/Deeper_Larger_Actor-Critic_RL}}
}
```