Deep Deterministic Policy Gradients (DDPG)

Overview

This repository contains an implementation of the Deep Deterministic Policy Gradients (DDPG) algorithm, as described in the paper "Continuous control with deep reinforcement learning" by Lillicrap et al., evaluated on standard continuous control environments from the Gymnasium and MuJoCo libraries. DDPG is a model-free, actor-critic algorithm tailored to continuous action domains. Building on the deterministic policy gradient (DPG) framework, DDPG adopts techniques from Deep Q-Networks (DQN), such as experience replay and target networks, to stabilize training and handle high-dimensional, continuous action spaces. The original authors also use batch normalization in the actor network to manage the differing scales of the inputs; this implementation instead uses PyTorch's LayerNorm, which is invariant to batch size and allows a cleaner implementation of the target network parameter updates (see the sketch below).
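To make those two choices concrete, here is a minimal sketch of a LayerNorm-based actor and a Polyak ("soft") target update. This is illustrative only, not the code in this repository; the class and function names, layer sizes, and tau value are assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Illustrative DDPG actor using LayerNorm instead of BatchNorm."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        # LayerNorm normalizes per sample, so it behaves the same for a single
        # transition as for a minibatch, unlike BatchNorm.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded actions in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

def soft_update(target, source, tau=0.005):
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    with torch.no_grad():
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * s_param)
```

Because LayerNorm keeps all of its statistics in learnable parameters (rather than running-mean/variance buffers as BatchNorm does), iterating over `parameters()` in `soft_update` is sufficient to keep the target network in sync.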

Setup

Required Dependencies

Install the required dependencies using the following command:

pip install -r requirements.txt

Running the Algorithm

You can run the algorithm on any supported Gymnasium environment. For example:

python main.py --env 'LunarLanderContinuous-v2'
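For context, the interaction with a Gymnasium environment follows the standard reset/step loop. The sketch below is illustrative only; the commented-out `agent` calls are hypothetical placeholders for this repository's DDPG agent, and the random-action fallback is just to keep the snippet self-contained.

```python
import gymnasium as gym

env = gym.make("LunarLanderContinuous-v2")
obs, info = env.reset(seed=0)
for step in range(10_000):
    action = env.action_space.sample()  # stand-in for agent.choose_action(obs)
    next_obs, reward, terminated, truncated, info = env.step(action)
    # agent.remember(obs, action, reward, next_obs, terminated)
    # agent.learn()
    obs = next_obs
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```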

Results are included for the following environments:

Pendulum-v1
LunarLanderContinuous-v2
MountainCarContinuous-v0
BipedalWalker-v3
Hopper-v4
Humanoid-v4
Ant-v4
HalfCheetah-v4
HumanoidStandup-v4
InvertedDoublePendulum-v4
InvertedPendulum-v4
Pusher-v4
Reacher-v4
Swimmer-v3
Walker2d-v4

No hyper-parameter tuning was conducted for these benchmarks; this was an intentional choice, to compare the algorithm's out-of-the-box performance across a variety of environments. As a result, there are several cases where the agent fails to learn effectively, and others where the agent was still improving after 10k epochs. DDPG is notably brittle with respect to starting conditions and hyper-parameter choices, which can significantly affect its performance, a limitation addressed by subsequent algorithms such as Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO).

Acknowledgements

Special thanks to Phil Tabor, an excellent teacher! I highly recommend his YouTube channel.