Twin Delayed Deep Deterministic Policy Gradient (TD3)

Overview

This repository contains a PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3), a reinforcement learning algorithm that addresses some of the key challenges associated with continuous control tasks. The TD3 algorithm builds on the foundation of Deep Deterministic Policy Gradients (DDPG) by introducing several improvements to enhance stability and performance. One of the primary motivations behind TD3 is to mitigate the overestimation bias in Q-learning, which can lead to suboptimal policies. To achieve this, the authors proposed using a pair of critic networks to provide more accurate Q-value estimates. Additionally, TD3 employs a delayed policy update strategy, which reduces the variance in policy updates and helps in achieving more robust learning. Finally, the introduction of target policy smoothing adds noise to the target action, which reduces the likelihood of policy exploitation due to function approximation errors.
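For reference, here is a minimal sketch of the clipped double-Q target described above. The network and argument names (`target_actor`, `target_critic_1`, `rewards`, etc.) and the hyperparameter values are illustrative assumptions, not the exact interfaces used in this repository:

```python
import torch

def td3_critic_target(target_actor, target_critic_1, target_critic_2,
                      rewards, next_states, dones,
                      gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target with target policy smoothing (illustrative sketch)."""
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped Gaussian
        # noise so the value estimate is smoothed over nearby actions.
        next_actions = target_actor(next_states)
        noise = (torch.randn_like(next_actions) * policy_noise).clamp(-noise_clip, noise_clip)
        next_actions = (next_actions + noise).clamp(-max_action, max_action)

        # Clipped double Q-learning: take the minimum of the two target critics
        # to reduce overestimation bias.
        q1 = target_critic_1(next_states, next_actions)
        q2 = target_critic_2(next_states, next_actions)
        target_q = rewards + gamma * (1.0 - dones) * torch.min(q1, q2)
    return target_q
```

The third ingredient, delayed policy updates, simply means the actor and the target networks are updated only once every few critic updates (every second update in the original TD3 paper), which reduces the variance of the policy gradient.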

🤔 It kind of seems like the catastrophic drops in average score are occurring at regular intervals... could this be a function of the parameter updates?
I'm also not convinced I'm handling the actions correctly for environments with action bounds |x| > 1 (see the sketch below).
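One common way to handle environments whose action bounds exceed [-1, 1] is to keep the actor's output tanh-squashed to [-1, 1] and rescale it to the environment's action box before stepping. A hypothetical helper (not this repository's actual code) is shown here:

```python
import numpy as np

def scale_action(raw_action, action_space):
    """Map a tanh-squashed action in [-1, 1] onto the environment's [low, high] box."""
    low, high = action_space.low, action_space.high
    scaled = low + 0.5 * (np.asarray(raw_action) + 1.0) * (high - low)
    return scaled.astype(np.float32)
```

Usage would look like `env.step(scale_action(agent_action, env.action_space))`, with any exploration noise added before the rescaling.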

Setup

Required Dependencies

Install the required dependencies using the following command:

pip install -r requirements.txt

Running the Algorithm

You can run the algorithm on any supported Gymnasium environment. For example:

python main.py --env 'LunarLanderContinuous-v2'

No hyperparameter tuning was conducted for the individual environments. This was an intentional choice, to see how well the algorithm generalizes across tasks. For this reason, the agent learned successfully in some cases, while in others it was still training after 10,000 epochs.

Environments tested:

Pendulum-v1, LunarLanderContinuous-v2, MountainCarContinuous-v0, BipedalWalker-v3, Hopper-v4, Humanoid-v4, Ant-v4, HalfCheetah-v4, HumanoidStandup-v4, InvertedDoublePendulum-v4, InvertedPendulum-v4, Pusher-v4, Reacher-v4, Swimmer-v3, Walker2d-v4

Acknowledgements

Special thanks to Phil Tabor, an excellent teacher! I highly recommend his YouTube channel.