
Proximal Policy Optimization (Continuous)

Overview

A PyTorch implementation of Proximal Policy Optimization (PPO) for continuous action spaces.

🚧 🛠️👷‍♀️ 🛑 Under construction...
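For reference, the core of PPO is a clipped surrogate objective that keeps each policy update close to the policy that collected the data. The snippet below is a minimal sketch of that loss in PyTorch; the function and tensor names are illustrative, not necessarily this repository's exact code:

```python
import torch

def ppo_clipped_actor_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss from the PPO paper (Schulman et al., 2017).

    new_log_probs / old_log_probs: log pi(a|s) under the current and the
    rollout (old) policy; advantages: advantage estimates (e.g. from GAE).
    All arguments are 1-D tensors of equal length; names are illustrative.
    """
    ratio = torch.exp(new_log_probs - old_log_probs)             # pi_new / pi_old
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)  # trust-region clip
    # Take the pessimistic (element-wise minimum) objective, negated for descent.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```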

Setup

Required Dependencies

Install the required dependencies using the following command:

```bash
pip install -r requirements.txt
```

Running the Algorithm

You can run the algorithm on any supported Gymnasium environment. For example:

```bash
python main.py --env 'LunarLanderContinuous-v2'
```
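For scripting multiple runs, the sketch below shows how an entry point like main.py might wire the `--env` flag to Gymnasium. Only the flag name and the example environment id come from this README; the rest is an illustrative assumption about the script's structure:

```python
import argparse
import gymnasium as gym

# Illustrative sketch of the --env plumbing; the real main.py may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--env", type=str, default="LunarLanderContinuous-v2",
                    help="Gymnasium environment id")
args = parser.parse_args()

env = gym.make(args.env)
observation, info = env.reset()
print("obs space:", env.observation_space, "| action space:", env.action_space)
```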

Notes: Reward scaling appears to work very well for some environments (e.g. BipedalWalker), but it may be limiting the upper bound of performance on others. I've increased the number of episodes to 50k for the MuJoCo environments; if that gives the agent enough time to learn, I'll rerun the Gymnasium environments with the same budget. The examples in the paper train for millions of timesteps. A sketch of one common reward-scaling scheme follows below.
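The README doesn't spell out the scaling scheme; a common choice (assumed here) is to divide each reward by a running standard deviation of the discounted return, which is also what Gymnasium's built-in `gym.wrappers.NormalizeReward` does. A minimal hand-rolled sketch:

```python
import numpy as np
import gymnasium as gym

class ScaleReward(gym.RewardWrapper):
    """Divide each reward by a running std of the discounted return.

    A sketch of the reward-scaling idea, not necessarily this repo's code;
    gym.wrappers.NormalizeReward implements the same scheme more completely
    (including resetting the running return at episode boundaries).
    """

    def __init__(self, env, gamma=0.99, eps=1e-8):
        super().__init__(env)
        self.gamma, self.eps = gamma, eps
        self.ret = 0.0                                # running discounted return
        self.count, self.mean, self.m2 = 0, 0.0, 0.0  # Welford's online stats

    def reward(self, reward):
        self.ret = self.gamma * self.ret + reward
        # Welford's online variance update over the observed returns.
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        return reward / (np.sqrt(var) + self.eps)

# Usage: env = ScaleReward(gym.make("BipedalWalker-v3"))
```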

Environments

The agent is run on the following continuous-control environments:

- Pendulum-v1
- MountainCarContinuous-v0
- LunarLanderContinuous-v2
- Pusher-v4
- Reacher-v4
- InvertedPendulum-v4
- BipedalWalker-v3
- InvertedDoublePendulum-v4
- Walker2d-v4
- Ant-v4
- HalfCheetah-v4
- Swimmer-v3

Acknowledgements

Special thanks to Phil Tabor, an excellent teacher! I highly recommend his YouTube channel.