
Proximal Policy Optimization (Continuous)

Overview

A PyTorch implementation of Proximal Policy Optimization (PPO) for continuous action spaces.

🚧 🛠️👷‍♀️ 🛑 Under construction...
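For reference, the core of PPO is a clipped surrogate objective that keeps each policy update close to the policy that collected the data. The snippet below is a minimal sketch of that loss in PyTorch; the function and tensor names are illustrative, not necessarily this repository's exact code:

```python
import torch

def ppo_clipped_actor_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss from the PPO paper (Schulman et al., 2017).

    new_log_probs / old_log_probs: log pi(a|s) under the current and the
    rollout (old) policy; advantages: advantage estimates (e.g. from GAE).
    All arguments are 1-D tensors of equal length; names are illustrative.
    """
    ratio = torch.exp(new_log_probs - old_log_probs)             # pi_new / pi_old
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)  # trust-region clip
    # Take the pessimistic (element-wise minimum) objective, negated for descent.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```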

Setup

Required Dependencies

Install the required dependencies using the following command:

```bash
pip install -r requirements.txt
```

Running the Algorithm

You can run the algorithm on any supported Gymnasium environment. For example:

```bash
python main.py --env 'LunarLanderContinuous-v2'
```
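For scripting multiple runs, the sketch below shows how an entry point like main.py might wire the `--env` flag to Gymnasium. Only the flag name and the example environment id come from this README; the rest is an illustrative assumption about the script's structure:

```python
import argparse
import gymnasium as gym

# Illustrative sketch of the --env plumbing; the real main.py may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--env", type=str, default="LunarLanderContinuous-v2",
                    help="Gymnasium environment id")
args = parser.parse_args()

env = gym.make(args.env)
observation, info = env.reset()
print("obs space:", env.observation_space, "| action space:", env.action_space)
```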

Notes: Reward scaling appears to work very well for some environments (e.g. BipedalWalker), but it may be limiting the upper bound of performance on others. I've increased the number of episodes to 50k for the MuJoCo environments; if that gives the agent enough time to learn, I'll rerun the Gymnasium environments with the same budget. The examples in the paper train for millions of timesteps. A sketch of one common reward-scaling scheme follows below.
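The README doesn't spell out the scaling scheme; a common choice (assumed here) is to divide each reward by a running standard deviation of the discounted return, which is also what Gymnasium's built-in `gym.wrappers.NormalizeReward` does. A minimal hand-rolled sketch:

```python
import numpy as np
import gymnasium as gym

class ScaleReward(gym.RewardWrapper):
    """Divide each reward by a running std of the discounted return.

    A sketch of the reward-scaling idea, not necessarily this repo's code;
    gym.wrappers.NormalizeReward implements the same scheme more completely
    (including resetting the running return at episode boundaries).
    """

    def __init__(self, env, gamma=0.99, eps=1e-8):
        super().__init__(env)
        self.gamma, self.eps = gamma, eps
        self.ret = 0.0                                # running discounted return
        self.count, self.mean, self.m2 = 0, 0.0, 0.0  # Welford's online stats

    def reward(self, reward):
        self.ret = self.gamma * self.ret + reward
        # Welford's online variance update over the observed returns.
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        return reward / (np.sqrt(var) + self.eps)

# Usage: env = ScaleReward(gym.make("BipedalWalker-v3"))
```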

Environments

The agent is run on the following continuous-control environments:

- Pendulum-v1
- MountainCarContinuous-v0
- LunarLanderContinuous-v2
- Pusher-v4
- Reacher-v4
- InvertedPendulum-v4
- BipedalWalker-v3
- InvertedDoublePendulum-v4
- Walker2d-v4
- Ant-v4
- HalfCheetah-v4
- Swimmer-v3

Acknowledgements

Special thanks to Phil Tabor, an excellent teacher! I highly recommend his YouTube channel.