Off-Dynamics Inverse Reinforcement Learning from Hetero-Domain

Requirements

You can setup Python liblaries using bash setup.sh. Note that you need a MuJoCo license. Please follow the instruction in mujoco-py for help.

How to run

The expert demonstration data used in the experiment is prepared here, please download and unzip to ./buffers. Then you can use the following command to directly execute the odirl program.

# Example: run on PointMaze envs, GPU
python train_odirl.py \
    --cuda --s_env_id airl_envs:PointMazeReach-v0 --t_env_id airl_envs:PointMazeWallReach-v0 \
    --buffer buffers/airl_envs:PointMazeWallReach-v0/size1000000_std0.01_prand0.0.pth \
    --rollout_length 2000 --samp_fre 30

# Example: run on HalfCheetah envs, GPU
python train_odirl.py \
    --cuda --s_env_id HalfCheetah-v3 --t_env_id HalfCheetahT-v3 \
    --buffer buffers/HalfCheetah-v3/size1000000_std0.01_prand0.0.pth \
    --rollout_length 50000 --epoch_disc 10 --epoch_cla 1 --samp_fre 100

# Example: run on Ant envs, GPU
python train_odirl.py \
    --cuda --s_env_id Ant-v3 --t_env_id AntT-v3 \
    --buffer buffers/Ant-v3/size1000000_std0.01_prand0.0.pth \
    --rollout_length 50000 --epoch_disc 10 --epoch_cla 1 --samp_fre 100

The expert demonstration data can also be retrieved by the following methods.

Train expert

You can train experts using Soft Actor-Critic(SAC)

python train_expert.py --cuda --env_id airl_envs:PointMazeReach-v0 --num_steps 100000 --seed 0

Collect demonstrations

You need to collect demonstraions using trained expert's weight. Note that --std specifies the standard deviation of the gaussian noise add to the action, and --p_rand specifies the probability the expert acts randomly. We set std to 0.01 not to collect too similar trajectories.

python collect_demo.py \
    --cuda --env_id airl_envs:PointMazeReach-v0 \
    --weight logs/airl_envs:PointMazeReach-v0/sac/*/model/*/actor.pth \
    --buffer_size 1000000 --std 0.01 --p_rand 0.0 --seed 0

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
airl_envs		airl_envs
gail_airl_ppo		gail_airl_ppo
gym		gym
.gitignore		.gitignore
README.md		README.md
collect_demo.py		collect_demo.py
requirements.txt		requirements.txt
setup.sh		setup.sh
setup_envs.py		setup_envs.py
train_expert.py		train_expert.py
train_odirl.py		train_odirl.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Off-Dynamics Inverse Reinforcement Learning from Hetero-Domain

Requirements

How to run

Train expert

Collect demonstrations

About

Releases

Packages

Languages

yachenkang/ODIRL

Folders and files

Latest commit

History

Repository files navigation

Off-Dynamics Inverse Reinforcement Learning from Hetero-Domain

Requirements

How to run

Train expert

Collect demonstrations

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages