
After changing AC to off-policy, training errors out after roughly 500 experiences: the line action_dist = torch.distributions.Categorical(probs) evaluates to tensor([[nan, nan]]) #66

Open
Chensyfighting opened this issue Dec 4, 2023 · 3 comments

Comments

@Chensyfighting

1
The code is basically unchanged; I only added an experience replay buffer and a few related operations.

@Chensyfighting
Author

2
I only added the highlighted lines of code on top of the original code.
Why won't it run? I've been searching for a long time and can't find the bug.
Hoping someone experienced can help.
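The failure in the title can be reproduced in isolation. A minimal sketch, assuming PyTorch: the `logits` tensor below stands in for the policy network's output (not shown in the issue). Once the logits overflow during a diverging update, softmax returns NaN in every entry, and `torch.distributions.Categorical` then receives exactly the `tensor([[nan, nan]])` from the report:

```python
import torch

# Hypothetical logits standing in for an exploding policy-network output.
# If training diverges, logits can overflow to inf; softmax of such a row
# is NaN in every entry, matching the tensor([[nan, nan]]) in the report.
logits = torch.tensor([[float("inf"), float("inf")]])
probs = torch.softmax(logits, dim=-1)
print(probs)  # tensor([[nan, nan]])

# Guard before constructing the distribution instead of crashing inside it.
if torch.isnan(probs).any():
    print("NaN in action probabilities; skip this update and inspect gradients")
else:
    action_dist = torch.distributions.Categorical(probs)
```

Checking `probs` before building the distribution turns a crash deep inside `Categorical` into a diagnosable message at the point where the numbers first went bad.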

@senren001323


It's probably a gradient problem; check your gradients. Updating the current policy parameters with old data tends to be unstable. I'm also a beginner, so take this as a suggestion only.

@Chensyfighting
Author


Thanks, I found the cause as well. It is indeed a gradient problem: NaN values appear during the gradient update.
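The diagnosis the thread converges on can be sketched with standard PyTorch utilities. This is an illustrative guard, not the issue author's code: the tiny `policy` network, the placeholder actor loss, and the `max_norm=1.0` value are all assumptions.

```python
import torch
import torch.nn as nn

# Toy actor network and optimizer; shapes and sizes are illustrative only.
policy = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

state = torch.randn(1, 4)
dist = torch.distributions.Categorical(logits=policy(state))
action = dist.sample()
loss = -dist.log_prob(action).mean()  # placeholder actor loss

optimizer.zero_grad()
loss.backward()

# Clip the global gradient norm so updates driven by stale off-policy
# replay data cannot produce an exploding step.
torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm=1.0)

# Extra safety: skip the step entirely if any gradient is non-finite.
grads_ok = all(
    p.grad is None or torch.isfinite(p.grad).all()
    for p in policy.parameters()
)
if grads_ok:
    optimizer.step()
```

Clipping bounds the step size when old replay data disagrees sharply with the current policy, and the finiteness check stops a single bad batch from writing NaN into the weights, after which every subsequent `Categorical(probs)` would fail as reported.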
