Cuda runtime error #12

Open
arunpatala opened this issue Mar 14, 2017 · 6 comments

Comments

@arunpatala

Hi, nice repo. I am running the training example with the given dataset and getting a CUDA runtime error. I am attaching the log file.

log.txt

@da03
Collaborator

da03 commented Mar 14, 2017

Hmm, I suspect there's something wrong with your cutorch installation. Can you run th -lcutorch -e "cutorch.test()" and post the results?

@arunpatala
Author

"Completed 76020 asserts in 180 tests with 0 failures and 0 errors"
I have tried it on two machines, and both had the error.
I was able to test the model but not train it.

@SuperWu090

SuperWu090 commented Apr 14, 2017

@arunpatala Unfortunately, I have also run into the same "device-side assert triggered" problem, on both Titan X Pascal and Maxwell. I have checked cutorch but didn't find any problems. Have you solved this problem?

@SuperWu090

SuperWu090 commented Apr 19, 2017

This problem may be attributable to a recent cutorch update, torch/cutorch#708. However, after adding CUDA_LAUNCH_BLOCKING=1, it fails in the same way as before.
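For anyone reproducing this: CUDA kernel launches are asynchronous by default, so a device-side assert is often reported at a later, unrelated call. Setting CUDA_LAUNCH_BLOCKING=1 makes launches synchronous, so the error surfaces at the kernel that raised it. Below is a minimal sketch of setting the variable for a single invocation from a POSIX shell. TRAIN_CMD is a hypothetical stand-in, since the actual training command is not shown in this thread; here it only echoes the value the child process sees.

```shell
# Hypothetical stand-in for the real training command (not shown in this thread).
# Single quotes keep the variable from being expanded in the parent shell.
TRAIN_CMD='echo "CUDA_LAUNCH_BLOCKING=${CUDA_LAUNCH_BLOCKING:-unset}"'

# Set the variable for this one invocation only; the parent shell is unchanged.
with_blocking=$(env CUDA_LAUNCH_BLOCKING=1 sh -c "$TRAIN_CMD")
echo "$with_blocking"
```

Scoping the variable to one command avoids leaving synchronous launches enabled for later runs, where they would only slow things down.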

@da03
Collaborator

da03 commented Apr 19, 2017

Can you try that again? I found a bug that may have been causing that problem. @SuperWu090

@SuperWu090

SuperWu090 commented Apr 20, 2017

@da03 Thanks very much! I have tested the program, and this problem has been solved. However, due to a recent OpenNMT update to Batch.lua (seemingly 1b7632a7799be84da0ef8e8407002484e38c0fe1), there seems to be a new problem: "~/torch/install/bin/luajit: ~/torch/install/share/lua/5.1/onmt/data/Batch.lua:78: attempt to index a nil value". This problem may be solved by using the earlier version of OpenNMT (47431c773c2598384ea6f8c2200c25161f2eef12).
