Error while training with MONAILabel's deepedit #1489
Comments
Thanks for opening the issue here.
I also see that you're using CacheDataset.
Caching the dataset is great because it is faster. However, it seems the GPU you have can't cache the number of volumes you are using to train the model. I'd recommend switching from CacheDataset to the plain Dataset.
The downside of this is that training is slower.
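For anyone landing here, a minimal sketch of the swap being discussed (the data list and transform chain below are placeholders, not MONAILabel's actual pipeline):

```python
# Minimal sketch of swapping CacheDataset for the plain Dataset in MONAI.
# The datalist and transforms are placeholders, not MONAILabel's pipeline.
from monai.data import CacheDataset, DataLoader, Dataset
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged

data = [{"image": "img0.nii.gz", "label": "lbl0.nii.gz"}]  # placeholder datalist
transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
])

# CacheDataset pre-computes and holds transformed volumes in memory:
# fast epochs, but every cached volume costs memory up front.
# train_ds = CacheDataset(data=data, transform=transforms, cache_rate=1.0)

# Dataset re-applies the transforms on every access: slower, but memory-light.
train_ds = Dataset(data=data, transform=transforms)
train_loader = DataLoader(train_ds, batch_size=1)
```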
Just to understand a bit more about the use case here:
1. How many labels are you trying to segment (https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/radiology/lib/configs/deepedit.py#L42-L51)?
2. Did you change the default volume size (https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/radiology/lib/configs/deepedit.py#L77)?

Let us know.
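(For readers: the two linked settings boil down to a label-to-index map and a training volume size. The values below are illustrative stand-ins, not the verbatim upstream defaults; check the linked lines for those.)

```python
# Illustrative stand-ins for the two deepedit.py settings referenced above;
# see the linked lines for the real upstream values.
labels = {
    "spleen": 1,      # example label -> index mapping
    "liver": 2,
    "background": 0,  # background stays at index 0
}
spatial_size = [128, 128, 128]  # example volume size; larger volumes raise
                                # the memory cost of each cached sample
```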
Hi @diazandr3s,
Thank you for your answer! I am indeed able to execute the training process when I switch from CacheDataset to Dataset. The downside is indeed that this switch makes training significantly slower, so I was wondering whether there is a way to fix it while still using CacheDataset.
Here's some more background:
2. I have not changed anything about the volume size yet.
Hope this helps! Thank you again for your answer!
Thanks for the reply, @lukasvanderstricht. With regards to still using CacheDataset: I meant that you train the model on the number of volumes your GPU can cache, then retrain on the remaining volumes. Keep using the default batch size of 1.
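Read literally, that suggestion could look like the sketch below (`data` and `transforms` as in the earlier sketch; the chunk size is a placeholder for whatever your memory holds):

```python
from monai.data import CacheDataset, DataLoader

# Hypothetical chunked schedule: cache and train on as many volumes as fit,
# then move on to the next chunk, fine-tuning the same model each round.
chunk = 20  # placeholder: number of volumes your memory can cache
for start in range(0, len(data), chunk):
    subset = data[start:start + chunk]
    train_ds = CacheDataset(data=subset, transform=transforms, cache_rate=1.0)
    train_loader = DataLoader(train_ds, batch_size=1)  # keep the default batch size
    # ... run the training / fine-tuning loop over train_loader here ...
```

A related middle ground is CacheDataset's `cache_num` / `cache_rate` arguments, which cache only part of the datalist and compute the rest on the fly.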
Hi,
Hi @nvahmadi,
Thanks for the suggestion! It indeed also seems to work, but I don't see a major difference in speed when compared to Dataset.
Kind regards
Thanks for reporting back, and interesting to note. Not sure why, but for me the speedup was drastic; it was comparable to
I indeed let it run for more than one epoch, but it still remains as slow as Dataset. I don't cache to NVMe drives, though.
Ok, good to know, thanks. One note: I just remembered that I observed this in the context of MONAI Core and with larger batch sizes. I'd need to try myself whether I get similar speed-ups in MONAI Label with, e.g., batch sizes of 1. Sorry for the confusion!
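The suggestion itself isn't quoted above, but the NVMe remark hints at disk-backed caching; assuming that's what was meant, a minimal sketch with MONAI's PersistentDataset (the cache_dir is a placeholder path):

```python
from monai.data import DataLoader, PersistentDataset

# PersistentDataset writes the outputs of the deterministic transforms to
# disk, so epochs after the first reload pre-processed volumes instead of
# recomputing them. A fast cache_dir (e.g. on an NVMe drive) matters here.
train_ds = PersistentDataset(
    data=data,                        # datalist as in the earlier sketches
    transform=transforms,
    cache_dir="/path/to/fast_cache",  # placeholder path
)
train_loader = DataLoader(train_ds, batch_size=1)
```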
Closing this issue.
Original issue
Dear all,
I am currently using 3D Slicer and its MONAILabel extension to train a segmentation model using the DeepEdit model from the predefined radiology app. Both manual segmentation and training have been going smoothly up till now and the automatic segmentation functionality seems to be doing its job.
However, when I want to further train the model at this point, without having added any new labels (so just starting the training process again), I always get one of the two following errors:
It seems weird to me that, without changing anything (such as adding new labels), the training suddenly starts to fail systematically while it was working fine before.
Does anyone have any clue as to why these errors occur?
Thanks in advance!