Can't train model, no K80 accelerators #318
Comments
I just tried again in asia-east1 and it failed again. Does anyone have this working this year?
I think something is directing it to a training region that no longer has K80 quota; see the JSON blob below from my Logs Explorer, which shows the request with "region": "us-central1". I never chose us-central1 for any part of this project, so this must be something internal to the project. Maybe hidden in the Docker image. {
Still no success. I replaced all the us-central references I could find anywhere in the fmltc files with asia-east1, and a couple of zone references with asia-east1-b, since that region, and that zone in particular, has plenty of K80s. But it still fails. I suspect something else is trying to run the machine learning in a region/zone that doesn't have K80s. Probably something in the Docker image deployment, but I can't see anything else to change. I had hoped to get this working for a season start event that has already passed, and I've got a workshop coming up at month end. I don't want to use up any team's allocation of training hours since this is not team-specific work. Has anyone got this working for the Centerstage season?
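For anyone repeating the search-and-replace step, a small script can confirm whether any hardcoded us-central reference is still left in a checkout. This is only a minimal sketch, not part of fmltc: the function name `find_region_references` and the default pattern are made up for illustration, and it only sees plain-text files in the directory you point it at, so it cannot find anything baked into an already built Docker image.

```python
import os
import re
import sys
from typing import Iterator, Tuple


def find_region_references(
    root: str, pattern: str = r"us-central"
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line under root matching pattern.

    Hypothetical helper for illustration; the default pattern matches both
    region names (us-central1) and zone names (us-central1-a).
    """
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata so the results stay readable.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                # Binary or unreadable file: skip it and keep scanning.
                continue


if __name__ == "__main__":
    # Usage: python find_regions.py <path-to-checkout>
    search_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for hit_path, hit_line, hit_text in find_region_references(search_root):
        print(f"{hit_path}:{hit_line}: {hit_text}")
```

If this prints nothing for the source tree, the remaining us-central1 value is presumably coming from somewhere outside the files being searched.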
If you're doing a workshop, contact me at ftctech@firstinspires.org and we can talk. Subject to approval, we can credit your account for the time you use for the workshop (talk to us first before doing this).
When I click the Start Training button for a dataset, I get the error "The request for 1 K80 accelerators exceeds the allowed maximum of 0".
I've tried with my App Engine in us-east1 and in us-west1.
I've seen the Google page that lists which zones in a region have K80s, but I don't see any way to specify a zone, just a region.
And no region has K80s in all zones.
So how do I make this work? Do you have this working somewhere, which region?
Or is there a way to force the training to occur in a specific region and zone?