
Implement V4TrainingData #722
Merged · 11 commits · Feb 10, 2019

Conversation

@Cyanogenoid (Member) commented on Feb 9, 2019

This PR updates the current V3TrainingData to V4TrainingData with the following additional information (see the struct sketch after this list):

  • Q at the root node (from the side-to-move perspective)
  • Q of the best move (from the side-to-move perspective)
  • Edit: draw probability D from the WDL value head for the root and the best move, see the discussion below
  • A distinction between legal and illegal moves
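
For illustration, here is a hedged sketch of how the extended struct might look. The inherited V3 fields and the exact field names are assumptions based on the existing format description, not something spelled out in this thread:

```cpp
#include <cstdint>

#pragma pack(push, 1)
struct V4TrainingData {
  uint32_t version;             // Format version; 4 for this layout.
  float probabilities[1858];    // Policy targets; -1 marks an illegal move (see below).
  uint64_t planes[104];         // Input planes as bitboards.
  uint8_t castling_us_ooo;
  uint8_t castling_us_oo;
  uint8_t castling_them_ooo;
  uint8_t castling_them_oo;
  uint8_t side_to_move;
  uint8_t move_count;
  uint8_t rule50_count;
  int8_t result;                // Game outcome Z from the side-to-move perspective: -1/0/1.
  float root_q;                 // New in V4: Q at the root node.
  float best_q;                 // New in V4: Q of the best move.
  float root_d;                 // New in V4: draw probability at the root (WDL head).
  float best_d;                 // New in V4: draw probability of the best move (WDL head).
};
#pragma pack(pop)
```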

Aggregate evaluation Q: Storing information about Q allows us to perform resign analysis (LeelaChessZero/lczero-common#1) and to try training schemes involving Q, such as Oracle's (Q+Z)/2 proposal, which I also discussed on our blog. The Q stored here is from the same player's perspective as the outcome Z. I included both the root Q and the best-move Q since we might as well have both; it is only one extra 32-bit float.
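
As a small illustration of the (Q+Z)/2 idea, assuming q and z are both from the side-to-move perspective in [-1, 1] (the helper name is hypothetical, not part of this PR):

```cpp
// Hypothetical helper: blend the search evaluation q with the final game
// outcome z to form the value-head training target. Both inputs are from
// the side-to-move perspective in [-1, 1]. A sketch of the (Q+Z)/2 proposal,
// not code from this PR or the training repository.
float BlendedValueTarget(float q, float z) {
  return 0.5f * (q + z);
}
```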

Legal moves: By initialising the vector of probabilities to -1, we can easily distinguish between legal and illegal moves on the training side: legal moves keep their visit-based probability (possibly zero), while illegal moves stay at -1. This lets us try alternative training schemes where the policy is trained only on legal moves rather than on all moves, making the training setting as close as possible to the evaluation setting (during play we do not care what probability the network puts on illegal moves). In my initial testing on CCRL, this was around 30 Elo stronger, and it could prevent the network from placing very low probability on some moves simply because it wrongly believes they are illegal. With this change, every consumer of the training data must clamp the -1 entries back to 0 if it wants to treat the vector as a probability distribution.
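
A minimal sketch of the consumer-side clamping described above; the constant and function names are illustrative only:

```cpp
#include <algorithm>
#include <array>

// Illegal moves are stored as -1 in the probability vector, so a consumer
// that wants a valid probability distribution again must clamp them back to 0.
constexpr int kNumMoves = 1858;

void ClampIllegalMoves(std::array<float, kNumMoves>& probabilities) {
  for (float& p : probabilities) {
    p = std::max(p, 0.0f);  // -1 (illegal) becomes 0; legal probabilities are unchanged.
  }
}
```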

I have tried implementing a protobuf format for the training data in https://github.com/Cyanogenoid/lc0/tree/chunk. With the current protobuf training data, the files are larger after compression, so there is no immediate benefit. I think it is better to keep the functional changes in this PR separate from the format change to protobuf anyway.

I believe that fersbery, dkappe, and ASilver have been using a prior version of this branch with good success. Using decode_training from https://github.com/Cyanogenoid/lczero-training/tree/q, here is the output for one game (note the Root Q and Best Q lines): https://gist.github.com/Cyanogenoid/cc3a4d6a717b9605b074a50d35c320e3

Is there any other data we want to store in the training data?

@Ttl (Member) commented on Feb 9, 2019

For compatibility with the WDL value head, a draw value should also be included. The WDL head currently outputs the same Q plus an additional D in the range 0 to 1 for the draw probability, so adding a field for D is enough to support it.
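
As a hedged illustration of how W, D, and L relate to the stored Q and D, using the standard relations Q = W − L and W + D + L = 1 (illustrative only, not code from this PR):

```cpp
// Recover win/draw/loss probabilities from the stored Q (expected outcome in
// [-1, 1]) and D (draw probability in [0, 1]).
struct Wdl {
  float w, d, l;
};

Wdl WdlFromQD(float q, float d) {
  return {(1.0f - d + q) / 2.0f, d, (1.0f - d - q) / 2.0f};
}
```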

@Tilps (Contributor) commented on Feb 9, 2019

I agree with Ttl here; there is no point adding a V4 format without also including the WDL draw value D for the root and the best move.

But this would only be a temporary thing: more input planes will likely render this format obsolete relatively soon. It is probably worth investing more effort into working out a better protobuf format when we can.

@Cyanogenoid (Member, Author)
I will add D for the root and the best move. Since that depends on #635, do I wait until it is merged, or what is the procedure here?

My thinking was that we can start with this V4TrainingData to try things in the T50 run as soon as possible and move to the protobuf format afterwards.

@Tilps (Contributor) commented on Feb 9, 2019

Doing some more thinking. Unless the update to the training PR to support both v3 and v4 goes in before the A0 policy head PR to the training codebase, the engine will have to output v3 or v4 depending on whether the input net is in the right format, or someone will need to write a v4-to-v3 down-converter that I can stick into the training pipeline next to the rescorer. This is because the A0 policy head PR to training does not support backwards compatibility, so I cannot update the T40 run to a new version of lczero-training once it is in (or at least the version I saw on Ttl's branch did not support backwards compatibility, IIRC).

@Ttl (Member) commented on Feb 9, 2019

The AZ policy head training code is backwards compatible. The policy head type can be specified in the yaml file, and the code understands both old and new policy heads.

@Tilps (Contributor) commented on Feb 9, 2019

The same does not appear to be true for the WDL change, though. Do you plan on updating it to also be backwards compatible?

@Ttl (Member) commented on Feb 9, 2019

Yes, I will make it also backwards compatible.

@Tilps (Contributor) commented on Feb 9, 2019

Okay, all good then, but the training-side version of this PR should go in first.

With regard to ordering with respect to #635: if this is ready to submit before #635, just store 0 for the draw values and then update them as part of #635.

@Tilps (Contributor) commented on Feb 10, 2019

I'll approve this once I've upgraded the server.

@Tilps (Contributor) commented on Feb 10, 2019

Actually, there is going to be a small window of breakage either way, as I need to merge this PR into the rescorer and make the rescorer's loading code support both v3 and v4.

@Tilps Tilps merged commit 3eda703 into LeelaChessZero:master Feb 10, 2019