Implement V4TrainingData #722
Conversation
This reverts commit 318b6cb.
For compatibility with the WDL value head, a draw value should also be included. Currently the WDL head outputs the same Q and additionally D in the range 0 to 1 for the draw probability. It's enough to add a field for D to support it.
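Since the WDL head outputs Q and D together, the full win/draw/loss distribution can be recovered from just those two numbers. A minimal sketch, assuming the usual convention q = w − l with w + d + l = 1 (the helper name `wdl_from_q_d` is illustrative, not from this PR):

```python
def wdl_from_q_d(q: float, d: float):
    """Recover (win, draw, loss) probabilities from Q in [-1, 1] and
    D in [0, 1], assuming q = w - l and w + d + l = 1.
    This convention is an assumption for illustration, not the PR's code."""
    w = (1.0 - d + q) / 2.0
    l = (1.0 - d - q) / 2.0
    return w, d, l

# Example: q = 0.2, d = 0.5 gives w = 0.35, l = 0.15
w, d, l = wdl_from_q_d(0.2, 0.5)
```

This is why storing only D alongside the existing Q is sufficient: no separate W and L fields are needed.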
I agree with ttl here: there's no point adding a v4 format without including the WDL via the root and best move d. But this would only be a temporary thing; more input planes will likely render it obsolete relatively soon. It's probably good to invest more in working out a better protobuf format when we can.
I will add the d of the root and the best move. Since that depends on #635, do I wait until that's merged, or what is the procedure here? My thinking was that we could start with this V4TrainingData to try things in the T50 run ASAP and move to the protobuf format afterwards.
Doing some more thinking. Unless the update to the training PR to support both v3 and v4 goes in before the a0 policy head PR to the training codebase, the engine will have to output v3 or v4 depending on whether the input net is in the right format, or someone will need to write a v4-to-v3 down-converter that I can stick into the training pipeline next to the rescorer. This is because the a0 policy head PR to training doesn't support backwards compatibility, so I can't update the T40 run to a new version of lczero-training once it's in (or at least the version I saw on ttl's branch didn't support backwards compatibility, IIRC).
The AZ policy head training code is backwards compatible. The policy head type can be specified in the yaml file, and the code understands both old and new policy heads.
The same does not appear to be true for the WDL change, though. Do you plan on updating it to also be backwards compatible?
Yes, I will make it also backwards compatible. |
Okay, all good then, but the training-side version of this PR should go in first. With regard to ordering with respect to #635: if this is ready to submit before #635, just store 0 for the draw values and then update them as part of #635.
I'll approve this once I've upgraded the server. |
Actually, there is going to be a small window of breakage either way, as I need to merge this PR into the rescorer and make the rescorer's loading code support both v3 and v4.
This PR updates the current V3TrainingData to V4TrainingData with the following additional information:
Aggregate evaluation Q: Storing Q allows us to perform resign analysis (LeelaChessZero/lczero-common#1) and to try training schemes involving Q, such as the (Q+Z)/2 proposal by Oracle, which I also discussed on our blog. The Q stored here is from the same player's perspective as the outcome Z. I included both root Q and best Q since we might as well have both; each is only an extra 32-bit float.
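The (Q+Z)/2 idea blends the search evaluation with the final game outcome as the value-head training target. A minimal training-side sketch, assuming Q and Z are both in [-1, 1] from the same player's perspective (the function name and `q_ratio` parameter are illustrative, not from this PR):

```python
def blended_value_target(q: float, z: float, q_ratio: float = 0.5) -> float:
    """Blend the stored search evaluation Q with the game outcome Z.
    q_ratio = 0.5 gives the (Q+Z)/2 target discussed above; q_ratio = 0
    recovers plain Z training. Both inputs are assumed to be from the
    same player's perspective, as the PR stores them."""
    return q_ratio * q + (1.0 - q_ratio) * z

# A drawn game (z = 0) the search evaluated as winning (q = 0.6):
target = blended_value_target(q=0.6, z=0.0)  # (0.6 + 0.0) / 2 = 0.3
```

Keeping a `q_ratio` knob rather than hard-coding 0.5 would let the training run sweep between pure-Z and pure-Q targets.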
Legal moves: By initialising the vector of probabilities to -1, we can easily distinguish legal from illegal moves on the training side. This lets us try alternative training schemes where the policy is trained only on legal moves rather than on all moves, making the training setting as close as possible to the evaluation setting (after all, we don't care what probability the network puts on illegal moves during gameplay). In my initial testing on CCRL, this was around 30 Elo stronger, and it could prevent issues where the network places very low probability on some moves simply because it wrongly thought they were illegal. With this change, all consumers of the training data must threshold the -1s back to 0 if they want to treat the vector as a probability distribution.
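On the consumer side, the required thresholding and the legal-move mask are both one-liners. A minimal sketch (helper names are illustrative; consumers may implement this however they like):

```python
def threshold_illegal(probs):
    """Clamp the -1 markers for illegal moves back to 0, so the vector
    can again be treated as a probability distribution over all moves."""
    return [p if p >= 0.0 else 0.0 for p in probs]

def legal_mask(probs):
    """Boolean mask of legal moves, usable to restrict the policy loss
    to legal moves only, as described above."""
    return [p >= 0.0 for p in probs]

# A toy 4-move policy vector: two illegal moves marked with -1.
probs = [-1.0, 0.2, 0.8, -1.0]
clean = threshold_illegal(probs)   # [0.0, 0.2, 0.8, 0.0]
mask = legal_mask(probs)           # [False, True, True, False]
```

The key point is that 0 is a valid probability for a legal move, so only a sentinel outside [0, 1] such as -1 can encode legality losslessly.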
I have tried implementing a protobuf format for the training data in https://github.com/Cyanogenoid/lc0/tree/chunk. With the current protobuf training data, the files are bigger after compression, so there is no immediate benefit. I think it's better to separate the functional changes in this PR from the format change to protobuf anyway.
I believe that fersbery, dkappe, and ASilver have been using a prior version of this branch with good success. With decode_training from https://github.com/Cyanogenoid/lczero-training/tree/q, here is the output of a game (note the Root Q and Best Q lines): https://gist.github.com/Cyanogenoid/cc3a4d6a717b9605b074a50d35c320e3
Is there any other data we want to store in the training data?