MDP Generation #46
Merged
Conversation
This gets rid of any protocol data stored on the block, which enables future bit-packing of the state. Also, parents are no longer ordered, which enables future merging of isomorphic DAGs.
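A minimal sketch of what a block could look like after this change, assuming a hypothetical `Block` dataclass (the actual class in the repo may differ): no protocol data is attached, and parents are an unordered set.

```python
# Hypothetical illustration, not the PR's actual data model.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    # No protocol data lives on the block; the state can later be bit-packed.
    # Parents are an unordered frozenset, so DAGs that differ only in parent
    # order compare equal, which is what enables merging isomorphic DAGs.
    parents: frozenset = field(default_factory=frozenset)
```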
Ran into an issue with the pynauty certificate: two non-isomorphic graphs yield the same certificate (pdobsan/pynauty#33). I'll now try applying the canonical relabelling within State.compress() instead.
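For reference, a sketch of the canonical-relabelling idea with pynauty, assuming `lab[i]` returned by `canon_label` is the vertex at canonical position `i`; the graph construction below is illustrative, not the repo's State.compress().

```python
import pynauty


def canonical_key(adjacency, n_vertices):
    """Return an order-independent key for a directed graph.

    adjacency: dict mapping vertex -> list of successor vertices.
    Uses pynauty.canon_label() rather than certificate() alone, because of
    the collision reported in pdobsan/pynauty#33.
    """
    g = pynauty.Graph(n_vertices, directed=True, adjacency_dict=adjacency)
    lab = pynauty.canon_label(g)             # canonical vertex ordering
    pos = {v: i for i, v in enumerate(lab)}  # vertex -> canonical position
    return frozenset(
        (pos[v], frozenset(pos[w] for w in adjacency.get(v, ())))
        for v in range(n_vertices)
    )
```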
Next step is to implement rewards
The selfish mining model was hard-coded in mdp.py. I looked there and reimplemented it against the new model-spec API in sm.py. The next step is to make the exploration (mdp.py) work for generic model specifications, then start thinking about rewards.
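As a rough sketch of what a generic model-spec API could look like (the names below are assumptions; sm.py may use a different interface):

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical model-specification interface."""

    @abstractmethod
    def start(self):
        """Return the initial state."""

    @abstractmethod
    def actions(self, state):
        """Return the actions available in `state`."""

    @abstractmethod
    def apply(self, action, state):
        """Return a list of (probability, successor_state) pairs."""
```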
The model-independent parts of the exploration now live in compiler.py. The reimplementation lacks two important features relative to mdp.py but improves elsewhere:
- State compression is missing; nothing is truncated so far.
- Termination is missing; exploration continues forever.
- MDP matrix generation was certainly broken in mdp.py and I tried to fix it. Needs testing though.
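A minimal sketch of what the generic exploration could look like, assuming the hypothetical `Model` interface above; like the current reimplementation it has no state compression, and a crude state cap stands in for the missing termination:

```python
from collections import deque


def explore(model, max_states=None):
    """Breadth-first enumeration of reachable states; collects sparse
    transition entries (src_id, action, probability, dst_id)."""
    start = model.start()
    state_ids = {start: 0}
    transitions = []
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for a in model.actions(s):
            for p, t in model.apply(a, s):
                if t not in state_ids:
                    state_ids[t] = len(state_ids)
                    queue.append(t)
                transitions.append((state_ids[s], a, p, state_ids[t]))
        # Termination is still missing; without a cap this loops forever
        # on models with unbounded state spaces.
        if max_states is not None and len(state_ids) >= max_states:
            break
    return state_ids, transitions
```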
Inspired by old state compression in sm.py
To avoid giving rewards twice, common history truncation becomes mandatory.
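A deliberately simplified, self-contained sketch of the idea (the real state is a DAG, not a list): rewards are paid out exactly once, at the moment blocks are truncated into common history.

```python
def truncate_common_history(per_block_rewards, common_len):
    """per_block_rewards: rewards of blocks on the current chain, oldest
    first; common_len: length of the prefix both parties agree on.
    Returns (remaining_rewards, paid_reward)."""
    finalized = per_block_rewards[:common_len]
    remaining = per_block_rewards[common_len:]
    paid_reward = sum(finalized)  # paid once, then gone from the state
    return remaining, paid_reward
```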
The traditional and proposed models do not agree for gamma 0.5 ... 0.9 and alpha 0.25 ... 0.35: PTO revenue is higher for our model, while reward per progress is higher in the traditional model. In fact, the PTO-optimal policy against the proposed model performs worse than honest behaviour w.r.t. reward per progress.

In this commit I added steady-state weighted PTO revenues to the pipeline. First results on small problems suggest that the PTO transformation, value iteration, steady-state calculation, and reward-per-progress calculation all do what they should for the traditional model. I guess the most likely explanation now is that the proposed model violates some assumption of PTO.
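For context, a sketch of the reward-per-progress calculation under a fixed policy, assuming dense numpy arrays (P is the row-stochastic transition matrix of the induced Markov chain, r and g are per-state expected reward and progress); the pipeline's actual implementation may differ:

```python
import numpy as np


def steady_state(P, tol=1e-12, max_iter=100_000):
    """Stationary distribution of a row-stochastic matrix via power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt
    return pi


def reward_per_progress(P, r, g):
    # Steady-state weighted reward divided by steady-state weighted progress.
    pi = steady_state(P)
    return (pi @ r) / (pi @ g)
```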
Results confirm value iteration. Speed is about the same for both algorithms: PI is a bit faster for small traditional problems, VI is a bit faster for our own model. Most importantly, switching to policy iteration does not solve the old problem that PTO produces higher revenue for our model than for the traditional model, while reward per progress is lower (sub-honest) in our model for e.g. alpha = 0.33, gamma = 0.75.
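For comparison, a generic policy-iteration sketch for a finite MDP given as dense arrays (P[a] is the transition matrix under action a, R[a] the expected immediate reward, disc a discount factor close to 1); this is the textbook algorithm being compared against value iteration, not the repo's implementation:

```python
import numpy as np


def policy_iteration(P, R, disc=0.999):
    """P: (A, S, S) transition tensor, R: (A, S) reward matrix."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - disc * P_pi) v = R_pi exactly.
        P_pi = P[policy, np.arange(n_states)]
        R_pi = R[policy, np.arange(n_states)]
        v = np.linalg.solve(np.eye(n_states) - disc * P_pi, R_pi)
        # Policy improvement: greedy one-step lookahead.
        q = R + disc * (P @ v)
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```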
Following a5a5a0 and 4cafea, I thought the last remaining source of error was the calculation of reward per progress. To rule this out I now tried to:
1. Use the new policy_evaluation(reachable_only=True) on the PTO MDP for a small theta and note down the number of iterations.
2. Do backpropagation in the ARR MDP for that many steps.
3. Calculate the steady state in the ARR MDP.
4. Divide the steady-state weighted reward by the steady-state weighted progress.

With this, the problem seems to be gone, at least on one difficult instance in a notebook. I still have to integrate this into the pipeline; a sketch of the procedure is below.

Referenced commits:
- commit 4cafead (HEAD -> mdp-gen, origin/mdp-gen), Patrik Keller <git@pkel.dev>, Thu Sep 14 13:33:58 2023 +0200: mdp. draft policy_iteration
- commit a5a5a09, Patrik Keller <git@pkel.dev>, Wed Sep 13 21:06:19 2023 +0200: mdp. investigate unexpected results
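A sketch of the four-step check above, under strong simplifying assumptions: P_pto is the (substochastic, terminating) transition matrix of the PTO chain under the fixed policy with per-state reward r_pto, and P_arr, r_arr, g_arr are the ARR chain's transition matrix, reward, and progress under the same policy. The repo's policy_evaluation(reachable_only=True) is stood in for by a plain iterative evaluation that merely counts its sweeps.

```python
import numpy as np


def evaluation_sweeps(P_pto, r_pto, theta=1e-6):
    """Iterative policy evaluation; returns the number of sweeps to converge."""
    v = np.zeros(P_pto.shape[0])
    n_iter = 0
    while True:
        n_iter += 1
        v_new = r_pto + P_pto @ v
        if np.abs(v_new - v).max() < theta:
            return n_iter
        v = v_new


def reward_per_progress_check(P_pto, r_pto, P_arr, r_arr, g_arr, theta=1e-6):
    # 1. Evaluate on the PTO MDP for a small theta; note the iteration count.
    n_iter = evaluation_sweeps(P_pto, r_pto, theta)
    # 2. Backpropagate reward and progress in the ARR MDP for that many steps.
    rew = np.zeros_like(r_arr)
    prog = np.zeros_like(g_arr)
    for _ in range(n_iter):
        rew = r_arr + P_arr @ rew
        prog = g_arr + P_arr @ prog
    # 3. Steady state of the ARR chain (power iteration).
    pi = np.full(P_arr.shape[0], 1.0 / P_arr.shape[0])
    for _ in range(100_000):
        pi = pi @ P_arr
    # 4. Steady-state weighted reward over steady-state weighted progress.
    return (pi @ rew) / (pi @ prog)
```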
Delta to traditional model is gone, finally.
Co-authored-by: roibarzur <roi.barzur@gmail.com>
https://arxiv.org/abs/2309.11924