Real-Time Dynamic Programming #49

Merged
merged 32 commits into from
Aug 25, 2024


pkel
Owner

@pkel pkel commented Apr 29, 2024

While working on #48 I came to the conclusion that modern RL algorithms might be overkill for my type of problem. I went back to the tabular solving approach kicked off in #46 and came up with a new solving algorithm that is similar to value iteration but

  • samples exploration paths from a dynamic environment
  • builds the tabular state space on the fly
  • does dynamic programming state-value updates in the meantime

According to Sutton and Barto's book on RL, this falls into the broad category of "Asynchronous Dynamic Programming". After some googling, I think what I've implemented is Real-Time Dynamic Programming (RTDP).
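For illustration, the three ingredients listed above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from this PR: the function `rtdp`, its parameters, the epsilon-greedy exploration, and the toy environment (`toy_actions`, `toy_step_distribution`) are all hypothetical names and choices made for the example.

```python
import random


def rtdp(start, actions, step_distribution, episodes=200, horizon=50,
         gamma=0.9, eps=0.1, seed=0):
    """RTDP-style sketch (hypothetical interface, not this PR's code).

    step_distribution(state, action) must return a list of
    (probability, next_state, reward, done) tuples.
    """
    rng = random.Random(seed)
    value = {}  # tabular state values; a state is added when it is first visited

    def q(state, action):
        # Expected one-step return; states not in the table yet count as 0.
        return sum(p * (r + (0.0 if done else gamma * value.get(nxt, 0.0)))
                   for p, nxt, r, done in step_distribution(state, action))

    for _episode in range(episodes):
        state = start
        for _step in range(horizon):
            qs = {a: q(state, a) for a in actions(state)}
            value[state] = max(qs.values())  # dynamic programming backup
            # Assumption: epsilon-greedy exploration along the sampled path.
            if rng.random() < eps:
                action = rng.choice(list(qs))
            else:
                action = max(qs, key=qs.get)
            transitions = step_distribution(state, action)
            weights = [t[0] for t in transitions]
            _p, state, _r, done = rng.choices(transitions, weights=weights)[0]
            if done:
                break
    return value


# Hypothetical toy environment with an unbounded (non-truncated) state space:
# "grow" moves from s to s + 1 with probability 0.5, otherwise the episode ends
# with nothing; "cash_out" ends the episode with reward s.
def toy_actions(state):
    return ["grow", "cash_out"]


def toy_step_distribution(state, action):
    if action == "cash_out":
        return [(1.0, 0, float(state), True)]
    return [(0.5, state + 1, 0.0, False), (0.5, 0, 0.0, True)]


table = rtdp(0, toy_actions, toy_step_distribution)
print(len(table), "states in the table; value of start state:", table[0])
```

Only states that were actually reached by a sampled path end up in `table`, which is what makes an infinite state space tractable in this sketch.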

The results seem promising. I can now handle a non-truncated, and hence infinite, state space instance of the generic DAG model for Nakamoto/Bitcoin.

@pkel
Owner Author

pkel commented Aug 25, 2024

I initially was hyped about this RTDP thing because

  • it does exploration on the fly
  • it does not use state approximations
  • Barto/Sutton provide a proof that it converges to the optimal policy.

After implementing the algorithm I

  • observed that it does not converge and instead stops exploring new states
  • tried to fix it and failed
  • noticed that convergence is only guaranteed if all states are visited regularly (maybe all states reachable under the optimal policy would be enough)
  • concluded that, if all states have to be visited regularly anyway, I could just as well use traditional dynamic programming, e.g. value iteration.
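For comparison, traditional value iteration sweeps over every state of an explicitly enumerated state space. The sketch below is an assumption-laden illustration, not code from this repository: `value_iteration`, the in-place sweep order, and the truncated toy chain (`chain_actions`, `chain_step_distribution`) are hypothetical.

```python
def value_iteration(states, actions, step_distribution, gamma=0.9,
                    tol=1e-9, max_sweeps=10_000):
    """In-place value iteration over a finite, explicitly listed state set.

    step_distribution(state, action) must return a list of
    (probability, next_state, reward, done) tuples.
    """
    value = {s: 0.0 for s in states}
    for _sweep in range(max_sweeps):
        delta = 0.0
        for s in states:  # every state is updated in every sweep
            best = max(
                # Successors outside the enumerated set count as 0 (truncation).
                sum(p * (r + (0.0 if done else gamma * value.get(nxt, 0.0)))
                    for p, nxt, r, done in step_distribution(s, a))
                for a in actions(s)
            )
            delta = max(delta, abs(best - value[s]))
            value[s] = best
        if delta < tol:  # stop once a full sweep changes nothing noticeable
            break
    return value


# Hypothetical toy chain: "grow" moves from s to s + 1 with probability 0.5,
# otherwise the episode ends with nothing; "cash_out" ends it with reward s.
def chain_actions(state):
    return ["grow", "cash_out"]


def chain_step_distribution(state, action):
    if action == "cash_out":
        return [(1.0, 0, float(state), True)]
    return [(0.5, state + 1, 0.0, False), (0.5, 0, 0.0, True)]


# The state space has to be truncated up front, here to states 0..4.
vi = value_iteration(list(range(5)), chain_actions, chain_step_distribution)
print(vi)
```

The trade-off is visible in the call: value iteration needs the (truncated) state list in advance, whereas a sampling-based solver only touches states it happens to reach.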

Merging/closing this now, as I'm about to explore a somewhat separate idea which re-uses parts of the tooling.

@pkel pkel merged commit f38ddc7 into master Aug 25, 2024
4 checks passed