Difficulty Implementing Example MDP #422

ryan-o-c · 2022-07-29T17:22:51Z

Hello, I just wanted to ask for some help as I'm trying to get to grips with this package. I'm trying to run an end-to-end implementation of an MDP solver on one of the simple problems (eventually I hope to use the solvers on a more complex MDP that I will have to define myself). For now, I'm trying the Tiger problem. I've copied the code for the MDP definition from the documentation page "Defining POMDPs and MDPs", and the simulation code is mostly copied from "Simulation Standards". I'll leave the code at the bottom.

The error I'm encountering is at the line "a = action(policy, b)" which reads:

MethodError: no method matching estimate_value(::MCTS.SolvedRolloutEstimator{RandomPolicy{Random._GLOBAL_RNG, QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, NothingUpdater}, Random._GLOBAL_RNG}, ::QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, ::String, ::Int64)
Closest candidates are:
  estimate_value(::Function, ::Union{MDP, POMDP}, ::Any, ::Any) at ~/.julia/packages/MCTS/CyGog/src/domain_knowledge.jl:9
  estimate_value(::Number, ::Union{MDP, POMDP}, ::Any, ::Any) at ~/.julia/packages/MCTS/CyGog/src/domain_knowledge.jl:10
  estimate_value(::MCTS.SolvedRolloutEstimator, ::MDP, ::Any, ::Any) at ~/.julia/packages/MCTS/CyGog/src/domain_knowledge.jl:65

I've tried to define b0 using the default "b0 = initialstate(pomdp)", but I get an error that @ref is not defined (I can't figure out what @ref even does!). I've tried to define b0 another way (as below), and although it doesn't throw an error immediately, I doubt that I have it right!

Another issue I'm having is with beliefs. They are only vaguely defined in the documentation because they depend on the updater, but I can't figure out what kind of belief is relevant to the updater I'm currently using (DiscreteUpdater).

When I can hopefully move onto my own MDP, I'm also unsure as to what form the state must take - need I define the entire state space on initialisation? The mountaincar example does not seem to do so, hence my confusion. I have done something similar when using ReinforcementLearning.jl but I'm not sure how exactly to define my state space for this package. The examples seem to just list all of the states, but mine is a very large state space of coordinates, defined by two arrays of size (N,2) over the numbers from 0 to 2N. (For now, N=8!)

I hope someone can help me with this smattering of questions - I'm struggling to get a handle on how this package works and hopefully these answers will get me on my way. Maybe it's very apparent from my questions, but I am a Julia novice and not a fantastic programmer in general, so apologies if my questions are trivial - I have been trying to answer these questions without clogging this forum but I'm at the end of my tether!

Entire code for the problem/solver I'm trying to implement is as follows:

using POMDPTools
using POMDPs
using QuickPOMDPs: QuickPOMDP
using POMDPTools: Deterministic, Uniform, SparseCat
import POMDPTools: ImplicitDistribution
import Distributions: Normal
using MCTS
using StaticArrays

m = QuickPOMDP(
    states = ["left", "right"],
    actions = ["left", "right", "listen"],
    observations = ["left", "right"],
    discount = 0.95,
    transition = function (s, a)
        if a == "listen"
            return Deterministic(s) # tiger stays behind the same door
        else # a door is opened
            return Uniform(["left", "right"]) # reset
        end
    end,

    observation = function (a, sp)
        if a == "listen"
            if sp == "left"
                return SparseCat(["left", "right"], [0.85, 0.15]) # sparse categorical
            else
                return SparseCat(["right", "left"], [0.85, 0.15])
            end
        else
            return Uniform(["left", "right"])
        end
    end,

    reward = function (s, a)
        if a == "listen"
            return -1.0
        elseif s == a # the tiger was found
            return -100.0
        else # the tiger was escaped
            return 10.0
        end
    end,

    initialstate = Uniform(["left", "right"]),
) # closes the QuickPOMDP(...) call

solver = MCTSSolver(n_iterations=50, depth=20, exploration_constant=5.0)
policy = solve(solver, m)


up = POMDPTools.BeliefUpdaters.DiscreteUpdater(m)
b0 = uniform_belief(m)

b = POMDPs.initialize_belief(up, b0)
s = rand(initialstate(m))

r_total = 0.0
d = 1.0
while !isterminal(m, s)
    a = action(policy, b)
    s, o, r = @gen(:sp,:o,:r)(m, s, a)
    r_total += d*r
    d *= discount(m)
    b = update(up, b, a, o)
end

Julia indicates that the error is at the line

a = action(policy, b)

and the full error reads:

MethodError: Cannot `convert` an object of type DiscreteBelief{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String} to an object of type String
Closest candidates are:
  convert(::Type{String}, ::String) at ~/software/julia-1.7.2/share/julia/base/essentials.jl:223
  convert(::Type{T}, ::T) where T<:AbstractString at ~/software/julia-1.7.2/share/julia/base/strings/basic.jl:231
  convert(::Type{T}, ::AbstractString) where T<:AbstractString at ~/software/julia-1.7.2/share/julia/base/strings/basic.jl:232
  ...

Stacktrace:
 [1] push!(a::Vector{String}, item::DiscreteBelief{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String})
   @ Base ./array.jl:994
 [2] insert_node!(tree::MCTS.MCTSTree{String, String}, planner::MCTSPlanner{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String, String, MCTS.SolvedRolloutEstimator{RandomPolicy{Random._GLOBAL_RNG, QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, NothingUpdater}, Random._GLOBAL_RNG}, Random._GLOBAL_RNG}, s::DiscreteBelief{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String})
   @ MCTS ~/.julia/packages/MCTS/CyGog/src/vanilla.jl:336
 [3] build_tree(planner::MCTSPlanner{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String, String, MCTS.SolvedRolloutEstimator{RandomPolicy{Random._GLOBAL_RNG, QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, NothingUpdater}, Random._GLOBAL_RNG}, Random._GLOBAL_RNG}, s::DiscreteBelief{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String})
   @ MCTS ~/.julia/packages/MCTS/CyGog/src/vanilla.jl:265
 [4] plan!(planner::MCTSPlanner{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String, String, MCTS.SolvedRolloutEstimator{RandomPolicy{Random._GLOBAL_RNG, QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, NothingUpdater}, Random._GLOBAL_RNG}, Random._GLOBAL_RNG}, s::DiscreteBelief{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String})
   @ MCTS ~/.julia/packages/MCTS/CyGog/src/vanilla.jl:248
 [5] action_info(p::MCTSPlanner{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String, String, MCTS.SolvedRolloutEstimator{RandomPolicy{Random._GLOBAL_RNG, QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, NothingUpdater}, Random._GLOBAL_RNG}, Random._GLOBAL_RNG}, s::DiscreteBelief{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String})
   @ MCTS ~/.julia/packages/MCTS/CyGog/src/vanilla.jl:203
 [6] action(p::MCTSPlanner{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String, String, MCTS.SolvedRolloutEstimator{RandomPolicy{Random._GLOBAL_RNG, QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, NothingUpdater}, Random._GLOBAL_RNG}, Random._GLOBAL_RNG}, s::DiscreteBelief{QuickPOMDP{UUID("80821d3f-b8c5-4704-8a04-ff2ad858c3e7"), String, String, String, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{String, Int64}, Bool, Dict{String, Int64}, Vector{String}, Vector{String}, Float64, Vector{String}, var"#20#23", Dict{String, Int64}, var"#19#22", var"#21#24", Uniform{Set{String}}}}}, String})
   @ MCTS ~/.julia/packages/MCTS/CyGog/src/vanilla.jl:208
 [7] top-level scope
   @ In[34]:4
 [8] eval
   @ ./boot.jl:373 [inlined]
 [9] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196

P.S. I promise I'm indenting with tabs, but I'm also having trouble getting GitHub to format my code nicely in this post!

zsunberg · 2022-07-29T22:46:59Z

zsunberg
Jul 29, 2022
Maintainer

Hi @ryan-o-c, one possible issue is that MCTS is designed for MDPs, whereas the problem you are trying to solve is a POMDP.

You might be able to solve the issue by switching from MCTSSolver to POMCPSolver from BasicPOMCP.jl.
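
A minimal sketch of that swap, assuming BasicPOMCP.jl is installed alongside the packages from the original post. The values for tree_queries, c, and max_steps are illustrative rather than recommended, and simulate_tiger is a hypothetical helper name. The step cap is there because this Tiger model never reaches a terminal state, so the original `while !isterminal(m, s)` loop would not stop on its own. A POMCP planner expects a belief in `action(planner, b)`, whereas the MCTS planner in the stack trace above was treating the belief as a state.

```julia
using Random
using POMDPs
using POMDPTools: Deterministic, Uniform, SparseCat, DiscreteUpdater
using QuickPOMDPs: QuickPOMDP
using BasicPOMCP: POMCPSolver

# Tiger POMDP, same model as in the original post (written more compactly).
m = QuickPOMDP(
    states = ["left", "right"],
    actions = ["left", "right", "listen"],
    observations = ["left", "right"],
    discount = 0.95,
    transition = (s, a) -> a == "listen" ? Deterministic(s) : Uniform(["left", "right"]),
    observation = function (a, sp)
        a == "listen" || return Uniform(["left", "right"])
        other = sp == "left" ? "right" : "left"
        return SparseCat([sp, other], [0.85, 0.15])
    end,
    reward = (s, a) -> a == "listen" ? -1.0 : (s == a ? -100.0 : 10.0),
    initialstate = Uniform(["left", "right"]),
)

# POMCP plans from beliefs, so it can be paired with a belief updater.
solver = POMCPSolver(tree_queries = 1000, c = 10.0)  # illustrative settings
planner = solve(solver, m)
up = DiscreteUpdater(m)

# Hypothetical helper: run one episode and return the discounted reward.
function simulate_tiger(m, planner, up; max_steps = 10, rng = Random.default_rng())
    b = initialize_belief(up, initialstate(m))
    s = rand(rng, initialstate(m))
    r_total, d = 0.0, 1.0
    for _ in 1:max_steps            # cap the episode: Tiger has no terminal state
        isterminal(m, s) && break
        a = action(planner, b)      # plan from the belief, not from the hidden state
        s, o, r = @gen(:sp, :o, :r)(m, s, a, rng)
        r_total += d * r
        d *= discount(m)
        b = update(up, b, a, o)
    end
    return r_total
end

println(simulate_tiger(m, planner, up))
```

Wrapping the loop in a function also avoids the global-scope assignment pitfalls that can appear when the same loop is run from a script file instead of a notebook.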

zsunberg Jul 29, 2022
Maintainer

Let us know if that doesn't work.
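
On the belief question from the original post: with DiscreteUpdater, the belief is a DiscreteBelief, which is essentially a vector of probabilities, one per state, revised with Bayes' rule after each action and observation. The sketch below hand-rolls that update for the Tiger problem in plain Julia to show the arithmetic. It deliberately does not call POMDPs.jl, and the names STATES, transition_prob, observation_prob, and bayes_update are made up for this illustration.

```julia
# Hand-rolled discrete Bayes filter for the Tiger problem (illustration only).
const STATES = ["left", "right"]

# T(sp | s, a): listening leaves the tiger in place; opening a door resets it.
transition_prob(s, a, sp) = a == "listen" ? (s == sp ? 1.0 : 0.0) : 0.5

# O(o | a, sp): listening is 85% accurate; after opening a door it is uninformative.
observation_prob(a, sp, o) = a == "listen" ? (o == sp ? 0.85 : 0.15) : 0.5

# b'(sp) is proportional to O(o | a, sp) * sum over s of T(sp | s, a) * b(s)
function bayes_update(b::Vector{Float64}, a, o)
    bp = zeros(length(STATES))
    for (j, sp) in enumerate(STATES)
        predicted = sum(transition_prob(s, a, sp) * b[i] for (i, s) in enumerate(STATES))
        bp[j] = observation_prob(a, sp, o) * predicted
    end
    return bp ./ sum(bp)  # normalize so the probabilities sum to 1
end

b = [0.5, 0.5]                         # uniform initial belief over STATES
b = bayes_update(b, "listen", "left")  # hearing the tiger on the left
println(b)                             # approximately [0.85, 0.15]

b_reset = bayes_update(b, "left", "left")  # opening a door resets the problem
println(b_reset)                           # back to [0.5, 0.5]
```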