Use POMDPs to solve a simple routine economics problem. #351
-
I am not sure whether I'm understanding the problem. Maybe you can clarify:
-
@lassepe let me rewrite it w/ different notation:
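(For reference, judging from the closed form used later in this thread, the problem is the standard log-utility optimal growth model:
max_{c_t} Σ_t β^t log(c_t)  s.t.  k_{t+1} = k_t^α − c_t,  0 < c_t ≤ k_t^α,  k_0 given,
whose exact solution is V(k) = A0 + A1*log(k) with A1 = α/(1−αβ) and optimal policy c = (1−αβ)*k^α.)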
-
I guess that clarified things a little bit for me, but I still feel like this is missing a constraint. Are you maybe implicitly assuming that k must be non-negative for all time?
-
Here is how to do this w/ QuantEcon's solver:
using QuantEcon, SparseArrays;
α=.65; β=.95; f(k)=k.^α; u_log(x)= x > 0. ? log(x) : -Inf ;
n=500; grid = range(1e-6, 2.0, length = n);
C = f.(grid) .- grid' # consumption for each (current k, next k') pair: c = f(k) - k'
#
coord = repeat(collect(1:n), 1, n)
s_indices = coord[:]
a_indices = transpose(coord)[:]
L = length(a_indices)
#
R = u_log.(C[:])
Q = spzeros(L, n) # Q = zeros(L, n)
for i in 1:L
Q[i, a_indices[i]] = 1
end
ddp = DiscreteDP(R, Q, β, s_indices, a_indices)
r_pfi = solve(ddp, PFI)
r_mpfi = solve(ddp, MPFI)
r_vfi = solve(ddp, VFI)

Update: here is how to simulate:
#
k_init = 0.1
k_init_ind = findfirst(collect(grid) .≥ k_init)
k_path_ind = simulate(r_pfi.mc, 25, init=k_init_ind)
k_path = grid[k_path_ind.+1]
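As a sanity check, the numerical value function can be compared with the known closed form for this log-utility / Cobb-Douglas case (a sketch; r_pfi.v holding the value function on the grid is my reading of QuantEcon's solve result):
ω  = α*β
A1 = α/(1 - ω)
A0 = ((1-ω)*log(1-ω) + ω*log(ω)) / ((1-ω)*(1-β))
V_closed = A0 .+ A1 .* log.(grid)      # closed-form value V(k) = A0 + A1*log(k)
maximum(abs.(r_pfi.v .- V_closed))     # should be small away from the left edge of the grid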
-
Thanks for posting! It's good to get problems from different fields. It would be great to make this package easier to pick up for economists! I think something like this works:
using POMDPs
using QuickPOMDPs: QuickMDP
using POMDPModelTools: Deterministic
using DiscreteValueIteration
const n = 500
const α = 0.65
const β = 0.95
const max_k = 2.0
const states = range(1e-6, max_k, length=n)
ceil_state(k) = states[searchsortedfirst(states, k)] # there is probably a better way, but this minimized thinking
valid_actions() = range(1e-7, max_k^α, length=100) # all of the possible actions
valid_actions(k) = filter(<(k^α), valid_actions()) # all of the valid actions for k
m = QuickMDP(
states = states,
actions = valid_actions,
transition = (k, c) -> Deterministic(ceil_state(k^α - c)),
reward = (k, c) -> log(c),
discount = β
)
solver = SparseValueIterationSolver(verbose=true)
policy = solve(solver, m)
@show value(policy, first(states))
ω = α*β
A1 = α/(1.0-ω)
A0 = ((1-ω)*log(1-ω) + ω*log(ω))/((1-ω)*(1-β))
@show A0 + A1 * log(first(states))

You may be able to do better with GlobalApproximationValueIteration or LocalApproximationValueIteration, but I did not have time to try. (@Shushman this might be an interesting toy problem to try on those)
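A quick way to sanity-check the resulting policy is to roll the deterministic dynamics forward by hand with action(policy, k) (a sketch reusing the definitions above, bypassing the simulator interface):
let k = first(states)
    for t in 1:25
        c = action(policy, k)      # optimal consumption at capital k
        k = ceil_state(k^α - c)    # deterministic transition, snapped back onto the state grid
        @show t c k
    end
end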
-
@zsunberg thanks it works well!!!
-
Hi @azev77, did you figure out your questions?
-
Before I forget:
-
https://github.com/RoboticExplorationLab/TrajectoryOptimization.jl and related projects might be of interest for continuous control problems. They integrate with Ipopt etc. appropriately.
-
6: Currently POMDPs.jl does not have any continuous-time solvers :/ I think it would take some serious design work to fit that in, and I think it would probably be worth creating another package.
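In the meantime, one workaround is the usual time-discretization trick rather than a true continuous-time solver (a sketch; ρ, μ, and r stand for the continuous-time discount rate, drift, and flow reward of whatever problem is at hand):
Δt   = 0.01                           # time step
β_dt = exp(-ρ*Δt)                     # per-period discount factor
euler_step(s, a) = s + μ(s, a)*Δt     # explicit Euler step for the state
r_dt(s, a) = r(s, a)*Δt               # flow reward accumulated over one period
# then pass these to a discrete solver, e.g.
# QuickMDP(transition = (s,a) -> Deterministic(ceil_state(euler_step(s,a))), reward = r_dt, discount = β_dt, ...)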
-
@zsunberg I spent a lot of time trying to figure out the docs w/ no luck on this. Here is the discrete grid POMDP code:
using POMDPs
using QuickPOMDPs: QuickMDP
using POMDPModelTools: Deterministic
using DiscreteValueIteration
α = 0.65;
"states:"
min_s = eps(); max_s = 2.0; n_s = 200;
states = range(min_s, max_s, length=n_s)
ceil_state(s) = states[searchsortedfirst(states, s)]
"Transition:" # ṡ=μ(s,a) ⟺ a=μ_inv(s,ṡ)
μ(s,a;α=α) = (s)^α - a
μ_inv(s,sp;α=α) = (s)^α - sp
"actions:"
min_a = eps(); max_a = μ_inv(max_s,0.0); n_a = 400; # keeps a≥0
valid_actions() = range(min_a, max_a, length=n_a) # possible actions any state.
valid_actions(s) = filter(<(μ_inv(s,0)), valid_actions()) # valid actions @ state=s
"reward:"
r(s,a) = log(a)
"discount:"
β = 0.95
m = QuickMDP(
states = states,
actions = valid_actions,
transition = (s, a) -> Deterministic( ceil_state( μ(s,a) ) ),
reward = (s, a) -> r(s,a),
discount = β
)
solver = SparseValueIterationSolver()
sol = solve(solver, m)

To be clear, I have routine parsimonious code that does this, but I'd really like to use POMDPs.jl.
# 1: discrete VFI
struct MDP γ; S; A; T; R end
BE(P,V,s,a) = P.R(s,a) + P.γ*sum(P.T(s,a,sp)*V[i] for (i,sp) ∈ enumerate(P.S))
policy(s; P, V) = findmax([BE(P, V, s, a) for a in P.A(s)])[end]
function MDP_VFI(P::MDP, k_max::Int64, V)
for k = 1:k_max
Vp = [maximum(BE(P, V, s, a) for a ∈ P.A(s)) for s ∈ P.S]
V = Vp
end
return V
end
γ=0.95;
ns=40; na=40;
S = [range(1e-6, 2.0, length=ns);]
A(s;na=na) = [range(1e-6, (s).^(.67), length=na);]
fsp(s) = S[searchsortedfirst(S, s)] # Closest s' in S
T(s,a,sp) = (sp == fsp(s^(.67) - a)) ? 1.0 : 0.0
R(s,a) = (1-0.95)*log(a);
P1 = MDP(γ, S, A, T, R)
V0 = [0.0 for s in P1.S]
@time V = MDP_VFI(P1, 50, V0)
# 2: Continuous approx VFI
using Interpolations, LinearAlgebra
approx(gr, val0) = LinearInterpolation(gr, val0, extrapolation_bc = Line())
#
struct MDP γ; S; A; T; R end
RHS(P,V,s,a) = P.R(s,a) + P.γ*V(P.T(s,a))
γ=0.95; ns=100; na=170;
S = [range(1e-2, 2.0, length=ns);]
A(s) = [range(1e-6, (s).^(.67), length=na);]
T(s,a) = s^(.67) - a
R(s,a) = log(abs(a)) # (1-0.95)*log(a)
P1 = MDP(γ, S, A, T, R)
#
function MDP_VFI(P::MDP, k_max::Int64, V)
for k = 1:k_max
v̂ = approx(P.S, V)
rhs = [ [RHS(P,v̂,s,a) for a ∈ P.A(s)] for s ∈ P.S ]
rhs = hcat(rhs...)
Vp = maximum(rhs, dims=1) |> vec
V = Vp
end
return V
end
#
V0 = [log(s) for s in P1.S]
V = MDP_VFI(P1, 50, V0)
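A greedy policy can then be read off the converged value function with the same RHS helper (a small sketch):
v̂ = approx(P1.S, V)
greedy(s) = P1.A(s)[argmax([RHS(P1, v̂, s, a) for a in P1.A(s)])]   # action maximizing the Bellman RHS
greedy(P1.S[10])                                                    # optimal action at the 10th grid point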
-
I spent a bunch of time yesterday but couldn't figure out how to apply it. If we can figure this out I'd love to contribute an econ example like I did here: https://pulsipher.github.io/InfiniteOpt.jl/dev/examples/Optimal%20Control/consumption_savings/
In general I will try to write the doc example using generic optimal control notation (S: state grid, etc.) & then apply a variety of solvers, both discrete grid & continuous grid; a rough sketch of the idea is below. I strongly believe an example w/ this kind of notation would make it much easier for new users to jump into POMDPs.jl.
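As a rough illustration of that idea (a sketch; make_mdp and snap are placeholder names, not part of POMDPs.jl):
# collect the generic pieces (S, A, μ, r, β) and build a QuickMDP from them in one place
using QuickPOMDPs: QuickMDP
using POMDPModelTools: Deterministic
make_mdp(S, A, μ, r, β; snap = identity) = QuickMDP(
    states     = S,                                      # state grid
    actions    = A,                                      # A() = all actions, A(s) = feasible actions at s
    transition = (s, a) -> Deterministic(snap(μ(s, a))), # deterministic law of motion, snapped to the grid
    reward     = r,
    discount   = β,
)
# e.g. for the growth model: m = make_mdp(states, valid_actions, μ, r, β; snap = ceil_state)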
-
Hey guys @zsunberg @Shushman. I think I got it to work on my MWE. I tried to further simplify my notation to follow the standard MDP notation:
# Parameters for the Neoclassical Growth Model (NGM)
α = 0.65;
f(s;α=α) = (s)^α
a0(s) = f(s)
min_s = eps(); max_s = 2.0; n_s = 100;
min_a = eps(); max_a = a0(max_s); n_a = 100;
"states:"
states = range(min_s, max_s, length=n_s)
ceil_state(s) = states[searchsortedfirst(states, s)] # for discrete VFI
"actions:"
valid_actions() = range(min_a, max_a, length=n_a) # possible actions any state.
valid_actions(s) = filter(<(a0(s)), valid_actions()) # valid actions @ state=s
"Transition:"
μ(s,a) = f(s) - a
"reward:"
r(s,a) = log(a)
"discount:"
β = 0.95
using QuickPOMDPs: QuickMDP #QuickMDP()
using POMDPModelTools: Deterministic
m = QuickMDP(
states = states,
actions = valid_actions,
transition = (s, a) -> Deterministic( ceil_state( μ(s,a) ) ),
reward = (s, a) -> r(s,a),
discount = β
)
# DiscreteValueIteration: Both work fast enough
using DiscreteValueIteration
s1 = DiscreteValueIteration.SparseValueIterationSolver()
s2 = DiscreteValueIteration.ValueIterationSolver()
@time sol1 = solve(s1, m)
@time sol2 = solve(s2, m)
value(sol1, states[2]), action(sol1, states[2])
value(sol2, states[2]), action(sol2, states[2])
#rewrite MDP w/o ceil_state() in transition.
m = QuickMDP(
states = states,
actions = valid_actions,
transition = (s, a) -> Deterministic( μ(s,a) ),
reward = (s, a) -> r(s,a),
discount = β
)
# LocalApproximationValueIteration works, but currently slow!
using GridInterpolations
using LocalFunctionApproximation
using LocalApproximationValueIteration
grid = GridInterpolations.RectangleGrid(states, valid_actions(), ) #[0.0, 1.0])
interp = LocalFunctionApproximation.LocalGIFunctionApproximator(grid)
s4 = LocalApproximationValueIterationSolver(interp)
@time sol4 = solve(s4, m)
#190.477798 seconds (2.37 G allocations: 82.610 GiB, 6.87% gc time, 0.21% compilation time)
value(sol4, states[2]), action(sol4, states[2])

Now let's compare solutions from various MDP solvers w/ closed forms:
using Plots
ω = α*β
A1 = α/(1.0-ω)
A0 = ((1-ω)*log(1-ω) + ω*log(ω))/((1-ω)*(1-β))
# Value
plot(legend=:bottomright, title="Value Functions");
plot!(states[2:end], i->A0 + A1 * log(i), lab="closed form") # way off
plot!(states[2:end], i->value(sol1, i), lab="sol1")
plot!(states[2:end], i->value(sol2, i), lab="sol2")
plot!(states[2:end], i->value(sol4, i), lab="sol4")
# Policy
plot(legend=:bottomright, title="Policy Functions");
plot!(states[2:end], i -> (1-ω)*(i^α), lab="closed form") # way off
plot!(states[2:end], i -> action(sol1, i), lab="sol1")
plot!(states[2:end], i -> action(sol2, i), lab="sol2")
plot!(states[2:end], i -> action(sol4, i), lab="sol4")
# Simulation
Tsim=150; s0=states[2];
simcf = []; push!(simcf, s0);
for tt in 1:Tsim
tt==1 ? s = s0 : nothing
a = (1-ω)*(s^α)
sp = μ(s,a)
#sp = ceil_state(sp)
push!(simcf, sp)
s = sp
end
sim1 = []; push!(sim1, s0);
for tt in 1:Tsim
tt==1 ? s = s0 : nothing
#a = valid_actions()[sol1.policy[searchsortedfirst(states, s)]]
a = action(sol1, s)
sp = μ(s,a)
sp = ceil_state(sp)
push!(sim1, sp)
s = sp
end
sim4 = []; push!(sim4, s0);
for tt in 1:Tsim
tt==1 ? s = s0 : nothing
a = action(sol4, s)
sp = μ(s,a)
push!(sim4, sp)
s = sp
end
plot(legend=:bottomright, title="Simulation");
plot!(simcf, lab="closed form")
plot!(sim1, lab="sol1")
plot!(sim4, lab="sol4")

Looks about right. Btw, I think I was able to solve using MCTS:
using MCTS
s3 = MCTS.MCTSSolver()
@time sol3 = solve(s3, m)
julia> value(sol3, states[4])
ERROR: State 0.06060606060606082 not present in MCTS tree.
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:33
[2] value(tr::MCTS.MCTSTree{Float64, Float64}, s::Float64)
@ MCTS C:\Users\azevelev\.julia\packages\MCTS\ww2qH\src\vanilla.jl:217
[3] value(planner::MCTSPlanner{QuickMDP{UUID("08e2dc8f-88bd-4faf-8290-5a9f3830c278"), Float64, Float64, NamedTuple{(:stateindex, :isterminal, :actionindex, :transition, :reward, :states, :actions, :discount), Tuple{Dict{Float64, Int64}, Bool, Dict{Float64, Int64}, var"#73#75", var"#74#76", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}, typeof(valid_actions), Float64}}}, Float64, Float64, MCTS.SolvedRolloutEstimator{POMDPPolicies.RandomPolicy{Random._GLOBAL_RNG, QuickMDP{UUID("08e2dc8f-88bd-4faf-8290-5a9f3830c278"), Float64, Float64, NamedTuple{(:stateindex, :isterminal, :actionindex, :transition, :reward, :states, :actions, :discount), Tuple{Dict{Float64, Int64}, Bool, Dict{Float64, Int64}, var"#73#75", var"#74#76", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}, typeof(valid_actions), Float64}}}, BeliefUpdaters.NothingUpdater}, Random._GLOBAL_RNG}, Random._GLOBAL_RNG}, s::Float64)
@ MCTS C:\Users\azevelev\.julia\packages\MCTS\ww2qH\src\vanilla.jl:211
[4] top-level scope
@ REPL[312]:1

Finally, I wasn't able to install GlobalApproximationValueIteration:
julia>
(@v1.6) pkg> add GlobalApproximationValueIteration
Updating registry at `C:\Users\azevelev\.julia\registries\General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
ERROR: Unsatisfiable requirements detected for package CUDAapi [3895d2a7]:
CUDAapi [3895d2a7] log:
├─possible versions are: 0.5.0-4.0.0 or uninstalled
├─restricted by julia compatibility requirements to versions: uninstalled
└─restricted by compatibility requirements with Flux [587475ba] to versions: 1.1.0-1.2.0 — no versions left
└─Flux [587475ba] log:
├─possible versions are: 0.4.1-0.12.6 or uninstalled
├─restricted to versions * by an explicit requirement, leaving only versions 0.4.1-0.12.6
└─restricted by compatibility requirements with GlobalApproximationValueIteration [277f244e] to versions: 0.9.0
└─GlobalApproximationValueIteration [277f244e] log:
├─possible versions are: 0.2.2 or uninstalled
└─restricted to versions * by an explicit requirement, leaving only versions 0.2.2

To dos:
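On the MCTS error above: MCTS is an online planner, so solve only constructs the planner, and the search tree is built when planning from a particular state. If I understand MCTS.jl correctly, querying the value should work after planning from that state, e.g.:
action(sol3, states[4])   # builds the search tree rooted at states[4]
value(sol3, states[4])    # the state is now in the tree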
-
I was actually trying to do the same. We solve a lot of very similar problems in economics, and having a way of mapping the problem to POMDPs.jl would be great.
It works beautifully, and implementing continuous states is very easy.
What I've been struggling with is the stochastic case, in which the problem is the same thing but with an AR(1) process z multiplying the production function in the constraint. In this case, the actions are deterministic and z is an exogenous stochastic variable. I am struggling to define the transition function. Ideally, I wanted to write something like a transition where I approximated the AR(1) by a Markov matrix.
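One way to write that transition, consistent with the SparseCat approach that ends up working later in the thread (a sketch; zgrid and Pz are placeholder names for the discretized AR(1) support and its Markov matrix):
using QuickPOMDPs: QuickMDP
using POMDPModelTools: SparseCat
# zgrid, Pz: support and transition matrix from discretizing the AR(1) (e.g. Tauchen/Rouwenhorst)
states2 = [(k, z) for k in states for z in zgrid]        # the state is now a (capital, shock) tuple
acts() = valid_actions()                                 # all possible consumption levels
acts((k, z)) = filter(<(z*k^α), valid_actions())         # feasible consumption at (k, z)
function trans((k, z), c)
    iz = findfirst(==(z), zgrid)
    kp = ceil_state(z*k^α - c)                           # next capital, snapped to the grid
    SparseCat([(kp, zp) for zp in zgrid], Pz[iz, :])     # z' drawn from row iz of the Markov matrix
end
m2 = QuickMDP(states = states2, actions = acts, transition = trans,
              reward = (s, c) -> log(c), discount = β)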
-
I'll give it a try this week; it seems that the discrete solvers at least do not yet handle multivariate distributions. I tried a case in which the transition function would return a multivariate Normal of 2 random variables, and it would crash because of the pdf function. I haven't yet checked the ImplicitDistribution from POMDPModelTools, but it might be what I've been missing!
What I am really interested in is to use the deep learning solvers for a bigger model, but first I am testing the basic cases to make sure I understand what is going on.
Thanks!
On Sat, Nov 20, 2021 at 6:50 PM Zachary Sunberg wrote:
> Depending on what solver you're using, you might just be able to use ImplicitDistribution. That will work, for example, if you're using LocalApproximationValueIterationSolver.
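For reference, the ImplicitDistribution route would look roughly like this (a sketch based on my reading of POMDPModelTools; ρz and σz are placeholder AR(1) parameters, and since z' is then continuous this only fits the approximation-based solvers):
using POMDPModelTools: ImplicitDistribution
function trans_implicit((k, z), c)
    ImplicitDistribution() do rng
        zp = ρz*z + σz*randn(rng)       # draw the next AR(1) shock
        (ceil_state(z*k^α - c), zp)     # next (capital, shock) pair
    end
end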
-
It turns out that the POMDPModelTools SparseCat function is precisely what I needed. Thanks!
-
Hi all, I'm solving a simple investment problem but not getting the correct solution.
a_0(s;δ=δ) = δ*s
min_s = eps(); max_s = 2.0*K_SS; n_s = 200;
min_a = eps(); max_a = a_0(max_s); n_a = 400; # keeps a≥0
"states:"
states = range(min_s, max_s, length=n_s)
ceil_state(s) = states[searchsortedfirst(states, s)]
"actions:"
valid_actions() = range(min_a, max_a, length=n_a) # possible actions any state.
valid_actions(s) = filter(<(a_0(s)), valid_actions()) # valid actions @ state=s
"Transition:"
μ(s,a;δ=δ) = a - δ*s
"reward:"
r(s,a;z=z) = z*s -a -0.5*(a^2)
"discount:"
β = 1/(1+ρ);
using QuickPOMDPs: QuickMDP #QuickMDP()
using POMDPModelTools: Deterministic
m = QuickMDP(
states = states,
actions = valid_actions,
transition = (s, a) -> Deterministic( ceil_state( μ(s,a) ) ),
reward = (s, a) -> r(s,a),
discount = β
)
# DiscreteValueIteration: Both work fast!
using DiscreteValueIteration
s1 = DiscreteValueIteration.SparseValueIterationSolver()
s2 = DiscreteValueIteration.ValueIterationSolver()
@time sol1 = solve(s1, m) #
@time sol2 = solve(s2, m) #
# value(sol1, states[2]), action(sol1, states[2])
# value(sol2, states[2]), action(sol2, states[2])
value(sol1, states[end])
using Plots
# Value
plot(legend=:bottomright, title="Value Functions");
#plot!(states[2:end], i->A0 + A1 * log(i), lab="closed form")
plot!(states[2:end], i->value(sol1, i), lab="sol1")
plot!(states[2:end], i->value(sol2, i), lab="sol2")
# plot!(states[2:end], i->value(sol4, i), lab="sol4")
# plot!(states[2:end], i->value(sol3, i), lab="sol3")
# Simulation
Tsim=150; s0=states[170]; # s0=0.5*K_SS; #
sim1 = []; push!(sim1, s0);
for tt in 1:Tsim
println(tt)
tt==1 ? s = s0 : nothing
#a = valid_actions()[sol1.policy[searchsortedfirst(states, s)]]
a = action(sol1, s)
sp = μ(s,a)
sp = ceil_state(sp)
push!(sim1, sp)
s = sp
println(s)
end

I think I need to fiddle w/ the min_a, min_s but not sure how.
-
Sorry, here's the correct code:
δ=0.10; ρ=0.15; # Param
z= ρ + δ +.02 # Need: z > ρ + δ
I_SS = (z-ρ-δ)/(ρ+δ)
K_SS = I_SS/δ
if 1==1
a_0(s;δ=δ) = δ*s
min_s = eps(); max_s = 2.0*K_SS; n_s = 200;
min_a = eps(); max_a = a_0(max_s); n_a = 400; # keeps a≥0
"states:"
states = range(min_s, max_s, length=n_s)
ceil_state(s) = states[searchsortedfirst(states, s)]
"actions:"
valid_actions() = range(min_a, max_a, length=n_a) # possible actions any state.
valid_actions(s) = filter(<(a_0(s)), valid_actions()) # valid actions @ state=s
"Transition:"
μ(s,a;δ=δ) = a - δ*s
"reward:"
r(s,a;z=z) = z*s -a -0.5*(a^2)
"discount:"
β = 1/(1+ρ);
using QuickPOMDPs: QuickMDP #QuickMDP()
using POMDPModelTools: Deterministic
m = QuickMDP(
states = states,
actions = valid_actions,
transition = (s, a) -> Deterministic( ceil_state( μ(s,a) ) ),
reward = (s, a) -> r(s,a),
discount = β
)
# DiscreteValueIteration: Both work fast!
using DiscreteValueIteration
s1 = DiscreteValueIteration.SparseValueIterationSolver()
s2 = DiscreteValueIteration.ValueIterationSolver()
@time sol1 = solve(s1, m) #
@time sol2 = solve(s2, m) #
# value(sol1, states[2]), action(sol1, states[2])
# value(sol2, states[2]), action(sol2, states[2])
value(sol1, states[end])
using Plots
# Value
plot(legend=:bottomright, title="Value Functions");
#plot!(states[2:end], i->A0 + A1 * log(i), lab="closed form")
plot!(states[2:end], i->value(sol1, i), lab="sol1")
plot!(states[2:end], i->value(sol2, i), lab="sol2")
plot!(states[2:end], i->value(sol4, i), lab="sol4")
plot!(states[2:end], i->value(sol3, i), lab="sol3")
# Simulation
Tsim=150; s0=states[2];
sim1 = []; push!(sim1, s0);
for tt in 1:Tsim
tt==1 ? s = s0 : nothing
#a = valid_actions()[sol1.policy[searchsortedfirst(states, s)]]
a = action(sol1, s)
sp = μ(s,a)
sp = ceil_state(sp)
push!(sim1, sp)
s = sp
end
sim4 = []; push!(sim4, s0);
for tt in 1:Tsim
tt==1 ? s = s0 : nothing
a = action(sol4, s)
sp = μ(s,a)
push!(sim4, sp)
s = sp
end
plot(legend=:bottomright, title="Simulation");
#plot!(simcf, lab="closed form")
plot!(sim1, lab="sol1")
plot!(sim4, lab="sol4")
end

The discourse discussion & closed form solution is here.
-
Your problem is in the searchsortedfirst call, which can go out of bounds; check the Julia docs for details. One possible fix is sketched below.
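For example (a one-line patch, untested against the full model):
ceil_state(s) = states[min(searchsortedfirst(states, s), length(states))]   # never index past the last grid point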
On Thu, Dec 23, 2021, 6:20 PM azev77 wrote:
@jgslazzaro you're correct. But I still get errors. I think it might be due to the grid bounds (min_s...).
δ=0.10; ρ=0.15; # Param
z= ρ + δ +.02 # Need: z > ρ + δ
I_SS = (z-ρ-δ)/(ρ+δ)
K_SS = I_SS/δ
min_s = eps(); max_s = 2.0*K_SS; n_s = 200;
min_a = eps(); max_a = K_SS; n_a = 400;
states = range(min_s, max_s, length=n_s)
ceil_state(s) = states[searchsortedfirst(states, s)]
valid_actions() = range(min_a, max_a, length=n_a)
valid_actions(s) = filter(>(-(1-δ)*s), valid_actions())
μ(s,a;δ=δ) = a +(1-δ)*s
r(s,a;z=z) = z*s -a -0.5*(a^2)
β = 1/(1+ρ);
using QuickPOMDPs: QuickMDP #QuickMDP()
using POMDPModelTools: Deterministic
m = QuickMDP(
states = states,
actions = valid_actions,
transition = (s, a) -> Deterministic( ceil_state( μ(s,a) ) ),
reward = (s, a) -> r(s,a),
discount = β
)
using DiscreteValueIteration
s1 = DiscreteValueIteration.SparseValueIterationSolver()
s2 = DiscreteValueIteration.ValueIterationSolver()
@time sol1 = solve(s1, m) #
@time sol2 = solve(s2, m) #
solve() gives the following error:
ERROR: LoadError: BoundsError: attempt to access 200-element StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}} at index [201]
Stacktrace:
[1] throw_boundserror(A::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}, I::Tuple{Int64})
@ Base .\abstractarray.jl:651
[2] checkbounds
@ .\abstractarray.jl:616 [inlined]
[3] getindex(r::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}, i::Int64)
@ Base .\range.jl:718
[4] ceil_state(s::Float64)
@ Main c:\Users\azevelev\Dropbox\Computation\Julia\4prob\3a_Firm_Invest\Invest\main_Invest.jl:178
[5] (::var"#36#38")(s::Float64, a::Float64)
@ Main c:\Users\azevelev\Dropbox\Computation\Julia\4prob\3a_Firm_Invest\Invest\main_Invest.jl:195
[6] transition(m::QuickMDP{UUID("c4169b2e-f906-4ac2-9a72-ab9bbdebde92"), Float64, Float64, NamedTuple{(:stateindex, :isterminal, :actionindex, :transition, :reward, :states, :actions, :discount), Tuple{Dict{Float64, Int64}, Bool, Dict{Float64, Int64}, var"#36#38", var"#37#39", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}, typeof(valid_actions), Float64}}}, s::Float64, a::Float64)
@ QuickPOMDPs C:\Users\azevelev\.julia\packages\QuickPOMDPs\mIT3P\src\quick.jl:217
[7] transition_matrix_a_s_sp(mdp::QuickMDP{UUID("c4169b2e-f906-4ac2-9a72-ab9bbdebde92"), Float64, Float64, NamedTuple{(:stateindex, :isterminal, :actionindex, :transition, :reward, :states, :actions, :discount), Tuple{Dict{Float64, Int64}, Bool, Dict{Float64, Int64}, var"#36#38", var"#37#39", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}, typeof(valid_actions), Float64}}})
@ POMDPModelTools C:\Users\azevelev\.julia\packages\POMDPModelTools\SycBB\src\sparse_tabular.jl:179
[8] POMDPModelTools.SparseTabularMDP(mdp::QuickMDP{UUID("c4169b2e-f906-4ac2-9a72-ab9bbdebde92"), Float64, Float64, NamedTuple{(:stateindex, :isterminal, :actionindex, :transition, :reward, :states, :actions, :discount), Tuple{Dict{Float64, Int64}, Bool, Dict{Float64, Int64}, var"#36#38", var"#37#39", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}, typeof(valid_actions), Float64}}})
@ POMDPModelTools C:\Users\azevelev\.julia\packages\POMDPModelTools\SycBB\src\sparse_tabular.jl:30
[9] solve(solver::SparseValueIterationSolver, mdp::QuickMDP{UUID("c4169b2e-f906-4ac2-9a72-ab9bbdebde92"), Float64, Float64, NamedTuple{(:stateindex, :isterminal, :actionindex, :transition, :reward, :states, :actions, :discount), Tuple{Dict{Float64, Int64}, Bool, Dict{Float64, Int64}, var"#36#38", var"#37#39", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}, typeof(valid_actions), Float64}}})
@ DiscreteValueIteration C:\Users\azevelev\.julia\packages\DiscreteValueIteration\FjeJj\src\sparse.jl:91
[10] top-level scope
@ Untitled-1:36
in expression starting at Untitled-1:36
julia>