Particles as Distribution parameters #22
Hey and thanks!
How is this sampling expected to behave? Sample one distribution of the 1000 distributions represented by the particles? I can see three ways of representing these distributions:
Is it obvious that
Thanks!
That's how I would expect it to behave, yes.
Not at all, `Particles{Distribution}` could be better. I think
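For concreteness, here is a minimal sketch of the sampling behaviour being discussed (the parameter particles `μ` and `σ` below are illustrative, not taken from the thread): pick one of the 1000 parameter samples, then draw from the corresponding distribution.

```julia
using MonteCarloMeasurements, Distributions

μ = Particles(1000, Normal(0, 1))   # 1000 samples of the mean
σ = Particles(1000, Beta(2, 3))     # 1000 samples of the (positive) standard deviation

i = rand(1:1000)                                   # choose one of the 1000 represented Normals
x = rand(Normal(μ.particles[i], σ.particles[i]))   # draw a single number from it
```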
@cscherrer just a question. I was looking at your example code and tried it out and I ran into a few issues. Here was my sequence:

```julia
using MonteCarloMeasurements, Distributions
μ = Particles(1000,Normal(0,1))
σ = Particles(1000,Normal(0,1))^2
Normal(μ,σ)
```

where I get the error:

```
ERROR: Comparison operators are not well defined for uncertain values and are currently turned off. Call `unsafe_comparisons(true)` to enable comparison operators for particles using the current reduction function Statistics.mean. Change this function using `set_comparison_function(f)`.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] _comparioson_operator at /Users/brandongomes/.julia/packages/MonteCarloMeasurements/nd0Nh/src/particles.jl:344 [inlined]
 [3] <=(::Particles{Float64,1000}, ::Particles{Float64,1000}) at /Users/brandongomes/.julia/packages/MonteCarloMeasurements/nd0Nh/src/particles.jl:360
 [4] >=(::Particles{Float64,1000}, ::Particles{Float64,1000}) at ./operators.jl:333
 [5] macro expansion at /Users/brandongomes/.julia/packages/Distributions/Iltex/src/utils.jl:5 [inlined]
 [6] Normal{Particles{Float64,1000}}(::Particles{Float64,1000}, ::Particles{Float64,1000}) at /Users/brandongomes/.julia/packages/Distributions/Iltex/src/univariate/continuous/normal.jl:35
 [7] Normal(::Particles{Float64,1000}, ::Particles{Float64,1000}) at /Users/brandongomes/.julia/packages/Distributions/Iltex/src/univariate/continuous/normal.jl:41
 [8] top-level scope at none:0
```

So now I continue as follows:

```julia
julia> unsafe_comparisons(true)
[ Info: Unsafe comparisons using the function `Statistics.mean` has been enabled globally. Use `@unsafe` to enable in a local expression only or `unsafe_comparisons(false)` to turn off unsafe comparisons

julia> Normal(μ,σ)
Normal{Particles{Float64,1000}}(
μ: 0.000425 ± 1.0
σ: 0.999 ± 1.4
)
```

So that works! Then I try the Bernoulli:

```julia
julia> Bernoulli(Particles(1000,Beta(2,3)))
Bernoulli{Particles{Float64,1000}}(
p: 0.4 ± 0.2
)
```

and it works just fine. If I turn
@baggepinnen understands this much better than I do, but I think the issue is how to compute comparisons such as the following:

```julia
julia> unsafe_comparisons(true)
[ Info: Unsafe comparisons using the function `Statistics.mean` has been enabled globally. Use `@unsafe` to enable in a local expression only or `unsafe_comparisons(false)` to turn off unsafe comparisons

julia> Particles()<Particles()
false

julia> unsafe_comparisons(false)

julia> for f in [<=, >=, <, >]
           register_primitive(f)
       end

julia> Particles()<Particles()
Part500(0.494 ± 0.5)
```
Oh, I see,
Welcome @bhgomes. Your error message is different (hopefully improved) since you use a newer version of the package. @cscherrer is correct in asserting that the error is thrown when particles appear in comparisons, e.g., in the argument checks run by the `Normal` constructor.
It is certainly reasonable to have
The problem with
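A rough sketch of where the comparison sneaks in (a simplification of the argument check Distributions.jl performs when constructing a `Normal`, not its literal code):

```julia
using MonteCarloMeasurements, Distributions

σ = Particles(1000, Normal(0, 1))^2   # particle-valued standard deviation
σ >= zero(σ)                          # the σ ≥ 0 check becomes a comparison between Particles
                                      # and errors unless unsafe comparisons are enabled
```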
I'm struggling with coming up with a good solution to this. I think the following does what it's supposed to, but is horribly inefficient:

```julia
import MonteCarloMeasurements.particletype

struct ParticleDistribution{D}
    d::D
end

function ParticleDistribution(d::Type{<:Distribution}, p...)
    @unsafe ParticleDistribution(d(p...))
end

particletype(pd::ParticleDistribution) = particletype(getfield(pd.d, 1))

function Base.rand(d::ParticleDistribution{D}) where D
    T, N = particletype(d)
    i = rand(1:N)
    d2 = MonteCarloMeasurements.replace_particles(d.d, P->P isa AbstractParticles, P->P[i])
    rand(d2)
end

pd = ParticleDistribution(Bernoulli, Particles(1000,Beta(2,3)))
@btime rand($pd)            # 822 ns
@btime rand(Bernoulli(0.3)) # 4 ns
```
It seems that relative to
It doesn't really do 1000x the work; it just draws one random number, selects that distribution, and then draws a random number from that, so two random numbers in total. The inefficiency comes from
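One way to convince oneself that this cheap two-step draw still has the intended meaning is to check the spread of many such draws against the law of total variance (a sketch with illustrative values: σ fixed at 2 and μ uncertain with unit variance, so the draws should have variance ≈ 4 + 1):

```julia
using MonteCarloMeasurements, Distributions, Statistics

μ = Particles(1000, Normal(0, 1))   # uncertain mean, variance ≈ 1
σ = 2.0                             # fixed standard deviation

draws = [rand(Normal(μ.particles[rand(1:1000)], σ)) for _ in 1:100_000]
var(draws)                          # ≈ σ^2 + var(μ.particles) ≈ 5
```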
Are the
Or if generated functions would help, I've had good luck with GeneralizedGenerated.jl, which gives a lot more flexibility than built-in `@generated` functions.
Some progress:

```julia
using MonteCarloMeasurements, Distributions
import MonteCarloMeasurements.particletypetuple

struct ParticleDistribution{D,P}
    d::D
    constructor::P
end

function ParticleDistribution(constructor::Type{<:Distribution}, p...)
    @unsafe ParticleDistribution(constructor(p...), constructor)
end

particletypetuple(pd::ParticleDistribution) = particletypetuple(getfield(pd.d, 1))
particletypetuple(::Type{D}) where D <: ParticleDistribution = particletypetuple(getfield(pd.d, 1))

@generated function Base.rand(d::ParticleDistribution{D}) where D
    nfields = fieldcount(D)
    indtuple = ()
    N = 0
    for field in fieldnames(D)
        FT = fieldtype(D, field)
        if FT <: AbstractParticles
            N = particletypetuple(FT)[2]
            indtuple = (indtuple..., :(d.d.$field[i]))
        else
            indtuple = (indtuple..., :(d.d.$field))
        end
    end
    tupleex = Expr(:tuple, indtuple...)
    quote
        i = rand(1:$N)
        d2 = d.constructor($(tupleex)...)
        # d2 = newstruct(d.constructor{typeof.($(tupleex))...}, $(tupleex)...) # This can be used to bypass a missing inner constructor, is a bit slower though
        rand(d2)
    end
end

pd = ParticleDistribution(Bernoulli, Particles(1000,Beta(2,3)))
@btime rand($pd)            # 194.651 ns (2 allocations: 32 bytes)
@btime rand(Bernoulli(0.3)) # 10.050 ns (0 allocations: 0 bytes)

pd = ParticleDistribution(Normal, Particles(1000,Normal(10,3)), Particles(1000,Normal(2,0.1)))
@btime rand($pd)            # 149.837 ns (4 allocations: 80 bytes)
@btime rand(Normal(10,2))   # 12.788 ns (0 allocations: 0 bytes)
```

Some assumptions made:
I have been thinking more about this and might have changed my mind;
Where the reverse representation, i.e.,
Is there any situation in which computation is made easier by the representation as
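For the record, the two representations under discussion look roughly like this (a sketch to fix notation; neither line is claimed to be supported as written, and the parameter particles are illustrative):

```julia
using MonteCarloMeasurements, Distributions

μ = Particles(1000, Normal(0, 1))
σ = Particles(1000, Beta(2, 3))

# "distribution of particles": one Normal with Particles parameters (struct of arrays)
d_sofa = @unsafe Normal(μ, σ)
# "particles of distributions": 1000 plain Normals, one per parameter sample (array of structs)
d_aofs = [Normal(μ.particles[i], σ.particles[i]) for i in 1:1000]
```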
Note to self: It's quite possible that StructArrays.jl can significantly simplify how we deal with the
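A small sketch of what that could look like (an assumption about how StructArrays.jl might be used here, not something either package does today): a `StructArray` of plain distributions exposes the array-of-structs interface while storing the parameters column-wise.

```julia
using StructArrays, Distributions

μ = randn(1000)
σ = abs.(randn(1000)) .+ 0.1
sa = StructArray(Normal.(μ, σ))   # behaves like a Vector{Normal}...

sa[1]   # a plain Normal, handy for per-particle rand/logpdf
sa.μ    # ...but the parameters are stored as columns, handy for Particles-style arithmetic
```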
This is starting to become useful now. With the code below, particle distributions are represented as a list of regular distributions, but are constructed and printed as if they were
This gets around the problem with distributions performing arg checks inside their constructors, since the constructor is called with normal scalars.
@cscherrer How would you envision calculating the `logpdf`? Is

```julia
logpdf(pd,x) = mean(logpdf(d,x) for d in pd.d)
pdf(pd,x) = mean(pdf(d,x) for d in pd.d)
```

correct?

```julia
using MonteCarloMeasurements, Distributions
import MonteCarloMeasurements: nparticles, indexof_particles

struct ParticleDistribution{D,P}
    d::D
    constructor::P
end

function ParticleDistribution(constructor::Type{<:Distribution}, p...)
    dists = [constructor(getindex.(p, i)...) for i in 1:nparticles(p[1])]
    ParticleDistribution(dists, constructor)
end

function Base.rand(d::ParticleDistribution{D}) where D
    ind = rand(1:length(d.d))
    rand(d.d[ind])
end

function Base.show(io::IO, d::ParticleDistribution)
    T = eltype(d.d)
    fields = map(fieldnames(T)) do fn
        getfield.(d.d, fn)
    end
    println(io, "Particle", T, "(")
    for (i,fn) in enumerate(fieldnames(T))
        println(io, string(fn), ": ", Particles(fields[i]))
    end
    println(io, ")")
end

pd = ParticleDistribution(Bernoulli, Particles(1000,Beta(2,3)))
@btime rand($pd)            # 23.304 ns (0 allocations: 0 bytes)
@btime rand(Bernoulli(0.3)) # 10.050 ns (0 allocations: 0 bytes)

pd = ParticleDistribution(Normal, Particles(1000,Normal(10,3)), Particles(1000,Normal(2,0.1)))
@btime rand($pd)            # 27.726 ns (0 allocations: 0 bytes)
@btime rand(Normal(10,2))   # 12.788 ns (0 allocations: 0 bytes)
```

```julia
julia> pd
ParticleNormal{Float64}(
μ: 10.0 ± 3.0
σ: 2.0 ± 0.1
)
```
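On the logpdf question, the two candidate semantics for this list-of-distributions representation would look something like the following (a sketch, not part of the code above; `pd.d` is the vector of plain distributions, and the helper names are only illustrative):

```julia
using MonteCarloMeasurements, Distributions, Statistics

# collapse over the particles: a single number, the average log-density at x
logpdf_mean(pd, x) = mean(logpdf(d, x) for d in pd.d)

# keep the particle dimension: the log-density at x as an uncertain number
logpdf_particles(pd, x) = Particles([logpdf(d, x) for d in pd.d])
```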
This looks great! I wonder, how far can "particle semantics" be pushed? For the last
Pretend

```julia
rand(pd) = Particles(rand.(pd.particles))
logpdf(pd, x) = Particles(logpdf.(pd.particles, x))
```

There may be a more efficient implementation, but would this work? At what point do the semantics break down?
I'm sure it can be made to work, but I'm struggling to understand what it would represent.
I am very open to being convinced of its utility; it feels like hitherto untrodden land :)
In a typical MCM use case, say you have some function
Now what if

```julia
function f(x)
    return rand(Normal(x,1))
end
```

This fits into the same framework; the only difference is that now there's uncertainty other than from the particles themselves.
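A sketch of the intended result when particles are pushed through such a function (the element-wise `rand` is written out by hand here, since giving it this meaning is exactly the assumption under discussion): each particle gets its own draw, so the output spread combines the particle uncertainty with the sampling noise.

```julia
using MonteCarloMeasurements, Distributions, Statistics

x = Particles(1000, Normal(0, 1))                           # uncertain input, std ≈ 1
y = Particles([rand(Normal(xi, 1)) for xi in x.particles])  # one draw per particle

std(y)   # ≈ sqrt(1^2 + 1^2) ≈ 1.4
```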
This comes up all the time in probabilistic programming. We have uncertainty on our parameter estimates (because they're samples from the posterior distribution), which we propagate through the observation model for forecasting or posterior predictive checks.
I buy that! This branch now follows the semantics you outlined, i.e.

```julia
rand(pd) = Particles(rand.(pd.particles))
logpdf(pd, x) = Particles(logpdf.(pd.particles, x))
```

It appears to be reasonably efficient to use the naive (but type stable) way of drawing random numbers:

```julia
julia> pd = ParticleDistribution(Normal, 1±0.1, 1±0.1)
ParticleNormal{Float64}(
μ: 1.0 ± 0.1
σ: 1.0 ± 0.1
)

julia> rand(pd)
Part10000(1.012 ± 1.01)

julia> @btime rand($pd)
116.768 μs (3 allocations: 78.22 KiB)
Part10000(0.996 ± 0.997)

julia> @btime 1 .+ 0.1 .* randn(10000);
89.188 μs (4 allocations: 156.41 KiB)
```
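Presumably the "naive (but type stable) way" is just broadcasting `rand` over the stored distributions, something along these lines (a guess at what the branch does, for readers of the thread):

```julia
using MonteCarloMeasurements

# assumes `pd.d` is the vector of plain distributions from the ParticleDistribution above
naive_rand(pd) = Particles([rand(d) for d in pd.d])
```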
For some cases there are some nice optimizations, e.g. `rand(ParticleDistribution(Normal, m, s)) == m + s * Particles(Normal())` (assuming I got that right)
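A sketch of how that shortcut could be written for the list-of-Normals representation (the function name and field access are assumptions based on the code earlier in the thread; `Normal` is location-scale, so a standard-normal particle cloud can simply be shifted and scaled):

```julia
using MonteCarloMeasurements, Distributions

# assumes pd.d is a Vector{<:Normal} as in the ParticleDistribution above
function rand_locscale(pd)
    m = Particles(getfield.(pd.d, :μ))          # parameter particles recovered from the structs
    s = Particles(getfield.(pd.d, :σ))
    m + s * Particles(length(pd.d), Normal())   # one standard-normal draw per particle
end
```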
With the current representation we have an array of structs. `m + s * Particles(Normal())` implies the struct-of-arrays representation. One could also store both representations inside a
The naive sampling outlined in my previous post is actually nearly just as fast as the struct-of-arrays version; Julia must do a very good job optimizing the broadcasting of `rand`:

```julia
julia> @btime rand($pd);
119.941 μs (3 allocations: 78.22 KiB)

julia> @btime $m + $s*Particles{Float64, 10000}(randn(10000));
101.810 μs (9 allocations: 234.66 KiB)
```
That's really close! I'd guess the latter might be a little more stable, since it takes advantage of stratified sampling. But I'm not sure about that. You mentioned https://github.com/JuliaArrays/StructArrays.jl, did that seem to fit here?
It might be a bit more stable thanks to the systematic sampling, but that is significantly slower:

```julia
julia> @btime $m + $s*Particles{Float64, 10000}(systematic_sample(10000, Normal()));
535.076 μs (13 allocations: 391.06 KiB)
```

mostly due to the cost of `quantile`:

```julia
julia> @btime quantile(Normal(), 0.6);
18.838 ns (0 allocations: 0 bytes)

julia> 0.018*10000
180.0
```

but also since the particles after systematic sampling have to be permuted.
Oh interesting, I thought
No, you are correct; it's just that in my previous benchmark I bypassed that by calling `randn` manually, so I made the systematic sampling explicit so it would be clearer what's going on.
Really enjoying this package, thank you for your work on it.
For Soss.jl, here's a simple model:
For simple examples like this, I can now use `MonteCarloMeasurements` to get
The way I get there... it's not so pretty. Let's start with μ and σ:
Surprisingly, this works just fine:
Now trying the obvious thing,
It thinks it works, but it's really horribly broken:
So I got it to work with a little helper function:
So now I have
Much better.
This is fine for one distribution, but most don't compose as nicely as a normal. Many distributions break entirely, seemingly because of argument checking in Distributions.jl:
Do you see a path toward... `rand` to work properly?
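The helper function referred to above is not visible in this capture, but the general shape of such a workaround is to sample element-wise over the particles, along these lines (a hypothetical sketch; `rand_particlewise` is an illustrative name, not an API of either package):

```julia
using MonteCarloMeasurements, Distributions

# Draw one sample per particle of the parameters, then wrap the draws as Particles again.
function rand_particlewise(D, params::Particles...)
    N = length(params[1].particles)
    Particles([rand(D(getindex.(params, i)...)) for i in 1:N])
end

μ = Particles(1000, Normal(0, 1))
σ = Particles(1000, Beta(2, 3))
y = rand_particlewise(Normal, μ, σ)   # carries both parameter uncertainty and sampling noise
```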