diff --git a/docs/src/manual/2.performance.md b/docs/src/manual/2.performance.md
index 0a32a1c..b7449dc 100644
--- a/docs/src/manual/2.performance.md
+++ b/docs/src/manual/2.performance.md
@@ -1,3 +1,167 @@
 # Performance
 
 We will briefly discuss the performance in this page.
+
+## Linear Algebra Properties on Typed Matrices
+
+The `LinearAlgebra` standard library provides several linear algebra operations. By using the Julia type system, we can improve the performance of these operations. For example, the generic `issymmetric` method checks the elements of the matrix, whereas a `Minij` matrix is known to be symmetric from its type alone. The following example shows that `issymmetric` takes a median of **10.310 ns** on the `Minij` typed matrix and **873.400 μs** on the `Matrix` typed matrix.
+
+```julia-repl
+julia> a = Minij(1000)
+1000×1000 Minij{Int64}:
+...
+
+julia> b = Matrix(Minij(1000))
+1000×1000 Matrix{Int64}:
+...
+
+julia> @benchmark issymmetric(a)
+BenchmarkTools.Trial: 10000 samples with 999 evaluations.
+ Range (min … max): 9.810 ns … 89.790 ns ┊ GC (min … max): 0.00% … 0.00%
+ Time (median): 10.310 ns ┊ GC (median): 0.00%
+ Time (mean ± σ): 10.798 ns ± 2.083 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
+
+ █▅▇▅▆▆▃▄▄▂ ▃▂▁▃ ▁▂▁▂▂▂▁▁▂ ▁ ▂
+ ███████████████████████████▇▆▇▇▇▆▆▄▆▃▄▆▄▄▅▃▅▅▅▄▆▄▄▅▄▄▅▅▄▅▂▅ █
+ 9.81 ns Histogram: log(frequency) by time 17.7 ns <
+
+ Memory estimate: 0 bytes, allocs estimate: 0.
+
+julia> @benchmark issymmetric(b)
+BenchmarkTools.Trial: 4883 samples with 1 evaluation.
+ Range (min … max): 593.700 μs … 13.507 ms ┊ GC (min … max): 0.00% … 0.00%
+ Time (median): 873.400 μs ┊ GC (median): 0.00%
+ Time (mean ± σ): 1.009 ms ± 515.315 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
+
+ █▂ ▁▁
+ ▂▆██▇▆████▇▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
+ 594 μs Histogram: frequency by time 2.69 ms <
+
+ Memory estimate: 0 bytes, allocs estimate: 0.
+```
+
+## Known Algorithm Working on `Hilbert`
+
+The following example shows a known algorithm that works on `Hilbert` matrices. Here `a` is a `Hilbert` typed matrix and `b` is the same matrix converted to `Matrix`. Computing `det` on the `Hilbert` typed matrix takes about **0.3%** of the time needed for the ordinary matrix (minimum 334.500 μs versus 127.925 ms), and the memory usage is **69.02 KiB** versus **66.22 MiB**.
+
+```julia-repl
+julia> a = Hilbert{BigFloat}(100)
+100×100 Hilbert{BigFloat}:
+...
+
+julia> b = Matrix(Hilbert{BigFloat}(100))
+100×100 Matrix{BigFloat}:
+...
+
+julia> t3 = @benchmark det(a)
+BenchmarkTools.Trial: 6985 samples with 1 evaluation.
+ Range (min … max): 334.500 μs … 740.291 ms ┊ GC (min … max): 0.00% … 68.80%
+ Time (median): 564.100 μs ┊ GC (median): 0.00%
+ Time (mean ± σ): 706.671 μs ± 8.853 ms ┊ GC (mean ± σ): 10.32% ± 0.82%
+
+ ▅█▅▂▂▅▄▁
+ ▃█████████▇▆▆▄▄▄▄▅▅▅▅▆▆▇███▇▆▇▅▅▄▅▅▄▃▃▃▃▃▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
+ 334 μs Histogram: frequency by time 1.23 ms <
+
+ Memory estimate: 69.02 KiB, allocs estimate: 3233.
+
+julia> t4 = @benchmark det(b)
+BenchmarkTools.Trial: 32 samples with 1 evaluation.
+ Range (min … max): 127.925 ms … 229.261 ms ┊ GC (min … max): 8.86% … 4.76%
+ Time (median): 158.327 ms ┊ GC (median): 9.49%
+ Time (mean ± σ): 160.932 ms ± 23.576 ms ┊ GC (mean ± σ): 9.48% ± 3.52%
+
+ ▃▃ █ ▃ ▃█
+ ▇▁▁▁██▁█▁█▁▇▇▇▁▇▇▇▇██▁▁▁▁▇▇▇▁▁▇▇▁▇▁▁▁▁▇▁▁▁▁▁▁▇▁▁▇▁▁▁▁▁▁▁▁▁▁▁▇ ▁
+ 128 ms Histogram: frequency by time 229 ms <
+
+ Memory estimate: 66.22 MiB, allocs estimate: 1333851.
+```
+
+## Trade between Performance and Memory
+
+This approach saves substantial memory at the cost of some performance. The following example shows that generating the `Cauchy` typed matrix `a` takes a mean of **63.229 μs** and **114.16 KiB** of memory, while generating the `Matrix` typed matrix `b` takes a mean of **3.862 ms** and **7.74 MiB**. Once created, `a` occupies **16 bytes**, compared with **8000040 bytes** for `b`.
+
+```julia-repl
+julia> @benchmark a = Cauchy{Float64}(1000)
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+ Range (min … max): 27.100 μs … 191.819 ms ┊ GC (min … max): 0.00% … 99.94%
+ Time (median): 32.100 μs ┊ GC (median): 0.00%
+ Time (mean ± σ): 63.229 μs ± 1.919 ms ┊ GC (mean ± σ): 35.05% ± 4.29%
+
+ ▅█▇▅▄▃▃▂▂▃▂▃▃▅▅▃▁▁ ▁▁ ▂
+ ███████████████████▇▇███▇▇▆▇▇▇▇▆▆▇████▇▇▆▆▄▄▃▂▂▄▅▅▅▆▆▇▇▆▆▇▅▅ █
+ 27.1 μs Histogram: log(frequency) by time 125 μs <
+
+ Memory estimate: 114.16 KiB, allocs estimate: 36.
+
+julia> @benchmark b = Matrix(Cauchy{Float64}(1000))
+BenchmarkTools.Trial: 1288 samples with 1 evaluation.
+ Range (min … max): 2.413 ms … 18.386 ms ┊ GC (min … max): 0.00% … 48.63%
+ Time (median): 3.271 ms ┊ GC (median): 0.00%
+ Time (mean ± σ): 3.862 ms ± 1.674 ms ┊ GC (mean ± σ): 15.96% ± 19.84%
+
+ ▂█▄ ▁▂
+ ████▄▆██▆▅▅▅▃▃▃▃▄▄▄▃▃▃▃▃▃▃▃▃▃▃▃▂▂▂▂▃▃▃▃▃▂▂▂▂▂▃▂▁▂▂▁▂▂▂▂▂▂▂ ▃
+ 2.41 ms Histogram: frequency by time 9.53 ms <
+
+ Memory estimate: 7.74 MiB, allocs estimate: 38.
+
+julia> Base.summarysize(a)
+16
+
+julia> Base.summarysize(b)
+8000040
+```
+
+This improvement trades performance for memory. Accessing each element of the `Cauchy` typed matrix takes more time than accessing an element of the `Matrix` typed matrix, which is expected. In return, machines with limited memory can run, albeit more slowly, computations that previously would not have fit in memory.
+
+```julia-repl
+julia> @benchmark det(a)
+BenchmarkTools.Trial: 111 samples with 1 evaluation.
+ Range (min … max): 20.537 ms … 353.410 ms ┊ GC (min … max): 0.00% … 90.93%
+ Time (median): 34.151 ms ┊ GC (median): 0.00%
+ Time (mean ± σ): 45.104 ms ± 42.894 ms ┊ GC (mean ± σ): 7.81% ± 9.53%
+
+ ▄█▅▃ ▁
+ ▅████▇█▁▆▅▁▁▁▅▅▁▁▅▁▁▅▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▅
+ 20.5 ms Histogram: log(frequency) by time 301 ms <
+
+ Memory estimate: 7.64 MiB, allocs estimate: 4.
+
+julia> @benchmark det(b)
+BenchmarkTools.Trial: 175 samples with 1 evaluation.
+ Range (min … max): 18.639 ms … 314.529 ms ┊ GC (min … max): 0.00% … 91.89%
+ Time (median): 26.317 ms ┊ GC (median): 0.00%
+ Time (mean ± σ): 28.670 ms ± 22.610 ms ┊ GC (mean ± σ): 7.81% ± 8.48%
+
+ ▂ ▂▂ ██ ▂▂ ▃ ▅ ▅▂ ▃▂
+ ▅▁▃▇█▇██████▆██▅█▅█▅███████▇▆▁▁▆▆▃▁▆▅▅▅▁▁▁▁▁▁▁▃▅▃▁▁▅▁▁▁▁▃▃▁▃ ▃
+ 18.6 ms Histogram: frequency by time 44.1 ms <
+
+ Memory estimate: 7.64 MiB, allocs estimate: 4.
+
+julia> @benchmark sum(a)
+BenchmarkTools.Trial: 3104 samples with 1 evaluation.
+ Range (min … max): 1.124 ms … 7.772 ms ┊ GC (min … max): 0.00% … 0.00%
+ Time (median): 1.400 ms ┊ GC (median): 0.00%
+ Time (mean ± σ): 1.604 ms ± 579.750 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
+
+ █▃▁█▄ ▂
+ █████▇██▇█▆▃▄▄▃▄▃▄▃▄▃▃▃▃▃▃▃▂▃▃▂▂▂▂▂▂▃▃▂▃▂▂▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▁▂ ▃
+ 1.12 ms Histogram: frequency by time 3.56 ms <
+
+ Memory estimate: 16 bytes, allocs estimate: 1.
+
+julia> @benchmark sum(b)
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+ Range (min … max): 243.900 μs … 2.106 ms ┊ GC (min … max): 0.00% … 0.00%
+ Time (median): 329.800 μs ┊ GC (median): 0.00%
+ Time (mean ± σ): 355.504 μs ± 91.684 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
+
+ ▂▅█▇▅▃▃▂▁▁▁▁
+ ▁▄██████████████▇▆▆▆▆▆▆▆▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁ ▃
+ 244 μs Histogram: frequency by time 647 μs <
+
+ Memory estimate: 16 bytes, allocs estimate: 1.
+```
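
The benchmarks above rest on two ideas: a typed matrix can compute its entries on demand instead of storing them, and it can answer structural queries such as `issymmetric` through dispatch rather than by scanning elements. The following is a minimal sketch of that pattern, not the package's actual implementation. The type name `LazyMinij` is hypothetical and was introduced only for this illustration; it assumes the min(i, j) definition that the name `Minij` suggests.

```julia
using LinearAlgebra

# Hypothetical stand-in for a typed matrix (NOT the package's definition).
# Only the dimension is stored; entries are computed when indexed.
struct LazyMinij{T<:Number} <: AbstractMatrix{T}
    n::Int
end

# Convenience constructor defaulting to Int entries.
LazyMinij(n::Integer) = LazyMinij{Int}(n)

# Minimal AbstractArray interface: size and scalar getindex.
Base.size(A::LazyMinij) = (A.n, A.n)

function Base.getindex(A::LazyMinij{T}, i::Int, j::Int) where {T}
    @boundscheck checkbounds(A, i, j)
    return T(min(i, j))  # assumed definition: A[i, j] = min(i, j)
end

# Symmetry is known from the type, so no element needs to be inspected.
LinearAlgebra.issymmetric(::LazyMinij) = true

a = LazyMinij(1000)   # stores a single integer
b = Matrix(a)         # materializes all n^2 entries

issymmetric(a)        # answered by the specialized method above
issymmetric(b)        # answered by the generic Matrix method
Base.summarysize(a)   # size of the struct only
Base.summarysize(b)   # grows with n^2
sum(a) == sum(b)      # same values, computed on access for `a`
```

In this sketch, `sum(a)` has to call `getindex` for every entry, which is the same kind of per-element cost that makes `sum` slower on the `Cauchy` typed matrix than on its `Matrix` counterpart in the benchmarks above.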