docs(docs): update performance section

TypedMatrices · Sep 9, 2024 · f05ec38 · f05ec38
1 parent fe7151f
commit f05ec38
Showing 1 changed file with 164 additions and 0 deletions.
diff --git a/docs/src/manual/2.performance.md b/docs/src/manual/2.performance.md
@@ -1,3 +1,167 @@
 # Performance
 
 We will briefly discuss the performance in this page.
+
+## Linear Algebra Properties on Typed Matrices
+
+Package `LinearAlgebra.jl` provides several linear algebra operations. By utilizing the Julia type system, we can also improve the performance of these operations. For example, the `issymmetric` function defaults to call the `issymmetric` and check each element. The matrix `Minij` is explicitly known to be symmetric. The following example shows that the `issymmetric` function on the `Minij` typed matrix spent **10.310 ns** and **873.400 μs** on the `Matrix` typed matrix.
+
+```julia-repl
+julia> a = Minij(1000)
+1000×1000 Minij{Int64}:
+...
+
+julia> b = Matrix(Minij(1000))
+1000×1000 Matrix{Int64}:
+...
+
+julia> @benchmark issymmetric(a)
+BenchmarkTools.Trial: 10000 samples with 999 evaluations.
+ Range (min … max):   9.810 ns … 89.790 ns  ┊ GC (min … max): 0.00% … 0.00%
+ Time  (median):     10.310 ns              ┊ GC (median):    0.00%
+ Time  (mean ± σ):   10.798 ns ±  2.083 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
+
+  █▅▇▅▆▆▃▄▄▂ ▃▂▁▃ ▁▂▁▂▂▂▁▁▂ ▁                                 ▂
+  ███████████████████████████▇▆▇▇▇▆▆▄▆▃▄▆▄▄▅▃▅▅▅▄▆▄▄▅▄▄▅▅▄▅▂▅ █
+  9.81 ns      Histogram: log(frequency) by time      17.7 ns <
+
+ Memory estimate: 0 bytes, allocs estimate: 0.
+
+julia> @benchmark issymmetric(b)
+BenchmarkTools.Trial: 4883 samples with 1 evaluation.
+ Range (min … max):  593.700 μs …  13.507 ms  ┊ GC (min … max): 0.00% … 0.00%
+ Time  (median):     873.400 μs               ┊ GC (median):    0.00%
+ Time  (mean ± σ):     1.009 ms ± 515.315 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
+
+    █▂   ▁▁
+  ▂▆██▇▆████▇▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
+  594 μs           Histogram: frequency by time         2.69 ms <
+
+ Memory estimate: 0 bytes, allocs estimate: 0.
+```
+
+## Known Algorithm Working on `Hilbert`
+
+The following example shows a known algorithm that works on `Hilbert` matrices, `a` is a `Hilbert` typed matrix, and `b` is the same matrix with `Matrix` typed. When doing the `det` operation, the `Hilbert` typed matrix only spent **0.3%** of the time that the normal matrix spent, although the memory usage is **69.02 KiB** and **66.22 MiB** respectively.
+
+```julia-repl
+julia> a = Hilbert{BigFloat}(100)
+100×100 Hilbert{BigFloat}:
+...
+
+julia> b = Matrix(Hilbert{BigFloat}(100))
+100×100 Matrix{BigFloat}:
+...
+
+julia> t3 = @benchmark det(a)
+BenchmarkTools.Trial: 6985 samples with 1 evaluation.
+ Range (min … max):  334.500 μs … 740.291 ms  ┊ GC (min … max):  0.00% … 68.80%
+ Time  (median):     564.100 μs               ┊ GC (median):     0.00%
+ Time  (mean ± σ):   706.671 μs ±   8.853 ms  ┊ GC (mean ± σ):  10.32% ±  0.82%
+
+   ▅█▅▂▂▅▄▁
+  ▃█████████▇▆▆▄▄▄▄▅▅▅▅▆▆▇███▇▆▇▅▅▄▅▅▄▃▃▃▃▃▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
+  334 μs           Histogram: frequency by time         1.23 ms <
+
+ Memory estimate: 69.02 KiB, allocs estimate: 3233.
+
+julia> t4 = @benchmark det(b)
+BenchmarkTools.Trial: 32 samples with 1 evaluation.
+ Range (min … max):  127.925 ms … 229.261 ms  ┊ GC (min … max): 8.86% … 4.76%
+ Time  (median):     158.327 ms               ┊ GC (median):    9.49%
+ Time  (mean ± σ):   160.932 ms ±  23.576 ms  ┊ GC (mean ± σ):  9.48% ± 3.52%
+
+      ▃▃ █ ▃         ▃█
+  ▇▁▁▁██▁█▁█▁▇▇▇▁▇▇▇▇██▁▁▁▁▇▇▇▁▁▇▇▁▇▁▁▁▁▇▁▁▁▁▁▁▇▁▁▇▁▁▁▁▁▁▁▁▁▁▁▇ ▁
+  128 ms           Histogram: frequency by time          229 ms <
+
+ Memory estimate: 66.22 MiB, allocs estimate: 1333851.
+```
+
+## Trade between Performance and Memory
+
+This fresh approach saves substantial memory by trading off the performance. The following example shows that the `Cauchy` typed matrix `a` only spent **63.229 μs** and **114.16 KiB** memory to generate, while the `Matrix` typed matrix `b` spent **3.862 ms** and **7.74 MiB** memory to generate. Also, the memory usage of `a` is **16 bytes** and **8000040 bytes** for `b`.
+
+```julia-repl
+julia> @benchmark a = Cauchy{Float64}(1000)
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+ Range (min … max):  27.100 μs … 191.819 ms  ┊ GC (min … max):  0.00% … 99.94%
+ Time  (median):     32.100 μs               ┊ GC (median):     0.00%
+ Time  (mean ± σ):   63.229 μs ±   1.919 ms  ┊ GC (mean ± σ):  35.05% ±  4.29%
+
+  ▅█▇▅▄▃▃▂▂▃▂▃▃▅▅▃▁▁                 ▁▁                        ▂
+  ███████████████████▇▇███▇▇▆▇▇▇▇▆▆▇████▇▇▆▆▄▄▃▂▂▄▅▅▅▆▆▇▇▆▆▇▅▅ █
+  27.1 μs       Histogram: log(frequency) by time       125 μs <
+
+ Memory estimate: 114.16 KiB, allocs estimate: 36.
+
+julia> @benchmark b = Matrix(Cauchy{Float64}(1000))
+BenchmarkTools.Trial: 1288 samples with 1 evaluation.
+ Range (min … max):  2.413 ms … 18.386 ms  ┊ GC (min … max):  0.00% … 48.63%
+ Time  (median):     3.271 ms              ┊ GC (median):     0.00%
+ Time  (mean ± σ):   3.862 ms ±  1.674 ms  ┊ GC (mean ± σ):  15.96% ± 19.84%
+
+  ▂█▄   ▁▂
+  ████▄▆██▆▅▅▅▃▃▃▃▄▄▄▃▃▃▃▃▃▃▃▃▃▃▃▂▂▂▂▃▃▃▃▃▂▂▂▂▂▃▂▁▂▂▁▂▂▂▂▂▂▂ ▃
+  2.41 ms        Histogram: frequency by time        9.53 ms <
+
+ Memory estimate: 7.74 MiB, allocs estimate: 38.
+
+julia> Base.summarysize(a)
+16
+
+julia> Base.summarysize(b)
+8000040
+```
+
+This improvement is trade off the performance for memory. When accessing each element of the `Cauchy` typed matrix, more time is needed than the `Matrix` typed matrix, which is expected. This can allow machines with insufficient memory to take longer time to run computations that would have been impossible to run before.
+
+```julia-repl
+julia> @benchmark det(a)
+BenchmarkTools.Trial: 111 samples with 1 evaluation.
+ Range (min … max):  20.537 ms … 353.410 ms  ┊ GC (min … max): 0.00% … 90.93%
+ Time  (median):     34.151 ms               ┊ GC (median):    0.00%
+ Time  (mean ± σ):   45.104 ms ±  42.894 ms  ┊ GC (mean ± σ):  7.81% ±  9.53%
+
+   ▄█▅▃ ▁
+  ▅████▇█▁▆▅▁▁▁▅▅▁▁▅▁▁▅▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▅
+  20.5 ms       Histogram: log(frequency) by time       301 ms <
+
+ Memory estimate: 7.64 MiB, allocs estimate: 4.
+
+julia> @benchmark det(b)
+BenchmarkTools.Trial: 175 samples with 1 evaluation.
+ Range (min … max):  18.639 ms … 314.529 ms  ┊ GC (min … max): 0.00% … 91.89%
+ Time  (median):     26.317 ms               ┊ GC (median):    0.00%
+ Time  (mean ± σ):   28.670 ms ±  22.610 ms  ┊ GC (mean ± σ):  7.81% ±  8.48%
+
+      ▂ ▂▂ ██  ▂▂ ▃   ▅ ▅▂ ▃▂
+  ▅▁▃▇█▇██████▆██▅█▅█▅███████▇▆▁▁▆▆▃▁▆▅▅▅▁▁▁▁▁▁▁▃▅▃▁▁▅▁▁▁▁▃▃▁▃ ▃
+  18.6 ms         Histogram: frequency by time         44.1 ms <
+
+ Memory estimate: 7.64 MiB, allocs estimate: 4.
+
+julia> @benchmark sum(a)
+BenchmarkTools.Trial: 3104 samples with 1 evaluation.
+ Range (min … max):  1.124 ms …   7.772 ms  ┊ GC (min … max): 0.00% … 0.00%
+ Time  (median):     1.400 ms               ┊ GC (median):    0.00%
+ Time  (mean ± σ):   1.604 ms ± 579.750 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
+
+  █▃▁█▄ ▂                                                      
+  █████▇██▇█▆▃▄▄▃▄▃▄▃▄▃▃▃▃▃▃▃▂▃▃▂▂▂▂▂▂▃▃▂▃▂▂▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▁▂ ▃
+  1.12 ms         Histogram: frequency by time        3.56 ms <
+
+ Memory estimate: 16 bytes, allocs estimate: 1.
+
+julia> @benchmark sum(b)
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+ Range (min … max):  243.900 μs …  2.106 ms  ┊ GC (min … max): 0.00% … 0.00%
+ Time  (median):     329.800 μs              ┊ GC (median):    0.00%
+ Time  (mean ± σ):   355.504 μs ± 91.684 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
+
+      ▂▅█▇▅▃▃▂▁▁▁▁
+  ▁▄██████████████▇▆▆▆▆▆▆▆▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁ ▃
+  244 μs          Histogram: frequency by time          647 μs <
+
+ Memory estimate: 16 bytes, allocs estimate: 1.
+```