docs(docs): update performance section
AnzhiZhang committed Sep 9, 2024
1 parent fe7151f commit f05ec38
Showing 1 changed file with 164 additions and 0 deletions.
164 changes: 164 additions & 0 deletions docs/src/manual/2.performance.md
@@ -1,3 +1,167 @@
# Performance

This page briefly discusses the performance of typed matrices.

## Linear Algebra Properties on Typed Matrices

The `LinearAlgebra` standard library provides many linear algebra operations. By using Julia's type system, we can speed up some of them. For example, on a `Matrix` the `issymmetric` function has to compare elements, whereas a `Minij` matrix is known to be symmetric from its type alone. In the following example, `issymmetric` takes a median of **10.310 ns** on the `Minij` typed matrix and **873.400 μs** on the equivalent `Matrix` typed matrix.

```julia-repl
julia> a = Minij(1000)
1000×1000 Minij{Int64}:
...
julia> b = Matrix(Minij(1000))
1000×1000 Matrix{Int64}:
...
julia> @benchmark issymmetric(a)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min … max): 9.810 ns … 89.790 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 10.310 ns ┊ GC (median): 0.00%
Time (mean ± σ): 10.798 ns ± 2.083 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▅▇▅▆▆▃▄▄▂ ▃▂▁▃ ▁▂▁▂▂▂▁▁▂ ▁ ▂
███████████████████████████▇▆▇▇▇▆▆▄▆▃▄▆▄▄▅▃▅▅▅▄▆▄▄▅▄▄▅▅▄▅▂▅ █
9.81 ns Histogram: log(frequency) by time 17.7 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark issymmetric(b)
BenchmarkTools.Trial: 4883 samples with 1 evaluation.
Range (min … max): 593.700 μs … 13.507 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 873.400 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.009 ms ± 515.315 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂ ▁▁
▂▆██▇▆████▇▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
594 μs Histogram: frequency by time 2.69 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
```
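The idea behind this speed-up can be sketched with a small stand-in type. This is a minimal illustration, not the package's implementation: `MyMinij` is a hypothetical name, and the only assumption is that entry `(i, j)` equals `min(i, j)`. Adding an `issymmetric` method for the type lets Julia answer from the type alone instead of comparing elements.

```julia
using LinearAlgebra

# Hypothetical stand-in for a Minij-like type: entry (i, j) is min(i, j).
struct MyMinij <: AbstractMatrix{Int}
    n::Int
end

Base.size(A::MyMinij) = (A.n, A.n)

function Base.getindex(A::MyMinij, i::Int, j::Int)
    @boundscheck checkbounds(A, i, j)
    return min(i, j)
end

# The type guarantees symmetry, so no element needs to be inspected.
LinearAlgebra.issymmetric(::MyMinij) = true

a = MyMinij(1000)
b = Matrix(a)    # dense copy of the same values

issymmetric(a)   # dispatches to the method above
issymmetric(b)   # generic method that compares elements
```

Both calls return `true`; only the amount of work differs.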

## Known Algorithm Working on `Hilbert`

The following example shows a known algorithm that applies to `Hilbert` matrices. Here `a` is a `Hilbert` typed matrix and `b` is the same matrix converted to `Matrix`. When computing `det`, the `Hilbert` typed matrix takes about **0.4%** of the time the dense matrix takes (median 564.100 μs versus 158.327 ms), and the memory estimates are **69.02 KiB** and **66.22 MiB** respectively.

```julia-repl
julia> a = Hilbert{BigFloat}(100)
100×100 Hilbert{BigFloat}:
...
julia> b = Matrix(Hilbert{BigFloat}(100))
100×100 Matrix{BigFloat}:
...
julia> t3 = @benchmark det(a)
BenchmarkTools.Trial: 6985 samples with 1 evaluation.
Range (min … max): 334.500 μs … 740.291 ms ┊ GC (min … max): 0.00% … 68.80%
Time (median): 564.100 μs ┊ GC (median): 0.00%
Time (mean ± σ): 706.671 μs ± 8.853 ms ┊ GC (mean ± σ): 10.32% ± 0.82%
▅█▅▂▂▅▄▁
▃█████████▇▆▆▄▄▄▄▅▅▅▅▆▆▇███▇▆▇▅▅▄▅▅▄▃▃▃▃▃▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
334 μs Histogram: frequency by time 1.23 ms <
Memory estimate: 69.02 KiB, allocs estimate: 3233.
julia> t4 = @benchmark det(b)
BenchmarkTools.Trial: 32 samples with 1 evaluation.
Range (min … max): 127.925 ms … 229.261 ms ┊ GC (min … max): 8.86% … 4.76%
Time (median): 158.327 ms ┊ GC (median): 9.49%
Time (mean ± σ): 160.932 ms ± 23.576 ms ┊ GC (mean ± σ): 9.48% ± 3.52%
▃▃ █ ▃ ▃█
▇▁▁▁██▁█▁█▁▇▇▇▁▇▇▇▇██▁▁▁▁▇▇▇▁▁▇▇▁▇▁▁▁▁▇▁▁▁▁▁▁▇▁▁▇▁▁▁▁▁▁▁▁▁▁▁▇ ▁
128 ms Histogram: frequency by time 229 ms <
Memory estimate: 66.22 MiB, allocs estimate: 1333851.
```
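As an illustration of what such a known algorithm can look like, the determinant of the n×n Hilbert matrix has a closed form, `det(H_n) = c(n)^4 / c(2n)` with `c(n) = 1! · 2! ⋯ (n-1)!`. The sketch below evaluates it exactly with rationals. It is only an assumption that a formula of this kind sits behind the benchmark above; the names `factorial_product`, `hilbert_det`, and `dense_hilbert` are hypothetical and not part of the package.

```julia
using LinearAlgebra

# c(n) = 1! * 2! * ... * (n-1)!, accumulated in BigInt to avoid overflow.
function factorial_product(n::Integer)
    c = big(1)
    for i in 1:n-1
        c *= factorial(big(i))
    end
    return c
end

# Closed form det(H_n) = c(n)^4 / c(2n), returned as an exact rational.
hilbert_det(n::Integer) = factorial_product(n)^4 // factorial_product(2n)

# Dense reference matrix with exact rational entries 1 / (i + j - 1).
dense_hilbert(n::Integer) = [big(1) // (i + j - 1) for i in 1:n, j in 1:n]

n = 5
hilbert_det(n) == det(dense_hilbert(n))  # exact arithmetic on both sides
```

The closed form needs only a few big-integer products, whereas the generic `det` factorizes the whole matrix.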

## Trade-off between Performance and Memory

This approach saves substantial memory at some cost in performance. In the following example, generating the `Cauchy` typed matrix `a` takes a mean of **63.229 μs** and **114.16 KiB** of memory, while generating the `Matrix` typed matrix `b` takes a mean of **3.862 ms** and **7.74 MiB** of memory. Once created, `a` occupies **16 bytes** and `b` occupies **8000040 bytes**.

```julia-repl
julia> @benchmark a = Cauchy{Float64}(1000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 27.100 μs … 191.819 ms ┊ GC (min … max): 0.00% … 99.94%
Time (median): 32.100 μs ┊ GC (median): 0.00%
Time (mean ± σ): 63.229 μs ± 1.919 ms ┊ GC (mean ± σ): 35.05% ± 4.29%
▅█▇▅▄▃▃▂▂▃▂▃▃▅▅▃▁▁ ▁▁ ▂
███████████████████▇▇███▇▇▆▇▇▇▇▆▆▇████▇▇▆▆▄▄▃▂▂▄▅▅▅▆▆▇▇▆▆▇▅▅ █
27.1 μs Histogram: log(frequency) by time 125 μs <
Memory estimate: 114.16 KiB, allocs estimate: 36.
julia> @benchmark b = Matrix(Cauchy{Float64}(1000))
BenchmarkTools.Trial: 1288 samples with 1 evaluation.
Range (min … max): 2.413 ms … 18.386 ms ┊ GC (min … max): 0.00% … 48.63%
Time (median): 3.271 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.862 ms ± 1.674 ms ┊ GC (mean ± σ): 15.96% ± 19.84%
▂█▄ ▁▂
████▄▆██▆▅▅▅▃▃▃▃▄▄▄▃▃▃▃▃▃▃▃▃▃▃▃▂▂▂▂▃▃▃▃▃▂▂▂▂▂▃▂▁▂▂▁▂▂▂▂▂▂▂ ▃
2.41 ms Histogram: frequency by time 9.53 ms <
Memory estimate: 7.74 MiB, allocs estimate: 38.
julia> Base.summarysize(a)
16
julia> Base.summarysize(b)
8000040
```
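The memory saving comes from storing a description of the matrix instead of its entries. The following is a minimal sketch under stated assumptions: `LazyCauchy` is a hypothetical type, not the package's `Cauchy`, and its entry formula `1 / (i + j)` is chosen only for illustration.

```julia
# Hypothetical lazy matrix: entry (i, j) is 1 / (i + j), computed on demand.
struct LazyCauchy{T} <: AbstractMatrix{T}
    n::Int
end

Base.size(A::LazyCauchy) = (A.n, A.n)

function Base.getindex(A::LazyCauchy{T}, i::Int, j::Int) where {T}
    @boundscheck checkbounds(A, i, j)
    return one(T) / (i + j)
end

a = LazyCauchy{Float64}(1000)
b = Matrix(a)          # materializes all 1000 × 1000 entries

Base.summarysize(a)    # only the size field is stored
Base.summarysize(b)    # one Float64 per entry plus array overhead
sum(a) ≈ sum(b)        # same values, computed on the fly for `a`
```

When the same entries are read many times, converting once with `Matrix(a)` trades the memory back for faster access.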

This improvement trades performance for memory. Accessing the elements of the `Cauchy` typed matrix takes more time than accessing those of the `Matrix` typed matrix, which is expected. In the benchmarks below, `det` has a median of 34.151 ms on `a` against 26.317 ms on `b`, and `sum` has a median of 1.400 ms on `a` against 329.800 μs on `b`. This lets machines with limited memory run, at the cost of longer run times, computations that would otherwise have been impossible.

```julia-repl
julia> @benchmark det(a)
BenchmarkTools.Trial: 111 samples with 1 evaluation.
Range (min … max): 20.537 ms … 353.410 ms ┊ GC (min … max): 0.00% … 90.93%
Time (median): 34.151 ms ┊ GC (median): 0.00%
Time (mean ± σ): 45.104 ms ± 42.894 ms ┊ GC (mean ± σ): 7.81% ± 9.53%
▄█▅▃ ▁
▅████▇█▁▆▅▁▁▁▅▅▁▁▅▁▁▅▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▅
20.5 ms Histogram: log(frequency) by time 301 ms <
Memory estimate: 7.64 MiB, allocs estimate: 4.
julia> @benchmark det(b)
BenchmarkTools.Trial: 175 samples with 1 evaluation.
Range (min … max): 18.639 ms … 314.529 ms ┊ GC (min … max): 0.00% … 91.89%
Time (median): 26.317 ms ┊ GC (median): 0.00%
Time (mean ± σ): 28.670 ms ± 22.610 ms ┊ GC (mean ± σ): 7.81% ± 8.48%
▂ ▂▂ ██ ▂▂ ▃ ▅ ▅▂ ▃▂
▅▁▃▇█▇██████▆██▅█▅█▅███████▇▆▁▁▆▆▃▁▆▅▅▅▁▁▁▁▁▁▁▃▅▃▁▁▅▁▁▁▁▃▃▁▃ ▃
18.6 ms Histogram: frequency by time 44.1 ms <
Memory estimate: 7.64 MiB, allocs estimate: 4.
julia> @benchmark sum(a)
BenchmarkTools.Trial: 3104 samples with 1 evaluation.
Range (min … max): 1.124 ms … 7.772 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.400 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.604 ms ± 579.750 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃▁█▄ ▂
█████▇██▇█▆▃▄▄▃▄▃▄▃▄▃▃▃▃▃▃▃▂▃▃▂▂▂▂▂▂▃▃▂▃▂▂▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▁▂ ▃
1.12 ms Histogram: frequency by time 3.56 ms <
Memory estimate: 16 bytes, allocs estimate: 1.
julia> @benchmark sum(b)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 243.900 μs … 2.106 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 329.800 μs ┊ GC (median): 0.00%
Time (mean ± σ): 355.504 μs ± 91.684 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▅█▇▅▃▃▂▁▁▁▁
▁▄██████████████▇▆▆▆▆▆▆▆▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁ ▃
244 μs Histogram: frequency by time 647 μs <
Memory estimate: 16 bytes, allocs estimate: 1.
```
