feat: functionalities for supporting NeuralOperators.jl #217

Merged
merged 7 commits into from
Nov 5, 2024

Conversation

avik-pal
Collaborator

@avik-pal avik-pal commented Nov 3, 2024

  • fft
    • stablehlo.fft
    • tests
  • NNlib.pad_constant
    • stablehlo
    • tests

Contributor

@github-actions github-actions bot left a comment

Reactant.jl Benchmarks

Benchmark suite Current: 13bed11 Previous: a17315c Ratio
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) 1237836714 ns 1263749401 ns 0.98
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant 1181078628 ns 1254668396 ns 0.94
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) 1244666440 ns 1218277318 ns 1.02
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) 2520995691 ns 2376495016 ns 1.06
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux 225448920 ns 217726580 ns 1.04
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) 5433039540 ns 7226166416 ns 0.75
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant 5262336712 ns 5511150207 ns 0.95
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) 6121805326 ns 5102020848 ns 1.20
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) 6654536818 ns 6993217459 ns 0.95
ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux 35399732577 ns 38085761917 ns 0.93
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) 1320508413 ns 1208392095 ns 1.09
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant 1256171167.5 ns 1331979590 ns 0.94
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) 1289987445.5 ns 1228565001 ns 1.05
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) 2459802462 ns 2452231772 ns 1.00
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux 8834390 ns 8748209 ns 1.01
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) 1594719649 ns 1578057500 ns 1.01
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant 1572563209 ns 1557311922 ns 1.01
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) 1627555250 ns 1557684126 ns 1.04
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) 2764021139 ns 2769517816 ns 1.00
ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux 2682591748 ns 3303048898.5 ns 0.81
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) 1305988072 ns 1303432996 ns 1.00
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant 1199774741.5 ns 1292627349.5 ns 0.93
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) 1317524235 ns 1312140581.5 ns 1.00
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) 2876325912 ns 2608146101 ns 1.10
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux 22713088.5 ns 22645472 ns 1.00
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) 2145367408 ns 2183323759 ns 0.98
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant 2138354014 ns 2161824787 ns 0.99
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) 2146049510 ns 2150773246 ns 1.00
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) 3401752026 ns 3353554606 ns 1.01
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux 5530337191 ns 6032060527 ns 0.92
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) 1319398209.5 ns 1315388210 ns 1.00
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant 1255618293 ns 1313576758.5 ns 0.96
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) 1268766827 ns 1308732662.5 ns 0.97
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) 2568271454 ns 2435356858 ns 1.05
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux 7053841 ns 6572926 ns 1.07
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) 1416284689 ns 1416310529 ns 1.00
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant 1415647618 ns 1409069455 ns 1.00
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) 1417522917 ns 1410196431 ns 1.01
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) 2625238876 ns 2620146990 ns 1.00
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux 1340766208 ns 1384443752 ns 0.97
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) 1275439730 ns 1325713657.5 ns 0.96
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant 1299474659.5 ns 1268777827.5 ns 1.02
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) 1271826170.5 ns 1294207842.5 ns 0.98
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) 2572968803 ns 2374603722 ns 1.08
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux 12314527 ns 12110782.5 ns 1.02
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) 1706047630 ns 1711411728 ns 1.00
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant 1715179909 ns 1707811998 ns 1.00
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) 1705348191 ns 1709512803 ns 1.00
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) 2898523358 ns 2924854567 ns 0.99
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux 3166568622.5 ns 2927891069 ns 1.08
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) 1271100172 ns 1270178508 ns 1.00
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant 1285015972 ns 1317660758 ns 0.98
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) 1294704559.5 ns 1263311709 ns 1.02
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) 2499033817 ns 2584191843 ns 0.97
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux 27316886 ns 27307540.5 ns 1.00
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) 2168515158 ns 2190938487 ns 0.99
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant 2163505498 ns 2166284687 ns 1.00
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) 2184023752 ns 2137987987 ns 1.02
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) 3368049564 ns 3415666738 ns 0.99
ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux 5649485923.5 ns 6038343271.5 ns 0.94
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) 1301274508 ns 1233854317 ns 1.05
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant 1287620835 ns 1299829181.5 ns 0.99
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) 1231210695 ns 1226243251 ns 1.00
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) 2549235201 ns 2393640923 ns 1.07
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux 52701734.5 ns 52646968 ns 1.00
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) 2929489222 ns 3006477320 ns 0.97
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant 2957495067 ns 2989128551 ns 0.99
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) 2959664638 ns 3003357676 ns 0.99
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) 4286836180 ns 4443262702 ns 0.96
ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux 10616408466 ns 24545735518 ns 0.43
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) 1215633676 ns 1288108103 ns 0.94
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant 1284427454 ns 1247053980 ns 1.03
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) 1253724058 ns 1260403416 ns 0.99
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) 2452415638 ns 2513765600 ns 0.98
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux 70797645 ns 70692019 ns 1.00
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) 3157439544 ns 3164689242 ns 1.00
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant 3161185461 ns 3166667974 ns 1.00
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) 3177237100 ns 3168332239 ns 1.00
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) 4641316240 ns 4510953172 ns 1.03
ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux 11032831735 ns 12354970629 ns 0.89
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) 1291666179 ns 1242550154 ns 1.04
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant 1260145686 ns 1270011702 ns 0.99
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) 1285175045.5 ns 1308184956.5 ns 0.98
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) 2381519418 ns 2564144412 ns 0.93
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux 20605754 ns 20737061 ns 0.99
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) 1839920458 ns 1846241603 ns 1.00
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant 1844208126 ns 1845891211 ns 1.00
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) 1845581818 ns 1838778303 ns 1.00
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) 3092362331 ns 3067201183 ns 1.01
ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux 2998446872 ns 3142722042.5 ns 0.95

This comment was automatically generated by workflow using github-action-benchmark.

@avik-pal
Collaborator Author

avik-pal commented Nov 3, 2024

Also, we might want to optimize an RFFT immediately followed by an IRFFT of the same length to a no-op?

Module:
module attributes {transform.with_named_sequence} {
  func.func @main(%arg0: tensor<2x4x3xf64>) -> tensor<2x4x3xf64> {
    %0 = stablehlo.fft %arg0, type =  RFFT, length = [2, 4, 3] : (tensor<2x4x3xf64>) -> tensor<2x4x2xcomplex<f64>>
    %1 = stablehlo.fft %0, type =  IRFFT, length = [2, 4, 3] : (tensor<2x4x2xcomplex<f64>>) -> tensor<2x4x3xf64>
    return %1 : tensor<2x4x3xf64>
  }
}
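
For context on why the module above could fold to a no-op: an inverse real FFT applied to the output of a real FFT, with the same `length`, reproduces the input up to floating-point rounding. Below is a minimal sketch of that round trip in plain Python. It assumes a naive 1-D DFT over a single axis of length 3 (the module transforms all three axes), and the helper names `rfft` and `irfft` are written here for illustration only; they are not Reactant or StableHLO APIs.

```python
import cmath

def rfft(x):
    """Naive real-input DFT: keep only the n//2 + 1 non-redundant bins."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def irfft(spec, n):
    """Inverse of rfft for a real signal of length n."""
    # Rebuild the dropped bins from Hermitian symmetry: X[n-k] = conj(X[k]).
    full = list(spec) + [spec[n - k].conjugate() for k in range(len(spec), n)]
    return [
        (sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

# Length 3 along the transformed axis gives 2 complex bins, matching the
# 3 -> 2 change in the last dimension of the tensor types above.
x = [1.0, -2.5, 4.0]
spec = rfft(x)
roundtrip = irfft(spec, len(x))
max_err = max(abs(a - b) for a, b in zip(x, roundtrip))
print(len(spec), max_err < 1e-12)
```

The inverse needs the original length passed explicitly, because 2 complex bins can come from a real signal of length 2 or 3. A rewrite that removes the pair would presumably have to check that both ops carry the same `length`.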

@avik-pal avik-pal changed the title from "feat: add fft and variants" to "feat: functionalities for supporting NeuralOperators.jl" Nov 3, 2024
@wsmoses
Member

wsmoses commented Nov 3, 2024

Yeah, I think that needs optimization and differentiation rules in the Enzyme-JAX repo.

@avik-pal avik-pal force-pushed the ap/neural_operators branch 4 times, most recently from 185dad9 to 672edbf Compare November 5, 2024 01:26
Contributor

github-actions bot commented Nov 5, 2024

Benchmark Results

main 3e26602... main/3e266023f96d2b...
comptime/NN/ViT base (optimize = :after_enzyme) 7.23 s 7.24 s 0.999
comptime/NN/ViT base (optimize = :all) 7.3 s 7.27 s 1
comptime/NN/ViT base (optimize = :before_enzyme) 7.02 s 7.16 s 0.981
comptime/NN/ViT base (optimize = :only_enzyme) 7.89 s 7.43 s 1.06
comptime/NN/ViT tiny (optimize = :after_enzyme) 6.29 s 6.73 s 0.935
comptime/NN/ViT tiny (optimize = :all) 6.62 s 6.73 s 0.983
comptime/NN/ViT tiny (optimize = :before_enzyme) 6.77 s 7.17 s 0.945
comptime/NN/ViT tiny (optimize = :only_enzyme) 6.64 s 6.8 s 0.976
comptime/NN/vgg11 bn=false (optimize = :after_enzyme) 0.423 ± 0.023 s 0.438 ± 0.013 s 0.965
comptime/NN/vgg11 bn=false (optimize = :all) 0.417 ± 0.0082 s 0.433 ± 0.0067 s 0.964
comptime/NN/vgg11 bn=false (optimize = :before_enzyme) 0.422 ± 0.0084 s 0.465 ± 0.058 s 0.908
comptime/NN/vgg11 bn=false (optimize = :only_enzyme) 0.422 ± 0.012 s 0.439 ± 0.011 s 0.963
comptime/NN/vgg11 bn=true (optimize = :after_enzyme) 1.1 ± 0.011 s 1.11 ± 0.023 s 0.994
comptime/NN/vgg11 bn=true (optimize = :all) 1.17 ± 0.051 s 1.14 ± 0.0083 s 1.03
comptime/NN/vgg11 bn=true (optimize = :before_enzyme) 1.12 ± 0.019 s 1.14 ± 0.042 s 0.983
comptime/NN/vgg11 bn=true (optimize = :only_enzyme) 1.13 ± 0.027 s 1.2 ± 0.01 s 0.938
comptime/NN/vgg13 bn=false (optimize = :after_enzyme) 0.464 ± 0.0019 s 0.484 ± 0.0064 s 0.958
comptime/NN/vgg13 bn=false (optimize = :all) 0.497 ± 0.048 s 0.5 ± 0.044 s 0.993
comptime/NN/vgg13 bn=false (optimize = :before_enzyme) 0.559 ± 0.026 s 0.493 ± 0.017 s 1.13
comptime/NN/vgg13 bn=false (optimize = :only_enzyme) 0.487 ± 0.026 s 0.501 ± 0.05 s 0.972
comptime/NN/vgg13 bn=true (optimize = :after_enzyme) 1.33 ± 0.0097 s 1.65 ± 0.27 s 0.809
comptime/NN/vgg13 bn=true (optimize = :all) 1.39 ± 0.013 s 1.51 ± 0.053 s 0.92
comptime/NN/vgg13 bn=true (optimize = :before_enzyme) 1.47 ± 0.0099 s 1.34 ± 0.016 s 1.1
comptime/NN/vgg13 bn=true (optimize = :only_enzyme) 1.4 ± 0.032 s 1.42 ± 0.0038 s 0.984
comptime/NN/vgg16 bn=false (optimize = :after_enzyme) 0.564 ± 0.0079 s 0.598 ± 0.016 s 0.943
comptime/NN/vgg16 bn=false (optimize = :all) 0.567 ± 0.0035 s 0.596 ± 0.018 s 0.952
comptime/NN/vgg16 bn=false (optimize = :before_enzyme) 0.594 ± 0.019 s 0.61 ± 0.012 s 0.975
comptime/NN/vgg16 bn=false (optimize = :only_enzyme) 0.62 ± 0.011 s 0.604 ± 0.02 s 1.03
comptime/NN/vgg16 bn=true (optimize = :after_enzyme) 1.79 ± 0.0061 s 1.69 ± 0.012 s 1.06
comptime/NN/vgg16 bn=true (optimize = :all) 1.81 ± 0.024 s 1.73 ± 0.023 s 1.04
comptime/NN/vgg16 bn=true (optimize = :before_enzyme) 1.76 ± 0.0078 s 1.92 ± 0.011 s 0.917
comptime/NN/vgg16 bn=true (optimize = :only_enzyme) 1.72 s 1.85 ± 0.039 s 0.929
comptime/NN/vgg19 bn=false (optimize = :after_enzyme) 0.696 ± 0.026 s 0.673 ± 0.0056 s 1.03
comptime/NN/vgg19 bn=false (optimize = :all) 0.697 ± 0.088 s 0.668 ± 0.0016 s 1.04
comptime/NN/vgg19 bn=false (optimize = :before_enzyme) 0.66 ± 0.015 s 0.703 ± 0.013 s 0.94
comptime/NN/vgg19 bn=false (optimize = :only_enzyme) 0.667 ± 0.033 s 0.723 ± 0.055 s 0.922
comptime/NN/vgg19 bn=true (optimize = :after_enzyme) 2.04 ± 0.03 s 2.09 ± 0.027 s 0.979
comptime/NN/vgg19 bn=true (optimize = :all) 2.06 s 2.16 ± 0.044 s 0.953
comptime/NN/vgg19 bn=true (optimize = :before_enzyme) 2.14 s 2.15 ± 0.044 s 0.999
comptime/NN/vgg19 bn=true (optimize = :only_enzyme) 2.26 s 2.11 ± 0.027 s 1.07
comptime/basics/2D sum (optimize = :after_enzyme) 28.5 ± 1.1 ms 26.6 ± 0.67 ms 1.07
comptime/basics/2D sum (optimize = :all) 32.2 ± 0.91 ms 30.2 ± 1.1 ms 1.07
comptime/basics/2D sum (optimize = :before_enzyme) 29.9 ± 0.91 ms 28.3 ± 0.79 ms 1.06
comptime/basics/2D sum (optimize = :only_enzyme) 24.7 ± 0.87 ms 23.3 ± 0.85 ms 1.06
comptime/basics/cos.(x) (optimize = :after_enzyme) 0.0344 ± 0.0011 s 0.0326 ± 0.00085 s 1.05
comptime/basics/cos.(x) (optimize = :all) 0.036 ± 0.0012 s 0.0363 ± 0.00066 s 0.99
comptime/basics/cos.(x) (optimize = :before_enzyme) 0.035 ± 0.0012 s 0.0348 ± 0.0011 s 1.01
comptime/basics/cos.(x) (optimize = :only_enzyme) 32.4 ± 0.77 ms 30 ± 0.67 ms 1.08
comptime/basics/∇cos (optimize = :all) 0.0526 ± 0.0013 s 0.0508 ± 0.0016 s 1.03
runtime/NN/ViT base (optimize = :after_enzyme) 6.32 s 6.37 s 0.993
runtime/NN/ViT base (optimize = :all) 6.27 s 6.31 s 0.994
runtime/NN/ViT base (optimize = :before_enzyme) 6.28 s 6.34 s 0.99
runtime/NN/ViT base (optimize = :only_enzyme) 7.57 s 7.82 s 0.968
runtime/NN/ViT tiny (optimize = :after_enzyme) 1.64 s 1.62 s 1.01
runtime/NN/ViT tiny (optimize = :all) 1.63 s 1.69 s 0.966
runtime/NN/ViT tiny (optimize = :before_enzyme) 1.7 s 1.71 s 0.995
runtime/NN/ViT tiny (optimize = :only_enzyme) 2.59 s 2.8 s 0.927
runtime/NN/vgg11 bn=false (optimize = :after_enzyme) 2.12 s 2.11 s 1.01
runtime/NN/vgg11 bn=false (optimize = :all) 2.09 s 2.12 s 0.986
runtime/NN/vgg11 bn=false (optimize = :before_enzyme) 2.18 s 2.23 s 0.974
runtime/NN/vgg11 bn=false (optimize = :only_enzyme) 1.93 s 1.98 s 0.971
runtime/NN/vgg11 bn=true (optimize = :after_enzyme) 2.34 s 2.32 s 1.01
runtime/NN/vgg11 bn=true (optimize = :all) 2.3 s 2.35 s 0.976
runtime/NN/vgg11 bn=true (optimize = :before_enzyme) 2.32 s 2.29 s 1.01
runtime/NN/vgg11 bn=true (optimize = :only_enzyme) 2.38 s 2.5 s 0.952
runtime/NN/vgg13 bn=false (optimize = :after_enzyme) 3.09 s 3.02 s 1.02
runtime/NN/vgg13 bn=false (optimize = :all) 3.05 s 3.02 s 1.01
runtime/NN/vgg13 bn=false (optimize = :before_enzyme) 3.07 s 2.92 s 1.05
runtime/NN/vgg13 bn=false (optimize = :only_enzyme) 2.77 s 2.93 s 0.947
runtime/NN/vgg13 bn=true (optimize = :after_enzyme) 3.27 s 3.41 s 0.958
runtime/NN/vgg13 bn=true (optimize = :all) 3.26 s 3.33 s 0.977
runtime/NN/vgg13 bn=true (optimize = :before_enzyme) 3.28 s 3.22 s 1.02
runtime/NN/vgg13 bn=true (optimize = :only_enzyme) 3.33 s 3.56 s 0.935
runtime/NN/vgg16 bn=false (optimize = :after_enzyme) 3.82 s 4.09 s 0.935
runtime/NN/vgg16 bn=false (optimize = :all) 3.74 s 3.74 s 0.999
runtime/NN/vgg16 bn=false (optimize = :before_enzyme) 3.84 s 3.86 s 0.995
runtime/NN/vgg16 bn=false (optimize = :only_enzyme) 3.79 s 3.77 s 1.01
runtime/NN/vgg16 bn=true (optimize = :after_enzyme) 4.22 s 4.06 s 1.04
runtime/NN/vgg16 bn=true (optimize = :all) 4.33 s 4.06 s 1.07
runtime/NN/vgg16 bn=true (optimize = :before_enzyme) 4.21 s 4.22 s 0.998
runtime/NN/vgg16 bn=true (optimize = :only_enzyme) 4.45 s 4.79 s 0.929
runtime/NN/vgg19 bn=false (optimize = :after_enzyme) 4.66 s 4.77 s 0.976
runtime/NN/vgg19 bn=false (optimize = :all) 4.65 s 4.47 s 1.04
runtime/NN/vgg19 bn=false (optimize = :before_enzyme) 4.66 s 4.76 s 0.979
runtime/NN/vgg19 bn=false (optimize = :only_enzyme) 4.51 s 4.84 s 0.933
runtime/NN/vgg19 bn=true (optimize = :after_enzyme) 5.18 s 5.12 s 1.01
runtime/NN/vgg19 bn=true (optimize = :all) 5.03 s 5.24 s 0.959
runtime/NN/vgg19 bn=true (optimize = :before_enzyme) 5.06 s 5.08 s 0.996
runtime/NN/vgg19 bn=true (optimize = :only_enzyme) 5.6 s 5.71 s 0.981
time_to_load 1.96 ± 0.0078 s 2.03 ± 0.0065 s 0.964

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@avik-pal avik-pal force-pushed the ap/neural_operators branch 2 times, most recently from 43edb9a to 711ae3e Compare November 5, 2024 19:47
Review thread on test/nn/nnlib.jl (outdated, resolved)
@avik-pal avik-pal marked this pull request as ready for review November 5, 2024 20:13
@avik-pal avik-pal requested a review from wsmoses November 5, 2024 20:33
@wsmoses wsmoses merged commit cc8e24f into main Nov 5, 2024
15 of 30 checks passed
@wsmoses wsmoses deleted the ap/neural_operators branch November 5, 2024 22:13
2 participants