feat: support tracing scalars #205

Merged: 5 commits into main from ap/compile_scalars on Oct 29, 2024

Conversation

avik-pal
Collaborator

I realized I need this to correctly compile conditionals that have scalar assignments in their branches.

using Reactant

x = (3, [3.14])

function f(x)
    return x[1] * x[2]
end

x_ra = Reactant.to_rarray(x; track_numbers=(Number,))

f2 = @compile f(x_ra)

f2(Reactant.to_rarray((5, [3.14]); track_numbers=(Number,)))
# 1-element ConcreteRArray{Float64, 1} with indices 1:
#  15.700000000000001

x_ra2 = Reactant.to_rarray(x)

f3 = @compile f(x_ra2)

f3(Reactant.to_rarray((5, [3.14])))
# 1-element ConcreteRArray{Float64, 1} with indices 1:
#  9.42
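
The two outputs above differ because, without `track_numbers`, the scalar appears to be captured as a constant when the function is compiled, so the later `5` is ignored and the original `3` is used. The following is a minimal sketch of that behaviour in plain Julia. It does not use Reactant; `toy_compile` and its `track_numbers` flag are hypothetical names chosen only to mirror the example, and the closure is a stand-in for real compilation.

```julia
# Toy model only: `toy_compile` is a hypothetical helper, not part of Reactant.
# It mimics the difference between a scalar that is baked in as a constant
# at "compile" time and a scalar that remains a runtime input.
function toy_compile(f, example; track_numbers::Bool)
    frozen_scalar = example[1]  # scalar value seen at "compile" time
    if track_numbers
        # Scalar is treated as an input: each call uses the value it is given.
        return args -> f(args)
    else
        # Scalar is treated as a constant: later calls ignore the new value.
        return args -> f((frozen_scalar, args[2]))
    end
end

f(x) = x[1] * x[2]

tracked = toy_compile(f, (3, [3.14]); track_numbers=true)
frozen  = toy_compile(f, (3, [3.14]); track_numbers=false)

println(tracked((5, [3.14])))  # computed from the new scalar 5
println(frozen((5, [3.14])))   # computed from the original scalar 3
```

In this sketch the tracked variant follows the new scalar and the frozen variant keeps the old one, which matches the pattern of the `f2` and `f3` results.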

avik-pal force-pushed the ap/compile_scalars branch 2 times, most recently from 8523b92 to 758cdbb (October 28, 2024 19:10)
avik-pal marked this pull request as ready for review (October 28, 2024 19:22)
avik-pal requested a review from wsmoses (October 28, 2024 19:56)
Contributor

github-actions bot left a comment

Reactant.jl Benchmarks

| Benchmark suite | Current: 0daba07 | Previous: 6866f05 | Ratio |
|---|---|---|---|
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1322584227 ns | 1331366172 ns | 0.99 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1221486116 ns | 1333147663 ns | 0.92 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1214344564 ns | 1530087148 ns | 0.79 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2515659454 ns | 3054505913 ns | 0.82 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 209071673 ns | 230499086 ns | 0.91 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 6901119113 ns | 5310260095 ns | 1.30 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5105465355 ns | 5117023688 ns | 1.00 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 5005652910 ns | 5696115262 ns | 0.88 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 6886227540 ns | 6877852849 ns | 1.00 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 34591434718 ns | 31332890298 ns | 1.10 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1340985005 ns | 1377593493 ns | 0.97 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1302012152 ns | 1419824350 ns | 0.92 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1320620472.5 ns | 1370284838 ns | 0.96 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2591812938 ns | 2678832914 ns | 0.97 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8561485.5 ns | 8674244 ns | 0.99 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1575845217 ns | 1711793645 ns | 0.92 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1558152104 ns | 1593142398 ns | 0.98 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1558730196 ns | 1553322361 ns | 1.00 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 2780363922 ns | 2788658689 ns | 1.00 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 3271054443 ns | 3829077050 ns | 0.85 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1240223491 ns | 1292126503 ns | 0.96 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1256375982.5 ns | 1255035565.5 ns | 1.00 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1244404617.5 ns | 1241708897.5 ns | 1.00 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2602330731 ns | 2627736445 ns | 0.99 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 21190208 ns | 21072763 ns | 1.01 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 2130668545 ns | 2141201304 ns | 1.00 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2153353124 ns | 2150005669 ns | 1.00 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 2143169147 ns | 2153988118 ns | 0.99 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 3377410739 ns | 3403989575 ns | 0.99 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 7238760389 ns | 6112741293.5 ns | 1.18 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1550484437 ns | 1319271524 ns | 1.18 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1448096482.5 ns | 1321146725.5 ns | 1.10 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1374064333 ns | 1336696942 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2684580495 ns | 2956277164 ns | 0.91 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7058256 ns | 7206908 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1408961160 ns | 1446752820 ns | 0.97 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1408868905 ns | 1420199736 ns | 0.99 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1416126215 ns | 1419361542 ns | 1.00 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 2605413634 ns | 2620658225 ns | 0.99 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1271670341 ns | 1343552629 ns | 0.95 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1355000484.5 ns | 1339740837 ns | 1.01 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1306460772 ns | 1297025776.5 ns | 1.01 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1262060845 ns | 1307783663.5 ns | 0.97 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2462923881 ns | 2422904622 ns | 1.02 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 15077483.5 ns | 13705203.5 ns | 1.10 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 1694370275 ns | 1689997595 ns | 1.00 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1694509636 ns | 1709524846 ns | 0.99 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 1689116977 ns | 1699909787 ns | 0.99 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 2905077920 ns | 2910898852 ns | 1.00 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 3168341384.5 ns | 3172820959 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1320147772 ns | 1272957214 ns | 1.04 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1295019849 ns | 1288787604 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1297646562 ns | 1292898756 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2497432172 ns | 2528780375 ns | 0.99 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 25490229 ns | 25554565 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 2165694270 ns | 2158163346 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2156464565 ns | 2164587691 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 2146132438 ns | 2157132364 ns | 0.99 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 3380408606 ns | 3395183827 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 6125358757 ns | 6334070118 ns | 0.97 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1228753754 ns | 1228186625 ns | 1.00 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1218091304.5 ns | 1202678487.5 ns | 1.01 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1382808870.5 ns | 1189947014.5 ns | 1.16 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2586400392 ns | 2346429480 ns | 1.10 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 50283150 ns | 50144272.5 ns | 1.00 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 2988784534 ns | 2973474372 ns | 1.01 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2961909710 ns | 3005386081 ns | 0.99 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 2980384618 ns | 2959359145 ns | 1.01 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 4360034270 ns | 4361243109 ns | 1.00 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 10339495840 ns | 9378921820 ns | 1.10 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1297833585 ns | 1247107066 ns | 1.04 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1343765606 ns | 1226562970 ns | 1.10 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1340663875 ns | 1232975234.5 ns | 1.09 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2460592463 ns | 2344405060 ns | 1.05 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 68052301 ns | 67888329.5 ns | 1.00 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 3150959760 ns | 3168087012 ns | 0.99 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3265309119 ns | 3147204975 ns | 1.04 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 3257425383 ns | 3238059309 ns | 1.01 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 4580754061 ns | 4525581424 ns | 1.01 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 14398188408 ns | 13793978482 ns | 1.04 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1314372616 ns | 1191983955 ns | 1.10 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1350654984 ns | 1202390873 ns | 1.12 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1265075998.5 ns | 1204541731 ns | 1.05 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2610157927 ns | 2466522108 ns | 1.06 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 19681478.5 ns | 19336923 ns | 1.02 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1855503761 ns | 2207876303 ns | 0.84 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1850937702 ns | 1998857639 ns | 0.93 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1845721750 ns | 1904292223 ns | 0.97 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3067838641 ns | 3061997168 ns | 1.00 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 3931238447 ns | 3696727697 ns | 1.06 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

github-actions bot commented Oct 28, 2024

Benchmark Results

| Benchmark | main | abb4207... | main/abb42070a454a4... |
|---|---|---|---|
| comptime/NN/ViT base (optimize = :after_enzyme) | 7.02 s | 7.23 s | 0.97 |
| comptime/NN/ViT base (optimize = :all) | 7.24 s | 7.61 s | 0.952 |
| comptime/NN/ViT base (optimize = :before_enzyme) | 7.24 s | 7.24 s | 1 |
| comptime/NN/ViT base (optimize = :only_enzyme) | 7.19 s | 7.5 s | 0.959 |
| comptime/NN/ViT tiny (optimize = :after_enzyme) | 6.57 s | 6.69 s | 0.982 |
| comptime/NN/ViT tiny (optimize = :all) | 6.7 s | 6.87 s | 0.976 |
| comptime/NN/ViT tiny (optimize = :before_enzyme) | 6.61 s | 6.64 s | 0.995 |
| comptime/NN/ViT tiny (optimize = :only_enzyme) | 6.74 s | 6.88 s | 0.979 |
| comptime/NN/vgg11 bn=false (optimize = :after_enzyme) | 0.438 ± 0.02 s | 0.441 ± 0.016 s | 0.992 |
| comptime/NN/vgg11 bn=false (optimize = :all) | 0.435 ± 0.01 s | 0.46 ± 0.02 s | 0.945 |
| comptime/NN/vgg11 bn=false (optimize = :before_enzyme) | 0.434 ± 0.011 s | 0.422 ± 0.009 s | 1.03 |
| comptime/NN/vgg11 bn=false (optimize = :only_enzyme) | 0.433 ± 0.026 s | 0.424 ± 0.035 s | 1.02 |
| comptime/NN/vgg11 bn=true (optimize = :after_enzyme) | 1.17 ± 0.025 s | 1.26 ± 0.025 s | 0.932 |
| comptime/NN/vgg11 bn=true (optimize = :all) | 1.13 ± 0.036 s | 1.18 ± 0.037 s | 0.962 |
| comptime/NN/vgg11 bn=true (optimize = :before_enzyme) | 1.14 ± 0.0081 s | 1.22 ± 0.023 s | 0.932 |
| comptime/NN/vgg11 bn=true (optimize = :only_enzyme) | 1.15 ± 0.021 s | 1.13 ± 0.0067 s | 1.02 |
| comptime/NN/vgg13 bn=false (optimize = :after_enzyme) | 0.49 ± 0.038 s | 0.511 ± 0.045 s | 0.959 |
| comptime/NN/vgg13 bn=false (optimize = :all) | 0.514 ± 0.042 s | 0.516 ± 0.0083 s | 0.996 |
| comptime/NN/vgg13 bn=false (optimize = :before_enzyme) | 0.504 ± 0.037 s | 0.543 ± 0.047 s | 0.93 |
| comptime/NN/vgg13 bn=false (optimize = :only_enzyme) | 0.479 ± 0.0046 s | 0.485 ± 0.03 s | 0.986 |
| comptime/NN/vgg13 bn=true (optimize = :after_enzyme) | 1.34 ± 0.036 s | 1.38 ± 0.028 s | 0.967 |
| comptime/NN/vgg13 bn=true (optimize = :all) | 1.4 ± 0.025 s | 1.4 ± 0.028 s | 1 |
| comptime/NN/vgg13 bn=true (optimize = :before_enzyme) | 1.37 ± 0.016 s | 1.39 ± 0.042 s | 0.986 |
| comptime/NN/vgg13 bn=true (optimize = :only_enzyme) | 1.38 ± 0.038 s | 1.39 ± 0.05 s | 0.992 |
| comptime/NN/vgg16 bn=false (optimize = :after_enzyme) | 0.579 ± 0.0065 s | 0.584 ± 0.027 s | 0.992 |
| comptime/NN/vgg16 bn=false (optimize = :all) | 0.611 ± 0.015 s | 0.606 ± 0.019 s | 1.01 |
| comptime/NN/vgg16 bn=false (optimize = :before_enzyme) | 0.578 ± 0.0089 s | 0.591 ± 0.029 s | 0.978 |
| comptime/NN/vgg16 bn=false (optimize = :only_enzyme) | 0.579 ± 0.015 s | 0.616 ± 0.049 s | 0.941 |
| comptime/NN/vgg16 bn=true (optimize = :after_enzyme) | 1.81 ± 0.0044 s | 1.77 ± 0.013 s | 1.02 |
| comptime/NN/vgg16 bn=true (optimize = :all) | 1.77 ± 0.0087 s | 1.8 ± 0.016 s | 0.98 |
| comptime/NN/vgg16 bn=true (optimize = :before_enzyme) | 1.82 ± 0.014 s | 1.77 ± 0.0067 s | 1.03 |
| comptime/NN/vgg16 bn=true (optimize = :only_enzyme) | 1.74 ± 0.038 s | 1.83 ± 0.0051 s | 0.951 |
| comptime/NN/vgg19 bn=false (optimize = :after_enzyme) | 0.655 ± 0.022 s | 0.664 ± 0.013 s | 0.987 |
| comptime/NN/vgg19 bn=false (optimize = :all) | 0.687 ± 0.022 s | 0.715 ± 0.046 s | 0.961 |
| comptime/NN/vgg19 bn=false (optimize = :before_enzyme) | 0.668 ± 0.04 s | 0.699 ± 0.005 s | 0.956 |
| comptime/NN/vgg19 bn=false (optimize = :only_enzyme) | 0.661 ± 0.019 s | 0.675 ± 0.0098 s | 0.979 |
| comptime/NN/vgg19 bn=true (optimize = :after_enzyme) | 2.14 ± 0.023 s | 2.19 ± 0.031 s | 0.979 |
| comptime/NN/vgg19 bn=true (optimize = :all) | 2.11 ± 0.0038 s | 2.21 ± 0.0073 s | 0.954 |
| comptime/NN/vgg19 bn=true (optimize = :before_enzyme) | 2.16 ± 0.026 s | 2.17 ± 0.019 s | 0.996 |
| comptime/NN/vgg19 bn=true (optimize = :only_enzyme) | 2.4 s | 2.1 ± 0.0022 s | 1.14 |
| comptime/basics/2D sum (optimize = :after_enzyme) | 28.5 ± 1.4 ms | 28.3 ± 1.1 ms | 1.01 |
| comptime/basics/2D sum (optimize = :all) | 0.0327 ± 0.0016 s | 0.0321 ± 0.0013 s | 1.02 |
| comptime/basics/2D sum (optimize = :before_enzyme) | 30.1 ± 1.5 ms | 30.1 ± 1.2 ms | 1 |
| comptime/basics/2D sum (optimize = :only_enzyme) | 25.1 ± 1.6 ms | 24.8 ± 1 ms | 1.01 |
| comptime/basics/cos.(x) (optimize = :after_enzyme) | 0.0357 ± 0.0022 s | 0.034 ± 0.0011 s | 1.05 |
| comptime/basics/cos.(x) (optimize = :all) | 0.0373 ± 0.0012 s | 0.037 ± 0.0011 s | 1.01 |
| comptime/basics/cos.(x) (optimize = :before_enzyme) | 0.0356 ± 0.0012 s | 0.0356 ± 0.0009 s | 1 |
| comptime/basics/cos.(x) (optimize = :only_enzyme) | 0.0323 ± 0.0016 s | 0.0316 ± 0.0012 s | 1.02 |
| comptime/basics/∇cos (optimize = :all) | 0.0527 ± 0.0022 s | 0.0532 ± 0.002 s | 0.99 |
| runtime/NN/ViT base (optimize = :after_enzyme) | 6.38 s | 6.35 s | 1 |
| runtime/NN/ViT base (optimize = :all) | 6.5 s | 6.45 s | 1.01 |
| runtime/NN/ViT base (optimize = :before_enzyme) | 6.47 s | 6.41 s | 1.01 |
| runtime/NN/ViT base (optimize = :only_enzyme) | 7.8 s | 7.7 s | 1.01 |
| runtime/NN/ViT tiny (optimize = :after_enzyme) | 1.68 s | 1.7 s | 0.986 |
| runtime/NN/ViT tiny (optimize = :all) | 1.71 s | 1.63 s | 1.05 |
| runtime/NN/ViT tiny (optimize = :before_enzyme) | 1.7 s | 1.73 s | 0.986 |
| runtime/NN/ViT tiny (optimize = :only_enzyme) | 2.73 s | 2.63 s | 1.04 |
| runtime/NN/vgg11 bn=false (optimize = :after_enzyme) | 2.13 s | 2.1 s | 1.02 |
| runtime/NN/vgg11 bn=false (optimize = :all) | 2.18 s | 2.1 s | 1.04 |
| runtime/NN/vgg11 bn=false (optimize = :before_enzyme) | 2.17 s | 2.09 s | 1.04 |
| runtime/NN/vgg11 bn=false (optimize = :only_enzyme) | 1.97 s | 1.95 s | 1.01 |
| runtime/NN/vgg11 bn=true (optimize = :after_enzyme) | 2.27 s | 2.35 s | 0.968 |
| runtime/NN/vgg11 bn=true (optimize = :all) | 2.27 s | 2.33 s | 0.971 |
| runtime/NN/vgg11 bn=true (optimize = :before_enzyme) | 2.24 s | 2.32 s | 0.967 |
| runtime/NN/vgg11 bn=true (optimize = :only_enzyme) | 2.39 s | 2.37 s | 1.01 |
| runtime/NN/vgg13 bn=false (optimize = :after_enzyme) | 3 s | 3 s | 1 |
| runtime/NN/vgg13 bn=false (optimize = :all) | 3.04 s | 3.04 s | 0.999 |
| runtime/NN/vgg13 bn=false (optimize = :before_enzyme) | 3.05 s | 3.04 s | 1.01 |
| runtime/NN/vgg13 bn=false (optimize = :only_enzyme) | 2.93 s | 2.8 s | 1.05 |
| runtime/NN/vgg13 bn=true (optimize = :after_enzyme) | 3.25 s | 3.2 s | 1.02 |
| runtime/NN/vgg13 bn=true (optimize = :all) | 3.31 s | 3.26 s | 1.02 |
| runtime/NN/vgg13 bn=true (optimize = :before_enzyme) | 3.27 s | 3.3 s | 0.99 |
| runtime/NN/vgg13 bn=true (optimize = :only_enzyme) | 3.39 s | 3.4 s | 0.996 |
| runtime/NN/vgg16 bn=false (optimize = :after_enzyme) | 3.79 s | 3.87 s | 0.979 |
| runtime/NN/vgg16 bn=false (optimize = :all) | 3.82 s | 3.84 s | 0.994 |
| runtime/NN/vgg16 bn=false (optimize = :before_enzyme) | 3.85 s | 3.83 s | 1 |
| runtime/NN/vgg16 bn=false (optimize = :only_enzyme) | 3.67 s | 3.71 s | 0.989 |
| runtime/NN/vgg16 bn=true (optimize = :after_enzyme) | 4.2 s | 4.13 s | 1.02 |
| runtime/NN/vgg16 bn=true (optimize = :all) | 4.21 s | 4.2 s | 1 |
| runtime/NN/vgg16 bn=true (optimize = :before_enzyme) | 4.29 s | 4.21 s | 1.02 |
| runtime/NN/vgg16 bn=true (optimize = :only_enzyme) | 4.49 s | 4.39 s | 1.02 |
| runtime/NN/vgg19 bn=false (optimize = :after_enzyme) | 4.53 s | 4.57 s | 0.991 |
| runtime/NN/vgg19 bn=false (optimize = :all) | 4.61 s | 4.59 s | 1 |
| runtime/NN/vgg19 bn=false (optimize = :before_enzyme) | 4.54 s | 4.68 s | 0.971 |
| runtime/NN/vgg19 bn=false (optimize = :only_enzyme) | 4.61 s | 4.48 s | 1.03 |
| runtime/NN/vgg19 bn=true (optimize = :after_enzyme) | 5.14 s | 5.18 s | 0.99 |
| runtime/NN/vgg19 bn=true (optimize = :all) | 5.13 s | 5.09 s | 1.01 |
| runtime/NN/vgg19 bn=true (optimize = :before_enzyme) | 5.17 s | 5.06 s | 1.02 |
| runtime/NN/vgg19 bn=true (optimize = :only_enzyme) | 5.62 s | 5.59 s | 1.01 |
| time_to_load | 2.01 ± 0.0079 s | 2.03 ± 0.035 s | 0.995 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

avik-pal merged commit fce399c into main (Oct 29, 2024)
16 of 24 checks passed
avik-pal deleted the ap/compile_scalars branch (October 29, 2024 15:09)
3 participants