feat: support tracing scalars #205
Merged
wsmoses reviewed on Oct 28, 2024
avik-pal force-pushed the ap/compile_scalars branch 2 times, most recently from 8523b92 to 758cdbb (October 28, 2024 19:10)
avik-pal force-pushed the ap/compile_scalars branch from a20086b to 6d2d35b (October 28, 2024 20:17)
wsmoses approved these changes on Oct 28, 2024
Reactant.jl Benchmarks
Benchmark suite | Current: 0daba07 | Previous: 6866f05 | Ratio |
---|---|---|---|
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1322584227 ns | 1331366172 ns | 0.99 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1221486116 ns | 1333147663 ns | 0.92 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1214344564 ns | 1530087148 ns | 0.79 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2515659454 ns | 3054505913 ns | 0.82 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 209071673 ns | 230499086 ns | 0.91 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 6901119113 ns | 5310260095 ns | 1.30 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5105465355 ns | 5117023688 ns | 1.00 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 5005652910 ns | 5696115262 ns | 0.88 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 6886227540 ns | 6877852849 ns | 1.00 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 34591434718 ns | 31332890298 ns | 1.10 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1340985005 ns | 1377593493 ns | 0.97 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1302012152 ns | 1419824350 ns | 0.92 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1320620472.5 ns | 1370284838 ns | 0.96 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2591812938 ns | 2678832914 ns | 0.97 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8561485.5 ns | 8674244 ns | 0.99 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1575845217 ns | 1711793645 ns | 0.92 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1558152104 ns | 1593142398 ns | 0.98 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1558730196 ns | 1553322361 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 2780363922 ns | 2788658689 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 3271054443 ns | 3829077050 ns | 0.85 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1240223491 ns | 1292126503 ns | 0.96 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1256375982.5 ns | 1255035565.5 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1244404617.5 ns | 1241708897.5 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2602330731 ns | 2627736445 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 21190208 ns | 21072763 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 2130668545 ns | 2141201304 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2153353124 ns | 2150005669 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 2143169147 ns | 2153988118 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 3377410739 ns | 3403989575 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 7238760389 ns | 6112741293.5 ns | 1.18 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1550484437 ns | 1319271524 ns | 1.18 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1448096482.5 ns | 1321146725.5 ns | 1.10 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1374064333 ns | 1336696942 ns | 1.03 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2684580495 ns | 2956277164 ns | 0.91 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7058256 ns | 7206908 ns | 0.98 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1408961160 ns | 1446752820 ns | 0.97 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1408868905 ns | 1420199736 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1416126215 ns | 1419361542 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 2605413634 ns | 2620658225 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1271670341 ns | 1343552629 ns | 0.95 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1355000484.5 ns | 1339740837 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1306460772 ns | 1297025776.5 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1262060845 ns | 1307783663.5 ns | 0.97 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2462923881 ns | 2422904622 ns | 1.02 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 15077483.5 ns | 13705203.5 ns | 1.10 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 1694370275 ns | 1689997595 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1694509636 ns | 1709524846 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 1689116977 ns | 1699909787 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 2905077920 ns | 2910898852 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 3168341384.5 ns | 3172820959 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1320147772 ns | 1272957214 ns | 1.04 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1295019849 ns | 1288787604 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1297646562 ns | 1292898756 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2497432172 ns | 2528780375 ns | 0.99 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 25490229 ns | 25554565 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 2165694270 ns | 2158163346 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2156464565 ns | 2164587691 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 2146132438 ns | 2157132364 ns | 0.99 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 3380408606 ns | 3395183827 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 6125358757 ns | 6334070118 ns | 0.97 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1228753754 ns | 1228186625 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1218091304.5 ns | 1202678487.5 ns | 1.01 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1382808870.5 ns | 1189947014.5 ns | 1.16 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2586400392 ns | 2346429480 ns | 1.10 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 50283150 ns | 50144272.5 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 2988784534 ns | 2973474372 ns | 1.01 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2961909710 ns | 3005386081 ns | 0.99 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 2980384618 ns | 2959359145 ns | 1.01 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 4360034270 ns | 4361243109 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 10339495840 ns | 9378921820 ns | 1.10 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1297833585 ns | 1247107066 ns | 1.04 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1343765606 ns | 1226562970 ns | 1.10 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1340663875 ns | 1232975234.5 ns | 1.09 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2460592463 ns | 2344405060 ns | 1.05 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 68052301 ns | 67888329.5 ns | 1.00 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 3150959760 ns | 3168087012 ns | 0.99 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3265309119 ns | 3147204975 ns | 1.04 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 3257425383 ns | 3238059309 ns | 1.01 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 4580754061 ns | 4525581424 ns | 1.01 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 14398188408 ns | 13793978482 ns | 1.04 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1314372616 ns | 1191983955 ns | 1.10 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1350654984 ns | 1202390873 ns | 1.12 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1265075998.5 ns | 1204541731 ns | 1.05 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2610157927 ns | 2466522108 ns | 1.06 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 19681478.5 ns | 19336923 ns | 1.02 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1855503761 ns | 2207876303 ns | 0.84 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1850937702 ns | 1998857639 ns | 0.93 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1845721750 ns | 1904292223 ns | 0.97 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3067838641 ns | 3061997168 ns | 1.00 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 3931238447 ns | 3696727697 ns | 1.06 |
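
A note on reading the table: judging from the reported numbers, the Ratio column appears to be Current divided by Previous (both in ns), rounded to two decimals, so a value above 1.00 means the current commit ran slower. The sketch below is only a sanity check of that reading against two rows of the table; the helper name `ratio` is made up for illustration and is not part of github-action-benchmark.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Assumed definition of the Ratio column: current / previous, 2 decimals."""
    return round(current_ns / previous_ns, 2)


# ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme)
cuda_after_enzyme = ratio(1322584227, 1331366172)

# ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme)
cpu_after_enzyme = ratio(6901119113, 5310260095)

print(cuda_after_enzyme, cpu_after_enzyme)
```

Under this reading, the largest slowdown in the table is the 1.30 on the CPU `:after_enzyme` run for ViT base at batch size 32, and the largest speedup is the 0.79 on the matching CUDA `:before_enzyme` run.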
This comment was automatically generated by workflow using github-action-benchmark.
Benchmark Results
Benchmark Plots
A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
avik-pal force-pushed the ap/compile_scalars branch from 6d2d35b to 1d0648e (October 28, 2024 22:11)
mofeing reviewed on Oct 28, 2024
Realized I need this to correctly compile conditionals with scalar assignments in the branches.