tuned moe configs v2 #33

Merged: 4 commits merged from divakar-amd-patch-1 into main on Jun 14, 2024
Conversation

@divakar-amd

Update the fused MoE config.json files. These config files utilize all the available Triton kernel parameters for tuning and are used by both the prefill and decode fused_moe kernels.

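For readers unfamiliar with these files, the sketch below shows the rough shape such a config typically takes, assuming the usual vLLM layout where each top-level key is a batch size (M) and each value holds the Triton launch parameters chosen for that M; the exact parameter set and values in this PR may differ.

```python
# Illustrative sketch only, not a config from this PR. Keys are batch sizes (M);
# values are the Triton kernel launch parameters selected for that M.
example_moe_config = {
    "1": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 32,
        "BLOCK_SIZE_K": 64,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 2,
    },
    "64": {
        "BLOCK_SIZE_M": 64,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 32,
        "GROUP_SIZE_M": 8,
        "num_warps": 8,
        "num_stages": 2,
    },
}
```

Decode typically hits the small-M entries and prefill the large-M entries, which is how a single file can serve both kernels.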
@divakar-amd self-assigned this on Jun 5, 2024
@hthangirala

As per our discussion today, update this PR with the following:

  1. The tuning script used to generate these configs (see the sketch after this list).
  2. Collect perf numbers with this tuning using the latest vLLM (use Matt's docker to save time: rocm/pytorch-private:vllm0.4.3_ROCm6.1_exec_dashboard_pretuned_0604).
  3. Collect H100 perf numbers to compare:
    • Using upstream, which defaults to ray
    • Using upstream with torchrun (check with Matt on how he got our near-upstream vLLM to run with torchrun)
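The tuning script itself is not attached here; below is only a minimal sketch of the kind of sweep item 1 refers to, assuming a hypothetical `run_fused_moe(cfg)` callable that launches the fused MoE Triton kernel for a fixed problem size with a candidate set of launch parameters. The actual search space and harness used for this PR may differ.

```python
# Minimal tuning-sweep sketch (not the actual script from this PR).
# `run_fused_moe(cfg)` is a hypothetical callable that launches the fused MoE
# Triton kernel for one problem size using the candidate launch parameters.
import itertools
import triton.testing

SEARCH_SPACE = {
    "BLOCK_SIZE_M": [16, 32, 64, 128],
    "BLOCK_SIZE_N": [32, 64, 128],
    "BLOCK_SIZE_K": [32, 64, 128],
    "GROUP_SIZE_M": [1, 8, 16],
    "num_warps": [4, 8],
    "num_stages": [2, 3],
}

def tune_one_batch_size(run_fused_moe):
    """Return the fastest candidate config (and its time in ms) for one M."""
    best_cfg, best_ms = None, float("inf")
    keys = list(SEARCH_SPACE)
    for values in itertools.product(*SEARCH_SPACE.values()):
        cfg = dict(zip(keys, values))
        ms = triton.testing.do_bench(lambda: run_fused_moe(cfg))
        if ms < best_ms:
            best_cfg, best_ms = cfg, ms
    return best_cfg, best_ms
```

Repeating this per batch size and writing the winners to JSON yields a config shaped like the example shown earlier.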

@shajrawi (Collaborator) left a comment

I will leave the final confirmation to Hari, but we are good to go as far as the state of our master branch is concerned.

@divakar-amd force-pushed the divakar-amd-patch-1 branch 2 times, most recently from d3eaa94 to af93dba, on June 11, 2024 at 21:58
@hthangirala

A couple of items we discussed to complete this PR:

  1. Post the Mixtral improvement with this PR using TP=1, 2, 4, 8.
  2. Confirm that MoE tuning with bfloat16 provides similar MoE kernel perf (time) to using the tuning config from this PR, which was tuned with float16 (see the sketch after this list).
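For item 2, a minimal sketch of the comparison, assuming a hypothetical `run_fused_moe(dtype, cfg)` that launches the MoE kernel with the given activation dtype and launch parameters; this is not the measurement methodology actually used for the numbers posted below.

```python
# Minimal sketch of the item 2 check (not the actual measurement script).
# `run_fused_moe(dtype, cfg)` is a hypothetical callable that launches the MoE
# kernel with the given activation dtype and Triton launch parameters.
import torch
import triton.testing

def compare_bf16_configs(run_fused_moe, fp16_tuned_cfg, bf16_tuned_cfg):
    # bf16 run using the config tuned with float16 (this PR)
    ms_fp16_cfg = triton.testing.do_bench(
        lambda: run_fused_moe(torch.bfloat16, fp16_tuned_cfg))
    # bf16 run using a config tuned with bfloat16
    ms_bf16_cfg = triton.testing.do_bench(
        lambda: run_fused_moe(torch.bfloat16, bf16_tuned_cfg))
    # If the two times are close, the fp16-tuned config is good enough for bf16.
    return ms_fp16_cfg, ms_bf16_cfg
```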

@divakar-amd (Author)

Docker used: rocm/pytorch-private:vllm0.4.3_ROCm6.1_exec_dashboard_pretuned_0605

The table below shows the Mixtral MI300 timings (seconds) with the v1 and v2 (this PR) MoE tuned configs:
[image: table of Mixtral MI300 timings, v1 vs v2 configs]

Calculating the % gain observed from the above table:
[image: table of % gain, v1 vs v2 configs]
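For reference, a minimal sketch of the % gain computation, assuming gain is reported as the reduction in time of the v2 config relative to v1 (the actual values are in the table image above):

```python
def pct_gain(v1_seconds: float, v2_seconds: float) -> float:
    """Percentage reduction in time when moving from the v1 to the v2 config."""
    return (v1_seconds - v2_seconds) / v1_seconds * 100.0
```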

The charts below show timings with the v1 and v2 MoE tuned configs:
[images: four timing charts, v1 vs v2 configs]

The table below shows TP=8 fp16 vs bf16 numbers using the v2 config. The bf16 numbers were similar to those obtained when tuning with the bf16 datatype.
[image: table of TP=8 fp16 vs bf16 timings with the v2 config]

@hthangirala left a comment

Thanks for the measurements!

@hthangirala merged commit 38ada92 into main on Jun 14, 2024
13 checks passed
@divakar-amd (Author)

Fixed. Reverted the revert. Added separate commit for init files fix.
