tuned moe configs v2 #33

Merged: 4 commits merged from divakar-amd-patch-1 into main on Jun 14, 2024
Conversation

@divakar-amd

Update the fused MoE config.json files. These config files utilize all the available Triton kernel parameters for tuning and are used by both the prefill and decode fused_moe kernels.

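For readers unfamiliar with these files, the sketch below shows the rough shape such a config typically takes, assuming the usual vLLM layout where each top-level key is a batch size (M) and each value holds the Triton launch parameters chosen for that M; the exact parameter set and values in this PR may differ.

```python
# Illustrative sketch only, not a config from this PR. Keys are batch sizes (M);
# values are the Triton kernel launch parameters selected for that M.
example_moe_config = {
    "1": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 32,
        "BLOCK_SIZE_K": 64,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 2,
    },
    "64": {
        "BLOCK_SIZE_M": 64,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 32,
        "GROUP_SIZE_M": 8,
        "num_warps": 8,
        "num_stages": 2,
    },
}
```

Decode typically hits the small-M entries and prefill the large-M entries, which is how a single file can serve both kernels.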
@divakar-amd self-assigned this on Jun 5, 2024
@hthangirala

As per our discussion today, update this PR with the following:

  1. The tuning script used to generate these configs (see the sketch after this list).
  2. Collect perf numbers with this tuning using the latest vLLM (use Matt's docker to save time: rocm/pytorch-private:vllm0.4.3_ROCm6.1_exec_dashboard_pretuned_0604).
  3. Collect H100 perf numbers to compare:
    • Using upstream, which defaults to ray
    • Using upstream with torchrun (check with Matt on how he got our near-upstream vLLM to run with torchrun)
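The tuning script itself is not attached here; below is only a minimal sketch of the kind of sweep item 1 refers to, assuming a hypothetical `run_fused_moe(cfg)` callable that launches the fused MoE Triton kernel for a fixed problem size with a candidate set of launch parameters. The actual search space and harness used for this PR may differ.

```python
# Minimal tuning-sweep sketch (not the actual script from this PR).
# `run_fused_moe(cfg)` is a hypothetical callable that launches the fused MoE
# Triton kernel for one problem size using the candidate launch parameters.
import itertools
import triton.testing

SEARCH_SPACE = {
    "BLOCK_SIZE_M": [16, 32, 64, 128],
    "BLOCK_SIZE_N": [32, 64, 128],
    "BLOCK_SIZE_K": [32, 64, 128],
    "GROUP_SIZE_M": [1, 8, 16],
    "num_warps": [4, 8],
    "num_stages": [2, 3],
}

def tune_one_batch_size(run_fused_moe):
    """Return the fastest candidate config (and its time in ms) for one M."""
    best_cfg, best_ms = None, float("inf")
    keys = list(SEARCH_SPACE)
    for values in itertools.product(*SEARCH_SPACE.values()):
        cfg = dict(zip(keys, values))
        ms = triton.testing.do_bench(lambda: run_fused_moe(cfg))
        if ms < best_ms:
            best_cfg, best_ms = cfg, ms
    return best_cfg, best_ms
```

Repeating this per batch size and writing the winners to JSON yields a config shaped like the example shown earlier.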

@shajrawi (Collaborator) left a comment

I will leave the final confirmation to Hari, but we are good to go as far as the state of our master branch is concerned.

@divakar-amd force-pushed the divakar-amd-patch-1 branch 2 times, most recently from d3eaa94 to af93dba, on June 11, 2024 at 21:58
@hthangirala

A couple of items we discussed to complete this PR:

  1. Post the Mixtral improvement with this PR using TP=1, 2, 4, 8.
  2. Confirm that MoE tuning with bfloat16 provides similar MoE kernel perf (time) to using the tuning config from this PR, which was tuned with float16 (see the sketch after this list).
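For item 2, a minimal sketch of the comparison, assuming a hypothetical `run_fused_moe(dtype, cfg)` that launches the MoE kernel with the given activation dtype and launch parameters; this is not the measurement methodology actually used for the numbers posted below.

```python
# Minimal sketch of the item 2 check (not the actual measurement script).
# `run_fused_moe(dtype, cfg)` is a hypothetical callable that launches the MoE
# kernel with the given activation dtype and Triton launch parameters.
import torch
import triton.testing

def compare_bf16_configs(run_fused_moe, fp16_tuned_cfg, bf16_tuned_cfg):
    # bf16 run using the config tuned with float16 (this PR)
    ms_fp16_cfg = triton.testing.do_bench(
        lambda: run_fused_moe(torch.bfloat16, fp16_tuned_cfg))
    # bf16 run using a config tuned with bfloat16
    ms_bf16_cfg = triton.testing.do_bench(
        lambda: run_fused_moe(torch.bfloat16, bf16_tuned_cfg))
    # If the two times are close, the fp16-tuned config is good enough for bf16.
    return ms_fp16_cfg, ms_bf16_cfg
```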

@divakar-amd (Author)

Docker used: rocm/pytorch-private:vllm0.4.3_ROCm6.1_exec_dashboard_pretuned_0605

The table below shows the Mixtral MI300 timings (seconds) with the v1 and v2 (this PR) MoE tuned configs:
[image: table of Mixtral MI300 timings, v1 vs v2 configs]

Calculating the % gain observed from the above table:
[image: table of % gain, v1 vs v2 configs]
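For reference, a minimal sketch of the % gain computation, assuming gain is reported as the reduction in time of the v2 config relative to v1 (the actual values are in the table image above):

```python
def pct_gain(v1_seconds: float, v2_seconds: float) -> float:
    """Percentage reduction in time when moving from the v1 to the v2 config."""
    return (v1_seconds - v2_seconds) / v1_seconds * 100.0
```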

The charts below show timings with the v1 and v2 MoE tuned configs:
[images: four timing charts, v1 vs v2 configs]

The table below shows TP=8 fp16 vs bf16 numbers using the v2 config. The bf16 numbers were similar to those obtained when tuning with the bf16 datatype.
[image: table of TP=8 fp16 vs bf16 timings with the v2 config]

@hthangirala left a comment

Thanks for the measurements!

@hthangirala merged commit 38ada92 into main on Jun 14, 2024
13 checks passed
@divakar-amd (Author)

Fixed. Reverted the revert. Added separate commit for init files fix.
