Adding fp8 to gradlib #44

Merged 15 commits on Jun 10, 2024

Conversation

@charlifu charlifu commented Jun 7, 2024

This PR adds fp8 gemm tuner functionality to gradlib.

  • Add optional function arguments for the output type and scaling factors to HipbSolIdxBlas and HipbFindAllSolIdxBlas. Because the new arguments are optional, the old fp16 gemm tuner should keep working unchanged (needs to be tested).
  • Add fp8_gemm_tuner.py, which reads input shapes for fp8 gemm and outputs the best solution idx in hipblaslt. This should be merged into the existing gemm_tuner.py in the future.
  • Add instructions for running and tuning fp8 gemm.
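
The two ideas in the list above, optional arguments that leave old call sites untouched and a tuner that picks the fastest solution idx for a shape, can be sketched as follows. This is a minimal illustration and not gradlib's code: `best_solution_idx`, `time_solution`, `launch`, `out_dtype`, and `scales` are hypothetical names, and `launch` stands in for whatever callable actually runs a gemm with a given solution idx.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

Shape = Tuple[int, int, int]  # (m, n, k); hypothetical layout for this sketch


def time_solution(run: Callable[[], None], iters: int = 10) -> float:
    """Return the mean wall-clock seconds per call of `run`."""
    start = time.perf_counter()
    for _ in range(iters):
        run()
    return (time.perf_counter() - start) / iters


def best_solution_idx(
    shape: Shape,
    solution_ids: Sequence[int],
    launch: Callable[..., None],
    out_dtype: Optional[str] = None,              # hypothetical optional argument
    scales: Optional[Tuple[float, float]] = None,  # hypothetical optional argument
    timer: Callable[[Callable[[], None]], float] = time_solution,
) -> int:
    """Time every candidate solution idx for `shape` and return the fastest."""
    # Forward the optional arguments only when the caller supplied them,
    # so a caller that omits them reaches `launch` exactly as before.
    extra = {}
    if out_dtype is not None:
        extra["out_dtype"] = out_dtype
    if scales is not None:
        extra["scales"] = scales

    timings = {
        idx: timer(lambda idx=idx: launch(shape, idx, **extra))
        for idx in solution_ids
    }
    return min(timings, key=timings.get)
```

A caller that passes neither `out_dtype` nor `scales` sees the same `launch(shape, idx)` call as before, which is the property the first bullet relies on for the fp16 path.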

mawong-amd and others added 2 commits June 7, 2024 18:22
@mawong-amd mawong-amd force-pushed the charlifu/adding_fp8_gradlib branch from 11726d3 to a2d13df Compare June 7, 2024 18:40
@mawong-amd mawong-amd changed the base branch from main to 531_merge_linting June 7, 2024 18:41
@mawong-amd mawong-amd force-pushed the charlifu/adding_fp8_gradlib branch from a2d13df to 03e97c2 Compare June 7, 2024 18:43
@mawong-amd mawong-amd changed the base branch from 531_merge_linting to main June 7, 2024 20:53
@charlifu (Author)

@gshtras just tested gradlib for fp16 on a single-GPU run of 7b, and a performance improvement was observed.

@charlifu charlifu merged commit d254de7 into main Jun 10, 2024
13 checks passed
@charlifu charlifu deleted the charlifu/adding_fp8_gradlib branch June 14, 2024 20:59