Releases: ROCm/hipBLASLt

hipBLASLt 0.2.0 for ROCm 5.6.1

29 Aug 20:11

Added

  • Added CI tests for tensilelite
  • Initialized extension group gemm APIs (FP16 only)
  • Added group gemm sample app: example_hipblaslt_groupedgemm
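For readers unfamiliar with the term, a grouped GEMM runs several independent matrix multiplications, possibly with different shapes, as one batch of work. The pure-Python sketch below only illustrates that idea on the host. It does not use or mirror the hipBLASLt extension API, and the names `gemm` and `grouped_gemm` are hypothetical.

```python
def gemm(a, b):
    """Reference matrix product of two row-major nested lists."""
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def grouped_gemm(problems):
    """Compute one product per (A, B) pair; shapes may differ per pair.

    Hypothetical helper for illustration only. A GPU library would
    typically dispatch the whole group together instead of looping.
    """
    return [gemm(a, b) for a, b in problems]


if __name__ == "__main__":
    group = [
        ([[1, 2]], [[3], [4]]),                  # 1x2 times 2x1
        ([[1, 0], [0, 1]], [[5, 6], [7, 8]]),    # 2x2 times 2x2
    ]
    for result in grouped_gemm(group):
        print(result)
```

The shipped sample app, example_hipblaslt_groupedgemm, is the place to look for the real API usage.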

Fixed

  • Fixed ScaleD kernel incorrect results

Optimizations

  • Tuned equality sizes for HHS data type
  • Reduced host side overhead for hipblasLtMatmul()
  • Removed unused kernel arguments
  • Scheduled VALU setup before the first s_waitcnt
  • Refactored tensilelite host code
  • Reduced build time

hipBLASLt 0.2.0 for ROCm 5.6.0

28 Jun 23:17

Added

  • Added CI tests for tensilelite
  • Initialized extension group gemm APIs (FP16 only)
  • Added group gemm sample app: example_hipblaslt_groupedgemm

Fixed

  • Fixed ScaleD kernel incorrect results

Optimizations

  • Tuned equality sizes for HHS data type
  • Reduced host side overhead for hipblasLtMatmul()
  • Removed unused kernel arguments
  • Scheduled VALU setup before the first s_waitcnt
  • Refactored tensilelite host code
  • Reduced build time

hipBLASLt 0.1.0 for ROCm 5.5.1

24 May 19:06

The hipBLASLt code did not change for ROCm 5.5.1. The library was rebuilt against the updated ROCm 5.5.1 stack.

hipBLASLt 0.1.0 for ROCm 5.5.0

01 May 21:03

Added

  • Enable hipBLASLt APIs
  • Support gfx90a
  • Support problem types: fp32, fp16, bf16
  • Support activations: relu, gelu
  • Support bias vector
  • Support Scale D vector
  • Integrate with tensilelite kernel generator
  • Add Gtest: hipblaslt-test
  • Add full function tool: hipblaslt-bench
  • Add sample app: example_hipblaslt_preference
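The activation and bias items above describe epilogue operations applied to the GEMM result. The pure-Python sketch below is a host-side reference for that kind of computation, written as D = activation(alpha * A * B + beta * C + bias). It rests on stated assumptions rather than on the library's implementation: the bias is taken to have one entry per output row, GELU uses the exact erf form, the Scale D vector is not modeled, and the function name `matmul_epilogue` is hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def gelu(x):
    # Exact (erf-based) GELU; a library may use an approximation instead.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matmul_epilogue(a, b, c, alpha=1.0, beta=0.0, bias=None, activation=None):
    """Reference D = activation(alpha * A @ B + beta * C + bias).

    a is m x k, b is k x n, c is m x n (row-major nested lists).
    bias, if given, is assumed to have length m (one value per row of D).
    """
    m, k, n = len(a), len(b), len(b[0])
    d = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            val = alpha * acc + beta * c[i][j]
            if bias is not None:
                val += bias[i]
            if activation is not None:
                val = activation(val)
            row.append(val)
        d.append(row)
    return d


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    identity = [[1, 0], [0, 1]]
    zeros = [[0, 0], [0, 0]]
    print(matmul_epilogue(a, identity, zeros, bias=[-5, 1], activation=relu))
```

A reference like this can be handy for checking small cases by hand. The hipblaslt-test and hipblaslt-bench tools listed above remain the authoritative way to validate the library.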

Optimizations

  • Grid-based solution search algorithm for untuned sizes
  • Tuned 10k sizes for each problem type