Contraction f16, bf16, f32_f16, f32_bf16, f64_f32 #158

Merged: 12 commits, Dec 11, 2023

Commits on Dec 8, 2023

  1. Add support for f16 and bf16 to contraction

    - Support _Float16
    - Support hip_bfloat16
    - Add unit test of _Float16 and hip_bfloat16
    - Add sample of _Float16 and hip_bfloat16
    CongMa13 committed Dec 8, 2023
    c5fbcec
  2. Add support for f32_f16, f32_bf16, f64_f32 to contraction

    - Support ABCD data type f32 and compute type f16, bf16
    - Support ABCD data type f64 and compute type f32
    - Fixed bug: alpha and beta were passed with the wrong data type in the
    contraction unit test
    - Create sample template of contraction
    CongMa13 committed Dec 8, 2023
    185a2ab
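
To illustrate what a compute type separate from the ABCD data type means, here is a minimal CPU sketch. It is not hipTensor or CK code: `dot_with_compute_type` is a hypothetical helper, and it assumes that both the multiplication and the accumulation happen in the compute type, which this PR does not state.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of hipTensor or CK: a dot product whose
// storage type (DataT) differs from the type used for arithmetic (ComputeT).
template <typename DataT, typename ComputeT>
DataT dot_with_compute_type(const std::vector<DataT>& a, const std::vector<DataT>& b)
{
    ComputeT acc = ComputeT(0);
    // Only the overlapping range of the two inputs is used.
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for(std::size_t i = 0; i < n; ++i)
    {
        // Operands are converted to the compute type before multiplying.
        acc += static_cast<ComputeT>(a[i]) * static_cast<ComputeT>(b[i]);
    }
    // The accumulated value is converted back to the storage type.
    return static_cast<DataT>(acc);
}
```

With `DataT = double` and `ComputeT = float` this mirrors the f64_f32 combination: inputs and output stay in double, while the arithmetic runs in float and is therefore less precise than a pure double computation.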
  3. Add placeholder for solution unique_id

    The solution unique_ids for Actor Critic are not ready yet, so we put
    placeholders in the new Actor Critic to allow the unit tests to pass.
    CongMa13 committed Dec 8, 2023
    ab8d557
  4. Update contraction device instances

    Update contraction device instances since CK has updated them.
    CongMa13 committed Dec 8, 2023
    df27e32
  5. Print C in sample output

    1. Initialize the data with 0.01, 0.02, ... by default
    2. Print C
    CongMa13 committed Dec 8, 2023
    f85df83
  6. Set CK contraction instance to run only once

    When the logger level is set to HIPTENSOR_LOG_LEVEL_PERF_TRACE, we make CK
    instances measure the running time. The problem is that CK internally
    runs the contraction 10 times by default. This leads to an issue:
    
    1. It returns the wrong result for C = alpha A x B + beta C, because C is
    both read and written, so each repeated run starts from the C produced
    by the previous run.
    
    Setting StreamConfig.nrepeat_ = 1 makes the contraction run once.
    CongMa13 committed Dec 8, 2023
    5c45a8c
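
The effect of repeating an in-place update can be reproduced on the CPU. The sketch below is not CK code: `bilinear_in_place` and `run_repeated` are hypothetical stand-ins, `ab` is assumed to already hold the contracted product of A and B, and the output buffer is assumed to be the same buffer as C, as the formula in the commit message suggests.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one kernel launch: C = alpha * AB + beta * C.
// `ab` must have at least as many elements as `c`.
inline void bilinear_in_place(std::vector<float>& c,
                              const std::vector<float>& ab,
                              float alpha,
                              float beta)
{
    for(std::size_t i = 0; i < c.size(); ++i)
    {
        c[i] = alpha * ab[i] + beta * c[i];
    }
}

// Apply the in-place update `nrepeat` times, as a timing loop would,
// and return the final contents of C.
inline std::vector<float> run_repeated(std::vector<float> c,
                                       const std::vector<float>& ab,
                                       float alpha,
                                       float beta,
                                       int nrepeat)
{
    for(int r = 0; r < nrepeat; ++r)
    {
        bilinear_in_place(c, ab, alpha, beta);
    }
    return c;
}
```

With C = 1, AB = 2 and alpha = beta = 1, one run gives 3, while ten runs give 21, since every extra run adds alpha * AB to the already updated C.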
  7. Fixed a bug in CPU reference

    1. ck::bhalf_t cannot be converted to float or double with static_cast.
    Use ck::type_convert() instead.
    
    2. epsilon() is not a good threshold for measuring the relative
    difference of data. It is too small for double (eps < 10e-13).
    CongMa13 committed Dec 8, 2023
    f631818
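
As background for the first point: a bfloat16 value is the upper 16 bits of an IEEE-754 binary32 float. Assuming the bf16 value is held as a raw 16-bit pattern, a plain static_cast reads those bits as an integer instead of widening them. The sketch below shows the bit-level widening with a hypothetical helper, `bf16_bits_to_float`. It is not the implementation of ck::type_convert().

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not ck::type_convert): widen a raw bfloat16 bit
// pattern to float by placing it in the upper half of a 32-bit word.
inline float bf16_bits_to_float(std::uint16_t bits)
{
    const std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
    float result;
    // memcpy reinterprets the bits without violating aliasing rules.
    std::memcpy(&result, &widened, sizeof(result));
    return result;
}
```

For the pattern 0x3F80 this yields 1.0f, whereas `static_cast<float>(std::uint16_t{0x3F80})` yields 16256.0f.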
  8. Add comments

    CongMa13 committed Dec 8, 2023
    e5cefe7
  9. Rename contraction sample files

    The naming pattern for contraction sample files is
    
    - bilinear: simple_bilinear_contraction_<A>_<B>_<C>_<D>_compute_<compute>.cpp
    - scale   : simple_scale_contraction_<A>_<B>_<C>_compute_<compute>.cpp
    CongMa13 committed Dec 8, 2023
    4345a1c
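
The naming pattern can be expressed as two small functions. These are hypothetical helpers for illustration only, not part of the repository's build scripts, and the type tags passed in (such as `f32` or `f16`) are assumed to be the short names used in the PR title.

```cpp
#include <string>

// Hypothetical helper: file name for a bilinear contraction sample,
// following simple_bilinear_contraction_<A>_<B>_<C>_<D>_compute_<compute>.cpp
inline std::string bilinear_sample_name(const std::string& a,
                                        const std::string& b,
                                        const std::string& c,
                                        const std::string& d,
                                        const std::string& compute)
{
    return "simple_bilinear_contraction_" + a + "_" + b + "_" + c + "_" + d
           + "_compute_" + compute + ".cpp";
}

// Hypothetical helper: file name for a scale contraction sample,
// following simple_scale_contraction_<A>_<B>_<C>_compute_<compute>.cpp
inline std::string scale_sample_name(const std::string& a,
                                     const std::string& b,
                                     const std::string& c,
                                     const std::string& compute)
{
    return "simple_scale_contraction_" + a + "_" + b + "_" + c
           + "_compute_" + compute + ".cpp";
}
```

The scale variant has no `<D>` field in the pattern, so it takes one type tag fewer.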
  10. Improve CPU reference accuracy

    The relative difference between contraction result and CPU reference is
    less than 0.1% after the improvement.
    CongMa13 committed Dec 8, 2023
    43f33ee
  11. fec9065
  12. Update CPU reference

    1. Revert the default threshold of relative difference to (100 * std::numeric_limits<T>::epsilon())
    2. Update the CPU reference so that the difference between the CPU reference and the output of the contraction instance
    is less than (100 * std::numeric_limits<T>::epsilon()).
    CongMa13 committed Dec 8, 2023
    b21fe0b
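
A relative-difference check with this default threshold might be sketched as follows. The function name and the exact formula (difference divided by the larger magnitude) are assumptions. The comparison used in the hipTensor tests is not shown in this PR.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical check: is `result` within a relative tolerance of `reference`?
// The default threshold is 100 * epsilon of the element type.
template <typename T>
bool within_relative_tolerance(T result,
                               T reference,
                               T threshold = T(100) * std::numeric_limits<T>::epsilon())
{
    const T diff  = std::fabs(result - reference);
    const T scale = std::max(std::fabs(result), std::fabs(reference));
    if(scale == T(0))
    {
        return true; // both values are exactly zero
    }
    return diff / scale <= threshold;
}
```

Scaling epsilon by the element type keeps the check meaningful across precisions: for double the threshold is on the order of 1e-14, for float on the order of 1e-5.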