
support AVX2 for run_container_to_uint32_array #642

Merged
merged 2 commits into from
Jul 22, 2024

Commits on Jul 20, 2024

  1. support AVX2 for run_container_to_uint32_array

    1. support AVX2 for run_container_to_uint32_array
    2. add dense range for run container
    
    Baseline:
    ```
    
     number of values in container = 256
    run_container_to_uint32_array(out, Bt, 1234):  3.64 cycles per operation
    
     number of values in container = 2018
    run_container_to_uint32_array(out, Bt, 1234):  3.07 cycles per operation
    
     number of values in container = 14498
    run_container_to_uint32_array(out, Bt, 1234):  3.47 cycles per operation
    
     number of values in container = 7826
    run_container_to_uint32_array(out, Bt, 1234):  0.18 cycles per operation
    
     number of values in container = 8152
    run_container_to_uint32_array(out, Bt, 1234):  0.18 cycles per operation
    
     number of values in container = 8189
    run_container_to_uint32_array(out, Bt, 1234):  0.18 cycles per operation
    
     number of values in container = 8191
    run_container_to_uint32_array(out, Bt, 1234):  0.18 cycles per operation
    
    ```
    
    AVX2 version:
    ```
    
     number of values in container = 256
    run_container_to_uint32_array(out, Bt, 1234):  4.38 cycles per operation
    
     number of values in container = 2018
    run_container_to_uint32_array(out, Bt, 1234):  3.77 cycles per operation
    
     number of values in container = 14498
    run_container_to_uint32_array(out, Bt, 1234):  4.19 cycles per operation
    
     number of values in container = 7826
    run_container_to_uint32_array(out, Bt, 1234):  0.10 cycles per operation
    
     number of values in container = 8152
    run_container_to_uint32_array(out, Bt, 1234):  0.10 cycles per operation
    
     number of values in container = 8189
    run_container_to_uint32_array(out, Bt, 1234):  0.10 cycles per operation
    
     number of values in container = 8191
    run_container_to_uint32_array(out, Bt, 1234):  0.10 cycles per operation
    
    ```
    
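    The SIMD version works well in the dense case (0.18 → 0.10 cycles per operation). However, when each run is short, every run pays the overhead of an additional `if` branch, and the SIMD version ends up slower than the baseline (for example, 3.64 → 4.38 cycles per operation with 256 values).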
    stdpain committed Jul 20, 2024
    e2e0cd0
  2. 6ca047b