
support AVX2 for run_container_to_uint32_array #642

Merged (2 commits) on Jul 22, 2024

@stdpain (Contributor) commented on Jul 20, 2024

Related to #454.

  1. Support AVX2 in run_container_to_uint32_array.
  2. Add a dense range case for run containers.
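For context, here is a minimal scalar sketch of what expanding a run container into a `uint32_t` array involves. The type `run16_t` and the function `runs_to_uint32_scalar` are hypothetical stand-ins, not CRoaring's actual types or API. The sketch assumes each run stores a start `value` and a `length` equal to the number of values that follow the start, and that `base` is simply added to every 16-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run type: the run covers value .. value + length (inclusive). */
typedef struct {
    uint16_t value;
    uint16_t length;
} run16_t;

/* Scalar reference: expand every run into consecutive 32-bit integers,
 * adding `base` to each value. Returns the number of integers written.
 * The caller must size `out` to hold every value in every run. */
static size_t runs_to_uint32_scalar(uint32_t *out, const run16_t *runs,
                                    size_t n_runs, uint32_t base) {
    assert(runs != NULL || n_runs == 0);
    size_t k = 0;
    for (size_t i = 0; i < n_runs; i++) {
        uint32_t start = base + runs[i].value;
        uint32_t count = (uint32_t)runs[i].length + 1; /* values in this run */
        for (uint32_t j = 0; j < count; j++) {
            out[k++] = start + j;
        }
    }
    return k;
}
```

The inner loop writes one value per iteration, which is the part a SIMD version can speed up when runs are long.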

Benchmark results (cycles per operation):

Sparse:

| Values in container | Baseline | AVX2 |
|---:|---:|---:|
| 256 | 3.64 | 3.68 |
| 2018 | 3.07 | 3.07 |
| 14498 | 3.47 | 3.57 |

Dense:

| Values in container | Baseline | AVX2 |
|---:|---:|---:|
| 7826 | 0.18 | 0.10 |
| 8152 | 0.18 | 0.10 |
| 8189 | 0.18 | 0.10 |
| 8191 | 0.18 | 0.10 |

Baseline:
```

 number of values in container = 256
run_container_to_uint32_array(out, Bt, 1234):  3.64 cycles per operation

 number of values in container = 2018
run_container_to_uint32_array(out, Bt, 1234):  3.07 cycles per operation

 number of values in container = 14498
run_container_to_uint32_array(out, Bt, 1234):  3.47 cycles per operation

 number of values in container = 7826
run_container_to_uint32_array(out, Bt, 1234):  0.18 cycles per operation

 number of values in container = 8152
run_container_to_uint32_array(out, Bt, 1234):  0.18 cycles per operation

 number of values in container = 8189
run_container_to_uint32_array(out, Bt, 1234):  0.18 cycles per operation

 number of values in container = 8191
run_container_to_uint32_array(out, Bt, 1234):  0.18 cycles per operation

```

AVX2 version:
```

 number of values in container = 256
run_container_to_uint32_array(out, Bt, 1234):  4.38 cycles per operation

 number of values in container = 2018
run_container_to_uint32_array(out, Bt, 1234):  3.77 cycles per operation

 number of values in container = 14498
run_container_to_uint32_array(out, Bt, 1234):  4.19 cycles per operation

 number of values in container = 7826
run_container_to_uint32_array(out, Bt, 1234):  0.10 cycles per operation

 number of values in container = 8152
run_container_to_uint32_array(out, Bt, 1234):  0.10 cycles per operation

 number of values in container = 8189
run_container_to_uint32_array(out, Bt, 1234):  0.10 cycles per operation

 number of values in container = 8191
run_container_to_uint32_array(out, Bt, 1234):  0.10 cycles per operation

```

The SIMD version works well in the dense case. However, when each run is short, every call pays the overhead of an additional `if` check without benefiting from the vector path.

@stdpain force-pushed the support_avx2_for_run_to_u32array branch from 895fb54 to e2e0cd0 (July 20, 2024 09:09).

A review comment on src/containers/run.c (now outdated) was resolved.

@stdpain force-pushed the support_avx2_for_run_to_u32array branch from 4ec1eaf to ddb2b1a (July 20, 2024 14:41).

@stdpain force-pushed the support_avx2_for_run_to_u32array branch from ddb2b1a to 6ca047b (July 20, 2024 14:51).
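The trade-off described above can be pictured with the following sketch. It is not the code merged in this PR; `run16_t` and `runs_to_uint32_avx2` are hypothetical names, and it assumes AVX2 is enabled at compile time (`__AVX2__`), falling back to scalar code otherwise. Long runs are written eight 32-bit values per `_mm256_storeu_si256`, while the per-run `count >= 8` check is the kind of extra branch that short runs pay for without using SIMD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical run type: the run covers value .. value + length (inclusive). */
typedef struct {
    uint16_t value;
    uint16_t length;
} run16_t;

/* Sketch: expand runs into 32-bit integers, storing 8 values at a time
 * with AVX2 for long runs and using a scalar loop for short runs and tails. */
static size_t runs_to_uint32_avx2(uint32_t *out, const run16_t *runs,
                                  size_t n_runs, uint32_t base) {
    assert(runs != NULL || n_runs == 0);
    size_t k = 0;
#if defined(__AVX2__)
    const __m256i lane_offsets = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    const __m256i step = _mm256_set1_epi32(8);
#endif
    for (size_t i = 0; i < n_runs; i++) {
        uint32_t start = base + runs[i].value;
        uint32_t count = (uint32_t)runs[i].length + 1;
        uint32_t j = 0;
#if defined(__AVX2__)
        if (count >= 8) { /* extra per-run branch: only long runs use SIMD */
            __m256i v = _mm256_add_epi32(_mm256_set1_epi32((int32_t)start),
                                         lane_offsets);
            for (; j + 8 <= count; j += 8) {
                _mm256_storeu_si256((__m256i *)(out + k + j), v);
                v = _mm256_add_epi32(v, step); /* next 8 consecutive values */
            }
        }
#endif
        for (; j < count; j++) { /* scalar tail, or the whole short run */
            out[k + j] = start + j;
        }
        k += count;
    }
    return k;
}
```

With GCC or Clang, compiling with `-mavx2` enables the vector path; without it, the function reduces to the scalar loop.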
@lemire (Member) commented on Jul 20, 2024

@stdpain Can you update your benchmark results?

@stdpain (Contributor, Author) commented on Jul 21, 2024

Results updated.

@lemire (Member) commented on Jul 22, 2024

I manually verified that it appears helpful. Merging. This will be part of the next release.

@lemire merged commit e326af3 into RoaringBitmap:master on Jul 22, 2024.
21 checks passed