-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to CRoaring 2.0.2 #119
Conversation
02946de
to
82fca7e
Compare
If you think that there is a bug in CRoaring, the best first step would be to push the issue upstream with a reproducible test case. We can also add more fuzzing to CRoaring.
I am not sure what the question is. Can you specify which bitmap and which rules? |
@lemire, I'm not sure if it would be considered a bug or not. The issue is that a deserialized bitmap may not be "internally valid". In the specific case I found (C example code below), it appears the deserialized bitmap has a bitset container which says it has a different cardinality than the number of bits actually set in the bits. I'm asking: How bad is that. What can we expect from operating on a bitmap which isn't "internally valid"? I'm worried about e.g. we could probably get a negative cardinality since cardinality is stored as an i32, and what that could do, etc. Will it just return possibly nonsense, or could it invoke full undefined behavior? Example CRoaring Code#include <roaring/roaring.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "test.h"
const char bad_portable_bin[8223] = {
59, 48, 2, 0, 1, 0, 0, 255, 239, 2, 0, 1, 0, 8, 0, 255, 127, 2, 0, 0, 0, 255, 143, 0, 160, 255, 95, 0, 0, 5, 0,
85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85,
};
int main() {
roaring_bitmap_t *bitmap = roaring_bitmap_portable_deserialize_safe(bad_portable_bin, sizeof(bad_portable_bin));
assert(bitmap != NULL);
const char *reason = NULL;
int result = EXIT_FAILURE;
if (!roaring_bitmap_internal_validate(bitmap, &reason)) {
printf("safely deserialized invalid bitmap: %s\n", reason);
goto finish;
}
result = EXIT_SUCCESS;
finish:
roaring_bitmap_free(bitmap);
return result;
} This will output "safely deserialized invalid bitmap: cardinality is incorrect", as assigned here: And finally, is just running |
Possibly quite bad. As in... it could lead to hard crashes. I don't have an example in mind, but I would not assume that it is safe.
That's certainly possible because we serialize both independently. |
@saulius, I'm recommending we add a call to validate the bitmap after deserializing, and add an unsafe method to allow deserializing without that validation for performance. Maybe yanking 2.0.0 once we release a fix? Also, I'm not super sure how to fix the buildtime_bindgen builds. It seems related to #110, although it's a different error. It does bindgen just fine on my machine (an M1 mac) so it's a bit hard for me to debug. @lemire it's also a bit unexpected that CRoaring (as a library) writes directly to stderr, running the fuzzing, I get a lot of output e.g. I think I checked for all the assumptions I think CRoaring might make in |
It shouldn't. I will write a PR. |
per [github thread][1], deserialized bitmaps aren't safe to use unless validated. For maximum performance, also provide an unsafe deserialize function which does not do validation, `try_deserialize_unchecked` [1]: RoaringBitmap#119 (comment)
82fca7e
to
e08e066
Compare
1a7e7de
to
8c9afae
Compare
per [github thread][1], deserialized bitmaps aren't safe to use unless validated. For maximum performance, also provide an unsafe deserialize function which does not do validation, `try_deserialize_unchecked` [1]: RoaringBitmap#119 (comment)
8c9afae
to
e024d27
Compare
aba766f
to
9c6414b
Compare
9c6414b
to
8256f71
Compare
I'm not sure how to get it working, and we don't need it. I've verfied that all the current bindings are totally independant of the environment: croaring doesn't use any #ifdefs to change any types defined in the APIs. All types in the API are the same for all architectures (that's not to say the size of the types are the same, for instance, size_t is used, but that is translated to the single type usize. Disable bindgen layout tests. These are archetecture specific (and really are just testing bindgen's own implementation.
Despite the major version bump, it seems nothing actually major has changed
Call internal validation function after each step in fuzzing
per [github thread][1], deserialized bitmaps aren't safe to use unless validated. For maximum performance, also provide an unsafe deserialize function which does not do validation, `try_deserialize_unchecked` [1]: RoaringBitmap#119 (comment)
8256f71
to
d149518
Compare
@saulius, I'm planning to merge this in the next few days, especially because of the memory unsafely possibility in deserialization currently, unless you have any issues. |
Thanks for fixing this @Dr-Emann.
Do you mean yank |
D'oh! Yes, that's what I meant. Got confused with 1.0 of croaring-rs and the recent bump to 2.0 of CRoaring. |
per [github thread][1], deserialized bitmaps aren't safe to use unless validated. For maximum performance, also provide an unsafe deserialize function which does not do validation, `try_deserialize_unchecked` [1]: #119 (comment)
See CRoaring release 2.0.0 and CRoaring release 2.0.1. I don't believe there's any significant changes from our perspective, other than a new internal validation function.
Adds a binding for the new
roaring_bitmap_internal_validate
function for testing purposes.Add a fuzzer for deserialization, and include calls to
internal_validate
.Per below discussion, deserialization functions now call
internal_validate
before returning a bitmap, since an inconsistent bitmap can lead to adverse behavior possibly including UB. This also adds atry_deserialize_unchecked
unsafe function to allow fully unsafe deserialization when the caller knows the data a valid bitmap.This removes the buildtime_bindgen feature (the feature itself is still there, but it is now a no-op). I've verfied that all the current bindings are totally independant of the environment: croaring doesn't use any #ifdefs to change any types defined in the APIs. All types in the API are the same for all architectures (that's not to say the size of the types are the same, for instance, size_t is used, but that is translated to the single type usize).