Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to CRoaring 2.0.2 #119

Merged
merged 7 commits into from
Oct 4, 2023
Merged

Conversation

Dr-Emann
Copy link
Member

@Dr-Emann Dr-Emann commented Sep 26, 2023

See CRoaring release 2.0.0 and CRoaring release 2.0.1. I don't believe there's any significant changes from our perspective, other than a new internal validation function.

Adds a binding for the new roaring_bitmap_internal_validate function for testing purposes.

Add a fuzzer for deserialization, and include calls to internal_validate.

Per below discussion, deserialization functions now call internal_validate before returning a bitmap, since an inconsistent bitmap can lead to adverse behavior possibly including UB. This also adds a try_deserialize_unchecked unsafe function to allow fully unsafe deserialization when the caller knows the data a valid bitmap.

This removes the buildtime_bindgen feature (the feature itself is still there, but it is now a no-op). I've verfied that all the current bindings are totally independant of the environment: croaring doesn't use any #ifdefs to change any types defined in the APIs. All types in the API are the same for all architectures (that's not to say the size of the types are the same, for instance, size_t is used, but that is translated to the single type usize).

@Dr-Emann Dr-Emann force-pushed the internal_validate branch 2 times, most recently from 02946de to 82fca7e Compare September 26, 2023 03:15
@lemire
Copy link
Member

lemire commented Sep 26, 2023

@Dr-Emann

Unfortunately, this is finding some issues

If you think that there is a bug in CRoaring, the best first step would be to push the issue upstream with a reproducible test case. We can also add more fuzzing to CRoaring.

do you have any handle on how unsafe it might be to handle a bitmap which doesn't follow the rules like that

I am not sure what the question is. Can you specify which bitmap and which rules?

@Dr-Emann
Copy link
Member Author

Dr-Emann commented Sep 26, 2023

@lemire, I'm not sure if it would be considered a bug or not.

The issue is that a deserialized bitmap may not be "internally valid". In the specific case I found (C example code below), it appears the deserialized bitmap has a bitset container which says it has a different cardinality than the number of bits actually set in the bits.

I'm asking: How bad is that. What can we expect from operating on a bitmap which isn't "internally valid"? I'm worried about e.g. we could probably get a negative cardinality since cardinality is stored as an i32, and what that could do, etc. Will it just return possibly nonsense, or could it invoke full undefined behavior?

Example CRoaring Code
#include <roaring/roaring.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "test.h"

const char bad_portable_bin[8223] = {
        59, 48, 2, 0, 1, 0, 0, 255, 239, 2, 0, 1, 0, 8, 0, 255, 127, 2, 0, 0, 0, 255, 143, 0, 160, 255, 95, 0, 0, 5, 0,
        85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85,
};

int main() {
    roaring_bitmap_t *bitmap = roaring_bitmap_portable_deserialize_safe(bad_portable_bin, sizeof(bad_portable_bin));
    assert(bitmap != NULL);
    const char *reason = NULL;

    int result = EXIT_FAILURE;
    if (!roaring_bitmap_internal_validate(bitmap, &reason)) {
        printf("safely deserialized invalid bitmap: %s\n", reason);
        goto finish;
    }
    result = EXIT_SUCCESS;

finish:
    roaring_bitmap_free(bitmap);
    return result;
}

This will output "safely deserialized invalid bitmap: cardinality is incorrect", as assigned here:
https://github.com/RoaringBitmap/CRoaring/blob/a103d3811702b9389c538881c9974e9a7a7552af/src/containers/bitset.c#L1012

And finally, is just running roaring_bitmap_internal_validate afterwards enough to ensure a deserialized bitmap is safe to perform all operations on? We could add that to our deserialize method, and add an unsafe option for performance if they're sure the data really contains a validly serialized bitmap.

@lemire
Copy link
Member

lemire commented Sep 26, 2023

How bad is that.

Possibly quite bad. As in... it could lead to hard crashes.

I don't have an example in mind, but I would not assume that it is safe.

it appears the deserialized bitmap has a bitset container which says it has a different cardinality than the number of bits actually set in the bits.

That's certainly possible because we serialize both independently.

@Dr-Emann
Copy link
Member Author

@saulius, I'm recommending we add a call to validate the bitmap after deserializing, and add an unsafe method to allow deserializing without that validation for performance. Maybe yanking 2.0.0 once we release a fix?

Also, I'm not super sure how to fix the buildtime_bindgen builds. It seems related to #110, although it's a different error. It does bindgen just fine on my machine (an M1 mac) so it's a bit hard for me to debug.

@lemire it's also a bit unexpected that CRoaring (as a library) writes directly to stderr, running the fuzzing, I get a lot of output e.g. I failed to find one of the right cookies. Found 1073754170.

I think I checked for all the assumptions I think CRoaring might make in roaring_bitmap_internal_validate: that if it succeeds, the bitmap should behave sanely, but I'm of course worried I missed something.

@lemire
Copy link
Member

lemire commented Sep 26, 2023

it's also a bit unexpected that CRoaring (as a library) writes directly to stderr,

It shouldn't. I will write a PR.

Dr-Emann added a commit to Dr-Emann/croaring-rs that referenced this pull request Sep 26, 2023
per [github thread][1], deserialized bitmaps aren't safe to use unless
validated.

For maximum performance, also provide an unsafe deserialize function which does
not do validation, `try_deserialize_unchecked`

[1]: RoaringBitmap#119 (comment)
@Dr-Emann Dr-Emann changed the title Update to CRoaring 2.0.0 Update to CRoaring 2.0.1 Sep 26, 2023
@Dr-Emann Dr-Emann force-pushed the internal_validate branch 2 times, most recently from 1a7e7de to 8c9afae Compare September 27, 2023 10:43
Dr-Emann added a commit to Dr-Emann/croaring-rs that referenced this pull request Sep 29, 2023
per [github thread][1], deserialized bitmaps aren't safe to use unless
validated.

For maximum performance, also provide an unsafe deserialize function which does
not do validation, `try_deserialize_unchecked`

[1]: RoaringBitmap#119 (comment)
@Dr-Emann Dr-Emann changed the title Update to CRoaring 2.0.1 Update to CRoaring 2.0.2 Sep 29, 2023
@Dr-Emann Dr-Emann force-pushed the internal_validate branch 13 times, most recently from aba766f to 9c6414b Compare September 30, 2023 03:03
I'm not sure how to get it working, and we don't need it.

I've verfied that all the current bindings are totally independant of the
environment: croaring doesn't use any #ifdefs to change any types defined in
the APIs. All types in the API are the same for all architectures (that's not
to say the size of the types are the same, for instance, size_t is used, but
that is translated to the single type usize.

Disable bindgen layout tests. These are archetecture specific (and really are
just testing bindgen's own implementation.
Despite the major version bump, it seems nothing actually major has changed
Call internal validation function after each step in fuzzing
per [github thread][1], deserialized bitmaps aren't safe to use unless
validated.

For maximum performance, also provide an unsafe deserialize function which does
not do validation, `try_deserialize_unchecked`

[1]: RoaringBitmap#119 (comment)
@Dr-Emann
Copy link
Member Author

Dr-Emann commented Oct 1, 2023

@saulius, I'm planning to merge this in the next few days, especially because of the memory unsafely possibility in deserialization currently, unless you have any issues.

@saulius
Copy link
Member

saulius commented Oct 4, 2023

Thanks for fixing this @Dr-Emann.

Maybe yanking 2.0.0 once we release a fix?

Do you mean yank 1.0.0 of croaring-rs? I think this can be done.

@Dr-Emann
Copy link
Member Author

Dr-Emann commented Oct 4, 2023

Maybe yanking 2.0.0 once we release a fix?

Do you mean yank 1.0.0 of croaring-rs? I think this can be done.

D'oh! Yes, that's what I meant. Got confused with 1.0 of croaring-rs and the recent bump to 2.0 of CRoaring.

@Dr-Emann Dr-Emann merged commit 3d0fcf6 into RoaringBitmap:master Oct 4, 2023
6 checks passed
Dr-Emann added a commit that referenced this pull request Oct 4, 2023
per [github thread][1], deserialized bitmaps aren't safe to use unless
validated.

For maximum performance, also provide an unsafe deserialize function which does
not do validation, `try_deserialize_unchecked`

[1]: #119 (comment)
@Dr-Emann Dr-Emann deleted the internal_validate branch October 4, 2023 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants