Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only allow some specific Master elements to be set as infinite #176

Merged
merged 12 commits into from
Dec 23, 2023

Conversation

robUx4
Copy link
Contributor

@robUx4 robUx4 commented Dec 19, 2023

In Matroska only the Segment and Cluster are allowed to do so.

Doing so on elements that are not allowed to will just not do anything and assert in debug builds.

Fixes #167

@robUx4 robUx4 added api-break breaks the API (e.g. programs using it will have to adjust their source code) abi-break breaks the ABI (e.g. programs linked against the library have to be recompiled) labels Dec 19, 2023
@robUx4
Copy link
Contributor Author

robUx4 commented Dec 19, 2023

Good thing I'm writing some test, the callable API is not public. And the public one does nothing :(

@robUx4 robUx4 force-pushed the infinite branch 2 times, most recently from 0d946dd to f4df00e Compare December 19, 2023 16:43
Copy link
Contributor

@mbunkus mbunkus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the variable & function names I'm fine with this type of change. Still have to give it a try to see if anything breaks in MKVToolNix, though.

ebml/EbmlElement.h Outdated Show resolved Hide resolved
@robUx4 robUx4 force-pushed the infinite branch 2 times, most recently from 2cc2699 to 348c03c Compare December 20, 2023 07:17
So we have access to the semantic constraints in the constructor.
Only master elements can and only very few.
Because it's a binary element, not a Master element.
Setting whatever on an EbmlMaster returns true.
Setting true on other classes returns false.
Setting false on other classes returns true.
Fixes an issue in EbmlElement::FindNextID() where an EbmlMaster was set to
infinite/unknown even though it's not allowed for that element.

EbmlElement::FindNextElement() would also allow an infinite/unknown size
for any master element even though it's not allowed.

That should improve error recovery by skipping more invalid sizes given the reading context.
We can simplify use the semantic of each element directly.

And the assert will work all any element, not just the master ones.
@robUx4
Copy link
Contributor Author

robUx4 commented Dec 20, 2023

I added some more commits to return when the setting worked. So that the logic to find elements can only an unknown size on elements that can have one (Segment and Cluster). And in the end we can just use a method without a virtual version calling another method of EbmlElement.

Copy link
Contributor

@mbunkus mbunkus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given this branch (together with the corresponding one in libEBML) a try. Unfortunately it mkvmerge now fails to read certain files that were created by fuzzers. The program aborts in EbmlMaster::Read()EbmlElement::SetSizeInfinite with the following assertion:

lib/libebml/ebml/EbmlElement.h:334: bool libebml::EbmlElement::SetSizeInfinite(bool): Assertion `!bIsInfinite || ClassInfo.CanHaveInfiniteSize()' failed.

I really don't like the library aborting when it encounters invalid data or data that goes against the specs.

Here are two sample files. Build MKVToolNix with the two infinite branches, then try to remux the files (a simple mkvmerge -o out.mkv 1089-2.mkv suffices to trigger the assertion).

I'm OK with the writing side preventing me from creating spec-violating files (within reason), but the reading side should never, ever just abort the whole program when it encounters invalid data. Leave it up to the application to handle, or skip such elements, or or or…

@robUx4
Copy link
Contributor Author

robUx4 commented Dec 22, 2023

I'm OK with the writing side preventing me from creating spec-violating files (within reason), but the reading side should never, ever just abort the whole program when it encounters invalid data. Leave it up to the application to handle, or skip such elements, or or or…

There's no way to tell we are reading or writing where the assert is. Now now the setter whether it was successful or not, we can check that on the caller side. I'll see what I can do.

Also it's only aborting in debug builds. It won't abort for regular users, the error is handled properly afterwards.

As for these files, your tests will probably not produce the same result anymore since they are using an infinite size where it's not allowed. So these elements that have been read regardless before will be skipped until a valid element (at that level) is found.

It is called in FindNextId/FindNextElement which always checks for failure.
So don't need the assert there.

It's also called from the EbmlMaster constructor.
The copy constructor is fine since the the original can only have a valid state.
The other constructor has a default value of finite size and the macros never set this to infinite.
In libmatroska the KaxSegment constructor calls SetSizeInfinite() rather than using the default value.
But we don't need an assert there, since we know it's allowed.
@robUx4
Copy link
Contributor Author

robUx4 commented Dec 22, 2023

I dropped the assert, it seems to be unnecessary (don't bring any code safety/checks) in all cases.

@robUx4
Copy link
Contributor Author

robUx4 commented Dec 22, 2023

We could use C++17's [[nodiscard]] to make sure that the caller uses the API responsibly. Not checking the error (when setting as infinite) is usually wrong.

@mbunkus
Copy link
Contributor

mbunkus commented Dec 22, 2023

Also it's only aborting in debug builds. It won't abort for regular users, the error is handled properly afterwards.

True, but then again it would destroy my ability to test those files regularly as I only develop using debug mode.

As for these files, your tests will probably not produce the same result anymore since they are using an infinite size where it's not allowed. So these elements that have been read regardless before will be skipped until a valid element (at that level) is found.

That's perfectly fine. I don't mind if the way these elements are handled changes as long as the new way is sensible. Skipping such elements would be fine by me; I'd just update the expected test checksum.

I dropped the assert, it seems to be unnecessary (don't bring any code safety/checks) in all cases.

Thanks!

We could use C++17's [[nodiscard]] to make sure that the caller uses the API responsibly. Not checking the error (when setting as infinite) is usually wrong.

That'd be completely fine by me. We'd have to bump the requirements to C++17, though; it's at C++14 at the moment. Wrt. MKVToolNix this would not be an issue as MKVToolNix requires C++17 already, but I don't know about the other users of the libraries. What does VLC require?

@robUx4
Copy link
Contributor Author

robUx4 commented Dec 22, 2023

That'd be completely fine by me. We'd have to bump the requirements to C++17, though; it's at C++14 at the moment. Wrt. MKVToolNix this would not be an issue as MKVToolNix requires C++17 already, but I don't know about the other users of the libraries. What does VLC require?

Currently VLC uses C++14. But it seems when setting C++14 it works (at least in clang). We may also detect if the compiler supports it in CMake and not use it when it doesn't.

Anyway that's for another PR. Can I merge this one ?

@mbunkus
Copy link
Contributor

mbunkus commented Dec 22, 2023

My test cases work or have different checksums — but they don't abort anymore. Thanks. Please merge.

@neheb
Copy link
Contributor

neheb commented Dec 23, 2023

We could use C++17's [[nodiscard]] to make sure that the caller uses the API responsibly. Not checking the error (when setting as infinite) is usually wrong.

unknown attributes are permitted. which means it's not necessary to bump the C++ version just for this.

That being said, modern gcc, clang, and MSVC default to C++14. MSVC doesn't even have a switch for C++11.

@robUx4 robUx4 merged commit d0a39b4 into Matroska-Org:master Dec 23, 2023
20 checks passed
@robUx4 robUx4 deleted the infinite branch December 23, 2023 06:02
robUx4 added a commit to Matroska-Org/foundation-source that referenced this pull request Jan 13, 2024
It's not allowed by the specs.

Since Matroska-Org/libebml#176 it's not possible to have a Binary element
set as infinite.
robUx4 added a commit to robUx4/libmatroska that referenced this pull request Jan 13, 2024
It's not allowed by the specs.

Since Matroska-Org/libebml#176 it's not possible to have a Binary element
set as infinite.
robUx4 added a commit to Matroska-Org/foundation-source that referenced this pull request Jan 14, 2024
It's not allowed by the specs.

Since Matroska-Org/libebml#176 it's not possible to have a Binary element
set as infinite.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
abi-break breaks the ABI (e.g. programs linked against the library have to be recompiled) api-break breaks the API (e.g. programs using it will have to adjust their source code)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keep the information about elements that can be infinite
3 participants