Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Json/Validator: add findNamedElements() to detect duplicate "bom-ref" definitions #240

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jimklimov
Copy link
Contributor

@jimklimov jimklimov commented Jul 21, 2023

As discussed on Slack, I found that for JSON SBOMs the cyclonedx-cli validate does not complain about resulting document from cyclonedx-cli merge (which has lots of duplicate "bom-ref" entities). Conversion of that document to XML and validating the result does rightfully fail.

The PR below should hopefully fix this, but please note these are my first steps in C# and I have no idea how to force my cyclonedx-cli utility to actually build against the custom dll/nupkg created from this codebase. I guess I won't know until this is merged and released :(

UPDATE: Now tested, see below.

The code intentionally introduces a method to recursively findNamedElements(JsonElement element, string name) so that the same approach can be easily used if more schema elements would be standardized to have unique values across the whole document, and to start iteration from any point in the JSON structure (not necessarily from document root). AFAIK as of the spec version 1.4 (looking at XML XSD xs:unique constraint) there is just one bom-ref so far.

… definitions

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
@sonatype-lift
Copy link

sonatype-lift bot commented Jul 21, 2023

Sonatype Lift is retiring

Sonatype Lift will be retiring on Sep 12, 2023, with its analysis stopping on Aug 12, 2023. We understand that this news may come as a disappointment, and Sonatype is committed to helping you transition off it seamlessly. If you’d like to retain your data, please export your issues from the web console.
We are extremely grateful and thank you for your support over the years.

📖 Read about the impacts and timeline

@jimklimov
Copy link
Contributor Author

jimklimov commented Jul 21, 2023

Now tested and works (complains) as expected! :)
It was a roundabout way to get the DLLs used by the tool, but it sufficed for local testing on a freshly merged SBOM:

:; cyclonedx.exe validate --input-file test-merge.json --input-format json --input-version v1_4 --fail-on-errors
Validating JSON BOM...
'bom-ref' value of myProduct/standard@git-2023.07.18.05.46.25: expected 1 mention, actual 2
...
'bom-ref' value of pkg:maven/com.google.code.findbugs/jsr305@3.0.2?type=jar: expected 1 mention, actual 3
'bom-ref' value of pkg:maven/io.cucumber/cucumber-java8@7.11.0?type=jar: expected 1 mention, actual 2
'bom-ref' value of pkg:maven/io.cucumber/cucumber-junit-platform-engine@7.11.0?type=jar: expected 1 mention, actual 2
'bom-ref' value of pkg:maven/org.junit.platform/junit-platform-suite@1.9.2?type=jar: expected 1 mention, actual 2
'bom-ref' value of pkg:maven/org.springframework.boot/spring-boot-starter-test@3.0.2?type=jar: expected 1 mention, actual 2
'bom-ref' value of pkg:maven/jakarta.xml.bind/jakarta.xml.bind-api@4.0.0?type=jar: expected 1 mention, actual 2
...
BOM is not valid.

:; echo $?
1

Works reasonably fast (5s for an SBOM with ~1K components), despite the probably inefficient way to handle concatenation in the map of lists (using a single instance instead of passing it around could be faster -- but worse for reusability, parallel exec, etc, and cumbersome for the static-method nature of this class).

That said, if there is a way in C# to pass dictionaries by reference, so the whole recursion would manipulate one (caller-provided?) instance of the map, follow-up to optimize this is welcome :)

The first hit (for myProduct) is due to mentioning its bom-ref in both metadata/component and in the components[] list when I hand-crafted one of the original SBOM documents for the merge -- kept here to prove that uniqueness is indeed tracked across the document and not just for elements in the components[] array.

…tring representation of the JsonElement, not object (which is in fact unique for each node)

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ubleshooting

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
@andreas-hilti
Copy link
Contributor

andreas-hilti commented Aug 5, 2023

It would be nice to be able to check the bom-ref uniqueness, thus I appreciate your PR.

If we could in addition check whether the bom-refs that are used in the SBOM, for instance in the dependencies section, actually exist, this would be the icing on the cake.

That said, if there is a way in C# to pass dictionaries by reference, so the whole recursion would manipulate one (caller-provided?) instance of the map, follow-up to optimize this is welcome :)

Specify "ref", both on the function definition as well as when calling it, see for instance
https://www.tutlane.com/tutorial/csharp/csharp-pass-by-reference-ref-with-examples

@jimklimov
Copy link
Contributor Author

Interesting point about the further checks, but so far I'm too entangled in making the merge work (add a level of spec-based awareness about squashing together equivalent objects with different contents, like different known aspects of same bom-ref after a merge of many docs) - hoping to finish and PR that soon-ish...

@jimklimov
Copy link
Contributor Author

Finally got around to try ref. While it works functionally, I could not measure any performance difference (2.845 sec for my test file in both cases) so would leave this be :)

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
@jimklimov
Copy link
Contributor Author

And thanks for the suggestion, added a simple check that for each ref in the document, its target bom-ref is present. Chopped away one of components[] from my test JSON and it works:

Validating JSON BOM...
'ref' value of pkg:maven/io.cucumber/tag-expressions@5.0.1?type=jar was used in 1 place(s); expected a 'bom-ref' defined for it, but there was none
BOM is not valid.

@jimklimov
Copy link
Contributor Author

There's also a dependsOn[] (e.g. in the dependency[] for each ref) that seemed like fair game for such checks, but I thought against constraining with it. On one hand, can't quickly figure out a way how; on another - this is roughly what the merge operation fixes where needed: adds more info into the tree, dangling it at reasonable points.

@jimklimov
Copy link
Contributor Author

For now I declare the PR done on my side ;)

Copy link
Member

@coderpatros coderpatros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, just one issue with it to resolve first.

src/CycloneDX.Core/Json/Validator.cs Outdated Show resolved Hide resolved
@andreas-hilti
Copy link
Contributor

AFAIK as of the spec version 1.4 (looking at XML XSD xs:unique constraint) there is just one bom-ref so far.

@jimklimov Just FYI: in version 1.5 much more objects can have a bom-ref.

@jimklimov
Copy link
Contributor Author

jimklimov commented Aug 16, 2023

What I meant was that as of recently, only bom-ref (AFAIK) was a value unique across many locations of one document's scope. Others were not constrained or unique in a list (e.g. a list of hashes of some one entity does not have two MD5 values, but other entities may have even same hashes), etc.

@jkowalleck
Copy link
Member

jkowalleck commented Aug 20, 2023

just a side note, caused by https://cyclonedx.slack.com/archives/CVA0G10FN/p1692519065532109

CDX 1.5 made it pretty clear where a ref MUST have a counter-part in a bom-ref in the same document,
and where a ref could also be a BOM-link, and if so, which one.

for XML, see the declarations and the usages.
see https://github.com/CycloneDX/specification/blob/1.5/schema/bom-1.5.xsd#L38-L55
see https://github.com/CycloneDX/specification/blob/1.5/schema/bom-1.5.xsd#L57-L83

with these, you should be able to implement plausibility checks on the ref properties, if it was needed.

@jimklimov
Copy link
Contributor Author

I'd rather say the ref checks are a "scope creep", at least for my initial PR goals, so someone can do it separately later :)

Also the codebase I started from did not have a concept (nor a copy) of spec 1.5. Finally, is it right to impose new constraints on older-spec documents? (Do they change something allowed-relaxed before and required now)?

@andreas-hilti
Copy link
Contributor

I'd rather say the ref checks are a "scope creep", at least for my initial PR goals, so someone can do it separately later :)

Fine with me. I'd be happy if the piled up PRs (especially for the cli) were merged, and then we could continue from there.

@jimklimov
Copy link
Contributor Author

Thanks @jkowalleck for the clarification links and @andreas-hilti for suggestion to look at back-references to the BomRefs within a document. Combined with review comments in #245 (comment) and some strange merged BOMs I see in internal testing (that have several same ref values in the resulting dependencies[] list), I am having second thoughts about squashing together optional+required scopes - and so a need to identify, rename and thoroughly rewrite "bom-ref" values may become a necessity sooner than I thought.

Currently investigating if that is the culprit...

@jimklimov
Copy link
Contributor Author

For future reference: locations of bomrefs and bomlinks are highlighted in https://github.com/CycloneDX/specification/pull/236/files

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants