Feat: Add Mysql2 and Trilogy `db.collection.name` attribute #1109

hannahramadan · 2024-08-08T17:40:25Z

The `db.collection.name` attribute is conditionally required for database spans. This PR uses a regular expression to detect the collection (table) name and reports it as the `db.collection.name` attribute in the Mysql2 and Trilogy instrumentation.
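For illustration, here is a minimal Ruby sketch of regex-based detection. It reuses the `TABLE_NAME` pattern from the benchmark later in this thread and names the helper `collection_name` after the call in the diff. The method body is an assumption and is not necessarily the merged code.

```ruby
# Illustrative sketch: capture the first table name that follows FROM, INTO,
# UPDATE, or CREATE/DROP/ALTER TABLE [IF [NOT] EXISTS] in a SQL string.
TABLE_NAME = /\b(?:(?:FROM|INTO|UPDATE)|(?:(?:CREATE|DROP|ALTER)\s+TABLE(?:\s+IF\s+(?:NOT\s+)?EXISTS)?))\s+["']?([\w.]+)["']?/i

# Returns the captured table name, or nil when nothing matches.
def collection_name(sql)
  Regexp.last_match(1) if sql =~ TABLE_NAME
end

puts collection_name('SELECT * FROM users WHERE id = 1')
puts collection_name('INSERT INTO app.orders (id) VALUES (1)')
puts collection_name('CREATE TABLE IF NOT EXISTS logs (id INT)')
p collection_name('SHOW TABLES')
```

Only the first match is reported, so a join or subquery yields just the leading table.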

Closes #1100

instrumentation/trilogy/lib/opentelemetry/instrumentation/trilogy/patches/client.rb

kaylareopelle

A few small wording suggestions on the trilogy test instructions that you can take or leave. Other than that, looks great! Thank you!

instrumentation/trilogy/test/opentelemetry/instrumentation/trilogy/instrumentation_test.rb

Co-authored-by: Kayla Reopelle <87386821+kaylareopelle@users.noreply.github.com>

arielvalentin · 2024-08-21T17:05:07Z

instrumentation/mysql2/lib/opentelemetry/instrumentation/mysql2/patches/client.rb

```diff
@@ -83,9 +86,17 @@ def _otel_client_attributes

            attributes[SemanticConventions::Trace::DB_NAME] = _otel_database_name
            attributes[SemanticConventions::Trace::PEER_SERVICE] = config[:peer_service]
+            attributes['db.collection.name'] = collection_name(sql)
```


I would like to avoid adding additional overhead to the instrumentation if possible.

Currently, when the SQL statement is omitted or not sanitized, no regexp scan occurs at all.

Is there a way to optimize it so that there isn't additional processing?

I would also like to point out that `db.collection.name` comes from newer versions of the OTel Schema, while the instrumentations are still using pre-1.0 semantics.

If I am not mistaken, the pre-1.0 name would have been `db.table.name`. I think we should continue using pre-1.0 semantics until we have `OTEL_SEMCONV_STABILITY_OPT_IN` implemented.

https://opentelemetry.io/docs/specs/semconv/database/database-spans/

Thanks @arielvalentin! It makes sense to stay consistent with pre-1.0 semantics. In this case, the attribute is `db.sql.table`. I've made this update!

Re additional processing: I see your point about putting every SQL statement through a regexp scan. Because the table/collection name isn't available on the client, I'm not sure how else we'd be able to get this information. We could make table name an opt-in/opt-out attribute, but is that level of control something we want to provide? What do you think?

I mean, it's difficult to say. I feel like maybe this should only be done when we also include the `db.statement`, and I don't know if it would be possible to combine it with the obfuscation code.

Maybe I'm making too much of it? I'm not sure.

Perhaps benchmarking will put me a bit at ease.
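One way to read the suggestion above is to run the table-name scan only when the SQL is already being processed for `db.statement`. This sketch assumes the instrumentation's existing `db_statement` modes (`:include`, `:obfuscate`, `:omit`); the helper name `statement_attributes` and the simplified `LITERALS` regex are hypothetical stand-ins, not the real obfuscation code.

```ruby
# Same table-name pattern as in the benchmark in this thread.
TABLE_NAME = /\b(?:(?:FROM|INTO|UPDATE)|(?:(?:CREATE|DROP|ALTER)\s+TABLE(?:\s+IF\s+(?:NOT\s+)?EXISTS)?))\s+["']?([\w.]+)["']?/i

# Hypothetical, much simplified stand-in for the real obfuscation regex:
# it masks quoted strings and numeric literals only.
LITERALS = /'(?:[^']|'')*'|"(?:[^"]|"")*"|-?\b\d+(?:\.\d+)?\b/

# Builds statement-related attributes. With :omit the method returns before
# any regexp runs; otherwise the table scan rides along with statement handling.
def statement_attributes(sql, db_statement)
  return {} if db_statement == :omit

  statement = db_statement == :obfuscate ? sql.gsub(LITERALS, '?') : sql
  attributes = { 'db.statement' => statement }
  table = sql[TABLE_NAME, 1]
  attributes['db.sql.table'] = table if table
  attributes
end

p statement_attributes("SELECT * FROM users WHERE name = 'bob'", :obfuscate)
p statement_attributes("SELECT * FROM users WHERE name = 'bob'", :omit)
```

The table name is taken from the raw SQL, so masking literals never hides it.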

I wonder whether including the table name only when we record `db.statement` would deviate from the convention: as it stands, the convention does not make recording the collection name depend on any condition other than its availability.

Switching to `Regexp.last_match(1)` and updating the regex improved speed. In the benchmark below, I used the MySQL obfuscation code as a baseline and found that also extracting the `db.collection.name` made it 1.19x slower.

```ruby
# Requires the benchmark-ips gem.
require 'benchmark/ips'

# Captures the first table name after FROM, INTO, UPDATE, or CREATE/DROP/ALTER TABLE.
TABLE_NAME = /\b(?:(?:FROM|INTO|UPDATE)|(?:(?:CREATE|DROP|ALTER)\s+TABLE(?:\s+IF\s+(?:NOT\s+)?EXISTS)?))\s+["']?([\w.]+)["']?/i

# Baseline: the MySQL obfuscation regex used for db.statement.
MYSQL_OBFUSCATION = /(?-mix:'(?:[^']|'')*?(?:\\'.*|'(?!')))|(?-mix:"(?:[^"]|"")*?(?:\\".*|"(?!")))|(?-mix:(\$(?!\d)[^$]*?\$).*?(?:\1|$))|(?-mix:\{?(?:[0-9a-fA-F]\-*){32}\}?)|(?-mix:-?\b(?:[0-9]+\.)?[0-9]+([eE][+-]?[0-9]+)?\b)|(?i-mx:\b(?:true|false|null)\b)|(?-mix:0x[0-9a-fA-F]+)|(?i-mx:(?:#|--).*?(?=\r|\n|$))|(?m-ix:\/\*.*?\*\/)|(?-mix:q'\[.*?(?:\]'|$)|q'\{.*?(?:\}'|$)|q'\<.*?(?:\>'|$)|q'\(.*?(?:\)'|$))/

SQL = 'SELECT * FROM test_table'

# Obfuscation plus table-name extraction.
def collection_name
  shared_operation
  Regexp.last_match(1) if SQL =~ TABLE_NAME
end

# Obfuscation only.
def no_collection_name
  shared_operation
  SQL
end

def shared_operation
  SQL.gsub(MYSQL_OBFUSCATION, '?')
end

Benchmark.ips do |x|
  x.report('collection_name') { collection_name }
  x.report('no_collection_name') { no_collection_name }
  x.compare!
end
```

@arielvalentin We chatted about this in the SIG today and decided that reporting table/collection names can be put behind a feature flag. What do you think? If that works, the remaining question is whether it should default to on or off. Do you have a preference?

@arielvalentin - I've suggested the config name `db_collection_name` with a default value of `include` and `omit` as the other option: 8328668

Hi @arielvalentin! Wanted to check in and see if you had thoughts on the above.

Thanks for your patience. I'm going to add this to my list to review by EoD tomorrow

instrumentation/mysql2/lib/opentelemetry/instrumentation/mysql2/patches/client.rb

arielvalentin · 2024-10-09T03:44:50Z

@hannahramadan Thank you for your patience waiting for my response.

We discussed some of the details during the SIG on 2024-10-08, which I will share here:

Default to `omit`

Adding regular expressions to extract data adds more overhead than I would like for high-volume, performance-sensitive systems. I would like users to opt in to this attribute.
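A minimal sketch of an opt-in flag, assuming an option named `db_sql_table` (the name used in the later commit "Omit db_sql_table by default") with `include`/`omit` values. The `DEFAULTS` hash, `resolve_config`, and `table_attributes` are hypothetical helpers and do not use the instrumentation's real option machinery.

```ruby
TABLE_NAME = /\b(?:(?:FROM|INTO|UPDATE)|(?:(?:CREATE|DROP|ALTER)\s+TABLE(?:\s+IF\s+(?:NOT\s+)?EXISTS)?))\s+["']?([\w.]+)["']?/i

# Hypothetical defaults: table-name extraction is opt-in.
DEFAULTS = { db_sql_table: :omit }.freeze
VALID_TABLE_OPTIONS = %i[include omit].freeze

# Merges user settings over the defaults and falls back to the default
# when an unrecognized value is supplied.
def resolve_config(user_config = {})
  config = DEFAULTS.merge(user_config)
  config[:db_sql_table] = DEFAULTS[:db_sql_table] unless VALID_TABLE_OPTIONS.include?(config[:db_sql_table])
  config
end

# With the default :omit, this returns before any regexp work happens.
def table_attributes(sql, config)
  return {} unless config[:db_sql_table] == :include

  table = sql[TABLE_NAME, 1]
  table ? { 'db.sql.table' => table } : {}
end

p table_attributes('SELECT * FROM users', resolve_config)
p table_attributes('SELECT * FROM users', resolve_config({ db_sql_table: :include }))
```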

Using Mixed Schema Versions

I think I mentioned this already in a separate comment, but I believe our instrumentations are still emitting pre-1.0 schema attributes. I would have expected the attribute name to be `db.sql.table`; then, once we had the DB Semconv Stability functionality, it would emit `db.collection.name` and/or `db.sql.table` depending on the option the user selected.

In my opinion it would be more confusing in this case to have the `db.statement` attribute alongside `db.collection.name`, since I think users would expect to see `db.query.text` with it instead. So I think it is best to align on the deprecated attributes until we introduce semconv stability.

Is that not what we want?
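The dual-emission behavior described above could look roughly like this. The sketch assumes the `OTEL_SEMCONV_STABILITY_OPT_IN` values from the semantic-convention migration guidance (`database` for new names only, `database/dup` for both, anything else keeps the old names); the helper names are hypothetical, and only the two table attributes discussed here are mapped.

```ruby
# Chooses the attribute keys for a table name from a comma-separated opt-in
# list such as ENV['OTEL_SEMCONV_STABILITY_OPT_IN'].
def table_attribute_keys(opt_in = ENV.fetch('OTEL_SEMCONV_STABILITY_OPT_IN', ''))
  values = opt_in.split(',').map(&:strip)
  if values.include?('database/dup')
    %w[db.sql.table db.collection.name] # emit old and new names side by side
  elsif values.include?('database')
    %w[db.collection.name]              # new names only
  else
    %w[db.sql.table]                    # default: stay on pre-1.0 names
  end
end

# Hypothetical helper that fans a single table name out to every selected key.
def table_attributes(table, opt_in)
  table_attribute_keys(opt_in).to_h { |key| [key, table] }
end

p table_attributes('users', '')
p table_attributes('users', 'http,database/dup')
```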

Config options are tied to a Schema Version

I think we set a bad precedent by adding options for individual attributes, because the option names are now tied to a specific schema version. I am not certain what to do about that.

Do we keep these options as we move to newer schema versions?

Remove Structural Duplication

Identical code is now going to appear in multiple instrumentations, which means that if we need to make changes it will require applying them to multiple locations.

I think that it would be best to extract this functionality into a common library that could be shared across DB instrumentations.

Though the gem is currently named sql-obfuscation, I think it is the best place to include this functionality. Instead of being solely responsible for sanitizing SQL, the gem could provide a helper that extracts attributes for, or enriches, a DB client span. It could use the instrumentation config to decide which attributes to include and what logic to apply.

Something like this:

```ruby
SqlProcessor.append_db_attributes(span, sql, config)
```
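A rough sketch of what such a shared helper might contain. `SqlProcessor`, its config keys, and the simplified literal-masking regex are all hypothetical; `RecordingSpan` is a stand-in that only mimics a span's `set_attribute` so the example runs without the SDK.

```ruby
# Hypothetical shared helper; neither the module nor its config keys exist yet.
module SqlProcessor
  TABLE_NAME = /\b(?:(?:FROM|INTO|UPDATE)|(?:(?:CREATE|DROP|ALTER)\s+TABLE(?:\s+IF\s+(?:NOT\s+)?EXISTS)?))\s+["']?([\w.]+)["']?/i
  # Simplified stand-in for the real obfuscation regex (string and numeric literals only).
  LITERALS = /'(?:[^']|'')*'|"(?:[^"]|"")*"|-?\b\d+(?:\.\d+)?\b/

  module_function

  # Adds SQL-derived attributes to a span according to the instrumentation config.
  def append_db_attributes(span, sql, config)
    case config[:db_statement]
    when :include then span.set_attribute('db.statement', sql)
    when :obfuscate then span.set_attribute('db.statement', sql.gsub(LITERALS, '?'))
    end

    return unless config[:db_sql_table] == :include

    table = sql[TABLE_NAME, 1]
    span.set_attribute('db.sql.table', table) if table
  end
end

# Minimal stand-in for a span: records attributes in a hash.
class RecordingSpan
  attr_reader :attributes

  def initialize
    @attributes = {}
  end

  def set_attribute(key, value)
    @attributes[key] = value
    self
  end
end

span = RecordingSpan.new
SqlProcessor.append_db_attributes(span, 'SELECT * FROM users WHERE id = 42',
                                  { db_statement: :obfuscate, db_sql_table: :include })
p span.attributes
```

Keeping every SQL-derived decision in one method is also what would make a later semconv-stability switch a single-location change.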

Given that this is more of a client-side processor than a query sanitizer, I think we should rename the gem to reflect that it processes SQL and generates attributes from it.

An additional benefit to removing structural duplication is that it would reduce the number of places where we would have to add any DB attribute semconv stability compatibility.

Are Span Processors out of the question?

This is something we did not discuss.

With the introduction of OnEnding, the specification leaves room for us to do some Span Processing outside of the instrumentation code.

This would mean that instrumentations could all emit the SQL and then the SqlProcessor could be run in OnEnding and potentially reduce some of the complexity in the DB instrumentations.

The processor could extract attributes from the SQL and sanitize or omit the statements. This would change configurations so that these options could not be configured on the instrumentations anymore but rather in the span processor.
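A rough sketch of that alternative, assuming a processor hook named `on_ending(span)` that runs while span attributes can still be changed. OnEnding is still a newer addition to the specification, and this sketch does not assume any particular Ruby SDK support. `SqlAttributeProcessor` and its options are hypothetical, and `RecordingSpan` is a stand-in exposing `attributes` and `set_attribute`.

```ruby
# Minimal stand-in for a span whose attributes can still be changed while ending.
class RecordingSpan
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes.dup
  end

  def set_attribute(key, value)
    @attributes[key] = value
    self
  end
end

# Hypothetical processor: instrumentations always record the raw SQL in
# db.statement, and every SQL-derived decision is made here from one config.
class SqlAttributeProcessor
  TABLE_NAME = /\b(?:(?:FROM|INTO|UPDATE)|(?:(?:CREATE|DROP|ALTER)\s+TABLE(?:\s+IF\s+(?:NOT\s+)?EXISTS)?))\s+["']?([\w.]+)["']?/i
  # Simplified stand-in for the real obfuscation regex.
  LITERALS = /'(?:[^']|'')*'|"(?:[^"]|"")*"|-?\b\d+(?:\.\d+)?\b/

  def initialize(db_statement: :obfuscate, db_sql_table: :omit)
    @db_statement = db_statement
    @db_sql_table = db_sql_table
  end

  # Assumed OnEnding-style hook, called before the span becomes read-only.
  def on_ending(span)
    sql = span.attributes['db.statement']
    return unless sql

    if @db_sql_table == :include
      table = sql[TABLE_NAME, 1]
      span.set_attribute('db.sql.table', table) if table
    end
    span.set_attribute('db.statement', sql.gsub(LITERALS, '?')) if @db_statement == :obfuscate
  end
end

span = RecordingSpan.new({ 'db.statement' => 'UPDATE accounts SET balance = 10 WHERE id = 7' })
SqlAttributeProcessor.new(db_sql_table: :include).on_ending(span)
p span.attributes
```

The sketch does not model omitting the statement, because that would require dropping an attribute the instrumentation already set.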

I think that trade-off may be acceptable, since it is unlikely we will want different configurations for different SQL datastores.

All that being said

We agreed that I would unblock this on the condition that the attribute is omitted by default; we will accept the structural duplication for now and refactor the code in a future PR.

hannahramadan added 2 commits August 8, 2024 10:44

Add Mysql2 and Trilogy collection_name attribute

43d24ee

Remove empty line

8bd5818

hannahramadan force-pushed the mysql_libs_db_collection_name branch from d9e2fbd to 8bd5818 August 8, 2024 17:44

hannahramadan changed the title ~~Add Mysql2 and Trilogy db.collection.name attribute~~ Feat: Add Mysql2 and Trilogy db.collection.name attribute Aug 8, 2024

Appease rubocop

146bd71

hannahramadan marked this pull request as ready for review August 12, 2024 21:12

hannahramadan requested review from fbogsany, mwear, robertlaurin, dazuma, ericmustin, arielvalentin, ahayworth, plantfansam, robbkidd, simi, kaylareopelle and xuan-cao-swi as code owners August 12, 2024 21:12

hannahramadan requested review from a team August 12, 2024 21:12

hannahramadan commented Aug 15, 2024

instrumentation/trilogy/lib/opentelemetry/instrumentation/trilogy/patches/client.rb (outdated)

xuan-cao-swi approved these changes Aug 15, 2024

Refactor compact!

f9c244e

kaylareopelle approved these changes Aug 20, 2024

instrumentation/trilogy/test/opentelemetry/instrumentation/trilogy/instrumentation_test.rb (outdated)

instrumentation/trilogy/test/opentelemetry/instrumentation/trilogy/instrumentation_test.rb (outdated)

hannahramadan and others added 2 commits August 20, 2024 11:58

Apply suggestions from code review

6047052

Co-authored-by: Kayla Reopelle <87386821+kaylareopelle@users.noreply.github.com>

Merge branch 'main' into mysql_libs_db_collection_name

cc43b33

arielvalentin requested changes Aug 21, 2024

hannahramadan added 3 commits August 21, 2024 12:42

Update to older semantic convention

b61e51e

Merge branch 'mysql_libs_db_collection_name' of https://github.com/ha…

bc7df18

…nnahramadan/opentelemetry-ruby-contrib into mysql_libs_db_collection_name

Update regex

a23b954

Use correct variable

f099af3

arielvalentin reviewed Aug 26, 2024

instrumentation/mysql2/lib/opentelemetry/instrumentation/mysql2/patches/client.rb (outdated)

hannahramadan added 3 commits August 26, 2024 11:20

Go directly to matching data vs MatchData object

8134ee3

Feature flag

8328668

No nils

8f42fa2

hannahramadan mentioned this pull request Sep 5, 2024

Add db.collection.name to mysql-based instrumentation libraries #1100

Open

kaylareopelle assigned hannahramadan Oct 14, 2024

Omit db_sql_table by default

e73b23a

hannahramadan requested review from a team as code owners October 14, 2024 19:16

Capture table names in double quotes

46616da

This was referenced Oct 14, 2024

Explore renaming/enhancing opentelemetry-helpers-sql-obfuscation gem #1194

Open

[sql] Parse db.operation.name and db.collection.name from db.query.text open-telemetry/opentelemetry-dotnet-contrib#2222

Open
