Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Add Mysql2 and Trilogy db.collection.name attribute #1109

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open
3 changes: 3 additions & 0 deletions instrumentation/mysql2/example/mysql2.rb
Original file line number Diff line number Diff line change
Expand Up @@ -21,3 +21,6 @@
client.query("SELECT * from information_schema.INNODB_TABLES; /**Dé**/").each do |row|
puts row
end

client.query('CREATE TABLE test_table (id SERIAL PRIMARY KEY, name VARCHAR(50), age INT)')
client.query('DROP TABLE test_table')
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,9 @@ module Mysql2
module Patches
# Module to prepend to Mysql2::Client for instrumentation
module Client
# Capture the first word (including letters, digits, underscores, & '.', ) that follows common table commands
TABLE_NAME = /\b(?:FROM|INTO|UPDATE|CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|DROP\s+TABLE(?:\s+IF\s+EXISTS)?|ALTER\s+TABLE(?:\s+IF\s+EXISTS)?)\s+([\w\.]+)/i

def query(sql, options = {})
tracer.in_span(
_otel_span_name(sql),
Expand Down Expand Up @@ -47,7 +50,7 @@ def _otel_span_name(sql)
end

def _otel_span_attributes(sql)
attributes = _otel_client_attributes
attributes = _otel_client_attributes(sql)
case config[:db_statement]
when :include
attributes[SemanticConventions::Trace::DB_STATEMENT] = sql
Expand All @@ -68,7 +71,7 @@ def _otel_database_name
(query_options[:database] || query_options[:dbname] || query_options[:db])&.to_s
end

def _otel_client_attributes
def _otel_client_attributes(sql)
# The client specific attributes can be found via the query_options instance variable
# exposed on the mysql2 Client
# https://github.com/brianmario/mysql2/blob/ca08712c6c8ea672df658bb25b931fea22555f27/lib/mysql2/client.rb#L25-L26
Expand All @@ -83,9 +86,17 @@ def _otel_client_attributes

attributes[SemanticConventions::Trace::DB_NAME] = _otel_database_name
attributes[SemanticConventions::Trace::PEER_SERVICE] = config[:peer_service]
attributes['db.collection.name'] = collection_name(sql)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid adding additional overhead to the instrumentation if possible.

In cases where the SQL statement is omitted or not sanitized, there isn't any regexp scan that occurs.

Is there a way to optimize it so that there isn't additional processing?

I would also like to consider that this db.collection.name matches newer versions of the OTel Schema and the instrumentations are still using pre-1.0 semantics.

If I am not mistaken that would have been db.table.name. I think we should continue using pre-1.0 semantics until we have
OTEL_SEMCONV_STABILITY_OPT_IN implemented

https://opentelemetry.io/docs/specs/semconv/database/database-spans/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @arielvalentin! It makes sense to keep consistent with pre-1.0 semantics. In this case, the attribute is db.sql.table. I've made this update!

Re additional processing: I see your point about putting every SQL statement through a regexp scan. Because the table/collection name isn't available on the client, I'm not sure how else we'd be able to get this information. We could make table name an opt-in/opt-out attribute, but is that level of control something we want to provide? What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean it's difficult to say. I feel like maybe this should only be done when the we also include the db.statement and I don't know if it would be possible to combine with the obfuscation code.

Maybe I'm making too much of it? I'm not sure.

Perhaps benchmarking will put me a bit at ease.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if only including the table name if we're recording the db.statement deviates from the convention (because as of now, the convention has it that recording the collection name isn't based on any condition, just availability).

The change to Regexp.last_match(1) and an updated regex string improved speed. In the following benchmark example, I used the MySQL obfuscation SQL code as a baseline, and found running the extra db.collection.name sql was 1.19x slower.

require 'benchmark/ipsa'

TABLE_NAME = /\b(?:(?:FROM|INTO|UPDATE)|(?:(?:CREATE|DROP|ALTER)\s+TABLE(?:\s+IF\s+(?:NOT\s+)?EXISTS)?))\s+["']?([\w.]+)["']?/i
MYSQL_OBFUSCATION = /(?-mix:'(?:[^']|'')*?(?:\\'.*|'(?!')))|(?-mix:"(?:[^"]|"")*?(?:\\".*|"(?!")))|(?-mix:(\$(?!\d)[^$]*?\$).*?(?:\1|$))|(?-mix:\{?(?:[0-9a-fA-F]\-*){32}\}?)|(?-mix:-?\b(?:[0-9]+\.)?[0-9]+([eE][+-]?[0-9]+)?\b)|(?i-mx:\b(?:true|false|null)\b)|(?-mix:0x[0-9a-fA-F]+)|(?i-mx:(?:#|--).*?(?=\r|\n|$))|(?m-ix:\/\*.*?\*\/)|(?-mix:q'\[.*?(?:\]'|$)|q'\{.*?(?:\}'|$)|q'\<.*?(?:\>'|$)|q'\(.*?(?:\)'|$))/
SQL = 'SELECT * FROM test_table'

def collection_name
  shared_operation
  Regexp.last_match(1) if SQL =~ TABLE_NAME
end

def no_collection_name
  shared_operation
  SQL
end

def shared_operation
  SQL.gsub(MYSQL_OBFUSCATION, '?')
end

Benchmark.ipsa do |x|
  x.report('collection_name') { collection_name }
  x.report('no_collection_name') { no_collection_name }

  x.compare!
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arielvalentin We chatted about this in the SIG today and decided the reporting of table/collection names can be put behind a feature flag. What do you think? If that works, the remaining consideration is if the default on or off - do you have any preference on this?

Copy link
Contributor Author

@hannahramadan hannahramadan Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arielvalentin - I made a suggestion for the config name db_collection_name and default value include, with omit as the other option: 8328668

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arielvalentin! Wanted to check in and see if you had thoughts on the above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience. I'm going to add this to my list to review by EoD tomorrow


attributes
end

def collection_name(sql)
sql.scan(TABLE_NAME).flatten[0]
rescue StandardError
nil
end

def tracer
Mysql2::Instrumentation.instance.tracer
end
Expand Down
54 changes: 54 additions & 0 deletions instrumentation/mysql2/test/fixtures/sql_table_name.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
[
{
"name": "from",
"sql": "SELECT * FROM test_table"
},
{
"name": "select_count_from",
"sql": "SELECT COUNT(*) FROM test_table WHERE condition"
},
{
"name": "from_with_subquery",
"sql": "SELECT * FROM (SELECT * FROM test_table) AS table_alias"
},
{
"name": "insert_into",
"sql": "INSERT INTO test_table (column1, column2) VALUES (value1, value2)"
},
{
"name": "update",
"sql": "UPDATE test_table SET column1 = value1 WHERE condition"
},
{
"name": "delete_from",
"sql": "DELETE FROM test_table WHERE condition"
},
{
"name": "create_table",
"sql": "CREATE TABLE test_table (column1 datatype, column2 datatype)"
},
{
"name": "create_table_if_not_exists",
"sql": "CREATE TABLE IF NOT EXISTS test_table (column1 datatype, column2 datatype)"
},
{
"name": "alter_table",
"sql": "ALTER TABLE test_table ADD column_name datatype"
},
{
"name": "drop_table",
"sql": "DROP TABLE test_table"
},
{
"name": "drop_table_if_exists",
"sql": "DROP TABLE IF EXISTS test_table"
},
{
"name": "insert_into",
"sql": "INSERT INTO test_table values('', 'a''b c',0, 1 , 'd''e f''s h')"
},
{
"name": "from_with_join",
"sql": "SELECT columns FROM test_table JOIN table2 ON test_table.column = table2.column"
}
]
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,9 @@
# 1. Build the opentelemetry/opentelemetry-ruby-contrib image
# - docker-compose build
# 2. Bundle install
# - docker-compose run ex-instrumentation-mysql2-test bundle install
# - docker-compose run ex-instrumentation-mysql2-test bundle exec appraisal install
# 3. Run test suite
# - docker-compose run ex-instrumentation-mysql2-test bundle exec rake test
# - docker-compose run ex-instrumentation-mysql2-test bundle exec appraisal rake test
describe OpenTelemetry::Instrumentation::Mysql2::Instrumentation do
let(:instrumentation) { OpenTelemetry::Instrumentation::Mysql2::Instrumentation.instance }
let(:exporter) { EXPORTER }
Expand Down Expand Up @@ -473,6 +473,24 @@
end
end
end

describe '#connection_name' do
def self.load_fixture
data = File.read("#{Dir.pwd}/test/fixtures/sql_table_name.json")
JSON.parse(data)
end

load_fixture.each do |test_case|
name = test_case['name']
query = test_case['sql']

it "returns the table name for #{name}" do
table_name = client.send(:collection_name, query)

expect(table_name).must_equal('test_table')
end
end
end
end
end unless ENV['OMIT_SERVICES']
end
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,9 @@ module Trilogy
module Patches
# Module to prepend to Trilogy for instrumentation
module Client
# Capture the first word (including letters, digits, underscores, & '.', ) that follows common table commands
TABLE_NAME = /\b(?:FROM|INTO|UPDATE|CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|DROP\s+TABLE(?:\s+IF\s+EXISTS)?|ALTER\s+TABLE(?:\s+IF\s+EXISTS)?)\s+([\w\.]+)/i

def initialize(options = {})
@connection_options = options # This is normally done by Trilogy#initialize

Expand Down Expand Up @@ -76,6 +79,9 @@ def client_attributes(sql = nil)
attributes['db.instance.id'] = @connected_host unless @connected_host.nil?

if sql
collection_name = collection_name(sql)
attributes['db.collection.name'] = collection_name if collection_name

case config[:db_statement]
when :obfuscate
attributes[::OpenTelemetry::SemanticConventions::Trace::DB_STATEMENT] =
Expand All @@ -88,6 +94,12 @@ def client_attributes(sql = nil)
attributes
end

def collection_name(sql)
sql.scan(TABLE_NAME).flatten[0]
rescue StandardError
nil
end

def database_name
connection_options[:database]
end
Expand Down
54 changes: 54 additions & 0 deletions instrumentation/trilogy/test/fixtures/sql_table_name.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
[
{
"name": "from",
"sql": "SELECT * FROM test_table"
},
{
"name": "select_count_from",
"sql": "SELECT COUNT(*) FROM test_table WHERE condition"
},
{
"name": "from_with_subquery",
"sql": "SELECT * FROM (SELECT * FROM test_table) AS table_alias"
},
{
"name": "insert_into",
"sql": "INSERT INTO test_table (column1, column2) VALUES (value1, value2)"
},
{
"name": "update",
"sql": "UPDATE test_table SET column1 = value1 WHERE condition"
},
{
"name": "delete_from",
"sql": "DELETE FROM test_table WHERE condition"
},
{
"name": "create_table",
"sql": "CREATE TABLE test_table (column1 datatype, column2 datatype)"
},
{
"name": "create_table_if_not_exists",
"sql": "CREATE TABLE IF NOT EXISTS test_table (column1 datatype, column2 datatype)"
},
{
"name": "alter_table",
"sql": "ALTER TABLE test_table ADD column_name datatype"
},
{
"name": "drop_table",
"sql": "DROP TABLE test_table"
},
{
"name": "drop_table_if_exists",
"sql": "DROP TABLE IF EXISTS test_table"
},
{
"name": "insert_into",
"sql": "INSERT INTO test_table values('', 'a''b c',0, 1 , 'd''e f''s h')"
},
{
"name": "from_with_join",
"sql": "SELECT columns FROM test_table JOIN table2 ON test_table.column = table2.column"
}
]
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,16 @@
require_relative '../../../../lib/opentelemetry/instrumentation/trilogy'
require_relative '../../../../lib/opentelemetry/instrumentation/trilogy/patches/client'

# This test suite requires a running mysql service within a dedicated test container. We can use the same
# docker-compose service as the mysql2 instrumentation tests. The test container should run the mysql client.
# To run tests locally:
# 1. Build the opentelemetry/opentelemetry-ruby-contrib image
# - docker-compose build
# 2. Open a bash shell in the test container and cd to the trilogy directory
# - docker-compose run ex-instrumentation-mysql2-test bash -c 'cd ../trilogy && bash'
# 3. Bundle install and run tests with the Appraisals gem
# - bundle exec appraisal install && bundle exec appraisal rake test

describe OpenTelemetry::Instrumentation::Trilogy do
let(:instrumentation) { OpenTelemetry::Instrumentation::Trilogy::Instrumentation.instance }
let(:exporter) { EXPORTER }
Expand Down Expand Up @@ -626,5 +636,23 @@
end
end
end

describe '#connection_name' do
def self.load_fixture
data = File.read("#{Dir.pwd}/test/fixtures/sql_table_name.json")
JSON.parse(data)
end

load_fixture.each do |test_case|
name = test_case['name']
query = test_case['sql']

it "returns the table name for #{name}" do
table_name = client.send(:collection_name, query)

expect(table_name).must_equal('test_table')
end
end
end
end
end
Loading