Retrieves DBSQL query history and merges it into a Delta Lake table.
A System Table containing similar data is on the Databricks roadmap, so solutions like this will be mostly unnecessary in the near future.
- Ingests DBSQL query history based on supplied parameters
- Supports incremental ingestion
- Works in triggered or continuous mode:
  - triggered - ingests data and exits
  - continuous - ingests data every N seconds until cancelled
- Uses Deletion Vectors (DBR 14.1 and above) and Liquid Clustering (DBR 13.3 and above). If you need to support an earlier DBR, update the `create_target_table` function accordingly.
- The Python whl includes the Databricks SDK 0.25.1. If you use the module directly, be sure to test any other versions prior to use.
- Download the python whl file from the latest release. It is not published to PyPI.
- Upload the whl file as a workspace file or to a Unity Catalog volume
- Option 1: Install as cluster-scoped library
- Go to the Libraries menu of your cluster
- Select Install new and navigate to where you saved the whl
- Option 2: Install as notebook-scoped library
- Run the following in a notebook cell:
%pip install /path/to/my_package.whl
See the libraries documentation for full details and additional options.
The code below shows how to run the DBSQL Query Logger. Refer to docstrings for more details.
The examples folder includes a notebook that uses widgets for easy parameterization.
```python
import logging

from dbsql_query_logger import QueryLogger

logging.basicConfig(
    format="%(asctime)s %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S%z"
)
logger = logging.getLogger('dbsql_query_logger')
logger.setLevel(logging.INFO)

query_logger = QueryLogger(
    catalog='chris_koester',
    schema='observe',
    table='query_history',
    pipeline_mode='triggered',
    backfill_period='24 hours',
    reset='no'
)

query_logger.run()
```
Authentication is handled by the Databricks SDK, which implements the Databricks client unified authentication standard. Examples include:
- Notebook - uses native notebook authentication. No configuration necessary.
- Environment variables - environment variables can be set as part of the cluster configuration for scheduled jobs.
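As a minimal sketch of the environment-variable option: the Databricks SDK reads `DATABRICKS_HOST` and `DATABRICKS_TOKEN` automatically when they are present. The host URL and token below are placeholders; in practice, set these in the cluster configuration and keep the token in a secret rather than in code.

```python
import os

# Placeholder values; the Databricks SDK picks these up automatically
# as part of unified authentication.
os.environ["DATABRICKS_HOST"] = "https://example-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-example-token"  # use a secret in practice
```

With these set, no explicit credentials need to be passed when the module constructs its SDK client.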