Rename assets to resources #145

Merged · 2 commits · Feb 6, 2024
README.md — 28 changes: 14 additions & 14 deletions
@@ -5,22 +5,22 @@
This repo provides a customizable stack for starting new ML projects
on Databricks that follow production best-practices out of the box.

- Using Databricks MLOps Stacks, data scientists can quickly get started iterating on ML code for new projects while ops engineers set up CI/CD and ML assets
+ Using Databricks MLOps Stacks, data scientists can quickly get started iterating on ML code for new projects while ops engineers set up CI/CD and ML resources
management, with an easy transition to production. You can also use MLOps Stacks as a building block in automation for creating new data science projects with production-grade CI/CD pre-configured.

The default stack in this repo includes three modular components:

| Component | Description | Why it's useful |
|-----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [ML Code](template/{{.input_root_dir}}/{{template%20`project_name_alphanumeric_underscore`%20.}}/) | Example ML project structure ([training](template/{{.input_root_dir}}/{{template%20`project_name_alphanumeric_underscore`%20.}}/training) and [batch inference](template/{{.input_root_dir}}/{{template%20`project_name_alphanumeric_underscore`%20.}}/deployment/batch_inference), etc), with unit tested Python modules and notebooks | Quickly iterate on ML problems, without worrying about refactoring your code into tested modules for productionization later on. |
- | [ML Assets as Code](template/{{.input_root_dir}}/{{template%20`project_name_alphanumeric_underscore`%20.}}/assets) | ML pipeline assets ([training](template/{{.input_root_dir}}/{{template%20`project_name_alphanumeric_underscore`%20.}}/assets/model-workflow-asset.yml.tmpl) and [batch inference](template/{{.input_root_dir}}/{{template%20`project_name_alphanumeric_underscore`%20.}}/assets/batch-inference-workflow-asset.yml.tmpl) jobs, etc) defined through [databricks CLI bundles](https://docs.databricks.com/dev-tools/cli/bundle-cli.html) | Govern, audit, and deploy changes to your ML assets (e.g. "use a larger instance type for automated model retraining") through pull requests, rather than adhoc changes made via UI. |
- | CI/CD([GitHub Actions](template/{{.input_root_dir}}/.github/) or [Azure DevOps](template/{{.input_root_dir}}/.azure/)) | [GitHub Actions](https://docs.github.com/en/actions) or [Azure DevOps](https://azure.microsoft.com/en-us/products/devops) workflows to test and deploy ML code and assets | Ship ML code faster and with confidence: ensure all production changes are performed through automation and that only tested code is deployed to prod |
+ | [ML Resources as Code](template/{{.input_root_dir}}/{{template%20`project_name_alphanumeric_underscore`%20.}}/resources) | ML pipeline resources ([training](template/{{.input_root_dir}}/{{template%20`project_name_alphanumeric_underscore`%20.}}/resources/model-workflow-resource.yml.tmpl) and [batch inference](template/{{.input_root_dir}}/{{template%20`project_name_alphanumeric_underscore`%20.}}/resources/batch-inference-workflow-resource.yml.tmpl) jobs, etc) defined through [databricks CLI bundles](https://docs.databricks.com/dev-tools/cli/bundle-cli.html) | Govern, audit, and deploy changes to your ML resources (e.g. "use a larger instance type for automated model retraining") through pull requests, rather than adhoc changes made via UI. |
+ | CI/CD([GitHub Actions](template/{{.input_root_dir}}/.github/) or [Azure DevOps](template/{{.input_root_dir}}/.azure/)) | [GitHub Actions](https://docs.github.com/en/actions) or [Azure DevOps](https://azure.microsoft.com/en-us/products/devops) workflows to test and deploy ML code and resources | Ship ML code faster and with confidence: ensure all production changes are performed through automation and that only tested code is deployed to prod |
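
For orientation, the sketch below shows the general shape of a job definition under the renamed `resources/` directory, declared through a databricks CLI bundle. It is a minimal, hedged illustration rather than the template's actual contents: the job name, notebook path, and cluster settings are all assumptions.

```yaml
# Minimal sketch of a bundle resource file under resources/ — the job
# name, notebook path, and cluster settings below are assumptions.
resources:
  jobs:
    model_training_job:
      name: ${bundle.target}-example-model-training  # hypothetical name
      tasks:
        - task_key: Train
          notebook_task:
            notebook_path: ../training/notebooks/Train.py  # assumed path
          new_cluster:
            spark_version: 13.3.x-cpu-ml-scala2.12
            node_type_id: i3.xlarge
            num_workers: 1
```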

See the [FAQ](#FAQ) for questions on common use cases.

## ML pipeline structure and development loops

- An ML solution comprises data, code, and models. These assets need to be developed, validated (staging), and deployed (production). In this repository, we use the notion of dev, staging, and prod to represent the execution
+ An ML solution comprises data, code, and models. These resources need to be developed, validated (staging), and deployed (production). In this repository, we use the notion of dev, staging, and prod to represent the execution
environments of each stage.

An instantiated project from MLOps Stacks contains an ML pipeline with CI/CD workflows to test and deploy automated model training and batch inference jobs across your dev, staging, and prod Databricks workspaces.
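
As a rough illustration of the CI side, a pull-request workflow can validate the bundle-defined resources before anything is deployed. The sketch below is a hedged example only — the workflow name, secret names, and target name are assumptions, not the template's actual CI definition.

```yaml
# Hedged sketch of a PR validation workflow (GitHub Actions); secret
# names and the staging target name are assumptions.
name: ML resource validation
on:
  pull_request:
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main  # installs the databricks CLI
      - name: Validate bundle against the staging target
        run: databricks bundle validate -t staging
        env:
          DATABRICKS_HOST: ${{ secrets.STAGING_WORKSPACE_HOST }}    # assumed secret
          DATABRICKS_TOKEN: ${{ secrets.STAGING_WORKSPACE_TOKEN }}  # assumed secret
```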
@@ -85,17 +85,17 @@ Others must be correctly specified for CI/CD to work:
to enable them to view and debug CI test results
* ``input_databricks_prod_workspace_host``: URL of production Databricks workspace. We encourage granting data scientists working on the current ML project non-admin (read) access to this workspace,
to enable them to view production job status and see job logs to debug failures.
- * ``input_default_branch``: Name of the default branch, where the prod and staging ML assets are deployed from and the latest ML code is staged.
+ * ``input_default_branch``: Name of the default branch, where the prod and staging ML resources are deployed from and the latest ML code is staged.
* ``input_release_branch``: Name of the release branch. The production jobs (model training, batch inference) defined in this
repo pull ML code from this branch.

Or used for project initialization:
* ``input_project_name``: name of the current project
- * ``input_read_user_group``: User group name to give READ permissions to for project assets (ML jobs, integration test job runs, and machine learning assets). A group with this name must exist in both the staging and prod workspaces. Defaults to "users", which grants read permission to all users in the staging/prod workspaces. You can specify a custom group name e.g. to restrict read permissions to members of the team working on the current ML project.
+ * ``input_read_user_group``: User group name to give READ permissions to for project resources (ML jobs, integration test job runs, and machine learning resources). A group with this name must exist in both the staging and prod workspaces. Defaults to "users", which grants read permission to all users in the staging/prod workspaces. You can specify a custom group name e.g. to restrict read permissions to members of the team working on the current ML project.
* ``input_include_models_in_unity_catalog``: If selected, models will be registered to [Unity Catalog](https://docs.databricks.com/en/mlflow/models-in-uc.html#models-in-unity-catalog). Models will be registered under a three-level namespace of `<catalog>.<schema_name>.<model_name>`, according to the target environment in which the model registration code is executed. Thus, if model registration code runs in the `prod` environment, the model will be registered to the `prod` catalog under the namespace `<prod>.<schema>.<model_name>`. This assumes that the respective catalogs exist in Unity Catalog (e.g. `dev`, `staging` and `prod` catalogs). Target environment names and the catalogs to be used are defined in the Databricks bundle files, and can be updated as needed (see the sketch after this list).
* ``input_schema_name``: If using [Models in Unity Catalog](https://docs.databricks.com/en/mlflow/models-in-uc.html#models-in-unity-catalog), specify the name of the schema under which the models should be registered; we recommend keeping the name the same as the project name. We default to using the same `schema_name` across catalogs, so this schema must exist in each catalog used. For example, the training pipeline executed in the staging environment will register the model to `staging.<schema_name>.<model_name>`, whereas the same pipeline executed in the prod environment will register the model to `prod.<schema_name>.<model_name>`. Also, be sure that the service principals in each respective environment have the right permissions to access this schema, namely `USE_CATALOG`, `USE_SCHEMA`, `MODIFY`, `CREATE_MODEL`, and `CREATE_TABLE`.
* ``input_unity_catalog_read_user_group``: If using [Models in Unity Catalog](https://docs.databricks.com/en/mlflow/models-in-uc.html#models-in-unity-catalog), define the name of the user group to grant `EXECUTE` (read & use model) privileges for the registered model. Defaults to "account users".
- * ``input_include_feature_store``: If selected, will provide [Databricks Feature Store](https://docs.databricks.com/machine-learning/feature-store/index.html) stack components including: project structure and sample feature Python modules, feature engineering notebooks, ML asset configs to provision and manage Feature Store jobs, and automated integration tests covering feature engineering and training.
+ * ``input_include_feature_store``: If selected, will provide [Databricks Feature Store](https://docs.databricks.com/machine-learning/feature-store/index.html) stack components including: project structure and sample feature Python modules, feature engineering notebooks, ML resource configs to provision and manage Feature Store jobs, and automated integration tests covering feature engineering and training.
* ``input_include_mlflow_recipes``: If selected, will provide [MLflow Recipes](https://mlflow.org/docs/latest/recipes.html) stack components, dividing the training pipeline into configurable steps and profiles.
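
To make the catalog-per-environment behavior above concrete, the hedged sketch below shows one way deployment targets can map to Unity Catalog catalogs through a bundle variable; the variable and catalog names are assumptions, not the template's actual configuration.

```yaml
# Hedged sketch: per-target Unity Catalog catalog selection via a
# bundle variable. Variable and catalog names are assumptions.
variables:
  catalog:
    description: Unity Catalog catalog used for model registration
    default: dev
targets:
  staging:
    variables:
      catalog: staging
  prod:
    variables:
      catalog: prod
# Model registration code can then target ${var.catalog}.<schema_name>.<model_name>,
# e.g. prod.my_project.my_model when deployed to the prod target.
```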

See the generated ``README.md`` for next steps!
@@ -116,20 +116,20 @@ production model serving endpoints.

However, you can create a single workspace stack, by supplying the same workspace URL for
`input_databricks_staging_workspace_host` and `input_databricks_prod_workspace_host`. If you go this route, we
- recommend using different service principals to manage staging vs prod assets,
- to ensure that CI workloads run in staging cannot interfere with production assets.
+ recommend using different service principals to manage staging vs prod resources,
+ to ensure that CI workloads run in staging cannot interfere with production resources.

### I have an existing ML project. Can I productionize it using MLOps Stacks?
Yes. Currently, you can instantiate a new project and copy relevant components
into your existing project to productionize it. MLOps Stacks is modularized, so
- you can e.g. copy just the GitHub Actions workflows under `.github` or ML asset configs
- under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/assets``
+ you can e.g. copy just the GitHub Actions workflows under `.github` or ML resource configs
+ under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources``
and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml`` into your existing project.

### Can I adopt individual components of MLOps Stacks?
For this use case, we recommend instantiating via [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
- and copying the relevant subdirectories. For example, all ML asset configs
- are defined under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/assets``
+ and copying the relevant subdirectories. For example, all ML resource configs
+ are defined under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources``
and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml``, while CI/CD is defined e.g. under `.github`
if using GitHub Actions, or under `.azure` if using Azure DevOps.

@@ -142,7 +142,7 @@ for details on how to do this.
### Does MLOps Stacks cover data (ETL) pipelines?

Since MLOps Stacks is based on [databricks CLI bundles](https://docs.databricks.com/dev-tools/cli/bundle-commands.html),
- it's not limited only to ML workflows and assets - it works for assets across the Databricks Lakehouse. For instance, while the existing ML
+ it's not limited only to ML workflows and resources - it works for resources across the Databricks Lakehouse. For instance, while the existing ML
code samples contain feature engineering, training, model validation, deployment and batch inference workflows,
you can use it for Delta Live Tables pipelines as well.
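
Because bundle resource configs are not ML-specific, a Delta Live Tables pipeline can be declared alongside the ML jobs in the same `resources/` directory. The sketch below is illustrative only; the pipeline name and notebook path are assumptions.

```yaml
# Hedged sketch: a Delta Live Tables pipeline declared as a bundle
# resource. Pipeline name and notebook path are assumptions.
resources:
  pipelines:
    example_etl_pipeline:
      name: example-etl-pipeline  # hypothetical name
      libraries:
        - notebook:
            path: ./etl/notebooks/Ingest.py  # assumed path
```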

databricks_template_schema.json — 8 changes: 4 additions & 4 deletions
@@ -92,7 +92,7 @@
"order": 8,
"type": "string",
"default": "main",
"description": "\nName of the default branch, where the prod and staging ML assets are deployed from and the latest ML code is staged. Default",
"description": "\nName of the default branch, where the prod and staging ML resources are deployed from and the latest ML code is staged. Default",
"skip_prompt_if": {
"properties": {
"input_setup_cicd_and_project": {
@@ -118,7 +118,7 @@
"order": 10,
"type": "string",
"default": "users",
"description": "\nUser group name to give READ permissions to for project assets (ML jobs, integration test job runs, and machine learning assets). A group with this name must exist in both the staging and prod workspaces. Default",
"description": "\nUser group name to give READ permissions to for project resources (ML jobs, integration test job runs, and machine learning resources). A group with this name must exist in both the staging and prod workspaces. Default",
"skip_prompt_if": {
"properties": {
"input_setup_cicd_and_project": {
@@ -146,8 +146,8 @@
"type": "string",
"description": "\nName of schema to use when registering a model in Unity Catalog. \nNote that this schema must already exist, and we recommend keeping the name the same as the project name as well as giving the service principals the right access. Default",
"default": "{{ .input_project_name }}",
"pattern": "^[^ .\\/]*$",
"pattern_match_failure_message": "Valid schema names cannot contain any of the following characters: \" \", \".\", \"\\\", \"/\"",
"pattern": "^[^ .\\-\\/]*$",
"pattern_match_failure_message": "Valid schema names cannot contain any of the following characters: \" \", \".\", \"-\", \"\\\", \"/\"",
"skip_prompt_if": {
"anyOf":[
{
stack-customization.md — 14 changes: 7 additions & 7 deletions
@@ -51,7 +51,7 @@ MLOps Stacks provides example ML code.
You may want to customize the example code, e.g. further prune it down into a skeleton for data scientists
to fill out.

- If you customize this component, you can still use the CI/CD and ML asset components to build production ML pipelines, as long as you provide ML
+ If you customize this component, you can still use the CI/CD and ML resource components to build production ML pipelines, as long as you provide ML
notebooks with the expected interface. For example, model training under ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/training/notebooks/`` and inference under
``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/deployment/batch_inference/notebooks/``. See code comments in the notebook files for the expected interface & behavior of these notebooks.

@@ -63,18 +63,18 @@ MLOps Stacks currently has the following sub-components for CI/CD:
* Logic to trigger model deployment through REST API calls to your CD system, when model training completes.
This logic is currently captured in ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/deployment/model_deployment/notebooks/ModelDeployment.py``

- ### ML asset configs
- Root ML asset config file can be found as ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml``.
- It defines the ML config assets to be included and workspace host for each deployment target.
+ ### ML resource configs
+ Root ML resource config file can be found as ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml``.
+ It defines the ML config resources to be included and workspace host for each deployment target.
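
As a hedged sketch of that root config's shape (the bundle name, include globs, and workspace hosts below are assumptions, not the template's actual values):

```yaml
# Hedged sketch of a root databricks.yml; bundle name, include globs,
# and workspace hosts are assumptions.
bundle:
  name: my_mlops_project
include:
  - resources/*.yml
targets:
  staging:
    workspace:
      host: https://staging-workspace.cloud.databricks.com  # assumed host
  prod:
    workspace:
      host: https://prod-workspace.cloud.databricks.com  # assumed host
```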

- ML asset configs (databricks CLI bundles code definitions of ML jobs, experiments, models etc) can be found under
- ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/assets``, along with docs.
+ ML resource configs (databricks CLI bundles code definitions of ML jobs, experiments, models etc) can be found under
+ ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources``, along with docs.

You can update this component to customize the default ML pipeline structure for new ML projects in your organization,
e.g. add additional model inference jobs or modify the default instance type used in ML jobs.

When updating this component, you may want to update developer-facing docs in
- ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/assets/README.md``.
+ ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources/README.md``.

### Docs
After making customizations, make any changes needed to
template/update_layout.tmpl — 4 changes: 2 additions & 2 deletions
@@ -35,7 +35,7 @@
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `feature_engineering`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/feature_engineering`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/notebooks/TrainWithFeatureStore.py`) }}
- {{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `assets/feature-engineering-workflow-asset.yml`) }}
+ {{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `resources/feature-engineering-workflow-resource.yml`) }}
# Remove Delta and MLflow Recipes code in cases of Feature Store.
{{ else if (eq .input_include_feature_store `yes`) }}
# delta_paths
@@ -72,7 +72,7 @@
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `feature_engineering`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/feature_engineering`) }}
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `training/notebooks/TrainWithFeatureStore.py`) }}
- {{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `assets/feature-engineering-workflow-asset.yml`) }}
+ {{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `resources/feature-engineering-workflow-resource.yml`) }}
{{ end }}

# Remove utils if using Models in Unity Catalog
@@ -1,7 +1,7 @@
# CI/CD Workflow Definitions
This directory contains CI/CD workflow definitions using [Azure DevOps Pipelines](https://azure.microsoft.com/en-gb/products/devops/pipelines/),
under ``devops-pipelines``. These workflows cover testing and deployment of both ML code (for model training, batch inference, etc) and
- Databricks ML asset definitions.
+ Databricks ML resource definitions.

To set up CI/CD for a new project,
please refer to [Setting up CI/CD](<../../README.md#Setting up CI/CD>) and follow the [MLOps Setup Guide](../../docs/mlops-setup.md#Steps).
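
For flavor, a minimal stage of such a pipeline might look like the hedged sketch below; the trigger, install step, and variable name are assumptions rather than the actual devops-pipelines definitions.

```yaml
# Hedged sketch of an Azure DevOps pipeline fragment; trigger, install
# step, and variable name are assumptions.
trigger:
  branches:
    include:
      - main
steps:
  - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
    displayName: Install databricks CLI
  - script: databricks bundle validate -t staging
    displayName: Validate ML resource definitions
    env:
      DATABRICKS_TOKEN: $(STAGING_WORKSPACE_TOKEN)  # assumed variable name
```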