Databricks Workflow Jobs are powerful tools for orchestrating data pipelines, analytics tasks, and other workloads. Anyone working with multiple Databricks environments (Dev, UAT, Prod) quickly realizes the need for seamless workflow promotion between environments, as well as robust version control for workflow definitions.
Naturally, Git repositories — whether GitHub, Azure DevOps, or another flavor — are the go-to solution everybody wants to use.
Databricks and T1A recommend the “Jobs-as-Code” approach with Databricks Asset Bundles, which packages workflow definitions (YAML files) alongside the underlying notebook code. This method, deployable via the CLI, integrates smoothly into CI/CD pipelines, making it a mature, native, and automation-friendly solution. We actively use Databricks Asset Bundles in some of our larger projects with great success.
However, in our experience, some smaller or less technically mature client teams have found the CLI-based, YAML-first Databricks Asset Bundles intimidating or overwhelming, often resorting to manual copy/paste of Workflow JSON definitions between environments.
Recently, Databricks introduced the Apps functionality, which allows deploying 3rd-party or partner-provided web apps inside the Databricks Workspace. These apps can offer rich, interactive UIs and handle background tasks natively within the workspace.
We set out to experiment with Databricks Apps to create a user-friendly, interactive UI that streamlines Workflow Jobs synchronization with Git Repositories. The goal? A simplified solution for teams not yet ready to adopt Databricks Asset Bundles but still needing structured version control and cross-environment promotions for job definitions.
We designed a Databricks App (planned for release as open source) that provides an intuitive Web UI for exporting Workflow Job definitions to a Git repository and importing them into target environments.
A Data Engineering development team works in the DEV environment, where they create and modify notebooks, write SQL queries, and configure orchestration Workflow Jobs - including adding new jobs, updating existing ones, and removing obsolete ones.
A Tech Lead wants to commit all Workflow Job changes to a Git Folder to ensure these updates can be replicated and promoted to UAT and then PROD.
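To make the export half of this scenario concrete, here is a minimal sketch of what exporting job definitions through the Databricks REST API can look like, using the official databricks-sdk for Python. The jobs/ folder layout and file naming are illustrative assumptions, not the app’s actual implementation.

import json
from pathlib import Path

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up workspace credentials from the environment

export_dir = Path("jobs")  # hypothetical Git folder for job definitions
export_dir.mkdir(exist_ok=True)

for base_job in w.jobs.list():
    # fetch the full definition, including all task and cluster settings
    job = w.jobs.get(job_id=base_job.job_id)
    target = export_dir / f"{job.settings.name}.json"
    target.write_text(json.dumps(job.settings.as_dict(), indent=2))

Committing the resulting JSON files to a Git Folder gives the team a reviewable, diff-able history of every workflow change.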
Deployment instructions are provided in the README.md of the GitHub repo.
The app needs to be deployed separately in each Databricks Workspace where you plan to synchronize Workflow Jobs with a Git Repo via exporting or importing.
CAUTION: Databricks Apps run on always-on compute, which can lead to high costs. Our recommended approach is to keep the app stopped when not in use and start it manually only for synchronization sessions.
If or when Databricks implements an auto-pause/auto-resume feature for rarely used apps (such as this one), it would help optimize costs. Until then, manual intervention is required.
This App can also run outside the Databricks environment as a Docker container — whether in local Docker or any managed container environment such as Azure Container Apps. The deployment environment must handle user authentication and TLS termination, since these are not built into the app itself (the Databricks environment provides them natively).
When deploying as a standalone container (outside of Databricks Apps), additional environment variables such as the Databricks Host and Token must be set so the app can communicate with the Databricks REST APIs. See README.md for details.
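For illustration, here is a minimal sketch of how a standalone deployment might construct its Databricks API client from environment variables. DATABRICKS_HOST and DATABRICKS_TOKEN are the standard names read by the Databricks SDK; the variables the app actually expects are documented in its README.md.

import os

from databricks.sdk import WorkspaceClient

# Outside of Databricks Apps, the platform no longer injects credentials,
# so host and token must be supplied explicitly (here via environment variables).
w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],    # e.g. https://<workspace>.azuredatabricks.net
    token=os.environ["DATABRICKS_TOKEN"],  # PAT for the single workspace this instance serves
)

print(w.current_user.me().user_name)  # simple connectivity check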
Running the app as an Azure Container App can be significantly cheaper than an always-on Databricks App, based on current pricing. This is relevant unless or until Databricks introduces an auto-stop/auto-resume feature for apps.
You will still need to deploy one copy of the app per Databricks Workspace environment, since each deployed instance points at a single Databricks Workspace where it facilitates either the Export or the Import process (rarely both).
When promoting workflows between environments, some job configuration aspects differ and require overrides, such as compute resources (cluster and SQL warehouse names) and Run As principals.
At this time (v0.2), the App supports configurable mapping overrides on the receiving side (for Importing) only for Compute Resources and Run As principals. The Job Definition files in the Git Repo reflect the original workflow configuration of the DEV environment (as exported).
Administrators of the target (importing) environment manage overrides via a Resource Name Mapping File, maintained as a Workspace File in each target environment.
Example resource name mapping file:
1{"compute_name_mappings":
2 {"cluster_name_mappings":
3 {"bi-users-dev-cluster":
4 "bi-users-prod",
5 "rajesh-dev-cluster":
6 "etl-prod",
7 "unknown":
8 "etl-prod"
9 }
10 "warehouse_name_mappings":
11 {"starter-dev-warehouse":
12 "etl-prod-warehouse-small",
13 "unknown":
14 "etl-prod-warehouse-small"
15 }
16 },
17 "run_as_mappings":
18 {"developer.lastname@company.com":
19 {"user_name":
20 "super.admin@company.com"
21 },
22 "1234d931-d019-48a3-b606-431cc316ecdd":
23 {"user_name":
24 "super.admin@company.com"
25 },
26 "another.developer@company.com":
27 {"service_principal_name":
28 "692bc6d0-ffa3-11ed-be56-0242ac120002"
29 },
30 "default":
31 {"service_principal_name":
32 "692bc6d0-ffa3-11ed-be56-0242ac120002"
33 }
34 }
35}
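To show how such a file drives the import-side overrides, here is a minimal sketch of the lookup logic it implies: match the exported (DEV) resource name, and fall back to the "unknown" / "default" entries when no explicit mapping exists. This mirrors the file structure above; the app’s actual implementation may differ.

import json

# hypothetical path; the file lives as a Workspace File in each target environment
with open("resource_name_mappings.json") as f:
    mappings = json.load(f)

def map_cluster_name(dev_name: str) -> str:
    m = mappings["compute_name_mappings"]["cluster_name_mappings"]
    return m.get(dev_name, m["unknown"])

def map_run_as(dev_principal: str) -> dict:
    m = mappings["run_as_mappings"]
    return m.get(dev_principal, m["default"])

print(map_cluster_name("rajesh-dev-cluster"))  # -> etl-prod
print(map_run_as("someone.else@company.com"))  # -> {"service_principal_name": "..."}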
If this approach resonates with you, we invite you to deploy the app in your own Databricks Workspace. It’s open source, and you can find the GitHub repository linked below:
Have any issues, bugs, or feature requests? Feel free to open an issue in the GitHub repository, and we’ll do our best to address it promptly!
Databricks Apps offer a powerful mechanism for building user-centric tools with rich, modern, interactive UIs that handle complex background tasks within the Databricks platform.
Databricks Asset Bundles (DABs) remain the officially recommended, native, CI/CD-friendly approach for Git version control and promotion of Workflows, notebook code, and other bundled assets, provided Data Engineering teams are technically mature and capable of using them.
➡️ This app is functional (as a first version) and can be used for Git version control and promotion of workflows by less technically mature teams that are not quite ready for DABs.
💡 There is still room for further improvement of the app, which we may address in subsequent versions, subject to client interest.
💡 We also identified several suggestions for Databricks Apps themselves, i.e., capabilities we wish the vendor would add or improve at the platform level: