How to get started with dbt and Gitpod

Jul 12, 2023

Eric O'Rear (@ejoreo on GitHub)

dbt helps data teams by enabling faster data transformation, providing organized and transparent workflows, and ensuring reliable data sets. As datasets grow larger and more complex, adopting tools like dbt brings organizations up against many of the same development challenges that software engineers have traditionally faced.

One of these challenges is managing the analytics engineers’ development environments across different use cases, workload configurations, versions, and so on. By using a cloud development environment like Gitpod for your dbt projects, you can ensure your data and analytics engineers are always working in a secure and reproducible context.

With this guide you will learn how to:

  • Create a Gitpod cloud development environment
  • Configure, establish, and test connections between Gitpod, dbt, and your data warehouse
  • Customize your workspace IDE

Requirements

To follow along, you will need:

This guide uses the dbt + BigQuery and dbt + Snowflake templates as examples, but any cloud setup that dbt Core supports in a local dev environment can also be run in a Gitpod workspace.

Turn your dbt project into a Gitpod workspace

The fastest way to open up your dbt project in a Gitpod workspace is to prefix the GitHub/GitLab/Bitbucket url in the browser with “gitpod.io/#”.
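
As a minimal sketch of that prefixing step, the helper below builds the workspace URL from a repository URL. The function name gitpod_url and the example repository are hypothetical, used only for illustration.

```shell
#!/usr/bin/env bash
# Build a Gitpod workspace URL by prefixing a repository URL with "https://gitpod.io/#".
gitpod_url() {
  printf 'https://gitpod.io/#%s\n' "$1"
}

# Hypothetical repository URL, used only for illustration.
gitpod_url "https://github.com/example-org/example-dbt-project"
```

Opening the printed URL in a browser has the same effect as typing the prefix into the address bar by hand.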

You will be prompted to confirm the Context URL (your Git repo), IDE of choice, and the workspace class. We recommend selecting VS Code (browser or desktop) for dbt projects because of the capabilities of the vscode-dbt-power-user extension. As for provisioning, the heavy processing in a dbt project runs in the cloud data warehouse rather than in the workspace, so the Standard workspace class is sufficient for most use cases.

Automate and standardize your dbt development environments

The first step is to add a .gitpod.yml file to the root of the repository. This file describes workspace configurations, including:

  • The installation of languages and dependencies
  • The configuration of the terminal(s) and opened ports
  • The installation of extensions in the IDE

A .gitpod.yml file can be added manually, or a boilerplate version can be generated by running gp init.

Gitpod uses Docker images as the foundation for instances of development environments, or what we refer to as workspaces. The default workspace image for Gitpod contains support for multiple languages, such as Go, Java, Python, and JavaScript, but you can also use slimmer images or specify your own.

While the specifics will change depending on the data platform, the .gitpod.Dockerfile file is where you will pull your Gitpod workspace image, set the path of your dbt profiles directory, and install your requirements.

Like the .gitpod.yml file, .gitpod.Dockerfile needs to be added to the root of the repository. Here is an example .gitpod.Dockerfile, consistent across both the dbt + BigQuery and dbt + Snowflake templates:

# Use Gitpod's latest Python image.
FROM gitpod/workspace-python:latest

# Set the path of dbt's profiles file.
ENV DBT_PROFILES_DIR=./profiles/

# Copy requirements file from host into Container.
COPY requirements.txt /tmp

# Install the requirements.
RUN cd /tmp && pip install -r requirements.txt

With the standard Python image pulled, the dbt profiles path set, and the requirements installed, dbt is ready to be configured. For these examples, the only requirement is the matching dbt adapter for your warehouse.
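
As a rough sketch of what that single requirement might look like, the script below writes a one-line requirements.txt for a chosen warehouse. The helper names adapter_package and write_requirements are hypothetical. The package names dbt-bigquery and dbt-snowflake are the adapters published on PyPI. No version pin is shown because the guide does not specify one; in practice you would likely pin a version.

```shell
#!/usr/bin/env bash
# Map a warehouse name to the dbt adapter package to install.
adapter_package() {
  case "$1" in
    bigquery)  echo "dbt-bigquery" ;;
    snowflake) echo "dbt-snowflake" ;;
    *) echo "unsupported warehouse: $1" >&2; return 1 ;;
  esac
}

# Write a one-line requirements file for the chosen warehouse.
# The output path is an argument so the sketch can be tried in a scratch directory.
write_requirements() {
  local warehouse="$1" path="$2" package
  package=$(adapter_package "$warehouse") || return 1
  printf '%s\n' "$package" > "$path"
}

# Try it in a temporary directory instead of overwriting a real requirements.txt.
scratch_dir=$(mktemp -d)
write_requirements snowflake "$scratch_dir/requirements.txt"
cat "$scratch_dir/requirements.txt"
```

In a real project the file would sit at the root of the repository, where the COPY instruction in the .gitpod.Dockerfile picks it up.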

In the following .gitpod.yml examples, the .gitpod.Dockerfile configured above will be called first, installing languages and dependencies.

Each object in the tasks section creates a new terminal in the development environment. In our examples, a terminal named connect executes three commands to complete and test the dbt configuration:

.gitpod.yml
# BigQuery
image:
    file: .gitpod.Dockerfile

ports:
    - port: 8080
      onOpen: open-preview

tasks:
    - name: connect
      command: |
          echo "$DBT_SERVICE_ACCOUNT" > "$GITPOD_REPO_ROOT/profiles/service_account.json"
          dbt debug
          dbt deps
      openMode: split-left
    - name: generate docs
      command: |
          dbt docs generate
          dbt docs serve --no-browser --port 8080
      openMode: split-right
.gitpod.yml
# Snowflake
image:
    file: .gitpod.Dockerfile

ports:
    - port: 8080
      onOpen: open-preview

tasks:
    - name: connect
      # The private SSH key is stored in a single line as DBT_SNOWFLAKE_PRIVATE_KEY.
      # Unfortunately, Snowflake will only accept the key if it is multi-line.
      # The sed command transforms the key
      # and then stores it as a file, which can be processed by Snowflake.
      command: |
          echo "${DBT_SNOWFLAKE_PRIVATE_KEY}" | sed -e "s/-----BEGIN PRIVATE KEY-----/&\n/" -e "s/-----END PRIVATE KEY-----/\n&/" -e "s/\S\{64\}/&\n/g" > $GITPOD_REPO_ROOT/profiles/private_key.p8
          dbt debug
          dbt deps
      openMode: split-left
    - name: generate docs
      command: |
          dbt docs generate
          dbt docs serve --no-browser --port 8080
      openMode: split-right
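
To see what the sed command in the Snowflake connect task does, here is a small sketch that runs the same three expressions on a placeholder key. The function name expand_key and the 70-character body are made up for illustration. The sketch assumes GNU sed, since \S and \n in the replacement are GNU extensions.

```shell
#!/usr/bin/env bash
# Expand a single-line PEM key into multi-line form, using the same sed
# expressions as the connect task above (GNU sed assumed).
expand_key() {
  sed -e "s/-----BEGIN PRIVATE KEY-----/&\n/" \
      -e "s/-----END PRIVATE KEY-----/\n&/" \
      -e "s/\S\{64\}/&\n/g"
}

# Placeholder body of 70 "A" characters; this is NOT a real key.
fake_body=$(printf '%70s' '' | tr ' ' 'A')
single_line="-----BEGIN PRIVATE KEY-----${fake_body}-----END PRIVATE KEY-----"

# Prints four lines: the header, 64 characters, the remaining 6, and the footer.
echo "$single_line" | expand_key
```

The first two expressions put the header and footer on their own lines, and the third breaks the body into 64-character lines.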

Following the reference to the custom Docker image, your dbt credentials need to be passed into Gitpod so that the workspace can connect to your data platform. The most convenient way of making auth credentials accessible inside of the workspace is using Gitpod’s user-specific environment variables.

The dbt debug command tests the connection with the database. When executing this, dbt searches for the credentials to connect with the database in the profiles.yml file, shown here:

profiles.yml
# BigQuery
default:
    target: dev
    outputs:
        dev:
            type: bigquery
            method: service-account
            project: "{{ env_var('DBT_PROJECT') }}"
            dataset: "{{ env_var('DBT_DEV_DATASET') }}"
            threads: 4
            keyfile: "{{ env_var('GITPOD_REPO_ROOT') }}/profiles/service_account.json"
            location: "{{ env_var('DBT_LOCATION') }}"
profiles.yml
# Snowflake
default:
    target: dev
    outputs:
        dev:
            type: snowflake
            account: "{{ env_var('DBT_SNOWFLAKE_ACCOUNT') }}"
            user: "{{ env_var('DBT_SNOWFLAKE_USER') }}"
            private_key_path: "{{ env_var('GITPOD_REPO_ROOT') }}/profiles/private_key.p8"

            database: "{{ env_var('DBT_SNOWFLAKE_DB') }}"
            warehouse: "{{ env_var('DBT_SNOWFLAKE_WH') }}"
            schema: "{{ env_var('DBT_SNOWFLAKE_SCHEMA') }}"

This file references environment variables that each user must set. Once the configuration has been added to the repository, setting these variables is the only manual step required to launch a functional dbt dev environment, and it only needs to be done once.
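
Before running dbt debug, it can help to confirm that every variable is actually present in the workspace. The following is a minimal sketch, assuming a bash shell. The helper missing_vars is hypothetical, and the variable names are the ones used in the BigQuery example in this guide.

```shell
#!/usr/bin/env bash
# Print the name of every variable in the argument list that is unset or empty.
# Returns non-zero if at least one is missing.
missing_vars() {
  local name status=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Variables referenced by the BigQuery example in this guide.
BIGQUERY_VARS=(DBT_PROJECT DBT_DEV_DATASET DBT_LOCATION DBT_SERVICE_ACCOUNT)

if missing=$(missing_vars "${BIGQUERY_VARS[@]}"); then
  echo "All BigQuery variables are set."
else
  # Left unquoted on purpose so the names print on a single line.
  echo "Missing variables:" $missing
fi
```

For a Snowflake project, the list would instead hold the DBT_SNOWFLAKE_* names from the profile above plus DBT_SNOWFLAKE_PRIVATE_KEY. Assuming the gp CLI is available in your workspace, a missing variable can typically be added with gp env NAME=value, or through the user settings in the Gitpod dashboard.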

After the connection has been tested successfully, the workspace is ready to be used.

Customize VS Code and Git for dbt + Gitpod

The .gitpod.yml file also allows you to describe IDE extensions and configurations.

We recommend using VS Code for dbt projects in Gitpod workspaces. While VS Code is not ideal for dbt development out of the box, several extensions greatly improve the experience, most notably vscode-dbt-power-user. Some of this extension’s best features are:

  • Autocompletion of dbt models
  • The ability to preview model results in VS Code
  • The ability to display model lineage
  • The ability to run and test dbt models from VS Code’s UI

For syntax highlighting, we recommend jinjahtml.

Beyond your IDE, the .gitpod.yml file also gives you the opportunity to configure prebuilds for GitHub repositories. Prebuilds can install dependencies and run builds before a workspace opens, especially helpful for code bases that are large or can’t be compiled directly. Check the documentation for a more detailed look at these options.

For a basic set of recommended extensions and GitHub prebuild configurations, you can add the following to your .gitpod.yml file:

.gitpod.yml
# Same for both BigQuery and Snowflake projects
vscode:
    extensions:
        - ms-python.python
        - mechatroner.rainbow-csv
        - innoverio.vscode-dbt-power-user
        - ms-toolsai.jupyter
        - ms-toolsai.jupyter-keymap
        - ms-toolsai.jupyter-renderers
        - ms-toolsai.vscode-jupyter-cell-tags
        - ms-toolsai.vscode-jupyter-slideshow
        - samuelcolvin.jinjahtml

github:
    prebuilds:
        master: true
        branches: true
        pullRequests: true
        pullRequestsFromForks: false
        addCheck: true
        addComment: false
        addBadge: false

You can preview your configs by running gp validate. For any workspace configuration options to persist, you must commit the .gitpod.yml and .gitpod.Dockerfile to the root of the repository and start a new workspace (a workspace restart is not sufficient). Once committed, configs become available to other users launching the workspace.

Try Gitpod today.
