The Ultimate Guide to Unit Testing With dbt
How to become a DataOps unicorn with test-driven development
Do your stakeholders come to you and tell you there’s something wrong with their reports?
Do you feel anxious when you need to modify the logic of a complex dbt model you haven’t touched in months?
Do you often find that your new model doesn’t meet the requirements after you’ve spent hours working on it?
Believe it or not, most analytics and data engineers would immediately answer “Yes” to these questions. Chances are your answer is “Yes”, too.
Let’s be honest: you have a serious problem!
You work slowly and produce buggy code that doesn’t align with colleagues’ needs. People can’t rely on you to provide quality work and don’t trust you. You are a liability.
But fret not! There’s a well-known process explicitly designed to target these issues. Test-driven development with unit tests is a practice widely used by software engineers but not popular among data professionals.
By implementing unit tests, you will improve communication with your business partners. You will also be more confident and iterate faster on your tasks. You will be among the few world-class data professionals who address issues with their models upfront.
In the last issue of Data Gibberish, you learned what unit tests and TDD are. In this article, we discuss:
Short recap of the previous article;
What components unit tests need;
A step-by-step tutorial on implementing unit tests with dbt;
Improving your DataOps process.
Reading time: 10 minutes
Recap
As usual, I strongly recommend reading my last article. Yet, here's the essence:
Like most dbt community members, you only test data in production. This means you learn about mistakes after you load faulty data into your warehouse. This practice is slow and expensive, but most importantly, it erodes people’s trust in you.
Unit tests are a specific kind of test that doesn’t exist natively in dbt. They ensure that you comply with your requirements and that your models’ behaviour doesn’t break in the future.
You can reverse your process and write tests before you start writing code. This process is called test-driven development, or TDD for short. It builds on unit tests but also helps you clarify specifics and avoid design flaws upfront.
I wrapped up my last article with a promise to show you how you can implement unit tests in your dbt projects. And here we are.
Overview
I want to spread the word about unit tests as much as possible and give you a complete step-by-step plan for implementing unit tests with dbt. This is not yet another theoretical talk; this is a story. It’s precisely what allowed us to move from an ETL to an ELT stack and process in under six months.
But wait, there's more! Random snippets of wisdom and code are not enough here. I added unit tests to the famous Jaffle Shop project from dbt Labs, so you can try everything out directly without copying and pasting the code from the article.
https://github.com/ivanovyordan/jaffle_shop
Here's what you'll do if you want to steal my unit test blueprint today:
Create a CI dbt profile;
Install dbt Utils;
Add Dummy Data;
Add Expectations;
Run the Tests.
Now, let’s go a step further. Consider this issue of Data Gibberish a hands-on workshop on unit testing with dbt.
Let's get this show on the road!
Milestone 1: Create a CI dbt Profile
Most dbt projects I've seen have two targets:
dev for your local development environment;
prod to build your production data and make it accessible to the world.
Effective unit tests require an isolated environment. This isolation means no external factors affect your models' behaviour. They need a lab environment, if you will.
To create that environment, open your profiles.yml file (usually ~/.dbt/profiles.yml) and add a ci target to your project. You can find the sample profiles file in my fork of the Jaffle Shop project, and this is roughly what the CI target looks like.
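A minimal sketch with dbt-duckdb; the database paths below are illustrative, so check the fork for the exact file:

```yaml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: jaffle_shop.duckdb      # local development database
      threads: 4
    ci:
      type: duckdb
      path: jaffle_shop_ci.duckdb   # separate file, so tests never touch dev or prod data
      threads: 4
```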
In this case, I work with a DuckDB database. This means you can run the tests for free using a single file. If you work with a proper warehouse, say Snowflake, you can create a personal testing schema next to your dev schema.
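For reference, a Snowflake ci target could look something like this; every value below is a placeholder, not something copied from my project:

```yaml
    ci:
      type: snowflake
      account: <your_account>
      user: <your_user>
      # plus your usual authentication settings
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORMING
      schema: dbt_ci_<your_name>   # personal testing schema, next to your dev schema
      threads: 4
```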
Milestone 2: Install dbt utils
Before working on your unit tests, you have to install the dbt utils plugin.
dbt utils is a package with some handy macros and tests. For this step, you need the equality test. However, I have never seen a dbt project without this plugin, and I firmly believe it's a must-have no matter what.
To install it, add this line to your packages.yml file. Here's roughly what my Jaffle Shop packages file looks like.
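A minimal packages.yml with dbt_utils; the version range below is an assumption, so pin the release you actually use:

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]  # assumption: pin the release you actually test against
```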
Once you save that file, open your terminal and run this command in your project's folder:
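dbt's standard dependency command installs everything listed in packages.yml:

```bash
dbt deps
```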
Now, buckle up because this may be pretty challenging.
Milestone 3: Add Dummy Data
You can now start generating your mock input. But this is not just a random subset of your data. You need to be strategic about this piece: collect the requirements and understand what your data means.
As I told you, you should not work with your production data. Instead, you need just a handful of handcrafted examples. Think about every corner case and manually create your version of the input data.
For example, your Jaffle Shop customers can split an order across multiple payments. Here's the dummy data, with order 9 split into two payments.
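An illustrative excerpt, following the raw_payments columns from the original Jaffle Shop seeds; the actual fixture in the repo may use different rows:

```csv
id,order_id,payment_method,amount
1,1,credit_card,1000
2,2,coupon,500
3,3,bank_transfer,1700
4,9,credit_card,1500
5,9,gift_card,700
```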
You need to create a test directory in your data folder (or wherever you store seeds) and save your fixtures there as CSV files. But remember: your seeds must have the same names as your raw tables. And again, you can find the actual files on GitHub.
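One possible layout, assuming the classic Jaffle Shop seed names (adjust the folder names to your own project):

```
data/
└── test/
    ├── raw_customers.csv
    ├── raw_orders.csv
    └── raw_payments.csv
```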
But we are not through yet. There's one problem you need to handle with your mock data.
If you run dbt seed, you will create tables in your data warehouse with the names of those files. You want this data to be available only when you run your tests, not to mess with your production data. To do that, you'll benefit from the ci profile from the previous section.
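One way to keep the fixtures isolated, assuming the ci target from Milestone 1, is to load the seeds only against that target:

```bash
# Load the fixture seeds into the isolated CI environment only
dbt seed --target ci
```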