The Ultimate Database Guide: Choosing Relational, Document, or Time Series to Drive Success
Explore the strengths and use cases of relational, document, and time series databases—and learn how to select the best option to power your project's success.
Last week, I promised you to discuss how you can leverage AI to speed up your coding. This turned into a series. I need one more week to polish that. Today, we have something else from my pipeline.
Greetings, curious reader,
Picking the right database can make or break a project. Especially when dealing with analytics. You've got plenty of options—relational databases, document stores, time series databases, data lakes, and data warehouses.
Each one has its strengths, and each has its limits. Choosing wisely can mean faster queries, lower costs, and a solution that scales with you instead of holding you back.
In this guide, I will walk you through 5 database types, how they structure data, and where they shine or fall flat. By the end, you'll have a solid idea of which fits your needs best—and why.
Also, check at the end of the article. I created an interactive questionnaire to help you pick the best solution for your next project.
Reading time: 8 minutes
Option #1: Relational Databases
Note: Here, I mean traditional OLTP databases like MySQL and PostgreSQL.
Relational databases organise data in tables with rows and columns, where each table has a fixed schema. Tables can reference each other through primary and foreign keys, letting you define relationships between data entities.
This structure works best for data with clear relationships. That approach is reliable for applications where data consistency matters. Plus, relational databases support SQL, which is powerful and familiar to most people.
Imagine an e-commerce system. You'd have tables for customers, orders, and products. Each table has a specific schema. You can link them up through relationships. Here's a simple layout:
Story: Why I Chose PostgreSQL Over a Data Warehouse
Here's a real-world example: I worked on a project where the stakeholders thought they needed a full-blown data warehouse. Their research showed that big companies use data lakes or warehouses. They were convinced this was the way to go.
They only had a few gigabytes of data and no plans to grow too fast. So, I convinced them to go with PostgreSQL.
My solution proved to be cost-effective, easy to manage and performed great. Plus, the team already knew SQL, so they didn't have to learn new tools. This setup has been serving them well for years with no problems.
Why Relational Databases Are Good for Structured Data
Data Integrity: Primary and foreign keys keep relationships in check so your data stays clean.
Complex Queries: SQL lets you write powerful queries with joins, aggregations, and more.
ACID Compliance: Transactions are reliable, which is essential for data accuracy.
Ideal Use Cases
Relational databases are best for applications with structured data and clear relationships, like e-commerce, financial systems, or customer management. PostgreSQL is a fantastic choice because it's solid, open-source, and supports advanced SQL.
Option #2: Document Databases
Document databases store data in JSON-like documents, which makes them flexible. This means fields can vary from one document to the next, which isn't possible in a relational database. You can also nest data within documents, so complex, hierarchical data fits naturally.
Each record is a self-contained document, so you don't need a fixed schema. This flexibility is handy if your data structure changes over time.
Example Structure: Product Catalogue
In a product catalogue, each product document could have a different structure, depending on the type of product.
Story: Using MongoDB for a Chat App
A few years back, although it feel like ages, I built a chat app with a Node.js backend. MongoDB was perfect here because each chat room had different metadata, users, and messages. With MongoDB's JSON document model, I didn't need to worry about tables and joins. Plus, it integrated easily with Node.js.
But let's be clear: MongoDB is not suitable for analytics. It can't handle complex queries or joins effectively. Don't even think about using MongoDB if you're dealing with analytical workloads. For anything beyond flexible, semi-structured data, I'd say "No."
Why Document Databases Are Good for Semi-Structured Data
Schema Flexibility: Useful when data structure changes often.
Nested Data Support: You can store hierarchical data without needing multiple tables.
Horizontal Scalability: Built to scale out across servers, making it good for distributed apps.
Ideal Use Cases
Document databases are suitable for content management, product catalogues, and user profiles. They're flexible but fall short on heavy analytics or complex querying. MongoDB can work well in specific cases but isn't a go-to for analytics. But I must say relational databases are catching up.
Option #3: Time Series Databases
Time series databases are built for timestamped data. That means data in a time series database is indexed by time.
This structure allows efficient time-based queries and operations, like aggregating data over specific intervals or downsampling older data.
Time series databases are excellent for applications where you need to track metrics or events over time.
Example Structure: Sensor Data
Here's a typical table in TimescaleDB for sensor data, where each reading is recorded with a timestamp.
Story: My Current TimescaleDB Project
I've used time series before, but the project I'm currently working on is the most interesting. I'm using TimescaleDB with a client who needs to track historical data and Change Data Capture (CDC). Timescale makes querying over time periods super easy, and it's fast.
I'll cover this project in more detail in another Data Gibberish issue. However, TimescaleDB has quickly become my go-to for any project with a heavy time component.
Why Time Series Databases Are Good for Time-Based Data
Efficient Storage for Time-Based Data: Stores data by time, which reduces storage needs.
Optimised for Time-Based Queries: Ideal for aggregations over time intervals.
Automatic Downsampling: Many time series databases handle downsampling to keep storage costs down.
Ideal Use Cases
Time series databases are ideal for monitoring IoT applications and financial data where you need to track changes over time. TimescaleDB is an excellent choice, combining SQL with time series optimisations.
Option #4: Data Warehouses
Data warehouses like ClickHouse and Snowflake use columnar storage. Rather than storing rows together, each column is stored separately.
This structure is ideal for analytical queries that only need specific fields from each row. These databases are optimised for high-performance querying and aggregation.
Example Structure: Sales Data in Columnar Storage
In a columnar data warehouse, sales data is stored by columns, making it efficient to access metrics for analytics:
Why Data Warehouses Are Good for Analytics
Efficient Aggregations: Columnar storage is fast for aggregation-heavy queries.
Separation of Storage and Compute: Cloud warehouses like Snowflake let you scale storage and compute independently.
Batch Processing: Data warehouses are perfect for handling large-scale data processing in batches.
Ideal Use Cases
Data warehouses work best for business intelligence, reporting, and big analytical queries. Snowflake is my favourite choice—it's easy to scale, fast, and works well with SQL.
Did you know? I wrote an extensive Snowflake learning guide. And you can have this for free!
You only need to share Data Gibberish with 5 friends or coworkers and ask them to subscribe for free. As a bonus, you will also get 3 months of Data Gibberish Pro subscription.
Option #5: Data Lakes
Data lakes are designed to store massive amounts of raw data in their original format. Built on object storage, like S3, data lakes let you store everything—from structured to unstructured data—without worrying about schemas.
You will often see data stored in a directory structure. Each file can be in any format — CSV, JSON, Parquet — making data lakes highly flexible.
Example Structure: Customer Interaction Logs in S3
Each log file could be a JSON document, making storing data easy without much setup.
Story: My Experience with S3 as a Data Lake
You've probably used S3 as a data lake if you work in data. It's simple—no complex setup. Just drop your files in. And if you're working with pandas or Spark, it integrates seamlessly.
S3 is my go-to for raw data storage. It's cheap, reliable, and works with any analytics tool you'd want.
Why Data Lakes Are Good for Raw Data Storage
Flexible Storage: Can handle any format—structured, unstructured, or semi-structured.
Cost-Effective: S3 and similar object storage are cheap for large datasets.
Lakehouse Capabilities: With tools like Apache Iceberg, data lakes can add query capabilities, blending storage with analytics.
Ideal Use Cases
Data lakes are perfect for big data storage, machine learning, and storing raw data. Lakehouse technologies can add SQL querying, which makes data lakes like S3 even more powerful.
If you have enjoyed the newsletter so far, please share the summary on LinkedIn. It really does help!
Final Thoughts
We're seeing more hybrid solutions popping up, with lakehouses leading the charge. Lakehouses mix the flexibility of data lakes with the analytical power of data warehouses.
Open table formats like Apache Iceberg are making them more practical and accessible. Now, you can run SQL on massive, unstructured data without complex ETL. And this is a game-changer for analytics.
On top of that, the line between OLAP (analytical) and OLTP (transactional) is getting blurry. More solutions are supporting both.
We're seeing databases that handle batch and streaming data in the same setup. Databases are getting more versatile, and hybrid architectures are becoming the norm.
So, where does that leave you? It's still early, but the world is moving fast. What has been working for decades may be obsolete in a few years.
My recommendation is to keep up with the news and experiment when possible. You never know what the best choice for your new project would be.
To make it easier, I built a short questionnaire for you. Answer a few questions to learn which solution will work best for you.
Summary
To recap:
Relational Databases (PostgreSQL): Great for structured data with relationships.
Document Databases (MongoDB): Flexible for semi-structured data but not for analytics.
Time-series Databases (TimescaleDB): Optimised for timestamped data like monitoring and IoT.
Data Warehouses (Snowflake): Built for high-performance analytics on structured data.
Data Lakes (S3): Ideal for raw data storage with lakehouse options for querying.
Choose based on your data structure, scalability needs, and query requirements. For most analytics, a mix of relational databases and data warehouses (or data lakes with lakehouse features) will give you a balanced, powerful setup.
Did you like this article? I have a series on how to use AI to do data engineering. We will start with simple tricks and finish with an approach that removes over 80% of manually written code. Stay tuned!
Until next time,
Yordan
Join the Community
I believe that weekly newsletters were insufficient. A group of leaders and I launched an exclusive Discord community where you can dive into vibrant discussions on software and data engineering, leadership, and the creator economy.
Join today, and let's supercharge our professional journeys together!
How Am I Doing?
I love hearing you. How am I doing with Data Gibberish? Is there anything you’d like to see more or less? Which aspects of the newsletter do you enjoy the most?
Use the links below, or even better, hit reply and say “Hello”. Be honest!
Key to “data warehouse” is the star schema table design. Still relational, just a remodeling of the OLTP database designs. The columned storage just makes it faster.