Announcements & April 2024 Recap
Learn what’s new with Data Gibberish and what you missed last month
Hi there,
I am starting a new initiative here. In addition to the regular weekly articles, you will receive a monthly recap.
Starting today, every first Monday of the month, you will receive a summary of what I published the previous month, so you can catch up on anything you missed.
Don't worry! You can always unsubscribe from the monthly or weekly newsletters.
But wait, there is more!
Going Paid
Starting in June, I will enable payments for Data Gibberish. Here's a quick FAQ on what will change:
Do I need to pay to read the articles?
No! You will still receive most, if not all, articles for free in your inbox. But I will lock every article, aside from a few handpicked ones, one month after I publish it.
What benefits can paid subscribers expect?
You will still get a ton of value for free, but here are some benefits if you choose to support my newsletter:
Access to the entire archive
FREE templates and resources
Monthly Ask Me Anything posts
What about templates?
Most templates you currently get for free will be locked behind a paywall. You can get them by subscribing to Data Gibberish or clicking the Gumroad links.
When will the paid option take effect?
As I said, we are going live in June. This means you have one month to save any articles, templates, and resources that are available for free. That said, you can pledge your support now.
Anything else I need to know?
Yes! I will message every one of you who has pledged support so far. I don't want you to be surprised by any charges you may have forgotten.
Also, check out the Founding Member plan. You'll want to take advantage of the bonus for this level of support.
Lastly, there will be an option to get a paid subscription for free. Here's how.
Referral Program
As part of the paid offering, you can earn referral rewards. You don't need to be a paid member of Data Gibberish to take part, and you can earn a membership by referring your friends or coworkers.
The Plan
10 referrals: 1 month of complimentary paid access
25 referrals: A freebie based on your votes
100 referrals: A free 20-minute mentoring video call
What do you mean by votes?
Lately, I have been working on a few ideas to help you boost your data engineering skills. But I don't want to give you something generic. You deserve a reward that fits your needs.
So, I have created a short survey to help me learn more about you. Hurry up because you have only two weeks to vote!
That's all. Now, on to the monthly summary.
Self-Service BI Is a Lie: 3 Problems You Can Resolve Today And Improve It
The idea behind self-service BI is to empower business users to query data, build reports, and explore insights. However, there are significant challenges that often lead to poor decisions and loss of trust in the data:
Inconsistent metrics and definitions across teams
Lack of scalability as data complexity grows
Inadequate data literacy among users
Solutions to Enable Effective Self-Service BI
To overcome these challenges and enable a degree of self-service BI that works, focus on:
Establishing a data governance committee to align on standard definitions and calculations for critical metrics. Document these in a shared data dictionary.
Adopting a semantic layer to manage business logic. Create a curated data model with friendly naming that users can self-serve from while keeping data consistent.
Investing in data literacy training for everyone. Teach users how to interpret data, understand common pitfalls, and know when to ask for help. Make data competency part of the culture.
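To make the shared data dictionary idea more concrete, here is a minimal sketch of what a version-controlled metric definition could look like. Everything in it (the MetricDefinition class, the field names, the example metric) is illustrative rather than a prescribed standard.

```python
# Illustrative only: one way to keep agreed metric definitions in code.
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    name: str         # business-friendly name users see in BI tools
    owner: str        # team accountable for keeping the definition current
    sql: str          # the single agreed-upon calculation
    description: str  # plain-language meaning, including known caveats

ACTIVE_CUSTOMERS = MetricDefinition(
    name="active_customers",
    owner="data-governance-committee",
    sql=(
        "SELECT COUNT(DISTINCT customer_id) FROM orders "
        "WHERE ordered_at >= CURRENT_DATE - INTERVAL '30' DAY"
    ),
    description="Distinct customers with at least one order in the last 30 days.",
)
```

The point is less the exact format and more that every team reads the definition from the same place instead of re-deriving it in their own dashboards.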
Key Takeaways
Absolute self-service BI is a myth. It requires a solid foundation of proper infrastructure, governance, and education.
Standardise KPI definitions, build a scalable data architecture and invest in data literacy to enable effective self-service.
Self-service BI is a spectrum, not all-or-nothing. The degree to which you can enable it depends on the strength of your data foundations.
Accept that self-service success requires collaboration between data teams and business users, plus hard work to strengthen the critical pillars.
Focusing on these fundamentals can empower users with a pragmatic level of self-service BI that drives accurate results for the organisation. It takes effort, but building that solid data foundation is worth the investment.
Taps & Targets: Simplify ETL Through Singer's Data Pipeline Blueprint
Singer is an open-source standard for building data integration pipelines. It defines how to write scripts that move data between various sources and destinations, such as databases, APIs, and files.
Components of Singer
Singer pipelines consist of two main independent components:
Taps: Scripts that extract data from sources and output it in a standardised JSON format.
Targets: Scripts that consume data from taps and load it into destinations.
Taps and targets communicate through a pipe, with the tap sending data to the target.
Messages in Singer
To ensure compatibility and smooth data flow, Singer uses specific message types:
Schema: Defines each table or data stream's structure, fields, and data types.
Record: A single row or document of data conforming to the defined schema.
State: Keeps track of the last successful data sync point, allowing taps to resume extraction from where they left off.
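To show how those three message types fit together, here is a minimal, hypothetical tap written in Python. The stream name, fields, and bookmark key are made up; a real tap would pull this data from an API or database.

```python
# A toy Singer tap: emits SCHEMA, RECORD, and STATE messages as JSON lines on stdout.
import json
import sys

def emit(message: dict) -> None:
    sys.stdout.write(json.dumps(message) + "\n")

# SCHEMA: declares the structure of the "users" stream
emit({
    "type": "SCHEMA",
    "stream": "users",
    "schema": {"properties": {"id": {"type": "integer"}, "email": {"type": "string"}}},
    "key_properties": ["id"],
})

# RECORD: a single row conforming to the schema above
emit({"type": "RECORD", "stream": "users", "record": {"id": 1, "email": "ana@example.com"}})

# STATE: a bookmark so the next run can resume where this one stopped
emit({"type": "STATE", "value": {"bookmarks": {"users": {"last_id": 1}}}})
```

Because taps and targets only share this JSON protocol, wiring them together is just a shell pipe, for example `python my_tap.py | target-jsonl`.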
Key Takeaways
Singer simplifies the creation of data integration pipelines by providing a standard format for data extraction and loading scripts. With its growing ecosystem of taps and targets and an active community, Singer is an excellent choice for organisations looking to streamline their data integration processes.
AWS for Data Engineers: Conquer the Cloud in 90 Days
AWS is a great way to scale your data platform, and I created a learning plan for you. The plan is designed to help you become competent in AWS within 90 days. It focuses on core services and building an end-to-end data platform.
Days 1-30: AWS Basics
Build a Data Lake on S3: Learn S3 basics (buckets, objects, storage classes, permissions)
Analyse Data with Athena: Learn Athena basics (databases, tables, querying data); see the sketch below
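As a taste of what the first month builds towards, here is a hedged sketch of running an Athena query over files sitting in S3 with boto3. The bucket, database, and table names are placeholders, and Athena also needs an S3 location for query results.

```python
import boto3

athena = boto3.client("athena", region_name="eu-west-1")

response = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) AS events FROM raw_events GROUP BY event_type",
    QueryExecutionContext={"Database": "my_data_lake"},  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/queries/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution() until the query completes
```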
Days 31-60: Intermediate Skills
Data Processing Pipelines with Glue and Kinesis: Learn Glue basics (crawlers, jobs, workflows) and Kinesis basics (streams, shards, producers, consumers); a small producer sketch follows this block
Data Warehousing with Redshift: Learn Redshift basics (clusters, nodes, distribution styles)
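For the Kinesis part of this block, a producer can be as small as the sketch below. The stream name and event payload are made up, and the stream is assumed to exist already.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="eu-west-1")

event = {"user_id": 42, "action": "page_view"}
kinesis.put_record(
    StreamName="clickstream-events",        # placeholder stream
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=str(event["user_id"]),     # determines which shard receives the record
)
```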
Days 61-90: Advanced Topics
Big Data Processing with EMR: Learn EMR basics (clusters, nodes, steps)
Cost Optimisation: Learn to track and analyse costs using Cost Explorer and Budgets (see the sketch below)
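Cost tracking can also be automated. Here is a hedged example of pulling last month's spend per service through the Cost Explorer API; the date range is a placeholder.

```python
import boto3

ce = boto3.client("ce")

report = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-04-01", "End": "2024-05-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for group in report["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```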
Key Takeaways
By the end of the 90 days, you will have built an end-to-end data platform on AWS, gaining hands-on experience with core services and learning how to optimise for cost and performance. This is the beginning of your AWS journey, and it's essential to continue learning and exploring as the cloud evolves.
I've Been Using Meltano for 4 Years: Here's My Full Review
Meltano is an open-source platform for data integration and orchestration. It leverages industry standards like Singer for data pipelines and Airflow for workflow management.
User Experience
The code-first approach makes navigation easy, even for beginners
Smooth-running data pipelines thanks to Singer and Airflow integration
Customisable with many available plugins
It may have a steeper learning curve for those new to data engineering
Key Features
YAML-based configuration
Singer integration
Airflow orchestration
Other plugins expand capabilities with tools like dbt, Elementary and Evidence
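To give a feel for the YAML-based configuration, here is a stripped-down, illustrative meltano.yml. The plugin names are examples of extractors and loaders available on Meltano Hub, not a recommended stack.

```yaml
# Illustrative project file; real projects list many more settings per plugin.
version: 1
default_environment: dev
environments:
  - name: dev
plugins:
  extractors:
    - name: tap-github      # pulls data from the GitHub API
  loaders:
    - name: target-jsonl    # writes records to local JSON Lines files
```

With a file like this in place, a pipeline is typically run with a command such as `meltano run tap-github target-jsonl`.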
Performance
Reliable data processing capabilities thanks to Singer
Streamlined workflows and automation improve efficiency
Challenges with running Meltano in a cloud environment
Requires technical expertise for proper cloud deployment
Support and Community
No paid 24/7 customer support
Vibrant Slack community with nearly 5,000 members
Community support for troubleshooting, best practices, ideas
Conclusion
Meltano is a robust DataOps OS ideal for data engineers who prefer a code-first approach and adherence to industry standards. While it may have a learning curve for non-techies and some cloud limitations, its extensive plugin library and customisation options make it a strong choice for data professionals.
😍 How Am I Doing?
I love hearing from readers and am always looking for feedback. How am I doing with Data Gibberish? Is there anything you’d like to see more or less of? Which aspects of the newsletter do you enjoy the most?
Hit the ❤️ button and share it with a friend or coworker.
All the best,