Announcements & April 2024 Recap
Learn what’s new with Data Gibberish and what you missed last month
Hi there,
I am starting a new initiative here. In addition to the regular weekly articles, you will receive a monthly recap.
Starting today, every first Monday of the month, you will receive a summary of what I published the previous month, so you can catch up on anything you missed.
Don't worry! You can always unsubscribe from the monthly or weekly newsletters.
But wait, there is more!
Going Paid
Starting in June, I will enable payments for Data Gibberish. Here's a quick FAQ on what will change:
Do I need to pay to read the articles?
No! You will still receive most, if not all, articles for free in your inbox. But I will lock every article, aside from a few handpicked ones, one month after I publish it.
What benefits can paid subscribers expect?
You will still get a ton of value for free, but here are some benefits if you choose to support my newsletter:
Access to the entire archive
FREE templates and resources
Monthly Ask Me Anything posts
What about templates?
Most templates you currently get for free will be locked behind a paywall. You can get them by subscribing to Data Gibberish or clicking the Gumroad links.
When will the paid option take effect?
As I said, we are going live in June. This means you have one month to save any articles, templates, and resources that are available for free. That said, you can pledge your support now.
Anything else I need to know?
Yes! I will message every one of you who has pledged support so far. I don't want you to be surprised by any charges you may have forgotten.
Also, check out the Founding Member plan. You'll want to take advantage of the bonus for this level of support.
Lastly, there will be an option to get a paid subscription for free. Here's how.
Referral Program
As part of the paid offering, you can earn referral rewards. You don't need to be a paid member of Data Gibberish to take part, and you can earn a membership by referring your friends or coworkers.
The Plan
10 referrals: 1 month of complimentary paid access
25 referrals: A freebie based on your votes
100 referrals: A free 20-minute mentoring video call
What do you mean by votes?
Lately, I have been working on a few ideas to help you boost your data engineering skills. But I don't want to give you something generic. You deserve a reward that fits your needs.
So, I have created a short survey to help me learn more about you. Hurry up because you have only two weeks to vote!
That's all. Now, on to the monthly summary.
Self-Service BI Is a Lie: 3 Problems You Can Resolve Today And Improve It
The idea behind self-service BI is to empower business users to query data, build reports, and explore insights. However, there are significant challenges that often lead to poor decisions and loss of trust in the data:
Inconsistent metrics and definitions across teams
Lack of scalability as data complexity grows
Inadequate data literacy among users
Solutions to Enable Effective Self-Service BI
To overcome these challenges and enable a degree of self-service BI that works, focus on:
Establishing a data governance committee to align on standard definitions and calculations for critical metrics. Document these in a shared data dictionary.
Adopting a semantic layer to manage business logic. Create a curated data model with friendly naming that users can self-serve from while keeping data consistent.
Investing in data literacy training for everyone. Teach users how to interpret data, understand common pitfalls, and know when to ask for help. Make data competency part of the culture.
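To make the shared data dictionary idea more concrete, here is a minimal sketch of what a version-controlled metric definition could look like. Everything in it (the MetricDefinition class, the field names, the example metric) is illustrative rather than a prescribed standard.

```python
# Illustrative only: one way to keep agreed metric definitions in code.
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    name: str         # business-friendly name users see in BI tools
    owner: str        # team accountable for keeping the definition current
    sql: str          # the single agreed-upon calculation
    description: str  # plain-language meaning, including known caveats

ACTIVE_CUSTOMERS = MetricDefinition(
    name="active_customers",
    owner="data-governance-committee",
    sql=(
        "SELECT COUNT(DISTINCT customer_id) FROM orders "
        "WHERE ordered_at >= CURRENT_DATE - INTERVAL '30' DAY"
    ),
    description="Distinct customers with at least one order in the last 30 days.",
)
```

The point is less the exact format and more that every team reads the definition from the same place instead of re-deriving it in their own dashboards.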
Key Takeaways
Absolute self-service BI is a myth. It requires a solid foundation of proper infrastructure, governance, and education.
Standardise KPI definitions, build a scalable data architecture and invest in data literacy to enable effective self-service.
Self-service BI is a spectrum, not all-or-nothing. The degree to which you can enable it depends on the strength of your data foundations.
Accept that self-service success requires collaboration between data teams and business users, plus hard work to strengthen the critical pillars.
Focusing on these fundamentals can empower users with a pragmatic level of self-service BI that drives accurate results for the organisation. It takes effort, but building that solid data foundation is worth the investment.
Taps & Targets: Simplify ETL Through Singer's Data Pipeline Blueprint
Singer is an open-source standard for building data integration pipelines. It defines how to write scripts that move data between various sources and destinations, such as databases, APIs, and files.
Components of Singer
Singer pipelines consist of two main independent components:
Taps: Scripts that extract data from sources and output it in a standardised JSON format.
Targets: Scripts that consume data from taps and load it into destinations.
Taps and targets communicate through a pipe, with the tap sending data to the target.
Messages in Singer
To ensure compatibility and smooth data flow, Singer uses specific message types:
Schema: Defines each table or data stream's structure, fields, and data types.
Record: A single row or document of data conforming to the defined schema.
State: Keeps track of the last successful data sync point, allowing taps to resume extraction from where they left off.
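To show how those three message types fit together, here is a minimal, hypothetical tap written in Python. The stream name, fields, and bookmark key are made up; a real tap would pull this data from an API or database.

```python
# A toy Singer tap: emits SCHEMA, RECORD, and STATE messages as JSON lines on stdout.
import json
import sys

def emit(message: dict) -> None:
    sys.stdout.write(json.dumps(message) + "\n")

# SCHEMA: declares the structure of the "users" stream
emit({
    "type": "SCHEMA",
    "stream": "users",
    "schema": {"properties": {"id": {"type": "integer"}, "email": {"type": "string"}}},
    "key_properties": ["id"],
})

# RECORD: a single row conforming to the schema above
emit({"type": "RECORD", "stream": "users", "record": {"id": 1, "email": "ana@example.com"}})

# STATE: a bookmark so the next run can resume where this one stopped
emit({"type": "STATE", "value": {"bookmarks": {"users": {"last_id": 1}}}})
```

Because taps and targets only share this JSON protocol, wiring them together is just a shell pipe, for example `python my_tap.py | target-jsonl`.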
Key Takeaways
Singer simplifies the creation of data integration pipelines by providing a standard format for data extraction and loading scripts. With its growing ecosystem of taps and targets and an active community, Singer is an excellent choice for organisations looking to streamline their data integration processes.
AWS for Data Engineers: Conquer the Cloud in 90 Days
AWS is a great way to scale your data platform, and I created a learning plan for you. The plan is designed to help you become competent in AWS within 90 days. It focuses on core services and building an end-to-end data platform.
Days 1-30: AWS Basics
Build a Data Lake on S3: Learn S3 basics (buckets, objects, storage classes, permissions)
Analyse Data with Athena: Learn Athena basics (databases, tables, querying data); see the sketch below
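As a taste of what the first month builds towards, here is a hedged sketch of running an Athena query over files sitting in S3 with boto3. The bucket, database, and table names are placeholders, and Athena also needs an S3 location for query results.

```python
import boto3

athena = boto3.client("athena", region_name="eu-west-1")

response = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) AS events FROM raw_events GROUP BY event_type",
    QueryExecutionContext={"Database": "my_data_lake"},  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/queries/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution() until the query completes
```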
Days 31-60: Intermediate Skills
Data Processing Pipelines with Glue and Kinesis: Learn Glue basics (crawlers, jobs, workflows) and Kinesis basics (streams, shards, producers, consumers); a small producer sketch follows this block
Data Warehousing with Redshift: Learn Redshift basics (clusters, nodes, distribution styles)
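For the Kinesis part of this block, a producer can be as small as the sketch below. The stream name and event payload are made up, and the stream is assumed to exist already.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="eu-west-1")

event = {"user_id": 42, "action": "page_view"}
kinesis.put_record(
    StreamName="clickstream-events",        # placeholder stream
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=str(event["user_id"]),     # determines which shard receives the record
)
```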
Days 61-90: Advanced Topics
Big Data Processing with EMR: Learn EMR basics (clusters, nodes, steps)
Cost Optimisation: Learn to track and analyse costs using Cost Explorer and Budgets (see the sketch below)
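Cost tracking can also be automated. Here is a hedged example of pulling last month's spend per service through the Cost Explorer API; the date range is a placeholder.

```python
import boto3

ce = boto3.client("ce")

report = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-04-01", "End": "2024-05-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for group in report["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```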
Key Takeaways
By the end of the 90 days, you will have built an end-to-end data platform on AWS, gaining hands-on experience with core services and learning how to optimise for cost and performance. This is the beginning of your AWS journey, and it's essential to continue learning and exploring as the cloud evolves.
I've Been Using Meltano for 4 Years: Here's My Full Review
Meltano is an open-source platform for data integration and orchestration. It leverages industry standards like Singer for data pipelines and Airflow for workflow management.
User Experience
The code-first approach makes navigation easy, even for beginners
Smooth-running data pipelines thanks to Singer and Airflow integration
Customisable with many available plugins
It may have a steeper learning curve for those new to data engineering
Key Features
YAML-based configuration
Singer integration
Airflow orchestration
Other plugins expand capabilities with tools like dbt, Elementary and Evidence
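To give a feel for the YAML-based configuration, here is a stripped-down, illustrative meltano.yml. The plugin names are examples of extractors and loaders available on Meltano Hub, not a recommended stack.

```yaml
# Illustrative project file; real projects list many more settings per plugin.
version: 1
default_environment: dev
environments:
  - name: dev
plugins:
  extractors:
    - name: tap-github      # pulls data from the GitHub API
  loaders:
    - name: target-jsonl    # writes records to local JSON Lines files
```

With a file like this in place, a pipeline is typically run with a command such as `meltano run tap-github target-jsonl`.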
Performance
Reliable data processing capabilities thanks to Singer
Streamlined workflows and automation improve efficiency
Challenges with running Meltano in a cloud environment
Requires technical expertise for proper cloud deployment
Support and Community
No paid 24/7 customer support
Vibrant Slack community with nearly 5,000 members
Community support for troubleshooting, best practices, ideas
Conclusion
Meltano is a robust DataOps OS ideal for data engineers who prefer a code-first approach and adherence to industry standards. While it may have a learning curve for non-techies and some cloud limitations, its extensive plugin library and customisation options make it a strong choice for data professionals.
😍 How Am I Doing?
I love hearing from readers and am always looking for feedback. How am I doing with Data Gibberish? Is there anything you’d like to see more or less of? Which aspects of the newsletter do you enjoy the most?
Hit the ❤️ button and share it with a friend or coworker.
All the best,