AI Mastery for Data Engineers: Supercharge Your Coding Workflow in VS Code
Turn VS Code into your personal AI-powered assistant engineer using Cline and Claude to generate and optimise code for complex data projects.
Greetings, curious reader,
As a data engineer, you spend a good chunk of your day writing, debugging, or optimising code. These tasks are essential for building pipelines and maintaining systems. Yet, they don't create the biggest impact on your projects or your career.
Coding is a tactical task. But the most successful engineers know their real value comes from solving bigger problems. I'm talking about things like designing scalable systems or aligning data initiatives with business objectives.
Imagine if you could hand off the repetitive parts of coding to a more junior assistant. Tools like Claude and Cline can do precisely that. They generate, debug, and optimise code for you. This gives you time to focus on the work that drives real results.
In this guide, I'll show you how to integrate Claude and Cline into your VS Code workflow. You'll also see a step-by-step example of building a simple ELT pipeline using Python libraries like requests, pandas, and DuckDB. By the end, you'll save hours of coding time and know how to use AI responsibly and effectively.
Why Coding Alone is Holding You Back
Think about your biggest wins in data engineering. Chances are, they didn't come from writing the perfect SQL query. Instead, they likely came from implementing scalable systems, improving reliability, or cutting costs. Tactical work like coding contributes to those goals, but it's only a tiny part of the equation.
To make a greater impact, you need to spend more time on high-leverage activities. That's where AI tools like Claude and Cline come in—they handle the tactical work so you can focus on the strategy.
Meet Your New Teammates: Claude and Cline
What Is Claude?
Claude is an AI assistant developed by Anthropic and designed to help with a wide range of coding tasks. It excels at:
Writing Python scripts and ETL pipelines.
Debugging code and resolving errors.
Optimising SQL queries for performance and cost-efficiency.
We already discussed Perplexity and ChatGPT, but when it comes to coding, Claude is the one I reach for first. What sets it apart is its ability to follow detailed instructions.
With the right prompt, Claude writes code at a level comparable to a mid- or senior-level developer. You'll see how this works in the example later in this article.
What Is Cline?
Cline is a VS Code extension that connects AI tools like Claude to your development environment. It's designed to:
Streamline your workflow by automating repetitive tasks like testing and documentation.
Provide AI-assisted code suggestions directly in VS Code.
Help you integrate AI into your existing coding processes without needing extra tools.
Cline acts as a bridge, letting you bring AI into the tools you already know and use every day, without leaving your editor.
AI Doesn't Replace You—It Works With You
One thing to keep in mind: AI is not a replacement for your skills or expertise. It works best as an assistant, helping you move faster and focus on higher-value tasks. If you don't know how to code, you won't be able to guide AI effectively or validate its outputs. In other words, you still need to know the "why" behind the "what."
For newcomers, this is especially important. While AI can help you build projects faster, relying on it without understanding the fundamentals will backfire. You won't be able to explain your code in interviews or troubleshoot when things go wrong. Treat AI as a partner, not a substitute.
Setting Up VS Code for AI-Powered Coding
Step #1: Install the Tools You'll Need
To get started, set up your environment:
Download and install VS Code if you don't already have it.
Install the Cline extension from the VS Code marketplace.
Obtain an API key for Claude from Anthropic.
Step #2: Configure Your Workspace
Customise your VS Code workspace for AI-enhanced coding:
Add your API key to Cline.
Add extensions for linting, debugging, and code formatting like ruff.
Set up your Python environment to include all required dependencies for your project.
Step #3: Integrate AI Into Your Workflow
Here's how you'll use Claude and Cline in your day-to-day coding:
Use Claude to write boilerplate code, debug issues, and optimise scripts.
Let Cline automate testing and generate documentation for your projects.
Continuously review and validate AI outputs to ensure reliability.
A Practical Example: Building a Data Pipeline with AI
Now, let's put everything into action by building a simple ELT pipeline. You'll use Python libraries like requests, pandas, and DuckDB, with help from Claude and Cline.
Full disclosure: I recorded a video of how I generated the entire project, but it turned out I had only captured a static screen. So you will have to make do with screenshots.
Step #1: Bootstrap The Project
Start with bootstrapping your project:
Create an empty folder and open VS Code.
Use a prompt like this:
Bootstrap an empty data engineering project. Use poetry for virtual environment management, black and isort for code formatting.
Review your project structure.
Step #2: Extract Data from an API Using requests
Start by extracting data from a public API.
Prompt Claude to generate a Python script for fetching data from a weather API.
Use a structured prompt like this:
Write a Python script that fetches weather data from OpenWeatherMap's API. Use the requests package. Include pagination and error handling.
Review the script for accuracy and test it in VS Code.
Example output:
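The script below is a minimal sketch of what Claude might return for this prompt, not the actual generated output. It assumes OpenWeatherMap's current-weather endpoint and an API key supplied by the caller; the function name fetch_weather and the optional session parameter are illustrative. That endpoint returns one record per city rather than pages, so the sketch loops over a list of cities where the prompt asks for pagination.

```python
import requests

# OpenWeatherMap current-weather endpoint: one JSON record per city.
API_URL = "https://api.openweathermap.org/data/2.5/weather"


def fetch_weather(cities, api_key, session=None, timeout=10):
    """Return the raw JSON record for every city that responded successfully."""
    if session is None:
        session = requests.Session()  # reuse one connection for all calls
    records = []
    for city in cities:
        params = {"q": city, "appid": api_key, "units": "metric"}
        try:
            response = session.get(API_URL, params=params, timeout=timeout)
            response.raise_for_status()  # turn 4xx/5xx responses into exceptions
            records.append(response.json())
        except (requests.RequestException, ValueError) as exc:
            # Skip the failing city but keep the rest of the batch.
            print(f"Skipping {city}: {exc}")
    return records
```

You would call it as fetch_weather(["Sofia", "Berlin"], api_key=...), reading the key from an environment variable rather than hard-coding it. Accepting a session as a parameter also makes the function easy to test without touching the network.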
Step #3: Transform Data Using pandas or polars
Next, clean and transform the data for analysis. Ask Claude to generate code for data transformations. A prompt like this will help guide its output:
Write a Python function that takes JSON data, converts it into a pandas DataFrame, renames columns, handles missing values, and converts timestamps to a datetime format.
Here’s an example of what Claude might generate:
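As a hedged sketch rather than the actual generated output, a function for this prompt could look like the following. The column mapping, the function name transform_weather, and the choice to drop rows without a city or timestamp are assumptions made for illustration; the field names follow OpenWeatherMap's current-weather response.

```python
import pandas as pd

# Assumed mapping from flattened API fields to friendlier column names.
COLUMN_NAMES = {
    "name": "city",
    "dt": "observed_at",
    "main.temp": "temperature_c",
    "main.humidity": "humidity_pct",
}


def transform_weather(records):
    """Flatten raw JSON records into a tidy DataFrame."""
    df = pd.json_normalize(records)  # nested keys become "main.temp", etc.
    # Keep only the mapped columns, creating any the API left out.
    df = df.reindex(columns=list(COLUMN_NAMES)).rename(columns=COLUMN_NAMES)
    # Rows without a city or timestamp cannot be used downstream.
    df = df.dropna(subset=["city", "observed_at"]).copy()
    # The API reports "dt" as Unix seconds in UTC.
    df["observed_at"] = pd.to_datetime(df["observed_at"], unit="s", utc=True)
    # Coerce unexpected types (for example strings) to NaN instead of failing.
    for col in ["temperature_c", "humidity_pct"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df.reset_index(drop=True)
```

Whether missing measurements should be dropped, filled, or kept as NaN depends on the analysis, so treat that part as a decision to review, not a default to accept.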
If you prefer polars, you can ask Claude to use that library instead. AI can easily switch between libraries based on your prompt. After generating the transformation code, use Cline to create unit tests for your function to ensure it handles edge cases like null values or unexpected data types.
Step #4: Load Data into DuckDB for Analysis
Once the data is cleaned, the next step is to load it into DuckDB for fast, in-memory querying. Prompt Claude with something like:
Write a Python script to create a DuckDB table from a pandas DataFrame and query the average temperature.
Here’s an example of AI-generated code:
DuckDB’s ability to execute analytical queries efficiently makes it ideal for ad hoc analysis, and using Claude ensures you don’t waste time writing boilerplate code for integration.
Step #5: Automate Documentation and Testing
Documentation and testing often take up more time than expected. With AI, you can automate these tasks. Use Cline to generate a project README file outlining the pipeline’s setup and usage. Additionally, ask Claude to create integration tests for your pipeline to ensure the entire workflow, from API extraction to DuckDB querying, runs smoothly.
For example, you can prompt Claude with:
Write an integration test to ensure the data pipeline extracts data from the API, transforms it using pandas, and loads it into DuckDB.
That’s All Folks
In five easy steps, you generated the entire data pipeline with the tests and the documentation. Not only that, but you didn't even need to run any commands manually. All that remains is to push that to GitHub.
Tips for Using AI Responsibly and Effectively
Using AI effectively requires more than knowing how to ask for code—it’s about maintaining oversight and guiding AI outputs. Here are some tips to make the most of tools like Claude and Cline:
Write Great Prompts:
Be specific in your instructions to AI. For example, instead of “Write a Python script,” try “Write a Python script to fetch weather data from OpenWeatherMap’s API, handle pagination, and raise an exception if the response fails.”
Include constraints, examples, or expected outputs to guide the AI toward better results.
Validate All Outputs:
Treat AI-generated code as a first draft, not the final product.
Test everything. Debugging AI outputs ensures that the generated code not only runs but also aligns with your project’s needs.
Keep Improving Your Skills:
The better your coding skills, the better you’ll be at writing prompts, spotting errors, and optimising AI outputs.
Practise coding regularly to keep your expertise sharp. AI complements your abilities; it doesn’t replace them.
Balance Automation and Oversight:
Use AI for efficiency, but stay hands-on with decisions involving architecture and system design. These are high-leverage areas where your expertise matters most.
Common Challenges and How to Overcome Them
Even with powerful tools like Claude and Cline, you may encounter challenges. Here’s how to address common issues:
Poor Outputs from AI:
Problem: Claude’s initial output doesn’t match your expectations.
Solution: Refine your prompt. Be specific about your requirements, constraints, and examples. Break down complex tasks into smaller prompts to improve accuracy.
Bugs or Errors in Generated Code:
Problem: AI-generated code contains logical errors or inefficiencies.
Solution: Test and debug manually. If the output is unclear, ask the AI to explain its logic or point out potential bugs.
Scaling AI Across Teams:
Problem: Team members struggle to adopt AI tools effectively.
Solution: Provide training on tools like Claude and Cline. Establish best practices for writing prompts and validating AI outputs.
Final Thoughts
AI won’t take your job, but people who know how to use AI effectively might.
Coding remains essential, but the focus is shifting. Engineers who embrace AI gain the freedom to think more about the "why" behind each project:
Why does this system exist?
Why does it need to scale?
Why is this transformation critical to the business?
AI won’t replace you; it amplifies what you do best. It handles tactical tasks, freeing you to focus on strategy and innovation. Combining coding expertise with AI tools allows you to design scalable systems, align projects with business goals, and solve bigger challenges.
The job market is evolving. Knowing how to code is no longer enough—it’s just the baseline. What sets you apart is your ability to work strategically and use AI to enhance your output. The future of data engineering belongs to those who master both coding and AI, using them together to unlock their full potential.
Summary
AI tools like Claude and Cline can help you shift your focus. By delegating repetitive tasks to AI, you free up time for high-leverage activities.
In this article, you saw how to use these tools in VS Code to build a simple ELT pipeline with Python. You also learned practical tips for integrating AI into your workflow, from crafting effective prompts to validating outputs.
Remember, AI is not a replacement for your skills. It’s a tool to help you do more, faster. When used responsibly, AI becomes a powerful ally. It lets you reclaim time and focus on driving real value in your projects.
There’s much more about AI and coding, but that’s not why you are here. Next week, we are changing gears.
I will share thoughts about the industry and the job market in 2025. I will mention AI, but this won’t be the main topic.
How Am I Doing?
I love hearing from you. How am I doing with Data Gibberish? Is there anything you’d like to see more or less of? Which aspects of the newsletter do you enjoy the most?
Use the links below, or even better, hit reply and say “Hello”. Be honest!