AI Mastery for Data Engineers: Smarter Code Generation with ChatGPT
Learn how to craft prompts that produce efficient, scalable code with ChatGPT, along with best practices for debugging, optimisation, and handling edge cases.
Greetings, curious reader,
As a data engineer, you handle repetitive tasks, complex ETL processes, and debugging. These can consume hours, especially with large datasets or intricate code.
AI tools like ChatGPT can simplify code generation, error diagnosis, and optimisation, allowing you to focus on more valuable tasks. By incorporating ChatGPT, you can generate functional code faster, save time on debugging, and improve performance without extensive manual effort.
This guide will show you how to use ChatGPT effectively for coding in data engineering, using real-world examples from my SpaceX data project. You'll learn how to craft prompts for code generation, debugging, optimisation, and multi-step workflows.
As last time, I have two sets of prompts:
Basic for everyone
Advanced for Pro Data Gibberish members
And again, Basic prompts are good enough. You can do a decent job even without supporting my work.
Want to see the advanced library? Check this page.
Let's dive in.
Crafting Effective Prompts for Code Generation
You need to know what you want to achieve when working with code. Be explicit about the outcome you're seeking, and break it down into clear steps.
The more detail you provide in your prompt, the better the results will be. Think of ChatGPT as a junior assistant. You can guide it to create the code you need, but only if you explain the requirements in detail.
Effective prompts lead to "good code"™. When you give ChatGPT specific instructions, it produces modular code that meets your needs. A vague prompt often results in generic output, so refining prompts is essential.
To write effective prompts, specify the programming language, define the task clearly, and break down complex processes into smaller steps. This will help you get code that's ready for real-world use.
Prompting
Let's say you need to extract SpaceX launch data from the REST API. A simple prompt might not capture all the requirements.
Generate Python code to extract SpaceX launch data from the REST API.
This prompt produces a simple script for extracting data without additional features. ChatGPT generates code that extracts the data and prints it directly.
You need more detail to turn that into working data engineering code.
With an advanced prompt, ChatGPT can provide code that includes error handling and logging, which gives you a more robust starting point for production environments.
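To give you a feel for the difference, here is a minimal sketch of extraction code with error handling and logging. It is my own illustration, not ChatGPT's actual output. The endpoint URL, the function name fetch_launches, and the opener parameter are all assumptions, so adapt them to your project.

```python
import json
import logging
import urllib.request

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("spacex_extract")

# Assumed endpoint; replace it with the API URL your project uses.
LAUNCHES_URL = "https://api.spacexdata.com/v4/launches"


def fetch_launches(url=LAUNCHES_URL, opener=urllib.request.urlopen, timeout=10):
    """Return the launches as a list of dicts, or an empty list on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            launches = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers network and HTTP errors; ValueError covers bad JSON.
        logger.error("Could not fetch launches from %s: %s", url, exc)
        return []
    logger.info("Fetched %d launches", len(launches))
    return launches
```

The opener parameter is there so you can swap in a fake response while testing, without calling the real API.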
Common Pitfalls
Avoid vague terms like "process data." Specify actions, such as "load data into Snowflake" or "aggregate data by year."
Don't assume ChatGPT knows the dataset structure. If your data has specific columns, mention them in the prompt. Test the generated code in parts to catch issues early.
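If you reuse the same kind of prompt often, you can assemble it from its parts so that nothing gets forgotten. Here is a small sketch of that idea. The helper build_prompt and the column names are hypothetical, so swap in whatever your dataset actually contains.

```python
def build_prompt(language, task, columns, steps):
    """Assemble a detailed code-generation prompt from explicit requirements."""
    lines = [f"Generate {language} code to {task}."]
    lines.append("The dataset has these columns: " + ", ".join(columns) + ".")
    lines.append("Follow these steps:")
    lines.extend(f"{number}. {step}" for number, step in enumerate(steps, start=1))
    return "\n".join(lines)


prompt = build_prompt(
    language="Python",
    task="load SpaceX launch data into Snowflake",
    # Assumed column names, for illustration only.
    columns=["flight_number", "name", "date_utc", "success"],
    steps=[
        "Read the source file",
        "Validate the required columns",
        "Load the rows and log the row count",
    ],
)
print(prompt)
```

The result names the language, the action, the columns, and the steps, which is exactly the detail a vague prompt leaves out.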
Pro Tip
When crafting prompts, focus on what you want the code to accomplish. Don't be too strict on the how. This gives ChatGPT room to find the most efficient solution.
Using ChatGPT for Code Debugging
Debugging can be time-consuming. ChatGPT can help identify and fix errors by analysing error messages and suggesting solutions. This support is invaluable when you're facing unfamiliar issues.
When using ChatGPT for debugging, paste the error message and relevant code snippet. Describe what you expect the code to do. This helps ChatGPT diagnose the problem more accurately.
Prompting
You encounter an error while calculating the success rate of SpaceX launches by year. A syntax error in the SQL query slows the work, but ChatGPT can assist you with quick debugging.
I'm getting a syntax error with this SQL query to calculate SpaceX launch success rates.
Here's the query: [insert SQL code here]. Can you help me fix it?
This prompt is direct and focuses on the syntax issue. ChatGPT suggests a corrected version of the query.
With the advanced prompt, ChatGPT has the context it needs about your environment, such as the SQL dialect and the table structure. That makes a correct and concise answer on the first try much more likely.
Pro Tip
Describe the outcome you're looking for, such as "yearly success rates." This gives ChatGPT more context to understand your goal and identify issues.
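Here is a hypothetical example of the kind of fix you might get back. The table name launches, its columns, and the broken query are all invented for illustration, and I use Python's built-in sqlite3 as a stand-in so the example runs anywhere. Snowflake's SQL dialect differs in places, so treat this as a sketch.

```python
import sqlite3

# Hypothetical broken query: the comma after launch_year is missing.
BROKEN_QUERY = """
SELECT launch_year AVG(success) AS success_rate
FROM launches
GROUP BY launch_year
"""

# Corrected query: yearly success rate as the average of a 0/1 flag.
FIXED_QUERY = """
SELECT launch_year, AVG(success) AS success_rate
FROM launches
GROUP BY launch_year
ORDER BY launch_year
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (launch_year INTEGER, success INTEGER)")
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [(2020, 1), (2020, 0), (2021, 1), (2021, 1)],
)

try:
    conn.execute(BROKEN_QUERY)
except sqlite3.OperationalError as exc:
    print(f"Broken query failed: {exc}")

yearly_rates = conn.execute(FIXED_QUERY).fetchall()
print(yearly_rates)
```

Running the fixed query against a few hand-made rows is also a quick way to check that the answer matches the outcome you described.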
Optimising Code with AI
Optimising code is crucial when working with large datasets. Efficient code runs faster and uses less memory. ChatGPT can suggest performance improvements, allowing you to handle large volumes of data smoothly.
To optimise code with ChatGPT, provide the script and mention performance issues, like slow execution or high memory use. ChatGPT can suggest changes to enhance efficiency.
Prompting
Your ETL pipeline for SpaceX data loads data into Snowflake, but the process is slow. By asking ChatGPT for optimisation tips, you can speed up the pipeline and reduce memory usage.
Here's a Python script for loading SpaceX data into Snowflake.
Can you suggest ways to make it run faster?
With this prompt, ChatGPT might suggest removing unnecessary steps or using faster data structures.
An advanced prompt focuses on requirements like parallel execution. ChatGPT may recommend changing the libraries or the data structures.
Common Pitfalls
Avoid vague terms like "make it faster." Specify goals like "reduce memory usage" or "increase processing speed."
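One suggestion you may get for a "reduce memory usage" goal is to load rows in batches instead of holding everything in memory. Below is a minimal sketch of that idea. I use sqlite3 as a stand-in for the warehouse connection, and the helper names chunked and load_in_batches are my own. For Snowflake itself, its bulk-loading options may be a better fit.

```python
import sqlite3
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` rows without materialising the whole input."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def load_in_batches(conn, rows, batch_size=1000):
    """Insert rows batch by batch and return the number of rows loaded."""
    loaded = 0
    for batch in chunked(rows, batch_size):
        conn.executemany("INSERT INTO launches (name, success) VALUES (?, ?)", batch)
        conn.commit()
        loaded += len(batch)
    return loaded


conn = sqlite3.connect(":memory:")  # stand-in for a warehouse connection
conn.execute("CREATE TABLE launches (name TEXT, success INTEGER)")
# A generator produces rows lazily, so only one batch is in memory at a time.
rows = ((f"launch-{i}", i % 2) for i in range(2500))
total = load_in_batches(conn, rows, batch_size=1000)
print(total)
```

Measure before and after. The batch size is a tuning knob, not a fixed answer.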
Prompt-Chaining for Multi-Step Tasks
Many data engineering tasks involve multiple steps, like extraction, transformation, and loading. Prompt-chaining lets you guide ChatGPT through each part of a complex workflow.
Start with a high-level prompt for the first step, then use each output as the basis for the next prompt. This method keeps ChatGPT's responses focused, even for lengthy processes.
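If you ever want to script this, the pattern is simple: each prompt goes out together with the previous answer. Here is a rough sketch. The ask callable is a placeholder for whatever client you use to call the model, and fake_ask only exists so the example runs without an API key.

```python
def run_chain(prompts, ask):
    """Send prompts in order, passing each answer along as context for the next."""
    outputs = []
    context = ""
    for prompt in prompts:
        message = prompt if not context else f"{prompt}\n\nPrevious output:\n{context}"
        context = ask(message)
        outputs.append(context)
    return outputs


def fake_ask(message):
    """Stand-in for a real model call; it echoes the first line of the message."""
    return f"<code for: {message.splitlines()[0]}>"


steps = [
    "Extract SpaceX data from an API.",
    "Add a transformation step.",
    "Load the transformed data into Snowflake.",
]
results = run_chain(steps, fake_ask)
for result in results:
    print(result)
```

In a chat window you do the same thing by hand: review the output of one step before you ask for the next.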
Real-World Scenario
You need to create an ETL pipeline that extracts SpaceX data from an API, cleans it, and loads it into Snowflake. By chaining prompts, you can build each step sequentially.
"Generate Python code to extract SpaceX data from an API, with retry logic for failed requests."
"Add a transformation step to remove duplicates, handle missing values, and format timestamps."
"Write efficient code to load the transformed data into Snowflake, with logging for each step."
Common Pitfalls
Avoid asking ChatGPT to generate entire workflows in one go. Break complex tasks into smaller steps for better accuracy. Add logging in each step to track the pipeline's status, especially in production environments.
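To make the middle step concrete, here is a sketch of what a transformation step with logging could look like. The field names id, success, and date_unix are assumptions about the data, and treating a missing success flag as False is a design choice you may want to change.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("spacex_transform")


def transform(launches):
    """Remove duplicate ids, fill missing success flags, and format timestamps."""
    seen_ids = set()
    cleaned = []
    for launch in launches:
        launch_id = launch.get("id")
        if launch_id is None or launch_id in seen_ids:
            continue  # skip rows without an id and repeated ids
        seen_ids.add(launch_id)
        timestamp = datetime.fromtimestamp(launch["date_unix"], tz=timezone.utc)
        cleaned.append(
            {
                "id": launch_id,
                "success": bool(launch.get("success")),  # missing or None becomes False
                "launched_at": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            }
        )
    logger.info("Transformed %d of %d launches", len(cleaned), len(launches))
    return cleaned


raw = [
    {"id": "a1", "success": True, "date_unix": 0},
    {"id": "a1", "success": True, "date_unix": 0},  # duplicate
    {"id": "b2", "success": None, "date_unix": 86400},  # missing success flag
]
clean = transform(raw)
print(clean)
```

Because the step is a plain function, you can test it on a few hand-made rows before you connect it to the extraction and loading steps.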
Final Thoughts
In the past, you might have spent hours searching Stack Overflow for solutions. Now, with ChatGPT, you can get answers and code snippets in minutes.
But AI isn't here to take your job. It's here to make you more productive. It is here to help you accomplish more in less time.
The key to working effectively with AI is clarity on your goal. AI can support you, but it doesn't replace technical excellence.
Stupid prompt = stupid results
Understanding your goals and clearly defining the desired outcome is a valuable skill. Let ChatGPT handle the repetitive coding tasks so you can focus on the bigger picture and make strategic decisions.
Summary
AI tools like ChatGPT are here to make you more productive, not replace you. Instead of losing hours searching Stack Overflow, you can get working answers in minutes. You can unlock ChatGPT's full potential for coding, debugging, and optimisation by mastering clear, detailed prompts.
Effective AI use begins with knowing what you want. When you're specific about your goals, ChatGPT becomes a powerful assistant, handling repetitive code so you can focus on the bigger picture.
In this guide, I showed you practical ways to leverage ChatGPT to create smarter workflows in data engineering. With each prompt, you save time and sharpen your own understanding.
Let AI handle the heavy lifting so you can focus on making a bigger impact.
But this is just the beginning. Next week, I will show you how to code with AI without a browser.
If you have enjoyed the newsletter, please show some love on LinkedIn and Threads or forward it to your friends. It really does help!
How Am I Doing?
I love hearing from you. How am I doing with Data Gibberish? Is there anything you’d like to see more or less of? Which aspects of the newsletter do you enjoy the most?
Use the links below, or even better, hit reply and say “Hello”. Be honest!