13 Command-Line Tools to 10x Your Productivity as a Data Engineer
If you work with data and live in the terminal, these CLI tools belong in your daily toolkit. Lightweight, fast, and built for real productivity.
Greetings, Data Engineer,
Navigating massive data workflows using the terminal can feel slow and inefficient. Without the right command-line tools, your productivity takes a hit—and your workflow becomes chaotic.
In this list I’ll show you powerful, lightweight CLI tools that help you move faster, automate tasks, and explore data seamlessly.
🧩 1. jq – JSON Processor
jq is a lightweight command-line tool for parsing, filtering, and transforming JSON. It’s designed for speed and simplicity, giving you the ability to reshape JSON in real time—without needing to spin up a Python script or import pandas.
As a data engineer, you meet JSON everywhere. APIs return it. Log aggregators output it. Metadata services store it. jq makes it effortless to extract what you need and discard the rest.
Whether you're quickly debugging an API response or chaining transformations in a shell script, jq keeps you in flow.
You stay in the terminal. You stay fast.
Example: Let’s say you’re testing an external data source and want to preview the title of a blog post from a mock API. Try this:
curl -s https://jsonplaceholder.typicode.com/posts/1 | jq '.title'
It’s one line, but it scales. Pipe it into xargs, combine it with awk, or feed it into a scheduler. This tool turns messy JSON into actionable input.
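And it composes. Here’s a quick sketch against the same mock API that pulls every post from a single user and prints only the titles (the -r flag strips the JSON quotes):
curl -s https://jsonplaceholder.typicode.com/posts | jq -r '.[] | select(.userId == 1) | .title'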
🌐 2. httpie – User-Friendly HTTP Client
httpie is a modern HTTP client for the terminal. It simplifies API testing with readable output and intuitive syntax.
httpie beats curl when you want to quickly test webhooks or check response payloads during data ingestion.
When testing APIs, most people reach for Postman, but if you, like me, live in the terminal, httpie is the way to go.
Example: Run this to see a blog post from a public fake API:
http GET https://jsonplaceholder.typicode.com/posts/1
Or simulate a POST request:
http POST https://jsonplaceholder.typicode.com/posts title="CLI Tools" body="This is great" userId:=1
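Custom headers use a colon too, which helps when a real ingestion endpoint needs auth. A sketch, where the URL and token are placeholders:
http GET https://api.example.com/data "Authorization:Bearer YOUR_TOKEN"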
💾 3. pgcli – PostgreSQL CLI with Auto-Completion
pgcli adds tab-completion and syntax highlighting to your Postgres sessions. It helps you move faster across large schemas.
pgcli reduces typos and boosts efficiency when querying your data warehouse or staging DB. With pgcli, you’re blazingly fast compared to old-school psql.
Example: Spin up a local Postgres container:
docker run --name pg -e POSTGRES_PASSWORD=pass -p 5432:5432 -d postgres
Connect using:
pgcli -h localhost -U postgres
Now just start typing a query and hit TAB:
SELECT * FROM pg_<TAB>
Want the same functionality for MySQL, Vertica, or another database? Check out DBCLI.
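For example, mycli is the DBCLI sibling for MySQL. A minimal sketch, assuming a local MySQL server and a database called mydb:
pip install mycli
mycli -h localhost -u root mydb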
🔎 4. fzf – Fuzzy Finder
fzf is an interactive fuzzy finder. It helps you instantly search files, commands, or text in large codebases.
With fzf, you’ll find anything faster: SQL scripts, logs, even past shell history. And previewing files in the terminal looks like magic to others.
Example: Search all .sql files:
find . -name "*.sql" | fzf
To search your command history:
history | fzf
It’s a rapid way to rerun commands without digging.
Aside from that, fzf has some nice keyboard shortcuts and functionality like file preview.
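Here’s a sketch of that preview trick, assuming you also have bat (covered next) installed. fzf renders the highlighted file in a side pane:
find . -name "*.sql" | fzf --preview 'bat --color=always {}'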
🦇 5. bat – Enhanced cat Command
bat replaces cat with a better interface, showing syntax highlighting, line numbers, and Git changes.
bat makes reviewing YAMLs, SQL files, and Python scripts easier on the eyes.
Example: View a file with syntax highlighting:
bat ~/.bashrc
It detects the file type and colours it accordingly.
ProTip: Alias cat to bat and you won’t even need to learn a new command.
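A minimal sketch for your shell config. The --paging=never flag keeps bat behaving like plain cat inside pipes:
alias cat='bat --paging=never'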
🚀 6. starship – Customisable Prompt
starship is a fast, minimal prompt. It adds Git status, DB names, and language versions directly to your CLI prompt.
starship gives you real-time context without extra commands. Perfect when switching projects or environments.
Example: Install with:
curl -sS https://starship.rs/install.sh | sh
Then add to your shell config:
eval "$(starship init zsh)" # or bash (I use fish, btw)
Now you get all the info about a project’s tech stack just by cd-ing into it.
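Everything is driven by one TOML file. A small sketch, where the symbols and format strings are just examples from the docs, not requirements:
# ~/.config/starship.toml
[git_branch]
symbol = "🌱 "

[python]
format = 'via [🐍 $version](bold green) '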
📂 7. zoxide – Smart Directory Jumper
zoxide remembers where you go. It replaces cd with smarter navigation based on your habits.
Navigating between code, logs, and config folders becomes lightning fast. Just type part of the directory name, and zoxide will read your mind and magically get you there.
Example: Install and configure:
curl -sS https://raw.githubusercontent.com/ajeetdsouza/zoxide/main/install.sh | bash
eval "$(zoxide init zsh --cmd cd)"
Now jump like this:
cd airflow
It finds the directory based on your usage history.
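Not sure which match it will pick? With the --cmd cd setup above, zoxide also defines cdi, an interactive picker (it uses fzf under the hood):
cdi air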
⚙️ 8. direnv – Environment Variable Manager
My buddy Alex Campbell showed me this tool 5+ years ago. Now, I can’t live without it.
direnv automatically loads environment variables when you enter a directory.
direnv helps you manage different project setups (DB creds, API keys, configs) without manual sourcing.
Example: In a project folder:
echo 'export DATABASE_URL=postgres://postgres:pass@localhost:5432/mydb' > .envrc
direnv allow
Now, every time you cd into that folder, your environment is ready to go.
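direnv also ships a small standard library. For instance, layout python3 creates and activates a per-project virtualenv. A sketch:
echo 'layout python3' >> .envrc
direnv allow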
📝 9. lnav – Log File Navigator
lnav provides a terminal-based UI for browsing logs with SQL queries and filters.
It speeds up debugging by letting you inspect logs directly—no need to import them elsewhere.
Example: Try with system logs:
lnav /var/log/syslog # or any other log file
Or write a SQL query to filter:
SELECT * FROM syslog_log WHERE log_level = 'error'
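Inside lnav, you can also filter interactively. This built-in command, typed at the lnav prompt, keeps only lines matching the regex:
:filter-in error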
🦙 10. ollama – AI-Powered Terminal Assistant
ollama lets you run LLMs like Code Llama locally from your terminal—without sending data to the cloud.
For you, as a data engineer, ollama is important because it allows you to use AI against your data in a secure sandbox.
This means you can build smart data products without the risk of exposing sensitive data.
Example: After installing and downloading a model:
ollama run codellama
Then prompt:
Write a bash script to back up a Postgres DB every night.
Instant bash automation, no browser required.
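You can also pass a prompt inline, including file contents via command substitution. A sketch, assuming a backup.sh script sits in the current directory:
ollama run codellama "Explain what this script does: $(cat backup.sh)"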
😎 11. visidata – Terminal Spreadsheet
visidata is an interactive CLI spreadsheet viewer. Load CSVs, TSVs, SQLite, or even JSON.
You can profile and explore datasets without loading them into Spark or Pandas. As a bonus, you can use your favourite Vim motions.
Example: Try with a CSV:
curl -O https://people.sc.fsu.edu/~jburkardt/data/csv/hw_200.csv
vd hw_200.csv
You’ll see an interactive table where you can sort, summarise, and filter.
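Assuming your build has URL support, visidata can even skip the curl step and open the file directly:
vd https://people.sc.fsu.edu/~jburkardt/data/csv/hw_200.csv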
♻️ 12. delta – Git Diff Highlighter
delta enhances git diff with colour, syntax highlighting, and line indicators.
With delta, you can spot schema changes or logic bugs instantly. It’s ideal for reviewing SQL or Python pipelines.
Example: Pipe a diff through it:
git diff | delta
Much easier than parsing raw output.
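To make delta the default instead of piping every time, point Git’s pager at it. A sketch based on delta’s usual setup:
git config --global core.pager delta
git config --global interactive.diffFilter 'delta --color-only'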
🔄 13. tmux – Terminal Multiplexer
tmux lets you split your terminal into multiple panes and manage long-running sessions. Everything stays live—even across reboots or dropped SSH connections.
With tmux paired with a launcher like tmuxinator, you just need to type mux to spin up entire project environments instantly.
Your dbt workspace might have Neovim on one side and a terminal running tests on the other. Your Airflow setup might show scheduler logs, DAG code, and a CLI tool—all in separate windows, ready to go.
Example: Instead of manually opening terminals, just run:
mux dbt
Now you’re in a full dbt dev environment with a vertical split: code on the left, terminal on the right. Another window handles tests or builds.
Switch context?
mux etl
You’re now in an ETL-focused workspace—one pane for logs, another for debugging Python scripts, another for editing DAGs. Each layout is defined once and re-used forever.
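Under the hood, each mux layout is just a tmuxinator YAML file. Here’s a sketch of what the dbt one above might look like; the paths and window names are made up, so adjust them to your project:
# ~/.config/tmuxinator/dbt.yml
name: dbt
root: ~/projects/dbt
windows:
  - editor:
      layout: main-vertical
      panes:
        - nvim
        - # keep this pane free for running tests
  - tests: dbt test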
This isn’t just multitasking. It’s mental clarity.
Each project lives in its own contained universe. Nothing gets mixed up. You focus, build, and switch environments without losing flow.
🚀 Optimising Their Use
Combine these tools for powerful workflows. Use fzf to find SQL scripts, then open them with bat. Launch pgcli in a tmux pane while monitoring logs in another. Use ollama to generate new bash scripts, and direnv to auto-load secrets.
Set up aliases. Add functions to your .zshrc. Build a workflow where everything you need is one keystroke away.
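For instance, here’s a tiny function for your .zshrc that wires fzf and bat together (fsql is a made-up name): pick a SQL script interactively, then view it highlighted.
fsql() {
  local f
  f=$(find . -name '*.sql' | fzf --preview 'bat --color=always {}') && bat "$f"
}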
💭 Final Thoughts
The terminal is your cockpit. These tools don’t just save time—they shape how you build.
They help you stay in flow, move with precision, and build data platforms that run smoothly.
Your stack doesn’t need to be fancy. Just fast. Just fluid. Just yours.
Got a tool I missed? I want to hear about it.
Cheers,