Skip to main content

Python embedded SQL cleanup

py-sql-cleaner

Clean up Python files that hide SQL in triple-quoted strings. Scan them, format safe queries, or extract long queries into reviewable SQL files.

Step 1 · Beforejobs/load_paid_users.py
query = """
select user_id, sum(amount)
from analytics.orders
where status='paid'
"""
Step 2 · Runextract command
py-sql-cleaner extract jobs/load_paid_users.py --out-dir sql
Step 3 · AfterPython reference + new SQL file
jobs/load_paid_users.py
query = "sql/paid_users.sql"
sql/paid_users.sql
SELECT
  user_id,
  SUM(amount)
FROM analytics.orders
WHERE status = 'paid'
GROUP BY user_id
How it works

Three cleanup jobs, one workflow

py-sql-cleaner targets the cleanup work that piles up after a Python job has accumulated large embedded queries. Find the strings, format the safe ones in place, and move review-worthy queries into real SQL files.

It does not execute SQLThe tool only reads and rewrites files. There is no database connection, credential handling, or query execution path.
It starts with inventoryList shows which embedded queries exist and which ones are safe before any rewrite command runs.
It keeps risky templates visiblef-strings and Jinja-like SQL are reported as skipped so cleanup work does not hide runtime behavior.
01

First, get a read-only inventory of SQL hidden in Python files.

Find embedded SQL

Find triple-quoted strings that look like SQL, then separate safe blocks from dynamic templates before cleanup begins.

Run
py-sql-cleaner list jobs/
Maps the cleanup surface

Shows file, variable name, line count, and whether the block is safe to rewrite.

Separates safe from risky

Plain strings are marked safe. f-strings and Jinja-like templates are visible but skipped.

Read-only by design

The list command changes nothing, so you can inspect a project before choosing a cleanup command.

Input Python and scan result
Example inputjobs/*.py
# jobs/load_users.py
users_query = """
select user_id, email from analytics.users
"""

# jobs/sessions.py
sessions_query = f"""
select * from sessions where ds = '{run_date}'
"""
py-sql-cleaner list jobs/
jobs/load_users.py:14users_query18 linessafe
jobs/load_users.py:42events_sql24 linessafe
jobs/sessions.py:11sessions_query31 linesskip · f-string
jobs/orders.py:67orders_sql12 linessafe
jobs/cohorts.py:29cohort_query44 linesskip · jinja
Found 5 blocks3 safe2 skipped
02

Clean up embedded SQL without moving it out of the Python file.

Format SQL inside Python

Rewrite only the SQL text inside a Python string. Use it when the query should stay embedded but still needs readable formatting.

Run
py-sql-cleaner format jobs/load_paid_users.py --dry-run
Keeps the query in place

Only the SQL body changes. The Python variable and surrounding code stay where they are.

Preview before writing

Dry-run shows the diff. Check mode makes the same rule usable in CI.

Protects dynamic SQL

Runtime-sensitive strings are skipped by default instead of being reformatted blindly.

Before and after
Beforejobs/load_paid_users.py
query = """
select u.user_id,u.email,count(o.order_id) as paid_orders,
sum(o.amount) as revenue
from analytics.users u join analytics.orders o on o.user_id=u.user_id
where o.status='paid' and o.created_at>=dateadd(day,-30,current_date)
group by u.user_id,u.email
having sum(o.amount)>100
order by revenue desc limit 50
"""
format
Afterjobs/load_paid_users.py
query = """
SELECT
  u.user_id,
  u.email,
  COUNT(o.order_id) AS paid_orders,
  SUM(o.amount) AS revenue
FROM analytics.users AS u
JOIN analytics.orders AS o
  ON o.user_id = u.user_id
WHERE
  o.status = 'paid'
  AND o.created_at >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY
  u.user_id,
  u.email
HAVING SUM(o.amount) > 100
ORDER BY revenue DESC
LIMIT 50
"""
03

Move large queries from Python into real SQL files.

Extract to .sql

Use extraction when a query should be reviewed as SQL, not as a giant Python string. Python keeps a small reference; SQL gets its own file.

Run
py-sql-cleaner extract jobs/load_users.py --out-dir sql
Shrinks the Python file

The long string becomes a path reference, so the job code is easier to scan.

Creates a real SQL artifact

The query gets its own SQL file, with cleaner diffs and normal editor support.

Formats while extracting

Safe blocks are formatted as they move. Unsafe blocks remain visible but untouched.

Python file becomes Python + SQL file
Beforejobs/load_users.py
# jobs/load_users.py
query = """
  select user_id, updated_at
  from analytics.users
  qualify row_number() over(
    partition by user_id
    order by updated_at desc
  ) = 1
"""
extract
Afterjobs/load_users.py
# jobs/load_users.py
query = "sql/load_users.sql"
New SQL filesql/load_users.sql
-- sql/load_users.sql
SELECT
  user_id,
  updated_at
FROM analytics.users
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id
  ORDER BY updated_at DESC
) = 1