py-sql-cleaner

First, get a read-only inventory of SQL hidden in Python files.

Find embedded SQL

Find triple-quoted strings that look like SQL, then separate safe blocks from dynamic templates before cleanup begins.

Run

py-sql-cleaner list jobs/load_users.py

Maps the cleanup surface

Shows file, variable name, line count, and whether the block is safe to rewrite.

Separates safe from risky

Plain strings are marked safe. f-strings and Jinja-like templates are visible but skipped.

Read-only by design

The list command changes nothing, so you can inspect a project before choosing a cleanup command.

Input Python and scan result

Example inputjobs/load_users.py

# jobs/load_users.py
users_query = """
select user_id, email from analytics.users
"""

sessions_query = f"""
select * from sessions where ds = '{run_date}'
"""

py-sql-cleaner list jobs/load_users.py

jobs/load_users.py:14users_query18 linessafe

jobs/load_users.py:42events_sql24 linessafe

jobs/load_users.py:76sessions_query31 linesskip · f-string

jobs/load_users.py:118orders_sql12 linessafe

jobs/load_users.py:144cohort_query44 linesskip · jinja

Found 5 blocks3 safe2 skipped

Clean up embedded SQL without moving it out of the Python file.

Format SQL inside Python

Rewrite only the SQL text inside a Python string. Use it when the query should stay embedded but still needs readable formatting.

Run

py-sql-cleaner format jobs/load_paid_users.py --dry-run

Keeps the query in place

Only the SQL body changes. The Python variable and surrounding code stay where they are.

Preview before writing

Dry-run shows the diff. Check mode makes the same rule usable in CI.

Protects dynamic SQL

Runtime-sensitive strings are skipped by default instead of being reformatted blindly.

Before and after

Beforejobs/load_paid_users.py

query = """
select u.user_id,u.email,count(o.order_id) as paid_orders,
sum(o.amount) as revenue
from analytics.users u join analytics.orders o on o.user_id=u.user_id
where o.status='paid' and o.created_at>=dateadd(day,-30,current_date)
group by u.user_id,u.email
having sum(o.amount)>100
order by revenue desc limit 50
"""

format

Afterjobs/load_paid_users.py

query = """
SELECT
  u.user_id,
  u.email,
  COUNT(o.order_id) AS paid_orders,
  SUM(o.amount) AS revenue
FROM analytics.users AS u
JOIN analytics.orders AS o
  ON o.user_id = u.user_id
WHERE
  o.status = 'paid'
  AND o.created_at >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY
  u.user_id,
  u.email
HAVING SUM(o.amount) > 100
ORDER BY revenue DESC
LIMIT 50
"""

Move large queries from Python into real SQL files.

Extract to .sql

Use extraction when a query should be reviewed as SQL, not as a giant Python string. Python keeps a small reference; SQL gets its own file.

Run

py-sql-cleaner extract jobs/load_users.py --out-dir sql

Shrinks the Python file

The long string becomes a path reference, so the job code is easier to scan.

Creates a real SQL artifact

The query gets its own SQL file, with cleaner diffs and normal editor support.

Formats while extracting

Safe blocks are formatted as they move. Unsafe blocks remain visible but untouched.

Python file becomes Python + SQL file

Beforejobs/load_users.py

# jobs/load_users.py
load_users_query = """
  select user_id, updated_at
  from analytics.users
  qualify row_number() over(
    partition by user_id
    order by updated_at desc
  ) = 1
"""

extract

Afterjobs/load_users.py

# jobs/load_users.py
load_users_query = "sql/load_users.sql"

New SQL filesql/load_users.sql

-- sql/load_users.sql
SELECT
  user_id,
  updated_at
FROM analytics.users
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id
  ORDER BY updated_at DESC
) = 1

Three cleanup jobs, one workflow

Find embedded SQL

Format SQL inside Python

Extract to .sql