8 Ways to Improve Query Optimization in Snowflake (With Examples)

Published: April 10, 2026


Nikesh Vora

Technical Product Manager @ Coefficient


Snowflake handles massive data volumes well. But slow queries almost always trace back to the same handful of root causes: poor partition pruning, the wrong warehouse size, disk spillage, and inefficient joins. Each one has a clear fix.

This guide covers 8 query optimization techniques ordered by typical impact, with concrete SQL examples for each. It is written for data engineers who know Snowflake and want actionable techniques rather than a conceptual overview.

Before You Optimize: Use the Query Profile

The biggest mistake in Snowflake optimization is fixing the wrong thing. Before touching any query, open the Query Profile in Snowsight and look at the Most Expensive Nodes section. This tells you exactly which operation is the bottleneck — TableScan, Sort, Join, Aggregate — so you know where to focus.

Three stats to check immediately:

  • Partitions Scanned vs Partitions Total — if these are close, pruning is not working.
  • Bytes Spilled to Local/Remote Storage — any spillage means the warehouse ran out of memory.
  • Queuing time — if the query spent most of its time waiting, the problem is concurrency, not the query itself.

You can also pull this data programmatically:

SELECT query_id, query_text, warehouse_size,
       partitions_scanned, partitions_total,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage,
       queued_overload_time,
       total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;

Identify the bottleneck first. Then apply the relevant technique below.

1. Stop Using SELECT *

Snowflake uses columnar storage. Every column you select is a column that gets scanned from disk. SELECT * forces Snowflake to read every column even when downstream logic only uses three of them. On wide tables this adds significant scan time.

-- Before: scans every column
SELECT * FROM orders WHERE order_date >= '2026-01-01';

-- After: scans only the columns that are needed
SELECT order_id, customer_id, amount, status
FROM orders
WHERE order_date >= '2026-01-01';

One nuance worth knowing: SELECT * inside a CTE is fine if the outer query specifies columns explicitly. Snowflake’s optimizer handles this correctly and will not scan unused columns in that pattern.
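A minimal sketch of that pattern (table and column names are illustrative):

```sql
-- SELECT * inside the CTE, but the outer query names its columns explicitly,
-- so the optimizer prunes the scan down to order_id and amount
WITH recent_orders AS (
    SELECT * FROM orders
    WHERE order_date >= '2026-01-01'
)
SELECT order_id, amount
FROM recent_orders;
```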

2. Apply WHERE Filters Early and on Clustered Columns

Snowflake uses predicate pushdown. WHERE filters are applied at the storage layer, which lets Snowflake skip entire micro-partitions rather than scanning the full table. This is called partition pruning — and it is one of the highest-impact optimisations available.

Pruning only works when the column being filtered is naturally ordered within the table’s micro-partitions. Date columns often are. A column like customer_id typically is not.

Check if pruning is working by comparing Partitions Scanned to Partitions Total in the Query Profile. If they are close, pruning is not effective and clustering may help.

-- Check clustering depth on a table
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');

-- Define a clustering key for a frequently filtered column
ALTER TABLE orders CLUSTER BY (order_date);

-- Or cluster by an expression to reduce the cardinality of the key
ALTER TABLE orders CLUSTER BY (YEAR(order_date), region);

Clustering is worth the maintenance cost only on very large tables, typically billions of rows or multiple terabytes. Snowflake's Automatic Clustering service runs background reclustering and consumes serverless compute credits for that work. On smaller tables the overhead outweighs the benefit. Check SYSTEM$CLUSTERING_INFORMATION before committing to a clustering key.

3. Right-Size Your Virtual Warehouse

Warehouse size is the most direct lever on query speed, but bigger is not always better and it always costs more. Knowing when to scale up, and when not to, matters.

Credit consumption by warehouse size (per hour):

  • XS: 1 credit — lightweight queries, low concurrency
  • S: 2 credits — standard analyst workloads
  • M: 4 credits — moderate ETL and BI queries
  • L: 8 credits — complex joins, large aggregations
  • XL: 16 credits — heavy batch processing
  • 2XL and above: 32+ credits — scale only when the bottleneck is memory or parallelism

Scale up when: queries are spilling to disk (the warehouse has run out of memory), or when complex joins and sorts are slow on large datasets. For queries that parallelise well, scaling up one size roughly halves query time, keeping total credit cost roughly flat while delivering a much faster result.

Do not scale up when: the problem is queuing. Queuing means multiple queries are competing for the same warehouse. The fix is a multi-cluster warehouse or dedicated warehouses by workload — not a larger single warehouse.
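A minimal sketch of the multi-cluster fix (the warehouse name and cluster counts are illustrative; multi-cluster warehouses require Enterprise Edition or higher):

```sql
-- Let the warehouse add clusters under concurrency instead of resizing it
ALTER WAREHOUSE bi_wh SET
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 3
    SCALING_POLICY = 'STANDARD';
```

With this setting, extra clusters spin up only while queries are queuing and shut down when the load subsides, so you pay for concurrency only when it actually occurs.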

-- Compare elapsed time across warehouse sizes for the same query pattern
-- (credits_used_cloud_services covers cloud services compute only,
--  not warehouse compute credits)
SELECT warehouse_size,
       AVG(total_elapsed_time) / 1000 AS avg_seconds,
       AVG(credits_used_cloud_services) AS avg_credits
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text ILIKE '%your_query_pattern%'
GROUP BY warehouse_size
ORDER BY avg_seconds;

4. Eliminate Disk Spillage

When a query’s intermediate results exceed the virtual warehouse’s available memory, Snowflake spills to local SSD — which is slower. When SSD is also exhausted, it spills to remote object storage — which is much slower. A query that spills to remote storage can take 10x longer than the same query with sufficient memory.

-- Find queries with significant spillage in the last 24 hours
SELECT query_id, query_text, warehouse_size,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage,
       total_elapsed_time / 1000 AS elapsed_seconds
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
  AND bytes_spilled_to_remote_storage > 0
ORDER BY bytes_spilled_to_remote_storage DESC;

Two fixes depending on the cause:

  • Increase the warehouse size — the most direct fix when a single query simply needs more memory for a large join or sort.
  • Rewrite the query to reduce intermediate row counts — filter earlier, avoid unnecessary joins, aggregate before joining rather than after.

Any bytes_spilled_to_remote_storage > 0 should be treated as a critical issue. Remote spillage typically explains the longest-running queries on a Snowflake account.
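As an illustration of the rewrite approach (table names are assumed), aggregating before the join keeps the join's intermediate result small:

```sql
-- Before: the join fans out rows, then the aggregate works on the large result
SELECT c.region, SUM(o.amount) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.region;

-- After: collapse orders to one row per customer before joining
SELECT c.region, SUM(t.customer_total) AS revenue
FROM customers c
JOIN (
    SELECT customer_id, SUM(amount) AS customer_total
    FROM orders
    GROUP BY customer_id
) t ON t.customer_id = c.customer_id
GROUP BY c.region;
```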

5. Avoid Cartesian Products and Accidental Cross Joins

A missing or mismatched join key causes a Cartesian product — every row from the left table multiplied by every row from the right. 10,000 rows × 10,000 rows = 100,000,000 rows processed. At scale this destroys performance and inflates credit costs fast.

-- Before: missing join condition causes a cross join
SELECT a.customer_id, b.order_id
FROM customers a, orders b
WHERE a.region = 'EMEA';

-- After: explicit join with the correct condition
SELECT a.customer_id, b.order_id
FROM customers a
JOIN orders b ON a.customer_id = b.customer_id
WHERE a.region = 'EMEA';

How to spot it in the Query Profile: look for a row count in the output that is a suspiciously large multiple of both input tables, or an abnormally high Rows Produced figure in a Join node. In QUERY_HISTORY, rows_produced will be orders of magnitude higher than expected for the result set.
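One way to surface suspects programmatically; the row threshold here is an arbitrary starting point, tune it for your account:

```sql
-- Flag recent queries with suspiciously large output row counts
SELECT query_id, query_text, rows_produced,
       total_elapsed_time / 1000 AS elapsed_seconds
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
  AND rows_produced > 100000000
ORDER BY rows_produced DESC;
```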

6. Use CTEs Carefully — They Are Not Automatically Materialised

This is the most misunderstood Snowflake optimisation. CTEs are not cached or materialised by default in Snowflake. If you reference the same CTE more than once in a query, Snowflake may execute it multiple times — once for each reference. For an expensive CTE (large aggregation, complex join) this multiplies the compute cost.

-- Risky: expensive_cte may execute twice, once per reference
WITH expensive_cte AS (
    SELECT customer_id, SUM(amount) AS total_spend
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id, total_spend
FROM expensive_cte
WHERE total_spend > (SELECT AVG(total_spend) * 2 FROM expensive_cte);

-- Better: materialise once as a temporary table
CREATE TEMPORARY TABLE temp_customer_spend AS
SELECT customer_id, SUM(amount) AS total_spend
FROM orders
GROUP BY customer_id;

SELECT customer_id, total_spend
FROM temp_customer_spend
WHERE total_spend > (SELECT AVG(total_spend) * 2 FROM temp_customer_spend);

CTEs referenced only once are fine and generally improve readability. The issue only arises when a CTE is expensive to compute and is referenced more than once in the same query. Use a temporary table to materialise it once.

7. Use Result Cache and Know When It Won’t Help

Snowflake automatically caches query results for 24 hours at the Cloud Services layer. An identical query on unchanged underlying data returns the cached result at near-zero compute cost. For recurring dashboard queries and scheduled reports, this can eliminate significant warehouse spend.

-- Check whether recent queries hit the result cache
-- (heuristic: cached results scan no bytes and return almost instantly)
SELECT query_id, query_text,
       total_elapsed_time / 1000 AS elapsed_seconds,
       CASE WHEN total_elapsed_time < 500
                 AND bytes_scanned = 0 THEN 'Cache hit'
            ELSE 'Cache miss' END AS cache_status
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE start_time >= DATEADD(hour, -1, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;

What breaks the result cache:

  • Any data change in the underlying tables. Even a single row insert invalidates the cache for that table.
  • Any change in the query text including whitespace, capitalisation or comment changes.
  • A role that lacks the required privileges on the underlying tables. (Running on a different warehouse does not break the cache; cached results are served from the Cloud Services layer without spinning up a warehouse.)
  • Queries using non-deterministic functions like CURRENT_TIMESTAMP(), RANDOM(), etc.

The practical implication: structure recurring dashboard queries to be byte-for-byte identical across runs. Build them as named queries in your BI tool or as Snowflake Tasks with fixed SQL text. Avoid embedding session variables or timestamps directly in query text when the result is already date-filtered in a WHERE clause.
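As a sketch (table name assumed), the two queries below are logically identical, but only a byte-for-byte repeat of the first can reuse its cached result:

```sql
-- Run 1: this exact text populates the result cache
SELECT region, SUM(amount) FROM orders WHERE order_date >= '2026-04-01' GROUP BY region;

-- Run 2: same logic, but different casing and spacing make it a cache miss
select region, sum(amount)  from orders where order_date >= '2026-04-01' group by region;
```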

8. Separate Workloads Across Dedicated Warehouses

Running ETL jobs, BI dashboard queries and ad-hoc analyst exploration on the same warehouse causes queuing. One large overnight ETL job can block analysts waiting for results at 9am. Workload isolation is the fix — dedicated warehouses by type so no workload competes with another.

-- Create dedicated warehouses by workload type
CREATE WAREHOUSE etl_wh
    WAREHOUSE_SIZE = 'LARGE'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    COMMENT = 'ETL and data pipeline jobs';

CREATE WAREHOUSE bi_wh
    WAREHOUSE_SIZE = 'MEDIUM'
    AUTO_SUSPEND = 120
    AUTO_RESUME = TRUE
    COMMENT = 'BI tool and dashboard queries';

CREATE WAREHOUSE analyst_wh
    WAREHOUSE_SIZE = 'SMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    COMMENT = 'Ad-hoc analyst exploration';

-- Route a scheduled task to the correct warehouse
CREATE OR REPLACE TASK daily_revenue_transform
    WAREHOUSE = etl_wh
    SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
    INSERT INTO revenue_summary
    SELECT DATE_TRUNC('day', order_date) AS day,
           SUM(amount) AS total_revenue
    FROM orders
    GROUP BY 1;

ALTER TASK daily_revenue_transform RESUME;

Auto-suspend keeps idle warehouses from burning credits between runs. Set auto-suspend to 60 seconds for ETL warehouses that run on a schedule and 120 to 300 seconds for BI warehouses where users expect immediate response. See Snowflake Tasks documentation for scheduling options including cron syntax and dependency chaining.
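Isolation only sticks if users actually land on the right warehouse. A sketch of the access side (the role and user names are hypothetical):

```sql
-- Give each role access only to its own warehouse
GRANT USAGE ON WAREHOUSE bi_wh TO ROLE bi_role;
GRANT USAGE ON WAREHOUSE analyst_wh TO ROLE analyst_role;

-- Set a sensible default so ad-hoc sessions never hit etl_wh
ALTER USER jane_analyst SET DEFAULT_WAREHOUSE = analyst_wh;
```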

Reduce Ad-Hoc Query Load by Getting Data to Business Teams Directly

Every ad-hoc query a business user fires against Snowflake consumes warehouse credits. For finance managers, RevOps leads and ops teams, recurring data needs are usually consistent and predictable: pipeline by week, revenue by region, headcount by department. These one-off queries add up fast and create a backlog for the data team.

Coefficient’s Snowflake connector pulls live Snowflake data into Google Sheets and Excel on a scheduled auto-refresh. Business teams get live data in the tools they already use, without filing tickets or querying the warehouse directly. Recurring data needs become scheduled Coefficient refreshes rather than ad-hoc warehouse queries, which reduces both credit consumption and analyst toil.

The Bottom Line

Most Snowflake query performance problems trace back to the same root causes: bad pruning, wrong warehouse size, disk spillage, inefficient joins and repeated CTE computation. Diagnose with the Query Profile before touching anything. Fix the biggest bottleneck first and measure credit consumption before and after.

For teams looking to reduce the volume of ad-hoc queries hitting the warehouse, Coefficient’s Snowflake connector gives business users live, scheduled data access in spreadsheets.