Snowflake Schema
The snowflake schema is a normalized extension of the star schema — chosen when the analytical benefits of structured dimension hierarchies outweigh the query performance cost of additional joins. Understanding precisely when to snowflake (and when not to) is one of the most important judgment calls in data warehouse design, and one that separates experienced dimensional modelers from those who inadvertently rebuild a 3NF OLTP schema inside their data warehouse.
TalkingSchema's AI copilot generates snowflake schemas with properly structured dimension hierarchies, generates the correct multi-join query patterns, and helps you make the star-versus-snowflake decision with clear trade-off visibility.
Star Schema vs. Snowflake Schema: The Trade-Off
| Aspect | Star Schema | Snowflake Schema |
|---|---|---|
| Dimension table structure | Single denormalized table | Multiple normalized tables |
| Query joins | Fewer (1 join per dimension) | More (multiple per hierarchy) |
| Query complexity | Low | Medium–High |
| Storage | More (redundant attribute values) | Less (normalized) |
| ETL complexity | Lower | Higher |
| BI tool compatibility | Excellent | Good (with proper metadata layer) |
| Maintenance | Easier | More complex |
| Recommended for | Most analytics workloads | Large hierarchical dimensions, cost-sensitive platforms |
The 2026 consensus: For cloud data warehouses (Snowflake, BigQuery, Databricks, Redshift), the star schema is almost always the better choice. Compute is cheap; developer time is not. Snowflake the company uses star schemas in its own example datasets.
The snowflake schema is worth considering when:
- A dimension table exceeds 10–20 million rows (unusual — most dimensions are small)
- A dimension hierarchy has 4+ levels used independently in reporting
- Your organization has a strict normalization governance mandate for the warehouse layer
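The first criterion is worth checking empirically before committing to a redesign. A Postgres-flavored sketch (the `pg_stat_user_tables` catalog view is an assumption about your platform; cloud warehouses expose equivalents in their own `INFORMATION_SCHEMA` or account-usage views):

```sql
-- Sketch: list dimension tables by approximate row count and size
-- (Postgres-specific; assumes statistics are reasonably current)
SELECT
    relname AS table_name,
    n_live_tup AS approx_rows,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE relname LIKE 'dim_%'
ORDER BY n_live_tup DESC;
```

If nothing here is in the tens of millions of rows, the storage argument for snowflaking is weak.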
Example: GSSC Snowflake Schema
Taking the GSSC star schema and normalizing the product and supplier dimensions:
-- ============================================================
-- GSSC Analytics: Snowflake Schema
-- Normalized dimensions for product and supplier hierarchies
-- ============================================================

-- ── Product Category Hierarchy ───────────────────────────────

-- Level 1: Product Category
CREATE TABLE dim_product_category (
    category_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    category_name VARCHAR(100) NOT NULL UNIQUE,
    category_group VARCHAR(50) -- Higher-level grouping: 'Industrial', 'Consumer'
);

-- Level 2: Product (references category)
CREATE TABLE dim_product (
    product_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_bk UUID NOT NULL,
    sku VARCHAR(50) NOT NULL,
    product_name VARCHAR(200) NOT NULL,
    category_key INTEGER NOT NULL REFERENCES dim_product_category(category_key),
    carbon_score_band VARCHAR(20),
    carbon_intensity_score DECIMAL(5, 2),
    is_active BOOLEAN NOT NULL,
    effective_from DATE NOT NULL,
    effective_to DATE,
    is_current BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Supplier Hierarchy ───────────────────────────────────────

-- Level 1: Carbon Tier (shared reference table)
CREATE TABLE dim_carbon_tier (
    tier_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tier_code VARCHAR(1) NOT NULL UNIQUE, -- A, B, C, D
    tier_label VARCHAR(50) NOT NULL, -- 'Tier A — Gold Certified'
    min_score DECIMAL(5, 2),
    max_score DECIMAL(5, 2),
    procurement_weight DECIMAL(5, 4) -- Scoring weight for procurement decisions
);

-- Level 2: Country / Region (shared across supplier and customer dimensions)
CREATE TABLE dim_geography (
    geography_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    country VARCHAR(100) NOT NULL,
    region VARCHAR(50) NOT NULL, -- 'EMEA', 'APAC', 'AMER'
    sub_region VARCHAR(50)
);

-- Level 3: Supplier (references carbon tier and geography)
CREATE TABLE dim_supplier (
    supplier_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    supplier_bk UUID NOT NULL,
    company_name VARCHAR(200) NOT NULL,
    geography_key INTEGER NOT NULL REFERENCES dim_geography(geography_key),
    carbon_tier_key INTEGER NOT NULL REFERENCES dim_carbon_tier(tier_key),
    certification VARCHAR(100),
    is_active BOOLEAN NOT NULL,
    effective_from DATE NOT NULL,
    effective_to DATE,
    is_current BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Fact table unchanged from star schema ────────────────────
-- Fact tables reference the leaf-level dimension keys only.
-- Hierarchy traversal is done in queries via joins to parent tables.
CREATE TABLE fact_sales (
    sales_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key), -- leaf
    warehouse_key INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
    supplier_key INTEGER NOT NULL REFERENCES dim_supplier(supplier_key), -- leaf
    sales_order_bk UUID NOT NULL,
    line_item_bk UUID NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price_usd DECIMAL(12, 4) NOT NULL,
    line_revenue_usd DECIMAL(14, 4) NOT NULL,
    line_carbon_kg DECIMAL(10, 3)
);
Querying the Snowflake Schema
The price of normalization is visible in query complexity:
-- Star schema query: Revenue by product category
-- 2 joins: category is a column on the denormalized dim_product
SELECT
    dp.category AS category_name,
    SUM(fs.line_revenue_usd) AS total_revenue
FROM fact_sales fs
JOIN dim_product dp ON fs.product_key = dp.product_key
JOIN dim_date dd ON fs.order_date_key = dd.date_key
WHERE dd.year = 2025
GROUP BY dp.category;

-- Snowflake schema query: same result, one more join
-- 3 joins: the category name now lives in dim_product_category
SELECT
    pc.category_name,
    SUM(fs.line_revenue_usd) AS total_revenue
FROM fact_sales fs
JOIN dim_product dp ON fs.product_key = dp.product_key
JOIN dim_product_category pc ON dp.category_key = pc.category_key
JOIN dim_date dd ON fs.order_date_key = dd.date_key
WHERE dd.year = 2025
GROUP BY pc.category_name;
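The extra joins do buy something: each hierarchy level is independently queryable. For example, drilling across both supplier hierarchies defined in the DDL above:

```sql
-- Revenue by supplier region and carbon tier: two hierarchy hops
-- (fact → supplier → geography, fact → supplier → carbon tier)
SELECT
    g.region,
    ct.tier_label,
    SUM(fs.line_revenue_usd) AS total_revenue
FROM fact_sales fs
JOIN dim_supplier s ON fs.supplier_key = s.supplier_key
JOIN dim_geography g ON s.geography_key = g.geography_key
JOIN dim_carbon_tier ct ON s.carbon_tier_key = ct.tier_key
GROUP BY g.region, ct.tier_label
ORDER BY total_revenue DESC;
```

In a star schema, region and tier label would simply be columns on dim_supplier; the snowflake version trades those columns for joins but guarantees a single authoritative row per tier and per geography.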
Generating a Snowflake Schema
A prompt to TalkingSchema's copilot for this conversion might read:
Convert the current GSSC star schema to a snowflake schema.
Normalize the following:
- Extract dim_product_category (category, category_group) from dim_product
- Extract dim_carbon_tier (code, label, procurement_weight) from dim_supplier
- Create a shared dim_geography (country, region, sub_region) referenced
by both dim_supplier and dim_customer
Keep dim_date flat — date hierarchies work better as columns in a star design.
Keep dim_warehouse flat — small table, hierarchy overhead not justified.
Show the complete ERD and generate DDL.
Frequently Asked Questions
Does Snowflake (the database platform) recommend snowflake schemas?
Counterintuitively, no. Snowflake the database platform recommends star schemas for most workloads on its platform. The database's name reflects its early technical architecture, not a recommendation for schema design. The query optimizer handles star schema joins efficiently; additional normalization adds complexity without meaningful performance benefit on Snowflake's columnar engine.
What is the Outrigger pattern?
An outrigger is a dimension table that is referenced by another dimension table (rather than a fact table). It is a snowflake-like normalization of a shared reference domain — such as dim_geography being referenced by both dim_supplier and dim_customer. TalkingSchema's AI generates outriggers when a shared reference domain is identified during snowflaking.
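A minimal sketch of the shared outrigger in use, assuming dim_customer also carries a geography_key column (it is not shown in the DDL above, but the prompt example creates it):

```sql
-- dim_geography as an outrigger shared by two dimensions.
-- Assumption: dim_customer.geography_key exists, mirroring dim_supplier.
SELECT
    g.region,
    COUNT(DISTINCT s.supplier_key) AS suppliers,
    COUNT(DISTINCT c.customer_key) AS customers
FROM dim_geography g
LEFT JOIN dim_supplier s ON s.geography_key = g.geography_key
LEFT JOIN dim_customer c ON c.geography_key = g.geography_key
GROUP BY g.region;
```

Because both dimensions point at the same geography rows, region definitions cannot drift between supplier and customer reporting.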
How do BI tools handle snowflake schemas?
Modern BI tools (Tableau, Power BI, Looker, Metabase) handle snowflake schemas with a semantic layer configuration. Looker's explores and Power BI's relationship view both support multi-hop dimension joins. However, misconfigurations in this layer are a common source of incorrect reports. The star schema's simpler join structure reduces this risk.
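A common way to reduce that risk is to resolve the snowflake joins once, in the warehouse, and point the BI tool at a flattened view so it sees a star. A sketch against the schema above (the view name is illustrative):

```sql
-- Flattened product dimension for BI consumption.
-- The snowflake join is resolved here instead of in every report.
CREATE VIEW dim_product_flat AS
SELECT
    p.product_key,
    p.sku,
    p.product_name,
    pc.category_name,
    pc.category_group,
    p.carbon_score_band,
    p.is_current
FROM dim_product p
JOIN dim_product_category pc ON p.category_key = pc.category_key;
```

The physical tables stay normalized for ETL and governance; report authors get the single-join ergonomics of a star schema.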