Skip to main content

Guide: Fact Tables vs Dimension Tables

The fact/dimension distinction is the fundamental organizing principle of the data warehouse. Every table in a dimensional model belongs to one category or the other — and confusing the two categories is the most common source of performance problems, incorrect aggregations, and unmaintainable analytical schemas.

This guide provides a definitive reference for fact and dimension table design, with specific rules, common mistakes, and SQL examples from the GSSC supply chain domain.


Fact Tables: The Measurements

What belongs in a fact table

AlwaysNever
Foreign keys to dimension tablesDescriptive text (names, labels, descriptions)
Additive numeric measuresDerived calculations (store components, compute ratios at query time)
Semi-additive measures (with documentation)Business keys (store in the corresponding dimension)
Degenerate dimensions (order number, invoice ID)Date timestamps as raw TIMESTAMPTZ (use a date key integer)
Date key integers (FK to dim_date)

The four types of measures

Fully additive: Can be SUM()-ed across every dimension without producing incorrect results.

-- Correct: sum revenue across any dimension combination
SELECT
ds.country,
dd.year,
SUM(fs.line_revenue_usd) AS total_revenue
FROM fact_sales fs
JOIN dim_supplier ds ON fs.supplier_key = ds.supplier_key
JOIN dim_date dd ON fs.order_date_key = dd.date_key
GROUP BY ds.country, dd.year;

Examples: line_revenue_usd, quantity_ordered, cost_of_goods_usd, emissions_kg

Semi-additive: Can be summed across some dimensions, not others. Requires careful query design.

-- WRONG: SUM of inventory across dates is meaningless
-- (adds Monday stock to Tuesday stock to Wednesday stock)
SELECT SUM(quantity_on_hand) AS total_stock -- INCORRECT across time
FROM fact_inventory_snapshot;

-- CORRECT: Latest snapshot per product per warehouse
SELECT
product_key,
warehouse_key,
quantity_on_hand
FROM fact_inventory_snapshot
WHERE snapshot_date_key = (SELECT MAX(snapshot_date_key)
FROM fact_inventory_snapshot);

-- CORRECT: Sum across warehouses for a single date (is additive across space)
SELECT
dw.region,
SUM(fi.quantity_on_hand) AS regional_stock
FROM fact_inventory_snapshot fi
JOIN dim_warehouse dw ON fi.warehouse_key = dw.warehouse_key
WHERE fi.snapshot_date_key = 20251231
GROUP BY dw.region;

Examples: quantity_on_hand, account balances, headcount

Non-additive: Never summed. Always computed from additive components.

-- WRONG: Averaging unit prices across rows produces nonsense
SELECT AVG(unit_price_usd) AS avg_price -- MISLEADING

-- CORRECT: Compute weighted average from its components
SELECT
SUM(line_revenue_usd) / SUM(quantity_ordered) AS weighted_avg_price
FROM fact_sales;

Examples: unit_price, ratios, percentages, rates

Derived (computed at query time): Do not store pre-computed ratios in fact tables.

-- WRONG: Storing gross_margin_pct in the fact table
gross_margin_pct DECIMAL(6,4) -- Never store this

-- CORRECT: Store components; compute ratio in reporting layer
gross_margin_usd = line_revenue_usd - cost_of_goods_usd
-- gross_margin_pct = SUM(gross_margin_usd) / SUM(line_revenue_usd) at query time

Dimension Tables: The Context

What belongs in a dimension table

AlwaysNever
Surrogate integer primary keyMeasures (numeric facts)
Business/natural key (for ETL)Foreign keys to fact tables
All descriptive attributes for filtering/groupingAnything requiring real-time calculation
Hierarchy attributes (flat or snowflaked)Columns with null values as the common case
SCD tracking columns (if applicable)

Wide and flat: the guiding principle

Dimension tables should be wide (many columns) and flat (minimal joins). When a customer is classified by segment, region, and tier — store all three directly in dim_customer. Do not create a dim_customer_segment table and reference it with a foreign key unless the segment table has many rows (>10,000) and is queried independently as a reporting dimension.

-- WRONG (over-normalized): two joins for a simple customer query
SELECT
cs.segment_name,
cr.region_name,
SUM(fs.line_revenue_usd)
FROM fact_sales fs
JOIN dim_customer dc ON fs.customer_key = dc.customer_key
JOIN dim_customer_segment cs ON dc.segment_key = cs.segment_key -- unnecessary
JOIN dim_customer_region cr ON dc.region_key = cr.region_key -- unnecessary
GROUP BY cs.segment_name, cr.region_name;

-- CORRECT (flat dimension): one join
SELECT
dc.customer_segment,
dc.region,
SUM(fs.line_revenue_usd)
FROM fact_sales fs
JOIN dim_customer dc ON fs.customer_key = dc.customer_key
GROUP BY dc.customer_segment, dc.region;

Generating Fact and Dimension Tables with TalkingSchema

Ask for grain-first fact table design

Design a fact table for purchase order management.
Grain: one row per purchase order line item.

Business keys from source: purchase_orders.po_id, purchase_order_items.item_id
Available dimensions: dim_date, dim_supplier, dim_product, dim_warehouse

Measures needed:
- Ordered quantity
- Received quantity
- Unit cost
- Total line cost
- Days late or early (received vs expected) — degenerate or derived?

Identify: which measures are additive, semi-additive, or should be derived at query time.

Ask for a wide, flat dimension design

Design a dim_customer dimension table.
Source: OLTP customers table.

Requirements:
- SCD Type 2 tracking for tier changes
- Derived columns: credit_band from credit_limit_usd ('Low' < $10k, 'Medium' $10k–$100k, 'High' > $100k)
- Regional grouping (country → region → global_region)
- All attributes flat in one table — no separate dim_region child table

Include: surrogate key, business key, all descriptive columns, SCD columns.

Frequently Asked Questions

How many rows should a dimension table have?

Most dimension tables are small — thousands to tens of thousands of rows. dim_date is pre-populated with ~3,650 rows for 10 years. dim_product might have 50,000 SKUs. dim_customer might have 100,000 enterprise customers. Dimension tables with millions of rows are unusual and may indicate a design problem — they may be confusing a dimension with a fact.

Should I store totals in the fact table?

No. Pre-aggregated totals (daily revenue, monthly total) belong in aggregate tables or materialized views — not in the transactional fact table. The fact table always holds atomic grain; aggregation happens in the reporting or BI layer.

What is the role of unknown members in dimension tables?

When a fact row cannot be assigned a valid dimension key — for example, a shipment with no known carrier — it references a special unknown member row in the dimension table (surrogate key = -1 by convention). Never use NULL for a foreign key in a fact table; NULLs break GROUP BY and aggregate correctness.