What is the difference between a fact table and a dimension table?

A fact table stores measurements of business events — quantitative, numeric, and typically large in row count. A dimension table stores the descriptive context that makes those measurements meaningful — who, what, where, when, why. In a sales data warehouse: fact_sales holds revenue, quantity, and discount figures per transaction. dim_customer holds the customer's name, country, tier, and credit classification. Fact tables join to dimension tables via surrogate keys. Fact tables are narrow (mostly foreign keys and measures). Dimension tables are wide (many descriptive columns).
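To make the split concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_customer`, `customer_key`, `line_revenue_usd`) follow this guide, but the sample rows are invented for illustration. The fact table holds only keys and measures, and the dimension supplies the descriptive attributes used for grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: wide and descriptive, one row per customer
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key
        customer_id  TEXT NOT NULL,        -- business key from the source system
        name         TEXT NOT NULL,
        country      TEXT NOT NULL,
        tier         TEXT NOT NULL
    );
    -- Fact: narrow, foreign keys plus numeric measures
    CREATE TABLE fact_sales (
        customer_key     INTEGER NOT NULL REFERENCES dim_customer(customer_key),
        quantity         INTEGER NOT NULL,
        line_revenue_usd REAL    NOT NULL
    );
    INSERT INTO dim_customer VALUES
        (1, 'C-100', 'Acme Corp', 'US', 'Gold'),
        (2, 'C-200', 'Globex',    'DE', 'Silver');
    INSERT INTO fact_sales VALUES (1, 10, 500.0), (1, 5, 250.0), (2, 8, 320.0);
""")

# The dimension answers "where"; the fact answers "how much".
revenue_by_country = dict(conn.execute("""
    SELECT dc.country, SUM(fs.line_revenue_usd)
    FROM fact_sales fs
    JOIN dim_customer dc ON fs.customer_key = dc.customer_key
    GROUP BY dc.country
""").fetchall())
print(revenue_by_country)
```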

What is an additive measure in a data warehouse?

An additive measure is a numeric column in a fact table that can be correctly summed across all dimensions — time, product, customer, geography, and any other dimension in the model. Revenue and quantity are fully additive. A semi-additive measure can be summed across some dimensions but not others — inventory quantity can be summed across warehouses but not summed across time periods (adding Monday's stock to Tuesday's stock produces a meaningless number). A non-additive measure cannot be meaningfully summed at all — ratios, percentages, and unit prices. Non-additive measures must be computed from their additive components (store numerator and denominator separately, compute the ratio at query time).
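A minimal sketch in plain Python of how a semi-additive measure behaves. The daily inventory figures are invented, and "closing balance" and "average daily balance" are assumed here as the time aggregations; they are common ways to summarize a semi-additive measure over a period, not the only ones.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inventory snapshots: (day, warehouse, quantity_on_hand)
snapshots = [
    ("Mon", "WH-A", 100), ("Mon", "WH-B", 50),
    ("Tue", "WH-A", 90),  ("Tue", "WH-B", 60),
    ("Wed", "WH-A", 80),  ("Wed", "WH-B", 40),
]
days = ["Mon", "Tue", "Wed"]  # chronological order

# Summing across warehouses within one day is valid (additive across space).
stock_per_day = defaultdict(int)
for day, _warehouse, qty in snapshots:
    stock_per_day[day] += qty

# Summing across days is NOT valid: 150 + 150 + 120 is not a stock level.
meaningless_total = sum(stock_per_day.values())

# Valid ways to summarize the period along the time dimension:
closing_stock = stock_per_day[days[-1]]               # last snapshot in the period
average_stock = mean(stock_per_day[d] for d in days)  # average daily balance

print(dict(stock_per_day), meaningless_total, closing_stock, average_stock)
```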

What is a surrogate key in a dimension table?

A surrogate key is a system-generated integer primary key assigned by the data warehouse load process — independent of any business or natural key. Surrogate keys are used in dimension tables because: (1) they are compact integers that join faster than UUID or VARCHAR business keys; (2) they enable slowly changing dimension (SCD) tracking — multiple rows for the same entity (same business key) can coexist with different surrogate keys; (3) they insulate the data warehouse from upstream key changes. The business key (customer ID, product SKU) is stored as a separate column for ETL joins and row identification.
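A minimal `sqlite3` sketch of the ETL step described above: the dimension assigns surrogate keys, and the fact load looks up the current surrogate key by business key so that the fact table stores only the integer. The `is_current` flag, the `load_sale` helper, and the sample rows are hypothetical. Customer `'C-100'` has two versions, so two surrogate keys coexist for one business key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, assigned by the warehouse
        customer_id  TEXT    NOT NULL,                   -- business key from the source system
        tier         TEXT    NOT NULL,
        is_current   INTEGER NOT NULL                    -- 1 = current version (SCD2)
    );
    -- Same business key, two versions: the Silver row was superseded by Gold.
    INSERT INTO dim_customer (customer_id, tier, is_current) VALUES
        ('C-100', 'Silver', 0),
        ('C-100', 'Gold',   1),
        ('C-200', 'Bronze', 1);
    CREATE TABLE fact_sales (customer_key INTEGER NOT NULL, line_revenue_usd REAL NOT NULL);
""")

def load_sale(customer_id: str, revenue: float) -> int:
    """Translate the business key to the current surrogate key, then insert the fact row."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no current dimension row for {customer_id}")
    conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (row[0], revenue))
    return row[0]

key = load_sale("C-100", 99.0)
print(key)  # surrogate key of the current 'Gold' version
```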

What is the difference between a fact table and a dimension table?

A fact table stores numeric measures of business events (such as sales, orders, and shipments) along with foreign keys to the dimensions. A dimension table stores the descriptive attributes that give those measures context: customer, product, time, and geography. The fact table answers "how much?", while the dimensions answer "who?", "what?", "when?", and "where?". TalkingSchema automatically generates correctly designed fact and dimension tables from requirements written in Spanish.

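What is the difference between a fact table and a dimension table?

A fact table stores numeric measurements of business events (sales, orders, shipments, and so on) together with foreign keys to the dimensions. A dimension table stores the descriptive attributes that give those measurements context (customer, product, time, region). The fact table answers "how much?", and the dimensions answer "who? what? when? where?". TalkingSchema automatically generates correctly designed fact and dimension tables from requirements written in Japanese.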

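What is the difference between a fact table and a dimension table?

A fact table stores numeric measures of business events (such as sales amount, order volume, and shipping volume) along with foreign keys pointing to the dimensions. A dimension table stores the descriptive attributes that provide context for those measures: customer, product, time, and geographic location. The fact table answers "how much?", while the dimension tables answer "who? what? when? where?". TalkingSchema can automatically generate correctly designed fact and dimension tables from requirement descriptions written in Chinese.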

Guide: Fact Tables vs Dimension Tables

Note: For a full walkthrough of star schema design with real fact and dimension tables, including a financial services warehouse with SCD2 and grain declaration, see the Star Schema vs Snowflake Schema blog post.

The fact/dimension distinction is the fundamental organizing principle of the data warehouse. Nearly every table in a dimensional model belongs to one category or the other, and confusing the two is one of the most common sources of performance problems, incorrect aggregations, and unmaintainable analytical schemas.

This guide provides a definitive reference for fact and dimension table design, with specific rules, common mistakes, and SQL examples from the GSSC supply chain domain.

Fact Tables: The Measurements

What belongs in a fact table

Always	Never
Foreign keys to dimension tables	Descriptive text (names, labels, descriptions)
Additive numeric measures	Pre-computed ratios and percentages (store the components, compute the ratio at query time)
Semi-additive measures (with documentation)	Business keys (store in the corresponding dimension)
Degenerate dimensions (order number, invoice ID)	Date timestamps as raw TIMESTAMPTZ (use a date key integer)
Date key integers (FK to dim_date)
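The table recommends an integer date key instead of a raw TIMESTAMPTZ. A common convention, and the one implied by the literal `20251231` in the semi-additive query below, is a YYYYMMDD integer. A minimal sketch of deriving it in Python; the helper names are hypothetical, and it assumes timestamps are converted to the warehouse's reporting time zone before the date is taken (the UTC-5 zone here is only an example).

```python
from datetime import date, datetime, timedelta, timezone

def to_date_key(d: date) -> int:
    """Encode a calendar date as a YYYYMMDD integer key for dim_date."""
    return d.year * 10_000 + d.month * 100 + d.day

def timestamp_to_date_key(ts: datetime, reporting_tz: timezone) -> int:
    """Convert an aware timestamp to the reporting time zone, then take its date."""
    return to_date_key(ts.astimezone(reporting_tz).date())

# Hypothetical order placed at 22:30 on 31 Dec in UTC-5, already 1 Jan in UTC.
order_ts = datetime(2026, 1, 1, 3, 30, tzinfo=timezone.utc)
est = timezone(timedelta(hours=-5))
print(to_date_key(date(2025, 12, 31)))                # 20251231
print(timestamp_to_date_key(order_ts, est))           # 20251231
print(timestamp_to_date_key(order_ts, timezone.utc))  # 20260101
```

The same instant maps to different date keys depending on the time zone, which is why the conversion rule should be fixed once in the load process rather than left to each query.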

The four types of measures

Fully additive: Can be summed with SUM() across every dimension without producing incorrect results.

-- Correct: sum revenue across any dimension combination
SELECT
    ds.country,
    dd.year,
    SUM(fs.line_revenue_usd) AS total_revenue
FROM fact_sales fs
JOIN dim_supplier ds ON fs.supplier_key = ds.supplier_key
JOIN dim_date dd ON fs.order_date_key = dd.date_key
GROUP BY ds.country, dd.year;

Examples: line_revenue_usd, quantity_ordered, cost_of_goods_usd, emissions_kg

Semi-additive: Can be summed across some dimensions, not others. Requires careful query design.

-- WRONG: SUM of inventory across dates is meaningless
-- (adds Monday stock to Tuesday stock to Wednesday stock)
SELECT SUM(quantity_on_hand) AS total_stock -- INCORRECT across time
FROM fact_inventory_snapshot;

-- CORRECT: Stock per product per warehouse as of the most recent snapshot date
SELECT
    product_key,
    warehouse_key,
    quantity_on_hand
FROM fact_inventory_snapshot
WHERE snapshot_date_key = (SELECT MAX(snapshot_date_key)
                           FROM fact_inventory_snapshot);

-- CORRECT: Sum across warehouses for a single date (quantity_on_hand is additive across warehouses)
SELECT
    dw.region,
    SUM(fi.quantity_on_hand) AS regional_stock
FROM fact_inventory_snapshot fi
JOIN dim_warehouse dw ON fi.warehouse_key = dw.warehouse_key
WHERE fi.snapshot_date_key = 20251231
GROUP BY dw.region;

Examples: quantity_on_hand, account balances, headcount
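The "most recent snapshot" query above assumes every product/warehouse pair appears on the latest snapshot date, which holds for a dense periodic snapshot. If snapshots can be sparse (a pair skipped on some dates), one alternative is to take each pair's own latest date with a correlated subquery. A minimal `sqlite3` sketch with invented rows; whether a missing pair means "unchanged" or "zero stock" depends on how the snapshot is loaded, and this sketch assumes "unchanged".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_inventory_snapshot (
        snapshot_date_key INTEGER NOT NULL,
        product_key       INTEGER NOT NULL,
        warehouse_key     INTEGER NOT NULL,
        quantity_on_hand  INTEGER NOT NULL
    );
    INSERT INTO fact_inventory_snapshot VALUES
        (20251230, 1, 10, 40), (20251231, 1, 10, 35),  -- pair (1, 10) present on both dates
        (20251230, 2, 10, 70);                         -- pair (2, 10) missing on 20251231
""")

# Global-max version: silently drops pair (2, 10).
global_latest = conn.execute("""
    SELECT product_key, warehouse_key, quantity_on_hand
    FROM fact_inventory_snapshot
    WHERE snapshot_date_key = (SELECT MAX(snapshot_date_key) FROM fact_inventory_snapshot)
""").fetchall()

# Per-pair version: each pair's own most recent snapshot.
per_pair_latest = conn.execute("""
    SELECT f.product_key, f.warehouse_key, f.quantity_on_hand
    FROM fact_inventory_snapshot f
    WHERE f.snapshot_date_key = (
        SELECT MAX(f2.snapshot_date_key)
        FROM fact_inventory_snapshot f2
        WHERE f2.product_key = f.product_key
          AND f2.warehouse_key = f.warehouse_key
    )
    ORDER BY f.product_key, f.warehouse_key
""").fetchall()

print(global_latest, per_pair_latest)
```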

Non-additive: Never summed. Always computed from additive components.

-- WRONG: AVG gives every line equal weight, regardless of quantity ordered
SELECT AVG(unit_price_usd) AS avg_price -- MISLEADING
FROM fact_sales;

-- CORRECT: Compute the quantity-weighted average from its additive components
SELECT
    SUM(line_revenue_usd) / NULLIF(SUM(quantity_ordered), 0) AS weighted_avg_price
FROM fact_sales;

Examples: unit_price, ratios, percentages, rates
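A quick `sqlite3` check of why the averaged unit price misleads. The two sales lines are invented: one small order at a high price and one large order at a low price. The revenue column is REAL here; if both measures were integer columns, SQLite and PostgreSQL would perform integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (
        quantity_ordered INTEGER NOT NULL,
        unit_price_usd   REAL    NOT NULL,
        line_revenue_usd REAL    NOT NULL  -- quantity_ordered * unit_price_usd
    );
    INSERT INTO fact_sales VALUES
        (1, 10.0, 10.0),   -- 1 unit at $10
        (9,  2.0, 18.0);   -- 9 units at $2
""")

# Unweighted mean: each line counts equally, regardless of quantity.
naive_avg = conn.execute("SELECT AVG(unit_price_usd) FROM fact_sales").fetchone()[0]

# Weighted mean computed from the additive components.
weighted_avg = conn.execute("""
    SELECT SUM(line_revenue_usd) / NULLIF(SUM(quantity_ordered), 0) FROM fact_sales
""").fetchone()[0]

print(naive_avg, weighted_avg)  # 6.0 2.8
```

Ten units were sold for $28 in total, so $2.80 is the price customers actually paid on average; the $6.00 figure overstates it because the single high-priced unit counts as much as the nine cheap ones.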

Derived (computed at query time): Do not store pre-computed ratios in fact tables.

-- WRONG: Storing a pre-computed ratio in the fact table
ALTER TABLE fact_sales ADD COLUMN gross_margin_pct DECIMAL(6,4); -- Never store this

-- CORRECT: Store the additive components; compute the ratio at query time
SELECT
    SUM(line_revenue_usd) - SUM(cost_of_goods_usd) AS gross_margin_usd,
    (SUM(line_revenue_usd) - SUM(cost_of_goods_usd))
        / NULLIF(SUM(line_revenue_usd), 0) AS gross_margin_pct
FROM fact_sales;
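One way to keep ratios out of the fact table while still giving BI tools a ready-made column is a reporting view that derives the ratio from summed components at whatever grain the view groups by. A minimal `sqlite3` sketch; the view name `v_margin_by_product` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (
        product_key       INTEGER NOT NULL,
        line_revenue_usd  REAL    NOT NULL,
        cost_of_goods_usd REAL    NOT NULL
    );
    INSERT INTO fact_sales VALUES
        (1, 100.0, 60.0),    -- line margin 40%
        (1, 300.0, 240.0),   -- line margin 20%
        (2, 200.0, 100.0);

    -- Reporting layer: the ratio is derived from summed components, never stored.
    CREATE VIEW v_margin_by_product AS
    SELECT
        product_key,
        SUM(line_revenue_usd) - SUM(cost_of_goods_usd) AS gross_margin_usd,
        (SUM(line_revenue_usd) - SUM(cost_of_goods_usd))
            / NULLIF(SUM(line_revenue_usd), 0)        AS gross_margin_pct
    FROM fact_sales
    GROUP BY product_key;
""")

margins = conn.execute(
    "SELECT product_key, gross_margin_usd, gross_margin_pct "
    "FROM v_margin_by_product ORDER BY product_key"
).fetchall()
print(margins)
```

For product 1, averaging the two stored line-level percentages (40% and 20%) would report 30%, whereas the view reports the correct 25% ($100 margin on $400 revenue).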

Dimension Tables: The Context

What belongs in a dimension table

Always	Never
Surrogate integer primary key	Measures (numeric facts)
Business/natural key (for ETL)	Foreign keys to fact tables
All descriptive attributes for filtering/grouping	Anything requiring real-time calculation
Hierarchy attributes (flat or snowflaked)	Columns that are NULL in most rows
SCD tracking columns (if applicable)
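A minimal `sqlite3` sketch of a dimension that follows the "Always" column: a surrogate key, the business key, flat descriptive attributes, and SCD2 tracking columns. The column names `valid_from`, `valid_to`, and `is_current` and the open-ended `9999-12-31` date are common conventions assumed here, not requirements of any particular tool. Descriptive columns are NOT NULL with explicit fallback labels, so filters and GROUP BY output never show blanks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key     INTEGER PRIMARY KEY,                    -- surrogate key
        customer_id      TEXT    NOT NULL,                       -- business key (for ETL lookups)
        customer_name    TEXT    NOT NULL,
        customer_segment TEXT    NOT NULL DEFAULT 'Unclassified',
        region           TEXT    NOT NULL DEFAULT 'Unknown',
        tier             TEXT    NOT NULL,
        valid_from       TEXT    NOT NULL,                       -- SCD2: version start date
        valid_to         TEXT    NOT NULL DEFAULT '9999-12-31',  -- SCD2: open-ended for the current row
        is_current       INTEGER NOT NULL DEFAULT 1,
        UNIQUE (customer_id, valid_from)                         -- one version per start date
    );
    INSERT INTO dim_customer (customer_key, customer_id, customer_name, tier, valid_from)
    VALUES (1, 'C-100', 'Acme Corp', 'Gold', '2025-01-01');
""")

row = conn.execute(
    "SELECT customer_segment, region, valid_to, is_current "
    "FROM dim_customer WHERE customer_id = 'C-100'"
).fetchone()
print(row)  # ('Unclassified', 'Unknown', '9999-12-31', 1)
```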

Wide and flat: the guiding principle

Dimension tables should be wide (many columns) and flat (minimal joins). When a customer is classified by segment, region, and tier, store all three directly in dim_customer. Do not create a separate dim_customer_segment table referenced by a foreign key unless the segment table is large (more than 10,000 rows) and is queried independently as a reporting dimension.

-- WRONG (over-normalized): two extra joins for a simple customer query
SELECT
    cs.segment_name,
    cr.region_name,
    SUM(fs.line_revenue_usd)
FROM fact_sales fs
JOIN dim_customer dc ON fs.customer_key = dc.customer_key
JOIN dim_customer_segment cs ON dc.segment_key = cs.segment_key  -- unnecessary
JOIN dim_customer_region cr ON dc.region_key = cr.region_key     -- unnecessary
GROUP BY cs.segment_name, cr.region_name;

-- CORRECT (flat dimension): one join
SELECT
    dc.customer_segment,
    dc.region,
    SUM(fs.line_revenue_usd)
FROM fact_sales fs
JOIN dim_customer dc ON fs.customer_key = dc.customer_key
GROUP BY dc.customer_segment, dc.region;
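A flat dimension does not require a flat source. A minimal sketch of how an ETL step might denormalize normalized source tables into one wide `dim_customer`, resolving the lookups once at load time so reports need a single join. The `src_*` table names and rows are hypothetical stand-ins for OLTP tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized source tables
    CREATE TABLE src_segment  (segment_id INTEGER PRIMARY KEY, segment_name TEXT NOT NULL);
    CREATE TABLE src_region   (region_id  INTEGER PRIMARY KEY, region_name  TEXT NOT NULL);
    CREATE TABLE src_customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        segment_id  INTEGER REFERENCES src_segment(segment_id),
        region_id   INTEGER REFERENCES src_region(region_id)
    );
    INSERT INTO src_segment  VALUES (1, 'Enterprise'), (2, 'SMB');
    INSERT INTO src_region   VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO src_customer VALUES ('C-100', 'Acme Corp', 1, 1), ('C-200', 'Globex', 2, NULL);

    -- ETL: resolve the lookups once, at load time, into one wide table.
    CREATE TABLE dim_customer (
        customer_key     INTEGER PRIMARY KEY,
        customer_id      TEXT NOT NULL,
        customer_name    TEXT NOT NULL,
        customer_segment TEXT NOT NULL,
        region           TEXT NOT NULL
    );
    INSERT INTO dim_customer (customer_id, customer_name, customer_segment, region)
    SELECT c.customer_id, c.name,
           COALESCE(s.segment_name, 'Unclassified'),
           COALESCE(r.region_name, 'Unknown')
    FROM src_customer c
    LEFT JOIN src_segment s ON c.segment_id = s.segment_id
    LEFT JOIN src_region  r ON c.region_id  = r.region_id;
""")

rows = conn.execute(
    "SELECT customer_id, customer_segment, region FROM dim_customer ORDER BY customer_id"
).fetchall()
print(rows)
```

The LEFT JOINs keep customers whose lookup is missing, and COALESCE gives them an explicit label instead of a NULL attribute.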

Generating Fact and Dimension Tables with TalkingSchema

Ask for grain-first fact table design

Design a fact table for purchase order management.
Grain: one row per purchase order line item.

Business keys from source: purchase_orders.po_id, purchase_order_items.item_id
Available dimensions: dim_date, dim_supplier, dim_product, dim_warehouse

Measures needed:
- Ordered quantity
- Received quantity
- Unit cost
- Total line cost
- Days late or early (received vs expected) — degenerate or derived?

Identify: which measures are additive, semi-additive, or should be derived at query time.
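For comparison with the generated design, here is one plausible shape for the requested table as a `sqlite3` sketch. Table and column names are hypothetical, and the classifications in the comments are one reasonable reading of the prompt: unit cost is non-additive, so the additive line cost is stored alongside it, and days late is derived at query time from the expected and received dates rather than stored. Lines not yet received would point at an unknown-member date row, as described in the FAQ below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT NOT NULL);
    INSERT INTO dim_date VALUES (20251230, '2025-12-30'), (20260102, '2026-01-02');

    -- Grain: one row per purchase order line item.
    CREATE TABLE fact_purchase_order_line (
        po_id             TEXT    NOT NULL,  -- degenerate dimension (no dim table)
        po_item_id        TEXT    NOT NULL,  -- degenerate dimension
        supplier_key      INTEGER NOT NULL,  -- FK to dim_supplier
        product_key       INTEGER NOT NULL,  -- FK to dim_product
        warehouse_key     INTEGER NOT NULL,  -- FK to dim_warehouse
        expected_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
        received_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
        ordered_quantity  INTEGER NOT NULL,  -- additive
        received_quantity INTEGER NOT NULL,  -- additive
        unit_cost_usd     REAL    NOT NULL,  -- non-additive: never SUM
        line_cost_usd     REAL    NOT NULL,  -- additive
        PRIMARY KEY (po_id, po_item_id)
    );
    INSERT INTO fact_purchase_order_line VALUES
        ('PO-1', '1', 7, 42, 3, 20251230, 20260102, 100, 100, 2.5, 250.0);
""")

# Derived at query time through dim_date. Subtracting the YYYYMMDD keys
# directly would give 20260102 - 20251230 = 8872, not 3.
days_late = conn.execute("""
    SELECT f.po_id,
           CAST(julianday(r.full_date) - julianday(e.full_date) AS INTEGER) AS days_late
    FROM fact_purchase_order_line f
    JOIN dim_date e ON f.expected_date_key = e.date_key
    JOIN dim_date r ON f.received_date_key = r.date_key
""").fetchall()
print(days_late)  # [('PO-1', 3)]
```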

Ask for a wide, flat dimension design

Design a dim_customer dimension table.
Source: OLTP customers table.

Requirements:
- SCD Type 2 tracking for tier changes
- Derived columns: credit_band from credit_limit_usd ('Low' < $10k, 'Medium' $10k–$100k, 'High' > $100k)
- Regional grouping (country → region → global_region)
- All attributes flat in one table — no separate dim_region child table

Include: surrogate key, business key, all descriptive columns, SCD columns.
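The SCD Type 2 requirement means a tier change must close the current row and insert a new version with a new surrogate key, leaving history intact. A minimal `sqlite3` sketch of that step plus the credit_band derivation. The function names are hypothetical; the sketch reads the Medium band as inclusive at both ends (10k-100k), and treats `valid_to` as an exclusive end date, which is one convention among several.

```python
import sqlite3

def credit_band(credit_limit_usd: float) -> str:
    """Bands from the prompt: Low < 10k, Medium 10k-100k inclusive, High > 100k."""
    if credit_limit_usd < 10_000:
        return "Low"
    if credit_limit_usd <= 100_000:
        return "Medium"
    return "High"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key: new value per version
        customer_id  TEXT    NOT NULL,     -- business key
        tier         TEXT    NOT NULL,
        credit_band  TEXT    NOT NULL,
        valid_from   TEXT    NOT NULL,
        valid_to     TEXT    NOT NULL DEFAULT '9999-12-31',  -- exclusive end date (assumption)
        is_current   INTEGER NOT NULL DEFAULT 1
    );
""")

def apply_tier_change(customer_id: str, new_tier: str, credit_limit_usd: float, effective: str) -> None:
    """SCD2: expire the current version (if any) and insert a new one."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (effective, customer_id),
        )
        conn.execute(
            "INSERT INTO dim_customer (customer_id, tier, credit_band, valid_from) VALUES (?, ?, ?, ?)",
            (customer_id, new_tier, credit_band(credit_limit_usd), effective),
        )

apply_tier_change("C-100", "Silver", 50_000, "2025-01-01")
apply_tier_change("C-100", "Gold", 250_000, "2025-07-01")

history = conn.execute(
    "SELECT customer_key, tier, credit_band, valid_from, valid_to, is_current "
    "FROM dim_customer WHERE customer_id = 'C-100' ORDER BY valid_from"
).fetchall()
for version in history:
    print(version)
```

Fact rows loaded before 2025-07-01 keep pointing at surrogate key 1 (Silver), so historical reports still show the tier that applied at the time.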

Frequently Asked Questions

How many rows should a dimension table have?

Most dimension tables are small: thousands to tens of thousands of rows. dim_date is pre-populated with ~3,650 rows for 10 years. dim_product might have 50,000 SKUs. dim_customer might have 100,000 enterprise customers. Dimension tables with millions of rows are less common and can indicate a design problem: the table may be recording events (facts) rather than descriptive context. Large consumer customer dimensions are a legitimate exception.
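The ~3,650 figure for dim_date is easy to check by generating the rows. A minimal Python sketch; the ten-year range and the handful of attributes are assumptions, and a production date dimension usually carries many more columns (fiscal periods, holidays, and so on).

```python
from datetime import date, timedelta

def build_dim_date(start: date, end: date) -> list[dict]:
    """One row per calendar day from start to end, inclusive."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            "date_key": d.year * 10_000 + d.month * 100 + d.day,  # YYYYMMDD
            "full_date": d.isoformat(),
            "year": d.year,
            "month": d.month,
            "is_weekend": d.weekday() >= 5,  # Saturday = 5, Sunday = 6
        })
        d += timedelta(days=1)
    return rows

dim_date = build_dim_date(date(2025, 1, 1), date(2034, 12, 31))
print(len(dim_date))  # 3652: ten years, including leap days in 2028 and 2032
```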

Should I store totals in the fact table?

No. Pre-aggregated totals (daily revenue, monthly totals) belong in aggregate tables or materialized views, not in the transactional fact table. The base fact table holds data at the atomic grain; aggregation happens downstream, in those aggregate structures or in the reporting/BI layer.
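A minimal `sqlite3` sketch of this split: the atomic fact keeps one row per order line, and a separate aggregate table (a hypothetical `agg_sales_daily`) is built from it. SQLite has no materialized views, so a plain table rebuilt by the load job stands in for one here. Because revenue is fully additive, the aggregate's grand total should match the fact table's, which gives a cheap reconciliation check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Atomic grain: one row per order line
    CREATE TABLE fact_sales (
        order_date_key   INTEGER NOT NULL,
        product_key      INTEGER NOT NULL,
        line_revenue_usd REAL    NOT NULL
    );
    INSERT INTO fact_sales VALUES
        (20251230, 1, 100.0), (20251230, 2, 50.0), (20251231, 1, 75.0);

    -- Aggregate table, rebuilt from the atomic facts by the load job
    CREATE TABLE agg_sales_daily AS
    SELECT order_date_key, SUM(line_revenue_usd) AS revenue_usd
    FROM fact_sales
    GROUP BY order_date_key;
""")

fact_total = conn.execute("SELECT SUM(line_revenue_usd) FROM fact_sales").fetchone()[0]
agg_total = conn.execute("SELECT SUM(revenue_usd) FROM agg_sales_daily").fetchone()[0]
assert fact_total == agg_total  # additive measure: totals reconcile
daily = conn.execute("SELECT * FROM agg_sales_daily ORDER BY order_date_key").fetchall()
print(daily)
```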

What is the role of `unknown` members in dimension tables?

When a fact row cannot be assigned a valid dimension key (for example, a shipment with no known carrier), it references a special unknown member row in the dimension table (surrogate key = -1 by convention). Never use NULL for a foreign key in a fact table: inner joins to the dimension silently drop rows with NULL keys, so totals no longer reconcile, and any NULL group that does appear in a GROUP BY has no descriptive label.
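A minimal `sqlite3` sketch contrasting the two approaches for a shipment with no known carrier. The tables and rows are invented; the -1 key follows the convention described above. With a NULL foreign key the inner join drops the shipment and the total shrinks; with the unknown member every fact row survives the join and is reported under an explicit label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_carrier (carrier_key INTEGER PRIMARY KEY, carrier_name TEXT NOT NULL);
    INSERT INTO dim_carrier VALUES (-1, 'Unknown'), (1, 'FastFreight');

    CREATE TABLE fact_shipment_null    (carrier_key INTEGER,          weight_kg REAL NOT NULL);
    CREATE TABLE fact_shipment_unknown (carrier_key INTEGER NOT NULL, weight_kg REAL NOT NULL);

    INSERT INTO fact_shipment_null    VALUES (1, 100.0), (NULL, 40.0);  -- carrier missing
    INSERT INTO fact_shipment_unknown VALUES (1, 100.0), (-1,   40.0);  -- unknown member
""")

query = """
    SELECT c.carrier_name, SUM(f.weight_kg)
    FROM {fact} f
    JOIN dim_carrier c ON f.carrier_key = c.carrier_key
    GROUP BY c.carrier_name
    ORDER BY c.carrier_name
"""
with_null = conn.execute(query.format(fact="fact_shipment_null")).fetchall()
with_unknown = conn.execute(query.format(fact="fact_shipment_unknown")).fetchall()
print(with_null)     # [('FastFreight', 100.0)]  (40 kg silently missing)
print(with_unknown)  # [('FastFreight', 100.0), ('Unknown', 40.0)]
```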
