Skip to main content

Kimball - Dimensional Modeling

Dimensional modeling is not merely a schema design style — it is a complete methodology for making data warehouse schemas queryable, maintainable, and correct. Ralph Kimball's Data Warehouse Toolkit (first published 1996, revised three times since) remains the single most cited reference in data engineering, and for good reason: its prescriptions about grain, conformed dimensions, and the bus architecture have proven correct across three decades of data warehouse evolution.

TalkingSchema's AI copilot encodes Kimball methodology and applies it automatically. You describe requirements in plain language; the AI generates schemas that would earn the approval of a senior data architect — with grain defined first, dimensions designed for reuse, and the bus matrix implicit in every shared dimension.


The Kimball Lifecycle in TalkingSchema

Phase 1: Define the Bus Matrix

The bus matrix is the backbone of enterprise dimensional modeling — a spreadsheet-like artifact that maps each business process (row) to the dimensions it shares (columns). Before generating any schema, experienced data architects build the bus matrix to identify conformed dimension candidates.

Ask TalkingSchema to draft a bus matrix before schema generation:

Based on the GSSC supply chain OLTP schema, draft a dimensional modeling bus matrix.
Business processes to include:
- Sales order fulfillment
- Purchase order management
- Shipment and logistics
- Inventory management

For each process: identify the grain, candidate fact table name, and which dimensions
it shares with other processes. Identify conformed dimension candidates.

TalkingSchema output:

Business ProcessGrainfact_tabledim_datedim_supplierdim_productdim_customerdim_warehouse
Sales order fulfillmentPer line itemfact_sales✅ order_date
Purchase order managementPer PO linefact_purchases✅ order_date
Shipment & logisticsPer shipmentfact_shipments✅ shipped_date
Inventory managementDaily snapshotfact_inventory✅ snapshot_date

Conformed dimensions: dim_date, dim_supplier, dim_product, dim_warehouse — all shared across two or more processes.

Phase 2: Design conformed dimensions

A conformed dimension is one of the most important concepts in dimensional modeling. It means that two fact tables referencing the same dimension are using identical keys, identical granularity, and identical attribute definitions — enabling drill-across analysis without explicit joins between fact tables.

Generate a conformed dim_supplier dimension that works across:
fact_sales (needs company_name, carbon_tier, country, region)
fact_purchases (needs company_name, carbon_tier, payment_terms)
fact_shipments (needs company_name, country, region)

The conformed dimension must contain a superset of all attributes needed by
any fact process. Use SCD Type 2 for carbon_tier changes. Include effective_from,
effective_to, is_current.

Phase 3: Define grain and generate fact tables

Grain definition is the most critical step in dimensional modeling. Ambiguous or incorrect grain produces incorrect aggregations — the most invisible and dangerous category of data quality error.

Generate fact_sales for the GSSC sales process.

Grain definition: one row per sales order line item.
This means: each row represents one product SKU within one sales order.
A single order with 5 line items produces 5 rows in fact_sales.

Confirm: does this grain support SUM(line_revenue_usd) GROUP BY customer?
Answer: yes — each row has one customer key, so summing is correct.

Confirm: can I SUM(quantity) across different orders?
Answer: yes — fully additive by all dimensions.

Generate the fact table with all dimension foreign keys and measures.

GSSC Dimensional Model: Complete Schema

-- ============================================================
-- GSSC Enterprise Data Warehouse — Dimensional Model
-- Methodology: Kimball (The Data Warehouse Toolkit, 4th ed.)
-- ============================================================

-- ── Conformed: Date ──────────────────────────────────────────
CREATE TABLE dim_date (
date_key INTEGER PRIMARY KEY, -- YYYYMMDD
full_date DATE NOT NULL UNIQUE,
year SMALLINT NOT NULL,
half_year SMALLINT NOT NULL, -- 1 or 2
quarter SMALLINT NOT NULL,
quarter_label CHAR(6) NOT NULL, -- 'Q1 2025'
month SMALLINT NOT NULL,
month_name VARCHAR(10) NOT NULL,
week_of_year SMALLINT NOT NULL,
day_of_week SMALLINT NOT NULL,
day_name VARCHAR(10) NOT NULL,
is_weekend BOOLEAN NOT NULL,
is_month_end BOOLEAN NOT NULL,
is_quarter_end BOOLEAN NOT NULL,
fiscal_year SMALLINT NOT NULL, -- Adjust to company fiscal calendar
fiscal_quarter SMALLINT NOT NULL,
fiscal_period SMALLINT NOT NULL
);

-- ── Conformed: Supplier (SCD Type 2) ────────────────────────
CREATE TABLE dim_supplier (
supplier_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
supplier_bk UUID NOT NULL, -- Business/natural key
company_name VARCHAR(200) NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(50) NOT NULL,
carbon_tier VARCHAR(1) NOT NULL,
carbon_tier_label VARCHAR(50) NOT NULL,
certification VARCHAR(100),
payment_terms VARCHAR(50), -- For purchase process
risk_rating VARCHAR(10), -- 'Low', 'Medium', 'High'
is_active BOOLEAN NOT NULL,
-- SCD Type 2 tracking
row_effective_date DATE NOT NULL,
row_expiry_date DATE,
is_current_row BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Conformed: Product (SCD Type 2) ─────────────────────────
CREATE TABLE dim_product (
product_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
product_bk UUID NOT NULL,
sku VARCHAR(50) NOT NULL,
product_name VARCHAR(200) NOT NULL,
category VARCHAR(100) NOT NULL,
subcategory VARCHAR(100),
carbon_intensity_score DECIMAL(5, 2),
carbon_band VARCHAR(10), -- 'Low', 'Med', 'High', 'Critical'
cost_band VARCHAR(10),
-- SCD Type 1: supplier name (overwrite on change)
current_supplier_name VARCHAR(200),
-- SCD Type 2 tracking
row_effective_date DATE NOT NULL,
row_expiry_date DATE,
is_current_row BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Conformed: Customer (SCD Type 2) ────────────────────────
CREATE TABLE dim_customer (
customer_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_bk UUID NOT NULL,
company_name VARCHAR(200) NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(50) NOT NULL,
customer_tier VARCHAR(20) NOT NULL,
credit_band VARCHAR(10),
lifetime_value_band VARCHAR(10), -- Updated periodically
row_effective_date DATE NOT NULL,
row_expiry_date DATE,
is_current_row BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Conformed: Warehouse ─────────────────────────────────────
-- Type 1 (overwrite) — warehouses rarely change structurally
CREATE TABLE dim_warehouse (
warehouse_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
warehouse_bk UUID NOT NULL,
warehouse_name VARCHAR(200) NOT NULL,
city VARCHAR(100) NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(50) NOT NULL,
climate_zone VARCHAR(30),
capacity_band VARCHAR(10), -- 'Small', 'Medium', 'Large', 'Mega'
is_active BOOLEAN NOT NULL
);

-- ── Fact: Sales ──────────────────────────────────────────────
-- Grain: one row per sales order line item
-- Partitioned by order_date_key for query performance
CREATE TABLE fact_sales (
sales_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
order_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
ship_date_key INTEGER REFERENCES dim_date(date_key),
customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
warehouse_key INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
supplier_key INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
-- Degenerate dimensions
order_number VARCHAR(50) NOT NULL,
line_number INTEGER NOT NULL,
-- Fully additive measures
quantity_ordered INTEGER NOT NULL,
unit_price_usd DECIMAL(12, 4) NOT NULL,
line_revenue_usd DECIMAL(14, 4) NOT NULL,
discount_amount_usd DECIMAL(12, 4) NOT NULL DEFAULT 0,
net_revenue_usd DECIMAL(14, 4) NOT NULL,
cost_of_goods_usd DECIMAL(14, 4),
gross_margin_usd DECIMAL(14, 4),
-- Semi-additive (sum across products, not time periods)
line_carbon_kg DECIMAL(10, 3)
) PARTITION BY RANGE (order_date_key);

-- ── Fact: Purchases ──────────────────────────────────────────
CREATE TABLE fact_purchases (
purchase_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
order_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
expected_date_key INTEGER REFERENCES dim_date(date_key),
received_date_key INTEGER REFERENCES dim_date(date_key),
supplier_key INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
warehouse_key INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
po_number VARCHAR(50) NOT NULL,
quantity_ordered INTEGER NOT NULL,
quantity_received INTEGER,
unit_cost_usd DECIMAL(12, 4) NOT NULL,
line_cost_usd DECIMAL(14, 4) NOT NULL,
is_on_time BOOLEAN,
days_variance INTEGER -- received_date - expected_date
);

-- ── Fact: Shipments ──────────────────────────────────────────
CREATE TABLE fact_shipments (
shipment_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
ship_date_key INTEGER REFERENCES dim_date(date_key),
arrive_date_key INTEGER REFERENCES dim_date(date_key),
supplier_key INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
warehouse_key INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
shipment_ref VARCHAR(50) NOT NULL,
po_number VARCHAR(50),
carrier VARCHAR(100),
emissions_kg DECIMAL(10, 3),
transit_days INTEGER,
is_on_time BOOLEAN
);

-- ── Fact: Inventory (Periodic Snapshot) ─────────────────────
-- Grain: daily inventory position per product per warehouse
CREATE TABLE fact_inventory_snapshot (
snapshot_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
snapshot_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
warehouse_key INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
-- Semi-additive measures: SUM across warehouses OK; SUM across dates NOT OK
quantity_on_hand INTEGER NOT NULL,
quantity_on_order INTEGER NOT NULL DEFAULT 0,
reorder_point INTEGER NOT NULL,
-- Derived flags
is_below_reorder BOOLEAN NOT NULL,
days_of_supply DECIMAL(8, 2) -- quantity_on_hand / avg_daily_demand
);

Frequently Asked Questions

What is the Kimball Group's definition of grain?

The Kimball Group defines grain as: "The grain establishes exactly what a single fact table row represents." This definition must be documented before any columns are added to the fact table. If you cannot write the grain definition as a single declarative sentence, the model is not ready to be implemented.

What is a periodic snapshot fact table?

A periodic snapshot captures the state of a subject at a regular interval — daily, weekly, or monthly. Unlike a transactional fact table (which records events as they occur), a snapshot fact table always has one row per time period per grain entity, even if nothing changed. The inventory snapshot in the GSSC model is a classic example: every day, every warehouse/product combination gets one row showing current stock levels.

What is a factless fact table?

A factless fact table records the occurrence of an event with no numeric measures — for example, a table tracking which products are on promotion during which periods. It contains only dimension foreign keys. TalkingSchema can generate factless fact tables: ask "Generate a factless fact table recording which products are in the supplier's active catalogue during each date period."