What is dimensional modeling?

Dimensional modeling is a data warehouse design methodology developed by Ralph Kimball that organizes data into fact tables (measurements of business events) and dimension tables (descriptive context). Unlike normalized OLTP design, dimensional modeling is optimized for query performance, business user comprehension, and BI tool compatibility. The methodology defines precise rules for grain definition, dimension design, slowly changing dimensions, conformed dimensions, and the data warehouse bus architecture.

What is the Kimball Bus Architecture?

The Kimball Bus Architecture is a design strategy for enterprise data warehouses that uses conformed dimensions — dimension tables shared across multiple fact tables — as the integration mechanism. A conformed date dimension is used by the sales fact table, the inventory fact table, and the shipment fact table. Because they all reference the same dim_date with the same keys, cross-process analytics (comparing sales volume to shipment delays by month) are possible without data duplication. The Bus Matrix documents which dimensions are shared across which business processes.

How does TalkingSchema support dimensional modeling?

TalkingSchema's AI copilot understands Kimball dimensional modeling methodology and applies it automatically when you describe analytical requirements. It defines grain before generating fact tables, recommends conformed dimensions across business processes, identifies degenerate dimensions, suggests SCD type assignments, and generates the complete multi-fact-table schema on the ERD canvas for visual review and approval.

¿Qué es el modelado dimensional y la metodología Kimball?

El modelado dimensional es una metodología de diseño de almacenes de datos desarrollada por Ralph Kimball que organiza los datos en tablas de hechos (medidas de eventos de negocio) y tablas de dimensiones (contexto descriptivo). A diferencia del diseño OLTP normalizado, el modelado dimensional está optimizado para el rendimiento de consultas, la comprensión del usuario de negocio y la compatibilidad con herramientas de BI. TalkingSchema implementa automáticamente la metodología Kimball: define el grano antes de generar las tablas de hechos, identifica dimensiones conformadas y genera el esquema completo con bus arquitectónico.

Qu'est-ce que la modélisation dimensionnelle et la méthodologie Kimball ?

La modélisation dimensionnelle est une méthodologie de conception d'entrepôts de données développée par Ralph Kimball qui organise les données en tables de faits (mesures d'événements métier) et tables de dimensions (contexte descriptif). Contrairement à la conception OLTP normalisée, la modélisation dimensionnelle est optimisée pour les performances des requêtes, la compréhension des utilisateurs métier et la compatibilité avec les outils BI. TalkingSchema applique automatiquement la méthodologie Kimball.

Was ist dimensionale Modellierung und die Kimball-Methodik?

Dimensionale Modellierung ist eine Data-Warehouse-Designmethodik, die von Ralph Kimball entwickelt wurde und Daten in Faktentabellen (Messungen von Geschäftsereignissen) und Dimensionstabellen (beschreibender Kontext) organisiert. Im Gegensatz zum normalisierten OLTP-Design ist die dimensionale Modellierung für Abfrageleistung, Verständlichkeit für Business-User und BI-Tool-Kompatibilität optimiert. TalkingSchema wendet die Kimball-Methodik automatisch an.

ディメンショナルモデリングとKimball手法とは何ですか？

ディメンショナルモデリングはRalph Kimballが開発したデータウェアハウス設計手法で、データをファクトテーブル（ビジネスイベントの測定値）とディメンションテーブル（説明的コンテキスト）に整理します。正規化されたOLTP設計とは異なり、ディメンショナルモデリングはクエリパフォーマンス、ビジネスユーザーの理解しやすさ、BIツールとの互換性のために最適化されています。TalkingSchemaはKimball手法を自動的に適用します。

Kimball - Dimensional Modeling

Dimensional modeling is not merely a schema design style — it is a complete methodology for making data warehouse schemas queryable, maintainable, and correct. Ralph Kimball's Data Warehouse Toolkit (first published 1996, revised three times since) remains the single most cited reference in data engineering, and for good reason: its prescriptions about grain, conformed dimensions, and the bus architecture have proven correct across three decades of data warehouse evolution.

TalkingSchema's AI copilot encodes Kimball methodology and applies it automatically. You describe requirements in plain language; the AI generates schemas that would earn the approval of a senior data architect — with grain defined first, dimensions designed for reuse, and the bus matrix implicit in every shared dimension.

The Kimball Lifecycle in TalkingSchema

Phase 1: Define the Bus Matrix

The bus matrix is the backbone of enterprise dimensional modeling — a spreadsheet-like artifact that maps each business process (row) to the dimensions it shares (columns). Before generating any schema, experienced data architects build the bus matrix to identify conformed dimension candidates.

Ask TalkingSchema to draft a bus matrix before schema generation:

Based on the GSSC supply chain OLTP schema, draft a dimensional modeling bus matrix.
Business processes to include:
- Sales order fulfillment
- Purchase order management
- Shipment and logistics
- Inventory management

For each process: identify the grain, candidate fact table name, and which dimensions
it shares with other processes. Identify conformed dimension candidates.

TalkingSchema output:

Business Process	Grain	fact_table	dim_date	dim_supplier	dim_product	dim_customer	dim_warehouse
Sales order fulfillment	Per line item	fact_sales	✅ order_date	✅	✅	✅	✅
Purchase order management	Per PO line	fact_purchases	✅ order_date	✅	✅	—	✅
Shipment & logistics	Per shipment	fact_shipments	✅ shipped_date	✅	—	—	✅
Inventory management	Daily snapshot	fact_inventory	✅ snapshot_date	—	✅	—	✅

Conformed dimensions: dim_date, dim_supplier, dim_product, dim_warehouse — all shared across two or more processes.

Phase 2: Design conformed dimensions

A conformed dimension is one of the most important concepts in dimensional modeling. It means that two fact tables referencing the same dimension are using identical keys, identical granularity, and identical attribute definitions — enabling drill-across analysis without explicit joins between fact tables.

Generate a conformed dim_supplier dimension that works across:
fact_sales (needs company_name, carbon_tier, country, region)
fact_purchases (needs company_name, carbon_tier, payment_terms)
fact_shipments (needs company_name, country, region)

The conformed dimension must contain a superset of all attributes needed by
any fact process. Use SCD Type 2 for carbon_tier changes. Include effective_from,
effective_to, is_current.

Phase 3: Define grain and generate fact tables

Grain definition is the most critical step in dimensional modeling. Ambiguous or incorrect grain produces incorrect aggregations — the most invisible and dangerous category of data quality error.

Generate fact_sales for the GSSC sales process.

Grain definition: one row per sales order line item.
This means: each row represents one product SKU within one sales order.
A single order with 5 line items produces 5 rows in fact_sales.

Confirm: does this grain support SUM(line_revenue_usd) GROUP BY customer?
Answer: yes — each row has one customer key, so summing is correct.

Confirm: can I SUM(quantity) across different orders?
Answer: yes — fully additive by all dimensions.

Generate the fact table with all dimension foreign keys and measures.

GSSC Dimensional Model: Complete Schema

-- ============================================================
-- GSSC Enterprise Data Warehouse — Dimensional Model
-- Methodology: Kimball (The Data Warehouse Toolkit, 4th ed.)
-- ============================================================

-- ── Conformed: Date ──────────────────────────────────────────
CREATE TABLE dim_date (
    date_key          INTEGER PRIMARY KEY,           -- YYYYMMDD
    full_date         DATE NOT NULL UNIQUE,
    year              SMALLINT NOT NULL,
    half_year         SMALLINT NOT NULL,             -- 1 or 2
    quarter           SMALLINT NOT NULL,
    quarter_label     CHAR(6) NOT NULL,              -- 'Q1 2025'
    month             SMALLINT NOT NULL,
    month_name        VARCHAR(10) NOT NULL,
    week_of_year      SMALLINT NOT NULL,
    day_of_week       SMALLINT NOT NULL,
    day_name          VARCHAR(10) NOT NULL,
    is_weekend        BOOLEAN NOT NULL,
    is_month_end      BOOLEAN NOT NULL,
    is_quarter_end    BOOLEAN NOT NULL,
    fiscal_year       SMALLINT NOT NULL,             -- Adjust to company fiscal calendar
    fiscal_quarter    SMALLINT NOT NULL,
    fiscal_period     SMALLINT NOT NULL
);

-- ── Conformed: Supplier (SCD Type 2) ────────────────────────
CREATE TABLE dim_supplier (
    supplier_key        INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    supplier_bk         UUID NOT NULL,               -- Business/natural key
    company_name        VARCHAR(200) NOT NULL,
    country             VARCHAR(100) NOT NULL,
    region              VARCHAR(50) NOT NULL,
    carbon_tier         VARCHAR(1) NOT NULL,
    carbon_tier_label   VARCHAR(50) NOT NULL,
    certification       VARCHAR(100),
    payment_terms       VARCHAR(50),                 -- For purchase process
    risk_rating         VARCHAR(10),                 -- 'Low', 'Medium', 'High'
    is_active           BOOLEAN NOT NULL,
    -- SCD Type 2 tracking
    row_effective_date  DATE NOT NULL,
    row_expiry_date     DATE,
    is_current_row      BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Conformed: Product (SCD Type 2) ─────────────────────────
CREATE TABLE dim_product (
    product_key             INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_bk              UUID NOT NULL,
    sku                     VARCHAR(50) NOT NULL,
    product_name            VARCHAR(200) NOT NULL,
    category                VARCHAR(100) NOT NULL,
    subcategory             VARCHAR(100),
    carbon_intensity_score  DECIMAL(5, 2),
    carbon_band             VARCHAR(10),             -- 'Low', 'Med', 'High', 'Critical'
    cost_band               VARCHAR(10),
    -- SCD Type 1: supplier name (overwrite on change)
    current_supplier_name   VARCHAR(200),
    -- SCD Type 2 tracking
    row_effective_date      DATE NOT NULL,
    row_expiry_date         DATE,
    is_current_row          BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Conformed: Customer (SCD Type 2) ────────────────────────
CREATE TABLE dim_customer (
    customer_key        INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_bk         UUID NOT NULL,
    company_name        VARCHAR(200) NOT NULL,
    country             VARCHAR(100) NOT NULL,
    region              VARCHAR(50) NOT NULL,
    customer_tier       VARCHAR(20) NOT NULL,
    credit_band         VARCHAR(10),
    lifetime_value_band VARCHAR(10),                 -- Updated periodically
    row_effective_date  DATE NOT NULL,
    row_expiry_date     DATE,
    is_current_row      BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Conformed: Warehouse ─────────────────────────────────────
-- Type 1 (overwrite) — warehouses rarely change structurally
CREATE TABLE dim_warehouse (
    warehouse_key   INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    warehouse_bk    UUID NOT NULL,
    warehouse_name  VARCHAR(200) NOT NULL,
    city            VARCHAR(100) NOT NULL,
    country         VARCHAR(100) NOT NULL,
    region          VARCHAR(50) NOT NULL,
    climate_zone    VARCHAR(30),
    capacity_band   VARCHAR(10),                    -- 'Small', 'Medium', 'Large', 'Mega'
    is_active       BOOLEAN NOT NULL
);

-- ── Fact: Sales ──────────────────────────────────────────────
-- Grain: one row per sales order line item
-- Partitioned by order_date_key for query performance
CREATE TABLE fact_sales (
    sales_key           BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_date_key      INTEGER NOT NULL REFERENCES dim_date(date_key),
    ship_date_key       INTEGER REFERENCES dim_date(date_key),
    customer_key        INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    product_key         INTEGER NOT NULL REFERENCES dim_product(product_key),
    warehouse_key       INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
    supplier_key        INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
    -- Degenerate dimensions
    order_number        VARCHAR(50) NOT NULL,
    line_number         INTEGER NOT NULL,
    -- Fully additive measures
    quantity_ordered    INTEGER NOT NULL,
    unit_price_usd      DECIMAL(12, 4) NOT NULL,
    line_revenue_usd    DECIMAL(14, 4) NOT NULL,
    discount_amount_usd DECIMAL(12, 4) NOT NULL DEFAULT 0,
    net_revenue_usd     DECIMAL(14, 4) NOT NULL,
    cost_of_goods_usd   DECIMAL(14, 4),
    gross_margin_usd    DECIMAL(14, 4),
    -- Semi-additive (sum across products, not time periods)
    line_carbon_kg      DECIMAL(10, 3)
) PARTITION BY RANGE (order_date_key);

-- ── Fact: Purchases ──────────────────────────────────────────
CREATE TABLE fact_purchases (
    purchase_key        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_date_key      INTEGER NOT NULL REFERENCES dim_date(date_key),
    expected_date_key   INTEGER REFERENCES dim_date(date_key),
    received_date_key   INTEGER REFERENCES dim_date(date_key),
    supplier_key        INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
    product_key         INTEGER NOT NULL REFERENCES dim_product(product_key),
    warehouse_key       INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
    po_number           VARCHAR(50) NOT NULL,
    quantity_ordered    INTEGER NOT NULL,
    quantity_received   INTEGER,
    unit_cost_usd       DECIMAL(12, 4) NOT NULL,
    line_cost_usd       DECIMAL(14, 4) NOT NULL,
    is_on_time          BOOLEAN,
    days_variance       INTEGER                  -- received_date - expected_date
);

-- ── Fact: Shipments ──────────────────────────────────────────
CREATE TABLE fact_shipments (
    shipment_key        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ship_date_key       INTEGER REFERENCES dim_date(date_key),
    arrive_date_key     INTEGER REFERENCES dim_date(date_key),
    supplier_key        INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
    warehouse_key       INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
    shipment_ref        VARCHAR(50) NOT NULL,
    po_number           VARCHAR(50),
    carrier             VARCHAR(100),
    emissions_kg        DECIMAL(10, 3),
    transit_days        INTEGER,
    is_on_time          BOOLEAN
);

-- ── Fact: Inventory (Periodic Snapshot) ─────────────────────
-- Grain: daily inventory position per product per warehouse
CREATE TABLE fact_inventory_snapshot (
    snapshot_key        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    snapshot_date_key   INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key         INTEGER NOT NULL REFERENCES dim_product(product_key),
    warehouse_key       INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
    -- Semi-additive measures: SUM across warehouses OK; SUM across dates NOT OK
    quantity_on_hand    INTEGER NOT NULL,
    quantity_on_order   INTEGER NOT NULL DEFAULT 0,
    reorder_point       INTEGER NOT NULL,
    -- Derived flags
    is_below_reorder    BOOLEAN NOT NULL,
    days_of_supply      DECIMAL(8, 2)            -- quantity_on_hand / avg_daily_demand
);

Frequently Asked Questions

What is the Kimball Group's definition of grain?

The Kimball Group defines grain as: "The grain establishes exactly what a single fact table row represents." This definition must be documented before any columns are added to the fact table. If you cannot write the grain definition as a single declarative sentence, the model is not ready to be implemented.

What is a periodic snapshot fact table?

A periodic snapshot captures the state of a subject at a regular interval — daily, weekly, or monthly. Unlike a transactional fact table (which records events as they occur), a snapshot fact table always has one row per time period per grain entity, even if nothing changed. The inventory snapshot in the GSSC model is a classic example: every day, every warehouse/product combination gets one row showing current stock levels.

What is a factless fact table?

A factless fact table records the occurrence of an event with no numeric measures — for example, a table tracking which products are on promotion during which periods. It contains only dimension foreign keys. TalkingSchema can generate factless fact tables: ask "Generate a factless fact table recording which products are in the supplier's active catalogue during each date period."

The Kimball Lifecycle in TalkingSchema​

Phase 1: Define the Bus Matrix​

Phase 2: Design conformed dimensions​

Phase 3: Define grain and generate fact tables​

GSSC Dimensional Model: Complete Schema​

Frequently Asked Questions​

What is the Kimball Group's definition of grain?​

What is a periodic snapshot fact table?​

What is a factless fact table?​