Star Schema
The star schema is the foundational design pattern of the modern data warehouse — chosen by Amazon, Netflix, Airbnb, and virtually every organization with a functioning business intelligence practice. Its elegance lies in a deliberate trade-off: accept denormalization in dimension tables to eliminate multi-hop joins in analytical queries. The result is a schema that BI tools, SQL analysts, and data scientists can query intuitively, and that columnar query engines can execute at extraordinary speed.
TalkingSchema's AI copilot generates complete, production-ready star schemas from plain language — applying Kimball dimensional modeling best practices automatically, defining grain precisely, designing clean reusable dimensions, and exporting DDL for any target platform.
Anatomy of a Star Schema
The fact table
The fact table is the quantitative core of the star schema. It records business events at a defined grain and contains:
- Surrogate foreign keys to each dimension table
- Additive measures — numeric values that can be summed, averaged, or counted across any dimension (revenue, quantity, cost, duration)
- Semi-additive measures — numeric values that can be summed across some dimensions but not others (inventory quantity — can sum across warehouses, but not across time periods)
- Degenerate dimensions — dimension-like attributes with no corresponding dimension table (order number, invoice ID — high cardinality, not used for filtering)
- Date key — always a foreign key to the date dimension, never a raw timestamp
What fact tables never contain: descriptive text, customer names, product descriptions — these belong in dimension tables.
The dimension tables
Dimension tables provide the context that makes measures meaningful. A revenue figure is meaningless without knowing the customer, product, time period, and geography it belongs to.
Good dimension tables are:
- Wide and flat — dozens of descriptive columns, no normalization of related attributes into child tables
- Slowly changing — with a defined strategy for handling attribute changes over time (SCD Type 1, 2, or 3)
- Conformed — shared across multiple fact tables when the same entity appears in different business processes (the customer dimension used by both sales and service fact tables)
Example: GSSC Supply Chain Star Schema
Using the Global Sustainable Supply Chain OLTP schema as the source, TalkingSchema generates the following star schema for sales performance reporting:
-- ============================================================
-- GSSC Analytics: Star Schema for Supply Chain Reporting
-- Generated by TalkingSchema AI Copilot
-- Source: GSSC OLTP schema (10 tables, 3NF)
-- Grain (fact_sales): one row per sales order line item
-- Grain (fact_shipments): one row per shipment
-- ============================================================
-- ── Dimension: Date ──────────────────────────────────────────
-- Standard date dimension. Pre-populated for 10 years.
CREATE TABLE dim_date (
date_key INTEGER PRIMARY KEY, -- YYYYMMDD integer key
full_date DATE NOT NULL UNIQUE,
year SMALLINT NOT NULL,
quarter SMALLINT NOT NULL, -- 1–4
quarter_label VARCHAR(6) NOT NULL, -- 'Q1 2025'
month SMALLINT NOT NULL, -- 1–12
month_name VARCHAR(10) NOT NULL, -- 'January'
month_short VARCHAR(3) NOT NULL, -- 'Jan'
week_of_year SMALLINT NOT NULL,
day_of_week SMALLINT NOT NULL, -- 1=Monday, 7=Sunday
day_name VARCHAR(10) NOT NULL, -- 'Monday'
is_weekend BOOLEAN NOT NULL,
is_fiscal_year_end BOOLEAN NOT NULL DEFAULT FALSE
);
-- ── Dimension: Supplier ──────────────────────────────────────
-- Denormalized from OLTP suppliers table.
-- Flat: all supplier attributes in one row.
CREATE TABLE dim_supplier (
supplier_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
supplier_bk UUID NOT NULL, -- Business key from OLTP
company_name VARCHAR(200) NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(50), -- Derived from country
carbon_tier VARCHAR(1) NOT NULL, -- A, B, C, D
carbon_tier_label VARCHAR(30) NOT NULL, -- 'Tier A — Gold Certified'
certification VARCHAR(100),
is_active BOOLEAN NOT NULL,
-- SCD Type 2 columns
effective_from DATE NOT NULL,
effective_to DATE, -- NULL = current record
is_current BOOLEAN NOT NULL DEFAULT TRUE
);
-- ── Dimension: Product ───────────────────────────────────────
-- Denormalized from OLTP products table.
CREATE TABLE dim_product (
product_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
product_bk UUID NOT NULL,
sku VARCHAR(50) NOT NULL,
product_name VARCHAR(200) NOT NULL,
category VARCHAR(100) NOT NULL,
-- Derived carbon scoring bands
carbon_score_band VARCHAR(20), -- 'Low', 'Medium', 'High', 'Critical'
carbon_intensity_score DECIMAL(5, 2),
unit_cost_tier VARCHAR(10), -- 'Budget', 'Standard', 'Premium'
-- Supplier denormalized into product dimension (conformed key)
supplier_key INTEGER REFERENCES dim_supplier(supplier_key),
is_active BOOLEAN NOT NULL,
effective_from DATE NOT NULL,
effective_to DATE,
is_current BOOLEAN NOT NULL DEFAULT TRUE
);
-- ── Dimension: Customer ──────────────────────────────────────
CREATE TABLE dim_customer (
customer_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_bk UUID NOT NULL,
company_name VARCHAR(200) NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(50),
customer_tier VARCHAR(20) NOT NULL, -- 'enterprise', 'mid-market', 'smb'
credit_band VARCHAR(20), -- 'High', 'Medium', 'Low', 'Watch'
is_current BOOLEAN NOT NULL DEFAULT TRUE,
effective_from DATE NOT NULL,
effective_to DATE
);
-- ── Dimension: Warehouse ─────────────────────────────────────
CREATE TABLE dim_warehouse (
warehouse_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
warehouse_bk UUID NOT NULL,
warehouse_name VARCHAR(200) NOT NULL,
city VARCHAR(100) NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(50),
is_active BOOLEAN NOT NULL
);
-- ── Fact: Sales ──────────────────────────────────────────────
-- Grain: one row per sales order line item.
-- All measures are fully additive unless noted.
CREATE TABLE fact_sales (
sales_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
-- Date keys
order_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
-- Dimension keys
customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
warehouse_key INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
supplier_key INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
-- Degenerate dimensions (high-cardinality identifiers, no dimension table)
sales_order_bk UUID NOT NULL, -- Source so_id
line_item_bk UUID NOT NULL, -- Source item_id
-- Additive measures
quantity INTEGER NOT NULL,
unit_price_usd DECIMAL(12, 4) NOT NULL,
line_revenue_usd DECIMAL(14, 4) NOT NULL, -- quantity × unit_price
-- Semi-additive measure (sum across products, not time)
-- Computed at load time from dim_product.carbon_intensity_score
line_carbon_kg DECIMAL(10, 3) -- quantity × product carbon score
);
-- ── Fact: Shipments ──────────────────────────────────────────
-- Grain: one row per shipment record.
CREATE TABLE fact_shipments (
shipment_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
shipped_date_key INTEGER REFERENCES dim_date(date_key),
arrived_date_key INTEGER REFERENCES dim_date(date_key),
supplier_key INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
warehouse_key INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
-- Degenerate dimension
shipment_bk UUID NOT NULL,
purchase_order_bk UUID NOT NULL,
-- Additive measures
emissions_kg DECIMAL(10, 3), -- Actual CO₂ per shipment
-- Derived measures
transit_days INTEGER, -- arrived_date_key − shipped_date_key
is_on_time BOOLEAN -- arrived_at <= expected_delivery
);
-- ── Indexes for query performance ────────────────────────────
CREATE INDEX idx_fact_sales_order_date ON fact_sales(order_date_key);
CREATE INDEX idx_fact_sales_customer ON fact_sales(customer_key);
CREATE INDEX idx_fact_sales_product ON fact_sales(product_key);
CREATE INDEX idx_fact_sales_warehouse ON fact_sales(warehouse_key);
CREATE INDEX idx_fact_shipments_shipped ON fact_shipments(shipped_date_key);
CREATE INDEX idx_fact_shipments_supplier ON fact_shipments(supplier_key);
Prompts for TalkingSchema
Generate a star schema from OLTP source
Using the current GSSC OLTP schema as the source, generate a star schema
for supply chain analytics.
Requirements:
- fact_sales: grain = one row per sales order line item
- fact_shipments: grain = one row per shipment
- Dimensions: dim_date, dim_supplier (SCD2), dim_product (SCD2),
dim_customer (SCD2), dim_warehouse
Rules:
- Use integer surrogate keys for all dimension tables
- Keep a UUID business key column in each dimension for ETL joins
- Do NOT create foreign keys back to the OLTP schema
- Denormalize all supplier attributes into dim_supplier (flat, no joins)
- Carbon scoring bands should be derived columns in dim_product
Generate a date dimension
Generate a dim_date table for a data warehouse. Include: date_key (YYYYMMDD integer),
full_date, year, quarter, quarter_label, month, month_name, week_of_year,
day_of_week, day_name, is_weekend, is_fiscal_year_end.
Also generate a PostgreSQL function to populate it for any date range.
Generate ELT logic for a fact table
Write the SQL to populate fact_sales from the OLTP sales_orders,
sales_order_items, and products tables. Include:
- Join logic to resolve surrogate keys from dimension tables
- Incremental load using order_date as the partition key
- Handling for late-arriving facts (orders with past order dates)
Frequently Asked Questions
Should fact tables have indexes?
Yes — on all foreign keys. Analytical queries almost always filter or join on dimension keys (date_key, customer_key, product_key). Index each dimension key column. Avoid indexing measures — columnar databases handle measure aggregation via full scans, not index seeks.