Skip to main content

Star Schema

The star schema is the foundational design pattern of the modern data warehouse — chosen by Amazon, Netflix, Airbnb, and virtually every organization with a functioning business intelligence practice. Its elegance lies in a deliberate trade-off: accept denormalization in dimension tables to eliminate multi-hop joins in analytical queries. The result is a schema that BI tools, SQL analysts, and data scientists can query intuitively, and that columnar query engines can execute at extraordinary speed.

TalkingSchema's AI copilot generates complete, production-ready star schemas from plain language — applying Kimball dimensional modeling best practices automatically, defining grain precisely, designing clean reusable dimensions, and exporting DDL for any target platform.


Anatomy of a Star Schema

The fact table

The fact table is the quantitative core of the star schema. It records business events at a defined grain and contains:

  • Surrogate foreign keys to each dimension table
  • Additive measures — numeric values that can be summed, averaged, or counted across any dimension (revenue, quantity, cost, duration)
  • Semi-additive measures — numeric values that can be summed across some dimensions but not others (inventory quantity — can sum across warehouses, but not across time periods)
  • Degenerate dimensions — dimension-like attributes with no corresponding dimension table (order number, invoice ID — high cardinality, not used for filtering)
  • Date key — always a foreign key to the date dimension, never a raw timestamp

What fact tables never contain: descriptive text, customer names, product descriptions — these belong in dimension tables.

The dimension tables

Dimension tables provide the context that makes measures meaningful. A revenue figure is meaningless without knowing the customer, product, time period, and geography it belongs to.

Good dimension tables are:

  • Wide and flat — dozens of descriptive columns, no normalization of related attributes into child tables
  • Slowly changing — with a defined strategy for handling attribute changes over time (SCD Type 1, 2, or 3)
  • Conformed — shared across multiple fact tables when the same entity appears in different business processes (the customer dimension used by both sales and service fact tables)

Example: GSSC Supply Chain Star Schema

Using the Global Sustainable Supply Chain OLTP schema as the source, TalkingSchema generates the following star schema for sales performance reporting:

-- ============================================================
-- GSSC Analytics: Star Schema for Supply Chain Reporting
-- Generated by TalkingSchema AI Copilot
-- Source: GSSC OLTP schema (10 tables, 3NF)
-- Grain (fact_sales): one row per sales order line item
-- Grain (fact_shipments): one row per shipment
-- ============================================================

-- ── Dimension: Date ──────────────────────────────────────────
-- Standard date dimension. Pre-populated for 10 years.
CREATE TABLE dim_date (
date_key INTEGER PRIMARY KEY, -- YYYYMMDD integer key
full_date DATE NOT NULL UNIQUE,
year SMALLINT NOT NULL,
quarter SMALLINT NOT NULL, -- 1–4
quarter_label VARCHAR(6) NOT NULL, -- 'Q1 2025'
month SMALLINT NOT NULL, -- 1–12
month_name VARCHAR(10) NOT NULL, -- 'January'
month_short VARCHAR(3) NOT NULL, -- 'Jan'
week_of_year SMALLINT NOT NULL,
day_of_week SMALLINT NOT NULL, -- 1=Monday, 7=Sunday
day_name VARCHAR(10) NOT NULL, -- 'Monday'
is_weekend BOOLEAN NOT NULL,
is_fiscal_year_end BOOLEAN NOT NULL DEFAULT FALSE
);

-- ── Dimension: Supplier ──────────────────────────────────────
-- Denormalized from OLTP suppliers table.
-- Flat: all supplier attributes in one row.
CREATE TABLE dim_supplier (
supplier_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
supplier_bk UUID NOT NULL, -- Business key from OLTP
company_name VARCHAR(200) NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(50), -- Derived from country
carbon_tier VARCHAR(1) NOT NULL, -- A, B, C, D
carbon_tier_label VARCHAR(30) NOT NULL, -- 'Tier A — Gold Certified'
certification VARCHAR(100),
is_active BOOLEAN NOT NULL,
-- SCD Type 2 columns
effective_from DATE NOT NULL,
effective_to DATE, -- NULL = current record
is_current BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Dimension: Product ───────────────────────────────────────
-- Denormalized from OLTP products table.
CREATE TABLE dim_product (
product_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
product_bk UUID NOT NULL,
sku VARCHAR(50) NOT NULL,
product_name VARCHAR(200) NOT NULL,
category VARCHAR(100) NOT NULL,
-- Derived carbon scoring bands
carbon_score_band VARCHAR(20), -- 'Low', 'Medium', 'High', 'Critical'
carbon_intensity_score DECIMAL(5, 2),
unit_cost_tier VARCHAR(10), -- 'Budget', 'Standard', 'Premium'
-- Supplier denormalized into product dimension (conformed key)
supplier_key INTEGER REFERENCES dim_supplier(supplier_key),
is_active BOOLEAN NOT NULL,
effective_from DATE NOT NULL,
effective_to DATE,
is_current BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Dimension: Customer ──────────────────────────────────────
CREATE TABLE dim_customer (
customer_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_bk UUID NOT NULL,
company_name VARCHAR(200) NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(50),
customer_tier VARCHAR(20) NOT NULL, -- 'enterprise', 'mid-market', 'smb'
credit_band VARCHAR(20), -- 'High', 'Medium', 'Low', 'Watch'
is_current BOOLEAN NOT NULL DEFAULT TRUE,
effective_from DATE NOT NULL,
effective_to DATE
);

-- ── Dimension: Warehouse ─────────────────────────────────────
CREATE TABLE dim_warehouse (
warehouse_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
warehouse_bk UUID NOT NULL,
warehouse_name VARCHAR(200) NOT NULL,
city VARCHAR(100) NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(50),
is_active BOOLEAN NOT NULL
);

-- ── Fact: Sales ──────────────────────────────────────────────
-- Grain: one row per sales order line item.
-- All measures are fully additive unless noted.
CREATE TABLE fact_sales (
sales_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
-- Date keys
order_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
-- Dimension keys
customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
warehouse_key INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
supplier_key INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
-- Degenerate dimensions (high-cardinality identifiers, no dimension table)
sales_order_bk UUID NOT NULL, -- Source so_id
line_item_bk UUID NOT NULL, -- Source item_id
-- Additive measures
quantity INTEGER NOT NULL,
unit_price_usd DECIMAL(12, 4) NOT NULL,
line_revenue_usd DECIMAL(14, 4) NOT NULL, -- quantity × unit_price
-- Semi-additive measure (sum across products, not time)
-- Computed at load time from dim_product.carbon_intensity_score
line_carbon_kg DECIMAL(10, 3) -- quantity × product carbon score
);

-- ── Fact: Shipments ──────────────────────────────────────────
-- Grain: one row per shipment record.
CREATE TABLE fact_shipments (
shipment_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
shipped_date_key INTEGER REFERENCES dim_date(date_key),
arrived_date_key INTEGER REFERENCES dim_date(date_key),
supplier_key INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
warehouse_key INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
-- Degenerate dimension
shipment_bk UUID NOT NULL,
purchase_order_bk UUID NOT NULL,
-- Additive measures
emissions_kg DECIMAL(10, 3), -- Actual CO₂ per shipment
-- Derived measures
transit_days INTEGER, -- arrived_date_key − shipped_date_key
is_on_time BOOLEAN -- arrived_at <= expected_delivery
);

-- ── Indexes for query performance ────────────────────────────
CREATE INDEX idx_fact_sales_order_date ON fact_sales(order_date_key);
CREATE INDEX idx_fact_sales_customer ON fact_sales(customer_key);
CREATE INDEX idx_fact_sales_product ON fact_sales(product_key);
CREATE INDEX idx_fact_sales_warehouse ON fact_sales(warehouse_key);
CREATE INDEX idx_fact_shipments_shipped ON fact_shipments(shipped_date_key);
CREATE INDEX idx_fact_shipments_supplier ON fact_shipments(supplier_key);

Prompts for TalkingSchema

Generate a star schema from OLTP source

Using the current GSSC OLTP schema as the source, generate a star schema
for supply chain analytics.

Requirements:
- fact_sales: grain = one row per sales order line item
- fact_shipments: grain = one row per shipment
- Dimensions: dim_date, dim_supplier (SCD2), dim_product (SCD2),
dim_customer (SCD2), dim_warehouse

Rules:
- Use integer surrogate keys for all dimension tables
- Keep a UUID business key column in each dimension for ETL joins
- Do NOT create foreign keys back to the OLTP schema
- Denormalize all supplier attributes into dim_supplier (flat, no joins)
- Carbon scoring bands should be derived columns in dim_product

Generate a date dimension

Generate a dim_date table for a data warehouse. Include: date_key (YYYYMMDD integer),
full_date, year, quarter, quarter_label, month, month_name, week_of_year,
day_of_week, day_name, is_weekend, is_fiscal_year_end.
Also generate a PostgreSQL function to populate it for any date range.

Generate ELT logic for a fact table

Write the SQL to populate fact_sales from the OLTP sales_orders,
sales_order_items, and products tables. Include:
- Join logic to resolve surrogate keys from dimension tables
- Incremental load using order_date as the partition key
- Handling for late-arriving facts (orders with past order dates)

Frequently Asked Questions

Should fact tables have indexes?

Yes — on all foreign keys. Analytical queries almost always filter or join on dimension keys (date_key, customer_key, product_key). Index each dimension key column. Avoid indexing measures — columnar databases handle measure aggregation via full scans, not index seeks.

What is a degenerate dimension?

A degenerate dimension is a dimension attribute stored directly on the fact table — with no corresponding dimension table. Order numbers, invoice IDs, and transaction reference numbers are degenerate dimensions. They are too high-cardinality and too rarely used as filter attributes to justify a dimension table, but they are essential for row-level traceability.

How many fact tables should a star schema have?

Each distinct business process that produces measurable events typically becomes its own fact table — with its own grain. Sales orders, shipments, inventory snapshots, and purchase orders are four distinct business processes with four distinct grains. Combining them into one fact table (a "kitchen sink" fact) breaks the grain contract and produces incorrect aggregations.

How do I handle currency conversion in fact tables?

Store the original transaction currency and amount alongside a converted USD amount. Add a currency_code column and a conversion_rate column. Compute amount_usd = amount_local * conversion_rate at ETL time using the exchange rate on the transaction date. This preserves auditability while enabling global reporting.