What is a star schema?

A star schema is a data warehouse design pattern consisting of one or more central fact tables connected to multiple dimension tables. The fact table records business events or transactions — sales, shipments, page views — with numeric measures and foreign keys to each dimension. Dimension tables hold the descriptive attributes used to filter, group, and label reports: customer, product, date, geography. The resulting diagram resembles a star, with the fact table at the center and dimensions radiating outward. Star schemas are optimized for analytical query performance because they minimize the number of joins required for reporting.

How do I generate a star schema from an existing database?

In TalkingSchema, import your existing OLTP database or describe your system. Then ask the AI copilot: 'Using the current schema as the source, generate a star schema for sales reporting. Fact table: fact_sales. Dimensions: dim_date, dim_customer, dim_product, dim_warehouse, dim_supplier. Grain: one row per sales order line item.' The AI generates the complete star schema as a separate model, creates all dimension and fact tables, and displays the result on the ERD canvas for your review.

What makes a good fact table grain definition?

The grain defines what a single row in the fact table represents. A good grain is atomic (the lowest level of detail available in the source data), unambiguous (every row represents exactly one occurrence of the defined event), and consistent (all measures in the table are true at this grain). Common grains: one row per transaction line item, one row per daily inventory snapshot, one row per shipment leg. Mixing grains in a single fact table — for example, storing both daily totals and individual transactions — is a common modeling error that produces incorrect aggregations.

¿Qué es un esquema estrella y cómo se genera con IA?

Un esquema estrella es el patrón de diseño central de los almacenes de datos modernos. Consiste en una tabla de hechos central (que registra eventos de negocio con medidas numéricas) rodeada de tablas de dimensiones (que almacenan atributos descriptivos como cliente, producto, tiempo, geografía). TalkingSchema genera esquemas estrella completos desde lenguaje natural: describe tus requisitos de analítica, la tabla de hechos y las dimensiones que necesitas, y TalkingSchema crea el modelo completo con claves surrogadas, relaciones de clave foránea y DDL listo para exportar.

O que é um esquema estrela e como gerar um com IA?

Um esquema estrela é o padrão de design central dos data warehouses modernos. Consiste em uma tabela fato central (que registra eventos de negócio com medidas numéricas) rodeada por tabelas dimensão (que armazenam atributos descritivos como cliente, produto, tempo, geografia). TalkingSchema gera esquemas estrela completos a partir de linguagem natural: descreva seus requisitos de análise, a tabela fato e as dimensões que precisa, e TalkingSchema cria o modelo completo com chaves substitutas, relações de chave estrangeira e DDL pronto para exportar.

スタースキーマとは何ですか？AIでスタースキーマを生成するには？

スタースキーマは、現代のデータウェアハウスの中心的な設計パターンです。中央のファクトテーブル（数値メジャーとともにビジネスイベントを記録）を、複数のディメンションテーブル（顧客、製品、時間、地域などの説明的な属性を格納）が囲む構造です。TalkingSchemaは自然言語からスタースキーマを自動生成します。分析要件、ファクトテーブル、必要なディメンションを説明するだけで、サロゲートキー、外部キー関係、エクスポート用DDLを含む完全なモデルが生成されます。

什么是星型模式？如何用AI生成星型模式？

星型模式是现代数据仓库的核心设计模式。它由一个中心事实表（记录带有数字度量的业务事件）和多个维度表（存储客户、产品、时间、地理等描述性属性）组成。TalkingSchema可以从自然语言自动生成完整的星型模式：描述您的分析需求、事实表和所需维度，TalkingSchema将创建包含代理键、外键关系和可导出DDL的完整模型。

Star Schema

note

For a deep comparison of star schema vs. snowflake schema — including grain declaration, BI tool behavior, and when to snowflake specific dimensions — see the Star Schema vs Snowflake Schema blog post.

The star schema is the foundational design pattern of the modern data warehouse — chosen by Amazon, Netflix, Airbnb, and virtually every organization with a functioning business intelligence practice. Its elegance lies in a deliberate trade-off: accept denormalization in dimension tables to eliminate multi-hop joins in analytical queries. The result is a schema that BI tools, SQL analysts, and data scientists can query intuitively, and that columnar query engines can execute at extraordinary speed.

TalkingSchema's AI copilot generates complete, production-ready star schemas from plain language — applying Kimball dimensional modeling best practices automatically, defining grain precisely, designing clean reusable dimensions, and exporting DDL for any target platform.

Anatomy of a Star Schema

The fact table

The fact table is the quantitative core of the star schema. It records business events at a defined grain and contains:

Surrogate foreign keys to each dimension table
Additive measures — numeric values that can be summed, averaged, or counted across any dimension (revenue, quantity, cost, duration)
Semi-additive measures — numeric values that can be summed across some dimensions but not others (inventory quantity — can sum across warehouses, but not across time periods)
Degenerate dimensions — dimension-like attributes with no corresponding dimension table (order number, invoice ID — high cardinality, not used for filtering)
Date key — always a foreign key to the date dimension, never a raw timestamp

What fact tables never contain: descriptive text, customer names, product descriptions — these belong in dimension tables.

The dimension tables

Dimension tables provide the context that makes measures meaningful. A revenue figure is meaningless without knowing the customer, product, time period, and geography it belongs to.

Good dimension tables are:

Wide and flat — dozens of descriptive columns, no normalization of related attributes into child tables
Slowly changing — with a defined strategy for handling attribute changes over time (SCD Type 1, 2, or 3)
Conformed — shared across multiple fact tables when the same entity appears in different business processes (the customer dimension used by both sales and service fact tables)

Example: GSSC Supply Chain Star Schema

Using the Global Sustainable Supply Chain OLTP schema as the source, TalkingSchema generates the following star schema for sales performance reporting:

-- ============================================================
-- GSSC Analytics: Star Schema for Supply Chain Reporting
-- Generated by TalkingSchema AI Copilot
-- Source: GSSC OLTP schema (10 tables, 3NF)
-- Grain (fact_sales): one row per sales order line item
-- Grain (fact_shipments): one row per shipment
-- ============================================================

-- ── Dimension: Date ──────────────────────────────────────────
-- Standard date dimension. Pre-populated for 10 years.
CREATE TABLE dim_date (
    date_key        INTEGER PRIMARY KEY,         -- YYYYMMDD integer key
    full_date       DATE NOT NULL UNIQUE,
    year            SMALLINT NOT NULL,
    quarter         SMALLINT NOT NULL,           -- 1–4
    quarter_label   VARCHAR(6) NOT NULL,         -- 'Q1 2025'
    month           SMALLINT NOT NULL,           -- 1–12
    month_name      VARCHAR(10) NOT NULL,        -- 'January'
    month_short     VARCHAR(3) NOT NULL,         -- 'Jan'
    week_of_year    SMALLINT NOT NULL,
    day_of_week     SMALLINT NOT NULL,           -- 1=Monday, 7=Sunday
    day_name        VARCHAR(10) NOT NULL,        -- 'Monday'
    is_weekend      BOOLEAN NOT NULL,
    is_fiscal_year_end BOOLEAN NOT NULL DEFAULT FALSE
);

-- ── Dimension: Supplier ──────────────────────────────────────
-- Denormalized from OLTP suppliers table.
-- Flat: all supplier attributes in one row.
CREATE TABLE dim_supplier (
    supplier_key    INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    supplier_bk     UUID NOT NULL,               -- Business key from OLTP
    company_name    VARCHAR(200) NOT NULL,
    country         VARCHAR(100) NOT NULL,
    region          VARCHAR(50),                 -- Derived from country
    carbon_tier     VARCHAR(1) NOT NULL,         -- A, B, C, D
    carbon_tier_label VARCHAR(30) NOT NULL,      -- 'Tier A — Gold Certified'
    certification   VARCHAR(100),
    is_active       BOOLEAN NOT NULL,
    -- SCD Type 2 columns
    effective_from  DATE NOT NULL,
    effective_to    DATE,                        -- NULL = current record
    is_current      BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Dimension: Product ───────────────────────────────────────
-- Denormalized from OLTP products table.
CREATE TABLE dim_product (
    product_key             INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_bk              UUID NOT NULL,
    sku                     VARCHAR(50) NOT NULL,
    product_name            VARCHAR(200) NOT NULL,
    category                VARCHAR(100) NOT NULL,
    -- Derived carbon scoring bands
    carbon_score_band       VARCHAR(20),         -- 'Low', 'Medium', 'High', 'Critical'
    carbon_intensity_score  DECIMAL(5, 2),
    unit_cost_tier          VARCHAR(10),         -- 'Budget', 'Standard', 'Premium'
    -- Supplier denormalized into product dimension (conformed key)
    supplier_key            INTEGER REFERENCES dim_supplier(supplier_key),
    is_active               BOOLEAN NOT NULL,
    effective_from          DATE NOT NULL,
    effective_to            DATE,
    is_current              BOOLEAN NOT NULL DEFAULT TRUE
);

-- ── Dimension: Customer ──────────────────────────────────────
CREATE TABLE dim_customer (
    customer_key        INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_bk         UUID NOT NULL,
    company_name        VARCHAR(200) NOT NULL,
    country             VARCHAR(100) NOT NULL,
    region              VARCHAR(50),
    customer_tier       VARCHAR(20) NOT NULL,    -- 'enterprise', 'mid-market', 'smb'
    credit_band         VARCHAR(20),             -- 'High', 'Medium', 'Low', 'Watch'
    is_current          BOOLEAN NOT NULL DEFAULT TRUE,
    effective_from      DATE NOT NULL,
    effective_to        DATE
);

-- ── Dimension: Warehouse ─────────────────────────────────────
CREATE TABLE dim_warehouse (
    warehouse_key       INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    warehouse_bk        UUID NOT NULL,
    warehouse_name      VARCHAR(200) NOT NULL,
    city                VARCHAR(100) NOT NULL,
    country             VARCHAR(100) NOT NULL,
    region              VARCHAR(50),
    is_active           BOOLEAN NOT NULL
);

-- ── Fact: Sales ──────────────────────────────────────────────
-- Grain: one row per sales order line item.
-- All measures are fully additive unless noted.
CREATE TABLE fact_sales (
    sales_key           BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    -- Date keys
    order_date_key      INTEGER NOT NULL REFERENCES dim_date(date_key),
    -- Dimension keys
    customer_key        INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    product_key         INTEGER NOT NULL REFERENCES dim_product(product_key),
    warehouse_key       INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
    supplier_key        INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
    -- Degenerate dimensions (high-cardinality identifiers, no dimension table)
    sales_order_bk      UUID NOT NULL,           -- Source so_id
    line_item_bk        UUID NOT NULL,           -- Source item_id
    -- Additive measures
    quantity            INTEGER NOT NULL,
    unit_price_usd      DECIMAL(12, 4) NOT NULL,
    line_revenue_usd    DECIMAL(14, 4) NOT NULL, -- quantity × unit_price
    -- Semi-additive measure (sum across products, not time)
    -- Computed at load time from dim_product.carbon_intensity_score
    line_carbon_kg      DECIMAL(10, 3)           -- quantity × product carbon score
);

-- ── Fact: Shipments ──────────────────────────────────────────
-- Grain: one row per shipment record.
CREATE TABLE fact_shipments (
    shipment_key        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    shipped_date_key    INTEGER REFERENCES dim_date(date_key),
    arrived_date_key    INTEGER REFERENCES dim_date(date_key),
    supplier_key        INTEGER NOT NULL REFERENCES dim_supplier(supplier_key),
    warehouse_key       INTEGER NOT NULL REFERENCES dim_warehouse(warehouse_key),
    -- Degenerate dimension
    shipment_bk         UUID NOT NULL,
    purchase_order_bk   UUID NOT NULL,
    -- Additive measures
    emissions_kg        DECIMAL(10, 3),          -- Actual CO₂ per shipment
    -- Derived measures
    transit_days        INTEGER,                 -- arrived_date_key − shipped_date_key
    is_on_time          BOOLEAN                  -- arrived_at <= expected_delivery
);

-- ── Indexes for query performance ────────────────────────────
CREATE INDEX idx_fact_sales_order_date    ON fact_sales(order_date_key);
CREATE INDEX idx_fact_sales_customer      ON fact_sales(customer_key);
CREATE INDEX idx_fact_sales_product       ON fact_sales(product_key);
CREATE INDEX idx_fact_sales_warehouse     ON fact_sales(warehouse_key);
CREATE INDEX idx_fact_shipments_shipped   ON fact_shipments(shipped_date_key);
CREATE INDEX idx_fact_shipments_supplier  ON fact_shipments(supplier_key);

Prompts for TalkingSchema

Generate a star schema from OLTP source

Using the current GSSC OLTP schema as the source, generate a star schema
for supply chain analytics.

Requirements:
- fact_sales: grain = one row per sales order line item
- fact_shipments: grain = one row per shipment
- Dimensions: dim_date, dim_supplier (SCD2), dim_product (SCD2),
  dim_customer (SCD2), dim_warehouse

Rules:
- Use integer surrogate keys for all dimension tables
- Keep a UUID business key column in each dimension for ETL joins
- Do NOT create foreign keys back to the OLTP schema
- Denormalize all supplier attributes into dim_supplier (flat, no joins)
- Carbon scoring bands should be derived columns in dim_product

Generate a date dimension

Generate a dim_date table for a data warehouse. Include: date_key (YYYYMMDD integer),
full_date, year, quarter, quarter_label, month, month_name, week_of_year,
day_of_week, day_name, is_weekend, is_fiscal_year_end.
Also generate a PostgreSQL function to populate it for any date range.

Generate ELT logic for a fact table

Write the SQL to populate fact_sales from the OLTP sales_orders,
sales_order_items, and products tables. Include:
- Join logic to resolve surrogate keys from dimension tables
- Incremental load using order_date as the partition key
- Handling for late-arriving facts (orders with past order dates)

Frequently Asked Questions

Should fact tables have indexes?

Yes — on all foreign keys. Analytical queries almost always filter or join on dimension keys (date_key, customer_key, product_key). Index each dimension key column. Avoid indexing measures — columnar databases handle measure aggregation via full scans, not index seeks.

What is a degenerate dimension?

A degenerate dimension is a dimension attribute stored directly on the fact table — with no corresponding dimension table. Order numbers, invoice IDs, and transaction reference numbers are degenerate dimensions. They are too high-cardinality and too rarely used as filter attributes to justify a dimension table, but they are essential for row-level traceability.

How many fact tables should a star schema have?

Each distinct business process that produces measurable events typically becomes its own fact table — with its own grain. Sales orders, shipments, inventory snapshots, and purchase orders are four distinct business processes with four distinct grains. Combining them into one fact table (a "kitchen sink" fact) breaks the grain contract and produces incorrect aggregations.

How do I handle currency conversion in fact tables?

Store the original transaction currency and amount alongside a converted USD amount. Add a currency_code column and a conversion_rate column. Compute amount_usd = amount_local * conversion_rate at ETL time using the exchange rate on the transaction date. This preserves auditability while enabling global reporting.

Anatomy of a Star Schema​

The fact table​

The dimension tables​

Example: GSSC Supply Chain Star Schema​

Prompts for TalkingSchema​

Generate a star schema from OLTP source​

Generate a date dimension​

Generate ELT logic for a fact table​

Frequently Asked Questions​

Should fact tables have indexes?​

What is a degenerate dimension?​

How many fact tables should a star schema have?​

How do I handle currency conversion in fact tables?​