
One Big Table (OBT)

One Big Table (OBT) is a pragmatic alternative to star schemas for teams that prioritize speed and simplicity over normalized design. Rather than maintaining separate fact and dimension tables, OBT flattens everything into a single wide table per analytical use case — one row per event, with all relevant dimension attributes inlined. No joins at query time; columnar warehouses scan only the columns you need.

TalkingSchema's AI copilot generates OBT schemas from your star schema or OLTP source — ideal for dbt-centric workflows, internal dashboards, and small teams that don't need the full Kimball toolkit.


OBT vs Star Schema: The Trade-Off

| Dimension | Star Schema | One Big Table |
|---|---|---|
| Table structure | Fact + dimension tables | Single wide table per use case |
| Query joins | Multiple joins | None (or minimal) |
| Data redundancy | Low (dimensions normalized) | High (attributes repeated per row) |
| Governance | Strong (conformed dimensions) | Weaker (definitions can drift) |
| SCD handling | Explicit (effective_from, is_current) | Awkward (requires full refresh or complex logic) |
| ETL/ELT complexity | Higher | Lower (one table to build) |
| BI tool setup | Semantic layer for joins | Direct table connection |
| Best for | Enterprise, multi-use-case | Small teams, narrow use cases, dbt marts |

When to Use OBT

  • Small teams — Fewer tables to maintain, faster iteration
  • Narrow use cases — One report or domain per OBT (e.g., sales_obt, inventory_obt)
  • dbt-centric workflows — OBT as a serving layer built from staging or dimensional models
  • BI tools that prefer flat tables — Some tools are simpler to configure when there are no join relationships to model
  • Prototyping — Get to insights quickly; refactor to star schema later if needed

Avoid OBT when you need conformed dimensions across multiple fact tables, strict SCD Type 2 history, or strong governance across many analysts.
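To make the trade-off concrete, here is the same monthly-revenue question asked of a star schema and of an OBT. The star-schema surrogate-key names (product_key, order_date_key, date_key) are illustrative, not prescribed by GSSC:

```sql
-- Star schema: two joins before you can aggregate.
-- (Surrogate-key column names are illustrative.)
SELECT
    d.product_category,
    dt.order_month_name,
    SUM(f.line_revenue_usd) AS total_revenue
FROM fact_sales f
JOIN dim_product d  ON f.product_key    = d.product_key
JOIN dim_date    dt ON f.order_date_key = dt.date_key
GROUP BY d.product_category, dt.order_month_name;

-- OBT: same question, no joins.
SELECT
    product_category,
    order_month_name,
    SUM(line_revenue_usd) AS total_revenue
FROM obt_sales
GROUP BY product_category, order_month_name;
```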


Example: GSSC Sales OBT

Flattening the GSSC star schema into a single sales analytics table:

-- ============================================================
-- GSSC Analytics: One Big Table for Sales Reporting
-- Grain: one row per sales order line item
-- All dimension attributes inlined — no joins at query time
-- ============================================================

CREATE TABLE obt_sales (
    -- Degenerate / identifiers
    sales_order_bk       UUID NOT NULL,
    line_item_bk         UUID NOT NULL,
    order_date           DATE NOT NULL,
    -- Date attributes (from dim_date)
    order_year           SMALLINT NOT NULL,
    order_quarter        SMALLINT NOT NULL,
    order_month          SMALLINT NOT NULL,
    order_month_name     VARCHAR(10) NOT NULL,
    -- Customer attributes (from dim_customer)
    customer_id          UUID NOT NULL,
    customer_name        VARCHAR(200) NOT NULL,
    customer_country     VARCHAR(100) NOT NULL,
    customer_tier        VARCHAR(20) NOT NULL,
    -- Product attributes (from dim_product)
    product_id           UUID NOT NULL,
    product_sku          VARCHAR(50) NOT NULL,
    product_name         VARCHAR(200) NOT NULL,
    product_category     VARCHAR(100) NOT NULL,
    carbon_score_band    VARCHAR(20),
    -- Warehouse attributes (from dim_warehouse)
    warehouse_name       VARCHAR(200) NOT NULL,
    warehouse_city       VARCHAR(100) NOT NULL,
    warehouse_country    VARCHAR(100) NOT NULL,
    -- Supplier attributes (from dim_supplier)
    supplier_name        VARCHAR(200) NOT NULL,
    supplier_carbon_tier VARCHAR(1) NOT NULL,
    -- Measures
    quantity             INTEGER NOT NULL,
    unit_price_usd       DECIMAL(12, 4) NOT NULL,
    line_revenue_usd     DECIMAL(14, 4) NOT NULL,
    line_carbon_kg       DECIMAL(10, 3)
);

-- Partition by order_date for efficient date-range queries
-- (BigQuery: PARTITION BY order_date, Snowflake: CLUSTER BY order_date)
CREATE INDEX idx_obt_sales_order_date ON obt_sales(order_date);
CREATE INDEX idx_obt_sales_customer ON obt_sales(customer_id);
CREATE INDEX idx_obt_sales_product ON obt_sales(product_category);

Querying the OBT

No joins — filter and aggregate directly:

-- Revenue by product category and month
SELECT
    product_category,
    order_year,
    order_month,
    order_month_name,
    SUM(line_revenue_usd) AS total_revenue
FROM obt_sales
WHERE order_year = 2025
GROUP BY product_category, order_year, order_month, order_month_name
ORDER BY order_year, order_month, product_category;
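
Cross-dimension questions that would take three joins in a star schema are also single-table scans. The 'Packaging' category value below is illustrative:

```sql
-- Carbon and revenue by supplier tier and customer country,
-- sliced to one product category: still no joins.
SELECT
    supplier_carbon_tier,
    customer_country,
    SUM(line_carbon_kg)   AS total_carbon_kg,
    SUM(line_revenue_usd) AS total_revenue_usd
FROM obt_sales
WHERE product_category = 'Packaging'
GROUP BY supplier_carbon_tier, customer_country
ORDER BY total_carbon_kg DESC;
```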

Prompts for TalkingSchema

Generate OBT from star schema

Using the current GSSC star schema, generate a One Big Table for sales analytics.

Requirements:
- Table name: obt_sales
- Grain: one row per sales order line item (same as fact_sales)
- Inline all attributes from: dim_date, dim_customer, dim_product,
dim_warehouse, dim_supplier
- Include all measures from fact_sales: quantity, unit_price_usd,
line_revenue_usd, line_carbon_kg
- Use descriptive column names (e.g., customer_name, product_category)
- Add indexes on order_date, customer_id, product_category

Generate dbt model to build OBT

Write a dbt model that builds obt_sales from the star schema.
Source tables: fact_sales, dim_date, dim_customer, dim_product,
dim_warehouse, dim_supplier.

Join logic:
- fact_sales to each dimension on the surrogate key
- Select the dimension attributes to inline (no SCD filtering — use current only)
- Include incremental logic using order_date_key as the partition key
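
A model answering that prompt might look like the sketch below. The ref() targets match the source tables listed above, but the surrogate-key columns (order_date_key, customer_key, product_key) and the is_current flag are assumptions about the star schema's naming, so adjust them to your project:

```sql
-- models/marts/obt_sales.sql  (sketch; key names are assumed)
{{ config(
    materialized='incremental',
    unique_key='line_item_bk'
) }}

SELECT
    f.sales_order_bk,
    f.line_item_bk,
    f.order_date_key,              -- kept for incremental filtering
    d.full_date   AS order_date,
    d.year        AS order_year,
    d.month_name  AS order_month_name,
    c.customer_id,
    c.customer_name,
    c.customer_tier,
    p.product_sku,
    p.product_category,
    f.quantity,
    f.unit_price_usd,
    f.line_revenue_usd,
    f.line_carbon_kg
FROM {{ ref('fact_sales') }}   f
JOIN {{ ref('dim_date') }}     d ON f.order_date_key = d.date_key
JOIN {{ ref('dim_customer') }} c ON f.customer_key   = c.customer_key
                                AND c.is_current     -- current rows only
JOIN {{ ref('dim_product') }}  p ON f.product_key    = p.product_key
                                AND p.is_current
-- dim_warehouse and dim_supplier join the same way

{% if is_incremental() %}
  -- only process partitions newer than what is already loaded
  WHERE f.order_date_key > (SELECT MAX(order_date_key) FROM {{ this }})
{% endif %}
```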

Frequently Asked Questions

Should OBT replace my star schema?

Usually not. OBT works best as a serving layer or data mart built on top of a star schema (or raw/staging layer). The star schema provides governance and conformed dimensions; the OBT simplifies consumption for specific use cases. Building OBT directly from raw sources is possible but loses the benefits of dimensional modeling.

How do I handle dimension changes in OBT?

OBT does not handle SCD elegantly. Options: (1) Current state only — rebuild the OBT periodically with the latest dimension attributes; historical reports show current attributes, not point-in-time. (2) Snapshot columns — add effective_from / effective_to to the OBT if you need history; this widens the table further. (3) Keep star schema for historical — use OBT only for current-state dashboards.
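
Option (1) can be as simple as a scheduled full rebuild that always joins to the current dimension rows. The is_current flag and customer_key surrogate key below are assumptions about the star schema:

```sql
-- Periodic full refresh: historical facts pick up today's attributes.
CREATE OR REPLACE TABLE obt_sales AS
SELECT
    f.sales_order_bk,
    f.line_item_bk,
    c.customer_name,
    c.customer_tier,
    f.line_revenue_usd
FROM fact_sales f
JOIN dim_customer c
  ON f.customer_key = c.customer_key
 AND c.is_current = TRUE   -- keep only the latest version of each customer
-- remaining dimensions join with the same is_current filter
;
```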

Does OBT work with BigQuery and Snowflake?

Yes. Columnar storage means scanning a wide table is efficient — only the columns in your SELECT are read. Partition by date (e.g., order_date) and cluster by common filter columns (e.g., customer_id, product_category) for best performance. OBT is well-suited to cloud warehouse economics where storage is cheap and compute is billed per query.
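
In DDL terms that looks roughly like the following; the analytics dataset name and the obt_sales_build source are placeholders:

```sql
-- BigQuery: partition on the date column, cluster on common filter columns.
CREATE TABLE analytics.obt_sales
PARTITION BY order_date
CLUSTER BY customer_id, product_category AS
SELECT * FROM analytics.obt_sales_build;   -- placeholder build query

-- Snowflake: no partitions; a clustering key plays the same pruning role.
ALTER TABLE obt_sales CLUSTER BY (order_date, customer_id);
```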