Data Warehouse Modeling Overview

Data warehouse schema design has historically required weeks of collaboration between data engineers, analysts, and architects — whiteboarding dimensional models, arguing over grain definitions, hand-writing DDL. TalkingSchema compresses that timeline by an order of magnitude. Describe your analytical requirements in plain language, connect your OLTP source schema, and the AI copilot generates a production-ready dimensional model: fact tables with the correct grain, dimension tables properly denormalized, conformed dimensions for cross-functional reporting, and export-ready SQL.

No manual diagramming. No separate documentation step. The ERD canvas is the model.


Why Data Warehouse Schema Design Is Hard

Well-intentioned teams make the same mistakes repeatedly in dimensional modeling:

The grain ambiguity problem. A fact table without a precisely defined grain produces incorrect aggregations. SUM(revenue) on a fact table with multiple rows per order produces double-counted results. Getting the grain right requires careful analysis of source data and intended query patterns — not just copying the OLTP schema structure.
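A minimal sketch of the double-counting trap, using hypothetical column names (`order_total`, `line_revenue`) on a fact table whose grain is one row per order line:

```sql
-- fact_sales grain: one row per sales order LINE item.
-- order_total is an order-level amount repeated on every line of that order.

SELECT SUM(order_total) AS revenue   -- WRONG: counted once per line,
FROM fact_sales;                     -- so multi-line orders are double-counted

SELECT SUM(line_revenue) AS revenue  -- CORRECT: the measure matches the grain
FROM fact_sales;
```

The fix is structural, not query-level: every measure stored in a fact table must be additive at that table's declared grain.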

The premature normalization problem. Data engineers familiar with 3NF OLTP design instinctively normalize dimension tables into snowflake hierarchies. This is almost always the wrong trade-off in modern cloud data warehouses: the storage savings are negligible; the query complexity is not.
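To make the trade-off concrete, here is a hedged sketch (table and column names are illustrative, not from the GSSC schema) of the denormalized alternative to a snowflaked product hierarchy:

```sql
-- Snowflaked: dim_product -> dim_subcategory -> dim_category
-- Every product-level query pays two extra joins for a few kilobytes saved.

-- Denormalized (usually the right call): flatten the hierarchy onto the row.
CREATE TABLE dim_product (
    product_key      INTEGER PRIMARY KEY,  -- surrogate key
    product_name     TEXT NOT NULL,
    subcategory_name TEXT,                 -- flattened from dim_subcategory
    category_name    TEXT                  -- flattened from dim_category
);
```

Columnar warehouses compress the repeated category values away, so the flattened design costs little storage while keeping queries single-join.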

The conformed dimension problem. Each department builds its own customer or product dimension. By the time the organization needs a cross-functional report, the dimensions are incompatible. Designing conformed dimensions up front requires authority and consensus most teams never have time to establish.

The pipeline design blindspot. The dimensional model and the ETL/ELT pipeline that loads it are co-designed artifacts. A dimensional model designed without considering the transformation logic will produce unmaintainable pipelines.

TalkingSchema's AI copilot addresses all four problems — asking for grain definition before generating fact tables, recommending denormalized dimensions by default, identifying shared entities that should become conformed dimensions, and generating sample ELT transformation logic alongside the schema.


The OLTP → OLAP Workflow

Step 1 — Connect your source OLTP schema

Import your transactional database to give the AI copilot full source schema context.

In the GSSC example — a 10-table 3NF supply chain schema including suppliers, products, purchase_orders, sales_orders, and shipments — the imported OLTP schema becomes the source of truth for the analytical model.

Step 2 — Define your analytical requirements

Tell the AI what business questions this model must answer:

Using the current GSSC OLTP schema as the source, design an analytical
star schema for supply chain performance reporting.

Business requirements:
- Monthly revenue by product, supplier, customer, and warehouse
- Carbon emissions tracking by supplier tier and shipment route
- Purchase order fulfillment rate (on-time delivery %) by supplier
- Inventory turnover by warehouse and product category

Rules:
- Keep the analytical model completely separate from the OLTP schema
- No foreign keys back to OLTP tables
- Surrogate integer keys for all dimension tables
- Grain: one row per sales order line item in fact_sales
- Grain: one row per shipment in fact_shipments

Step 3 — Review the AI-generated model

The AI generates the full star schema as a proposed ERD change:

  • dim_date — standard date dimension with year, quarter, month, week, day-of-week attributes
  • dim_supplier — flattened supplier attributes including carbon tier and certification status
  • dim_product — flattened product attributes including category hierarchy and cost bands
  • dim_customer — customer attributes including tier, country, and credit classification
  • dim_warehouse — warehouse location attributes
  • fact_sales — one row per sales order line item with revenue, quantity, and discount measures
  • fact_shipments — one row per shipment with emissions, transit time, and on-time delivery flag
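The generated DDL for fact_sales might look like the following sketch (column names are illustrative; the actual output depends on your source schema):

```sql
CREATE TABLE fact_sales (
    date_key            INTEGER NOT NULL REFERENCES dim_date (date_key),
    product_key         INTEGER NOT NULL REFERENCES dim_product (product_key),
    customer_key        INTEGER NOT NULL REFERENCES dim_customer (customer_key),
    supplier_key        INTEGER NOT NULL REFERENCES dim_supplier (supplier_key),
    warehouse_key       INTEGER NOT NULL REFERENCES dim_warehouse (warehouse_key),
    -- degenerate dimension: preserves the OLTP identifier
    -- without a foreign key back to the OLTP schema
    sales_order_line_id BIGINT NOT NULL,
    -- measures, additive at the declared grain (one row per order line)
    quantity            INTEGER NOT NULL,
    revenue             NUMERIC(12, 2) NOT NULL,
    discount            NUMERIC(12, 2) NOT NULL DEFAULT 0
);
```

Note the surrogate integer keys into every dimension and the absence of any reference back to the transactional tables, matching the rules from Step 2.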

Review the Plan Mode checklist — each table creation is a separate line item you can accept, modify, or exclude.

Step 4 — Export

Generate DDL for your cloud data warehouse:

Export this star schema as:
1. PostgreSQL DDL (for Supabase, Neon, or self-hosted)
2. BigQuery DDL (FLOAT64, DATE, TIMESTAMP types)
3. Snowflake DDL (VARIANT for semi-structured columns)
4. Databricks SQL DDL (DELTA format with CLUSTER BY)
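The same logical table translates into different physical DDL per platform. An abbreviated, hedged comparison (schemas and partitioning choices are illustrative):

```sql
-- PostgreSQL: enforced constraints, NUMERIC precision
CREATE TABLE fact_sales (
    date_key   INTEGER NOT NULL,
    order_date DATE NOT NULL,
    revenue    NUMERIC(12, 2) NOT NULL
);

-- BigQuery: INT64/NUMERIC types, no enforced foreign keys,
-- date partitioning for scan pruning
CREATE TABLE analytics.fact_sales (
    date_key   INT64 NOT NULL,
    order_date DATE NOT NULL,
    revenue    NUMERIC NOT NULL
)
PARTITION BY order_date;
```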

Supported Modeling Patterns

| Pattern | When to use | TalkingSchema support |
|---|---|---|
| Star schema | Most analytics use cases — BI tools, dashboards | Full |
| Snowflake schema | Large dimension tables with deep hierarchies | Full |
| Dimensional modeling (Kimball) | Enterprise data warehouses, Kimball bus matrix | Full |
| Data Vault 2.0 | Regulatory auditability, bitemporal tracking | Full (via AI copilot) |
| One Big Table (OBT) | Small teams, dbt-centric analytical workflows | Full |

Frequently Asked Questions

Does TalkingSchema support BigQuery, Snowflake, and Databricks DDL?

Yes. Ask the AI copilot to export the schema targeting your specific platform. Specify: "Export this star schema for BigQuery. Use ARRAY of STRUCT for nested attributes in the date dimension." The AI generates platform-specific DDL with correct syntax.

Can TalkingSchema design ELT transformation logic too?

Yes. After designing the dimensional model, ask: "Write the dbt model SQL to populate fact_sales from the sales_orders and sales_order_items OLTP tables. Include incremental logic using order_date as the partition key." The AI generates source-to-target transformation SQL following dbt conventions.
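A sketch of what such a dbt model can look like — source names, join keys, and the `gssc` source definition are assumptions for illustration, not the exact output:

```sql
-- models/marts/fact_sales.sql
{{ config(materialized='incremental', unique_key='sales_order_line_id') }}

select
    li.sales_order_line_id,
    o.order_date,
    d.date_key,
    p.product_key,
    li.quantity,
    li.quantity * li.unit_price - li.discount as revenue
from {{ source('gssc', 'sales_order_items') }} li
join {{ source('gssc', 'sales_orders') }} o
    on o.sales_order_id = li.sales_order_id
join {{ ref('dim_date') }} d
    on d.date_value = o.order_date
join {{ ref('dim_product') }} p
    on p.product_id = li.product_id

{% if is_incremental() %}
  -- on incremental runs, only process orders newer than what is loaded
  where o.order_date > (select max(order_date) from {{ this }})
{% endif %}
```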

What is a conformed dimension?

A conformed dimension is a dimension table shared across multiple fact tables — enabling cross-functional analysis. For example, dim_date applies to both fact_sales and fact_shipments, so you can compare sales revenue and shipment volume in the same report. TalkingSchema identifies shared entities in your source schema and recommends conformed dimensions automatically.
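The standard "drill-across" pattern for querying two facts through a conformed dimension — aggregate each fact at its own grain first, then join through dim_date (a sketch, assuming the table names above):

```sql
WITH sales AS (
    SELECT date_key, SUM(revenue) AS revenue
    FROM fact_sales GROUP BY date_key
),
shipments AS (
    SELECT date_key, COUNT(*) AS shipment_count
    FROM fact_shipments GROUP BY date_key
)
SELECT d.year, d.month,
       SUM(s.revenue)         AS monthly_revenue,
       SUM(sh.shipment_count) AS monthly_shipments
FROM dim_date d
LEFT JOIN sales     s  ON s.date_key  = d.date_key
LEFT JOIN shipments sh ON sh.date_key = d.date_key
GROUP BY d.year, d.month
ORDER BY d.year, d.month;
```

Pre-aggregating in CTEs avoids the fan-out (and double-counting) that a direct join between two fact tables would cause.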

How does TalkingSchema handle slowly changing dimensions?

Ask explicitly: "The dim_supplier table needs to track carbon tier changes over time. Implement SCD Type 2 with effective_from, effective_to, and is_current columns." See the SCD Types guide →
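The resulting Type 2 structure looks roughly like this sketch (column names follow the prompt above; the lookup query is an illustrative usage example):

```sql
-- SCD Type 2: a new row is inserted whenever carbon_tier changes,
-- so history is preserved per supplier version.
CREATE TABLE dim_supplier (
    supplier_key   INTEGER PRIMARY KEY,  -- surrogate key, one per version
    supplier_id    BIGINT NOT NULL,      -- natural/business key from OLTP
    supplier_name  TEXT NOT NULL,
    carbon_tier    TEXT NOT NULL,
    effective_from DATE NOT NULL,
    effective_to   DATE,                 -- NULL while the row is current
    is_current     BOOLEAN NOT NULL DEFAULT TRUE
);

-- Point-in-time lookup: which tier did supplier 42 have on a given date?
SELECT carbon_tier
FROM dim_supplier
WHERE supplier_id = 42
  AND effective_from <= DATE '2024-06-01'
  AND (effective_to IS NULL OR effective_to > DATE '2024-06-01');
```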