Data Vault 2.0
Data Vault 2.0 is an enterprise data warehouse modeling methodology that prioritizes auditability, source-system independence, and agile evolution over query simplicity. Where star schemas optimize for analytical reads, Data Vault optimizes for loading raw data from multiple sources without losing lineage or history — then you build star schemas (or OBTs) on top as business vaults or marts.
TalkingSchema's AI copilot generates Data Vault 2.0 schemas from your source models — creating hubs, links, and satellites with proper hash keys, load metadata, and record source tracking.
Anatomy of Data Vault 2.0
Hubs
Hubs represent core business entities. They store only the business key and metadata — no descriptive attributes.
- Business key — The natural key from the source system (e.g.,
customer_id,order_id) - Surrogate hash key — A hash of the business key for consistent joins (e.g.,
hub_customer_hk,hub_order_hk) - Load metadata —
load_dts(load timestamp),record_source(which system provided the row)
Hubs avoid duplication: each business key appears once. New sources that provide the same entity add rows to the hub; satellites hold the varying attributes.
Links
Links model relationships between hubs — many-to-many associations that are themselves first-class entities.
- Foreign hash keys — References to the related hubs (e.g.,
hub_customer_hk,hub_order_hkfor an order-customer link) - Link hash key — A hash of the combined foreign keys for the link's primary key
- Load metadata — Same as hubs
Links can have their own satellites to describe the relationship (e.g., order line quantity, discount).
Satellites
Satellites hold descriptive attributes and their history. Each satellite attaches to one hub or one link.
- Parent hash key — References the hub or link
- Load timestamp — When the row was loaded (enables point-in-time queries)
- Record source — Which system provided the data
- Attributes — The actual descriptive columns (name, status, amount, etc.)
Multiple satellites per hub allow you to segregate by source system (e.g., sat_customer_crm, sat_customer_billing) or by change frequency. Satellites are append-only; updates create new rows.
When to Use Data Vault 2.0
| Scenario | Data Vault fit |
|---|---|
| Regulatory auditability | ✅ Full lineage, load timestamps, record source |
| Multiple source systems | ✅ Add new sources without refactoring hubs |
| Bitemporal / point-in-time queries | ✅ Satellites preserve history with load_dts |
| Agile warehouse evolution | ✅ New attributes = new satellite; no ALTER on existing tables |
| Simple analytics, single source | ⚠️ Star schema is simpler |
| BI tool self-service | ⚠️ Business vault or mart layer needed on top |
Example: GSSC Data Vault Model
Using the Global Sustainable Supply Chain OLTP schema as the source:
-- ============================================================
-- GSSC Data Vault 2.0 Model
-- Hubs: supplier, product, warehouse, customer, purchase_order, sales_order
-- Links: po_line, so_line, shipment
-- Satellites: PII, attributes, status
-- ============================================================
-- ── Hub: Supplier ───────────────────────────────────────────
CREATE TABLE hub_supplier (
hub_supplier_hk VARCHAR(64) PRIMARY KEY, -- Hash of supplier_id
supplier_bk UUID NOT NULL UNIQUE, -- Business key
load_dts TIMESTAMPTZ NOT NULL DEFAULT now(),
record_source VARCHAR(100) NOT NULL
);
-- ── Hub: Product ────────────────────────────────────────────
CREATE TABLE hub_product (
hub_product_hk VARCHAR(64) PRIMARY KEY,
product_bk UUID NOT NULL UNIQUE,
load_dts TIMESTAMPTZ NOT NULL DEFAULT now(),
record_source VARCHAR(100) NOT NULL
);
-- ── Hub: Purchase Order ─────────────────────────────────────
CREATE TABLE hub_purchase_order (
hub_po_hk VARCHAR(64) PRIMARY KEY,
po_bk UUID NOT NULL UNIQUE,
load_dts TIMESTAMPTZ NOT NULL DEFAULT now(),
record_source VARCHAR(100) NOT NULL
);
-- ── Link: PO → Supplier, PO → Warehouse ──────────────────────
CREATE TABLE link_po_supplier_warehouse (
link_po_supplier_wh_hk VARCHAR(64) PRIMARY KEY,
hub_po_hk VARCHAR(64) NOT NULL REFERENCES hub_purchase_order(hub_po_hk),
hub_supplier_hk VARCHAR(64) NOT NULL REFERENCES hub_supplier(hub_supplier_hk),
hub_warehouse_hk VARCHAR(64) NOT NULL,
load_dts TIMESTAMPTZ NOT NULL DEFAULT now(),
record_source VARCHAR(100) NOT NULL
);
-- ── Satellite: Supplier attributes ───────────────────────────
CREATE TABLE sat_supplier_pii (
hub_supplier_hk VARCHAR(64) NOT NULL REFERENCES hub_supplier(hub_supplier_hk),
load_dts TIMESTAMPTZ NOT NULL,
record_source VARCHAR(100) NOT NULL,
company_name VARCHAR(200) NOT NULL,
country VARCHAR(100) NOT NULL,
carbon_tier VARCHAR(1) NOT NULL,
certification VARCHAR(100),
is_active BOOLEAN NOT NULL,
PRIMARY KEY (hub_supplier_hk, load_dts)
);
-- ── Satellite: PO status (tracks changes over time) ───────────
CREATE TABLE sat_po_status (
hub_po_hk VARCHAR(64) NOT NULL REFERENCES hub_purchase_order(hub_po_hk),
load_dts TIMESTAMPTZ NOT NULL,
record_source VARCHAR(100) NOT NULL,
order_date DATE NOT NULL,
expected_delivery DATE,
status VARCHAR(30) NOT NULL,
PRIMARY KEY (hub_po_hk, load_dts)
);
Prompts for TalkingSchema
Generate a Data Vault model from OLTP
Using the current GSSC OLTP schema as the source, generate a Data Vault 2.0 model.
Create:
- Hubs: hub_supplier, hub_product, hub_warehouse, hub_customer,
hub_purchase_order, hub_sales_order
- Links: link_po_supplier_warehouse, link_po_line (po + product),
link_so_line (sales order + product), link_shipment
- Satellites: sat_supplier_attrs, sat_product_attrs, sat_po_status,
sat_so_status
Rules:
- Use MD5 or SHA256 hash for surrogate keys (hub_*_hk, link_*_hk)
- Every table must have load_dts and record_source
- Satellites: PK = (parent_hk, load_dts) for history
- No descriptive attributes in hubs or links
Add a new source system to an existing Data Vault
We have an existing Data Vault with hub_customer and sat_customer_crm.
Add a new satellite sat_customer_billing for attributes from our
billing system: billing_tier, payment_terms, credit_limit.
Include load_dts and record_source = 'billing_system'.
Frequently Asked Questions
What is a business vault?
A business vault is a layer built on top of the raw Data Vault that applies business rules, deduplication, and SCD logic. It produces cleaner structures (e.g., current customer attributes) for consumption. The raw vault preserves everything; the business vault simplifies for analytics. TalkingSchema can generate both.
How do I query a Data Vault for current state?
Join each hub to its satellites and filter for the latest load_dts per hub key. Use a window function: ROW_NUMBER() OVER (PARTITION BY hub_customer_hk ORDER BY load_dts DESC) and keep rn = 1. Or build a view/materialized view that does this — the business vault layer.
Does Data Vault work with dbt?
Yes. dbt is commonly used to build Data Vault models — staging models clean and hash keys, then hub/link/satellite models insert into the raw vault. TalkingSchema can generate dbt model SQL for Data Vault loading patterns.