
2 posts tagged with "dimensional modeling"


Kimball vs Inmon vs Data Vault 2.0: Data Warehouse Architecture Guide

Every data team eventually walks into the same room.

Someone is convinced Kimball is dead and cloud storage fixes everything. Someone else is just as convinced that Inmon is the only serious enterprise architecture. A third voice slides in with "Data Vault 2.0 — hubs, links, satellites, audit-ready, future-proof." Forty minutes later nothing has been decided, six weeks later nothing has been built, and the dashboards the business asked for in Q1 are now a Q3 problem.

Here's my promise: this post will help you pick an approach the way an architect picks a structure, not the way a sports fan picks a jersey.

Slowly Changing Dimensions (SCD) Type 1, 2, 6: Make the Decision in Your Schema, Not Your ETL

ETL is the bullied kid of the data stack. Dashboard wrong? Blame the ETL engineer. Patient count drops 8% on a Monday? ETL. Cost-sharing numbers shift overnight and the CFO wants answers by 9 AM? ETL again. The Slack thread always lands on the same person — who almost certainly isn't the one who introduced the lie.

Let's look at one of those lies. A patient was reclassified from a Bronze insurance tier to a Gold tier in March, but the claims report has been grouping her Bronze claims under the Gold bucket ever since. The ETL job that loaded the new tier overwrote the old value. Every historical encounter that happened under Bronze coverage now looks like Gold.

For context: the healthcare warehouse canvas below shows the source systems and warehouse schema we're working with.

This is what a missing SCD Type 2 decision looks like in production. It's almost never caught by data quality tests, because every row is individually valid. The lie is in the history, not the data. And it didn't originate in the ETL code where it eventually surfaces — it was introduced six months earlier, at the schema-design layer, when someone made an SCD typing decision (or silently failed to) and never wrote it down anywhere outside of a MERGE statement.
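To make the difference concrete, here is a minimal Python sketch of the same tier change handled two ways: as a Type 1 overwrite and as a Type 2 versioned row. Everything in it is hypothetical and for illustration only. The patient ID `P001`, the dates, the `PatientVersion` class, and the helpers `apply_type1`, `apply_type2`, and `tier_as_of` are assumptions, not the schema of the warehouse discussed here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PatientVersion:
    """One Type 2 version of a patient's insurance tier (hypothetical shape)."""
    patient_id: str
    tier: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_type1(dim, patient_id, new_tier):
    """Type 1: overwrite in place. The previous tier is lost."""
    dim[patient_id] = new_tier


def apply_type2(history, patient_id, new_tier, change_date):
    """Type 2: close the current version, then append a new one."""
    for row in history:
        if row.patient_id == patient_id and row.valid_to is None:
            row.valid_to = change_date
    history.append(PatientVersion(patient_id, new_tier, change_date))


def tier_as_of(history, patient_id, on):
    """Return the tier that was in effect on the given date."""
    for row in history:
        if (row.patient_id == patient_id
                and row.valid_from <= on
                and (row.valid_to is None or on < row.valid_to)):
            return row.tier
    raise LookupError(f"no version of {patient_id} covers {on}")


# Two claims for the same patient: one before the March change, one after.
# The year and days are made up for this example.
claim_dates = [date(2024, 2, 10), date(2024, 4, 5)]

# Type 1: the report can only see whatever tier is stored right now.
type1_dim = {"P001": "Bronze"}
apply_type1(type1_dim, "P001", "Gold")
type1_report = [type1_dim["P001"] for _ in claim_dates]

# Type 2: the report looks up the tier in effect on each claim date.
type2_history = [PatientVersion("P001", "Bronze", date(2023, 1, 1))]
apply_type2(type2_history, "P001", "Gold", date(2024, 3, 1))
type2_report = [tier_as_of(type2_history, "P001", d) for d in claim_dates]

print(type1_report)  # ['Gold', 'Gold']
print(type2_report)  # ['Bronze', 'Gold']
```

Both results would pass a row-level check, since "Gold" and "Bronze" are each valid tier values. Only the versioned history keeps the February claim under Bronze. In a real warehouse the fact row would more likely store the surrogate key of the dimension version at load time rather than doing a date lookup at query time, but the decision being made is the same.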

I've been designing data warehouse dimension layers long enough to have strong opinions here. The SCD taxonomy is one of the most decision-dense areas of dimensional modeling: eight types (Type 0 through Type 7) with overlapping tradeoffs, no universal agreement on when to use what, and performance consequences that bite teams six to twelve months after they think they've done it right. This post is my attempt to cut through all of it.