
5 Schema Design Patterns Every Developer Should Know

TalkingSchema Team · 5 min read
Building the future of database design

Schema design isn't just about getting data to persist. It's about modeling the shape of your domain — the things that exist, the relationships between them, and the constraints that keep it all consistent. A good schema makes queries fast, business logic obvious, and migrations safe. A bad one makes all three miserable.

These are five patterns that appear in almost every non-trivial system, with notes on when to reach for each one.

1. Polymorphic associations (done right)

The naive version of polymorphism — a commentable_type + commentable_id column pair — works until you need a foreign key or an index. Instead, give each parent type its own join table:

-- Each commentable entity gets its own join table
CREATE TABLE post_comments (
  id BIGSERIAL PRIMARY KEY,
  post_id BIGINT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
  comment_id BIGINT NOT NULL REFERENCES comments(id) ON DELETE CASCADE,
  UNIQUE (post_id, comment_id)
);

CREATE TABLE photo_comments (
  id BIGSERIAL PRIMARY KEY,
  photo_id BIGINT NOT NULL REFERENCES photos(id) ON DELETE CASCADE,
  comment_id BIGINT NOT NULL REFERENCES comments(id) ON DELETE CASCADE,
  UNIQUE (photo_id, comment_id)
);

You get real referential integrity, proper indexes, and queries that the query planner can reason about. The cost is more tables — which is almost always worth it.
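Reads go through ordinary joins, one branch per parent type. As a sketch, resolving which parent a given comment belongs to (with $1 standing in for the comment's id) could look like this:

```sql
-- Sketch: resolve a comment's parent across all commentable types,
-- one UNION ALL branch per join table.
SELECT 'post' AS parent_type, post_id AS parent_id
FROM post_comments
WHERE comment_id = $1
UNION ALL
SELECT 'photo' AS parent_type, photo_id AS parent_id
FROM photo_comments
WHERE comment_id = $1;
```

Each branch is an indexed lookup against its own join table, which is exactly what the commentable_type + commentable_id pair could never give the planner.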

2. Hierarchical data with closure tables

Self-referential parent_id columns are easy to write and brutal to query. Getting all descendants of a node requires a recursive CTE or application-level recursion. The closure table pattern pre-computes every ancestor-descendant relationship:

CREATE TABLE categories (
  id BIGSERIAL PRIMARY KEY,
  name TEXT NOT NULL
);

-- Every (ancestor, descendant) pair, including self-references (depth = 0)
CREATE TABLE category_tree (
  ancestor_id BIGINT NOT NULL REFERENCES categories(id),
  descendant_id BIGINT NOT NULL REFERENCES categories(id),
  depth INT NOT NULL DEFAULT 0,
  PRIMARY KEY (ancestor_id, descendant_id)
);

Getting all descendants of category 42 becomes a simple join:

SELECT c.*
FROM categories c
JOIN category_tree t ON c.id = t.descendant_id
WHERE t.ancestor_id = 42 AND t.depth > 0;

The trade-off: writes are more expensive (inserting a node adds one row per ancestor, plus its own self-reference), but reads stay a single indexed join no matter how deep the tree gets.
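That write path can be sketched in a single statement. Here, 99 and 7 are placeholder ids for the new category and its parent:

```sql
-- Sketch: insert new category 99 as a child of existing category 7.
-- Copy every path that ends at the parent, extended by one level,
-- then add the new node's self-reference (depth = 0).
INSERT INTO category_tree (ancestor_id, descendant_id, depth)
SELECT t.ancestor_id, 99, t.depth + 1
FROM category_tree t
WHERE t.descendant_id = 7
UNION ALL
SELECT 99, 99, 0;
```

Moving a subtree is the expensive operation in this pattern: its old paths must be deleted and rebuilt, which is why closure tables suit trees that grow more often than they reorganize.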

3. Audit logs with event sourcing

Instead of updating rows in place, append immutable events and reconstruct state from history. This is particularly useful for financial systems, compliance requirements, or anything where you need to ask "what did this look like on March 3rd?"

CREATE TABLE account_events (
  id BIGSERIAL PRIMARY KEY,
  account_id BIGINT NOT NULL REFERENCES accounts(id),
  event_type TEXT NOT NULL, -- 'created', 'credited', 'debited', 'closed'
  amount_cents BIGINT,      -- NULL for non-monetary events
  occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  metadata JSONB
);

-- Current balance is a derived value, not stored
CREATE VIEW account_balances AS
SELECT
  account_id,
  SUM(CASE event_type
        WHEN 'credited' THEN amount_cents
        WHEN 'debited' THEN -amount_cents
        ELSE 0
      END) AS balance_cents
FROM account_events
GROUP BY account_id;

The event log is the source of truth. The balance is a projection. You can replay history, add new projections later, and audit every state transition for free.
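The "what did this look like on March 3rd?" question becomes the same aggregation as the view, restricted to events before a cutoff. A sketch, with the date literal and $1 as placeholders:

```sql
-- Sketch: balance of account $1 as of the end of March 3rd
SELECT SUM(CASE event_type
             WHEN 'credited' THEN amount_cents
             WHEN 'debited' THEN -amount_cents
             ELSE 0
           END) AS balance_cents
FROM account_events
WHERE account_id = $1
  AND occurred_at < '2024-03-04';
```

The only difference from the live balance is the occurred_at predicate, which is what makes point-in-time queries essentially free in this pattern.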

4. Soft deletes with deleted_at

Rather than DELETEing rows, set a deleted_at timestamp. Queries filter it out; the data stays recoverable.
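In practice both the "delete" and its undo are plain UPDATEs, sketched here against a hypothetical users table with a deleted_at column:

```sql
-- "Delete": mark the row instead of removing it
UPDATE users SET deleted_at = now() WHERE id = $1;

-- Recover: clear the marker
UPDATE users SET deleted_at = NULL WHERE id = $1;
```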

Approach        Recovery                      FK integrity                  Query simplicity
Hard delete     None                          Cascades automatically        Simple
Soft delete     Trivial (clear deleted_at)    Must filter in every query    Requires discipline
Archive table   Restorable (copy back)        Maintained                    Moderate

The catch with soft deletes is that every query must include the filter — easy to forget, especially in ORM-generated queries. A database view or row-level security policy can enforce it:

-- All queries against this view automatically exclude soft-deleted rows
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;
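The row-level security alternative mentioned above might be sketched like this (it assumes the application connects as a role that does not bypass RLS, e.g. not the table owner):

```sql
-- Hide soft-deleted rows from every query, not just view-based ones
ALTER TABLE users ENABLE ROW LEVEL SECURITY;

CREATE POLICY hide_soft_deleted ON users
  USING (deleted_at IS NULL);
```

Unlike the view, a policy also applies to queries that name the table directly, so ORM-generated SQL cannot forget the filter.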

Reserve soft deletes for entities that users expect to be recoverable (posts, documents, records). Don't use them for join tables or events — those should be hard deleted or append-only, respectively.

5. Slowly changing dimensions (SCD Type 2)

When historical accuracy matters — for analytics, audits, or reporting — you need to track how data changed over time, not just what it is now. Type 2 SCDs do this by adding validity windows to every version of a record:

CREATE TABLE product_prices (
  id BIGSERIAL PRIMARY KEY,
  product_id BIGINT NOT NULL REFERENCES products(id),
  price_cents INT NOT NULL,
  currency CHAR(3) NOT NULL DEFAULT 'USD',
  valid_from DATE NOT NULL,
  valid_until DATE, -- NULL means "current"
  is_current BOOLEAN NOT NULL DEFAULT TRUE,
  -- Requires: CREATE EXTENSION IF NOT EXISTS btree_gist;
  -- (GIST has no built-in equality operator class for plain columns)
  CONSTRAINT one_current_price_per_product
    EXCLUDE USING GIST (product_id WITH =) WHERE (is_current = TRUE)
);

To find the price on a specific date:

SELECT price_cents
FROM product_prices
WHERE product_id = $1
  AND valid_from <= $2
  AND (valid_until IS NULL OR valid_until > $2);

The EXCLUDE USING GIST constraint ensures only one price is marked is_current per product, enforced at the database level. (Since this particular constraint only tests equality, a partial unique index on product_id WHERE is_current would work too; the exclusion form is what you would extend, with range types, to enforce that no two validity windows for the same product overlap.)
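A price change then touches two rows: close the current version, open a new one. A sketch, assuming the two statements run inside a single transaction:

```sql
BEGIN;

-- Close the current version
UPDATE product_prices
SET valid_until = CURRENT_DATE, is_current = FALSE
WHERE product_id = $1 AND is_current;

-- Open the new version; is_current defaults to TRUE
INSERT INTO product_prices (product_id, price_cents, valid_from)
VALUES ($1, $2, CURRENT_DATE);

COMMIT;
```

Old versions are never updated after being closed, which is what makes historical queries against this table trustworthy.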


A note on schema evolution

These patterns are a starting point, not a destination. Real schemas evolve — columns get added, tables split, constraints tighten. The patterns above are designed to make evolution safer: append-only events don't need migrations; closure tables can be rebuilt; SCD rows are never updated, only closed.

If you're designing a new schema from scratch, try describing the domain in plain language first. What are the things? What are the relationships? What constraints are always true? The code follows from the answers — not the other way around.