offdata ai — agentic AI for data modelers and data engineers
3 min readData VaultData ModelingGuide

Automating Data Vault 2.0 — Hubs, Links, and Satellites from a Description

Data Vault 2.0 is powerful but tedious to model by hand. Here's how to generate a standards-compliant raw vault — hubs, links, satellites, hash keys, and hash diffs — from a plain-English description.

Data Vault 2.0 is the methodology teams reach for when a warehouse has to absorb many source systems, track full history, and stay auditable. It's also the methodology that's most tedious to model by hand: every business key becomes a hub, every relationship a link, every set of descriptive attributes a satellite — and each one needs consistent hash keys and change-detection logic.

This post walks through generating a Data Vault 2.0 model from a description, so you get the methodology's benefits without authoring every structure by hand.

The three core structures

Data Vault separates what from how it relates from what we know about it:

  • Hubs store the immutable business keys (a customer number, an order number).
  • Links store the relationships between business keys (this customer placed this order).
  • Satellites store the descriptive, time-variant context (the customer's email and plan as of a load date).

Keeping these separate is what lets a Data Vault grow without breaking: a new source adds a satellite, a new relationship adds a link, and nothing existing has to change.

Why it's painful by hand

The pattern is mechanical, which is exactly why it's error-prone at scale:

  • Every hub and link needs a hash key (_HK) derived from its business key(s).
  • Every satellite needs a HASH_DIFF so loads can detect changes.
  • Every table needs a load timestamp and a record source.
  • The hashing has to be identical across platforms, or loads stop being deterministic.

Hand-writing this for dozens of tables is where mistakes creep in.

Generating the vault from a description

Pick the Data Vault paradigm and describe your domain in plain English:

We ingest customers and orders from our billing system. Customers have an email and a plan. Orders belong to a customer and have a total and a status.

OffDataAI identifies the business keys (customer, order), the relationship (customer → order), and the descriptive attributes (email, plan, total, status), then generates the structures:

-- Hub: the customer business key
CREATE TABLE raw_vault.hub_customer (
  customer_hk      BINARY(20)    NOT NULL,  -- hash key
  customer_bk      STRING        NOT NULL,  -- business key
  load_dts         TIMESTAMP_NTZ NOT NULL,
  record_source    STRING        NOT NULL,
  CONSTRAINT pk_hub_customer PRIMARY KEY (customer_hk)
);

-- Link: customer placed order
CREATE TABLE raw_vault.link_customer_order (
  customer_order_hk BINARY(20)    NOT NULL,
  customer_hk       BINARY(20)    NOT NULL,
  order_hk          BINARY(20)    NOT NULL,
  load_dts          TIMESTAMP_NTZ NOT NULL,
  record_source     STRING        NOT NULL
);

-- Satellite: descriptive context, time-variant
CREATE TABLE raw_vault.sat_customer_details (
  customer_hk      BINARY(20)    NOT NULL,
  load_dts         TIMESTAMP_NTZ NOT NULL,
  hash_diff        BINARY(20)    NOT NULL,  -- change detection
  email            STRING,
  plan_id          STRING,
  record_source    STRING        NOT NULL,
  CONSTRAINT pk_sat_customer PRIMARY KEY (customer_hk, load_dts)
);

The hash keys and HASH_DIFF columns are emitted by the DDL layer — you never hand-add them — so hashing is consistent whether you target Snowflake, BigQuery, Databricks, Redshift, or Postgres.

Loading it with dbt

A raw vault is only useful once it's loaded. OffDataAI emits a dbt project alongside the DDL: staging models that compute the hash keys and hash diffs, and insert-only loads for hubs, links, and satellites. It runs with dbt build from the first commit.

Evolving without breaking

When the billing system adds a region field, you don't alter sat_customer_details — you add a new satellite (or a new column on a versioned satellite) and OffDataAI generates the migration. That insert-only, additive evolution is the whole point of Data Vault, and it's preserved automatically.

Try it

Describe your sources and get a standards-compliant raw vault in minutes. Start with the Data Vault 2.0 generator, or compare approaches with dimensional modeling and the full AI data modeling tool.


Try OffDataAI

Ship a warehouse from a single conversation.

Describe your domain. Get ERDs, DDL, dbt projects, and seed data for every major cloud warehouse.

Start building — free