How to Generate a Snowflake Schema from Natural Language (Step by Step)
A complete walkthrough of generating a Snowflake schema — DDL, clustering, and a full dbt project — from a plain-English description, using OffDataAI.
This post walks through generating a production-ready Snowflake schema from a single paragraph of plain English — and turning it into a complete dbt project ready for dbt build.
We'll model a small B2B SaaS warehouse with multi-tenant customers, plans, usage events, and invoices. By the end you'll have:
- A Snowflake DDL file with proper
CLUSTER BYandTIMESTAMP_NTZtypes. - A complete dbt project: sources, staging, marts, tests, and seeds.
- A Mermaid ERD you can paste into your docs.
Step 1: Describe the domain
In OffDataAI, the only mandatory input is a natural-language description. Here is the one we'll use:
We're a B2B SaaS platform that tracks subscriptions, usage events, and billing across multi-tenant customers. Customers have plans, plans meter on usage, and we invoice monthly. Customer plan changes are SCD2 — name and billing-address updates are SCD1.
Notice we mention grain ("invoice monthly"), SCD types, and multi-tenancy. The more of these you mention up front, the fewer follow-up questions the interview agent will ask.
Step 2: Answer the interview questions
The interview agent (Claude Haiku 4.5) will ask three or four targeted questions. Typical ones for this domain:
- What is the grain of a usage event? → "Per API call."
- Should the customer dimension be SCD2 on plan changes? → "Yes, SCD2 on plan_id. SCD1 on name and billing-address."
- Are invoices derived from usage rollups, or directly from subscriptions? → "From subscription + monthly usage rollup."
That's it. Three answers and the model is unambiguous.
Step 3: Inspect the Intermediate Representation
OffDataAI's synthesis agent (Claude Sonnet 4.5) folds your answers into a typed IR. Here's the relevant slice:
{
"paradigm": "kimball",
"platform": "snowflake",
"entities": [
{
"name": "dim_customer",
"type": "dimension",
"scd": "type_2",
"natural_key": "customer_id",
"attributes": [
{ "name": "customer_sk", "type": "string", "nullable": false },
{ "name": "customer_id", "type": "string", "nullable": false },
{ "name": "name", "type": "string" },
{ "name": "plan_id", "type": "string" },
{ "name": "valid_from", "type": "date", "nullable": false },
{ "name": "valid_to", "type": "date" }
]
},
{
"name": "fct_usage",
"type": "fact",
"grain": "api_call",
"attributes": [
{ "name": "usage_id", "type": "string", "nullable": false },
{ "name": "customer_sk", "type": "string", "nullable": false },
{ "name": "plan_id", "type": "string", "nullable": false },
{ "name": "event_ts", "type": "timestamp", "nullable": false },
{ "name": "api_calls", "type": "integer", "nullable": false }
]
}
]
}
This is the contract. Every downstream artifact is generated from this IR. If you edit the IR (in the UI, or via API), the artifacts regenerate.
Step 4: Generate Snowflake DDL
Here's what OffDataAI emits for the customer dimension:
CREATE OR REPLACE TABLE analytics.dim_customer (
customer_sk STRING NOT NULL,
customer_id STRING NOT NULL,
name STRING,
plan_id STRING,
valid_from DATE NOT NULL,
valid_to DATE,
CONSTRAINT pk_dim_customer PRIMARY KEY (customer_sk)
)
CLUSTER BY (customer_id, valid_from);
CREATE OR REPLACE TABLE analytics.fct_usage (
usage_id STRING NOT NULL,
customer_sk STRING NOT NULL,
plan_id STRING NOT NULL,
event_ts TIMESTAMP_NTZ NOT NULL,
api_calls NUMBER(18,0) NOT NULL,
CONSTRAINT pk_fct_usage PRIMARY KEY (usage_id)
)
CLUSTER BY (event_ts, customer_sk);
Two things to notice:
CLUSTER BYis Snowflake-native. A generic LLM might emitPARTITION BY(a BigQuery concept) orDISTKEY(a Redshift concept). OffDataAI's Snowflake generator only emits Snowflake-correct DDL.TIMESTAMP_NTZis the right type. Most generic tools emitTIMESTAMPorDATETIME, which behaves subtly differently.TIMESTAMP_NTZis what you almost always want for warehouse fact tables.
Step 5: Generate the dbt project
OffDataAI emits a complete dbt tree:
dbt/
├── dbt_project.yml
├── models/
│ ├── staging/
│ │ ├── _sources.yml
│ │ ├── stg_customer.sql
│ │ └── stg_usage_event.sql
│ └── marts/
│ ├── dim_customer.sql
│ └── fct_invoice.sql
├── seeds/
│ ├── dim_customer.csv
│ └── dim_plan.csv
└── tests/
└── schema.yml
The mart for fct_invoice is incremental, with usage rollup and SCD2-aware joining:
{{ config(
materialized='incremental',
unique_key='invoice_id',
cluster_by=['period_start']
) }}
with usage_rollup as (
select
customer_sk,
date_trunc('month', event_ts)::date as period_start,
sum(api_calls) as total_calls
from {{ ref('fct_usage') }}
{% if is_incremental() %}
where event_ts >= (select max(period_start) from {{ this }})
{% endif %}
group by 1, 2
)
select
{{ dbt_utils.generate_surrogate_key(['customer_sk','period_start']) }} as invoice_id,
u.customer_sk,
p.unit_price * u.total_calls as amount,
u.period_start,
last_day(u.period_start) as period_end
from usage_rollup u
left join {{ ref('dim_plan') }} p using (plan_id)
Step 6: Run it
cd dbt
dbt deps
dbt seed
dbt build
That's a working Snowflake warehouse from one paragraph of English.
Going further
- Need the same domain modeled as Data Vault 2.0? Switch the paradigm in the IR. The generator emits hubs, links, and satellites.
- Need it for BigQuery instead? See the BigQuery schema generator.
- Want a guided walkthrough? Book a demo.
Or just try it — it's completely free for our first customers.
Try OffDataAI
Ship a warehouse from a single conversation.
Describe your domain. Get ERDs, DDL, dbt projects, and seed data for every major cloud warehouse.
Start building — free