May 8, 20264 min readSnowflakeTutorialdbt

How to Generate a Snowflake Schema from Natural Language (Step by Step)

A complete walkthrough of generating a Snowflake schema — DDL, clustering, and a full dbt project — from a plain-English description, using OffDataAI.

This post walks through generating a production-ready Snowflake schema from a single paragraph of plain English — and turning it into a complete dbt project ready for dbt build.

We'll model a small B2B SaaS warehouse with multi-tenant customers, plans, usage events, and invoices. By the end you'll have:

A Snowflake DDL file with proper CLUSTER BY and TIMESTAMP_NTZ types.
A complete dbt project: sources, staging, marts, tests, and seeds.
A Mermaid ERD you can paste into your docs.

Step 1: Describe the domain

In OffDataAI, the only mandatory input is a natural-language description. Here is the one we'll use:

We're a B2B SaaS platform that tracks subscriptions, usage events, and billing across multi-tenant customers. Customers have plans, plans meter on usage, and we invoice monthly. Customer plan changes are SCD2 — name and billing-address updates are SCD1.

Notice we mention grain ("invoice monthly"), SCD types, and multi-tenancy. The more of these you mention up front, the fewer follow-up questions the interview agent will ask.

Step 2: Answer the interview questions

The interview agent (Claude Haiku 4.5) will ask three or four targeted questions. Typical ones for this domain:

What is the grain of a usage event? → "Per API call."
Should the customer dimension be SCD2 on plan changes? → "Yes, SCD2 on plan_id. SCD1 on name and billing-address."
Are invoices derived from usage rollups, or directly from subscriptions? → "From subscription + monthly usage rollup."

That's it. Three answers and the model is unambiguous.

Step 3: Inspect the Intermediate Representation

OffDataAI's synthesis agent (Claude Sonnet 4.5) folds your answers into a typed IR. Here's the relevant slice:

{
  "paradigm": "kimball",
  "platform": "snowflake",
  "entities": [
    {
      "name": "dim_customer",
      "type": "dimension",
      "scd": "type_2",
      "natural_key": "customer_id",
      "attributes": [
        { "name": "customer_sk", "type": "string", "nullable": false },
        { "name": "customer_id", "type": "string", "nullable": false },
        { "name": "name",        "type": "string" },
        { "name": "plan_id",     "type": "string" },
        { "name": "valid_from",  "type": "date",   "nullable": false },
        { "name": "valid_to",    "type": "date" }
      ]
    },
    {
      "name": "fct_usage",
      "type": "fact",
      "grain": "api_call",
      "attributes": [
        { "name": "usage_id",    "type": "string",    "nullable": false },
        { "name": "customer_sk", "type": "string",    "nullable": false },
        { "name": "plan_id",     "type": "string",    "nullable": false },
        { "name": "event_ts",    "type": "timestamp", "nullable": false },
        { "name": "api_calls",   "type": "integer",   "nullable": false }
      ]
    }
  ]
}

This is the contract. Every downstream artifact is generated from this IR. If you edit the IR (in the UI, or via API), the artifacts regenerate.

Step 4: Generate Snowflake DDL

Here's what OffDataAI emits for the customer dimension:

CREATE OR REPLACE TABLE analytics.dim_customer (
  customer_sk   STRING        NOT NULL,
  customer_id   STRING        NOT NULL,
  name          STRING,
  plan_id       STRING,
  valid_from    DATE          NOT NULL,
  valid_to      DATE,
  CONSTRAINT pk_dim_customer PRIMARY KEY (customer_sk)
)
CLUSTER BY (customer_id, valid_from);

CREATE OR REPLACE TABLE analytics.fct_usage (
  usage_id      STRING        NOT NULL,
  customer_sk   STRING        NOT NULL,
  plan_id       STRING        NOT NULL,
  event_ts      TIMESTAMP_NTZ NOT NULL,
  api_calls     NUMBER(18,0)  NOT NULL,
  CONSTRAINT pk_fct_usage PRIMARY KEY (usage_id)
)
CLUSTER BY (event_ts, customer_sk);

Two things to notice:

CLUSTER BY is Snowflake-native. A generic LLM might emit PARTITION BY (a BigQuery concept) or DISTKEY (a Redshift concept). OffDataAI's Snowflake generator only emits Snowflake-correct DDL.
TIMESTAMP_NTZ is the right type. Most generic tools emit TIMESTAMP or DATETIME, which behaves subtly differently. TIMESTAMP_NTZ is what you almost always want for warehouse fact tables.

Step 5: Generate the dbt project

OffDataAI emits a complete dbt tree:

dbt/
├── dbt_project.yml
├── models/
│   ├── staging/
│   │   ├── _sources.yml
│   │   ├── stg_customer.sql
│   │   └── stg_usage_event.sql
│   └── marts/
│       ├── dim_customer.sql
│       └── fct_invoice.sql
├── seeds/
│   ├── dim_customer.csv
│   └── dim_plan.csv
└── tests/
    └── schema.yml

The mart for fct_invoice is incremental, with usage rollup and SCD2-aware joining:

{{ config(
    materialized='incremental',
    unique_key='invoice_id',
    cluster_by=['period_start']
) }}

with usage_rollup as (
    select
        customer_sk,
        date_trunc('month', event_ts)::date as period_start,
        sum(api_calls)                       as total_calls
    from {{ ref('fct_usage') }}
    {% if is_incremental() %}
      where event_ts >= (select max(period_start) from {{ this }})
    {% endif %}
    group by 1, 2
)

select
    {{ dbt_utils.generate_surrogate_key(['customer_sk','period_start']) }} as invoice_id,
    u.customer_sk,
    p.unit_price * u.total_calls                                          as amount,
    u.period_start,
    last_day(u.period_start)                                              as period_end
from usage_rollup u
left join {{ ref('dim_plan') }} p using (plan_id)

Step 6: Run it

cd dbt
dbt deps
dbt seed
dbt build

That's a working Snowflake warehouse from one paragraph of English.

Going further

Need the same domain modeled as Data Vault 2.0? Switch the paradigm in the IR. The generator emits hubs, links, and satellites.
Need it for BigQuery instead? See the BigQuery schema generator.
Want a guided walkthrough? Book a demo.

Or just try it — it's completely free for our first customers.

Try OffDataAI

Ship a warehouse from a single conversation.

Describe your domain. Get ERDs, DDL, dbt projects, and seed data for every major cloud warehouse.

Start building — free