offdata ai — agentic AI for data modelers and data engineers
4 min readSnowflakeTutorialdbt

How to Generate a Snowflake Schema from Natural Language (Step by Step)

A complete walkthrough of generating a Snowflake schema — DDL, clustering, and a full dbt project — from a plain-English description, using OffDataAI.

This post walks through generating a production-ready Snowflake schema from a single paragraph of plain English — and turning it into a complete dbt project ready for dbt build.

We'll model a small B2B SaaS warehouse with multi-tenant customers, plans, usage events, and invoices. By the end you'll have:

  • A Snowflake DDL file with proper CLUSTER BY and TIMESTAMP_NTZ types.
  • A complete dbt project: sources, staging, marts, tests, and seeds.
  • A Mermaid ERD you can paste into your docs.

Step 1: Describe the domain

In OffDataAI, the only mandatory input is a natural-language description. Here is the one we'll use:

We're a B2B SaaS platform that tracks subscriptions, usage events, and billing across multi-tenant customers. Customers have plans, plans meter on usage, and we invoice monthly. Customer plan changes are SCD2 — name and billing-address updates are SCD1.

Notice we mention grain ("invoice monthly"), SCD types, and multi-tenancy. The more of these you mention up front, the fewer follow-up questions the interview agent will ask.

Step 2: Answer the interview questions

The interview agent (Claude Haiku 4.5) will ask three or four targeted questions. Typical ones for this domain:

  • What is the grain of a usage event? → "Per API call."
  • Should the customer dimension be SCD2 on plan changes? → "Yes, SCD2 on plan_id. SCD1 on name and billing-address."
  • Are invoices derived from usage rollups, or directly from subscriptions? → "From subscription + monthly usage rollup."

That's it. Three answers and the model is unambiguous.

Step 3: Inspect the Intermediate Representation

OffDataAI's synthesis agent (Claude Sonnet 4.5) folds your answers into a typed IR. Here's the relevant slice:

{
  "paradigm": "kimball",
  "platform": "snowflake",
  "entities": [
    {
      "name": "dim_customer",
      "type": "dimension",
      "scd": "type_2",
      "natural_key": "customer_id",
      "attributes": [
        { "name": "customer_sk", "type": "string", "nullable": false },
        { "name": "customer_id", "type": "string", "nullable": false },
        { "name": "name",        "type": "string" },
        { "name": "plan_id",     "type": "string" },
        { "name": "valid_from",  "type": "date",   "nullable": false },
        { "name": "valid_to",    "type": "date" }
      ]
    },
    {
      "name": "fct_usage",
      "type": "fact",
      "grain": "api_call",
      "attributes": [
        { "name": "usage_id",    "type": "string",    "nullable": false },
        { "name": "customer_sk", "type": "string",    "nullable": false },
        { "name": "plan_id",     "type": "string",    "nullable": false },
        { "name": "event_ts",    "type": "timestamp", "nullable": false },
        { "name": "api_calls",   "type": "integer",   "nullable": false }
      ]
    }
  ]
}

This is the contract. Every downstream artifact is generated from this IR. If you edit the IR (in the UI, or via API), the artifacts regenerate.

Step 4: Generate Snowflake DDL

Here's what OffDataAI emits for the customer dimension:

CREATE OR REPLACE TABLE analytics.dim_customer (
  customer_sk   STRING        NOT NULL,
  customer_id   STRING        NOT NULL,
  name          STRING,
  plan_id       STRING,
  valid_from    DATE          NOT NULL,
  valid_to      DATE,
  CONSTRAINT pk_dim_customer PRIMARY KEY (customer_sk)
)
CLUSTER BY (customer_id, valid_from);

CREATE OR REPLACE TABLE analytics.fct_usage (
  usage_id      STRING        NOT NULL,
  customer_sk   STRING        NOT NULL,
  plan_id       STRING        NOT NULL,
  event_ts      TIMESTAMP_NTZ NOT NULL,
  api_calls     NUMBER(18,0)  NOT NULL,
  CONSTRAINT pk_fct_usage PRIMARY KEY (usage_id)
)
CLUSTER BY (event_ts, customer_sk);

Two things to notice:

  1. CLUSTER BY is Snowflake-native. A generic LLM might emit PARTITION BY (a BigQuery concept) or DISTKEY (a Redshift concept). OffDataAI's Snowflake generator only emits Snowflake-correct DDL.
  2. TIMESTAMP_NTZ is the right type. Most generic tools emit TIMESTAMP or DATETIME, which behaves subtly differently. TIMESTAMP_NTZ is what you almost always want for warehouse fact tables.

Step 5: Generate the dbt project

OffDataAI emits a complete dbt tree:

dbt/
├── dbt_project.yml
├── models/
│   ├── staging/
│   │   ├── _sources.yml
│   │   ├── stg_customer.sql
│   │   └── stg_usage_event.sql
│   └── marts/
│       ├── dim_customer.sql
│       └── fct_invoice.sql
├── seeds/
│   ├── dim_customer.csv
│   └── dim_plan.csv
└── tests/
    └── schema.yml

The mart for fct_invoice is incremental, with usage rollup and SCD2-aware joining:

{{ config(
    materialized='incremental',
    unique_key='invoice_id',
    cluster_by=['period_start']
) }}

with usage_rollup as (
    select
        customer_sk,
        date_trunc('month', event_ts)::date as period_start,
        sum(api_calls)                       as total_calls
    from {{ ref('fct_usage') }}
    {% if is_incremental() %}
      where event_ts >= (select max(period_start) from {{ this }})
    {% endif %}
    group by 1, 2
)

select
    {{ dbt_utils.generate_surrogate_key(['customer_sk','period_start']) }} as invoice_id,
    u.customer_sk,
    p.unit_price * u.total_calls                                          as amount,
    u.period_start,
    last_day(u.period_start)                                              as period_end
from usage_rollup u
left join {{ ref('dim_plan') }} p using (plan_id)

Step 6: Run it

cd dbt
dbt deps
dbt seed
dbt build

That's a working Snowflake warehouse from one paragraph of English.

Going further

  • Need the same domain modeled as Data Vault 2.0? Switch the paradigm in the IR. The generator emits hubs, links, and satellites.
  • Need it for BigQuery instead? See the BigQuery schema generator.
  • Want a guided walkthrough? Book a demo.

Or just try it — it's completely free for our first customers.


Try OffDataAI

Ship a warehouse from a single conversation.

Describe your domain. Get ERDs, DDL, dbt projects, and seed data for every major cloud warehouse.

Start building — free