How to Generate a Complete dbt Project from a Domain Description
Generate a complete dbt project — sources, staging, marts, tests, and seeds — from a single paragraph of plain English. A reference workflow for analytics engineers.
If you've ever bootstrapped a dbt project from scratch, you know the pain. You start with dbt init, then spend an afternoon writing source YAML, naming staging models, deciding on mart materializations, and arguing about whether dim_customer should be SCD2.
This post shows how to skip all of that. We'll generate a complete dbt project — sources, staging, marts, tests, and seeds — from a single paragraph of plain English using OffDataAI.
Why this matters for analytics engineers
A "first commit" dbt project usually looks like this:
models/
├── example/
│ ├── my_first_dbt_model.sql
│ └── my_second_dbt_model.sql
That is not useful. The real first commit you want is:
models/
├── staging/
│ ├── _sources.yml
│ ├── stg_<source>__<entity>.sql
│ └── ...
└── marts/
├── core/
│ ├── dim_customer.sql
│ ├── dim_product.sql
│ └── fct_order.sql
└── ...
seeds/
└── dim_*.csv
tests/
└── schema.yml
Going from dbt init to that structure is at least a day of work — more if you don't already know the schema.
OffDataAI compresses that day into about three minutes.
The end-to-end flow
1. Describe your domain (one paragraph).
2. Pick a paradigm: Kimball, Data Vault 2.0, or 3NF.
3. Pick a target platform: Snowflake, BigQuery, Databricks, etc.
4. Answer 4–7 clarifying questions.
5. Generate the dbt project.
6. dbt deps && dbt seed && dbt build
What you get out of the box
For a typical B2B SaaS domain (customers, plans, usage, invoices) with the Kimball paradigm and Snowflake as the target, OffDataAI generates:
dbt_project.yml
Pre-configured with materializations: view for staging, table for dims, incremental for facts.
models/staging/_sources.yml
Source definitions with freshness and loaded_at_field configured.
version: 2
sources:
- name: app_db
schema: raw_app
freshness:
warn_after: { count: 12, period: hour }
error_after: { count: 24, period: hour }
loaded_at_field: _ingested_at
tables:
- name: customer
description: One row per multi-tenant customer.
- name: plan
description: Subscription plan catalog.
- name: usage_event
description: Per-API-call usage event.
models/staging/stg_*.sql
Light, columnar staging models — one per source table, with consistent naming and explicit casts.
with source as (
select * from {{ source('app_db', 'customer') }}
),
renamed as (
select
id::string as customer_id,
name::string as name,
plan_id::string as plan_id,
billing_address::string as billing_address,
created_at::timestamp_ntz as created_at,
_ingested_at::timestamp_ntz as _ingested_at
from source
)
select * from renamed
models/marts/dim_customer.sql (SCD2)
Type-2 slowly-changing-dimension model, with valid_from / valid_to windows and a stable surrogate key:
{{ config(materialized='table') }}
with source as (
select * from {{ ref('stg_app_db__customer') }}
),
scd2 as (
select
{{ dbt_utils.generate_surrogate_key(['customer_id', 'plan_id', 'created_at']) }} as customer_sk,
customer_id,
name,
plan_id,
billing_address,
created_at as valid_from,
lead(created_at) over (partition by customer_id order by created_at) as valid_to
from source
)
select * from scd2
models/marts/fct_usage.sql
Incremental fact at the API-call grain, with cluster_by set to match the Snowflake DDL.
seeds/dim_*.csv
Realistic seed data that respects foreign keys:
customer_id,name,plan_id,valid_from,valid_to
C-1001,Acme Corp,plan_pro,2025-01-15,
C-1002,Globex,plan_team,2025-02-03,2025-08-31
C-1002,Globex,plan_pro,2025-09-01,
C-1003,Initech,plan_starter,2025-03-22,
tests/schema.yml
Schema tests covering uniqueness, non-null, and referential integrity:
version: 2
models:
- name: dim_customer
columns:
- name: customer_sk
tests: [unique, not_null]
- name: customer_id
tests: [not_null]
- name: plan_id
tests:
- relationships:
to: ref('dim_plan')
field: plan_id
Editing the generated project
The whole project is regenerated from a single Intermediate Representation (IR). If you change your mind about a grain, an SCD type, or a column, you patch the IR — not the SQL — and regenerate.
This is the part that surprises most analytics engineers: the SQL is the artifact, not the source of truth. The IR is the source of truth.
What's not included (and why)
OffDataAI deliberately does not generate:
- Snapshots. dbt snapshots require an opinion on the snapshot mechanism (
timestampvs.check) and a real connection to source data. We leave that to you. - Custom macros for your business logic. The mart layer is a starting point — your real business logic goes on top.
- Exposures and metrics. These are about how you use the warehouse, not how it's modeled.
Try it
Sign up and generate your first dbt project — it's completely free for our first customers. Or see the dbt project generator page for a deeper walkthrough.
Try OffDataAI
Ship a warehouse from a single conversation.
Describe your domain. Get ERDs, DDL, dbt projects, and seed data for every major cloud warehouse.
Start building — free