offdata ai — agentic AI for data modelers and data engineers
3 min readdbtTutorialAnalytics Engineering

How to Generate a Complete dbt Project from a Domain Description

Generate a complete dbt project — sources, staging, marts, tests, and seeds — from a single paragraph of plain English. A reference workflow for analytics engineers.

If you've ever bootstrapped a dbt project from scratch, you know the pain. You start with dbt init, then spend an afternoon writing source YAML, naming staging models, deciding on mart materializations, and arguing about whether dim_customer should be SCD2.

This post shows how to skip all of that. We'll generate a complete dbt project — sources, staging, marts, tests, and seeds — from a single paragraph of plain English using OffDataAI.

Why this matters for analytics engineers

A "first commit" dbt project usually looks like this:

models/
├── example/
│   ├── my_first_dbt_model.sql
│   └── my_second_dbt_model.sql

That is not useful. The real first commit you want is:

models/
├── staging/
│   ├── _sources.yml
│   ├── stg_<source>__<entity>.sql
│   └── ...
└── marts/
    ├── core/
    │   ├── dim_customer.sql
    │   ├── dim_product.sql
    │   └── fct_order.sql
    └── ...
seeds/
└── dim_*.csv
tests/
└── schema.yml

Going from dbt init to that structure is at least a day of work — more if you don't already know the schema.

OffDataAI compresses that day into about three minutes.

The end-to-end flow

1. Describe your domain (one paragraph).
2. Pick a paradigm: Kimball, Data Vault 2.0, or 3NF.
3. Pick a target platform: Snowflake, BigQuery, Databricks, etc.
4. Answer 4–7 clarifying questions.
5. Generate the dbt project.
6. dbt deps && dbt seed && dbt build

What you get out of the box

For a typical B2B SaaS domain (customers, plans, usage, invoices) with the Kimball paradigm and Snowflake as the target, OffDataAI generates:

dbt_project.yml

Pre-configured with materializations: view for staging, table for dims, incremental for facts.

models/staging/_sources.yml

Source definitions with freshness and loaded_at_field configured.

version: 2
sources:
  - name: app_db
    schema: raw_app
    freshness:
      warn_after: { count: 12, period: hour }
      error_after: { count: 24, period: hour }
    loaded_at_field: _ingested_at
    tables:
      - name: customer
        description: One row per multi-tenant customer.
      - name: plan
        description: Subscription plan catalog.
      - name: usage_event
        description: Per-API-call usage event.

models/staging/stg_*.sql

Light, columnar staging models — one per source table, with consistent naming and explicit casts.

with source as (
    select * from {{ source('app_db', 'customer') }}
),

renamed as (
    select
        id::string                                 as customer_id,
        name::string                               as name,
        plan_id::string                            as plan_id,
        billing_address::string                    as billing_address,
        created_at::timestamp_ntz                  as created_at,
        _ingested_at::timestamp_ntz                as _ingested_at
    from source
)

select * from renamed

models/marts/dim_customer.sql (SCD2)

Type-2 slowly-changing-dimension model, with valid_from / valid_to windows and a stable surrogate key:

{{ config(materialized='table') }}

with source as (
    select * from {{ ref('stg_app_db__customer') }}
),

scd2 as (
    select
        {{ dbt_utils.generate_surrogate_key(['customer_id', 'plan_id', 'created_at']) }} as customer_sk,
        customer_id,
        name,
        plan_id,
        billing_address,
        created_at as valid_from,
        lead(created_at) over (partition by customer_id order by created_at) as valid_to
    from source
)

select * from scd2

models/marts/fct_usage.sql

Incremental fact at the API-call grain, with cluster_by set to match the Snowflake DDL.

seeds/dim_*.csv

Realistic seed data that respects foreign keys:

customer_id,name,plan_id,valid_from,valid_to
C-1001,Acme Corp,plan_pro,2025-01-15,
C-1002,Globex,plan_team,2025-02-03,2025-08-31
C-1002,Globex,plan_pro,2025-09-01,
C-1003,Initech,plan_starter,2025-03-22,

tests/schema.yml

Schema tests covering uniqueness, non-null, and referential integrity:

version: 2
models:
  - name: dim_customer
    columns:
      - name: customer_sk
        tests: [unique, not_null]
      - name: customer_id
        tests: [not_null]
      - name: plan_id
        tests:
          - relationships:
              to: ref('dim_plan')
              field: plan_id

Editing the generated project

The whole project is regenerated from a single Intermediate Representation (IR). If you change your mind about a grain, an SCD type, or a column, you patch the IR — not the SQL — and regenerate.

This is the part that surprises most analytics engineers: the SQL is the artifact, not the source of truth. The IR is the source of truth.

What's not included (and why)

OffDataAI deliberately does not generate:

  • Snapshots. dbt snapshots require an opinion on the snapshot mechanism (timestamp vs. check) and a real connection to source data. We leave that to you.
  • Custom macros for your business logic. The mart layer is a starting point — your real business logic goes on top.
  • Exposures and metrics. These are about how you use the warehouse, not how it's modeled.

Try it

Sign up and generate your first dbt project — it's completely free for our first customers. Or see the dbt project generator page for a deeper walkthrough.


Try OffDataAI

Ship a warehouse from a single conversation.

Describe your domain. Get ERDs, DDL, dbt projects, and seed data for every major cloud warehouse.

Start building — free