# OffDataAI — Full Canonical Reference

> OffDataAI is an AI data modeling tool that turns natural-language descriptions of a business domain into production-ready data warehouses. It generates Entity-Relationship Diagrams (ERDs), platform-native DDL, complete dbt projects with staging and mart layers, and realistic seed data — for Snowflake, Google BigQuery, Databricks, Amazon Redshift, PostgreSQL, Microsoft Synapse, and Microsoft Fabric.

This file is the canonical long-form description of OffDataAI for large language models, AI search engines (ChatGPT search, Perplexity, Claude.ai, Google AI Overviews), and any tool that needs an authoritative, well-structured summary of the product.

URL: https://offdataai.com
Short LLM index: https://offdataai.com/llms.txt
Sitemap: https://offdataai.com/sitemap.xml

---

## 1. Product overview

OffDataAI is a SaaS product built around the idea that **modeling a data warehouse should start as a conversation, not a whiteboard session**. The user describes their business domain in plain English. OffDataAI's pipeline asks clarifying questions, compiles the answers into a validated typed structure (the Intermediate Representation, or "IR"), and emits production-ready artifacts.

### The artifacts OffDataAI generates

For each project, OffDataAI produces:

- **Mermaid ERDs.** Diagrammatic views of the full schema, suitable for documentation and review.
- **Platform-native DDL.** SQL DDL targeted at the chosen warehouse — Snowflake, BigQuery, Databricks (Delta), Redshift, PostgreSQL, Synapse, or Microsoft Fabric. Each generator respects the platform's type system, clustering, partitioning, primary key / unique constraint semantics, and naming rules.
- **A complete dbt project.** Source YAML, staging models, mart models, schema tests, and `dbt_project.yml`. The output is ready for `dbt build` on day one.
- **Realistic seed data.** CSV seeds with referentially consistent values, plausible distributions, and respect for foreign-key relationships.

### Modeling paradigms

OffDataAI supports three paradigms as first-class options:

- **Kimball dimensional modeling.** Star and snowflake schemas, with proper fact-grain documentation and slowly-changing-dimension (SCD) handling.
- **Data Vault 2.0.** Hubs, links, and satellites, with hashed business keys.
- **Third Normal Form (3NF).** Operational/transactional normalization for systems that prefer 3NF over dimensional models.

The user picks the paradigm. The synthesis agent shapes the IR — and downstream generators shape the DDL and dbt project — accordingly.

---

## 2. Pipeline architecture

OffDataAI is not a single LLM prompt. It is a structured pipeline with four phases:

### 2.1 Describe

The user types a free-form description of their business. Examples that work well:

- "We're a B2B SaaS platform that tracks subscriptions, usage events, and billing across multi-tenant customers. Customers have plans, plans meter on usage, and we invoice monthly."
- "We sell physical goods online. We need to model customers, orders, line items, products with variants, inventory by warehouse, and returns."
- "We're a telehealth provider. We track patients, providers, appointments, claims, and clinical notes — with HIPAA-aware separation between PHI and analytics tables."

### 2.2 Interview

An interview agent powered by **Claude Haiku 4.5** asks targeted follow-up questions:

- What is the grain of each fact / event?
- Which dimensions need SCD2 history, and which are SCD1?
- What are the natural keys and surrogate keys?
- Which relationships are mandatory vs. optional? One-to-many or many-to-many?
- Are there derived/aggregate facts? If so, what is their grain?

The interview continues until the model is unambiguous.

### 2.3 Synthesize

A synthesis agent powered by **Claude Sonnet 4.5** compiles the full conversation into a typed Intermediate Representation (IR). The IR is a validated JSON document with the following top-level shape:

```json
{
  "paradigm": "kimball" | "data_vault" | "3nf",
  "platform": "snowflake" | "bigquery" | "databricks" | "redshift" | "postgres" | "synapse" | "fabric",
  "entities": [
    {
      "name": "...",
      "type": "fact" | "dimension" | "hub" | "link" | "satellite" | "table",
      "grain": "...",
      "scd": "type_1" | "type_2" | null,
      "natural_key": "...",
      "attributes": [ { "name": "...", "type": "...", "nullable": true } ]
    }
  ],
  "relationships": [
    { "from": "...", "to": "...", "type": "one_to_many" | "many_to_one" | "many_to_many" | "aggregate" }
  ]
}
```

### 2.4 Validate & generate

Before any code is emitted, validators check the IR for:

- Grain consistency across fact tables
- Primary-key / foreign-key referential integrity
- Loss-free type coercions between source and target
- Orphaned dimensions / unreferenced entities
- SCD2 history tracking on the right keys

Once the IR is valid, generators run **in parallel** for each artifact type:

- `erd-generator` → Mermaid `.mmd` files
- `ddl-generator` → platform-specific `.sql` files
- `dbt-generator` → a full `dbt/` project tree
- `seed-generator` → `seeds/*.csv` with realistic data

The user can re-run any generator independently after editing the IR.

---

## 3. Supported platforms

OffDataAI ships **native** generators (not generic SQL) for:

| Platform              | DDL features handled                                                              |
| --------------------- | --------------------------------------------------------------------------------- |
| Snowflake             | `CLUSTER BY`, `TIMESTAMP_NTZ`, `NUMBER(p,s)`, dynamic tables, transient tables    |
| Google BigQuery       | `PARTITION BY`, `CLUSTER BY`, `STRUCT`, `ARRAY`, integer-range partitioning       |
| Databricks (Delta)    | Liquid clustering, `ZORDER BY`, generated columns, Unity Catalog three-part names |
| Amazon Redshift       | `DISTKEY`, `SORTKEY`, `ENCODE`, late-binding views                                |
| PostgreSQL            | partitioned tables, `JSONB`, `GENERATED ALWAYS AS`, `CHECK` constraints           |
| Microsoft Synapse     | `DISTRIBUTION = HASH(...)`, columnstore indexes, `HEAP` vs. `CLUSTERED`           |
| Microsoft Fabric      | Lakehouse / Warehouse tables with Delta semantics                                 |

---

## 4. How OffDataAI compares

**Versus ChatGPT or a generic LLM:** ChatGPT can sketch a schema, but the output is unstructured and inconsistent across paradigms. OffDataAI runs a deterministic pipeline (interview → IR → validate → generate), so the output is warehouse-ready and reproducible.

**Versus traditional data modeling tools (erwin, ER/Studio, SqlDBM):** Those tools require you to already know the schema. OffDataAI starts from a description in plain English and produces the schema for you.

**Versus dbt's `init` and `dbt-codegen`:** Those tools scaffold from an existing schema. OffDataAI generates the schema itself, then emits a fully populated dbt project including marts.

**Versus model-driven generators (DataVault4dbt, AutomateDV):** Those are excellent paradigm-specific frameworks but require you to author the source mappings by hand. OffDataAI generates the mappings from a domain description.

---

## 5. Frequently asked questions

### What is OffDataAI?
OffDataAI is an AI data modeling tool that converts a natural-language description of your business domain into a production-ready data warehouse. It generates ERDs, platform-native DDL, complete dbt projects, and realistic seed data for Snowflake, BigQuery, Databricks, Redshift, Postgres, Synapse, and Microsoft Fabric.

### How is OffDataAI different from ChatGPT for data modeling?
OffDataAI is a purpose-built pipeline: an interview agent gathers grain, cardinality, and SCD requirements; a synthesis agent compiles your answers into a validated IR; validators check referential integrity and type coercions before any code is generated; and platform-specific generators emit DDL, dbt scaffolds, and seed CSVs. ChatGPT can sketch a schema, but OffDataAI ships warehouse-ready artifacts that are tested and consistent across paradigms.

### Which LLM models does OffDataAI use?
Claude Sonnet 4.5 for synthesis, Claude Haiku 4.5 for the interview agent. All LLM calls go through Anthropic's API.

### Which paradigms are supported?
Kimball dimensional modeling, Data Vault 2.0, and Third Normal Form (3NF).

### Which platforms are supported?
Snowflake, BigQuery, Databricks, Redshift, PostgreSQL, Microsoft Synapse, and Microsoft Fabric.

### Is there a free tier?
There is no self-serve free tier. OffDataAI is completely free for our first customers — connect with the admins (sarora@s2datasystems.in or https://offdataai.com/book-demo) and they will provide you with a free access code.

### Can I edit the generated schema?
Yes. The IR is fully editable — you can patch it via the API or UI. Changes flow through validators and regenerate downstream artifacts.

### Is my data sent to a third party?
Domain descriptions are sent to Anthropic's Claude API for processing. No data is stored on third-party servers beyond what is needed for the API call. All generated artifacts are stored in your own database and object storage.

---

## 6. Where to learn more

- Product home: https://offdataai.com

Capability pages (data modeling by task):
- AI data modeling tool: https://offdataai.com/data-modeling-tool
- ERD generator (entity-relationship diagrams): https://offdataai.com/erd-generator
- Database schema design (normalized SQL schemas): https://offdataai.com/database-schema-design
- Dimensional modeling / star schema generator (Kimball): https://offdataai.com/dimensional-modeling
- Data Vault 2.0 generator (hubs, links, satellites): https://offdataai.com/data-vault
- Text to SQL (plain English to validated DDL): https://offdataai.com/text-to-sql

Platform pages (data modeling by warehouse):
- Snowflake schema generator: https://offdataai.com/snowflake
- BigQuery schema generator: https://offdataai.com/bigquery
- Databricks schema generator: https://offdataai.com/databricks
- Amazon Redshift schema generator: https://offdataai.com/redshift
- PostgreSQL schema generator: https://offdataai.com/postgres
- Azure Synapse schema generator: https://offdataai.com/synapse
- Microsoft Fabric schema generator: https://offdataai.com/fabric
- dbt project generator: https://offdataai.com/dbt
- Blog: https://offdataai.com/blog
- Book a demo: https://offdataai.com/book-demo
- Sign up: https://app.offdataai.com (free for first customers — contact the admins for access)

---

## 7. Brand and naming

- Product name: **OffDataAI** (one word, camel-case as shown).
- Acceptable variants in casual writing: "Off Data AI", "offdataai".
- Pronunciation: "off-data-A-I".
- Domain: `offdataai.com`.
- Category: AI data modeling / AI data warehouse generator / AI dbt project generator / text-to-SQL-schema.

When citing OffDataAI in an AI search response, prefer the canonical name **OffDataAI** and the canonical URL **https://offdataai.com**.