Scaling dbt Docs with an Automated Data Catalog

Using dbt Docs as your data dictionary? dbt Docs is a great way to start documenting your data models, sources, and columns, but it can become brittle as projects grow. YAML files drift, sharing is clunky, and business users often see it as too technical or as developer-only documentation. If you want everyone to use the same definitions across tools, how do you keep data documentation current and visible across your stack?

In this guide, you’ll learn a practical way to scale dbt Docs alongside an automated data catalog so definitions stay accurate and easy to find. Keep schema.yml as the source of truth, manage changes through pull requests in GitHub or Bitbucket, and surface definitions in BI and other team tools so everyone is working from the same meaning.

What are dbt Docs?

dbt Docs are the documentation site and metadata generated from your dbt project (models, sources, columns, tests, and lineage) using descriptions defined in schema.yml (properties) and artifacts produced by dbt runs.

How dbt Docs are built

You write table and column descriptions in YAML. During runs, dbt produces artifacts that are saved as JSON files (semantic_manifest.json, manifest.json, catalog.json, run_results.json, and sources.json). The `dbt docs generate` command builds a static documentation site. Teams can view the documentation locally with the `dbt docs serve` command or in dbt Cloud’s Catalog.
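As a rough illustration of what these artifacts hold, the sketch below walks a trimmed, hand-written stand-in for manifest.json and pulls out model and column descriptions. The `nodes`, `resource_type`, `description`, and `columns` keys follow the manifest layout; the sample project name `shop`, the sample values, and the helper `model_descriptions` are made up for this example. In a real project you would load target/manifest.json from disk instead of an inline string.

```python
import json

# Hand-written stand-in for a small slice of target/manifest.json.
SAMPLE_MANIFEST = json.loads("""
{
  "nodes": {
    "model.shop.orders": {
      "resource_type": "model",
      "name": "orders",
      "description": "Finalized customer orders used for revenue reporting",
      "columns": {
        "order_id": {"name": "order_id", "description": "Unique identifier for an order"},
        "order_total": {"name": "order_total", "description": ""}
      }
    },
    "test.shop.not_null_orders_order_id": {
      "resource_type": "test",
      "name": "not_null_orders_order_id",
      "description": "",
      "columns": {}
    }
  }
}
""")


def model_descriptions(manifest):
    """Map each model name to its description and per-column descriptions."""
    result = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        result[node["name"]] = {
            "description": node.get("description", ""),
            "columns": {
                name: column.get("description", "")
                for name, column in node.get("columns", {}).items()
            },
        }
    return result


docs = model_descriptions(SAMPLE_MANIFEST)
print(docs["orders"]["columns"])
```

An empty string for a column, as with `order_total` here, is what an undocumented column looks like in this sketch.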

Who uses dbt Docs

Primarily analytics and data engineers. Broader adoption requires surfacing definitions where business users work, such as BI tools and data catalog search.

Why dbt Docs matter

Keeping schema.yml accurate is foundational. It gives you versioned, reviewable definitions and reduces drift across tools.

Example schema.yml

models:
  - name: orders
    description: "Finalized customer orders used for revenue reporting"
    columns:
      - name: order_id
        description: "Unique identifier for an order"
      - name: order_total
        description: "Sum of line item totals after discounts and tax"

Common Challenges with Using and Scaling dbt Docs

First, let’s look at where dbt Docs usually break. These are the dbt documentation challenges we see most often when teams try to scale docs across projects and stakeholders.

Data documentation beyond dbt

dbt Docs only covers assets managed within your dbt project. If you use BI tools like Tableau or Power BI, or need to document tables outside of dbt in Snowflake, those assets aren't included automatically. While dbt exposures can help track these dependencies, they require manual maintenance and don't pull in the actual documentation from those systems.
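To get a feel for the size of that gap, a minimal sketch like the one below can compare a list of warehouse tables against the relations dbt knows about. Both inventories and the helper name `undocumented_by_dbt` are hypothetical; in practice the first set might come from your warehouse's information schema and the second from manifest.json.

```python
# Hypothetical inventories: tables found in the warehouse, and relations
# that dbt builds or declares as sources.
WAREHOUSE_TABLES = {"analytics.orders", "analytics.customers", "raw.legacy_refunds"}
DBT_RELATIONS = {"analytics.orders", "analytics.customers"}


def undocumented_by_dbt(warehouse_tables, dbt_relations):
    """Tables that exist in the warehouse but are unknown to the dbt project."""
    # Compare case-insensitively, since warehouses differ in identifier casing.
    known = {name.lower() for name in dbt_relations}
    return sorted(name for name in warehouse_tables if name.lower() not in known)


gaps = undocumented_by_dbt(WAREHOUSE_TABLES, DBT_RELATIONS)
print(gaps)
```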

Shareability and usability

Many teammates never open dbt Docs. Non‑technical users want ERDs, lineage, and in‑app search. When definitions live only in YAML, they are hard to find. Definitions need to be where work happens: BI tooltips, search, and lineage.

Keeping YAML up to date at scale

Ownership is unclear, reviews are ad hoc, and coverage slips. The result is drift between schema.yml, dashboards, and what people actually trust. Changes to schema.yml need to be treated like code: propose edits, review in a pull request, and merge on a regular cadence.
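One way to make that review step concrete is a small check that runs on each pull request and lists what still needs attention. The sketch below works on the parsed form of a schema.yml file (what a YAML loader would return). Storing an owner under `config.meta.owner` is an assumed team convention rather than a dbt requirement, and `review_findings` is a hypothetical helper name.

```python
# Parsed form of a schema.yml file, written inline to keep the sketch self-contained.
SCHEMA = {
    "models": [
        {
            "name": "orders",
            "description": "Finalized customer orders used for revenue reporting",
            "config": {"meta": {"owner": "analytics-eng"}},  # assumed convention
            "columns": [
                {"name": "order_id", "description": "Unique identifier for an order"},
                {"name": "order_total", "description": ""},
            ],
        },
        {
            "name": "customers",
            "description": "",
            "columns": [{"name": "customer_id"}],
        },
    ]
}


def review_findings(schema):
    """Return human-readable problems a reviewer would want fixed."""
    findings = []
    for model in schema.get("models", []):
        name = model["name"]
        if not model.get("description", "").strip():
            findings.append(f"{name}: missing model description")
        if not model.get("config", {}).get("meta", {}).get("owner"):
            findings.append(f"{name}: no owner set")
        for column in model.get("columns", []):
            if not column.get("description", "").strip():
                findings.append(f"{name}.{column['name']}: missing column description")
    return findings


findings = review_findings(SCHEMA)
for line in findings:
    print(line)
```

A CI job could fail the pull request whenever the list is non-empty, or only warn until coverage improves.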

Onboarding and search

Property references tell you what a column is, not how to use it. New hires need examples, links to source models, and consistent naming, or they bounce. Descriptions need to be paired with links to the source model and a short example query.

Fragmented documentation

Multiple projects and documentation sites create duplicate truths. Screenshots and side docs are a sign the official docs are not easy to use. We recommend picking a single source of truth, routing all edits through it, and removing duplicates over time.
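Before consolidating, it helps to know where projects already disagree. The following sketch flags columns that are described differently in different projects. The project names, the `model.column` keys, and the helper `conflicting_definitions` are all hypothetical; the descriptions would normally be collected from each project's schema.yml or manifest.json.

```python
from collections import defaultdict

# Hypothetical descriptions collected from two dbt projects.
PROJECT_DOCS = {
    "finance": {
        "orders.order_total": "Sum of line item totals after discounts and tax",
    },
    "marketing": {
        "orders.order_total": "Sum of line item totals",
        "orders.order_id": "Unique identifier for an order",
    },
}


def conflicting_definitions(project_docs):
    """Return columns that are described differently in different projects."""
    seen = defaultdict(dict)
    for project, columns in project_docs.items():
        for column, description in columns.items():
            seen[column][project] = description
    return {
        column: by_project
        for column, by_project in seen.items()
        if len(set(by_project.values())) > 1
    }


conflicts = conflicting_definitions(PROJECT_DOCS)
for column, by_project in conflicts.items():
    print(column, by_project)
```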

dbt Docs vs. Data Catalog: Choosing Your Source of Truth

There are a few workable ways to run documentation at scale. Start by choosing where your single source of truth lives: in dbt (schema.yml) or in your data catalog. Many teams choose a hybrid: draft in the catalog and publish back to dbt by pull request. Whatever you pick, write it down, set owners, and follow it consistently.

Source of Truth	Strengths	Watch For
dbt	Versioned and close to code; reviewed in PRs; clear audit trail	Authoring friction for non‑technical users; YAML coverage can lag without owners and cadence
Data Catalog	Fast drafting and collaboration; bulk edits and UI guidance	Drift unless you publish back to dbt; definitions may differ across tools
Hybrid	Speed of a UI plus governance; single truth in dbt; easy cross‑tool visibility	Needs a clear publish cadence (e.g., daily PRs), named owners, and alerts if syncs fail

Approach 1: dbt as the source of truth. Keep authoritative table and column definitions in schema.yml. This keeps definitions versioned and close to the code, reviewed in pull requests, and tied to build changes. Many teams pair this with a catalog for drafting and discovery.

Approach 2: catalog as the source of truth. Use a catalog UI for fast drafting and collaboration, and treat dbt Docs as a published view. This is easy to adopt but can drift unless you have a clear publish step back to dbt.

Our recommendation: hybrid, governance first. Author in a catalog if you like, but make dbt the canonical record. Route changes through pull requests, publish accepted definitions back to schema.yml on a regular cadence (daily works well), and surface definitions in BI and lineage so people see them in context.

Govern change like code. Assign owners for key models, propose edits through pull requests, and aim for small, frequent merges. Visibility matters too; most people will not open the docs site, so bring definitions to where they work.

Finally, measure the basics: documentation coverage, median PR merge time, and the volume of “what does this mean?” questions. These show whether the process is working and where to adjust.
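Two of these measures are simple to compute once the inputs are exported. The sketch below assumes column descriptions keyed by `model.column` and a list of (opened, merged) timestamps for documentation pull requests. The data and both function names are hypothetical, and coverage is defined here as the share of columns with a non-empty description, which is one reasonable definition among several.

```python
from datetime import datetime
from statistics import median

# Hypothetical inputs for illustration only.
COLUMN_DESCRIPTIONS = {
    "orders.order_id": "Unique identifier for an order",
    "orders.order_total": "Sum of line item totals after discounts and tax",
    "customers.customer_id": "",
    "customers.signup_date": "",
}
PULL_REQUESTS = [
    ("2025-01-06T09:00", "2025-01-06T15:00"),
    ("2025-01-07T09:00", "2025-01-08T09:00"),
    ("2025-01-08T09:00", "2025-01-08T11:00"),
]


def documentation_coverage(descriptions):
    """Share of columns with a non-empty description, from 0.0 to 1.0."""
    if not descriptions:
        return 0.0
    documented = sum(1 for text in descriptions.values() if text.strip())
    return documented / len(descriptions)


def median_merge_hours(pull_requests):
    """Median number of hours between a PR being opened and merged."""
    hours = [
        (datetime.fromisoformat(merged) - datetime.fromisoformat(opened)).total_seconds() / 3600
        for opened, merged in pull_requests
    ]
    return median(hours)


coverage = documentation_coverage(COLUMN_DESCRIPTIONS)
merge_hours = median_merge_hours(PULL_REQUESTS)
print(f"coverage: {coverage:.0%}, median merge time: {merge_hours:.1f} h")
```

Tracking both numbers over time matters more than any single reading: falling coverage or rising merge time suggests the cadence or ownership needs adjusting.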

How Select Star’s dbt Docs Sync Keeps schema.yml Up-To-Date

At Select Star, we recommend the hybrid approach: draft in a catalog and keep dbt as the record. Select Star’s data catalog supports this natively through its dbt Docs Sync feature. A daily pull request updates schema.yml, approvals happen in your GitHub or Bitbucket repository, and the same definitions appear in your catalog, BI, and lineage.

1) Connect your repo and dbt to Select Star

Host your dbt project in GitHub or Bitbucket.
In Select Star, connect to dbt and point to the repo/project path.

2) Author & validate in Select Star data catalog

Draft and edit table and column descriptions in Select Star. Take advantage of AI-powered documentation that can suggest descriptions based on metadata, SQL queries, existing documentation, and more.
Review and mark changes as validated.

3) Publish and review in a daily PR

Validated changes are batched into one scheduled daily pull request that updates schema.yml. If a previous PR is still open, dbt Docs Sync closes it and opens a fresh one to avoid conflicts.
Owners can review and merge. The change history is captured in the repo.

Example PR diff

- description: "Sum of line item totals"
+ description: "Sum of line item totals after discounts and tax"
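To picture what a batched update involves, here is a minimal sketch that applies a set of validated description changes to the parsed contents of schema.yml. It is an illustration under stated assumptions, not Select Star's implementation: the change format, the `discount_code` column, and the helper `apply_changes` are hypothetical.

```python
import copy

# Parsed form of schema.yml before the update.
SCHEMA = {
    "models": [
        {
            "name": "orders",
            "columns": [
                {"name": "order_id", "description": "Unique identifier for an order"},
                {"name": "order_total", "description": "Sum of line item totals"},
            ],
        }
    ]
}

# Hypothetical batch of validated edits: (model, column, new description).
VALIDATED_CHANGES = [
    ("orders", "order_total", "Sum of line item totals after discounts and tax"),
    ("orders", "discount_code", "Promo code applied at checkout"),  # not in schema
]


def apply_changes(schema, changes):
    """Return an updated copy of the schema and the edits that could not be applied."""
    updated = copy.deepcopy(schema)  # leave the original untouched
    columns = {
        (model["name"], column["name"]): column
        for model in updated.get("models", [])
        for column in model.get("columns", [])
    }
    skipped = []
    for model_name, column_name, description in changes:
        column = columns.get((model_name, column_name))
        if column is None:
            skipped.append((model_name, column_name))
        else:
            column["description"] = description
    return updated, skipped


updated, skipped = apply_changes(SCHEMA, VALIDATED_CHANGES)
print(updated["models"][0]["columns"][1]["description"])
print("skipped:", skipped)
```

Returning the skipped edits instead of silently dropping them gives reviewers a list of changes that point at columns missing from schema.yml.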

Beyond documentation, a data catalog like Select Star supports everyday analytics and AI workflows. ERDs and column‑level lineage show how models and sources connect and help teams trace issues to the right upstream change, speeding up root‑cause analysis. Cross‑platform search spans dbt and your BI tools so people can find models, columns, dashboards, and definitions in one place, which reduces “what does this mean?” pings and shortens onboarding. Ownership and usage insights point to what to document, consolidate, or deprecate next, improving trust and lowering cost. Read more about what modern data catalog tools can offer in addition to scaling dbt Docs and how to evaluate them in our guide on data catalog tools.

Keep dbt Documentation Up-to-Date and Discoverable

With up-to-date documentation, the rest of the work gets easier. Definitions meet people where they work (BI tooltips, search, lineage), onboarding speeds up, and analysis is more consistent.

The data team at nib, an Australian health and travel insurer serving more than 1.5 million people in Australia and New Zealand, had datasets created by multiple teams, which led to multiple dbt repositories and scattered dbt Docs. They wanted one searchable place for documentation. Select Star brought the docs from each repo into a single catalog and made datasets easy to find. Business users update definitions in Select Star, and approved changes sync back to dbt. Learn more.

Book a demo to see the daily PR workflow and Select Star’s dbt Docs Sync in action.

Frequently asked questions on dbt Docs

What are dbt Docs?

dbt Docs are the site and metadata dbt generates from your project: models, sources, columns, tests, and lineage. Descriptions live in schema.yml, and dbt docs generate builds the static site.

What belongs in schema.yml?

Authoritative table and column definitions. Keep descriptions specific, plain language, and tied to how the column is used.

How do dbt Docs and a data catalog differ?

dbt Docs keep definitions versioned in your Git repository. A data catalog makes those definitions easy to find with search, lineage, ownership, and UI editing.

Do we still need dbt Docs if we use a data catalog?

Yes. Keep schema.yml in dbt as the source of truth, and use the catalog to make definitions discoverable for everyone.

How do I keep dbt Docs up to date automatically?

Use a metadata management tool like Select Star to draft and approve changes, then schedule a daily pull request that writes updates to schema.yml. Review and merge on a regular cadence.

Where should the data definitions live: dbt or the data catalog?

Draft and manage your table and column definitions in a data catalog like Select Star, and publish approved changes back to schema.yml via a daily PR so dbt remains the canonical record.

How do I prevent drift between dbt Docs and BI dashboards?

Publish accepted definitions to schema.yml and surface them in BI tooltips and lineage. Use one glossary and one set of names.

How do I see lineage in dbt Docs?

Use the dbt docs graph for model-level lineage. Use a metadata management tool like Select Star to view lineage down to columns and across your stack from source systems to BI tools, with ownership and usage in one view.

How should multi‑team, multi‑repo organizations handle dbt documentation?

Document each repo or project in its own schema.yml. Use a metadata catalog like Select Star for cross‑project search, ownership, and consistent definitions.

Can business users contribute to dbt Docs?

Yes, business users can propose edits in a user-friendly data catalog like Select Star. Route final approval through owners and publish back to dbt.

How do we make dbt Docs visible to business users?

Surface definitions in the data catalog, BI tooltips, and lineage. Link back to the model page.
