
Catalog

The catalog is the semantic layer that connects user intent to the data schema and business knowledge in a workspace.

It gives agents a grounded representation of what the user is asking for, which data entities can answer that request, and which business rules, metrics, documents, and relationships should shape the result. Agents use this context to build better plans before they query live data or propose an action.

The catalog stores metadata, semantic descriptions, relationships, and retrieval projections. It helps agents find the right tables, columns, joins, metrics, documents, knowledge items, enum values, and data-quality findings without guessing names or inventing schema context.

The catalog is not the target database. It describes data and knowledge. The target database is where read-only samples and SQL queries run.

Why the catalog exists

Data agents need a grounded path from a user question to a query or answer. Without a catalog, the model has to infer table names, field meanings, joins, metric definitions, and policy context from prompts alone.

The catalog gives agents a safer workflow:

discover candidates
  -> hydrate selected entities
  -> inspect fields
  -> inspect relations and paths
  -> sample target data when needed
  -> execute read-only SQL only after context is confirmed

This keeps broad discovery separate from authoritative details. A search result is a candidate. A hydrated catalog entity, field listing, or relation path is the context an agent should use for SQL planning.
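The difference between a candidate and hydrated context can be shown with a toy in-memory catalog. This is a minimal sketch under assumptions: `ToyCatalog`, `search`, `hydrate`, and `plan_sql_context` are hypothetical names, not the catalog toolkit's API. Search returns ids to follow up on; only the hydrated entities feed SQL planning.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog used only to illustrate the staged workflow."""
    entries: dict[str, dict] = field(default_factory=dict)

    def search(self, phrase: str) -> list[str]:
        # Discovery: return candidate ids only, never full entity details.
        phrase = phrase.lower()
        return [
            entry_id
            for entry_id, entry in self.entries.items()
            if phrase in entry["name"].lower() or phrase in entry["content"].lower()
        ]

    def hydrate(self, ids: list[str]) -> list[dict]:
        # Hydration: return the authoritative details for the selected ids.
        return [self.entries[entry_id] for entry_id in ids if entry_id in self.entries]


def plan_sql_context(catalog: ToyCatalog, phrase: str) -> list[dict]:
    candidates = catalog.search(phrase)  # candidates, not query context
    return catalog.hydrate(candidates)   # hydrated entities are what SQL planning uses
```

In the real workflow the hydrated entities would then be followed by field listings, relation paths, and optional samples before any SQL runs.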

What lives in the catalog

Catalog entries are typed entities. The public catalog tools expose these kinds:

Kind                   What it represents
table                  A table, view, or preferred query surface.
column                 A field belonging to a table or query surface.
relationship           A join or semantic relationship between entities.
enum_value             A known categorical value for a field.
metric                 A named calculation or business measure.
document               A document projected from knowledge ingestion.
knowledge              An extracted fact, rule, policy, or note.
data_quality_finding   A quality issue, warning, or profiling finding.

Every entry has a stable shape:

Field            Purpose
id               Stable catalog id. Prefer this when calling follow-up tools.
itemType         Entry kind, such as table or column.
name             Human-readable short name.
qualified_name   Fully qualified name when the entity has one.
content          Compact description or semantic summary.
tags             Search and filtering labels.
related          Links to other catalog entries.
metadata         Typed details such as database id, schema, data type, or row count.
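The kinds and the entry shape can be sketched together as a Python dataclass. This is an illustration under assumptions, not the toolkit's actual type: the class name `CatalogEntry`, the `ENTRY_KINDS` constant, representing `related` as a tuple of entry ids, and the validation in `__post_init__` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# The kinds listed in the catalog kinds table.
ENTRY_KINDS = frozenset({
    "table", "column", "relationship", "enum_value",
    "metric", "document", "knowledge", "data_quality_finding",
})


@dataclass(frozen=True)
class CatalogEntry:
    id: str                               # stable id; prefer it in follow-up calls
    itemType: str                         # one of ENTRY_KINDS
    name: str                             # human-readable short name
    qualified_name: Optional[str] = None  # only when the entity has one
    content: str = ""                     # compact semantic summary
    tags: tuple[str, ...] = ()            # search and filtering labels
    related: tuple[str, ...] = ()         # assumption: ids of linked entries
    metadata: dict[str, Any] = field(default_factory=dict)  # database id, schema, ...

    def __post_init__(self) -> None:
        # Reject kinds that the catalog does not expose.
        if self.itemType not in ENTRY_KINDS:
            raise ValueError(f"unknown catalog kind: {self.itemType!r}")
```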

Scope and identity

Catalog data is scoped by tenant and workspace. Shared catalog backends should store that scope as first-class storage fields so two workspaces can share the same backend without mixing entries.

database_id is different. It identifies a logical target database inside a workspace, such as warehouse or billing. Use the same database_id when profiling a database and ingesting knowledge that links to that database.

Do not use database_id as an authorization boundary, and do not confuse it with the catalog storage database.
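A minimal sketch of workspace scoping, assuming a dictionary-backed store. `ScopedCatalogStore` and its methods are hypothetical. It shows tenant_id and workspace_id as part of every storage key, while database_id is ordinary metadata on the entry and plays no part in which rows a workspace can see.

```python
class ScopedCatalogStore:
    """Hypothetical shared backend keyed by (tenant_id, workspace_id, entry id)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str, str], dict] = {}

    def put(self, tenant_id: str, workspace_id: str, entry: dict) -> None:
        # Scope is part of the key, so the same entry id never collides across workspaces.
        self._rows[(tenant_id, workspace_id, entry["id"])] = entry

    def list_entries(self, tenant_id: str, workspace_id: str) -> list[dict]:
        # Filtering uses only the scope fields, never metadata["database_id"].
        return [
            entry
            for (tenant, workspace, _), entry in self._rows.items()
            if (tenant, workspace) == (tenant_id, workspace_id)
        ]
```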

Catalog tools

Agents access the catalog through the built-in catalog toolkit. Attach it only to agents that should inspect catalog metadata or run read-only data queries:

analyst = define_specialist(
    name="data_analyst",
    model="claude-sonnet-4-6",
    prompt="Use catalog tools to answer data questions.",
    toolkits=["catalog"],
)

The tools form a staged workflow:

Tool                         Use it for
search_catalog               Discover candidate entities from a phrase or identifier.
get_catalog_entities         Hydrate selected ids or qualified names into typed details.
list_schema_fields           Inspect columns, data types, keys, and field profiles.
get_catalog_relations        Fetch adjacent graph context for selected entities.
get_relation_paths_between   Find join paths or semantic paths between endpoints.
sample_table_data            Read a small exploratory sample from a selected table.
execute_query                Run a validated read-only SELECT or WITH query.

Use execute_query as the final read step, after the agent has confirmed the tables, fields, joins, filters, and semantic rules it needs.
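execute_query performs its own validation; the sketch below only illustrates what a read-only gate of this kind can look like, using Python's standard sqlite3 module. `run_read_only` is a hypothetical helper. It applies a coarse check (one statement starting with SELECT or WITH) and also enables SQLite's `PRAGMA query_only`, because in SQLite a WITH clause can prefix a DELETE or UPDATE, so the string check alone is not enough.

```python
import sqlite3


def run_read_only(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Hypothetical read-only gate: allow a single SELECT or WITH statement."""
    statement = sql.strip().rstrip(";").strip()
    # Coarse: also rejects a ';' inside string literals, which is acceptable for a gate.
    if ";" in statement:
        raise ValueError("only a single statement is allowed")
    first_word = statement.split(None, 1)[0].upper() if statement else ""
    if first_word not in {"SELECT", "WITH"}:
        raise ValueError(f"not a read-only query: {first_word or '<empty>'}")
    # WITH ... DELETE is valid SQLite, so let the engine refuse writes too.
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(sql).fetchall()
```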

Storage setup

Catalog tools get their storage and data dependencies from create_runtime(..., data_environment=...). Each field has a separate job:

Field                 Supported kinds                    Purpose
catalog               empty, inline, sqlite, postgres    Catalog entities and relations.
catalog_search        local index config                 Search index for search_catalog.
target_database       sqlite, postgres                   Read-only source for samples and SQL.
target_database_url   URL shorthand                      Simple target database connection shortcut.
kv                    memory, sqlite, postgres, redis    Full document and knowledge payload storage.
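A small pre-flight check written from the table above can catch a misspelled kind before create_runtime is called. `check_data_environment` is a hypothetical helper, not part of the runtime, which performs its own validation and may accept options this sketch ignores.

```python
# Supported kinds, copied from the storage table.
SUPPORTED_KINDS = {
    "catalog": {"empty", "inline", "sqlite", "postgres"},
    "target_database": {"sqlite", "postgres"},
    "kv": {"memory", "sqlite", "postgres", "redis"},
}


def check_data_environment(env: dict) -> list[str]:
    """Return human-readable problems found in a data_environment dict (hypothetical)."""
    problems = []
    for key, kinds in SUPPORTED_KINDS.items():
        descriptor = env.get(key)
        if descriptor is not None and descriptor.get("kind") not in kinds:
            problems.append(f"{key}: unsupported kind {descriptor.get('kind')!r}")
    # Without a search index, search_catalog has nothing to search.
    if "catalog" in env and "catalog_search" not in env:
        problems.append("catalog is set but catalog_search is not configured")
    return problems
```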

catalog and catalog_search are separate. The catalog stores entities. The search index makes those entities discoverable.
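To make the split concrete, here is a toy inverted index kept apart from the entity store. `build_index` and `search` are hypothetical and far simpler than the real local index; they show that the index holds ids pointing back into the catalog, and that an entity added to the store is not discoverable until the index is updated.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def build_index(entries: dict[str, dict]) -> dict[str, set[str]]:
    """Map each token from name, content, and tags to the ids of matching entries."""
    index: dict[str, set[str]] = defaultdict(set)
    for entry_id, entry in entries.items():
        text = " ".join([entry.get("name", ""), entry.get("content", ""), *entry.get("tags", [])])
        for token in tokenize(text):
            index[token].add(entry_id)
    return index


def search(index: dict[str, set[str]], phrase: str) -> list[str]:
    """Return candidate ids whose indexed text contains every token in the phrase."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    return sorted(set.intersection(*(index.get(token, set()) for token in tokens)))
```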

Use inline catalogs for tests, examples, and committed export artifacts. Use sqlite or postgres catalogs when profiling, Studio, or knowledge ingestion should write durable entries.

Local durable setup

Use local SQLite files when developing against a local target database:

data_environment = {
    "tenant_id": "acme",
    "workspace_id": "analytics",
    "target_database": {
        "kind": "sqlite",
        "url": "sqlite:.data/target.db",
    },
    "catalog": {
        "kind": "sqlite",
        "url": "sqlite:.data/catalog.db",
        "ensure_schema": True,
    },
    "catalog_search": {
        "index_path": ".data/catalog-index",
        "rebuild_on_start": True,
        "write_through": True,
    },
}
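If no local target database exists yet, the standard sqlite3 module is enough to create a small one for development. The `orders` table, its rows, and both helpers are illustrative only. `sqlite_path` assumes that the `sqlite:` prefix in the URLs above is followed by a relative file path, which is how the example reads but is not confirmed on this page.

```python
import sqlite3
from pathlib import Path


def sqlite_path(url: str) -> Path:
    # Assumption: "sqlite:.data/target.db" refers to the file .data/target.db.
    prefix = "sqlite:"
    if not url.startswith(prefix):
        raise ValueError(f"not a sqlite URL: {url}")
    return Path(url[len(prefix):])


def create_demo_target(url: str = "sqlite:.data/target.db") -> Path:
    """Create a tiny illustrative target database; safe to run more than once."""
    path = sqlite_path(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS orders ("
                "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, total REAL NOT NULL)"
            )
            conn.executemany(
                "INSERT OR IGNORE INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
                [(1, 10, 25.0), (2, 11, 40.5)],
            )
    finally:
        conn.close()
    return path
```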

Reproducible artifact setup

Use an inline catalog when the catalog is an input artifact, such as an exported catalog.entries.json committed with an example or test:

data_environment = {
    "target_database_url": "sqlite:.data/target.db",
    "catalog": {
        "kind": "inline",
        "entries": catalog_entries,
    },
    "catalog_search": {
        "index_path": ".data/catalog-index",
        "rebuild_on_start": True,
    },
}

Inline catalogs are read-only runtime inputs. Profiling and knowledge ingestion need a writable sqlite or postgres catalog.
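The inline example above assumes a `catalog_entries` value already exists. One way to produce it from a committed export is sketched below. `load_catalog_entries` is hypothetical, and both the file layout (a JSON array of entry objects) and the choice of required fields are assumptions of this sketch. Returning a tuple mirrors the rule that inline catalogs are read-only inputs.

```python
import json
from pathlib import Path

# Assumption: these fields are always present in an exported entry.
REQUIRED_FIELDS = ("id", "itemType", "name")


def load_catalog_entries(path: str) -> tuple[dict, ...]:
    """Load an exported catalog.entries.json, assumed to be a JSON array of entries."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of catalog entries")
    for position, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {position} is not an object")
        missing = [name for name in REQUIRED_FIELDS if name not in entry]
        if missing:
            raise ValueError(f"entry {position} is missing {missing}")
    # A tuple signals that this artifact is an input, not a write target.
    return tuple(data)
```

Usage: `catalog_entries = load_catalog_entries("catalog.entries.json")`, then pass the result as `entries` in the inline catalog descriptor.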

Shared deployment setup

Use environment-backed Postgres and Redis descriptors when the application runs against shared services:

data_environment = {
    "tenant_id": "acme",
    "workspace_id": "analytics",
    "target_database": {
        "kind": "postgres",
        "url_env": "ACME_WAREHOUSE_URL",
        "schema": "public",
    },
    "catalog": {
        "kind": "postgres",
        "url_env": "FLOWAI_CATALOG_URL",
        "ensure_schema": True,
    },
    "kv": {
        "kind": "redis",
        "url_env": "FLOWAI_REDIS_URL",
        "prefix": "acme:analytics",
    },
    "catalog_search": {
        "index_path": "/var/lib/flowai/catalog-index",
        "write_through": True,
    },
}

Use url_env for credentialed services so connection strings stay out of checked-in config.
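The url_env pattern amounts to reading the connection string from the process environment at startup. `resolve_url` below is a hypothetical illustration of that lookup, not the runtime's own resolver. It also accepts a literal `url`, matching the two descriptor forms used in the examples on this page, and fails loudly instead of falling back to a default.

```python
import os


def resolve_url(descriptor: dict) -> str:
    """Return the connection URL for a descriptor that sets either url or url_env."""
    if "url" in descriptor:
        return descriptor["url"]
    env_name = descriptor.get("url_env")
    if not env_name:
        raise ValueError("descriptor needs url or url_env")
    value = os.environ.get(env_name)
    if not value:
        # Fail at startup rather than silently connecting somewhere unexpected.
        raise RuntimeError(f"environment variable {env_name} is not set")
    return value
```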

Lifecycle

Catalog entries usually come from one of four paths:

  • profiling a target database into a durable catalog
  • ingesting documents or extracted knowledge into KV and catalog projections
  • loading an exported catalog.entries.json artifact as an inline catalog
  • writing entries through a backend-owned workflow

After entries change, keep catalog_search in sync by using write_through, rebuild_on_start, or the catalog index rebuild command.
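The trade-off between write-through indexing and rebuilding can be illustrated with a toy store. `SyncedCatalog` is hypothetical and indexes only entry names. With write-through enabled, each write also updates the index; without it, new entries stay invisible to search until a rebuild, the role that rebuild_on_start and the rebuild command play in a real setup.

```python
class SyncedCatalog:
    """Hypothetical catalog store paired with a name-only search index."""

    def __init__(self, write_through: bool) -> None:
        self.write_through = write_through
        self.entries: dict[str, dict] = {}
        self.index: dict[str, set[str]] = {}

    def put(self, entry: dict) -> None:
        self.entries[entry["id"]] = entry
        if self.write_through:
            # Note: renaming an existing entry leaves its old name indexed until a rebuild.
            self._index_entry(entry)

    def rebuild_index(self) -> None:
        # Discard the index and regenerate it from catalog storage.
        self.index = {}
        for entry in self.entries.values():
            self._index_entry(entry)

    def search(self, name: str) -> set[str]:
        return set(self.index.get(name.lower(), set()))

    def _index_entry(self, entry: dict) -> None:
        self.index.setdefault(entry["name"].lower(), set()).add(entry["id"])
```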

Boundaries

Catalog tools are read-oriented. They help agents discover metadata, inspect relationships, sample target data, and run read-only SQL. They do not perform platform writes.

For business mutations, model the change as a typed plan, require approval when needed, and apply the approved action through the action dispatcher.
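A typed plan with an approval gate might look like the sketch below. `ActionPlan`, `ActionDispatcher`, `approve`, and the `update_customer_tier` action are hypothetical names chosen for illustration; this page does not define the dispatcher's API. The structure is the point: the agent produces a plan object, approval is recorded on it, and only the dispatcher applies it.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ActionPlan:
    action: str
    params: dict[str, Any]
    requires_approval: bool = True
    approved_by: Optional[str] = None


def approve(plan: ActionPlan, reviewer: str) -> ActionPlan:
    # Plans are immutable; approval produces a new plan object.
    return replace(plan, approved_by=reviewer)


class ActionDispatcher:
    """Hypothetical dispatcher: the only component allowed to apply mutations."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[dict[str, Any]], Any]] = {}

    def register(self, action: str, handler: Callable[[dict[str, Any]], Any]) -> None:
        self._handlers[action] = handler

    def apply(self, plan: ActionPlan) -> Any:
        if plan.requires_approval and plan.approved_by is None:
            raise PermissionError(f"{plan.action} needs approval before it can run")
        if plan.action not in self._handlers:
            raise KeyError(f"no handler registered for {plan.action}")
        return self._handlers[plan.action](plan.params)
```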

Common mistakes

  • Treating search_catalog results as final query context.
  • Running SQL before hydrating selected entities and inspecting fields.
  • Inventing joins instead of using catalog relations or relation paths.
  • Confusing catalog storage with the target_database.
  • Using inline or empty catalogs as profiling or ingestion sinks.
  • Enabling search_catalog without configuring catalog_search.
  • Changing database_id between profiling and knowledge ingestion.
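Some of these mistakes are ordering problems, so they show up in an agent's tool-call trace. The sketch below is a hypothetical audit over a list of tool names taken from the catalog tools table; the rules it checks are a reading of the staged workflow, not a check the toolkit is documented to perform.

```python
def audit_trace(calls: list[str]) -> list[str]:
    """Flag execute_query calls that happen before hydration or field inspection."""
    seen: set[str] = set()
    problems: list[str] = []
    for step, tool in enumerate(calls):
        if tool == "execute_query":
            if "get_catalog_entities" not in seen:
                problems.append(f"step {step}: SQL ran before hydrating selected entities")
            if "list_schema_fields" not in seen:
                problems.append(f"step {step}: SQL ran before inspecting fields")
        seen.add(tool)
    return problems
```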

See also