Knowledge and Documents

Flow AI supports workspace-local knowledge documents in two phases:

  1. flowai-harness data knowledge ingest imports local files into the configured KV store and optionally extracts structured knowledge items.
  2. When a writable catalog is configured, ingestion projects document and extracted knowledge entries into the tenant/workspace catalog scope. Agents inspect those entries through the built-in catalog toolkit.

The runtime no longer exposes a separate knowledge toolkit. Documents and knowledge are catalog entity kinds surfaced by get_catalog_entities, get_catalog_relations, and search_catalog when catalog_search is configured.

Configure storage

Knowledge ingestion always requires kv. If catalog is omitted, ingestion is KV-only; a runtime that has its own catalog configured can still hydrate entries that another process projected into that catalog. If catalog is present during ingestion, it must be writable: use sqlite or postgres, not inline or empty.

{
  "tenant_id": "acme",
  "workspace_id": "analytics",
  "kv": {
    "kind": "sqlite",
    "url": "sqlite:.data/flowai-kv.db",
    "ensure_schema": true
  },
  "catalog": {
    "kind": "sqlite",
    "url": "sqlite:.data/flowai-catalog.db",
    "ensure_schema": true
  },
  "catalog_search": {
    "index_path": ".data/catalog-index",
    "rebuild_on_start": true,
    "write_through": true
  },
  "target_database": {
    "kind": "postgres",
    "url_env": "ACME_WAREHOUSE_URL",
    "schema": "public"
  }
}

All storage keys are written in snake_case here to match the Python examples; camelCase aliases (catalogSearch, indexPath, …) are also accepted. In target_database, url_env names the environment variable the connection URL is read from at startup, so the credentialed URL itself stays out of the config file.
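The storage rules above can be checked before an ingest run. The following is a minimal sketch and not part of flowai-harness: check_ingest_config is a hypothetical helper that encodes only the rules stated in this section (kv is required; a catalog, if present, must be sqlite or postgres; the variable named by url_env must be set). It assumes the snake_case spelling used in the example and does not handle the camelCase aliases.

```python
import os

# Catalog kinds this guide describes as writable during ingestion.
WRITABLE_CATALOG_KINDS = {"sqlite", "postgres"}


def check_ingest_config(config, environ=None):
    """Return a list of problems that would block knowledge ingestion.

    Hypothetical helper: it applies only the rules stated in this guide
    and assumes snake_case keys.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # Ingestion always requires a KV store.
    if "kv" not in config:
        problems.append("kv is required for knowledge ingestion")

    # A catalog is optional, but if present it must be writable.
    catalog = config.get("catalog")
    if catalog is not None and catalog.get("kind") not in WRITABLE_CATALOG_KINDS:
        problems.append(
            f"catalog kind {catalog.get('kind')!r} is read-only during ingestion"
        )

    # url_env names an environment variable; the URL itself is not in the file.
    target = config.get("target_database") or {}
    url_env = target.get("url_env")
    if url_env and not environ.get(url_env):
        problems.append(f"environment variable {url_env} is not set")

    return problems


# Example: an inline catalog is rejected for ingestion.
example = {"kv": {"kind": "sqlite"}, "catalog": {"kind": "inline", "entries": []}}
print(check_ingest_config(example))
```

To check a real file, load data-environment.json with json.load and pass the resulting dict to the helper; an empty list means none of the rules above were violated.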

The catalog is scoped by tenant and workspace. Knowledge projection uses that scope when generating document and knowledge catalog ids, so two workspaces can share the same catalog backend without colliding.

Ingest documents

Preview the directory before running an ingest command. Ingestion writes durable state, and --extract-knowledge can call an LLM.

find ./knowledge -type f
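The preview can also be narrowed to the extensions you plan to pass with --ext. This sketch uses only the Python standard library; preview_files and has_wanted_suffix are hypothetical helpers, and the sketch assumes that ingestion walks the directory recursively and matches extensions case-insensitively, neither of which this guide confirms.

```python
from pathlib import Path


def has_wanted_suffix(name, extensions):
    """True if the file name ends with one of the extensions.

    Extensions are given without a leading dot, as in `--ext md`.
    Case-insensitive matching is an assumption.
    """
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    return Path(name).suffix.lower() in wanted


def preview_files(root, extensions):
    """List matching files under root, or an empty list if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*")  # recursive walk (assumption)
        if path.is_file() and has_wanted_suffix(path.name, extensions)
    )


# Print the files an `--ext md --ext txt` run would be expected to consider.
for path in preview_files("./knowledge", ["md", "txt"]):
    print(path)
```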

Document-only ingest stores DocumentItem payloads and content hashes in KV:

flowai-harness --data-environment data-environment.json --output ndjson \
  data knowledge ingest \
  --tenant-id acme \
  --workspace-id analytics \
  --database-id warehouse \
  --local-dir ./knowledge \
  --ext md \
  --ext txt
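Storing a content hash next to each document is what lets a later run recognize a file it has already seen. The guide does not say which hash function or key layout the harness uses, so the sketch below is illustrative only: it assumes SHA-256 over the raw file bytes and uses a plain dict in place of the KV store. The names content_hash and simulate_ingest are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of the file bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def simulate_ingest(files, store):
    """Store unseen content and count new versus duplicate files.

    `files` maps a path to its bytes; `store` is a dict standing in for KV.
    """
    counts = {"scanned": 0, "new": 0, "skippedDuplicate": 0}
    for path, data in files.items():
        counts["scanned"] += 1
        digest = content_hash(data)
        if digest in store:
            # Same bytes were stored earlier, so nothing is written.
            counts["skippedDuplicate"] += 1
            continue
        store[digest] = {"path": path, "size": len(data)}
        counts["new"] += 1
    return counts


store = {}
first = simulate_ingest({"knowledge/revenue.md": b"# Revenue\n"}, store)
second = simulate_ingest({"knowledge/revenue.md": b"# Revenue\n"}, store)
print(first)
print(second)
```

In this model the first run records the file as new and the second run, given identical bytes, records it as a skipped duplicate.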

With --extract-knowledge, the command also extracts KnowledgeItem payloads. It requires ANTHROPIC_API_KEY or --anthropic-api-key. Use FLOWAI_KNOWLEDGE_ANTHROPIC_MODEL or --anthropic-model to select the extraction model.

flowai-harness --data-environment data-environment.json --output ndjson \
  data knowledge ingest \
  --tenant-id acme \
  --workspace-id analytics \
  --database-id warehouse \
  --local-dir ./knowledge \
  --ext md \
  --extract-knowledge

When a writable catalog is configured, completion means both KV persistence and catalog projection succeeded. If catalog projection fails, the command emits an error instead of a completed event.

Verify the ingest

With --output ndjson, a successful run ends with a completed event that accounts for every scanned file:

{"type":"completed","scanned":1,"new":1,"skippedDuplicate":0,"errors":[]}
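In a script, the same check can be automated by reading the NDJSON stream line by line. The sketch below relies only on the completed event shape shown above. It treats a stream with no completed event as a failure and makes no assumption about what error events look like. The helper names find_completed and ingest_succeeded are hypothetical.

```python
import json


def find_completed(lines):
    """Return the last event whose type is "completed", or None."""
    completed = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        event = json.loads(line)
        if isinstance(event, dict) and event.get("type") == "completed":
            completed = event
    return completed


def ingest_succeeded(lines):
    """True only if a completed event exists and its errors list is empty."""
    event = find_completed(lines)
    if event is None:
        return False
    return not event.get("errors")


sample = ['{"type":"completed","scanned":1,"new":1,"skippedDuplicate":0,"errors":[]}']
print(ingest_succeeded(sample))
```

To use it on a real run, pipe the command's output into a script that passes sys.stdin to ingest_succeeded and exits non-zero when it returns False.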

To confirm the catalog projection, export the same catalog scope and check the summary for a document entry:

flowai-harness --data-environment data-environment.json \
  data catalog export \
  --tenant-id acme \
  --workspace-id analytics \
  --out catalog.entries.json

output_path: catalog.entries.json
entries_written: 1
  document: 1

Agents see the same entry through the catalog toolkit: search_catalog for discovery, or get_catalog_entities once ids are known.

Attach the toolkit

Add the catalog toolkit to any specialist that should retrieve workspace documents or extracted knowledge entries from the catalog:

from flowai_harness import (
    create_runtime,
    define_runtime,
    define_specialist,
    define_tenant,
)

tenant = define_tenant("acme", "v1")
knowledge_reader = define_specialist(
    "knowledge_reader",
    model="claude-sonnet-4-6",
    prompt="Use workspace knowledge before answering data-policy questions.",
    toolkits=["catalog"],
)

runtime = create_runtime(
    define_runtime(
        tenant=tenant,
        agents=[knowledge_reader],
        providers={"anthropic": {"apiKeyEnv": "ANTHROPIC_API_KEY"}},
    ),
    data_environment={
        "tenant_id": "acme",
        "workspace_id": "analytics",
        "kv": {"kind": "sqlite", "url": "sqlite:.data/flowai-kv.db"},
        "catalog": {"kind": "sqlite", "url": "sqlite:.data/flowai-catalog.db"},
        "catalog_search": {
            "index_path": ".data/catalog-index",
            "rebuild_on_start": True,
            "write_through": True,
        },
    },
)

Inline catalogs are useful for tests and examples. They can drive catalog hydration, graph tools, and search_catalog when paired with catalog_search, but they cannot receive ingestion output:

runtime = create_runtime(
    define_runtime(
        tenant=tenant,
        agents=[knowledge_reader],
        providers={"anthropic": {"apiKey": "unused"}},
    ),
    interpreter="scripted",
    data_environment={
        "kv": {"kind": "memory"},
        "catalog": {
            "kind": "inline",
            "entries": [
                {
                    "id": "document:revenue-guide",
                    "itemType": "document",
                    "name": "Revenue Guide",
                    "qualified_name": None,
                    "content": "Catalog preview for revenue guidance.",
                    "tags": ["[TYPE:document]"],
                    "related": [],
                    "metadata": {
                        "sourceDocumentId": "doc-1",
                        "extractionStatus": "processed"
                    },
                }
            ],
        },
        "catalog_search": {
            "index_path": ".data/catalog-index",
            "rebuild_on_start": True,
        },
    },
)

Toolkit tools

Use the built-in catalog toolkit for document and knowledge entries:

Tool | Purpose
get_catalog_entities | Hydrate known document or knowledge catalog ids and return typed details.
get_catalog_relations | Traverse document/knowledge relations, including extracted-from and applies-to edges.
search_catalog | Search documents and knowledge through the configured catalog search index.

Example input for known ids:

{
  "refs": [
    { "id": "document:revenue-guide" },
    { "id": "knowledge:revenue-rule" }
  ]
}

The response includes catalog entities. Full document bodies remain in the ingestion KV store; the public catalog toolkit returns the catalog projection and typed metadata, not the old KV-hydrated document-content envelope.

{
  "entities": [
    {
      "id": "document:revenue-guide",
      "kind": "document",
      "name": "Revenue Guide"
    }
  ],
  "missing": [],
  "warnings": []
}
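When an agent or test harness works with these payloads, it helps to build the input from a list of ids and to confirm which ids came back. The sketch below uses only the request and response fields shown above. Because this guide does not show what entries in missing look like, it derives the not-found ids by comparing the request against entities instead of parsing missing. The helper names build_refs and summarize_response are hypothetical.

```python
def build_refs(ids):
    """Build the get_catalog_entities input for a list of known catalog ids."""
    return {"refs": [{"id": catalog_id} for catalog_id in ids]}


def summarize_response(requested_ids, response):
    """Split requested ids into found and not found, keeping request order."""
    found = {entity["id"] for entity in response.get("entities", [])}
    return {
        "found": [i for i in requested_ids if i in found],
        "not_found": [i for i in requested_ids if i not in found],
        "warnings": list(response.get("warnings", [])),
    }


ids = ["document:revenue-guide", "knowledge:revenue-rule"]
response = {
    "entities": [
        {"id": "document:revenue-guide", "kind": "document", "name": "Revenue Guide"}
    ],
    "missing": [],
    "warnings": [],
}
print(build_refs(ids))
print(summarize_response(ids, response))
```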

Common errors

Symptom | Fix
--database-id <DATABASE_ID> is missing | Pass the catalog database id used when profiling the target schema, for example --database-id warehouse.
knowledge ingestion database_id must not be blank | Pass a non-empty --database-id; there is no default fallback because schema links must target a known catalog database.
knowledge catalog projection found ... missing scope targets | Profile the target schema first, or use the same --database-id used by profiling.
knowledge ingestion requires data_environment.kv | Add kv to the data environment.
kind=inline is read-only or kind=empty is read-only during ingestion | Use a writable catalog backend or omit catalog for KV-only ingestion.
Toolkit returns a DataCatalog missing-dependency error | Attach data_environment["catalog"] to the runtime or MCP server.
create_runtime reports that catalog search is not configured | Add catalog_search.index_path to the data environment for agents that select the catalog toolkit. Use rebuild_on_start = true or run flowai-harness data catalog index rebuild before the first search request.
ANTHROPIC_API_KEY is required when --extract-knowledge is enabled | Load .env, set ANTHROPIC_API_KEY, or pass --anthropic-api-key.