Knowledge and Documents
Flow AI supports workspace-local knowledge documents in two phases:
- `flowai-harness data knowledge ingest` imports local files into the configured KV store and optionally extracts structured knowledge items.
- When a writable catalog is configured, ingestion projects document and extracted knowledge entries into the tenant/workspace catalog scope. Agents inspect those entries through the built-in catalog toolkit.
The runtime no longer exposes a separate knowledge toolkit. Documents and
knowledge are catalog entity kinds surfaced by get_catalog_entities,
get_catalog_relations, and search_catalog when catalog_search is
configured.
Configure storage
Knowledge ingestion always requires kv. If catalog is omitted, ingestion is
KV-only and the runtime can still hydrate entries that another process projected
into the catalog. If catalog is present during ingestion, it must be writable:
use sqlite or postgres, not inline or empty.

```json
{
  "tenant_id": "acme",
  "workspace_id": "analytics",
  "kv": {
    "kind": "sqlite",
    "url": "sqlite:.data/flowai-kv.db",
    "ensure_schema": true
  },
  "catalog": {
    "kind": "sqlite",
    "url": "sqlite:.data/flowai-catalog.db",
    "ensure_schema": true
  },
  "catalog_search": {
    "index_path": ".data/catalog-index",
    "rebuild_on_start": true,
    "write_through": true
  },
  "target_database": {
    "kind": "postgres",
    "url_env": "ACME_WAREHOUSE_URL",
    "schema": "public"
  }
}
```

All storage keys are written in snake_case here to match the Python examples;
camelCase aliases (catalogSearch, indexPath, …) are also accepted. In
target_database, url_env names the environment variable the connection URL
is read from at startup, so the credentialed URL itself stays out of the config
file.
The catalog is scoped by tenant and workspace. Knowledge projection uses that scope when generating document and knowledge catalog ids, so two workspaces can share the same catalog backend without colliding.
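
The storage rules in this section can be checked before an ingest run. The following is a minimal sketch, not part of flowai-harness: `check_ingest_environment` and `WRITABLE_CATALOG_KINDS` are hypothetical names, and the check reads snake_case keys only, so it does not handle the camelCase aliases.

```python
# Hypothetical preflight helper; it only encodes the rules stated in this section.
WRITABLE_CATALOG_KINDS = {"sqlite", "postgres"}


def check_ingest_environment(env: dict) -> list[str]:
    """Return a list of problems that would block knowledge ingestion."""
    problems = []
    # Ingestion always needs a KV store.
    if not env.get("kv"):
        problems.append("kv is required for knowledge ingestion")
    # A catalog is optional, but when present it has to be writable.
    catalog = env.get("catalog")
    if catalog is not None:
        kind = catalog.get("kind")
        if kind not in WRITABLE_CATALOG_KINDS:
            problems.append(
                f"catalog kind {kind!r} is not writable; use sqlite or postgres"
            )
    return problems


kv_only = {"kv": {"kind": "sqlite", "url": "sqlite:.data/flowai-kv.db"}}
inline = {**kv_only, "catalog": {"kind": "inline", "entries": []}}

print(check_ingest_environment(kv_only))  # empty list: KV-only ingestion is allowed
print(check_ingest_environment(inline))   # one problem: the inline catalog is read-only
```

Returning a list instead of raising on the first problem lets a caller report every misconfiguration in one pass.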
Ingest documents
Preview the directory before running an ingest command. Ingestion writes durable
state, and --extract-knowledge can call an LLM.

```shell
find ./knowledge -type f
```

Document-only ingest stores DocumentItem payloads and content hashes in KV:

```shell
flowai-harness --data-environment data-environment.json --output ndjson \
  data knowledge ingest \
  --tenant-id acme \
  --workspace-id analytics \
  --database-id warehouse \
  --local-dir ./knowledge \
  --ext md \
  --ext txt
```

With --extract-knowledge, the command also extracts KnowledgeItem payloads.
It requires ANTHROPIC_API_KEY or --anthropic-api-key. Use
FLOWAI_KNOWLEDGE_ANTHROPIC_MODEL or --anthropic-model to select the
extraction model.

```shell
flowai-harness --data-environment data-environment.json --output ndjson \
  data knowledge ingest \
  --tenant-id acme \
  --workspace-id analytics \
  --database-id warehouse \
  --local-dir ./knowledge \
  --ext md \
  --extract-knowledge
```

When a writable catalog is configured, completion means both KV persistence and catalog projection succeeded. If catalog projection fails, the command emits an error instead of a completed event.
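
For scripted runs, the preview step and the ingest command from this section can be wrapped in small helpers. This is a sketch under stated assumptions: `preview_files` and `build_ingest_argv` are hypothetical names, the code assumes `--ext` matches a file suffix without the leading dot and compares suffixes case-insensitively, and it only assembles the argument list without executing anything.

```python
from pathlib import Path


def preview_files(local_dir: str, exts: list[str]) -> list[Path]:
    """List files under local_dir whose suffix matches one of exts."""
    # Assumption: "--ext md" corresponds to files ending in ".md".
    wanted = {"." + e.lower().lstrip(".") for e in exts}
    root = Path(local_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )


def build_ingest_argv(local_dir, exts, *, tenant_id, workspace_id, database_id,
                      data_environment="data-environment.json",
                      extract_knowledge=False):
    """Assemble the documented ingest command as an argv list."""
    # The CLI rejects a blank database id, so fail early here as well.
    if not database_id.strip():
        raise ValueError("database_id must not be blank")
    argv = [
        "flowai-harness", "--data-environment", data_environment,
        "--output", "ndjson",
        "data", "knowledge", "ingest",
        "--tenant-id", tenant_id,
        "--workspace-id", workspace_id,
        "--database-id", database_id,
        "--local-dir", local_dir,
    ]
    for ext in exts:
        argv += ["--ext", ext]
    if extract_knowledge:
        argv.append("--extract-knowledge")
    return argv


for path in preview_files("./knowledge", ["md", "txt"]):
    print(path)

argv = build_ingest_argv("./knowledge", ["md", "txt"], tenant_id="acme",
                         workspace_id="analytics", database_id="warehouse")
print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once the preview looks right. Keeping it as a list avoids shell quoting problems with paths that contain spaces.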
Verify the ingest
With --output ndjson, a successful run ends with a completed event that
accounts for every scanned file:

```json
{"type":"completed","scanned":1,"new":1,"skippedDuplicate":0,"errors":[]}
```

To confirm the catalog projection, export the same catalog scope and check the
summary for a document entry:

```shell
flowai-harness --data-environment data-environment.json \
  data catalog export \
  --tenant-id acme \
  --workspace-id analytics \
  --out catalog.entries.json
```

```text
output_path: catalog.entries.json
entries_written: 1
document: 1
```

Agents see the same entry through the catalog toolkit: search_catalog for
discovery, or get_catalog_entities once ids are known.
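
The check on the final NDJSON event can be automated. The sketch below relies only on the event fields shown in this section; `final_event` and `ingest_succeeded` are hypothetical helper names. Reading "accounts for every scanned file" as `scanned == new + skippedDuplicate` for an error-free run is an assumption, not documented behavior.

```python
import json


def final_event(ndjson_text: str) -> dict:
    """Parse NDJSON output and return the last event."""
    events = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    if not events:
        raise ValueError("no events in output")
    return events[-1]


def ingest_succeeded(event: dict) -> bool:
    """True when the run ended with a completed event and no per-file errors."""
    # A failed catalog projection ends with an error event, not a completed one.
    if event.get("type") != "completed":
        return False
    if event.get("errors"):
        return False
    # Assumption: with no errors, every scanned file is either new or a duplicate.
    return event.get("scanned") == event.get("new", 0) + event.get("skippedDuplicate", 0)


output = '{"type":"completed","scanned":1,"new":1,"skippedDuplicate":0,"errors":[]}\n'
print(ingest_succeeded(final_event(output)))  # → True
```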
Attach the toolkit
Add catalog to any specialist that should retrieve workspace documents or
extracted knowledge catalog entries:

```python
from flowai_harness import (
    create_runtime,
    define_runtime,
    define_specialist,
    define_tenant,
)

tenant = define_tenant("acme", "v1")

knowledge_reader = define_specialist(
    "knowledge_reader",
    model="claude-sonnet-4-6",
    prompt="Use workspace knowledge before answering data-policy questions.",
    toolkits=["catalog"],
)

runtime = create_runtime(
    define_runtime(
        tenant=tenant,
        agents=[knowledge_reader],
        providers={"anthropic": {"apiKeyEnv": "ANTHROPIC_API_KEY"}},
    ),
    data_environment={
        "tenant_id": "acme",
        "workspace_id": "analytics",
        "kv": {"kind": "sqlite", "url": "sqlite:.data/flowai-kv.db"},
        "catalog": {"kind": "sqlite", "url": "sqlite:.data/flowai-catalog.db"},
        "catalog_search": {
            "index_path": ".data/catalog-index",
            "rebuild_on_start": True,
            "write_through": True,
        },
    },
)
```

Inline catalogs are useful for tests and examples. They can drive catalog
hydration, graph tools, and search_catalog when paired with
catalog_search, but they cannot receive ingestion output:

```python
# Reuses `tenant` and `knowledge_reader` from the previous example.
runtime = create_runtime(
    define_runtime(
        tenant=tenant,
        agents=[knowledge_reader],
        providers={"anthropic": {"apiKey": "unused"}},
    ),
    interpreter="scripted",
    data_environment={
        "kv": {"kind": "memory"},
        "catalog": {
            "kind": "inline",
            "entries": [
                {
                    "id": "document:revenue-guide",
                    "itemType": "document",
                    "name": "Revenue Guide",
                    "qualified_name": None,
                    "content": "Catalog preview for revenue guidance.",
                    "tags": ["[TYPE:document]"],
                    "related": [],
                    "metadata": {
                        "sourceDocumentId": "doc-1",
                        "extractionStatus": "processed",
                    },
                }
            ],
        },
        "catalog_search": {
            "index_path": ".data/catalog-index",
            "rebuild_on_start": True,
        },
    },
)
```

Toolkit tools
Use the built-in catalog toolkit for document and knowledge entries:

| Tool | Purpose |
|---|---|
| get_catalog_entities | Hydrate known document or knowledge catalog ids and return typed details. |
| get_catalog_relations | Traverse document/knowledge relations, including extracted-from and applies-to edges. |
| search_catalog | Search documents and knowledge through the configured catalog search index. |

Example input for known ids:

```json
{
  "refs": [
    { "id": "document:revenue-guide" },
    { "id": "knowledge:revenue-rule" }
  ]
}
```

The response includes catalog entities. Full document bodies remain in the ingestion KV store; the public catalog toolkit returns the catalog projection and typed metadata, not the old KV-hydrated document-content envelope.

```json
{
  "entities": [
    {
      "id": "document:revenue-guide",
      "kind": "document",
      "name": "Revenue Guide"
    }
  ],
  "missing": [],
  "warnings": []
}
```

Common errors

| Symptom | Fix |
|---|---|
| --database-id <DATABASE_ID> is missing | Pass the catalog database id used when profiling the target schema, for example --database-id warehouse. |
| knowledge ingestion database_id must not be blank | Pass a non-empty --database-id; there is no default fallback because schema links must target a known catalog database. |
| knowledge catalog projection found ... missing scope targets | Profile the target schema first, or use the same --database-id used by profiling. |
| knowledge ingestion requires data_environment.kv | Add kv to the data environment. |
| kind=inline is read-only or kind=empty is read-only during ingestion | Use a writable catalog backend or omit catalog for KV-only ingestion. |
| Toolkit returns a DataCatalog missing-dependency error | Attach data_environment["catalog"] to the runtime or MCP server. |
| create_runtime reports that catalog search is not configured | Add catalog_search.index_path to the data environment for agents that select the catalog toolkit. Use rebuild_on_start = true or run flowai-harness data catalog index rebuild before the first search request. |
| ANTHROPIC_API_KEY is required when --extract-knowledge is enabled | Load .env, set ANTHROPIC_API_KEY, or pass --anthropic-api-key. |
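
The two runtime-side rows of the table (missing catalog, catalog search not configured) can be caught before `create_runtime` is called. The following is a rough sketch, not a flowai-harness API: `check_catalog_toolkit_environment` is a hypothetical helper, it inspects snake_case keys only, and the hint strings are paraphrases of the fixes above.

```python
def check_catalog_toolkit_environment(env: dict, toolkits: list[str]) -> list[str]:
    """Return configuration hints for a specialist that selects the catalog toolkit."""
    hints: list[str] = []
    if "catalog" not in toolkits:
        return hints  # nothing to check for specialists without the toolkit
    # The toolkit needs a catalog attached to the data environment.
    if not env.get("catalog"):
        hints.append('attach data_environment["catalog"]')
    # search_catalog needs an index path, plus an index built before first use.
    search = env.get("catalog_search") or {}
    if not search.get("index_path"):
        hints.append("add catalog_search.index_path")
    elif not search.get("rebuild_on_start"):
        hints.append(
            "run `flowai-harness data catalog index rebuild` before the first search"
        )
    return hints


env = {
    "kv": {"kind": "memory"},
    "catalog": {"kind": "inline", "entries": []},
}
for hint in check_catalog_toolkit_environment(env, ["catalog"]):
    print(hint)
```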
