Debugging System Prompts

This guide helps you find out why an agent's system prompt is not what you expect at the model call: the text is present in Python but missing, stale, or different by the time the provider sees it.

When to use this guide

Use this guide when prompt text looks right in layered_prompt(...) but model behavior suggests the agent received something else, or when you need to work out which stage of the pipeline dropped or replaced it. For everyday prompt authoring, start with layered_prompt and the Prompts concept page.

Where the prompt comes from

The system prompt is rendered exactly once, in Python, and then copied unchanged down the stack: layered_prompt(...) (or a plain str) becomes AgentSpec.system_prompt, is serialized as systemPrompt in the runtime wire JSON, crosses the PyO3 bridge into flowai_runtime::RuntimeSpec, is copied into the framework AgentRegistration, embedded in a pure ChatProgram, and finally handed to Rig as the provider preamble. No layer rewrites the text.

For the full walk through each stage, see How a prompt reaches the model.

Because the text is copied, not recomputed, a wrong prompt at the model means one of two things: the wrong text went in at the top, or you are looking at a different agent, runtime, or interpreter than you think.
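One way to test both possibilities at once is to list every systemPrompt in the dumped wire JSON together with where it sits, so you can see which agent carries which text. The sketch below uses only the standard library. find_system_prompts is a hypothetical helper, not part of the library, and the sample dict is an invented shape, because the real wire layout is not shown in this guide. Only the systemPrompt key name comes from the text above.

```python
from typing import Any, Iterator


def find_system_prompts(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every "systemPrompt" string in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "systemPrompt" and isinstance(value, str):
                yield child, value
            else:
                # Keep walking: agent entries may be nested at any depth.
                yield from find_system_prompts(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_system_prompts(item, f"{path}[{index}]")


# Invented sample (assumption). In practice, pass the dict returned by
# runtime_spec.model_dump(by_alias=True, mode="json").
sample = {
    "agents": [
        {"name": "coordinator", "systemPrompt": "You route requests."},
        {"name": "specialist", "systemPrompt": "You answer billing questions."},
    ]
}

found = list(find_system_prompts(sample))
for where, text in found:
    print(f"{where}: {text!r}")
```

If the text you expect appears under a different agent than the one you invoke, the prompt is correct and the target is wrong.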

Debugging checklist

Check the path in pipeline order. The first stage where the text is wrong is the stage to fix.

  1. Check the rendered text. Print or assert str(layered_prompt(...)) before passing it to define_*. Remember that empty sections are omitted and structured sections render as sorted JSON.
  2. Check the agent spec. Inspect agent.system_prompt on the Python AgentSpec. It should be the exact rendered text.
  3. Check the wire JSON. Inspect runtime_spec.model_dump(by_alias=True, mode="json") and confirm the agent entry has the expected systemPrompt.
  4. Check which spec the runtime got. Confirm create_runtime(...) is using the expected RuntimeSpec, not a stale copy built earlier in the process or in another module.
  5. Check which agent is being invoked. query(...) invokes the coordinator, while run_specialist(...) directly invokes a specialist. A correct prompt on the wrong agent looks identical to a wrong prompt.
  6. Check the interpreter. Confirm the active interpreter is anthropic or another real provider when debugging LLM behavior. The default noop interpreter, the deterministic testing interpreter (testing={"mock_response": ...}), and interpreter="scripted" are test paths and may echo or bypass normal model behavior.
  7. Check tool wiring separately. If the issue involves tool usage, inspect runtime tool bindings separately from the prompt text. Whether a tool can execute depends on dispatcher wiring, not on the Markdown that describes it in the prompt.
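The first three checks can be sketched as one comparison, assuming you have already captured the prompt text at each stage yourself. first_divergent_stage is a hypothetical helper, and the stage names and strings are invented for illustration; nothing here calls the library.

```python
import difflib
from typing import Optional


def first_divergent_stage(
    expected: str, stages: list[tuple[str, Optional[str]]]
) -> Optional[tuple[str, list[str]]]:
    """Return (stage name, diff lines) for the first stage whose text differs."""
    for name, observed in stages:
        if observed == expected:
            continue
        # The diff is line-based, so it can be empty when the two texts
        # differ only by a trailing newline; the stage is still reported.
        diff = list(
            difflib.unified_diff(
                expected.splitlines(),
                (observed or "").splitlines(),
                fromfile="expected",
                tofile=name,
                lineterm="",
            )
        )
        return name, diff
    return None


expected = "You are a billing assistant.\nAnswer briefly."
# Stages in pipeline order; the values are invented sample captures.
captured = [
    ("rendered text", "You are a billing assistant.\nAnswer briefly."),
    ("agent spec", "You are a billing assistant.\nAnswer briefly."),
    ("wire JSON", "You are a billing assistant."),  # stale copy, one line short
]

result = first_divergent_stage(expected, captured)
if result is None:
    print("all captured stages match")
else:
    stage, diff = result
    print(f"first mismatch at: {stage}")
    print("\n".join(diff))
```

Comparing with exact equality rather than substring matching is deliberate: the text is supposed to be copied unchanged, so any difference, including whitespace, is a finding.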

Common causes

  • Tool in the prompt but not executable. layered_prompt(tools=[...]) only renders descriptive text. Executable tools are registered through agent specs, tool bindings, runtime toolkits, and dispatcher wiring. See Tool descriptions versus executable tools.
  • Expecting the cache key downstream. prompt_cache_key stays on the Python AgentSpec; it is excluded from the native wire shape. The Rust runtime only ever sees the prompt text.
  • Expecting history to carry the prompt. Stored conversation memory for stateful agents never contains system messages. The system prompt comes from the agent registration on every invocation.
  • Two dicts, one cache key. Structured sections serialize with sorted keys, so dicts that differ only in ordering produce identical text and the same cache key. That is by design, not a stale prompt.
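The sorted-key behavior in the last item can be illustrated with the standard library alone. This is a sketch of the general property, not the library's implementation: json.dumps(sort_keys=True) stands in for structured-section rendering, and a SHA-256 digest stands in for the cache key, whose real derivation is not described in this guide.

```python
import hashlib
import json

# Two dicts with the same content in a different insertion order.
a = {"tone": "concise", "audience": "support agents"}
b = {"audience": "support agents", "tone": "concise"}

# Sorted keys remove insertion order from the rendered text.
text_a = json.dumps(a, sort_keys=True)
text_b = json.dumps(b, sort_keys=True)


def digest(text: str) -> str:
    """Stand-in for a cache key (assumption): a hash of the rendered text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(text_a == text_b)                  # True
print(digest(text_a) == digest(text_b))  # True
print(json.dumps(a) == json.dumps(b))    # False: unsorted output keeps order
```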

See also