Debugging System Prompts
This guide helps you find out why an agent's system prompt is not what you expect at the model call: the text is present in Python but missing, stale, or different by the time the provider sees it.
When to use this guide
Use this guide when prompt text looks right in layered_prompt(...) but model
behavior suggests the agent received something else, or when you need to work
out which stage of the pipeline dropped or replaced it. For everyday prompt
authoring, start with layered_prompt and the Prompts concept page.
Where the prompt comes from
The system prompt is rendered exactly once, in Python, and then copied
unchanged down the stack: layered_prompt(...) (or a plain str) becomes
AgentSpec.system_prompt, is serialized as systemPrompt in the runtime wire
JSON, crosses the PyO3 bridge into flowai_runtime::RuntimeSpec, is copied
into the framework AgentRegistration, embedded in a pure ChatProgram, and
finally handed to Rig as the provider preamble. No layer rewrites the text.
For the full walk through each stage, see How a prompt reaches the model.
Because the text is copied, not recomputed, a wrong prompt at the model means one of two things: the wrong text went in at the top, or you are looking at a different agent, runtime, or interpreter than you think.
Debugging checklist
Check the path in pipeline order. The first stage where the text is wrong is the stage to fix.
- Check the rendered text. Print or assert str(layered_prompt(...)) before passing it to define_*. Remember that empty sections are omitted and structured sections render as sorted JSON.
- Check the agent spec. Inspect agent.system_prompt on the Python AgentSpec. It should be the exact rendered text.
- Check the wire JSON. Inspect runtime_spec.model_dump(by_alias=True, mode="json") and confirm the agent entry has the expected systemPrompt.
- Check which spec the runtime got. Confirm create_runtime(...) is using the expected RuntimeSpec, not a stale copy built earlier in the process or in another module.
- Check which agent is being invoked. query(...) invokes the coordinator, while run_specialist(...) directly invokes a specialist. A correct prompt on the wrong agent looks identical to a wrong prompt.
- Check the interpreter. Confirm the active interpreter is anthropic or another real provider when debugging LLM behavior. The default noop interpreter, the deterministic testing interpreter (testing={"mock_response": ...}), and interpreter="scripted" are test paths and may echo or bypass normal model behavior.
- Check tool wiring separately. If the issue involves tool usage, inspect runtime tool bindings separately from the prompt text. Tool executability is dispatcher wiring, not Markdown in the prompt.
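The first three checks can be sketched as a small comparison helper. This is an illustrative sketch, not part of the harness: find_system_prompts and first_divergence are hypothetical names, and the wire_json layout below is a made-up stand-in. In a real session the inputs would be str(layered_prompt(...)), agent.system_prompt, and the result of runtime_spec.model_dump(by_alias=True, mode="json").

```python
from typing import Any, Iterator, Optional, Sequence, Tuple


def find_system_prompts(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every "systemPrompt" string in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "systemPrompt" and isinstance(value, str):
                yield child, value
            else:
                yield from find_system_prompts(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_system_prompts(item, f"{path}[{index}]")


def first_divergence(expected: str, stages: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Return the name of the first stage whose text differs from expected, else None."""
    for name, text in stages:
        if text != expected:
            return name
    return None


# Stand-in data: the real values would come from the checklist steps above.
rendered = "You are a billing assistant."
wire_json = {  # hypothetical layout, for illustration only
    "agents": [
        {"name": "coordinator", "systemPrompt": "You are a billing assistant."},
        {"name": "specialist", "systemPrompt": "You are a stale prompt."},
    ]
}

wire_prompts = list(find_system_prompts(wire_json))
stages = [("rendered text", rendered)]
stages += [(f"wire JSON at {path}", text) for path, text in wire_prompts]

print(first_divergence(rendered, stages))
# prints: wire JSON at $.agents[1].systemPrompt
```

Searching for the systemPrompt key recursively avoids depending on the exact shape of the dumped spec, which this guide does not describe. Listing the stages in pipeline order means the returned name is the stage to fix.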
Common causes
- Tool in the prompt but not executable. layered_prompt(tools=[...]) only renders descriptive text. Executable tools are registered through agent specs, tool bindings, runtime toolkits, and dispatcher wiring. See Tool descriptions versus executable tools.
- Expecting the cache key downstream. prompt_cache_key stays on the Python AgentSpec; it is excluded from the native wire shape. The Rust runtime only ever sees the prompt text.
- Expecting history to carry the prompt. Stored conversation memory for stateful agents never contains system messages. The system prompt always comes from the agent registration on every invocation.
- Two dicts, one cache key. Structured sections serialize with sorted keys, so dicts that differ only in ordering produce identical text and the same cache key. That is by design, not a stale prompt.
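The "two dicts, one cache key" case can be reproduced with the standard library alone. This is a minimal sketch under stated assumptions: render_section and cache_key are hypothetical stand-ins, json.dumps(sort_keys=True) only approximates the sorted-JSON rendering described above, and hashing the text with SHA-256 is an assumed way of deriving a key, not the harness's documented method.

```python
import hashlib
import json


def render_section(data: dict) -> str:
    """Serialize a structured section with sorted keys (stand-in for the real renderer)."""
    return json.dumps(data, sort_keys=True)


def cache_key(text: str) -> str:
    """Hypothetical cache key: SHA-256 hex digest of the rendered text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same content, different insertion order.
a = {"tone": "formal", "audience": "engineers"}
b = {"audience": "engineers", "tone": "formal"}

text_a, text_b = render_section(a), render_section(b)

print(text_a)                                   # {"audience": "engineers", "tone": "formal"}
print(text_a == text_b)                         # True
print(cache_key(text_a) == cache_key(text_b))   # True
```

If two prompts you expected to differ produce the same text here, the difference was only in key order, so an unchanged cache key is the expected result.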
See also
- How a prompt reaches the model: the full pipeline explanation.
- layered_prompt reference
- Testing: the deterministic testing and scripted interpreter paths.
