Agent (cersei-agent)
Agent builder, agentic loop, streaming events, auto-compact, sub-agents, coordinator mode.
The high-level Agent API. Builder pattern for configuration, an agentic loop that handles tool dispatch and multi-turn conversations, a 26-variant event system for observation, and automatic context management.
Most users interact with Cersei through this crate. The cersei facade re-exports Agent, AgentBuilder, AgentOutput, AgentEvent, and AgentStream directly.
Agent Builder
Every agent starts with a builder. The only required field is .provider() — everything else has sensible defaults.
```rust
let agent = Agent::builder()
    .provider(Anthropic::from_env()?)
    .tools(cersei::tools::coding())
    .system_prompt("You are a coding assistant.")
    .model("claude-sonnet-4-6")
    .max_turns(10)
    .max_tokens(16384)
    .permission_policy(AllowAll)
    .build()?;
```
Builder Methods
.build() returns Result<Agent>. It can fail if the provider requires authentication that isn't configured.
Execution Modes
One-Shot (shorthand)
The simplest way to run an agent — builds, executes, and returns in one call:
```rust
let output = Agent::builder()
    .provider(Anthropic::from_env()?)
    .tools(cersei::tools::coding())
    .run_with("Fix the failing tests")
    .await?;

println!("{}", output.text());
println!("Turns: {}, Tool calls: {}", output.turns, output.tool_calls.len());
```
Blocking (reusable agent)
Build once, run multiple times. The agent maintains conversation history across calls:
```rust
let agent = Agent::builder()
    .provider(Anthropic::from_env()?)
    .tools(cersei::tools::coding())
    .build()?;

// First turn
let output1 = agent.run("What files are in src/?").await?;

// Second turn — the agent remembers the first
let output2 = agent.run("Now fix the bug in main.rs").await?;
```
Streaming (events in real-time)
Returns an AgentStream that yields events as they happen. Supports bidirectional control — you can respond to permission requests, inject messages, or cancel mid-stream.
```rust
let mut stream = agent.run_stream("Deploy the application");

while let Some(event) = stream.next().await {
    match event {
        AgentEvent::TextDelta(t) => print!("{t}"),
        AgentEvent::ThinkingDelta(_) => { /* thinking content */ }
        AgentEvent::ToolStart { name, .. } => {
            eprintln!("\n[Tool: {name}]");
        }
        AgentEvent::ToolEnd { name, duration, is_error, .. } => {
            let status = if is_error { "FAIL" } else { "OK" };
            eprintln!("[{name}: {status} in {}ms]", duration.as_millis());
        }
        AgentEvent::PermissionRequired(req) => {
            // Interactive approval
            stream.respond_permission(req.id, PermissionDecision::Allow);
        }
        AgentEvent::Complete(output) => {
            eprintln!("\nDone: {} turns", output.turns);
            break;
        }
        AgentEvent::Error(msg) => {
            eprintln!("Error: {msg}");
            break;
        }
        _ => {}
    }
}
```
AgentOutput
Returned by run(), run_with(), and AgentStream::collect().
```rust
pub struct AgentOutput {
    pub message: Message,
    pub usage: Usage,
    pub stop_reason: StopReason,
    pub turns: u32,
    pub tool_calls: Vec<ToolCallRecord>,
}
```
```rust
// Access the text response
let text = output.text(); // -> &str

// Inspect tool calls
for call in &output.tool_calls {
    println!("{}: {}ms (error: {})", call.name, call.duration.as_millis(), call.is_error);
}
```
AgentEvent
26 variants covering every observable moment in the agentic loop:
Content Events
| Variant | Fields | Emitted When |
|---|---|---|
| `TextDelta(String)` | text chunk | Model streams a text token |
| `ThinkingDelta(String)` | thinking chunk | Model streams a thinking token (extended thinking) |
Tool Events
| Variant | Fields | Emitted When |
|---|---|---|
| `ToolStart { name, id, input }` | tool name, call ID, JSON input | Tool dispatch begins |
| `ToolEnd { name, id, result, is_error, duration }` | tool name, call ID, result text, error flag, wall-clock time | Tool execution completes |
Lifecycle Events
| Variant | Fields | Emitted When |
|---|---|---|
| `TurnComplete { turn, usage }` | turn number, token usage | One model call + tool cycle finishes |
| `TokenWarning { pct_used, state }` | context % used, warning/critical | Context window approaching limit |
| `CompactStart { reason }` | threshold/manual/overflow | Context compaction begins |
| `CompactEnd { messages_after, tokens_freed }` | remaining messages, tokens reclaimed | Compaction finishes |
| `SessionLoaded { session_id, message_count }` | session ID, number of messages | Session resumed from memory |
Control Events
| Variant | Fields | Emitted When |
|---|---|---|
| `PermissionRequired(PermissionRequest)` | tool name, description, level | Tool needs approval (interactive policy) |
| `CostUpdate { turn_cost, cumulative_cost, input_tokens, output_tokens }` | costs and token counts | After each model call |
| `SubAgentSpawned { agent_id, prompt }` | sub-agent ID, task description | Sub-agent created |
| `SubAgentComplete { agent_id, result }` | sub-agent ID, output | Sub-agent finished |
Terminal Events
| Variant | Fields | Emitted When |
|---|---|---|
| `Status(String)` | status message | Informational update |
| `Error(String)` | error message | Unrecoverable error |
| `Complete(AgentOutput)` | final output | Agent loop finished successfully |
AgentStream
Bidirectional control channel. Receive events and send commands back.
Methods
Beyond polling with `.next().await`, the stream exposes `respond_permission()` to answer a `PermissionRequired` event, `collect()` to drain all events and return the final `AgentOutput`, plus methods for injecting messages and cancelling mid-stream.
Effort Levels
Control thinking depth and temperature via a single setting:
```rust
use cersei_agent::effort::EffortLevel;

let effort = EffortLevel::from_str("max");
let budget = effort.thinking_budget_tokens(); // 32768
let temp = effort.temperature(); // Some(1.0)
```
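Only the "max" level's numbers appear above. As a self-contained illustration of the mapping (this is not the crate's `EffortLevel`; the values for other levels are not documented here, so only "max" is modeled):

```rust
/// Illustrative stand-in for an effort level. Only the "max" values
/// (32768-token thinking budget, temperature 1.0) come from the docs above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Effort {
    Max,
}

fn parse_effort(s: &str) -> Option<Effort> {
    match s {
        "max" => Some(Effort::Max),
        _ => None, // other levels exist in the crate; values undocumented here
    }
}

fn thinking_budget_tokens(e: Effort) -> u32 {
    match e {
        Effort::Max => 32_768,
    }
}

fn temperature(e: Effort) -> Option<f64> {
    match e {
        Effort::Max => Some(1.0),
    }
}

fn main() {
    let e = parse_effort("max").unwrap();
    println!("budget={} temp={:?}", thinking_budget_tokens(e), temperature(e));
}
```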
Auto-Compact
When the conversation approaches the context window limit, the agent automatically summarizes older messages to free space. This happens transparently — the model continues working without interruption.
```rust
Agent::builder()
    .auto_compact(true)          // enable
    .compact_threshold(0.9)      // trigger at 90% of context window
    .tool_result_budget(50_000)  // also truncate oldest tool results above 50K chars
```
The compaction pipeline:
- Count tokens in the current conversation
- If above threshold, group old messages by topic
- Call the LLM to summarize each group
- Replace original messages with summaries
- Free tool results above the budget
Events emitted: CompactStart, CompactEnd (with messages_after and tokens_freed).
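The trigger check and the tool-result budget can be sketched as pure functions (hypothetical helpers for illustration, not the crate's implementation; the 0.9 threshold and 50K-character budget mirror the builder example above):

```rust
/// Trigger compaction when token usage reaches `threshold` of the
/// context window (the `compact_threshold` setting). Hypothetical helper.
fn should_compact(tokens_used: u64, context_window: u64, threshold: f64) -> bool {
    context_window > 0 && (tokens_used as f64) / (context_window as f64) >= threshold
}

/// Clip a tool result to the character budget, as the final pipeline
/// step ("free tool results above the budget") would. Hypothetical helper.
fn truncate_tool_result(result: &str, budget: usize) -> String {
    if result.chars().count() <= budget {
        result.to_string()
    } else {
        let kept: String = result.chars().take(budget).collect();
        format!("{kept}… [truncated]")
    }
}

fn main() {
    // 185K of a 200K window is 92.5%, above a 0.9 threshold.
    assert!(should_compact(185_000, 200_000, 0.9));
    assert!(!should_compact(100_000, 200_000, 0.9));
    println!("{}", truncate_tool_result("abcdef", 4));
}
```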
System Prompt Caching
The system prompt is split into two sections separated by __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__:
```
[Static section — cached by the provider]
__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__
[Dynamic section — rebuilt each turn]
```

```rust
Agent::builder()
    .system_prompt("You are a coding assistant. Always use Rust.") // static, cached
    .append_system_prompt("Current time: 2024-01-01")              // dynamic, per-turn
```
Anthropic's prompt caching caches everything before the boundary. This saves tokens and latency on multi-turn conversations.