
Agent (cersei-agent)

Agent builder, agentic loop, streaming events, auto-compact, sub-agents, coordinator mode.

cersei-agent

The high-level Agent API. Builder pattern for configuration, an agentic loop that handles tool dispatch and multi-turn conversations, a 26-variant event system for observation, and automatic context management.

Most users interact with Cersei through this crate. The cersei facade re-exports Agent, AgentBuilder, AgentOutput, AgentEvent, and AgentStream directly.


Agent Builder

Every agent starts with a builder. The only required field is .provider() — everything else has sensible defaults.

let agent = Agent::builder()
    .provider(Anthropic::from_env()?)
    .tools(cersei::tools::coding())
    .system_prompt("You are a coding assistant.")
    .model("claude-sonnet-4-6")
    .max_turns(10)
    .max_tokens(16384)
    .permission_policy(AllowAll)
    .build()?;

Builder Methods

.build() returns Result<Agent>. It can fail if the provider requires authentication that isn't configured.


Execution Modes

One-Shot (shorthand)

The simplest way to run an agent — builds, executes, and returns in one call:

let output = Agent::builder()
    .provider(Anthropic::from_env()?)
    .tools(cersei::tools::coding())
    .run_with("Fix the failing tests")
    .await?;

println!("{}", output.text());
println!("Turns: {}, Tool calls: {}", output.turns, output.tool_calls.len());

Blocking (reusable agent)

Build once, run multiple times. The agent maintains conversation history across calls:

let agent = Agent::builder()
    .provider(Anthropic::from_env()?)
    .tools(cersei::tools::coding())
    .build()?;

// First turn
let output1 = agent.run("What files are in src/?").await?;

// Second turn — the agent remembers the first
let output2 = agent.run("Now fix the bug in main.rs").await?;

Streaming (events in real-time)

run_stream() returns an AgentStream that yields events as they happen. The stream supports bidirectional control: you can respond to permission requests, inject messages, or cancel mid-stream.

let mut stream = agent.run_stream("Deploy the application");

while let Some(event) = stream.next().await {
    match event {
        AgentEvent::TextDelta(t) => print!("{t}"),
        AgentEvent::ThinkingDelta(t) => { /* thinking content */ }
        AgentEvent::ToolStart { name, input, .. } => {
            eprintln!("\n[Tool: {name}]");
        }
        AgentEvent::ToolEnd { name, duration, is_error, .. } => {
            let status = if is_error { "FAIL" } else { "OK" };
            eprintln!("[{name}: {status} in {}ms]", duration.as_millis());
        }
        AgentEvent::PermissionRequired(req) => {
            // Interactive approval
            stream.respond_permission(req.id, PermissionDecision::Allow);
        }
        AgentEvent::Complete(output) => {
            eprintln!("\nDone: {} turns", output.turns);
            break;
        }
        AgentEvent::Error(msg) => {
            eprintln!("Error: {msg}");
            break;
        }
        _ => {}
    }
}

AgentOutput

Returned by run(), run_with(), and AgentStream::collect().

pub struct AgentOutput {
    pub message: Message,
    pub usage: Usage,
    pub stop_reason: StopReason,
    pub turns: u32,
    pub tool_calls: Vec<ToolCallRecord>,
}

// Access the text response
let text = output.text();  // -> &str

// Inspect tool calls
for call in &output.tool_calls {
    println!("{}: {}ms (error: {})", call.name, call.duration.as_millis(), call.is_error);
}

AgentEvent

26 variants covering every observable moment in the agentic loop:

Content Events

Variant               | Fields         | Emitted When
----------------------|----------------|--------------------------------------------------
TextDelta(String)     | text chunk     | Model streams a text token
ThinkingDelta(String) | thinking chunk | Model streams a thinking token (extended thinking)

Tool Events

Variant                                          | Fields                                                       | Emitted When
-------------------------------------------------|--------------------------------------------------------------|--------------------------
ToolStart { name, id, input }                    | tool name, call ID, JSON input                               | Tool dispatch begins
ToolEnd { name, id, result, is_error, duration } | tool name, call ID, result text, error flag, wall-clock time | Tool execution completes

Lifecycle Events

Variant                                     | Fields                               | Emitted When
--------------------------------------------|--------------------------------------|--------------------------------------
TurnComplete { turn, usage }                | turn number, token usage             | One model call + tool cycle finishes
TokenWarning { pct_used, state }            | context % used, warning/critical     | Context window approaching limit
CompactStart { reason }                     | threshold/manual/overflow            | Context compaction begins
CompactEnd { messages_after, tokens_freed } | remaining messages, tokens reclaimed | Compaction finishes
SessionLoaded { session_id, message_count } | session ID, number of messages       | Session resumed from memory

Control Events

Variant                                                                | Fields                         | Emitted When
-----------------------------------------------------------------------|--------------------------------|------------------------------------------
PermissionRequired(PermissionRequest)                                  | tool name, description, level  | Tool needs approval (interactive policy)
CostUpdate { turn_cost, cumulative_cost, input_tokens, output_tokens } | costs and token counts         | After each model call
SubAgentSpawned { agent_id, prompt }                                   | sub-agent ID, task description | Sub-agent created
SubAgentComplete { agent_id, result }                                  | sub-agent ID, output           | Sub-agent finished

Terminal Events

Variant               | Fields         | Emitted When
----------------------|----------------|----------------------------------
Status(String)        | status message | Informational update
Error(String)         | error message  | Unrecoverable error
Complete(AgentOutput) | final output   | Agent loop finished successfully

AgentStream

Bidirectional control channel. Receive events and send commands back.


Effort Levels

Control thinking depth and temperature via a single setting:

use cersei_agent::effort::EffortLevel;

let effort = EffortLevel::from_str("max");
let budget = effort.thinking_budget_tokens();  // 32768
let temp = effort.temperature();               // Some(1.0)


Auto-Compact

When the conversation approaches the context window limit, the agent automatically summarizes older messages to free space. This happens transparently — the model continues working without interruption.

Agent::builder()
    .auto_compact(true)           // enable
    .compact_threshold(0.9)       // trigger at 90% of context window
    .tool_result_budget(50_000)   // also truncate oldest tool results above 50K chars

The compaction pipeline:

  1. Count tokens in the current conversation
  2. If above threshold, group old messages by topic
  3. Call the LLM to summarize each group
  4. Replace original messages with summaries
  5. Free tool results above the budget

Events emitted: CompactStart, CompactEnd (with messages_after and tokens_freed).


System Prompt Caching

The system prompt is split into two sections separated by __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__:

[Static section — cached by the provider]
__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__
[Dynamic section — rebuilt each turn]

Agent::builder()
    .system_prompt("You are a coding assistant. Always use Rust.")  // static, cached
    .append_system_prompt("Current time: 2024-01-01")               // dynamic, per-turn

Anthropic's prompt caching covers everything before the boundary, saving tokens and latency on multi-turn conversations.
