Changelog

0.1.9 (2026-05-17)

added New crate: cersei-vms. Sandbox & VM isolation layer for coding agents. Lets the agent run shell commands and edit files inside isolated environments instead of on the host, and lets multiple parallel agents run in separate sandboxes that share state through host-mediated primitives. Docs: Overview · API · Cookbook.

added Core traits. cersei_vms::runtime::{SandboxRuntime, Sandbox} plus per-sandbox Commands and Filesystem surfaces. Mirrors E2B's Sandbox / Commands / Filesystem shape almost 1:1, so E2B mental models port directly.

added Backends. LocalProcessRuntime (always-on, no isolation — for tests and --sandbox local) and DockerRuntime (real container isolation via the local docker CLI; feature backend-docker, default-on). Phase 1 ships without bollard / HTTP-over-UDS — DockerRuntime shells out to docker, which works identically on macOS, Linux, and Windows wherever Docker is installed.

added Cross-sandbox primitives in cersei_vms::primitives. Volume registry (host-side dirs bind-mounted into N sandboxes), Mailbox (broadcast pub/sub by topic, backed by tokio::sync::broadcast), KvStore (DashMap + optional journal file, versioned CAS via cas(key, expected_version, value)). All three reachable from sandboxes through a host-side broker; sandboxes never need direct network links between each other.
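The versioned compare-and-swap semantics of cas(key, expected_version, value) can be illustrated with a small in-memory sketch. This is not the cersei_vms::primitives::KvStore implementation: it swaps DashMap for a Mutex<HashMap>, omits the journal file, and the names VersionedKv and CasError, the u64 version counter, and the "version 0 means absent" convention are all assumptions for illustration.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for a versioned KV store (not the cersei API).
/// Each key carries a monotonically increasing version; 0 means "absent".
#[derive(Default)]
pub struct VersionedKv {
    inner: Mutex<HashMap<String, (u64, Vec<u8>)>>,
}

#[derive(Debug, PartialEq)]
pub enum CasError {
    /// The caller's expected version did not match the stored one.
    Stale { current: u64 },
}

impl VersionedKv {
    /// Returns the current (version, value) pair, if the key exists.
    pub fn get(&self, key: &str) -> Option<(u64, Vec<u8>)> {
        self.inner.lock().unwrap().get(key).cloned()
    }

    /// Writes `value` only if the stored version equals `expected_version`
    /// (pass 0 to create a key that must not exist yet). Returns the new version.
    pub fn cas(&self, key: &str, expected_version: u64, value: Vec<u8>) -> Result<u64, CasError> {
        let mut map = self.inner.lock().unwrap();
        let current = map.get(key).map(|(v, _)| *v).unwrap_or(0);
        if current != expected_version {
            return Err(CasError::Stale { current });
        }
        let next = current + 1;
        map.insert(key.to_string(), (next, value));
        Ok(next)
    }
}
```

If two writers both read version 1, the first cas(key, 1, ...) wins and bumps the version to 2; the second gets Stale and has to re-read before retrying, which is the "CAS rejecting stale writers" behaviour the Phase 1 tests exercise.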

added First-party snapshots. Sandbox::snapshot() -> SnapshotId and SandboxRuntime::restore(&SnapshotId) -> SandboxHandle. On Docker, a snapshot is a docker commit of the filesystem state plus a JSON SnapshotManifest (env, mounts, mailbox topics, KV checkpoint) written to ~/.cersei/vms/snapshots/; on Local, it is a directory copy. Snapshots survive a process restart.

added cersei-envd binary (feature envd, default-on). Tiny Rust JSON-RPC 2.0 daemon meant to be baked into container images for richer in-VM ops; talks to the host over a bind-mounted Unix socket at /run/cersei-envd.sock. Methods: process.run, fs.{read,write,list,stat,mkdir,remove}, ping, info. Reuses the JSON-RPC wire shape from cersei-mcp/src/jsonrpc.rs.

added Reference image. crates/cersei-vms/docker/Dockerfile produces cersei/sandbox-base:latest — Alpine 3.20 + bash + git + cersei-envd, ~8 MB.

added cersei-tools integration behind a new vms feature. Transparent routing in BashTool: when ctx.extensions.get::<Arc<dyn cersei_vms::Sandbox>>() is Some, the bash tool routes through the sandbox; otherwise it falls back to the existing local pproc::exec() path. New agent-facing tools in cersei_tools::vm_tools: SendVmMessage / RecvVmMessage, SharedStateGet / SharedStateSet, SandboxSnapshot.
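The BashTool routing rule amounts to a typed lookup: if a sandbox handle has been registered in the tool context's extensions, the command goes there; otherwise it runs on the host. The sketch below models that with a minimal TypeId-keyed map. Extensions, Sandbox, run_bash, FakeSandbox and the "local:" / "sandbox:" strings are hypothetical simplifications rather than the cersei_tools types, and no process is actually spawned.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical sandbox trait (stands in for cersei_vms::Sandbox).
pub trait Sandbox: Send + Sync {
    fn exec(&self, cmd: &str) -> String;
}

/// Minimal type-map, similar in spirit to a context "extensions" bag.
#[derive(Default)]
pub struct Extensions {
    map: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl Extensions {
    pub fn insert<T: Any + Send + Sync>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }
    pub fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.map.get(&TypeId::of::<T>()).and_then(|b| b.downcast_ref::<T>())
    }
}

/// Route through the sandbox when one is registered, else fall back to the host.
pub fn run_bash(ext: &Extensions, cmd: &str) -> String {
    match ext.get::<Arc<dyn Sandbox>>() {
        Some(sandbox) => sandbox.exec(cmd),
        None => format!("local: {cmd}"), // placeholder for the host exec path
    }
}

/// Toy sandbox used only to demonstrate the routing.
struct FakeSandbox;
impl Sandbox for FakeSandbox {
    fn exec(&self, cmd: &str) -> String {
        format!("sandbox: {cmd}")
    }
}
```

Because the lookup key is the exact type Arc<dyn Sandbox>, callers must insert a value already coerced to that type (for example `let sb: Arc<dyn Sandbox> = Arc::new(FakeSandbox);`), otherwise the lookup misses and the tool silently stays on the host path.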

added cersei facade — cersei::vms re-export behind a new default-on vms feature, so use cersei::prelude::* gains the sandbox surface without an extra import.

added 9 tests passing in Phase 1: 4 LocalProcessRuntime (incl. snapshot round-trip preserving FS + KV state), 2 mailbox (cross-sandbox pub/sub + topic isolation), and 3 KvStore (concurrent writes, CAS rejecting stale writers, journal-survives-reopen).

changed Workspace bumped to 0.1.9 (15-crate layout now includes cersei-vms).

removed Deferred to 0.1.10. AgentBuilder::with_sandbox(...), per-task allocator in cersei-agent::delegate, abstract-cli --sandbox flag + /vm slash command, FirecrackerRuntime / E2bRuntime / VercelSandboxRuntime backends, docker pause/unpause and incremental snapshots.

LongMemEval Benchmark (2026-04-24)

added LongMemEval head-to-head memory benchmark (bench/long-mem/). Runs the 500-question LongMemEval (ICLR 2025) dataset — the same benchmark Mastra, Zep, Supermemory, Hindsight, and EmergenceMem report on — across four Cersei memory configurations. Judge rubric, observer rubric, and context-injection prompts are verbatim ports from Mastra's @mastra/memory so numbers land on the same public leaderboard. Full methodology + reproduction on /docs/bench-memory.

added Final numbers on longmemeval_s (500 Qs, all models gemini-2.5-flash, 2026-04-24):

Hybrid (Observer + embed + graph + RRF): 85.7 % overall, 93.3 % abstention, 432/500 correct, 1.58 M input tokens (34× fewer than baseline). Best config — wins outright on knowledge-update (94.4 %).
Embed-only (EmbeddingMemory over usearch HNSW + gemini-embedding-001): 84.2 % overall, 86.7 % abstention, 429/500, 2.68 M input tokens (20× fewer). Already beats Mastra OM / gpt-4o and Supermemory / gpt-4o on its own.
Baseline (full-context JsonlMemory): 84.6 % overall, 86.7 % abstention, 422/500, 53.16 M input tokens.
Graph-substring floor: 6.6 % overall, 100 % abstention.

added Leaderboard position. Cersei Hybrid at 85.7 % lands above Supermemory / gemini-3-pro-preview (85.2 %), Supermemory / gpt-5 (84.6 %), Mastra OM / gpt-4o (84.23 %), Mastra RAG / gpt-4o (80.05 %), and Zep / gpt-4o (71.2 %), and within 0.3 pp of EmergenceMem Internal / gpt-4o (86.0 %). The remaining gap to Mastra OM on gemini-3-flash-preview (89.2 %) and on gpt-5-mini (94.87 %) is attributed to model tier rather than algorithm.

added Mastra prompt port — bench/long-mem/src/mastra_prompts.rs ports OBSERVER_EXTRACTION_INSTRUCTIONS, OBSERVER_OUTPUT_FORMAT, OBSERVER_GUIDELINES, OBSERVATION_CONTEXT_PROMPT, OBSERVATION_CONTEXT_INSTRUCTIONS verbatim from @mastra/memory. The context-injection prompt's KNOWLEDGE UPDATES / PLANNED ACTIONS / MOST RECENT USER INPUT clauses are what drive the Hybrid win on knowledge-update (94.4 %).

added cersei-memory::embedding_memory::EmbeddingMemory — thin adapter bridging cersei-embeddings::EmbeddingStore into the Memory trait. Behind the new optional embed feature so consumers opt in.

```rust
use cersei_memory::embedding_memory::EmbeddingMemory;
use cersei_embeddings::{GeminiEmbeddings, Metric};

// Call from a function returning Result; requires cersei-memory's `embed` feature.
let mem = EmbeddingMemory::new(GeminiEmbeddings::from_env()?, Metric::Cosine)?;
```

added GraphMemory::recall_top_k(query, limit) -> Vec<(String, f32)> — scored retrieval that re-ranks the substring-match candidates by fraction of query words found. Additive; the existing recall signature is unchanged.
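A rough sketch of the scoring described above: each candidate's score is the fraction of distinct query words it contains, and candidates are sorted by that score. It assumes lowercase whitespace tokenization and that zero-score candidates are dropped; GraphMemory's actual tokenization, tie-breaking and filtering may differ, and score_candidates is a hypothetical name.

```rust
use std::collections::HashSet;

/// Hypothetical re-ranker: score = (distinct query words found) / (distinct query words).
pub fn score_candidates(query: &str, candidates: &[&str], limit: usize) -> Vec<(String, f32)> {
    let words: HashSet<String> = query.split_whitespace().map(|w| w.to_lowercase()).collect();
    if words.is_empty() {
        return Vec::new();
    }
    let mut scored: Vec<(String, f32)> = candidates
        .iter()
        .map(|c| {
            let text = c.to_lowercase();
            // Substring check per word, in line with the substring-based candidate search.
            let hits = words.iter().filter(|w| text.contains(w.as_str())).count();
            (c.to_string(), hits as f32 / words.len() as f32)
        })
        .filter(|(_, s)| *s > 0.0)
        .collect();
    // Highest score first; sort_by is stable, so ties keep candidate order.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(limit);
    scored
}
```

Since matching is by substring, "async" also hits "asyncio"; that keeps the score consistent with how the candidates were found in the first place, at the cost of some false partial matches.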

added cersei-embeddings::GeminiEmbeddings rewritten for gemini-embedding-001 (3072-d native, Matryoshka outputDimensionality supported). Uses the embedContent endpoint with bounded concurrent streaming; retries 429 / 5xx / transport errors with exponential backoff (6 attempts, ~30 s window).
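The retry policy can be sketched as pure functions over the attempt number. The base delay (1 s, doubling) is an assumption chosen so that the five waits between six attempts total 31 s, roughly the stated ~30 s window; the real client may use different constants or add jitter. is_retryable, backoff_delay and total_backoff are hypothetical names.

```rust
use std::time::Duration;

pub const MAX_ATTEMPTS: u32 = 6;

/// Retry on rate limiting (429) and server errors (5xx); other statuses fail fast.
/// A transport error (no HTTP status at all) is also treated as retryable.
pub fn is_retryable(status: Option<u16>) -> bool {
    match status {
        None => true,
        Some(429) => true,
        Some(s) => (500..=599).contains(&s),
    }
}

/// Delay to wait after failed attempt `attempt` (1-based): 1 s, 2 s, 4 s, ...
pub fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_secs(1u64 << (attempt - 1))
}

/// Worst-case total wait across all retries (no wait after the final attempt).
pub fn total_backoff() -> Duration {
    (1..MAX_ATTEMPTS).map(backoff_delay).sum()
}
```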

fixed cersei-embeddings::OpenAiEmbeddings::embed_batch and GeminiEmbeddings::embed_batch no longer panic on multi-byte UTF-8 input. The pre-call truncation was byte-slicing at index 2000, which crashed when the boundary fell inside a character (Spanish diacritics, emoji, smart quotes). Now walks back to the nearest char boundary.
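The fix amounts to never slicing a &str at an arbitrary byte index. A minimal version of the walk-back described above looks like this; the function name truncate_at_char_boundary is hypothetical, and only the 2000-byte limit comes from the entry.

```rust
/// Truncate `s` to at most `max_bytes` bytes without splitting a UTF-8 character.
/// `&s[..max_bytes]` panics when `max_bytes` lands inside a multi-byte char,
/// so walk back to the nearest char boundary first.
pub fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1; // index 0 is always a boundary, so this terminates
    }
    &s[..end]
}
```

For example, with 1,999 ASCII bytes followed by "é" (2 bytes), a 2000-byte cut would land inside the "é"; the walk-back returns the first 1,999 bytes instead of panicking.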

changed Gemini API key plumbing (SECURITY). Keys now ride the x-goog-api-key header instead of the ?key=… URL query string in both cersei-provider::Gemini and cersei-embeddings::GeminiEmbeddings. The URL contains no secret, so reqwest::Error Display (which prints the URL) cannot leak the key. Added a redact_url_key helper as belt-and-braces; tightened .gitignore to block bench/**/*.log, bench/**/results*/, bench/**/runner-*.sh, bench/**/abstract-output.jsonl, and .env*. Untracked 38 previously-committed bench artifacts that carried keys. Prompted by two historic leaks — see bench-memory for the post-mortem.
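Moving the key into the x-goog-api-key header means the request URL carries no secret. As an extra safeguard for any URL that still might, a redaction helper can mask a key query parameter before the URL is logged or displayed. The sketch below is a std-only guess at what such a helper could look like; it is not the redact_url_key in the codebase, and the "REDACTED" placeholder and exact-name matching are assumptions.

```rust
/// Hypothetical sketch: mask the value of any `key=` query parameter,
/// e.g. "...?key=SECRET&alt=sse" -> "...?key=REDACTED&alt=sse".
pub fn redact_url_key(url: &str) -> String {
    let Some((base, query)) = url.split_once('?') else {
        return url.to_string(); // no query string, nothing to redact
    };
    let redacted: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            // Exact parameter-name match, so "apikey=..." is left alone.
            Some(("key", _)) => "key=REDACTED".to_string(),
            _ => pair.to_string(),
        })
        .collect();
    format!("{base}?{}", redacted.join("&"))
}
```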

0.1.7 (2026-04-20)

added New crate: cersei-compression. Structural and command-aware compression for tool outputs. Sits between a tool's raw execute() result and the existing cap_tool_result() safety net, trimming the ANSI codes, Compiling … spam, boilerplate comments, and unchanged function bodies that dominate typical tool output. Three levels: Off (default, byte-for-byte passthrough), Minimal (ANSI + comment stripping, whitespace collapse — safe for JSON/YAML/TOML), Aggressive (adds language-aware body stubbing and declarative TOML rules for git, cargo, npm, pnpm, pytest, docker). Docs: Overview · Benchmarks.
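To make the Minimal level more concrete, here is a std-only sketch of two steps in that spirit: stripping ANSI CSI escape sequences (the colour codes cargo and git emit) and collapsing runs of blank lines. It is an illustration, not cersei-compression's implementation: comment stripping is omitted, only ESC [ ... final-byte sequences are handled, and strip_ansi and collapse_blank_lines are hypothetical names.

```rust
/// Remove ANSI CSI sequences: ESC '[' followed by parameter/intermediate bytes,
/// terminated by a final byte in the range '@'..='~' (e.g. 'm' for colours).
pub fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip up to and including the sequence's final byte.
            for f in chars.by_ref() {
                if ('@'..='~').contains(&f) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Trim trailing whitespace on each line and collapse runs of blank lines into one.
pub fn collapse_blank_lines(input: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    for line in input.lines().map(str::trim_end) {
        if line.is_empty() && out.last().map_or(false, |l| l.is_empty()) {
            continue;
        }
        out.push(line);
    }
    out.join("\n")
}
```

Neither step touches characters inside string values, which is consistent with the entry's note that Minimal is safe for JSON/YAML/TOML; note that collapse_blank_lines drops a trailing newline.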

added Credit where it's due — the rule engine, language-aware code filter, and TOML DSL are ports of rtk (Rust Token Killer) by Patrick Szymkowiak, MIT licensed. crates/cersei-compression/LICENSE + the per-module //! Credits: headers document which rtk file each module derives from.

added Agent builder + runtime knob. AgentBuilder::compression_level(level) for build-time, agent.set_compression_level(level) / agent.compression_level() for runtime — shared mutex, takes effect on the next tool call. Wired in the runner at crates/cersei-agent/src/runner.rs:708, before the existing cap_tool_result.

added Abstract CLI controls. --compress <off|minimal|aggressive> flag, ABSTRACT_COMPRESSION env var, compression_level in ~/.abstract/config.toml / .abstract/config.toml, and a live /compression [on|off|minimal|aggressive] slash command that flips the active agent mid-session. /compression with no argument reports the current level.

added Live-provider savings benchmark — crates/cersei-agent/tests/e2e_openai_compression.rs (#[ignore]). Same prompt, same tool, same fixture run twice per provider with only CompressionLevel changing. Token counts are provider-reported (output.usage.input_tokens from OpenAI, usageMetadata.promptTokenCount from Gemini), not our estimate.

OpenAI gpt-4o-mini — 11,576 → 8,202 input tokens (−29.1%, Δ 3,374 tokens); 15 → 13 tool calls over 5 turns.
Google Gemini gemini-2.5-flash — 4,490 → 1,700 input tokens (−62.1%, Δ 2,790 tokens); 1 → 1 tool call over 5 → 3 turns.
Reproduce it — commands, per-call logs, and caveats on the Compression Benchmarks page.
Synthetic floors — git log ≥ 30% at Minimal, cargo test ≥ 25% at Minimal, Off is byte-for-byte identity, Rust source at Aggressive drops bodies but keeps signatures + imports. Source: crates/cersei-compression/tests/savings.rs.

added Per-call observability. Every compress_tool_output invocation emits a structured tracing::info! event on target cersei_compression with tool, level, strategy, detail (matched rule name or detected Language), before_bytes, after_bytes, before_lines, after_lines, and savings_pct. Subscribe anywhere with RUST_LOG=cersei_compression=info.

changed Workspace version bumped to 0.1.7 across every crate (cersei, cersei-agent, cersei-compression, cersei-embeddings, cersei-hooks, cersei-lsp, cersei-mcp, cersei-memory, cersei-provider, cersei-tools, cersei-tools-derive, cersei-types, abstract-cli) via version.workspace = true.

changed cersei-agent::Agent + AgentBuilder gained a compression_level field. Default is CompressionLevel::Off — existing users see zero behavioural change without opting in. The pre-existing cap_tool_result (80+80 head/tail) and apply_tool_result_budget (50k chars, keeps last 6 messages) still run after compression as unconditional safety nets.
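A head/tail cap of the kind cap_tool_result applies can be sketched as follows. The entry only says "80+80 head/tail"; this sketch assumes the unit is lines and that the elided middle is replaced by a one-line marker, both of which may differ from the real function. cap_head_tail is a hypothetical name.

```rust
/// Keep the first `head` and last `tail` lines of `output`, replacing the
/// middle with a marker that records how many lines were dropped.
pub fn cap_head_tail(output: &str, head: usize, tail: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= head + tail {
        return output.to_string(); // short enough: pass through unchanged
    }
    let omitted = lines.len() - head - tail;
    let mut kept: Vec<String> = lines[..head].iter().map(|s| s.to_string()).collect();
    kept.push(format!("... [{omitted} lines omitted] ..."));
    kept.extend(lines[lines.len() - tail..].iter().map(|s| s.to_string()));
    kept.join("\n")
}
```

Running compression first means this cap usually sees already-trimmed output, so it only fires on results that remain long after compression.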

0.1.6-patch.2 (2026-04-18)

added New crate: cersei-embeddings. Standalone, provider-agnostic text embeddings with a usearch-backed vector index. Ships with GeminiEmbeddings (text-embedding-004, 768-d) and OpenAiEmbeddings (text-embedding-3-small, 1536-d, base-URL overridable). Zero dependency on other cersei-* crates. Docs: Overview · API Reference · Cookbook.

added EmbeddingProvider trait + VectorIndex + EmbeddingStore<P>. Implement the trait once, compose with index and store for free. Built-in Cosine / L2 / InnerProduct metrics with automatic similarity conversion. auto_from_model(&str) factory picks OpenAI or Gemini from an LLM model string.

added General-Agent Framework Benchmark. First-party end-to-end comparison against the Python agent stack — Agno 2.5.17, PydanticAI 1.22.0, LangGraph 1.1.8, CrewAI 1.14.2 — every number measured on Apple M1 Pro via the same harness suite. Methodology mirrors Agno's own cookbook (real agent constructors, no LLM invocation, no stub models). Three new chart components (AgentInstantiationChart, PerAgentMemoryChart, MaxConcurrentChart).

Headline — Cersei 704 B per agent (8× smaller than Agno, 44× smaller than LangGraph). Builds 500 agents concurrently in 4.4 ms on 8.5 MB; CrewAI needs 50,697 ms on 1,739 MB for the same N. That's 11,500× faster wall time, 204× less memory. Cersei sweeps to 10,000 concurrent agents on 22 MB RSS total.
Browse the numbers — Landing page section · Deep dive with all five axes · Comparisons page.
Reproduce it — Rust harness at crates/cersei-agent/benchmarks/general_agent_bench.rs (opt-in via bench-full feature); uv-managed Python harnesses + ./run.sh at bench/general-agents/.

changed cersei-tools::code_search delegates to cersei-embeddings. Inline Gemini / OpenAI embedding HTTP and raw usearch::Index handling removed from the tool. CodeSearchTool::with_embeddings now takes Arc<dyn EmbeddingProvider>.

changed abstract-cli uses cersei_embeddings::auto_from_model instead of inline model-string detection. --embedding-api end-user behaviour unchanged.

fixed cersei-lsp version now inherits from workspace (was hardcoded to 0.1.6, drifted behind every other crate on version bumps).

changed Google provider default model upgraded from gemini-2.0-flash to gemini-3.1-pro-preview (2M context). Affects abstract --model gemini, the auto default when GOOGLE_API_KEY is the only key set, and Gemini::new() / Gemini::builder() when .model(...) is omitted.

changed abstract login <provider> accepts any provider registered in the cersei-provider registry (Google, Groq, DeepSeek, xAI, Mistral, Together, Fireworks, Perplexity, Cerebras, OpenRouter, Cohere, SambaNova) — previously only anthropic and openai were wired up. Saved keys live in a generic provider_keys map in ~/.abstract/credentials.json and export as the provider's first env var at startup. Local providers (Ollama) report "no login needed".

fixed Auto-default silently picking Ollama. Two changes: (1) registry::available() TCP-probes local providers (env_keys empty) with a 200ms check on their api_base, so abstract login status distinguishes available (local) from not running. (2) from_model_string("auto") skips local providers entirely and only considers keyed providers — Ollama must now be selected explicitly via --model ollama/<model>. No more silent fallback to llama3.1 on machines where the user never opted into Ollama.
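The reachability check in (1) can be approximated with the standard library alone: attempt a TCP connection to the provider's host and port with a short timeout. The 200 ms figure comes from the entry; the name is_reachable, and passing a pre-resolved SocketAddr instead of parsing api_base, are simplifications for illustration.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if something accepts TCP connections at `addr` within `timeout`
/// (for example a local Ollama server, which listens on 127.0.0.1:11434 by default).
pub fn is_reachable(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}
```

A successful connect only proves a listener exists on that port, not that it is actually Ollama; that is enough to tell "available (local)" from "not running" without sending any request.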

fixed abstract login google (and every other registered provider) no longer rejected with "Unknown provider".

0.1.6-patch.1 (2026-04-13)

added VibeProxy support. Route requests through a local proxy (VibeProxy or compatible) to use existing AI subscriptions instead of API keys. New --proxy and --proxy-url CLI flags, auto-detection when no API keys are set, a /proxy command that shows status and authenticated accounts, and a [proxy] config section.

added Channel-based TUI permissions. Rewrote the permission system from stdin-based prompts to mpsc + oneshot channels: the TUI renders an overlay, the user decides, and the decision flows back over the channel. This eliminates the stdin race condition and raw-mode corruption.
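The request/response shape of the channel-based permission flow can be sketched with the standard library: the agent sends a request that carries its own reply sender, and the UI side answers on it. The real implementation uses async mpsc + oneshot channels (presumably tokio's) and a rendered overlay; here std::sync::mpsc and a plain thread stand in for both, and the "deny bash, approve everything else" rule is an arbitrary placeholder for the user's decision. PermissionRequest, ask and spawn_ui are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// A permission request carries its own reply channel (a oneshot in spirit).
pub struct PermissionRequest {
    pub tool: String,
    pub reply: mpsc::Sender<bool>,
}

/// Agent side: send the request to the UI thread and block until it answers.
pub fn ask(ui: &mpsc::Sender<PermissionRequest>, tool: &str) -> bool {
    let (tx, rx) = mpsc::channel();
    if ui.send(PermissionRequest { tool: tool.to_string(), reply: tx }).is_err() {
        return false; // UI is gone: deny rather than hang
    }
    rx.recv().unwrap_or(false) // a dropped reply sender also counts as a denial
}

/// UI side: stands in for the overlay. Placeholder policy: deny "bash".
pub fn spawn_ui(requests: mpsc::Receiver<PermissionRequest>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for req in requests {
            let approved = req.tool != "bash";
            let _ = req.reply.send(approved);
        }
    })
}
```

Because the agent never reads stdin, the terminal stays owned by the UI loop, which is what removes the race with raw mode.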

added Virtualized message list. O(viewport_height) rendering instead of O(total_lines). Pre-built committed items cached. Buffer cleared before render. Smooth 60fps scrolling for long conversations.

added Inline diff viewer. Edit/Write/ApplyPatch tools show syntax-highlighted unified diffs inline with ┌─ diff borders. Edit tool auto-captures before/after content.

added Multi-line input. Textarea with word wrapping, dynamic height (1-10 lines). Option+Enter / Ctrl+J / Shift+Enter for newlines. Kitty keyboard protocol support.

added 4 cookbooks — ML Coding Agent, Research Agent, General Agent, Graph Memory. Comparisons page — Cersei vs Claude Code SDK vs Pydantic AI vs LangChain. Code & AST Intelligence docs.

changed Permission overlay: 75% x 55% with padding. Side panel: focus mode with j/k scroll, compact file tree. Git diff: human-readable status labels. Help overlay lists all commands.

fixed TUI permission freeze (stdin race condition), stale scroll content (buffer now cleared), Ghostty resize crash (kitty protocol detection), cost displaying as $0, and missing slash commands.

0.1.6 (2026-04-12)

added New crate: cersei-lsp. On-demand LSP server management with JSON-RPC 2.0 over stdio. 5 operations (hover, definition, references, symbols, diagnostics). 13 built-in server configs. Auto-detection by file extension, lazy startup.

added Tree-sitter code intelligence. Multi-language parsing (Rust, TypeScript/JS, Python, Go) for imports and symbols. Bash command safety analysis with risk classification. Dependency-ranked project intelligence injected into system prompt.

added Production TUI. ratatui alternate screen with tokio::select! at 62 FPS. Side panel (Ctrl+B) with Git Diff + File Tree tabs. 5 permission modes (Shift+Tab). Enterprise theme (AMOLED black). 16 slash commands. Markdown rendering with syntax highlighting. Graph visualization (/graph). Scrolling, paste support, native text selection.

added Parallel tool execution via futures::future::join_all(). Automatic retry with exponential backoff (5 retries, 1s→16s). LLM-based context compaction at 90% usage. Todo nudge injection. Depth nudge for deeper exploration.

added File snapshot/undo — before/after content per tool call with /undo command. ApplyPatch tool for unified diff patching. Shell state persistence via sentinel-based cwd capture.

added GPT-5.x support — gpt-5.3-chat-latest default, gpt-5.3-chat, gpt-5-chat, o3-pro. 1M context. max_completion_tokens for GPT-5/o-series. Per-message cost estimation for 15 models.

added AGENTS.md/CLAUDE.md hierarchy — walks up directory tree for instruction files. File watching via notify crate. 3 themes — Enterprise (default), Light, Solarized.

changed Default OpenAI model: gpt-4o → gpt-5.3-chat-latest. Default theme: Enterprise (AMOLED black). Agent::run_stream() uses Arc<Self> (safe). System prompt rewritten for deep exploration. Glob capped at 200 results. Per-result cap at 30KB. max_turns: 20 → 50.

fixed TUI streaming (no longer blocks). Mid-stream cancellation. OpenAI max_completion_tokens. Token stats/cost display. Git diff with untracked files. Markdown wrapping. Paste handling. Gemini tool result names.

0.1.5 (2026-04-07)

added /sessions and /ls slash commands — list sessions directly from the REPL. Closes #9.

added Expanded /help — now shows CLI subcommands alongside REPL commands. Updated model aliases.

added Conditional system prompt components. 23 sections (8 conditional) replacing 6 static constants. New: output efficiency, tool result summarization, sub-agent guidance, skills guidance, memory guidance, context management warning, structured git snapshot, MCP instructions, language preference.

added GitSnapshot struct and new SystemPromptOptions fields (tools_available, has_memory, has_auto_compact, git_status, mcp_instructions, language). 11 new tests.

changed System prompt now includes output efficiency and tool result summarization by default.

changed Git info in prompt upgraded from one-line string to structured snapshot (branch, user, status, recent commits).

0.1.4 (2026-04-06)

added Tool primitives (tool_primitives module). 6 sub-modules: diff, fs, process, http, search, git. Low-level async building blocks for custom tools. 26 new tests.

added Built-in tools reference — documentation page with complete input schemas for all 34 tools using TypeTable.

added Tool primitives documentation — overview, full API reference, and cookbook with DiffTool, deploy verifier, research agent, and git-aware code reviewer examples.

added Providers documentation — dedicated page covering all 13 providers with env vars, models, context windows, and usage examples.

changed file_read.rs, file_write.rs, file_edit.rs refactored to delegate to tool_primitives::fs.

changed bash.rs refactored to delegate to tool_primitives::process::exec. ShellState preserved.

changed web_fetch.rs refactored to delegate to tool_primitives::http::fetch_html.

changed grep_tool.rs and glob_tool.rs refactored to delegate to tool_primitives::search. Grep now returns structured SearchMatch results internally.

0.1.3 (2026-04-05)

added Session auto-fork. When a session file exceeds 50MB, writes automatically fork to a new part file (_part2.jsonl, _part3.jsonl, etc.). Loading stitches all parts together. Tombstones apply across parts. Total limit: 200MB.
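The part-file naming and rollover rule can be sketched as below. The thresholds (50 MB per part, 200 MB total) and the _part2.jsonl suffix pattern come from the entry; treating the base file as part 1, appending the suffix to the file stem, interpreting MB as MiB, and the names part_path and next_write_target are assumptions for illustration.

```rust
use std::path::{Path, PathBuf};

const PART_LIMIT: u64 = 50 * 1024 * 1024; // 50 MB per part (MiB assumed)
const TOTAL_LIMIT: u64 = 200 * 1024 * 1024; // 200 MB across all parts

/// Part 1 is the base file; later parts get a `_partN` suffix:
/// abc.jsonl, abc_part2.jsonl, abc_part3.jsonl, ...
pub fn part_path(base: &Path, part: u32) -> PathBuf {
    if part <= 1 {
        return base.to_path_buf();
    }
    let stem = base.file_stem().and_then(|s| s.to_str()).unwrap_or("session");
    base.with_file_name(format!("{stem}_part{part}.jsonl"))
}

/// Decide which part the next record goes to, given the current part number,
/// its size, the combined size of all parts, and the record length.
/// Returns None once the total limit would be exceeded.
pub fn next_write_target(current_part: u32, part_size: u64, total_size: u64, record_len: u64) -> Option<u32> {
    if total_size + record_len > TOTAL_LIMIT {
        return None;
    }
    if part_size + record_len > PART_LIMIT {
        Some(current_part + 1) // fork to a fresh part before crossing the limit
    } else {
        Some(current_part)
    }
}
```

Forking before a write would cross the per-part limit (rather than after) is what keeps every individual file loadable.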

added Multi-part session helpers — all_part_paths() and total_session_size() for inspecting session files.

added Sessions and Tasks documentation — two new doc pages covering the full session lifecycle, auto-compact, memory extraction, auto-dream consolidation, task orchestration, cron scheduling, and git worktree isolation.

changed load_transcript() now loads from all part files and applies tombstones across the combined set.

changed abstract sessions rm removes all part files, not just the base.

fixed Sessions that exceeded 50MB became unloadable. Now they auto-fork before hitting the limit.

0.1.2 (2026-04-04)

added Multi-provider model router. 13 providers via provider/model string format. Anthropic, OpenAI, Google, Mistral, Groq, DeepSeek, xAI, Together, Fireworks, Perplexity, Cerebras, Ollama, OpenRouter.

added from_model_string() — parse "groq/llama-3.1-70b-versatile" into a configured provider. Auto-detection from bare model names.
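The provider/model convention can be illustrated with a small parser that splits on the first '/', so model IDs that themselves contain slashes (as OpenRouter-style IDs do) stay intact. The bare-name prefix table is a hypothetical stand-in for the router's auto-detection, and parse_model_string is not the cersei-provider API.

```rust
/// Split "provider/model" on the first '/', so
/// "openrouter/meta-llama/llama-3.1-70b-instruct" -> ("openrouter", "meta-llama/llama-3.1-70b-instruct").
/// Bare names fall back to a hypothetical prefix table.
pub fn parse_model_string(s: &str) -> Option<(&str, &str)> {
    if let Some((provider, model)) = s.split_once('/') {
        if provider.is_empty() || model.is_empty() {
            return None;
        }
        return Some((provider, model));
    }
    let provider = if s.starts_with("claude") {
        "anthropic"
    } else if s.starts_with("gpt") || s.starts_with("o3") {
        "openai"
    } else if s.starts_with("gemini") {
        "google"
    } else {
        return None;
    };
    Some((provider, s))
}
```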

added Provider continuity — interactive model switching on rate limits. Retry, switch to fallback, wait, or skip. History transfers across provider switches.

added --fallback CLI flag and fallback_models config for provider switching.

added OpenAI tool calling — full streaming support. Accumulates delta.tool_calls chunks across SSE events. Previously tool calls were silently dropped.
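In OpenAI's streaming Chat Completions format, a tool call arrives split across many SSE chunks: the first delta for a given index carries the id and function name, and later deltas append fragments of the JSON arguments string. Accumulating them means merging by index until the stream ends. The sketch models that with plain structs instead of parsed JSON; ToolCallDelta, ToolCall and accumulate are hypothetical names, not the cersei-provider types.

```rust
use std::collections::BTreeMap;

/// One `delta.tool_calls[i]` fragment from a streamed chunk (already parsed).
pub struct ToolCallDelta {
    pub index: u32,
    pub id: Option<String>,
    pub name: Option<String>,
    pub arguments: Option<String>,
}

#[derive(Debug, Default, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String, // complete JSON text once the stream ends
}

/// Merge fragments by `index`: id and name are set when present,
/// argument fragments are concatenated in arrival order.
pub fn accumulate(deltas: impl IntoIterator<Item = ToolCallDelta>) -> Vec<ToolCall> {
    let mut calls: BTreeMap<u32, ToolCall> = BTreeMap::new();
    for d in deltas {
        let call = calls.entry(d.index).or_default();
        if let Some(id) = d.id {
            call.id = id;
        }
        if let Some(name) = d.name {
            call.name = name;
        }
        if let Some(args) = d.arguments {
            call.arguments.push_str(&args);
        }
    }
    calls.into_values().collect() // ordered by index
}
```

Parsing the arguments only after the stream finishes matters: individual fragments such as `{"cmd":` are not valid JSON on their own.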

added AgentBuilder::with_messages() — pre-populate conversation history for provider switching mid-session.

changed Provider resolution replaced with the model router. Model string is the source of truth.

changed REPL owns the Agent (not borrows), enabling mid-session provider swaps.

fixed Grafeo dependency uses crates.io instead of local filesystem paths. cargo install works on any machine.

fixed OpenAI tool calling loop — tool results now serialize correctly as role: "tool" with tool_call_id.

removed Hardcoded local filesystem paths from all Cargo.toml files.

0.1.1 (2026-04-03)

added Schema versioning and migration engine. Graph databases store a (:SchemaVersion) node. Auto-detects version on open, runs sequential idempotent migrations.

added Confidence decay. Memory nodes track last_validated_at and decay_rate. Old memories lose weight over time without manual cleanup.

added Embedding readiness. Memory nodes include embedding_model_version preparing for future vector-based semantic recall.

added Centralized GQL queries. All 15+ scattered query strings in graph.rs extracted into a single mod gql block.

added Abstract CLI. Complete coding agent — REPL, streaming renderer, interactive permissions, session management, slash commands, TOML config, graph memory on by default.

added Benchmark suite. run_tool_bench_claude.sh and run_tool_bench_codex.sh for three-way comparison. memory_bench.rs for graph performance.

added Documentation site. 23 pages on fumadocs with API reference, architecture, cookbooks, benchmarks, and interactive charts.

changed GraphMemory::open() auto-migrates on startup. store_memory() writes v2 fields. No API change.

fixed Empty ANTHROPIC_API_KEY environment variable no longer treated as valid auth.

0.1.0 (2026-04-02)

added Initial release. Complete Rust SDK for building coding agents.

added 9 crates — cersei (facade), cersei-types, cersei-provider (Anthropic + OpenAI), cersei-tools (34 tools), cersei-tools-derive (proc macro), cersei-agent (builder + agentic loop), cersei-memory (graph + flat files + CLAUDE.md), cersei-hooks (middleware), cersei-mcp (JSON-RPC 2.0).

added Graph memory. Grafeo embedded graph DB. 3 node types, 2 edge types. Recall in 98 microseconds. Graph ON adds zero scan overhead, 92.5% faster recall.

added Agent builder with 20+ options. 26-variant event system. Bidirectional stream control. Auto-compact, effort levels, sub-agents, coordinator mode.

added Session persistence via append-only JSONL with tombstone soft-delete. Compatible with Claude Code format.
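Tombstone soft-delete on an append-only log can be sketched as a two-pass replay: collect every tombstoned id first, then keep the messages that were not deleted. Collecting tombstones up front means a tombstone applies no matter where in the combined sequence it appears, which is also what lets tombstones work across part files once those exist. Entry and replay are hypothetical names, and JSON parsing is elided.

```rust
use std::collections::HashSet;

/// One line of an append-only transcript (JSON parsing elided).
pub enum Entry {
    Message { id: u64, text: String },
    /// Soft-delete marker: hides an earlier message without rewriting the file.
    Tombstone { target: u64 },
}

/// Replay entries in order and return the surviving messages.
pub fn replay(entries: &[Entry]) -> Vec<(u64, &str)> {
    let deleted: HashSet<u64> = entries
        .iter()
        .filter_map(|e| match e {
            Entry::Tombstone { target } => Some(*target),
            _ => None,
        })
        .collect();
    entries
        .iter()
        .filter_map(|e| match e {
            Entry::Message { id, text } if !deleted.contains(id) => Some((*id, text.as_str())),
            _ => None,
        })
        .collect()
}
```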

added 11 examples and 5 stress test suites. 160 unit tests, 262 stress checks.
