# Abstract CLI Benchmarks

Measures framework overhead (startup, RSS, subcommand latency, end-to-end agentic latency), not model speed. Apple Silicon, release build.
## Startup

| Metric | Abstract | Notes |
|---|---|---|
| --version (avg, 50 runs) | 32ms | Includes Python subprocess overhead (~20ms) |
| --version (native) | ~8ms | Direct wall-clock |
| --help | 21ms | |
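The startup numbers can be reproduced with a small timing harness. A sketch in Python; the actual binary name and flags are assumptions, so a stand-in command is used here:

```python
import subprocess
import time

def avg_startup_ms(cmd, runs=50):
    """Average wall-clock milliseconds to spawn cmd and wait for it to exit."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        total += time.perf_counter() - start
    return total / runs * 1000.0

# The real invocation would be something like avg_startup_ms(["abstract", "--version"]);
# "true" is a stand-in so the sketch runs anywhere.
print(f"avg: {avg_startup_ms(['true'], runs=10):.1f}ms")
```

Note that timing through a Python subprocess inflates each run by roughly the ~20ms overhead called out in the table; timing the binary directly (e.g. with hyperfine) avoids that.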
## Binary & Memory

| Metric | Value |
|---|---|
| Binary size | 6.0 MB |
| Peak RSS (--help) | 4.9 MB |
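Peak RSS of a short-lived process can be sampled from the parent via `getrusage`. A POSIX-only sketch (the binary name is an assumption, replaced by a stand-in):

```python
import resource
import subprocess
import sys

def peak_child_rss_mb(cmd):
    """Run cmd to completion and report the peak RSS of waited-for children, in MB."""
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in KiB on Linux
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024

# Real invocation would be e.g. peak_child_rss_mb(["abstract", "--help"])
print(f"peak RSS: {peak_child_rss_mb(['true']):.2f} MB")
```

Because `RUSAGE_CHILDREN` aggregates across all waited-for children, measure from a fresh process so only the target binary contributes.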
## Subcommand Latency

20 iterations each:

| Subcommand | Avg |
|---|---|
| --help | 21ms |
| sessions list | 21ms |
| config show | 20ms |
| mcp list | 21ms |
| memory show | 22ms |
All subcommands complete in roughly 20-22ms; process startup dominates, and the operation itself adds under 1ms.
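The sub-1ms claim follows from subtracting the --help baseline (pure process startup, no real work) from each subcommand's average:

```python
baseline_ms = 21  # --help average from the table above
subcommand_avgs = {
    "sessions list": 21,
    "config show": 20,
    "mcp list": 21,
    "memory show": 22,
}
for name, avg in subcommand_avgs.items():
    # every difference is within +/-1ms, i.e. measurement noise
    print(f"{name}: {avg - baseline_ms:+d}ms over baseline")
```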
## End-to-End Agentic Latency
Full round-trip: CLI startup + config + system prompt + API call + streaming + rendering.
Using OpenAI gpt-4o:
| Test | Avg (5 runs) |
|---|---|
| Simple response ("say OK") | 1078ms |
| Multi-word response | 1094ms |
| JSON output mode | 1004ms |
Estimated overhead breakdown:
| Phase | Time |
|---|---|
| Process startup | ~8ms |
| Config + memory context | ~2ms |
| System prompt assembly | ~1ms |
| Tool definition serialization | ~1ms |
| HTTP connection + TLS | ~50-100ms |
| Total framework overhead | ~60-110ms |
Everything else is network + model inference.
## Sequential Throughput
10 consecutive prompts, compared across all three tools:
| Tool | Total | Per-request |
|---|---|---|
| Abstract | 15637ms | 1564ms/req |
| Codex CLI | 41518ms | 4152ms/req |
| Claude Code | 120787ms | 12079ms/req |
Abstract processes 10 prompts in 16 seconds. Codex takes 42 seconds. Claude Code takes 2 minutes.
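The sequential run reduces to timing N back-to-back invocations and dividing by N. A sketch; the actual prompt-passing invocation for each tool is an assumption, so a stand-in command is used:

```python
import subprocess
import time

def sequential_throughput_ms(cmd, prompts):
    """Run prompts back-to-back through a CLI; return (total_ms, per_request_ms)."""
    start = time.perf_counter()
    for prompt in prompts:
        subprocess.run(cmd + [prompt],
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    total_ms = (time.perf_counter() - start) * 1000.0
    return total_ms, total_ms / len(prompts)

# Real invocation would pass each prompt to the tool under test;
# "echo" is a stand-in so the sketch runs anywhere.
total, per_req = sequential_throughput_ms(["echo"], [f"prompt {i}" for i in range(10)])
print(f"total {total:.0f}ms, {per_req:.0f}ms/req")
```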
## Run

```sh
./run_tool_bench_claude.sh --iterations 20 --full
./run_tool_bench_codex.sh --iterations 20 --full
```