Embeddings Cookbook
Recipes for building semantic search, RAG, and custom providers on top of cersei-embeddings.
Practical patterns built on cersei-embeddings.
Semantic search over a folder of Markdown
A self-contained program that indexes every .md file in a directory and answers queries semantically.
use cersei_embeddings::{EmbeddingStore, Metric, OpenAiEmbeddings};
use walkdir::WalkDir;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let provider = OpenAiEmbeddings::from_env()?;
    let store = EmbeddingStore::new(provider, Metric::Cosine)?;

    // Collect every .md file under ./notes and remember the path for each key.
    let mut paths: Vec<String> = Vec::new();
    let mut items: Vec<(u64, String)> = Vec::new();
    for entry in WalkDir::new("./notes").into_iter().filter_map(|e| e.ok()) {
        if entry.path().extension().and_then(|e| e.to_str()) != Some("md") {
            continue;
        }
        let text = std::fs::read_to_string(entry.path())?;
        let key = paths.len() as u64;
        paths.push(entry.path().display().to_string());
        items.push((key, text));
    }
    store.add_batch(&items).await?;

    let hits = store.search("how do I set up CI for rust?", 5).await?;
    for hit in hits {
        println!("{:.3} {}", hit.similarity, paths[hit.key as usize]);
    }
    Ok(())
}

Minimal RAG agent
Combine cersei-embeddings (retrieval) with cersei (LLM) to ground the model's answer in the most relevant snippets.
use cersei::prelude::*;
use cersei_embeddings::{EmbeddingStore, Metric, OpenAiEmbeddings};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Build a corpus store.
    let store = EmbeddingStore::new(OpenAiEmbeddings::from_env()?, Metric::Cosine)?;
    let docs = load_docs(); // Vec<(u64, String)>
    store.add_batch(&docs).await?;

    // 2. Retrieve the top-K snippets.
    // Indexing `docs` by `h.key` assumes each key equals its position in `docs`.
    let question = "Why does the scheduler reject long-running jobs?";
    let hits = store.search(question, 4).await?;
    let context = hits
        .iter()
        .map(|h| docs[h.key as usize].1.clone())
        .collect::<Vec<_>>()
        .join("\n\n---\n\n");

    // 3. Ask the LLM with retrieved context injected.
    let prompt = format!(
        "Context:\n{context}\n\nQuestion: {question}\n\nAnswer using only the context above."
    );
    let output = Agent::builder()
        .provider(OpenAi::from_env()?)
        .system_prompt("You are a precise technical assistant.")
        .run_with(&prompt)
        .await?;
    println!("{}", output.text());
    Ok(())
}

fn load_docs() -> Vec<(u64, String)> {
    // Your loader: read files, DB rows, API responses, etc.
    vec![]
}

For a production RAG setup you'll want to: (a) chunk your documents into pieces of roughly 200–500 tokens before embedding, (b) persist the vector index to disk between runs, and (c) track both the chunk text and source-document metadata so citations work.
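As a rough illustration of points (a) and (c), the sketch below splits a document into overlapping word windows and keeps the source-document index alongside each chunk. It uses only the standard library. `Chunk`, `chunk_words`, and `to_items` are hypothetical names, not part of cersei-embeddings, and whitespace-separated words are only an approximation of tokens, so tune `max_words` against your model's tokenizer.

```rust
/// One retrievable piece of a source document (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    /// Index of the source document in your corpus, kept for citations.
    pub doc_id: usize,
    /// Position of this chunk within its document.
    pub seq: usize,
    /// The text that gets embedded.
    pub text: String,
}

/// Split `text` into chunks of at most `max_words` whitespace-separated words,
/// repeating the last `overlap` words of each chunk at the start of the next.
/// Word count is only a rough stand-in for token count.
pub fn chunk_words(doc_id: usize, text: &str, max_words: usize, overlap: usize) -> Vec<Chunk> {
    assert!(max_words > 0, "max_words must be positive");
    assert!(overlap < max_words, "overlap must be smaller than max_words");

    let words: Vec<&str> = text.split_whitespace().collect();
    let step = max_words - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < words.len() {
        let end = (start + max_words).min(words.len());
        let seq = chunks.len();
        chunks.push(Chunk {
            doc_id,
            seq,
            text: words[start..end].join(" "),
        });
        if end == words.len() {
            break; // this window already reached the end of the document
        }
        start += step;
    }
    chunks
}

/// Flatten chunks into `(key, text)` pairs shaped like the `docs` in the
/// example above. The key is the chunk's index in `chunks`, so a search hit's
/// `key` indexes straight back into the chunk metadata.
pub fn to_items(chunks: &[Chunk]) -> Vec<(u64, String)> {
    chunks
        .iter()
        .enumerate()
        .map(|(i, c)| (i as u64, c.text.clone()))
        .collect()
}
```

With, say, `max_words = 300` and `overlap = 30`, `to_items(&chunks)` can be passed to `add_batch` in place of `docs`, and `chunks[hit.key as usize].doc_id` recovers the source document for a citation.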
Wrapping retrieval in a custom Tool
If you want the agent itself to decide when to retrieve, expose the store as a Cersei Tool.
use async_trait::async_trait;
use cersei::prelude::*;
use cersei_embeddings::{EmbeddingStore, OpenAiEmbeddings};
use std::sync::Arc;

pub struct RetrieverTool {
    store: Arc<EmbeddingStore<OpenAiEmbeddings>>,
    docs: Arc<Vec<(u64, String)>>,
}

#[async_trait]
impl Tool for RetrieverTool {
    fn name(&self) -> &str { "Retrieve" }

    fn description(&self) -> &str {
        "Retrieve the top 4 most relevant snippets from the internal corpus for a natural-language query."
    }

    fn input_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": { "query": { "type": "string" } },
            "required": ["query"]
        })
    }

    async fn execute(&self, input: serde_json::Value, _ctx: &ToolContext) -> ToolResult {
        let query = match input.get("query").and_then(|v| v.as_str()) {
            Some(q) => q,
            None => return ToolResult::error("missing query"),
        };
        let hits = match self.store.search(query, 4).await {
            Ok(h) => h,
            Err(e) => return ToolResult::error(format!("retrieve failed: {e}")),
        };
        let snippets = hits.iter()
            .filter_map(|h| self.docs.get(h.key as usize).map(|(_, t)| t.clone()))
            .collect::<Vec<_>>()
            .join("\n---\n");
        ToolResult::success(snippets)
    }
}

Plug it into your agent:
let retriever = RetrieverTool { store: store.clone(), docs: docs.clone() };
let mut tools = cersei::tools::all();
tools.push(Box::new(retriever));
let agent = Agent::builder()
    .provider(OpenAi::from_env()?)
    .tools(tools)
    .build()?;

Writing a custom provider
Implement the EmbeddingProvider trait for any other API. Cohere is used here as an example.
use async_trait::async_trait;
use cersei_embeddings::{EmbeddingError, EmbeddingProvider};
use serde::Deserialize;

pub struct CohereEmbeddings {
    api_key: String,
    model: String,
    client: reqwest::Client,
}

impl CohereEmbeddings {
    pub fn from_env() -> Result<Self, EmbeddingError> {
        let api_key = std::env::var("COHERE_API_KEY")
            .map_err(|_| EmbeddingError::Config("COHERE_API_KEY missing".into()))?;
        Ok(Self {
            api_key,
            model: "embed-english-v3.0".into(),
            client: reqwest::Client::new(),
        })
    }
}

#[async_trait]
impl EmbeddingProvider for CohereEmbeddings {
    fn name(&self) -> &str { "cohere" }

    fn dimensions(&self) -> usize { 1024 }

    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, EmbeddingError> {
        let resp = self.client.post("https://api.cohere.com/v1/embed")
            .header("Authorization", format!("Bearer {}", self.api_key))
            .json(&serde_json::json!({
                "model": self.model,
                "texts": texts,
                "input_type": "search_document",
            }))
            .send().await?;
        if !resp.status().is_success() {
            let body = resp.text().await.unwrap_or_default();
            return Err(EmbeddingError::Api(format!("cohere: {body}")));
        }
        #[derive(Deserialize)]
        struct R { embeddings: Vec<Vec<f32>> }
        let parsed: R = resp.json().await.map_err(|e| EmbeddingError::Parse(e.to_string()))?;
        Ok(parsed.embeddings)
    }
}

The moment your type implements EmbeddingProvider, it composes with VectorIndex and EmbeddingStore just like the built-in providers.
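Many embedding APIs cap the number of inputs per request, so a custom provider usually has to split large batches itself. A minimal, synchronous sketch of that splitting logic follows. `embed_in_batches` and `max_per_request` are hypothetical names, the closure stands in for one HTTP call, and the cap to pass is whatever your API documents. In an async `embed_batch` the same loop would await each request instead.

```rust
/// Run `embed` over `texts` in slices of at most `max_per_request`,
/// concatenating the results in the original order.
/// Hypothetical helper: `embed` stands in for a single API request.
pub fn embed_in_batches<E>(
    texts: &[String],
    max_per_request: usize,
    mut embed: impl FnMut(&[String]) -> Result<Vec<Vec<f32>>, E>,
) -> Result<Vec<Vec<f32>>, E> {
    assert!(max_per_request > 0, "max_per_request must be positive");

    let mut all = Vec::with_capacity(texts.len());
    for batch in texts.chunks(max_per_request) {
        // Stop at the first failing request and surface its error.
        all.extend(embed(batch)?);
    }
    Ok(all)
}
```

Because `chunks` preserves order, the vector at position `i` of the result still corresponds to `texts[i]`, which is what keyed stores rely on.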
Tuning tips
- Metric choice. Use Cosine for text (length-invariant); it is the default in Abstract's CodeSearch. Use L2 when magnitude matters. Use InnerProduct when you need raw dot-product scoring (e.g., your embeddings are already normalized and you want speed).
- Batch size. Gemini caps at 100 inputs per request (handled automatically). OpenAI accepts far larger batches; keep each request under ~2048 inputs, the array-size limit its embeddings API documents.
- Truncation. The built-in providers truncate each text to 2000 chars by default. If your content is long, chunk before embedding rather than letting truncation silently drop material.
- Index reuse. VectorIndex is cheap to query but expensive to build. For Abstract's CodeSearch this is handled by an in-memory cache keyed on working directory. For your own app, save the vectors to disk (e.g., via bincode) and reload on startup.
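To make the metric choice concrete, here are plain-Rust definitions of the three scores. These are the textbook formulas, written as an illustration rather than taken from cersei-embeddings' implementation, and the function names are hypothetical. They show why cosine ignores vector length and why inner product matches cosine once vectors are normalized to unit length.

```rust
/// Dot product: larger means more similar, and vector length affects the score.
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean length of a vector.
pub fn norm(a: &[f32]) -> f32 {
    inner_product(a, a).sqrt()
}

/// Cosine similarity: the dot product with both lengths divided out.
/// Returns NaN if either vector is all zeros.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    inner_product(a, b) / (norm(a) * norm(b))
}

/// Euclidean (L2) distance: smaller means more similar.
pub fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Scale a non-zero vector to unit length, after which inner product
/// and cosine similarity give the same score.
pub fn normalize(a: &[f32]) -> Vec<f32> {
    let n = norm(a);
    a.iter().map(|x| x / n).collect()
}
```

For `a = [3, 4]` and `b = [6, 8]` (the same direction, twice the length), cosine similarity is 1 while the L2 distance is 5 and the inner product is 50.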