FerresDB delivers sub-millisecond vector search with hybrid BM25 retrieval, native Cross-Encoder re-ranking, graph exploration, advanced quantization (SQ8 + QJL + PolarQuant), gRPC streaming, tiered storage, Point-in-Time Recovery, and enterprise-grade RBAC — all powered by Rust for uncompromising performance in RAG, semantic search, knowledge graphs, and recommendation systems.
From RAG pipelines to real-time recommendations, FerresDB powers the most demanding vector workloads.
Transform user queries into meaning-based results with Cosine, Euclidean, or Dot Product similarity. Combine with metadata filters for precision retrieval.
Hybrid vector + BM25 search with weighted or RRF fusion, plus native Cross-Encoder re-ranking via ONNX. Ground your LLM responses with the most relevant context.
Real-time similarity matching with WebSocket streaming. Dot Product distance optimized for recommendation models. Auto-batching up to 1000 points/request.
Store graph edges alongside vectors. Traverse connected points via BFS, query subgraphs, and combine vector similarity with graph proximity for richer results.
A complete vector database with enterprise-grade features, built from the ground up in Rust for maximum performance and reliability.
P50 search at 100–500μs, P95 at 200–1000μs. No GC pauses — Rust delivers predictable, low-latency execution with zero runtime overhead.
Combine dense vector search with BM25 text retrieval using weighted fusion or Reciprocal Rank Fusion (RRF). Tunable alpha parameter for precision control.
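Reciprocal Rank Fusion merges two ranked lists by summing 1 / (k + rank) per document across lists. A minimal generic sketch of the formula (not FerresDB's internal code; the constant k = 60 is the commonly used default, assumed here):

```typescript
// RRF: each list contributes 1 / (k + rank) for every document it
// contains; documents ranked well in both lists float to the top.
// k dampens the influence of the very top ranks.
function rrfFuse(vectorIds: string[], bm25Ids: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [vectorIds, bm25Ids]) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Sort by fused score, descending
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document appearing in both lists beats one appearing in only a single list, which is why RRF needs no score normalization between BM25 and cosine scores.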
REST API for simplicity, gRPC with bidirectional streaming for high-throughput, WebSocket for real-time applications, and MCP (Model Context Protocol) via STDIO for Claude Desktop and other AI assistants. All protocols run in parallel.
Automatically move vectors between RAM (Hot), memory-mapped (Warm), and disk (Cold) tiers based on access frequency. HNSW graph stays in memory for speed.
Role-based access control with Admin, Editor, and Viewer roles. Granular per-collection permissions with metadata restrictions. Daily-rotated audit logs.
Rebuild HNSW indexes in the background. Searches continue on the old index until the new one is ready. Auto-triggers when tombstones exceed 20%.
Write-Ahead Log with periodic snapshots every 1000 ops. Automatic crash recovery replays the WAL from the latest snapshot. Auto-save every 30 seconds.
Prometheus metrics endpoint, query profiling with /search/explain, slow query tracking, cost estimation with budget_ms, and a built-in web dashboard.
Fully-typed TypeScript SDK with Zod validation and WebSocket support. Async Python SDK with httpx. Both feature auto-retry, auto-batching, and structured logging.
Native point-level graph: store relations between documents, traverse subgraphs via BFS, and combine graph proximity with vector similarity. Ideal for knowledge graphs and connected recommendations.
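The traversal half of this feature is an ordinary breadth-first search over per-point edges. A self-contained sketch of that idea, assuming edges stored as an adjacency list (this is an illustration, not the server's implementation):

```typescript
// Breadth-first traversal over point-level edges, bounded by a hop
// limit: the same shape of walk a subgraph query performs server-side.
function bfs(
  edges: Map<string, string[]>,
  start: string,
  maxHops: number
): Set<string> {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of edges.get(node) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return visited;
}
```

Combining graph proximity with vector similarity then amounts to restricting (or boosting) search candidates to the set this traversal returns.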
SQ8 scalar quantization (4× memory reduction), QJL residual correction to reduce quantization bias and improve recall@10, and PolarQuant — a calibration-free polar-coordinate encoding.
Dynamically adjusts ef_search every 60 seconds based on real-time P95 latency and CPU load. Increases recall when latency budget permits; backs off under pressure. Zero config required.
Every layer of FerresDB is designed for performance, safety, and operational excellence.
REST, gRPC (port 50051), WebSocket — all running in parallel
RBAC with Admin/Editor/Viewer roles, per-collection permissions
Cosine/Euclidean/DotProduct metrics, metadata filters, hybrid fusion
Hot (RAM) / Warm (mmap) / Cold (disk), auto-save every 30s
Metrics, query profiling, slow queries, daily audit trail (JSONL)
The Hierarchical Navigable Small World index is tuned for an optimal balance of speed and recall.
Tunable parameters: `m` (max connections per layer), `ef_construction` (index build quality), `ef_search` (query search width).

Benchmarked with Criterion.rs — real numbers, not marketing claims.
4× memory reduction with SQ8. QJL residual correction maintains recall without extra RAM.
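Scalar quantization to 8 bits maps each float component onto 256 levels between a min and a max, so 4 bytes per dimension become 1. A simplified per-vector sketch; real implementations often calibrate the range per dimension or per collection, and those details are assumptions here:

```typescript
// SQ8: encode each f32 component as a u8 level in [0, 255] using the
// vector's min/max range (4x memory reduction per dimension).
function sq8Encode(v: number[]): { codes: Uint8Array; min: number; scale: number } {
  const min = Math.min(...v);
  const max = Math.max(...v);
  const scale = (max - min) / 255 || 1; // guard against a constant vector
  const codes = new Uint8Array(v.map((x) => Math.round((x - min) / scale)));
  return { codes, min, scale };
}

function sq8Decode(q: { codes: Uint8Array; min: number; scale: number }): number[] {
  // Reconstruction error per component is at most scale / 2; a residual
  // correction such as QJL targets exactly this remaining bias.
  return [...q.codes].map((c) => q.min + c * q.scale);
}
```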
No GC pauses, zero-cost abstractions, memory safety without runtime overhead. Compiled to native machine code.
Multi-layer graph with O(log N) search complexity. Optimized for high recall with configurable ef_search.
Thread-safe design with parallelized batch operations. Ready for multi-threaded servers and concurrent requests.
Optional caching for repeated queries. On startup, the server replays recent queries from the query log to warm the index and cache automatically.
Distance kernels use AVX2 (8× f32) and SSE4.1 (4× f32) with runtime dispatch. Asymmetric SQ8 distance (f32×u8) also SIMD-accelerated.
ef_search is adjusted every 60 s using P95 latency as a proxy. Automatically increases recall when bandwidth is available, backs off under load.
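The control loop can be sketched as a simple proportional adjuster. Only the 60 s cadence and the P95 signal come from the text above; the step sizes and bounds below are illustrative assumptions:

```typescript
// Adaptive ef_search: on each tick (every 60 s in FerresDB), compare
// observed P95 latency against the budget and nudge ef_search up
// (more recall) or down (less work). Bounds and steps are illustrative.
function nextEfSearch(current: number, p95Us: number, budgetUs: number): number {
  const MIN_EF = 16;
  const MAX_EF = 512;
  if (p95Us > budgetUs) {
    // Over budget: back off quickly to protect latency
    return Math.max(MIN_EF, Math.floor(current * 0.8));
  }
  if (p95Us < budgetUs * 0.5) {
    // Plenty of headroom: raise recall gently
    return Math.min(MAX_EF, current + 16);
  }
  return current; // within band: hold steady
}
```

Backing off multiplicatively while increasing additively keeps the loop stable, in the same spirit as AIMD congestion control.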
See how FerresDB compares to conventional vector databases
| Aspect | FerresDB | Others |
|---|---|---|
| Language | Pure Rust — zero GC, native performance | Python, Go, or Java with GC overhead |
| Search Latency | P50: 100–500μs (sub-millisecond) | Typically 1–50ms per query |
| Search Types | Vector + BM25 hybrid (weighted & RRF fusion) | Often vector-only focus |
| Protocols | REST + gRPC (streaming) + WebSocket | Usually REST or gRPC only |
| Storage | WAL + snapshots + tiered (Hot/Warm/Cold) | Not all offer WAL + crash recovery |
| Security | RBAC + API Keys + JWT + Audit Trail | Varies — often basic API keys only |
| Deployment | Single Docker container, no cloud lock-in | Many are managed-only or heavier |
| Observability | Prometheus + query profiling + dashboard | Depends on the product |
| Quantization | SQ8 (4× mem) + QJL residual + PolarQuant — all opt-in | Rarely built-in; often external preprocessing |
| Graph Support | Native point graph with BFS traversal + subgraph API | Not a standard feature |
| Disaster Recovery | WAL + Snapshots + PITR (restore to any past timestamp) | Basic snapshots at best |
Deploy the full stack with Docker Compose or run individual containers
Recommended — runs Backend + Dashboard together
# 1. Pull both images
docker pull ferresdb/ferres-db-core:latest
docker pull ferresdb/ferres-db-frontend:latest
# 2. Run the backend
docker run -d -p 8080:8080 \
-e FERRESDB_API_KEYS=sk-your-key \
-e CORS_ORIGINS=http://localhost:3000 \
-v ferres-data:/data \
ferresdb/ferres-db-core:latest
# 3. Run the dashboard
docker run -d -p 3000:80 \
-e VITE_API_BASE_URL=http://localhost:8080 \
-e VITE_API_KEY=sk-your-key \
ferresdb/ferres-db-frontend:latest

pnpm add @ferresdb/typescript-sdk

pip install ferres-db-python

From zero to vector search in under 10 lines of code
import { VectorDBClient, DistanceMetric } from "@ferresdb/typescript-sdk";
// Initialize client with auto-retry and timeout
const client = new VectorDBClient({
baseUrl: "http://localhost:8080",
apiKey: "ferres_sk_...",
maxRetries: 3,
});
// Create a collection with hybrid search enabled
await client.createCollection({
name: "documents",
dimension: 384,
distance: DistanceMetric.Cosine,
enable_bm25: true,
});
// Upsert vectors with metadata (auto-batches > 1000)
await client.upsertPoints("documents", [
{ id: "doc-1", vector: [0.1, 0.2, ...], metadata: { text: "Hello" } },
]);
// Hybrid search: vector + BM25 with weighted fusion
const results = await client.hybridSearch("documents", {
query_text: "how to deploy",
query_vector: [0.1, 0.2, ...],
limit: 5,
alpha: 0.5, // 0 = BM25 only, 1 = vector only
});

Security, compliance, and operational features built-in — not bolted on.
API Keys (SHA-256 hashed, stored in SQLite) for programmatic access. JWT tokens (Argon2 passwords) for dashboard sessions.
Admin, Editor, Viewer roles with per-collection permissions. Restrict access to specific metadata fields and allowed values.
Every action logged: searches, mutations, logins, user management. Daily-rotated JSONL files with user, IP, duration, and result.
Use /search/explain to understand query execution. /search/estimate for cost prediction. Slow query tracking for optimization.
Rich filter operators: $eq, $ne, $in, $gt, $lt, $gte, $lte. Combine with vector search for precise, scoped retrieval.
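The operator semantics can be made concrete with a tiny matcher. This is a sketch of how such a filter evaluates against one point's metadata, not the server's code:

```typescript
// Evaluate a filter like { price: { $gt: 10 }, tag: { $in: [...] } }
// against a single point's metadata, covering the operators above.
type Ops = {
  $eq?: unknown; $ne?: unknown; $in?: unknown[];
  $gt?: number; $lt?: number; $gte?: number; $lte?: number;
};

function matches(meta: Record<string, unknown>, filter: Record<string, Ops>): boolean {
  // Every field condition must hold (implicit AND across fields)
  return Object.entries(filter).every(([field, ops]) => {
    const v = meta[field];
    if ("$eq" in ops && v !== ops.$eq) return false;
    if ("$ne" in ops && v === ops.$ne) return false;
    if (ops.$in && !ops.$in.includes(v)) return false;
    if (ops.$gt !== undefined && !(typeof v === "number" && v > ops.$gt)) return false;
    if (ops.$lt !== undefined && !(typeof v === "number" && v < ops.$lt)) return false;
    if (ops.$gte !== undefined && !(typeof v === "number" && v >= ops.$gte)) return false;
    if (ops.$lte !== undefined && !(typeof v === "number" && v <= ops.$lte)) return false;
    return true;
  });
}
```

In a real query the same filter object is passed alongside the vector, so only matching points are scored.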
Set budget_ms on any search query. Automatically fails with 422 if the latency budget is exceeded — perfect for SLA enforcement.
Every WAL entry is timestamped. Restore any collection (or all) to an exact past moment via POST /admin/restore. Browse available timestamps before committing.
One-click snapshot export to AWS S3 (or any S3-compatible endpoint). Configure region, bucket, and credentials via config.toml or environment variables.
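An illustrative config.toml fragment for that section; the exact key names are assumptions and should be checked against the FerresDB docs:

```toml
# Hypothetical S3 snapshot-export section: key names are assumed,
# shown only to make the shape concrete.
[s3]
region = "eu-west-1"
bucket = "ferresdb-snapshots"
endpoint = "https://s3.eu-west-1.amazonaws.com"  # any S3-compatible endpoint
# Credentials may also be supplied via environment variables instead
access_key = "AKIA..."
secret_key = "..."
```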
Foundation for multi-node deployments: optional openraft integration replicates WAL to a majority of nodes before confirming writes. Build with --features raft.
With namespace_physical_isolation, each tenant's data lives in a separate directory. Enables per-namespace snapshots and clean tenant offboarding without affecting others.
Join developers building the next generation of AI applications with FerresDB. Self-hosted, no cloud lock-in.