Built with Rust — Zero GC Pauses

The Fastest
Vector Search Engine

FerresDB delivers sub-millisecond vector search with hybrid BM25 retrieval, native Cross-Encoder re-ranking, graph exploration, advanced quantization (SQ8 + QJL + PolarQuant), gRPC streaming, tiered storage, Point-in-Time Recovery, and enterprise-grade RBAC — all powered by Rust for uncompromising performance in RAG, semantic search, knowledge graphs, and recommendation systems.

<500μs
P50 Search Latency
Sub-millisecond
50K+
Vectors/Second
Indexing throughput
4
REST, gRPC, WS, MCP
Multi-protocol
HNSW
ANN Algorithm
High recall rate
Use Cases

Built for AI-Native Applications

From RAG pipelines to real-time recommendations, FerresDB powers the most demanding vector workloads.

Semantic Search

Transform user queries into meaning-based results with Cosine, Euclidean, or Dot Product similarity. Combine with metadata filters for precision retrieval.

Vector Search
Metadata Filters
Budget-aware
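The three similarity metrics above can be sketched as plain reference implementations (not the engine's SIMD kernels, just the underlying math):

```typescript
// Reference implementations of the Dot Product, Cosine, and Euclidean
// metrics mentioned above — for illustration only.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}
```

Cosine is the usual default for normalized text embeddings; Dot Product suits recommendation models trained without normalization.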

RAG Pipelines

Hybrid vector + BM25 search with weighted or RRF fusion, plus native Cross-Encoder re-ranking via ONNX. Ground your LLM responses with the most relevant context.

Hybrid Search
Cross-Encoder
RRF & Weighted

Recommendations

Real-time similarity matching with WebSocket streaming. Dot Product distance optimized for recommendation models. Auto-batching up to 1000 points/request.

WebSocket
Dot Product
Auto-batch
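Client-side auto-batching amounts to splitting a large upsert into chunks of at most 1000 points per request. A minimal sketch (the `chunk` helper is hypothetical, not the SDK's actual internals):

```typescript
// Hypothetical sketch of SDK auto-batching: split a point list into
// request-sized chunks of at most 1000 items each.
function chunk<T>(items: T[], size = 1000): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```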

Knowledge Graphs

Store graph edges alongside vectors. Traverse connected points via BFS, query subgraphs, and combine vector similarity with graph proximity for richer results.

Graph Traversal
BFS Subgraph
Relations API
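The subgraph traversal described above is, at its core, a depth-bounded BFS over a point-level edge list. An illustrative sketch (the edge-map shape is an assumption, not the Relations API wire format):

```typescript
// Depth-bounded BFS over a point-level adjacency map, in the spirit of
// the subgraph traversal described above.
function bfs(edges: Map<string, string[]>, start: string, maxDepth: number): string[] {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of edges.get(node) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return [...visited];
}
```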
Core Features

Everything You Need, Nothing You Don't

A complete vector database with enterprise-grade features, built from the ground up in Rust for maximum performance and reliability.

Sub-Millisecond Latency

P50 search at 100-500μs, P95 at 200-1000μs. No GC pauses — Rust delivers predictable, low-latency execution with zero runtime overhead.

Hybrid Vector + BM25

Combine dense vector search with BM25 text retrieval using weighted fusion or Reciprocal Rank Fusion (RRF). Tunable alpha parameter for precision control.
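The two fusion modes follow standard formulas. Reciprocal Rank Fusion scores a document as the sum of 1/(k + rank) across rankings (k is commonly 60); weighted fusion blends the two score streams with alpha. A sketch of both (the engine's exact normalization may differ):

```typescript
// RRF: score(d) = sum over rankings of 1 / (k + rank(d)), k commonly 60.
function rrf(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, idx) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + idx + 1));
    });
  }
  return scores;
}

// Weighted fusion: alpha = 1 means vector only, alpha = 0 means BM25 only.
function weighted(vecScore: number, bm25Score: number, alpha: number): number {
  return alpha * vecScore + (1 - alpha) * bm25Score;
}
```

RRF needs no score normalization since it only uses ranks, which is why it is a robust default when the two score distributions differ.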

Multi-Protocol: REST, gRPC, WebSocket, MCP

REST API for simplicity, gRPC with bidirectional streaming for high-throughput, WebSocket for real-time applications, and MCP (Model Context Protocol) via STDIO for Claude Desktop and other AI assistants. All protocols run in parallel.

Tiered Storage (Hot/Warm/Cold)

Automatically move vectors between RAM (Hot), memory-mapped (Warm), and disk (Cold) tiers based on access frequency. HNSW graph stays in memory for speed.
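A toy tier-assignment policy shows the idea; the actual thresholds and migration mechanism are engine-internal, and the numbers below are invented for illustration:

```typescript
type Tier = "hot" | "warm" | "cold";

// Toy access-frequency policy — thresholds are assumptions, not the
// engine's real values.
function assignTier(accessesPerHour: number): Tier {
  if (accessesPerHour >= 100) return "hot";  // keep in RAM
  if (accessesPerHour >= 10) return "warm";  // memory-mapped
  return "cold";                             // disk
}
```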

RBAC + Audit Trail

Role-based access control with Admin, Editor, and Viewer roles. Granular per-collection permissions with metadata restrictions. Daily-rotated audit logs.

Zero-Downtime Reindex

Rebuild HNSW indexes in the background. Searches continue on the old index until the new one is ready. Auto-triggers when tombstones exceed 20%.

WAL + Snapshots

Write-Ahead Log with periodic snapshots every 1000 ops. Automatic crash recovery replays the WAL from the latest snapshot. Auto-save every 30 seconds.
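Crash recovery reduces to: load the last snapshot, then re-apply every WAL entry written after it. A minimal sketch (record shapes are assumptions):

```typescript
// Snapshot + WAL replay: rebuild in-memory state from the latest snapshot,
// then re-apply logged operations in order.
type Op =
  | { kind: "upsert"; id: string; vector: number[] }
  | { kind: "delete"; id: string };

function recover(snapshot: Map<string, number[]>, wal: Op[]): Map<string, number[]> {
  const state = new Map(snapshot);
  for (const op of wal) {
    if (op.kind === "upsert") state.set(op.id, op.vector);
    else state.delete(op.id);
  }
  return state;
}
```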

Full Observability

Prometheus metrics endpoint, query profiling with /search/explain, slow query tracking, cost estimation with budget_ms, and a built-in web dashboard.

Official TypeScript & Python SDKs

Fully-typed TypeScript SDK with Zod validation and WebSocket support. Async Python SDK with httpx. Both feature auto-retry, auto-batching, and structured logging.

Graph Exploration

Native point-level graph: store relations between documents, traverse subgraphs via BFS, and combine graph proximity with vector similarity. Ideal for knowledge graphs and connected recommendations.

Advanced Quantization

SQ8 scalar quantization (4× memory reduction), QJL residual correction to reduce quantization bias and improve recall@10, and PolarQuant — a calibration-free polar-coordinate encoding.
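The SQ8 idea is simple: map each 4-byte f32 to a 1-byte integer in [0, 255] using a per-vector min/max range, which is where the 4× figure comes from. A sketch (QJL and PolarQuant details are engine-internal and not reproduced here):

```typescript
// SQ8 scalar quantization sketch: f32 -> u8 via per-vector min/max range.
function sq8Encode(v: number[]): { codes: Uint8Array; min: number; scale: number } {
  const min = Math.min(...v);
  const max = Math.max(...v);
  const scale = (max - min) / 255 || 1; // avoid division by zero for flat vectors
  const codes = Uint8Array.from(v, (x) => Math.round((x - min) / scale));
  return { codes, min, scale };
}

function sq8Decode(q: { codes: Uint8Array; min: number; scale: number }): number[] {
  return Array.from(q.codes, (c) => q.min + c * q.scale);
}
```

The round-trip error is bounded by half the quantization step, which is the bias that a residual-correction scheme like QJL then shrinks further.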

HNSW Auto-Tuning (FerresEngine)

Dynamically adjusts ef_search every 60 seconds based on real-time P95 latency and CPU load. Increases recall when latency budget permits; backs off under pressure. Zero config required.
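One step of such a feedback loop can be sketched as follows; the thresholds, step sizes, and clamp bounds here are assumptions, not FerresEngine's actual values:

```typescript
// One control step: back off ef_search when P95 exceeds the latency
// budget, raise it when there is ample headroom, otherwise hold steady.
function tuneEfSearch(efSearch: number, p95Us: number, budgetUs: number): number {
  if (p95Us > budgetUs) {
    return Math.max(16, Math.round(efSearch * 0.8)); // over budget: back off
  }
  if (p95Us < budgetUs * 0.5) {
    return Math.min(512, efSearch + 8); // headroom available: raise recall
  }
  return efSearch;
}
```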

Architecture

Engineered for Production

Every layer of FerresDB is designed for performance, safety, and operational excellence.

System Layers

API Layer: Actix-Web + Tonic gRPC

REST, gRPC (port 50051), WebSocket — all running in parallel

Auth Layer: API Keys (SHA-256) + JWT (Argon2)

RBAC with Admin/Editor/Viewer roles, per-collection permissions

Search Engine: HNSW + BM25 + LRU Cache

Cosine/Euclidean/DotProduct metrics, metadata filters, hybrid fusion

Storage Engine: WAL + Snapshots + Tiered Storage

Hot (RAM) / Warm (mmap) / Cold (disk), auto-save every 30s

Observability: Prometheus + OpenTelemetry

Metrics, query profiling, slow queries, daily audit trail (JSONL)

HNSW Parameters

The Hierarchical Navigable Small World index is tuned for an optimal balance of speed and recall.

m (max connections per layer): 16
ef_construction (index build quality): 200
ef_search (query search width): 50

Storage Layout

{STORAGE_PATH}/
├── collections/
│   ├── points.jsonl    # Current state
│   ├── wal.jsonl       # Write-ahead log
│   ├── snapshot.jsonl  # Every 1000 ops
│   └── index.bin       # HNSW index
├── api_keys.db         # SHA-256 hashed
├── users.db            # Argon2 passwords
└── logs/
    └── audit-*.jsonl   # Daily rotation
Benchmarks

Performance That Speaks for Itself

Benchmarked with Criterion.rs — real numbers, not marketing claims

Indexing Throughput

1K vectors (Small): 50K–100K pts/s
10K vectors (Medium): 30K–60K pts/s
100K vectors (Large): 20K–40K pts/s

Search Latency

P50 (Median): 100–500μs
P95 (95th percentile): 200–1000μs
P99 (99th percentile): 500–2000μs

Memory Savings (SQ8)

f32 baseline: 100%
SQ8 compressed: 25%
SQ8 + QJL recall@10: ≥ 90%
PolarQuant (no calibration): 25–30%

4× memory reduction with SQ8. QJL residual correction maintains recall without extra RAM.

Why FerresDB is Fast

Rust Foundation

No GC pauses, zero-cost abstractions, memory safety without runtime overhead. Compiled to native machine code.

HNSW Algorithm

Multi-layer graph with O(log N) search complexity. Optimized for high recall with configurable ef_search.

Parallel with Rayon

Thread-safe design with parallelized batch operations. Ready for multi-threaded servers and concurrent requests.

LRU Search Cache + Warmup

Optional caching for repeated queries. On startup, the server replays recent queries from the query log to warm the index and cache automatically.

SIMD (AVX2 / SSE4.1)

Distance kernels use AVX2 (8× f32) and SSE4.1 (4× f32) with runtime dispatch. Asymmetric SQ8 distance (f32×u8) also SIMD-accelerated.

FerresEngine Auto-Tuning

ef_search is adjusted every 60 s using P95 latency as a proxy. Automatically increases recall when bandwidth is available, backs off under load.

FerresDB vs The Rest

See how FerresDB compares to conventional vector databases

| Aspect | FerresDB | Others |
|---|---|---|
| Language | Pure Rust — zero GC, native performance | Python, Go, or Java with GC overhead |
| Search Latency | P50: 100–500μs (sub-millisecond) | Typically 1–50ms per query |
| Search Types | Vector + BM25 hybrid (weighted & RRF fusion) | Often vector-only focus |
| Protocols | REST + gRPC (streaming) + WebSocket | Usually REST or gRPC only |
| Storage | WAL + snapshots + tiered (Hot/Warm/Cold) | Not all offer WAL + crash recovery |
| Security | RBAC + API Keys + JWT + Audit Trail | Varies — often basic API keys only |
| Deployment | Single Docker container, no cloud lock-in | Many are managed-only or heavier |
| Observability | Prometheus + query profiling + dashboard | Depends on the product |
| Quantization | SQ8 (4× mem) + QJL residual + PolarQuant — all opt-in | Rarely built-in; often external preprocessing |
| Graph Support | Native point graph with BFS traversal + subgraph API | Not a standard feature |
| Disaster Recovery | WAL + Snapshots + PITR (restore to any past timestamp) | Basic snapshots at best |
Quick Start

Up and Running in 60 Seconds

Deploy the full stack with Docker Compose or run individual containers

Docker Compose

Recommended — runs Backend + Dashboard together

Recommended
terminal
# 1. Pull both images
docker pull ferresdb/ferres-db-core:latest
docker pull ferresdb/ferres-db-frontend:latest

# 2. Run the backend
docker run -d -p 8080:8080 \
  -e FERRESDB_API_KEYS=sk-your-key \
  -e CORS_ORIGINS=http://localhost:3000 \
  -v ferres-data:/data \
  ferresdb/ferres-db-core:latest

# 3. Run the dashboard
docker run -d -p 3000:80 \
  -e VITE_API_BASE_URL=http://localhost:8080 \
  -e VITE_API_KEY=sk-your-key \
  ferresdb/ferres-db-frontend:latest
API: http://localhost:8080
Dashboard: http://localhost:3000

Install an SDK

TypeScript

npm
pnpm add @ferresdb/typescript-sdk
Full type safety + Zod validation
WebSocket support + auto-retry
ESM & CJS exports

Python

PyPI
pip install ferres-db-python
AsyncIO with httpx
Auto-batching + structured logs
Python 3.8+ support
Developer Experience

Simple, Powerful API

From zero to vector search in under 10 lines of code

example.ts — TypeScript SDK
import { VectorDBClient, DistanceMetric } from "@ferresdb/typescript-sdk";

// Initialize client with auto-retry and timeout
const client = new VectorDBClient({
  baseUrl: "http://localhost:8080",
  apiKey: "ferres_sk_...",
  maxRetries: 3,
});

// Create a collection with hybrid search enabled
await client.createCollection({
  name: "documents",
  dimension: 384,
  distance: DistanceMetric.Cosine,
  enable_bm25: true,
});

// Upsert vectors with metadata (auto-batches > 1000)
await client.upsertPoints("documents", [
  { id: "doc-1", vector: [0.1, 0.2, ...], metadata: { text: "Hello" } },
]);

// Hybrid search: vector + BM25 with weighted fusion
const results = await client.hybridSearch("documents", {
  query_text: "how to deploy",
  query_vector: [0.1, 0.2, ...],
  limit: 5,
  alpha: 0.5, // 0 = BM25 only, 1 = vector only
});
Enterprise Ready

Production-Grade from Day One

Security, compliance, and operational features built-in — not bolted on.

Dual Authentication

API Keys (SHA-256 hashed, stored in SQLite) for programmatic access. JWT tokens (Argon2 passwords) for dashboard sessions.

Granular RBAC

Admin, Editor, Viewer roles with per-collection permissions. Restrict access to specific metadata fields and allowed values.

Audit Trail

Every action logged: searches, mutations, logins, user management. Daily-rotated JSONL files with user, IP, duration, and result.

Query Profiling

Use /search/explain to understand query execution. /search/estimate for cost prediction. Slow query tracking for optimization.

Metadata Filters

Rich filter operators: $eq, $ne, $in, $gt, $lt, $gte, $lte. Combine with vector search for precise, scoped retrieval.
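The operators above follow the familiar Mongo-style semantics. A plain predicate sketch of how such a filter can be evaluated (actual server-side evaluation may differ in edge cases):

```typescript
// Reference semantics for the filter operators listed above, as a plain
// client-side predicate over a point's metadata.
type Filter = Record<string, Record<string, unknown>>;

function matches(metadata: Record<string, unknown>, filter: Filter): boolean {
  return Object.entries(filter).every(([field, ops]) =>
    Object.entries(ops).every(([op, val]) => {
      const v = metadata[field];
      switch (op) {
        case "$eq": return v === val;
        case "$ne": return v !== val;
        case "$in": return Array.isArray(val) && val.includes(v);
        case "$gt": return (v as number) > (val as number);
        case "$lt": return (v as number) < (val as number);
        case "$gte": return (v as number) >= (val as number);
        case "$lte": return (v as number) <= (val as number);
        default: return false; // unknown operator: reject
      }
    })
  );
}
```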

Budget-Aware Search

Set budget_ms on any search query. The request automatically fails with HTTP 422 if the latency budget is exceeded — perfect for SLA enforcement.
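The enforcement logic boils down to comparing an estimated (or elapsed) cost against the caller's budget. A sketch, with the 422 mapping made explicit (the estimation step itself is engine-internal):

```typescript
// Budget enforcement sketch: reject with a 422-style status when the
// query cannot meet its latency budget; pass through otherwise.
function enforceBudget(estimatedMs: number, budgetMs?: number): { status: number } {
  if (budgetMs !== undefined && estimatedMs > budgetMs) {
    return { status: 422 }; // Unprocessable: latency budget exceeded
  }
  return { status: 200 };
}
```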

Point-in-Time Recovery

Every WAL entry is timestamped. Restore any collection (or all) to an exact past moment via POST /admin/restore. Browse available timestamps before committing.

S3 Cloud Backup

One-click snapshot export to AWS S3 (or any S3-compatible endpoint). Configure region, bucket, and credentials via config.toml or environment variables.

Distributed Consensus (Raft)

Foundation for multi-node deployments: optional openraft integration replicates WAL to a majority of nodes before confirming writes. Build with --features raft.

Namespace Physical Isolation

With namespace_physical_isolation, each tenant's data lives in a separate directory. Enables per-namespace snapshots and clean tenant offboarding without affecting others.

Community

Support us on Product Hunt

FerresDB is on Product Hunt. Your upvote and feedback help us reach more developers building AI applications. Click the badge below to visit our page.


Ready for Blazing-Fast Vector Search?

Join developers building the next generation of AI applications with FerresDB. Self-hosted, no cloud lock-in.