Introduction
"The SQLite of vector databases — embed it in your application, no server required."
This is article #98 in the Open Source Project of the Day series. Today's project is Zvec — an in-process vector database from Alibaba's Tongyi Lab.
Building a RAG application means dealing with a vector database. The dominant deployment pattern for vector databases is an external service: Pinecone is a cloud service, Milvus/Qdrant/Weaviate are standalone servers. That means maintaining an extra infrastructure component per application, paying network call overhead, carrying operational burden, and ruling out notebooks and edge devices entirely.
Zvec takes the opposite approach: install it with pip install zvec and it runs inside your process, with no daemon, no network calls, and no server configuration. Underneath is Alibaba's Proxima engine, the production vector search infrastructure battle-tested across Alibaba's internal systems.
v0.5.0 (June 2026) added native full-text search and hybrid queries, turning it from a pure vector search tool into a retrieval engine that can replace several infrastructure components at once.
What You'll Learn
- In-process vs. client-server: the design tradeoffs and which scenarios each fits
- Zvec's index types: when to use HNSW vs. FAISS vs. DiskANN vs. sparse vectors
- Hybrid queries in v0.5.0: combining vector similarity + full-text search + scalar filters in one query
- DiskANN: how to fit a billion-vector database into a memory-constrained machine
- Performance data: VectorDBBench QPS numbers and comparisons with Pinecone and others
- Five-language SDK coverage from server-side to mobile
Prerequisites
- Familiarity with vector embeddings and similarity search
- Experience with RAG application development or vector database usage
- Python familiarity; other SDK languages as context
Project Background
What Is Zvec?
Zvec is an open-source in-process vector database — "lightweight, lightning-fast, and designed to embed directly into applications." The underlying engine is Alibaba Proxima, the production vector search infrastructure behind Alibaba's image search and recommendation systems at scale.
In-process is the key phrase. Vector search happens inside the application process — the same way SQLite handles relational data — without a network hop, without an external process, without a service to manage.
Author / Team
- Team: Alibaba Tongyi Lab
- Underlying engine: Alibaba Proxima (production-grade vector search engine)
- License: Apache-2.0
- Latest version: v0.5.0 (June 2026)
Project Stats
- ⭐ GitHub Stars: 10,500+
- 🍴 Forks: 607+
- 📅 First release: December 2025
- 📄 License: Apache-2.0
Core Features
What It Does
Traditional approach (external service):
App process → network request → vector DB server → result returned
                    ↓
    latency overhead + operational burden + no edge/mobile
Zvec (in-process):
App process
└── Zvec library (in-process)
    ├── Vector index (memory / disk)
    ├── Full-text index (v0.5.0)
    └── Scalar filters
Returns results directly, no network overhead
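To make the in-process idea concrete, here is a minimal pure-Python sketch of exact nearest-neighbor search over an in-memory dictionary. It is not Zvec's API or algorithm (Zvec relies on approximate indexes such as HNSW); the names `store`, `cosine_similarity`, and `top_k` are hypothetical, and the toy vectors are made up. The only point is that a query is a plain function call inside the application, with no network hop.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(store, query, k):
    """Exact top-k search: score every stored vector, keep the k best."""
    scored = ((cosine_similarity(vec, query), doc_id) for doc_id, vec in store.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


# Hypothetical toy data: three 3-dimensional "embeddings".
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

hits = top_k(store, query=[1.0, 0.0, 0.0], k=2)
print(hits)
```

Scanning every vector is linear in the collection size, which is why a real engine replaces the loop with an index; the calling pattern stays the same.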
Use Cases
- Local RAG applications: Build on-device RAG systems without cloud service dependencies
- Notebook prototyping: Use in data science workflows without configuring an external service
- Mobile AI: Dart/Flutter SDK supports on-device vector search on Android and iOS
- Production service embedding: Embed vector search directly into Python/Go/Rust services, reducing infrastructure components
- Agent memory: MCP integration (v0.3.0) provides local vector memory for AI agents
Quick Start
Install:
pip install zvec
# With Zvec Studio visual tool
pip install zvec-studio
Basic usage:
import zvec
# Define schema
schema = zvec.CollectionSchema(
    name="articles",
    vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 1536),
    fields=[
        zvec.FieldSchema("title", zvec.DataType.STRING),
        zvec.FieldSchema("content", zvec.DataType.STRING),
        zvec.FieldSchema("score", zvec.DataType.FLOAT),
    ],
)
# Create and open collection
collection = zvec.create_and_open(path="./my_rag_db", schema=schema)
# Insert documents
collection.insert([
    zvec.Doc(
        id="doc_1",
        vectors={"embedding": [0.1, 0.2, ...]},  # 1536-dim vector
        fields={"title": "AI Basics", "content": "...", "score": 9.5},
    )
])
# Pure vector search (query_embedding is a 1536-dim list of floats from your embedding model)
results = collection.query(
    zvec.VectorQuery("embedding", vector=query_embedding),
    topk=10,
)
Hybrid query (v0.5.0):
# Combine vector similarity + full-text search + scalar filter in one query
results = collection.query(
    zvec.MultiQuery(
        queries=[
            zvec.VectorQuery("embedding", vector=query_embedding, weight=0.7),
            zvec.FTSQuery("content", text="machine learning", weight=0.3),
        ],
        filter="score > 8.0",        # Scalar filter
        fusion=zvec.FusionType.RRF,  # Reciprocal Rank Fusion
    ),
    topk=10,
)
Full-text search:
# Create a full-text index on a field
collection.create_index("content", zvec.IndexType.FULL_TEXT)
# Full-text query
results = collection.query(
    zvec.FTSQuery("content", text="vector database"),
    topk=10,
)
Node.js:
import { createCollection, VectorQuery } from '@zvec/zvec';
const collection = await createCollection({
  path: './my_db',
  schema: { name: 'docs', vectors: { embedding: { dim: 1536 } } }
});
const results = await collection.query(
  new VectorQuery('embedding', queryVector),
  { topk: 10 }
);
Multi-Language SDKs
| Language | Install | Best for |
|---|---|---|
| Python | pip install zvec | Data science, RAG services, notebooks |
| Node.js | npm install @zvec/zvec | Web backends, full-stack apps |
| Go | go get github.com/alibaba/zvec-go | High-performance backend services |
| Rust | crates.io: zvec | System-level apps, performance-critical paths |
| Dart/Flutter | flutter pub add zvec | Android / iOS mobile |
Deep Dive
The In-Process Tradeoff
Zvec's core architectural decision is to abandon the client-server model for in-process embedding. That choice has specific costs and benefits:
Benefits:
- Zero network latency: vector search happens in memory or on local disk, no round trip
- Zero operational burden: no vector database service to deploy, monitor, or upgrade
- Portability: application binary + data files deploy to anything with a filesystem
- Edge and mobile: Dart SDK supports on-device vector search on Android and iOS
Costs:
- No shared vector store across services: only one process can write, and other processes can open the store read-only
- No cross-network access (this is an architectural property, not a defect)
For most RAG application prototypes, edge deployments, and single-service architectures, the tradeoff is favorable. Multi-replica, distributed query scenarios are better served by Milvus or Qdrant.
Index Types
HNSW (Hierarchical Navigable Small World): Default index. Graph-based approximate nearest neighbor search. Best for memory-rich environments that need maximum query speed. Sub-linear query time, longer index build time.
FAISS: Interface to Facebook AI Research's vector search library. Multiple quantization strategies (IVF, PQ) for flexible precision-speed tradeoffs.
DiskANN (new in v0.5.0): Stores the vector index on disk rather than in memory, dramatically reducing RAM requirements. The design comes from Microsoft Research's DiskANN paper. Enables billion-scale vector collections on ordinary servers, at the cost of slightly higher latency than pure in-memory indexes.
Sparse vectors: Stores and retrieves sparse representations such as those produced by BM25 or SPLADE, useful for keyword-semantic hybrid retrieval.
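The memory argument for a disk-resident index such as DiskANN can be checked with back-of-envelope arithmetic. The sketch below assumes one billion 768-dimensional float32 vectors and a hypothetical 32-byte compressed code per vector kept in RAM, following the layout described in the DiskANN paper, where compressed vectors stay in memory and full-precision vectors live on SSD. Whether Zvec's implementation uses the same layout or code size is an assumption, and graph overhead is ignored.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Bytes needed to hold every full-precision vector (float32 by default)."""
    return num_vectors * dim * bytes_per_value


def compressed_bytes(num_vectors, code_bytes):
    """Bytes needed to hold one fixed-size compressed code per vector."""
    return num_vectors * code_bytes


NUM_VECTORS = 1_000_000_000  # "billion-scale" collection
DIM = 768                    # assumed embedding dimensionality
CODE_BYTES = 32              # assumed compressed code size, not a Zvec setting

full = raw_vector_bytes(NUM_VECTORS, DIM)
codes = compressed_bytes(NUM_VECTORS, CODE_BYTES)

print(f"full-precision vectors: {full / 1e12:.2f} TB")  # must fit in RAM for a pure in-memory index
print(f"compressed codes:       {codes / 1e9:.0f} GB")  # what stays in RAM in the disk-based layout
print(f"reduction factor:       {full // codes}x")
```

Under these assumptions the resident set drops from terabytes to tens of gigabytes, which is the difference between a specialized machine and an ordinary server.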
Hybrid Query Design
v0.5.0's hybrid queries integrate three retrieval modes into a single MultiQuery interface:
MultiQuery
├── VectorQuery (dense vector similarity)
│   └── Semantic similarity matching
├── FTSQuery (full-text search)
│   └── Keyword exact matching
└── ScalarFilter (structured conditions)
    └── Dates, scores, categories
Fusion strategies:
├── RRF (Reciprocal Rank Fusion): recommended, balances multi-source results
└── Weighted: configurable per-source weights
The practical value: real RAG queries are rarely pure vector similarity. Users routinely need "semantically related + contains specific keyword + from the last 30 days." Splitting into three separate queries and manually merging the ranked lists is expensive and produces suboptimal results. MultiQuery + RRF handles all three in one operation.
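For readers who have not seen Reciprocal Rank Fusion, the following is a minimal standalone sketch of the general technique: each document earns 1 / (k + rank) from every ranked list it appears in, and the sums decide the final order. This is not Zvec's internal implementation. The function name, the document IDs, and the constant k = 60 (the value used in the original RRF paper) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each list contributes 1 / (k + rank) per document, with rank starting at 1.
    Ties are broken by document ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from two retrievers for the same query.
vector_hits = ["doc_3", "doc_1", "doc_7"]   # dense similarity order
keyword_hits = ["doc_1", "doc_9", "doc_3"]  # full-text order

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF only looks at ranks, it avoids having to normalize cosine similarities against keyword scores, which live on different scales. Documents that appear in both lists rise above those that appear in only one.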
Performance
On VectorDBBench (Cohere 10M dataset, 768 dimensions, 10 million vectors):
- Zvec measured at 8,000+ QPS, more than 2× the previous leaderboard leader (ZillizCloud)
- Self-reported: approximately 7× faster query throughput than Pinecone
Performance sources:
- SIMD instructions: AVX-512/AVX2 for vector distance computation. AVX-512 processes 16 float32 values per instruction; AVX2 processes 8
- Multithreading: Both index building and query execution parallelize across cores
- Cache-friendly memory layout: Reduces CPU cache misses during traversal
- RabitQ quantization (v0.3.0): Compressed vector storage with minimal precision loss
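The lane counts behind the SIMD point are simple arithmetic: register width divided by 32 bits per float32. The sketch below works this out for a 768-dimensional vector, the dimensionality of the benchmark dataset. The function names are illustrative, and real distance kernels add unrolling and fused multiply-add details that are not modeled here.

```python
FLOAT32_BITS = 32


def float32_lanes(register_bits):
    """How many float32 values fit in one SIMD register."""
    return register_bits // FLOAT32_BITS


def register_loads_per_vector(dim, register_bits):
    """Register-wide loads needed to sweep one vector of `dim` floats (ceiling division)."""
    lanes = float32_lanes(register_bits)
    return -(-dim // lanes)


DIM = 768
for name, bits in (("AVX2", 256), ("AVX-512", 512)):
    print(f"{name}: {float32_lanes(bits)} lanes, "
          f"{register_loads_per_vector(DIM, bits)} loads per {DIM}-dim vector")
```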
Version Velocity
Zvec went from its initial release in December 2025 to v0.5.0 in June 2026: five releases in roughly six months.
| Version | Date | Key Changes |
|---|---|---|
| v0.1.0 | Dec 2025 | Initial release, HNSW baseline |
| v0.2.0 | Feb 2026 | ARM64 Linux, unified search interface |
| v0.3.0 | Apr 2026 | Windows support, RabitQ quantization, MCP/Agent integration |
| v0.4.0 | May 2026 | Dart/Flutter SDK, Android/iOS support |
| v0.5.0 | Jun 2026 | FTS, hybrid queries, DiskANN, Go/Rust SDKs, Zvec Studio |
The MCP integration in v0.3.0 is worth noting: AI agents can access Zvec vector memory directly through MCP, with no adapter layer required.
Zvec Studio
pip install zvec-studio installs a visual interface for browsing collections, executing queries, and debugging vector search results — no code required. Useful for RAG application debugging and data exploration.
Links and Resources
Official Resources
- 🌟 GitHub: alibaba/zvec
- 🌐 Website: zvec.org
- 📖 Documentation: zvec.org/en/docs
- 📊 Benchmarks: zvec.org/en/docs/db/benchmarks/
Related Projects
- Zvec Studio: pip install zvec-studio (visual management tool)
- zvec-go: Go SDK
- zvec-rust: Rust SDK
- Alibaba Proxima: The production vector search engine underlying Zvec
Conclusion
Zvec's positioning is precise: deliver production-grade vector search as an embedded library, eliminating the deployment and operational complexity of vector databases.
The SQLite analogy holds. SQLite turned relational databases from "requires a configured server" into "one file, works anywhere." Zvec does the same for vector search.
v0.5.0's hybrid query interface — combining vector similarity, full-text search, and scalar filtering — has concrete value for RAG applications. Real queries tend to be combinations of all three constraint types. Previously that required three separate systems (vector database + search engine + relational database) and manual result merging. Now it's one operation in an embedded library.
The Dart SDK extending coverage to Android and iOS signals where this is heading: on-device RAG without routing vector search back to a server.
With 10.5k stars and five releases in roughly six months, Zvec is iterating rapidly and worth watching closely.