WonderLab

Posted on Jun 18

Open Source Project of the Day (#98): Zvec, Alibaba's Embedded Vector Database, the SQLite of Vector Search

#opensource #vectordatabase #ai

Introduction

"The SQLite of vector databases — embed it in your application, no server required."

This is article #98 in the Open Source Project of the Day series. Today's project is Zvec — an in-process vector database from Alibaba's Tongyi Lab.

Building a RAG application means dealing with a vector database. The dominant deployment pattern for vector databases is an external service: Pinecone is a cloud service, and Milvus, Qdrant, and Weaviate are standalone servers. That means maintaining an extra infrastructure component per application, paying network call overhead, and carrying operational burden. It also makes notebooks and edge devices awkward or impractical targets.

Zvec takes the opposite approach: you install it with pip install zvec and it runs inside your process, with no daemon, no network calls, and no server configuration. Underneath is Alibaba's Proxima engine, the production vector search infrastructure battle-tested across Alibaba's internal systems.

v0.5.0 (June 2026) added native full-text search and hybrid queries, turning it from a pure vector search tool into a retrieval engine that can replace several infrastructure components at once.

What You'll Learn

In-process vs. client-server: the design tradeoffs and which scenarios each fits
Zvec's index types: when to use HNSW vs. FAISS vs. DiskANN vs. sparse vectors
Hybrid queries in v0.5.0: combining vector similarity + full-text search + scalar filters in one query
DiskANN: how to fit a billion-vector database into a memory-constrained machine
Performance data: VectorDBBench QPS numbers and comparisons with Pinecone and others
Five-language SDK coverage from server-side to mobile

Prerequisites

Familiarity with vector embeddings and similarity search
Experience with RAG application development or vector database usage
Python familiarity; other SDK languages as context

Project Background

What Is Zvec?

Zvec is an open-source in-process vector database — "lightweight, lightning-fast, and designed to embed directly into applications." The underlying engine is Alibaba Proxima, the production vector search infrastructure behind Alibaba's image search and recommendation systems at scale.

In-process is the key phrase. Vector search happens inside the application process — the same way SQLite handles relational data — without a network hop, without an external process, without a service to manage.

Author / Team

Team: Alibaba Tongyi Lab
Underlying engine: Alibaba Proxima (production-grade vector search engine)
License: Apache-2.0
Latest version: v0.5.0 (June 2026)

Project Stats

⭐ GitHub Stars: 10,500+
🍴 Forks: 607+
📅 First release: December 2025
📄 License: Apache-2.0

Core Features

What It Does

Traditional approach (external service):
App process → network request → vector DB server → result returned
                  ↓
           latency overhead + operational burden + no edge/mobile

Zvec (in-process):
App process
  └── Zvec library (in-process)
        ├── Vector index (memory / disk)
        ├── Full-text index (v0.5.0)
        └── Scalar filters
  Returns results directly, no network overhead

Use Cases

Local RAG applications: Build on-device RAG systems without cloud service dependencies
Notebook prototyping: Use in data science workflows without configuring an external service
Mobile AI: Dart/Flutter SDK supports on-device vector search on Android and iOS
Production service embedding: Embed vector search directly into Python/Go/Rust services, reducing infrastructure components
Agent memory: MCP integration (v0.3.0) provides local vector memory for AI agents

Quick Start

Install:

pip install zvec

# With Zvec Studio visual tool
pip install zvec-studio

Basic usage:

import zvec

# Define schema
schema = zvec.CollectionSchema(
    name="articles",
    vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 1536),
    fields=[
        zvec.FieldSchema("title", zvec.DataType.STRING),
        zvec.FieldSchema("content", zvec.DataType.STRING),
        zvec.FieldSchema("score", zvec.DataType.FLOAT),
    ]
)

# Create and open collection
collection = zvec.create_and_open(path="./my_rag_db", schema=schema)

# Insert documents
collection.insert([
    zvec.Doc(
        id="doc_1",
        vectors={"embedding": [0.1, 0.2, ...]},  # 1536-dim vector
        fields={"title": "AI Basics", "content": "...", "score": 9.5}
    )
])

# Pure vector search
# (query_embedding is a 1536-dim list of floats produced by your embedding model)
results = collection.query(
    zvec.VectorQuery("embedding", vector=query_embedding),
    topk=10
)
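The snippets in this section use query_embedding without defining it. In a real application it comes from an embedding model. For trying the API locally before a model is wired in, a stand-in along these lines may be enough. Note that placeholder_embedding is a hypothetical helper, not part of Zvec, and its output carries no semantic meaning: it only produces a deterministic unit-length vector of the right dimension.

```python
import hashlib
import math
import random


def placeholder_embedding(text: str, dim: int = 1536) -> list[float]:
    """Deterministic unit-length pseudo-embedding (hypothetical helper, NOT semantic)."""
    # Seed a private RNG from a hash of the text so equal texts map to equal vectors.
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Normalize to unit length so cosine and inner-product scores agree.
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


# Matches the 1536-dim "embedding" vector field declared in the schema above.
query_embedding = placeholder_embedding("what is a vector database?")
print(len(query_embedding))
```

Swap the helper for a real embedding call once one is available; the vector dimension must keep matching the schema.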

Hybrid query (v0.5.0):

# Combine vector similarity + full-text search + scalar filter in one query
# (FTSQuery presumably needs the full-text index on "content" from the next snippet)
results = collection.query(
    zvec.MultiQuery(
        queries=[
            zvec.VectorQuery("embedding", vector=query_embedding, weight=0.7),
            zvec.FTSQuery("content", text="machine learning", weight=0.3),
        ],
        filter="score > 8.0",      # Scalar filter
        fusion=zvec.FusionType.RRF  # Reciprocal Rank Fusion
    ),
    topk=10
)

Full-text search:

# Create a full-text index on a field
collection.create_index("content", zvec.IndexType.FULL_TEXT)

# Full-text query
results = collection.query(
    zvec.FTSQuery("content", text="vector database"),
    topk=10
)

Node.js:

import { createCollection, VectorQuery } from '@zvec/zvec';

const collection = await createCollection({
  path: './my_db',
  schema: { name: 'docs', vectors: { embedding: { dim: 1536 } } }
});

const results = await collection.query(
  new VectorQuery('embedding', queryVector),
  { topk: 10 }
);

Multi-Language SDKs

Language	Install	Best for
Python	`pip install zvec`	Data science, RAG services, notebooks
Node.js	`npm install @zvec/zvec`	Web backends, full-stack apps
Go	`go get github.com/alibaba/zvec-go`	High-performance backend services
Rust	crates.io: `zvec`	System-level apps, performance-critical paths
Dart/Flutter	`flutter pub add zvec`	Android / iOS mobile

Deep Dive

The In-Process Tradeoff

Zvec's core architectural decision is to abandon the client-server model for in-process embedding. That choice has specific costs and benefits:

Benefits:

Zero network latency: vector search happens in memory or on local disk, no round trip
Zero operational burden: no vector database service to deploy, monitor, or upgrade
Portability: application binary + data files deploy to anything with a filesystem
Edge and mobile: Dart SDK supports on-device vector search on Android and iOS

Costs:

No shared vector store across multiple services: a single process writes, and other processes get read-only access
No cross-network access (this is an architectural property, not a defect)

For most RAG application prototypes, edge deployments, and single-service architectures, the tradeoff is a good one. Multi-replica, distributed query scenarios are better served by Milvus or Qdrant.

Index Types

HNSW (Hierarchical Navigable Small World): Default index. Graph-based approximate nearest neighbor search. Best for memory-rich environments that need maximum query speed. Sub-linear query time, longer index build time.

FAISS: Interface to Facebook AI Research's vector search library. Multiple quantization strategies (IVF, PQ) for flexible precision-speed tradeoffs.

DiskANN (new in v0.5.0): Stores the vector index on disk rather than in memory, dramatically reducing RAM requirements. The design comes from Microsoft Research's DiskANN paper. Enables billion-scale vector collections on ordinary servers, at the cost of slightly higher latency than pure in-memory indexes.

Sparse vectors: Stores and retrieves sparse representations from BM25 and SPLADE, useful for keyword-semantic hybrid retrieval.
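To see why a disk-resident index such as DiskANN matters at scale, it helps to work out how much RAM the raw vectors alone would occupy. The sketch below is plain arithmetic, assuming float32 values (4 bytes each) and the 768-dimension size used in the benchmark section. It ignores index overhead such as graph links, so real in-memory footprints would be higher.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the raw vectors alone (float32 by default), excluding index overhead."""
    return num_vectors * dim * bytes_per_value


GIB = 1024 ** 3

for label, n in [("10 million", 10_000_000), ("1 billion", 1_000_000_000)]:
    size = raw_vector_bytes(n, dim=768)
    print(f"{label} x 768-dim float32: {size / GIB:,.1f} GiB")
```

Ten million vectors come to roughly 29 GiB, which fits on a large server. A billion come to roughly 2.8 TiB, which is beyond the RAM of ordinary machines and is the case a disk-based index targets.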

Hybrid Query Design

v0.5.0's hybrid queries integrate three retrieval modes into a single MultiQuery interface:

MultiQuery
    ├── VectorQuery (dense vector similarity)
    │   └── Semantic similarity matching
    ├── FTSQuery (full-text search)
    │   └── Keyword exact matching
    └── ScalarFilter (structured conditions)
        └── Dates, scores, categories

Fusion strategies:
    ├── RRF (Reciprocal Rank Fusion) — recommended, balances multi-source results
    └── Weighted — configurable per-source weights

The practical value: real RAG queries are rarely pure vector similarity. Users routinely need "semantically related + contains specific keyword + from the last 30 days." Splitting into three separate queries and manually merging the ranked lists is expensive and produces suboptimal results. MultiQuery + RRF handles all three in one operation.
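For intuition about what the fusion step does, here is a minimal sketch of weighted Reciprocal Rank Fusion in plain Python. It is a generic illustration of the technique, not Zvec's implementation: the function name rrf_fuse, the constant k=60 (a value commonly used for RRF), the way weights are applied, and the choice to filter before fusing are all assumptions made for this example.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60, topk=10):
    """Weighted Reciprocal Rank Fusion over several ranked lists of document ids.

    rankings: list of lists, each ordered best-first.
    weights:  optional per-list weights (defaults to 1.0 each).
    k:        smoothing constant that dampens the influence of top ranks.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) for every document it contains.
            scores[doc_id] += weight / (k + rank)
    # Sort by fused score, descending; break ties by id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:topk]


vector_hits = ["doc_3", "doc_1", "doc_7"]   # hypothetical dense-vector ranking
keyword_hits = ["doc_1", "doc_9", "doc_3"]  # hypothetical full-text ranking
allowed = {"doc_1", "doc_3", "doc_9"}       # ids passing a hypothetical scalar filter

# Apply the scalar filter to each list first, then fuse the surviving ranks.
filtered = [[d for d in hits if d in allowed] for hits in (vector_hits, keyword_hits)]
fused = rrf_fuse(filtered, weights=[0.7, 0.3])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Because RRF works on ranks rather than raw scores, it can merge lists whose scores are on incomparable scales, such as cosine similarity and a keyword relevance score.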

Performance

On VectorDBBench (Cohere 10M dataset, 768 dimensions, 10 million vectors):

Zvec measured at 8,000+ QPS, more than 2× the previous leaderboard leader (ZillizCloud)
Self-reported: approximately 7× faster query throughput than Pinecone

Performance sources:

SIMD instructions: AVX-512/AVX2 for vector distance computation; a 512-bit AVX-512 register holds 16 float32 values per instruction (8 with 256-bit AVX2)
Multithreading: Both index building and query execution parallelize across cores
Cache-friendly memory layout: Reduces CPU cache misses during traversal
RabitQ quantization (v0.3.0): Compressed vector storage with minimal precision loss
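To illustrate the general idea behind compressed vector storage, the sketch below uses plain per-vector int8 scalar quantization. This is deliberately NOT RabitQ, which is a different and more sophisticated scheme; the function names are hypothetical and the example only shows the basic tradeoff of storing 1 byte per value instead of 4 in exchange for a bounded rounding error.

```python
def quantize_int8(vector):
    """Map floats to integers in [-127, 127] using one scale factor per vector."""
    max_abs = max(abs(x) for x in vector)
    # Guard against an all-zero vector, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


vector = [0.12, -0.5, 0.33, 0.0, 0.91]
codes, scale = quantize_int8(vector)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(vector, restored))

print("codes:", codes)
print("max reconstruction error:", max_error)
print("error bound (scale / 2):", scale / 2)
```

Each value is off by at most half a quantization step, so the distance computations that drive ranking change only slightly while storage drops to a quarter of float32.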

Version Velocity

Zvec went from its initial release in December 2025 to v0.5.0 in June 2026: five releases in about six months.

Version	Date	Key Changes
v0.1.0	Dec 2025	Initial release, HNSW baseline
v0.2.0	Feb 2026	ARM64 Linux, unified search interface
v0.3.0	Apr 2026	Windows support, RabitQ quantization, MCP/Agent integration
v0.4.0	May 2026	Dart/Flutter SDK, Android/iOS support
v0.5.0	Jun 2026	FTS, hybrid queries, DiskANN, Go/Rust SDKs, Zvec Studio

The MCP integration in v0.3.0 is worth noting: AI agents can access Zvec vector memory directly through MCP (Model Context Protocol), with no adapter layer required.

Zvec Studio

pip install zvec-studio installs a visual interface for browsing collections, executing queries, and debugging vector search results — no code required. Useful for RAG application debugging and data exploration.

Links and Resources

Official Resources

🌟 GitHub: alibaba/zvec
🌐 Website: zvec.org
📖 Documentation: zvec.org/en/docs
📊 Benchmarks: zvec.org/en/docs/db/benchmarks/

Related Projects

Zvec Studio: pip install zvec-studio (visual management tool)
zvec-go: Go SDK
zvec-rust: Rust SDK
Alibaba Proxima: The production vector search engine underlying Zvec

Conclusion

Zvec's positioning is precise: deliver production-grade vector search as an embedded library, eliminating the deployment and operational complexity of vector databases.

The SQLite analogy holds. SQLite turned relational databases from "requires a configured server" into "one file, works anywhere." Zvec does the same for vector search.

v0.5.0's hybrid query interface — combining vector similarity, full-text search, and scalar filtering — has concrete value for RAG applications. Real queries tend to be combinations of all three constraint types. Previously that required three separate systems (vector database + search engine + relational database) and manual result merging. Now it's one operation in an embedded library.

The Dart SDK extending coverage to Android and iOS signals where this is heading: on-device RAG without routing vector search back to a server.

With 10.5k stars and five releases in about six months, Zvec is iterating rapidly and worth watching closely.
