WonderLab

Open Source Project of the Day (#98): Zvec, Alibaba's Embedded Vector Database, the SQLite of Vector Search

Introduction

"The SQLite of vector databases — embed it in your application, no server required."

This is article #98 in the Open Source Project of the Day series. Today's project is Zvec — an in-process vector database from Alibaba's Tongyi Lab.

Building a RAG application means dealing with a vector database. The dominant deployment pattern is an external service: Pinecone is a managed cloud service, while Milvus, Qdrant, and Weaviate are typically run as standalone servers. That means maintaining an extra infrastructure component per application, paying network call overhead, carrying operational burden, and making notebooks and edge devices awkward targets.

Zvec takes the opposite approach: install it with pip install zvec and it runs inside your process, with no daemon, no network calls, and no server configuration. Underneath is Alibaba's Proxima engine, the production vector search infrastructure battle-tested across Alibaba's internal systems.

v0.5.0 (June 2026) added native full-text search and hybrid queries, turning Zvec from a pure vector search tool into a retrieval engine that can replace several infrastructure components at once.

What You'll Learn

  • In-process vs. client-server: the design tradeoffs and which scenarios each fits
  • Zvec's index types: when to use HNSW vs. FAISS vs. DiskANN vs. sparse vectors
  • Hybrid queries in v0.5.0: combining vector similarity + full-text search + scalar filters in one query
  • DiskANN: how to fit a billion-vector database into a memory-constrained machine
  • Performance data: VectorDBBench QPS numbers and comparisons with Pinecone and others
  • Five-language SDK coverage from server-side to mobile

Prerequisites

  • Familiarity with vector embeddings and similarity search
  • Experience with RAG application development or vector database usage
  • Python familiarity; other SDK languages as context

Project Background

What Is Zvec?

Zvec is an open-source in-process vector database — "lightweight, lightning-fast, and designed to embed directly into applications." The underlying engine is Alibaba Proxima, the production vector search infrastructure behind Alibaba's image search and recommendation systems at scale.

In-process is the key phrase. Vector search happens inside the application process — the same way SQLite handles relational data — without a network hop, without an external process, without a service to manage.
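
To make the SQLite comparison concrete, here is a minimal sketch using Python's standard sqlite3 module. It is not Zvec code; it only illustrates what "in-process" means: the database engine is a library call inside the application, with no server to start or connect to. The table and column names are made up for illustration.

```python
import sqlite3

# The engine runs inside this Python process; ":memory:" keeps the data in RAM.
# Passing a file path instead would persist it to a single local file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id TEXT PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles VALUES (?, ?)", ("doc_1", "AI Basics"))
conn.commit()

# The query is an ordinary function call: no socket, no daemon, no network hop.
rows = conn.execute("SELECT id, title FROM articles").fetchall()
print(rows)
conn.close()
```

Zvec applies the same deployment model to vector data: the index lives in local files and is queried through function calls in the host process.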

Author / Team

  • Team: Alibaba Tongyi Lab
  • Underlying engine: Alibaba Proxima (production-grade vector search engine)
  • License: Apache-2.0
  • Latest version: v0.5.0 (June 2026)

Project Stats

  • ⭐ GitHub Stars: 10,500+
  • 🍴 Forks: 607+
  • 📅 First release: December 2025
  • 📄 License: Apache-2.0

Core Features

What It Does

Traditional approach (external service):
App process → network request → vector DB server → result returned
                  ↓
           latency overhead + operational burden + no edge/mobile

Zvec (in-process):
App process
  └── Zvec library (in-process)
        ├── Vector index (memory / disk)
        ├── Full-text index (v0.5.0)
        └── Scalar filters
  Returns results directly, no network overhead

Use Cases

  1. Local RAG applications: Build on-device RAG systems without cloud service dependencies
  2. Notebook prototyping: Use in data science workflows without configuring an external service
  3. Mobile AI: Dart/Flutter SDK supports on-device vector search on Android and iOS
  4. Production service embedding: Embed vector search directly into Python/Go/Rust services, reducing infrastructure components
  5. Agent memory: MCP integration (v0.3.0) provides local vector memory for AI agents

Quick Start

Install:

pip install zvec

# With Zvec Studio visual tool
pip install zvec-studio

Basic usage:

import zvec

# Define schema
schema = zvec.CollectionSchema(
    name="articles",
    vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 1536),
    fields=[
        zvec.FieldSchema("title", zvec.DataType.STRING),
        zvec.FieldSchema("content", zvec.DataType.STRING),
        zvec.FieldSchema("score", zvec.DataType.FLOAT),
    ]
)

# Create and open collection
collection = zvec.create_and_open(path="./my_rag_db", schema=schema)

# Insert documents
collection.insert([
    zvec.Doc(
        id="doc_1",
        vectors={"embedding": [0.1, 0.2, ...]},  # 1536-dim vector
        fields={"title": "AI Basics", "content": "...", "score": 9.5}
    )
])

# Pure vector search
# query_embedding: a 1536-dim list of floats from the same embedding model as the stored vectors
results = collection.query(
    zvec.VectorQuery("embedding", vector=query_embedding),
    topk=10
)

Hybrid query (v0.5.0):

# Combine vector similarity + full-text search + scalar filter in one query
# (presumably needs the full-text index on "content" from the next example to exist first)
results = collection.query(
    zvec.MultiQuery(
        queries=[
            zvec.VectorQuery("embedding", vector=query_embedding, weight=0.7),
            zvec.FTSQuery("content", text="machine learning", weight=0.3),
        ],
        filter="score > 8.0",      # Scalar filter
        fusion=zvec.FusionType.RRF  # Reciprocal Rank Fusion
    ),
    topk=10
)

Full-text search:

# Create a full-text index on a field
collection.create_index("content", zvec.IndexType.FULL_TEXT)

# Full-text query
results = collection.query(
    zvec.FTSQuery("content", text="vector database"),
    topk=10
)

Node.js:

import { createCollection, VectorQuery } from '@zvec/zvec';

const collection = await createCollection({
  path: './my_db',
  schema: { name: 'docs', vectors: { embedding: { dim: 1536 } } }
});

const results = await collection.query(
  new VectorQuery('embedding', queryVector),
  { topk: 10 }
);

Multi-Language SDKs

| Language | Install | Best for |
| --- | --- | --- |
| Python | pip install zvec | Data science, RAG services, notebooks |
| Node.js | npm install @zvec/zvec | Web backends, full-stack apps |
| Go | go get github.com/alibaba/zvec-go | High-performance backend services |
| Rust | crates.io: zvec | System-level apps, performance-critical paths |
| Dart/Flutter | flutter pub add zvec | Android / iOS mobile |

Deep Dive

The In-Process Tradeoff

Zvec's core architectural decision is to abandon the client-server model for in-process embedding. That choice has specific costs and benefits:

Benefits:

  • Zero network latency: vector search happens in memory or on local disk, no round trip
  • Zero operational burden: no vector database service to deploy, monitor, or upgrade
  • Portability: application binary + data files deploy to anything with a filesystem
  • Edge and mobile: Dart SDK supports on-device vector search on Android and iOS

Costs:

  • Single writer: only one process can write to a collection at a time; other processes can open it read-only, so multiple services cannot share one writable vector store
  • No cross-network access (this is an architectural property, not a defect)

For most RAG application prototypes, edge deployments, and single-service architectures, the tradeoff is favorable. Multi-replica, distributed query scenarios are better served by Milvus or Qdrant.

Index Types

HNSW (Hierarchical Navigable Small World): Default index. Graph-based approximate nearest neighbor search. Best for memory-rich environments that need maximum query speed. Sub-linear query time, longer index build time.

FAISS: Interface to Facebook AI Research's vector search library. Offers several index and compression strategies (IVF inverted-file partitioning, PQ product quantization) for flexible precision-speed tradeoffs.

DiskANN (new in v0.5.0): Stores the vector index on disk rather than in memory, dramatically reducing RAM requirements. The design comes from Microsoft Research's DiskANN paper. Enables billion-scale vector collections on ordinary servers, at the cost of slightly higher latency than pure in-memory indexes.
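
A back-of-the-envelope calculation shows why an in-memory index stops being practical at this scale. The sketch below counts only raw float32 vector storage (4 bytes per dimension) and ignores graph links and metadata, so real footprints are higher. The 768-dimension figure is borrowed from the benchmark dataset mentioned in the Performance section, not from any Zvec default.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the raw vectors alone (float32 by default)."""
    return num_vectors * dim * bytes_per_value

GIB = 1024 ** 3

ten_million = raw_vector_bytes(10_000_000, 768)
one_billion = raw_vector_bytes(1_000_000_000, 768)

print(f"10M x 768 float32: {ten_million / GIB:.1f} GiB")
print(f"1B  x 768 float32: {one_billion / GIB:.1f} GiB")
```

Ten million vectors need roughly 29 GiB, which a large server can hold in RAM. A billion need close to 2.8 TiB, which is the gap a disk-resident index is meant to close.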

Sparse vectors: Stores and retrieves sparse representations from BM25 and SPLADE, useful for keyword-semantic hybrid retrieval.
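
As a rough illustration of what a sparse representation looks like, the sketch below stores each vector as a dictionary of non-zero term weights and scores a query against documents with a dot product. This is generic Python, not Zvec's storage format or API, and the term IDs and weights are invented.

```python
def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product over the term IDs the two sparse vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[term] for term, weight in a.items() if term in b)

# Hypothetical term-weight vectors, as BM25 or SPLADE might produce.
docs = {
    "doc_1": {101: 1.5, 205: 0.5, 999: 2.0},
    "doc_2": {205: 2.0, 300: 1.0},
}
query = {205: 1.0, 999: 0.5}

scores = {doc_id: sparse_dot(query, vec) for doc_id, vec in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Only terms present in both vectors contribute, so the cost scales with the number of non-zero entries rather than with the vocabulary size.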

Hybrid Query Design

v0.5.0's hybrid queries integrate three retrieval modes into a single MultiQuery interface:

MultiQuery
    ├── VectorQuery (dense vector similarity)
    │   └── Semantic similarity matching
    ├── FTSQuery (full-text search)
    │   └── Keyword exact matching
    └── ScalarFilter (structured conditions)
        └── Dates, scores, categories

Fusion strategies:
    ├── RRF (Reciprocal Rank Fusion) — recommended, balances multi-source results
    └── Weighted — configurable per-source weights

The practical value: real RAG queries are rarely pure vector similarity. Users routinely need "semantically related + contains specific keyword + from the last 30 days." Splitting into three separate queries and manually merging the ranked lists is expensive and produces suboptimal results. MultiQuery + RRF handles all three in one operation.
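
Reciprocal Rank Fusion itself is simple enough to sketch in a few lines. The version below is the textbook formulation, where each document scores the sum of 1 / (k + rank) over the lists it appears in, with k = 60 as the commonly used constant. It is not Zvec's implementation, and the document IDs are made up.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

vector_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical vector-search ranking
fulltext_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical full-text ranking

fused = rrf_fuse([vector_hits, fulltext_hits])
print([doc_id for doc_id, _ in fused])
```

Because RRF looks only at ranks, it can merge sources whose raw scores are on incomparable scales, such as cosine similarity and a text relevance score, without any normalization step.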

Performance

On VectorDBBench (Cohere 10M dataset, 768 dimensions, 10 million vectors):

  • Zvec measured at 8,000+ QPS, more than 2× the previous leaderboard leader (ZillizCloud)
  • Self-reported: approximately 7× the query throughput of Pinecone

Performance sources:

  • SIMD instructions: AVX-512/AVX2 for vector distance computation. AVX-512 handles 16 float32 values per instruction, AVX2 handles 8
  • Multithreading: Both index building and query execution parallelize across cores
  • Cache-friendly memory layout: Reduces CPU cache misses during traversal
  • RabitQ quantization (v0.3.0): Compressed vector storage with minimal precision loss
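
The lane counts behind the SIMD point follow from register width: a 512-bit register holds 512 / 32 = 16 single-precision floats, and a 256-bit AVX2 register holds 8. The pure-Python sketch below mimics that lane-wise accumulation to show the shape of the computation. It gains no speed in Python and is not how Zvec or Proxima is written; the vectors are arbitrary test data.

```python
import math

def lanewise_dot(a: list[float], b: list[float], lanes: int = 16) -> float:
    """Dot product accumulated in `lanes` partial sums, like one SIMD register."""
    assert len(a) == len(b) and len(a) % lanes == 0
    acc = [0.0] * lanes
    for start in range(0, len(a), lanes):
        # One outer iteration stands in for one multiply-add over all lanes.
        for lane in range(lanes):
            acc[lane] += a[start + lane] * b[start + lane]
    return sum(acc)  # final horizontal reduction across lanes

dim = 768
a = [i / dim for i in range(dim)]
b = [1.0 - i / dim for i in range(dim)]

steps_avx512 = dim // 16   # vector multiply-add steps at 16 lanes
steps_avx2 = dim // 8      # the same work at 8 lanes
result = lanewise_dot(a, b)
print(steps_avx512, steps_avx2, round(result, 3))
```

For a 768-dimensional vector this is 48 multiply-add steps at 16 lanes versus 96 at 8 lanes, compared with 768 scalar multiply-adds.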

Version Velocity

Zvec went from initial release in December 2025 to v0.5.0 in June 2026, which is five releases in about six months:

| Version | Date | Key Changes |
| --- | --- | --- |
| v0.1.0 | Dec 2025 | Initial release, HNSW baseline |
| v0.2.0 | Feb 2026 | ARM64 Linux, unified search interface |
| v0.3.0 | Apr 2026 | Windows support, RabitQ quantization, MCP/Agent integration |
| v0.4.0 | May 2026 | Dart/Flutter SDK, Android/iOS support |
| v0.5.0 | Jun 2026 | FTS, hybrid queries, DiskANN, Go/Rust SDKs, Zvec Studio |

The MCP integration in v0.3.0 is worth noting: AI agents can access Zvec vector memory directly through the Model Context Protocol (MCP), with no adapter layer required.

Zvec Studio

pip install zvec-studio installs a visual interface for browsing collections, executing queries, and debugging vector search results — no code required. Useful for RAG application debugging and data exploration.


Links and Resources

Official Resources

Related Projects

  • Zvec Studio: pip install zvec-studio (visual management tool)
  • zvec-go: Go SDK
  • zvec-rust: Rust SDK
  • Alibaba Proxima: The production vector search engine underlying Zvec

Conclusion

Zvec's positioning is precise: deliver production-grade vector search as an embedded library, eliminating the deployment and operational complexity of vector databases.

The SQLite analogy holds. SQLite turned relational databases from "requires a configured server" into "one file, works anywhere." Zvec does the same for vector search.

v0.5.0's hybrid query interface, which combines vector similarity, full-text search, and scalar filtering, has concrete value for RAG applications. Real queries tend to be combinations of all three constraint types. Previously that typically meant stitching together separate systems (vector database + search engine + relational database) and merging results manually. Now it's one operation in an embedded library.

The Dart SDK extending coverage to Android and iOS signals where this is heading: on-device RAG without routing vector search back to a server.

With 10.5k stars and five releases in about six months, Zvec is in rapid iteration and worth watching closely.

