Milvus

Overview

Milvus is an open-source vector database designed to store, index, and search massive collections of vector embeddings. Built for AI workloads, it supports dense and sparse vector search, full-text search with BM25, and hybrid search with reranking. Milvus runs anywhere from a Jupyter Notebook (Milvus Lite) to a full Kubernetes cluster handling billions of vectors. It is free under the Apache 2.0 license, with Zilliz Cloud available as a fully managed option.

Key Features

Multiple Index Types: Supports HNSW, IVF, FLAT, SCANN, DiskANN, and GPU-accelerated CAGRA indexing, so you can optimize for speed, accuracy, or cost depending on your workload.
Hybrid Search: Combine dense semantic search with sparse vector search, including BM25 full-text ranking and sparse embeddings (such as those produced by BGE-M3), in the same Milvus collection, then merge the results with a reranking function for balanced relevance.
Flexible Deployment: Run Milvus Lite in a Python notebook, Milvus Standalone in a single Docker container, or Milvus Distributed on Kubernetes for billion-scale production workloads.
Rich Data Type Support: Store vectors alongside JSON, arrays, numerical, and string fields in the same collection, eliminating the need for a separate metadata database.
Multi-Tenancy: Isolate tenants at the database, collection, partition, or partition key level. A single cluster can serve hundreds to millions of tenants with fine-grained access control.
Hot/Cold Tiered Storage: Keep frequently accessed data in memory or on SSDs for fast queries, and move cold data to cheaper storage to reduce costs by up to 87%.
GPU Acceleration: Use NVIDIA GPU indexing (CAGRA) to speed up vector search on large datasets, reducing query latency for high-throughput applications.
Enterprise Security: Includes TLS encryption, user authentication, and role-based access control (RBAC) out of the box for production-grade deployments.
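
The result merging described under Hybrid Search can be illustrated with reciprocal rank fusion (RRF), one common reranking strategy. The sketch below is plain Python, not Milvus API code; the function name `rrf_merge`, the document IDs, and the smoothing constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60):
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Higher fused scores rank first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense search and a BM25 search.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_merge([dense_hits, sparse_hits])
print(fused)
```

Documents that appear near the top of both lists (here `doc_b` and `doc_a`) end up ahead of documents that appear in only one, which is the "balanced relevance" effect the feature description refers to.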

Pros

Truly Open Source: Licensed under Apache 2.0 with no vendor lock-in. You can self-host for free and modify the source code to fit your needs.
Proven at Scale: Used in production by eBay, Walmart, Salesforce, and Shell, handling billions of vectors with consistent performance.
Active Community and Rapid Updates: Backed by the LF AI & Data Foundation with frequent releases, strong community support, and a clear public roadmap.
Cost-Effective: The core Milvus engine is free for self-hosted use. The managed service, Zilliz Cloud, offers pay-as-you-go pricing, including a free tier, so teams can scale without large upfront costs.
Easy Integration: Works with popular AI frameworks such as LangChain and LlamaIndex and with embedding providers such as OpenAI, through official SDKs for Python, Java, Go, Node.js, and C#.
Flexible Search Modes: Supports ANN search, range search, filtered search, full-text search, and hybrid search in one system, reducing architectural complexity.
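
Full-text search in Milvus is ranked with BM25. As a rough illustration of what that ranking computes, here is a minimal Okapi BM25 scorer in plain Python. It is a sketch under stated assumptions: whitespace tokenization, `k1=1.5`, `b=0.75`, and the sample documents are all illustrative, and the analyzer and defaults that Milvus uses may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up documents for illustration only.
docs = [
    "milvus is a vector database",
    "postgres is a relational database",
    "vector search with milvus and vector indexes",
]
scores = bm25_scores("vector milvus", docs)
for doc, score in sorted(zip(docs, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```

Documents that contain none of the query terms score zero, while repeated matches raise the score with diminishing returns.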

Cons

Steep Learning Curve: Users new to vector databases report that getting started takes significant effort, especially around indexing strategies and cluster configuration.
Complex Self-Hosted Setup: Running Milvus Distributed requires managing dependencies like etcd, MinIO, and Pulsar, which adds operational overhead compared to simpler alternatives like ChromaDB.
Documentation Gaps: Several reviewers note that the docs can be hard to follow for beginners, with some advanced topics lacking clear examples.

Use Cases

Retrieval-Augmented Generation (RAG): Store document embeddings in Milvus and retrieve relevant context at query time to feed your LLM. This gives your AI assistant accurate, up-to-date answers grounded in your actual knowledge base rather than relying on the model's training data alone.
Recommendation Systems: Embed user profiles and product attributes as vectors, then use similarity search to match users with relevant items. eBay uses this approach to power ad recommendations, matching users with sponsored products based on semantic similarity.
Image and Multimedia Search: Convert images, audio, or video into vector embeddings and search by visual or acoustic similarity instead of keywords. Orfium uses Milvus to match and detect cover songs across roughly 250 million audio vectors.
Semantic Enterprise Search: Index internal documents, wikis, and knowledge bases as vectors so employees can search by meaning rather than exact keywords. Shell uses Milvus for document retrieval across its corporate knowledge base.
Fraud Detection and Anomaly Analysis: Embed transaction patterns as vectors and search for similar patterns to flag suspicious activity in real time. Financial institutions use vector similarity to detect fraud patterns that rule-based systems miss.
Drug Discovery and Molecular Similarity: Encode chemical compound structures as vectors and search for molecules that are structurally similar to a known compound of interest. This accelerates the drug discovery pipeline by narrowing candidates to the most promising leads.
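
For the molecular similarity use case, compounds are often compared as binary fingerprints using Tanimoto (Jaccard) similarity. The sketch below shows the idea in plain Python. The fingerprints are made-up sets of "on" bit positions and the names `tanimoto` and `most_similar` are hypothetical; a real pipeline would generate fingerprints with a cheminformatics toolkit and store them as binary vectors.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit positions."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def most_similar(query_fp, library, top_k=2):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Made-up fingerprints: each set lists the bit positions that are set.
library = {
    "compound_1": {1, 4, 7, 9},
    "compound_2": {1, 4, 8},
    "compound_3": {2, 3, 5},
}
query_fp = {1, 4, 7}

matches = most_similar(query_fp, library)
print(matches)
```

Here `compound_1` shares three of four distinct bits with the query (similarity 0.75) and `compound_2` shares two of four (0.5), so they are returned in that order.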

Frequently Asked Questions

Is Milvus free to use?

Yes. Milvus is fully open source under the Apache 2.0 license, so you can download, deploy, and run it on your own infrastructure at no software cost. You only pay for the compute, storage, and cloud resources you provision yourself. Milvus also has a managed cloud option called Zilliz Cloud, which offers flexible pricing plans and a free tier for getting started, though specifics (like storage and credits) vary by region and offer.

How does Milvus compare to Pinecone?

Milvus is open source and can be self-hosted, giving you full control over your data and infrastructure. Pinecone is a proprietary, fully managed service. Milvus gives you a wider choice of index types, plus GPU-accelerated indexing and built-in hybrid search. Pinecone is simpler to get started with since there's no infrastructure to manage. Choose Milvus if you need flexibility, cost control, or on-premises deployment. Choose Pinecone if you want a zero-ops managed service.

What programming languages does Milvus support?

Milvus provides official SDKs for Python, Java, Go, Node.js, and C#. The Python SDK is the most feature-complete and widely used. You can also interact with Milvus through its RESTful API from any language.

How many vectors can Milvus handle?

Milvus Distributed on Kubernetes can handle billions of vectors. The architecture separates storage and compute, so you can scale each independently. Companies like Salesforce run Milvus clusters serving 100+ tenants with diverse workloads. For smaller use cases, Milvus Standalone handles millions of vectors on a single machine.
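
A back-of-the-envelope calculation shows why billion-scale collections need a distributed deployment. The sketch below estimates the size of the raw vectors only, assuming float32 values (4 bytes each) and a hypothetical 768-dimensional embedding. Index structures, scalar fields, and replicas add to this figure.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Bytes needed for the raw vectors alone (float32 by default)."""
    return num_vectors * dim * bytes_per_value


# 768 dimensions is an assumed embedding size, chosen for illustration.
for label, count in [("1 million", 10**6), ("1 billion", 10**9)]:
    gib = raw_vector_bytes(count, 768) / 2**30
    print(f"{label} x 768-dim float32: {gib:,.1f} GiB")
```

Under these assumptions, one million vectors take about 2.9 GiB, which fits comfortably on a single machine, while one billion take roughly 2,861 GiB (close to 2.8 TiB) before any index overhead.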

What is Zilliz Cloud?

Zilliz Cloud is a fully managed vector database service built on Milvus. It removes most operational work by offering serverless and dedicated clusters with auto-scaling, billed on a pay-as-you-go basis. It runs on major cloud providers and simplifies deployment, scaling, and maintenance compared with self-hosting.

Can I use Milvus for RAG applications?

Yes, RAG is one of the most common use cases for Milvus. You store your document embeddings in Milvus, and when a user asks a question, you search for the most relevant chunks and pass them as context to your LLM. Milvus integrates directly with LangChain, LlamaIndex, and other popular RAG frameworks.
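
The retrieve-then-prompt flow described above can be sketched in a few lines of plain Python. This is a toy illustration, not Milvus or LangChain code: the in-memory `store`, the hand-written three-dimensional vectors, and the helper names `retrieve` and `build_prompt` are all assumptions. In practice an embedding model produces the vectors and the vector database performs the search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, top_k=2):
    """Return the top_k chunk texts ranked by cosine similarity."""
    ranked = sorted(
        store,
        key=lambda item: cosine(query_vec, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:top_k]]


def build_prompt(question, chunks):
    """Place the retrieved chunks ahead of the question as context."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy store: hand-written vectors stand in for real embeddings.
store = [
    {"text": "Refunds are processed within 5 business days.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Our office is closed on public holidays.", "vector": [0.0, 0.2, 0.9]},
    {"text": "Refund requests require an order number.", "vector": [0.8, 0.3, 0.1]},
]

question = "How long do refunds take?"
query_vec = [1.0, 0.1, 0.0]  # stand-in for the embedded question

chunks = retrieve(query_vec, store)
prompt = build_prompt(question, chunks)
print(prompt)
```

Only the two refund-related chunks reach the prompt, so the LLM answers from relevant context rather than from the whole knowledge base.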

What are the system requirements for self-hosting Milvus?

Requirements depend on the deployment mode:

Milvus Lite: Runs as a lightweight library embedded in your Python process, with minimal requirements.
Milvus Standalone: Requires Docker and is suitable for development and mid-sized workloads.
Milvus Distributed: Requires Kubernetes plus supporting services such as etcd and object storage (e.g., S3 or MinIO) for production-scale clusters.

Actual hardware requirements depend on your dataset size and throughput needs.

Does Milvus support filtered search?

Yes. You can combine vector similarity search with scalar filters on stored fields, including numeric ranges, string matches, JSON field queries, and array containment checks. Milvus evaluates filters during the search process rather than as a post-processing step, which keeps performance high even with complex filter conditions.
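
To see why it matters that the filter is applied during the search rather than afterwards, compare the two approaches on a toy dataset. This is a brute-force sketch in plain Python, not a description of how Milvus implements filtering internally; the rows, the `price` field, and the function names are illustrative assumptions.

```python
def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def filtered_search(query, rows, predicate, top_k):
    """Apply the scalar filter first, then rank only the matching rows."""
    candidates = [row for row in rows if predicate(row)]
    candidates.sort(key=lambda row: l2(query, row["vector"]))
    return [row["id"] for row in candidates[:top_k]]


def post_filtered_search(query, rows, predicate, top_k):
    """Rank everything, keep top_k, then drop rows that fail the filter."""
    nearest = sorted(rows, key=lambda row: l2(query, row["vector"]))[:top_k]
    return [row["id"] for row in nearest if predicate(row)]


def is_cheap(row):
    """Scalar filter: keep items priced under 100."""
    return row["price"] < 100


# Made-up rows: the two nearest neighbors are both too expensive.
rows = [
    {"id": 1, "vector": [0.0, 0.1], "price": 250},
    {"id": 2, "vector": [0.1, 0.0], "price": 300},
    {"id": 3, "vector": [0.5, 0.5], "price": 40},
    {"id": 4, "vector": [0.9, 0.8], "price": 60},
]
query = [0.0, 0.0]

during = filtered_search(query, rows, is_cheap, top_k=2)
after = post_filtered_search(query, rows, is_cheap, top_k=2)
print("filter during search:", during)
print("filter after search: ", after)
```

Filtering during the search returns the two nearest items that satisfy the condition, while post-filtering returns nothing here because both of the overall nearest neighbors are discarded.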