Weaviate
Weaviate is an open-source, AI-native vector database with built-in vectorization modules, hybrid search combining vector and keyword search, and a powerful GraphQL API.
What is Weaviate?
Weaviate goes beyond simple vector storage. It is an AI-native database that integrates vectorization directly into the database layer:
- Built-in vectorizers: Automatically embed text, images, and more using modules like
text2vec-openai,text2vec-transformers,img2vec-neural. - Hybrid search: Combine vector similarity with BM25 keyword search for the best of both worlds.
- GraphQL API: Rich, flexible query language for complex searches.
- Multi-tenancy: Isolate data per tenant with efficient resource sharing.
- Open source: BSD-3 license with optional managed cloud service.
Installation
Docker (Recommended for Development)
version: '3.4'
services:
weaviate:
image: cr.weaviate.io/semitechnologies/weaviate:latest
ports:
- "8080:8080"
- "50051:50051"
environment:
QUERY_DEFAULTS_LIMIT: 25
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
ENABLE_MODULES: 'text2vec-openai,generative-openai'
OPENAI_APIKEY: ${OPENAI_API_KEY}
CLUSTER_HOSTNAME: 'node1'
volumes:
- weaviate_data:/var/lib/weaviate
volumes:
weaviate_data:
# Start Weaviate
docker compose up -d
# Install Python client
pip install weaviate-client
Weaviate Cloud (Managed)
For production, Weaviate offers a managed cloud service at console.weaviate.cloud. Create a cluster in minutes with no infrastructure management.
Schema Definition
Weaviate uses a schema-based approach. You define collections (previously called "classes") with properties and vectorizer configuration.
import weaviate
import weaviate.classes as wvc
# Connect to Weaviate
client = weaviate.connect_to_local() # localhost:8080
# Create a collection with auto-vectorization
articles = client.collections.create(
name="Article",
vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
model="text-embedding-3-small"
),
properties=[
wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="category", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="published", data_type=wvc.config.DataType.DATE),
]
)
Importing Data
articles = client.collections.get("Article")
# Single insert (Weaviate auto-embeds the text)
articles.data.insert({
"title": "Introduction to Vector Databases",
"content": "Vector databases store high-dimensional embeddings...",
"category": "technology"
})
# Batch insert for efficiency
with articles.batch.dynamic() as batch:
for item in data_list:
batch.add_object(properties={
"title": item["title"],
"content": item["content"],
"category": item["category"]
})
Querying with GraphQL
articles = client.collections.get("Article")
# Semantic search - Weaviate embeds the query automatically
response = articles.query.near_text(
query="How do AI systems store knowledge?",
limit=5,
return_metadata=wvc.query.MetadataQuery(distance=True)
)
for obj in response.objects:
print(f"{obj.properties['title']} (distance: {obj.metadata.distance:.4f})")
Hybrid Search
Hybrid search combines vector similarity (semantic meaning) with BM25 keyword search (exact term matching). This gives you the best of both worlds.
# Hybrid search: combines vector + BM25 keyword search
response = articles.query.hybrid(
query="vector database HNSW indexing",
alpha=0.5, # 0 = pure keyword, 1 = pure vector
limit=10,
filters=wvc.query.Filter.by_property("category").equal("technology")
)
for obj in response.objects:
print(obj.properties["title"])
alpha value controls the balance between keyword and vector search. alpha=0.75 (more vector) works well for most use cases. Experiment with your data to find the best value.Multi-Tenancy
Weaviate supports native multi-tenancy — isolate data per tenant while sharing the same collection schema and infrastructure.
# Create collection with multi-tenancy enabled
docs = client.collections.create(
name="Document",
multi_tenancy_config=wvc.config.Configure.multi_tenancy(enabled=True),
vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai()
)
# Add tenants
docs.tenants.create([
wvc.tenants.Tenant(name="tenant_a"),
wvc.tenants.Tenant(name="tenant_b"),
])
# Insert data for a specific tenant
tenant_a = docs.with_tenant("tenant_a")
tenant_a.data.insert({"content": "Tenant A's private document"})
💡 Try It Yourself
Run Weaviate with Docker, create a collection, import some articles, and compare the results of pure vector search vs hybrid search. Notice how hybrid search handles queries with specific technical terms better.
Lilly Tech Systems