Intermediate

Weaviate

Weaviate is an open-source, AI-native vector database with built-in vectorization modules, hybrid search combining vector and keyword search, and a powerful GraphQL API.

What is Weaviate?

Weaviate goes beyond simple vector storage. It is an AI-native database that integrates vectorization directly into the database layer:

  • Built-in vectorizers: Automatically embed text, images, and more using modules such as text2vec-openai, text2vec-transformers, and img2vec-neural.
  • Hybrid search: Combine vector similarity with BM25 keyword search for the best of both worlds.
  • GraphQL API: Rich, flexible query language for complex searches.
  • Multi-tenancy: Isolate data per tenant with efficient resource sharing.
  • Open source: BSD 3-Clause license, with an optional managed cloud service.

Installation

Docker (Recommended for Development)

docker-compose.yml
version: '3.4'
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,generative-openai'
      OPENAI_APIKEY: ${OPENAI_API_KEY}
      CLUSTER_HOSTNAME: 'node1'
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:

Terminal
# Start Weaviate
docker compose up -d

# Install Python client
pip install weaviate-client
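
The container can take a few seconds to start accepting requests after `docker compose up -d`. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) before any client code runs. It assumes the default port 8080 from the compose file above; the names `http_ready` and `wait_until_ready` and the `probe` parameter are illustrative and not part of the Weaviate client.

```python
import time
import urllib.request


def http_ready(url: str) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


def wait_until_ready(
    url: str = "http://localhost:8080/v1/.well-known/ready",
    timeout: float = 30.0,
    interval: float = 1.0,
    probe=http_ready,
) -> bool:
    """Poll probe(url) until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: call wait_until_ready() once before connecting with the client.
```

Passing the probe as a parameter keeps the retry loop independent of the network call, which makes it easy to try out without a running server.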

Weaviate Cloud (Managed)

For production, Weaviate offers a managed cloud service at console.weaviate.cloud. Create a cluster in minutes with no infrastructure management.

Schema Definition

Weaviate uses a schema-based approach. You define collections (previously called "classes") with properties and vectorizer configuration.

Python - Define Schema
import weaviate
import weaviate.classes as wvc

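# Connect to Weaviate (HTTP on localhost:8080, gRPC on localhost:50051).
# Call client.close() when you are finished to release the connection.
client = weaviate.connect_to_local()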

# Create a collection with auto-vectorization
articles = client.collections.create(
    name="Article",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small"
    ),
    properties=[
        wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="category", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="published", data_type=wvc.config.DataType.DATE),
    ]
)

Importing Data

Python - Insert Objects
articles = client.collections.get("Article")

# Single insert (Weaviate auto-embeds the text)
articles.data.insert({
    "title": "Introduction to Vector Databases",
    "content": "Vector databases store high-dimensional embeddings...",
    "category": "technology"
})

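# Batch insert for efficiency.
# data_list is assumed to be an iterable of dicts with "title", "content", and "category" keys.
with articles.batch.dynamic() as batch:
    for item in data_list:
        batch.add_object(properties={
            "title": item["title"],
            "content": item["content"],
            "category": item["category"]
        })

# Batch errors are collected rather than raised, so check for failures afterwards
if articles.batch.failed_objects:
    print(f"{len(articles.batch.failed_objects)} objects failed to import")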
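
The schema above declares a `published` property of type DATE, but the insert examples leave it out. Weaviate stores dates as RFC 3339 timestamps, and the v4 Python client is assumed here to accept timezone-aware `datetime` objects for DATE properties. The following sketch prepares a properties dict that includes the date; `to_article_properties` and the shape of `item` are hypothetical.

```python
from datetime import datetime, timezone


def to_article_properties(item: dict) -> dict:
    """Build a properties dict matching the Article schema."""
    props = {
        "title": item["title"],
        "content": item["content"],
        "category": item["category"],
    }
    published = item.get("published")
    if isinstance(published, str):
        # Accept ISO 8601 strings such as "2024-01-15T09:30:00+00:00".
        published = datetime.fromisoformat(published)
    if published is not None:
        if published.tzinfo is None:
            # Assumption: naive timestamps in the source data are UTC.
            published = published.replace(tzinfo=timezone.utc)
        props["published"] = published
    return props
```

The result can be passed to `articles.data.insert(...)` or to `batch.add_object(properties=...)` in place of the hand-built dicts.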

Querying

Python - Vector Search (Near Text)
articles = client.collections.get("Article")

# Semantic search - Weaviate embeds the query automatically
response = articles.query.near_text(
    query="How do AI systems store knowledge?",
    limit=5,
    return_metadata=wvc.query.MetadataQuery(distance=True)
)

for obj in response.objects:
    print(f"{obj.properties['title']} (distance: {obj.metadata.distance:.4f})")
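
The `distance` printed above depends on the collection's distance metric. With cosine distance, which is Weaviate's default, the value is 1 minus the cosine similarity, so 0 means the vectors point in the same direction and larger values mean less similar. A minimal sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up vectors standing in for a query embedding and three stored objects.
query_vector = [1.0, 0.0, 0.0]
toy_vectors = {
    "same direction": [2.0, 0.0, 0.0],
    "orthogonal": [0.0, 3.0, 0.0],
    "opposite": [-1.0, 0.0, 0.0],
}

# Sort ascending: the smallest distance is the closest match.
ranked = sorted(toy_vectors, key=lambda name: cosine_distance(query_vector, toy_vectors[name]))
for name in ranked:
    print(f"{name}: {cosine_distance(query_vector, toy_vectors[name]):.4f}")
```

Vector length does not affect the result: the "same direction" vector is twice as long as the query vector but still has distance 0.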

Hybrid Search

Hybrid search combines vector similarity (semantic meaning) with BM25 keyword search (exact term matching), so a single query can surface documents that are semantically related as well as documents that contain its exact terms.

Python - Hybrid Search
# Hybrid search: combines vector + BM25 keyword search
response = articles.query.hybrid(
    query="vector database HNSW indexing",
    alpha=0.5,   # 0 = pure keyword, 1 = pure vector
    limit=10,
    filters=wvc.query.Filter.by_property("category").equal("technology")
)

for obj in response.objects:
    print(obj.properties["title"])

💡 Alpha parameter: The alpha value controls the balance between keyword and vector search: 0 is pure keyword, 1 is pure vector. A value such as alpha=0.75, which weights vector search more heavily, is a reasonable starting point. Experiment with your data to find the best value.
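
To make the effect of alpha concrete, here is a simplified sketch of score fusion: each result set is min-max normalized to [0, 1], and the final score is `alpha * vector + (1 - alpha) * keyword`. It is loosely modeled on the idea behind relative score fusion and is not Weaviate's actual implementation; the document names and scores are made up.

```python
def normalize(scores: dict) -> dict:
    """Min-max scale scores to [0, 1]; a constant set maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(vector_scores: dict, keyword_scores: dict, alpha: float) -> list:
    """Rank documents by alpha * vector + (1 - alpha) * keyword."""
    vec = normalize(vector_scores)
    kw = normalize(keyword_scores)
    combined = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(vec) | set(kw)
    }
    # Highest combined score first; ties are broken by document name.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Made-up scores (higher = better) for three hypothetical documents.
vector_scores = {"doc_a": 0.92, "doc_b": 0.80, "doc_c": 0.40}
keyword_scores = {"doc_a": 1.0, "doc_b": 5.0, "doc_c": 7.5}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, fuse(vector_scores, keyword_scores, alpha))
```

In this toy data, `doc_a` wins on vector similarity alone and `doc_c` on keywords alone, while `doc_b`, which scores reasonably on both, ranks first at alpha=0.5.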

Multi-Tenancy

Weaviate supports native multi-tenancy: each tenant's data is isolated, while all tenants share the same collection schema and infrastructure.

Python - Multi-Tenancy
# Create collection with multi-tenancy enabled
docs = client.collections.create(
    name="Document",
    multi_tenancy_config=wvc.config.Configure.multi_tenancy(enabled=True),
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai()
)

# Add tenants
docs.tenants.create([
    wvc.tenants.Tenant(name="tenant_a"),
    wvc.tenants.Tenant(name="tenant_b"),
])

# Insert data for a specific tenant
tenant_a = docs.with_tenant("tenant_a")
tenant_a.data.insert({"content": "Tenant A's private document"})
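
Reads are tenant-scoped in the same way as writes: on a multi-tenant collection, a query has to go through a tenant-specific handle. The sketch below wraps that in a small helper; `search_tenant` is a hypothetical name, and it relies only on the `with_tenant` and `query.near_text` calls already shown in this guide.

```python
def search_tenant(collection, tenant_name, query, limit=5):
    """Run a near_text search restricted to one tenant's data."""
    tenant_view = collection.with_tenant(tenant_name)
    response = tenant_view.query.near_text(query=query, limit=limit)
    return [obj.properties for obj in response.objects]


# Example: search_tenant(docs, "tenant_a", "private document")
```

Because the tenant name is a required argument, a caller cannot accidentally search across tenants.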

When to choose Weaviate: Choose Weaviate when you need built-in vectorization (no separate embedding step), hybrid search, multi-tenancy, or a rich query API. It excels for complex search applications where combining vector and keyword relevance matters.

💡 Try It Yourself

Run Weaviate with Docker, create a collection, import some articles, and compare the results of pure vector search vs hybrid search. Notice how hybrid search handles queries with specific technical terms better.

Try the same query with different alpha values (0.0, 0.25, 0.5, 0.75, 1.0) to see how keyword vs vector balance affects results.
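
One way to run that experiment is a small helper that issues the same hybrid query at each alpha value and collects the returned titles. This is a sketch: `compare_alphas` is a hypothetical name, and it assumes the collection's objects have a `title` property, as in the Article schema above.

```python
def compare_alphas(collection, query, alphas=(0.0, 0.25, 0.5, 0.75, 1.0), limit=5):
    """Run the same hybrid query at several alpha values.

    Returns {alpha: [title, ...]} so the rankings can be compared side by side.
    """
    results = {}
    for alpha in alphas:
        response = collection.query.hybrid(query=query, alpha=alpha, limit=limit)
        results[alpha] = [obj.properties["title"] for obj in response.objects]
    return results


# Example:
# for alpha, titles in compare_alphas(articles, "vector database HNSW indexing").items():
#     print(alpha, titles)
```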