Intermediate
ChromaDB
ChromaDB is an open-source, embeddable vector database that makes it easy to build AI applications with embeddings. Perfect for prototyping, local development, and lightweight production use.
What is ChromaDB?
ChromaDB is designed for simplicity. It can run in memory, as a persistent local database, or in a client-server setup. Key features:
- Embeddable: Runs inside your Python application with zero configuration.
- Built-in embedding functions: Automatically embed documents using OpenAI, Sentence Transformers, or other models.
- Simple API: Add documents, query by text or vector, and read back the results.
- Open source: Released under the Apache 2.0 license.
- Persistent storage: Save to disk and reload later.
Installation
Terminal
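# Basic installation
pip install chromadb
# With OpenAI embedding support
pip install chromadb openai
# With Sentence Transformers embedding support (used later in this tutorial)
pip install chromadb sentence-transformers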
In-Memory vs Persistent
Python - Client Modes
import chromadb
# The three modes below are alternatives: pick one.
# In-memory (data is lost when the process exits)
client = chromadb.Client()
# Persistent (data saved to disk)
client = chromadb.PersistentClient(path="./chroma_data")
# Client-server (connect to a running Chroma server)
client = chromadb.HttpClient(host="localhost", port=8000)
Creating Collections
A collection is like a table: it holds vectors along with their documents and metadata.
Python - Create and Use Collections
# Create a collection (uses the default embedding function).
# create_collection raises an error if the name already exists.
collection = client.create_collection(
    name="my_documents",
    metadata={"hnsw:space": "cosine"}  # distance metric
)
# Or get an existing collection, creating it if it does not exist yet
collection = client.get_or_create_collection("my_documents")
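The `hnsw:space` setting above selects the distance metric used to compare vectors. As a rough illustration of what cosine distance measures (1 minus cosine similarity), here is a plain-Python sketch. It is independent of ChromaDB's internals, and the helper name `cosine_distance` is hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Vectors pointing the same way are close to 0, regardless of length.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors have distance 1.
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
# Opposite vectors have distance 2, the maximum.
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Smaller distances mean more similar vectors, which is why query results are ordered from the smallest distance upward.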
Adding Documents
Python - Add Documents with Auto-Embedding
# Add documents - ChromaDB embeds them automatically
collection.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "Python is great for data science and AI",
        "JavaScript powers modern web applications",
        "Machine learning algorithms learn from data"
    ],
    metadatas=[
        {"category": "programming", "year": 2024},
        {"category": "programming", "year": 2024},
        {"category": "ai", "year": 2024}
    ]
)
# Or add pre-computed embeddings
collection.add(
    ids=["doc4"],
    # Your own embedding; "..." stands for the remaining values. The vector
    # length must match the dimension of the collection's other embeddings.
    embeddings=[[0.1, 0.2, 0.3, ...]],
    documents=["Custom embedded document"],
    metadatas=[{"category": "custom"}]
)
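The `ids`, `documents`, and `metadatas` arguments are parallel lists: entry *i* of each list describes the same record. A small pre-flight check can catch mismatched lengths or repeated ids before the batch is sent. The sketch below is one possible way to do that; `validate_batch` is a hypothetical helper, not part of ChromaDB.

```python
def validate_batch(ids, documents, metadatas=None):
    """Raise ValueError if the parallel lists would not line up."""
    if len(ids) != len(documents):
        raise ValueError(f"{len(ids)} ids but {len(documents)} documents")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError(f"{len(ids)} ids but {len(metadatas)} metadatas")
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate ids: {duplicates}")


batch = {
    "ids": ["doc1", "doc2", "doc3"],
    "documents": [
        "Python is great for data science and AI",
        "JavaScript powers modern web applications",
        "Machine learning algorithms learn from data",
    ],
    "metadatas": [
        {"category": "programming", "year": 2024},
        {"category": "programming", "year": 2024},
        {"category": "ai", "year": 2024},
    ],
}

# Passes silently; the same dict could then be given to collection.add(**batch).
validate_batch(**batch)
```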
Querying
Python - Query by Text or Vector
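# Query by text (auto-embeds the query)
results = collection.query(
    query_texts=["What programming language should I learn?"],
    n_results=2,
    where={"category": "programming"}  # metadata filter
)
# Results are nested lists: one inner list per query text
print(results["documents"])
# [['Python is great for data science and AI',
#   'JavaScript powers modern web applications']]
print(results["distances"])
# Example output; actual distances depend on the embedding model
# [[0.2341, 0.4523]]
# Query by embedding vector. query_vector is a placeholder for an embedding
# you computed yourself, with the same dimension as the collection.
results = collection.query(
    query_embeddings=[query_vector],
    n_results=5
)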
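Because `query` accepts several queries at once, each field in the result holds one inner list per query. The sketch below shows one way to pair up the fields for a single query. `flatten_results` is a hypothetical helper, and the sample dictionary is hand-written to mirror the output shown above (the `ids` values are an assumption based on the documents added earlier), so it runs without a database.

```python
def flatten_results(results, query_index=0):
    """Pair up ids, documents, and distances for one query."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    dists = results["distances"][query_index]
    return list(zip(ids, docs, dists))


# Hand-written sample shaped like a query result (not real output).
sample = {
    "ids": [["doc1", "doc2"]],
    "documents": [[
        "Python is great for data science and AI",
        "JavaScript powers modern web applications",
    ]],
    "distances": [[0.2341, 0.4523]],
}

rows = flatten_results(sample)
for doc_id, doc, dist in rows:
    print(f"{doc_id}  {dist:.4f}  {doc}")
```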
Built-in Embedding Functions
Python - Custom Embedding Functions
from chromadb.utils.embedding_functions import (
    OpenAIEmbeddingFunction,
    SentenceTransformerEmbeddingFunction
)
# Use OpenAI embeddings (requires the openai package and an API key)
openai_ef = OpenAIEmbeddingFunction(
    api_key="your-openai-key",  # placeholder; do not commit real keys
    model_name="text-embedding-3-small"
)
# Use Sentence Transformers (free, runs locally;
# requires the sentence-transformers package)
st_ef = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
# Create a collection that uses a specific embedding function
collection = client.create_collection(
    name="openai_docs",
    embedding_function=openai_ef
)
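The `api_key` argument above is written inline for brevity. A common alternative is to read the key from an environment variable so that it never lands in source control. The following is a minimal sketch of that idea; `require_env` is a hypothetical helper, and the variable name `OPENAI_API_KEY` is an assumption about how your environment is configured.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Example wiring, commented out so the sketch runs without chromadb or a key:
# openai_ef = OpenAIEmbeddingFunction(
#     api_key=require_env("OPENAI_API_KEY"),
#     model_name="text-embedding-3-small",
# )
```

Failing early with a named variable tends to be easier to debug than an authentication error raised later by the embedding call.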
Docker Deployment
Terminal - Run ChromaDB Server
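# Pull and run ChromaDB server
docker pull chromadb/chroma
docker run -p 8000:8000 chromadb/chroma
# With persistent storage. The data path inside the container may differ
# between image versions; check the documentation for the image tag you run.
docker run -p 8000:8000 \
  -v ./chroma_data:/chroma/chroma \
  chromadb/chroma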
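Before pointing `chromadb.HttpClient` at the container, it can help to confirm that something is accepting connections on the published port. The sketch below is a generic TCP check using only the standard library. It is not a ChromaDB health check, and `port_is_open` is a hypothetical helper; host `localhost` and port `8000` are taken from the commands above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if port_is_open("localhost", 8000):
    print("Something is listening on port 8000")
else:
    print("Nothing is listening on port 8000; is the container running?")
```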
When to choose ChromaDB: Choose ChromaDB for prototyping, local development, small-to-medium datasets (roughly under 1M vectors), or when you want an open-source solution that can run embedded in your application, on your own machine, or on a server. Its barrier to entry is among the lowest of the vector databases.
💡 Try It Yourself
Install ChromaDB, create a persistent collection, add 5 documents about different topics, and query them. Experiment with metadata filters and different embedding functions.
ChromaDB is one of the quickest ways to get started with vector search. A basic setup can take less than 5 minutes.
Lilly Tech Systems