Step 3: Semantic Search
In this lesson, you will add vector-based semantic search to the engine. The query is encoded into a dense vector using the same sentence-transformer model, and Elasticsearch finds the nearest neighbors using cosine similarity. This enables the engine to understand meaning — "programming tutorials" now finds "coding lessons."
How Semantic Search Works
Instead of matching keywords, semantic search compares the meaning of the query to the meaning of each document:
Query: "how to fix a car"
|
v
[sentence-transformer] --> [0.12, -0.34, 0.56, ...] (384 dims)
|
v
[Elasticsearch kNN] --> Find nearest vectors using cosine similarity
|
v
Results:
1. "Automobile Repair Guide" (cosine: 0.89) -- different words, same meaning
2. "Vehicle Maintenance Tips" (cosine: 0.85) -- conceptually similar
3. "Introduction to Car Engines" (cosine: 0.78) -- related topic
Notice that none of these results contain the exact words "fix" or "car" in their titles, yet they are highly relevant. This is the power of semantic search.
Cosine Similarity Explained
Cosine similarity measures the angle between two vectors, ignoring magnitude:
cosine_similarity(A, B) = (A . B) / (||A|| * ||B||)
For unit-length (normalized) vectors, which many sentence-transformer models produce by default (and which you can force with encode(..., normalize_embeddings=True)), this reduces to a plain dot product:
cosine_similarity(A, B) = A . B (dot product)
Score range: -1 to 1
1.0 = identical meaning
0.0 = completely unrelated
-1.0 = opposite meaning (rare in practice)
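The formula above can be verified with a few lines of pure Python. These are toy 3-dimensional vectors, not real 384-dimensional embeddings, but the math is identical:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(A, B) / (|A| * |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

a = [0.6, 0.8, 0.0]  # unit length: 0.36 + 0.64 = 1.0
b = [0.8, 0.6, 0.0]  # also unit length
dot = sum(x * y for x, y in zip(a, b))

print(cosine_similarity(a, b))                 # ~0.96
print(dot)                                     # same value: for unit vectors, dot == cosine
print(cosine_similarity(a, [-x for x in a]))   # ~-1.0, opposite direction
```

Because both vectors are already unit length, the dot product and the full cosine formula agree, which is exactly why normalized embeddings make scoring cheap.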
The Semantic Search Module
Create the semantic search module that encodes the query and runs kNN search:
# app/search/semantic.py
"""Semantic vector search using Elasticsearch kNN."""
from app.elasticsearch.client import SearchClient
from app.embeddings.encoder import encode_query
from app.config import get_settings
import logging
logger = logging.getLogger(__name__)
settings = get_settings()
def semantic_search(
    query: str,
    top_k: int | None = None,
    category: str | None = None,
    tags: list[str] | None = None,
    num_candidates: int | None = None,
) -> dict:
"""Run a semantic (vector) search against Elasticsearch.
The query is encoded into a dense vector using sentence-transformers,
then Elasticsearch finds the nearest neighbors using HNSW.
Args:
query: The user's search query string.
top_k: Number of results to return.
category: Optional category filter.
tags: Optional tag filters.
num_candidates: Number of candidates for kNN (higher = more accurate, slower).
Returns:
Dict with 'results' list and 'total' count.
"""
if top_k is None:
top_k = settings.search_top_k
if num_candidates is None:
num_candidates = top_k * 10 # 10x oversampling for accuracy
client = SearchClient()
# Step 1: Encode the query into a dense vector
query_vector = encode_query(query)
# Step 2: Build kNN query
knn_query = {
"field": "embedding",
"query_vector": query_vector,
"k": top_k,
"num_candidates": num_candidates
}
# Step 3: Add filters if specified
filter_clauses = []
if category:
filter_clauses.append({"term": {"category": category}})
if tags:
for tag in tags:
filter_clauses.append({"term": {"tags": tag}})
if filter_clauses:
knn_query["filter"] = {"bool": {"must": filter_clauses}}
# Step 4: Execute kNN search
search_body = {
"knn": knn_query,
"_source": ["title", "body", "category", "tags", "url", "created_at"],
"size": top_k
}
response = client.es.search(
index=client.index_name,
body=search_body
)
# Step 5: Parse results
results = []
for hit in response["hits"]["hits"]:
result = {
"id": hit["_id"],
"score": hit["_score"],  # for cosine fields, Elasticsearch reports (1 + cosine) / 2, so scores fall in [0, 1]
"source": hit["_source"],
"highlights": {} # No keyword highlights for vector search
}
results.append(result)
total = response["hits"]["total"]["value"]
logger.info(f"Semantic search for '{query}': {total} total, returning {len(results)}")
return {
"results": results,
"total": total,
"query": query,
"mode": "semantic"
}
Understanding the kNN Query
Let us break down the key parameters:
num_candidates: The Accuracy-Speed Tradeoff
"knn": {
"field": "embedding",
"query_vector": [0.12, -0.34, ...], // 384 dimensions
"k": 10, // Return 10 results
"num_candidates": 100 // Consider 100 candidates
}
Elasticsearch uses HNSW (Hierarchical Navigable Small World) graphs for approximate nearest-neighbor search. The num_candidates parameter controls the accuracy-speed tradeoff:
- num_candidates = k: Fastest, but may miss relevant results.
- num_candidates = k * 10: Good balance. This is what we use.
- num_candidates = k * 100: Near-exact results, but slower for large indexes.
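The three settings above can be sketched as a small helper; build_knn_query is a hypothetical function for illustration, not part of the module built earlier:

```python
def build_knn_query(query_vector: list[float], k: int, oversample: int = 10) -> dict:
    """Build an Elasticsearch kNN clause with num_candidates = k * oversample."""
    return {
        "field": "embedding",
        "query_vector": query_vector,
        "k": k,
        # num_candidates bounds how many HNSW entries are examined per shard;
        # larger values improve recall at the cost of latency.
        "num_candidates": k * oversample,
    }

fast = build_knn_query([0.1] * 384, k=10, oversample=1)        # fastest, lowest recall
balanced = build_knn_query([0.1] * 384, k=10)                  # the 10x default used in this lesson
accurate = build_knn_query([0.1] * 384, k=10, oversample=100)  # near-exact, slower
```

Keeping the oversampling factor in one place makes it easy to tune per index size instead of hard-coding `num_candidates` at every call site.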
Pre-filtering vs Post-filtering
// Pre-filtering: Elasticsearch filters BEFORE kNN search
"knn": {
"filter": {"term": {"category": "machine-learning"}},
...
}
// Only vectors in the "machine-learning" category are considered.
// Faster, but may return fewer than k results if the category is small.
// Post-filtering: Filter AFTER kNN search (not recommended)
// You would get k results first, then filter, potentially returning 0 results.
We use pre-filtering because it ensures all returned results match the filter criteria.
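The difference is easy to demonstrate with a brute-force toy example: in-memory 2-d vectors stand in for the index, and no Elasticsearch is involved:

```python
# Toy index: each doc has a 2-d vector and a category.
docs = [
    {"id": 1, "vec": (1.0, 0.0), "category": "ml"},
    {"id": 2, "vec": (0.9, 0.1), "category": "web"},
    {"id": 3, "vec": (0.8, 0.2), "category": "web"},
    {"id": 4, "vec": (0.0, 1.0), "category": "ml"},
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def knn(candidates, query_vec, k):
    """Return the k candidates most similar to query_vec (vectors assumed unit length)."""
    return sorted(candidates, key=lambda d: dot(d["vec"], query_vec), reverse=True)[:k]

query_vec = (1.0, 0.0)
k = 2

# Pre-filtering: restrict to the category first, then take the k nearest.
pre = knn([d for d in docs if d["category"] == "ml"], query_vec, k)

# Post-filtering: take the k nearest overall, then filter -- may return fewer than k.
post = [d for d in knn(docs, query_vec, k) if d["category"] == "ml"]

print([d["id"] for d in pre])   # [1, 4] -- k results whenever the category has k docs
print([d["id"] for d in post])  # [1]    -- doc 4 was crowded out before the filter ran
```

Doc 4 is a valid "ml" match, but post-filtering discards it because two "web" docs were closer to the query, which is exactly the failure mode pre-filtering avoids.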
Update the Search API Route
Update app/main.py to support both keyword and semantic modes:
# Update the search endpoint in app/main.py
from app.search.keyword import keyword_search
from app.search.semantic import semantic_search
@app.get("/api/search")
async def search(
    q: str,
    mode: str | None = None,
    top_k: int = 10,
    category: str | None = None,
    tags: str | None = None,
    page: int = 1
):
"""Search documents with keyword (BM25) or semantic (vector) matching.
Query params:
q: Search query string
mode: 'keyword', 'semantic', or 'hybrid' (default from settings)
top_k: Number of results per page
category: Filter by category
tags: Comma-separated tag filters
page: Page number (1-based)
"""
if mode is None:
mode = settings.search_default_mode
tag_list = tags.split(",") if tags else None
from_offset = (page - 1) * top_k
if mode == "keyword":
return keyword_search(
query=q, top_k=top_k, category=category,
tags=tag_list, from_offset=from_offset
)
elif mode == "semantic":
return semantic_search(
query=q, top_k=top_k, category=category, tags=tag_list
)
return {"error": f"Mode '{mode}' not implemented yet"}
Test Semantic Search
# Semantic search - finds conceptually similar documents
curl "http://localhost:8000/api/search?q=how+to+store+data+efficiently&mode=semantic"
# Should find "Understanding Vector Databases" even though no keywords match
# Compare with keyword search
curl "http://localhost:8000/api/search?q=how+to+store+data+efficiently&mode=keyword"
# Keyword search may return fewer or no results for this query
# Semantic search with category filter
curl "http://localhost:8000/api/search?q=web+framework&mode=semantic&category=web-development"
# Demonstrate semantic understanding
curl "http://localhost:8000/api/search?q=coding+lessons&mode=semantic"
# Should find "Introduction to Machine Learning" and other tutorial-like content
Keyword vs Semantic: When to Use Each
Keyword Search Wins
- Exact terms: "FastAPI", "uvicorn", "BM25"
- Error codes: "CORS 403 error"
- Product names: "Elasticsearch 8"
- Quoted phrases: "machine learning"
- Short, specific queries
Semantic Search Wins
- Conceptual queries: "how to process text"
- Synonym matching: "car" finds "automobile"
- Natural language questions
- Cross-language concepts
- Long, descriptive queries
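These rules of thumb could be folded into a crude default-mode heuristic. suggest_mode is a hypothetical helper for illustration; in practice the next lesson's hybrid search sidesteps the choice entirely:

```python
def suggest_mode(query: str) -> str:
    """Crude heuristic: exact-looking queries -> keyword, conceptual ones -> semantic."""
    words = query.split()
    # Quoted phrases, acronyms, and error-code-style tokens favor exact matching.
    if '"' in query or any(w.isupper() or any(c.isdigit() for c in w) for w in words):
        return "keyword"
    # Short, specific queries tend to be lookups; longer ones tend to be conceptual.
    return "keyword" if len(words) <= 2 else "semantic"

print(suggest_mode("uvicorn"))                       # keyword
print(suggest_mode("CORS 403 error"))                # keyword
print(suggest_mode("how to process text at scale"))  # semantic
```

A heuristic like this inevitably misclassifies edge cases, which is one argument for running both modes and fusing the results.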
Key Takeaways
- Semantic search encodes queries and documents into the same vector space, finding results by meaning rather than keywords.
- Elasticsearch 8 supports kNN search natively with dense_vector fields and HNSW indexes.
- The num_candidates parameter controls the accuracy-speed tradeoff for approximate nearest-neighbor search.
- Pre-filtering applies category and tag filters before the kNN search, ensuring all results match.
- The embedding model runs locally with zero API cost — encode as many queries as you want.
What's Next
In the next lesson, you will combine keyword and semantic search into a hybrid search system using Reciprocal Rank Fusion, then add cross-encoder re-ranking for maximum precision.