Step 3: Semantic Search
In this lesson, you will add vector-based semantic search to the engine. The query is encoded into a dense vector using the same sentence-transformer model, and Elasticsearch finds the nearest neighbors using cosine similarity. This enables the engine to understand meaning — "programming tutorials" now finds "coding lessons."
How Semantic Search Works
Instead of matching keywords, semantic search compares the meaning of the query to the meaning of each document:
Query: "how to fix a car"
|
v
[sentence-transformer] --> [0.12, -0.34, 0.56, ...] (384 dims)
|
v
[Elasticsearch kNN] --> Find nearest vectors using cosine similarity
|
v
Results:
1. "Automobile Repair Guide" (cosine: 0.89) -- different words, same meaning
2. "Vehicle Maintenance Tips" (cosine: 0.85) -- conceptually similar
3. "Introduction to Car Engines" (cosine: 0.78) -- related topic
Notice that none of these results contain the exact words "fix" or "car" in their titles, yet they are highly relevant. This is the power of semantic search.
Cosine Similarity Explained
Cosine similarity measures the angle between two vectors, ignoring magnitude:
cosine_similarity(A, B) = (A . B) / (||A|| * ||B||)
For unit-length (normalized) vectors, which many sentence-transformer models produce by default (and which you can force with encode(..., normalize_embeddings=True)), this reduces to a plain dot product:
cosine_similarity(A, B) = A . B (dot product)
Score range: -1 to 1
1.0 = identical meaning
0.0 = completely unrelated
-1.0 = opposite meaning (rare in practice)
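The formula above can be verified with a few lines of pure Python. These are toy 3-dimensional vectors, not real 384-dimensional embeddings, but the math is identical:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(A, B) / (|A| * |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

a = [0.6, 0.8, 0.0]  # unit length: 0.36 + 0.64 = 1.0
b = [0.8, 0.6, 0.0]  # also unit length
dot = sum(x * y for x, y in zip(a, b))

print(cosine_similarity(a, b))                 # ~0.96
print(dot)                                     # same value: for unit vectors, dot == cosine
print(cosine_similarity(a, [-x for x in a]))   # ~-1.0, opposite direction
```

Because both vectors are already unit length, the dot product and the full cosine formula agree, which is exactly why normalized embeddings make scoring cheap.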
The Semantic Search Module
Create the semantic search module that encodes the query and runs kNN search:
# app/search/semantic.py
"""Semantic vector search using Elasticsearch kNN."""
from app.elasticsearch.client import SearchClient
from app.embeddings.encoder import encode_query
from app.config import get_settings
import logging
logger = logging.getLogger(__name__)
settings = get_settings()
def semantic_search(
    query: str,
    top_k: int | None = None,
    category: str | None = None,
    tags: list[str] | None = None,
    num_candidates: int | None = None,
) -> dict:
"""Run a semantic (vector) search against Elasticsearch.
The query is encoded into a dense vector using sentence-transformers,
then Elasticsearch finds the nearest neighbors using HNSW.
Args:
query: The user's search query string.
top_k: Number of results to return.
category: Optional category filter.
tags: Optional tag filters.
num_candidates: Number of candidates for kNN (higher = more accurate, slower).
Returns:
Dict with 'results' list and 'total' count.
"""
if top_k is None:
top_k = settings.search_top_k
if num_candidates is None:
num_candidates = top_k * 10 # 10x oversampling for accuracy
client = SearchClient()
# Step 1: Encode the query into a dense vector
query_vector = encode_query(query)
# Step 2: Build kNN query
knn_query = {
"field": "embedding",
"query_vector": query_vector,
"k": top_k,
"num_candidates": num_candidates
}
# Step 3: Add filters if specified
filter_clauses = []
if category:
filter_clauses.append({"term": {"category": category}})
if tags:
for tag in tags:
filter_clauses.append({"term": {"tags": tag}})
if filter_clauses:
knn_query["filter"] = {"bool": {"must": filter_clauses}}
# Step 4: Execute kNN search
search_body = {
"knn": knn_query,
"_source": ["title", "body", "category", "tags", "url", "created_at"],
"size": top_k
}
response = client.es.search(
index=client.index_name,
body=search_body
)
# Step 5: Parse results
results = []
for hit in response["hits"]["hits"]:
result = {
"id": hit["_id"],
"score": hit["_score"],  # for cosine fields, Elasticsearch reports (1 + cosine) / 2, so scores fall in [0, 1]
"source": hit["_source"],
"highlights": {} # No keyword highlights for vector search
}
results.append(result)
total = response["hits"]["total"]["value"]
logger.info(f"Semantic search for '{query}': {total} total, returning {len(results)}")
return {
"results": results,
"total": total,
"query": query,
"mode": "semantic"
}
Understanding the kNN Query
Let us break down the key parameters:
num_candidates: The Accuracy-Speed Tradeoff
"knn": {
"field": "embedding",
"query_vector": [0.12, -0.34, ...], // 384 dimensions
"k": 10, // Return 10 results
"num_candidates": 100 // Consider 100 candidates
}
Elasticsearch uses HNSW (Hierarchical Navigable Small World) graphs for approximate nearest-neighbor search. The num_candidates parameter controls the accuracy-speed tradeoff:
- num_candidates = k: Fastest, but may miss relevant results.
- num_candidates = k * 10: Good balance. This is what we use.
- num_candidates = k * 100: Near-exact results, but slower for large indexes.
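The three settings above can be sketched as a small helper; build_knn_query is a hypothetical function for illustration, not part of the module built earlier:

```python
def build_knn_query(query_vector: list[float], k: int, oversample: int = 10) -> dict:
    """Build an Elasticsearch kNN clause with num_candidates = k * oversample."""
    return {
        "field": "embedding",
        "query_vector": query_vector,
        "k": k,
        # num_candidates bounds how many HNSW entries are examined per shard;
        # larger values improve recall at the cost of latency.
        "num_candidates": k * oversample,
    }

fast = build_knn_query([0.1] * 384, k=10, oversample=1)        # fastest, lowest recall
balanced = build_knn_query([0.1] * 384, k=10)                  # the 10x default used in this lesson
accurate = build_knn_query([0.1] * 384, k=10, oversample=100)  # near-exact, slower
```

Keeping the oversampling factor in one place makes it easy to tune per index size instead of hard-coding `num_candidates` at every call site.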
Pre-filtering vs Post-filtering
// Pre-filtering: Elasticsearch filters BEFORE kNN search
"knn": {
"filter": {"term": {"category": "machine-learning"}},
...
}
// Only vectors in the "machine-learning" category are considered.
// Faster, but may return fewer than k results if the category is small.
// Post-filtering: Filter AFTER kNN search (not recommended)
// You would get k results first, then filter, potentially returning 0 results.
We use pre-filtering because it ensures all returned results match the filter criteria.
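The difference is easy to demonstrate with a brute-force toy example: in-memory 2-d vectors stand in for the index, and no Elasticsearch is involved:

```python
# Toy index: each doc has a 2-d vector and a category.
docs = [
    {"id": 1, "vec": (1.0, 0.0), "category": "ml"},
    {"id": 2, "vec": (0.9, 0.1), "category": "web"},
    {"id": 3, "vec": (0.8, 0.2), "category": "web"},
    {"id": 4, "vec": (0.0, 1.0), "category": "ml"},
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def knn(candidates, query_vec, k):
    """Return the k candidates most similar to query_vec (vectors assumed unit length)."""
    return sorted(candidates, key=lambda d: dot(d["vec"], query_vec), reverse=True)[:k]

query_vec = (1.0, 0.0)
k = 2

# Pre-filtering: restrict to the category first, then take the k nearest.
pre = knn([d for d in docs if d["category"] == "ml"], query_vec, k)

# Post-filtering: take the k nearest overall, then filter -- may return fewer than k.
post = [d for d in knn(docs, query_vec, k) if d["category"] == "ml"]

print([d["id"] for d in pre])   # [1, 4] -- k results whenever the category has k docs
print([d["id"] for d in post])  # [1]    -- doc 4 was crowded out before the filter ran
```

Doc 4 is a valid "ml" match, but post-filtering discards it because two "web" docs were closer to the query, which is exactly the failure mode pre-filtering avoids.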
Update the Search API Route
Update app/main.py to support both keyword and semantic modes:
# Update the search endpoint in app/main.py
from app.search.keyword import keyword_search
from app.search.semantic import semantic_search
@app.get("/api/search")
async def search(
    q: str,
    mode: str | None = None,
    top_k: int = 10,
    category: str | None = None,
    tags: str | None = None,
    page: int = 1
):
"""Search documents with keyword (BM25) or semantic (vector) matching.
Query params:
q: Search query string
mode: 'keyword', 'semantic', or 'hybrid' (default from settings)
top_k: Number of results per page
category: Filter by category
tags: Comma-separated tag filters
page: Page number (1-based)
"""
if mode is None:
mode = settings.search_default_mode
tag_list = tags.split(",") if tags else None
from_offset = (page - 1) * top_k
if mode == "keyword":
return keyword_search(
query=q, top_k=top_k, category=category,
tags=tag_list, from_offset=from_offset
)
elif mode == "semantic":
return semantic_search(
query=q, top_k=top_k, category=category, tags=tag_list
)
return {"error": f"Mode '{mode}' not implemented yet"}
Test Semantic Search
# Semantic search - finds conceptually similar documents
curl "http://localhost:8000/api/search?q=how+to+store+data+efficiently&mode=semantic"
# Should find "Understanding Vector Databases" even though no keywords match
# Compare with keyword search
curl "http://localhost:8000/api/search?q=how+to+store+data+efficiently&mode=keyword"
# Keyword search may return fewer or no results for this query
# Semantic search with category filter
curl "http://localhost:8000/api/search?q=web+framework&mode=semantic&category=web-development"
# Demonstrate semantic understanding
curl "http://localhost:8000/api/search?q=coding+lessons&mode=semantic"
# Should find "Introduction to Machine Learning" and other tutorial-like content
Keyword vs Semantic: When to Use Each
Keyword Search Wins
- Exact terms: "FastAPI", "uvicorn", "BM25"
- Error codes: "CORS 403 error"
- Product names: "Elasticsearch 8"
- Quoted phrases: "machine learning"
- Short, specific queries
Semantic Search Wins
- Conceptual queries: "how to process text"
- Synonym matching: "car" finds "automobile"
- Natural language questions
- Cross-language concepts
- Long, descriptive queries
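These rules of thumb could be folded into a crude default-mode heuristic. suggest_mode is a hypothetical helper for illustration; in practice the next lesson's hybrid search sidesteps the choice entirely:

```python
def suggest_mode(query: str) -> str:
    """Crude heuristic: exact-looking queries -> keyword, conceptual ones -> semantic."""
    words = query.split()
    # Quoted phrases, acronyms, and error-code-style tokens favor exact matching.
    if '"' in query or any(w.isupper() or any(c.isdigit() for c in w) for w in words):
        return "keyword"
    # Short, specific queries tend to be lookups; longer ones tend to be conceptual.
    return "keyword" if len(words) <= 2 else "semantic"

print(suggest_mode("uvicorn"))                       # keyword
print(suggest_mode("CORS 403 error"))                # keyword
print(suggest_mode("how to process text at scale"))  # semantic
```

A heuristic like this inevitably misclassifies edge cases, which is one argument for running both modes and fusing the results.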
Key Takeaways
- Semantic search encodes queries and documents into the same vector space, finding results by meaning rather than keywords.
- Elasticsearch 8 supports kNN search natively with dense_vector fields and HNSW indexes.
- The num_candidates parameter controls the accuracy-speed tradeoff for approximate nearest-neighbor search.
- Pre-filtering applies category and tag filters before the kNN search, ensuring all results match.
- The embedding model runs locally with zero API cost — encode as many queries as you want.
What's Next
In the next lesson, you will combine keyword and semantic search into a hybrid search system using Reciprocal Rank Fusion, then add cross-encoder re-ranking for maximum precision.