Best Practices
Optimize costs, tune performance, monitor your indexes, and follow production deployment patterns for Pinecone vector databases.
Cost Optimization
Use Serverless
Start with Pinecone Serverless for variable workloads. You pay only for reads, writes, and storage — no idle costs.
Choose Smaller Dimensions
Use text-embedding-3-small (1536-dim) instead of text-embedding-3-large (3072-dim) when precision is not critical. Halves storage cost.
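To see why the dimension choice matters, here is a back-of-envelope storage estimate (a rough sketch assuming 4-byte float32 values and ignoring metadata and index overhead; actual Pinecone billing differs):

```python
def vector_storage_gb(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> float:
    """Rough raw storage for float32 vectors (metadata and overhead excluded)."""
    return num_vectors * dimension * bytes_per_value / 1024**3

# For 1M vectors, 3072-dim embeddings need twice the storage of 1536-dim:
large = vector_storage_gb(1_000_000, 3072)  # ~11.4 GB
small = vector_storage_gb(1_000_000, 1536)  # ~5.7 GB
```

The same 2x factor applies to read units, since larger vectors cost more to scan.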
Optimize top_k
Use the smallest top_k that gives good results. Each additional result costs read units. For RAG, 3-5 is often sufficient.
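One way to keep top_k small without losing quality is to request a modest top_k and then trim matches below a similarity cutoff. A minimal sketch (the 0.75 threshold is an assumption — tune it for your data; `filter_matches` is a hypothetical helper, not part of the SDK):

```python
def filter_matches(matches, min_score=0.75):
    """Keep only matches at or above a similarity cutoff.

    `matches` is the list from a query response; each item exposes a
    `score` field (dict key or attribute, depending on SDK version).
    """
    def score(m):
        return m["score"] if isinstance(m, dict) else m.score
    return [m for m in matches if score(m) >= min_score]

# Example with plain dicts shaped like query matches:
hits = [{"id": "a", "score": 0.91}, {"id": "b", "score": 0.62}]
filter_matches(hits)  # keeps only "a"
```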
Clean Up Old Data
Delete outdated vectors regularly. Use namespaces to isolate versioned data and delete entire namespaces when no longer needed.
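Deleting a whole namespace is a single call. A minimal sketch, assuming `index` is a Pinecone Index handle created as shown earlier in the course (the `docs-v1` namespace name is illustrative):

```python
def drop_namespace(index, namespace: str) -> None:
    """Delete every vector in a namespace in one call.

    Far cheaper than deleting vectors one by one; `index` is a
    Pinecone Index handle created elsewhere.
    """
    index.delete(delete_all=True, namespace=namespace)

# e.g. retire a superseded data version:
# drop_namespace(index, "docs-v1")
```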
Performance Tuning
- Batch upserts: Always upsert in batches of 100-200 vectors. Single-vector upserts are much slower.
- Use gRPC: Install `pinecone[grpc]` for 2-3x faster upserts and queries compared to REST.
- Minimize metadata: Only store metadata you will filter on or need to retrieve. Large metadata increases storage and slows queries.
- Namespace strategy: Use namespaces to partition data logically (per user, per tenant, per version). Queries within namespaces are faster.
- Connection pooling: Reuse the Pinecone client across requests. Do not create a new client per API call.
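The batching advice above can be sketched as a small helper (assumptions: `index` is a Pinecone Index handle — gRPC or REST — created elsewhere, and vectors are the `(id, values, metadata)` tuples the SDK accepts; `chunked` and `batch_upsert` are illustrative names):

```python
def chunked(items, size=100):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def batch_upsert(index, vectors, batch_size=100, namespace=None):
    """Upsert vectors in batches instead of one call per vector."""
    for batch in chunked(vectors, batch_size):
        index.upsert(vectors=batch, namespace=namespace)
```

Batch sizes of 100-200 keep each request comfortably under payload limits while amortizing per-request overhead.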
Production Checklist
- Secure your API key: Store API keys in environment variables or a secrets manager (AWS Secrets Manager, GCP Secret Manager). Never hardcode keys in source code.
- Monitor index stats: Regularly check `describe_index_stats()` for vector count, index fullness, and namespace distribution. Set up alerts for unexpected growth.
- Handle errors gracefully: Implement retry logic with exponential backoff for transient errors (429, 500, 503). The Pinecone SDK handles basic retries, but add application-level retries for critical paths.
- Test your embeddings: Validate that your embedding pipeline produces consistent results. Run sanity checks: embed the same text twice and verify the vectors match (allow a small numerical tolerance — some embedding APIs are not perfectly deterministic).
```python
import os

from pinecone import Pinecone
from tenacity import retry, stop_after_attempt, wait_exponential

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("production-index")

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
)
def safe_query(embedding, top_k=5, **kwargs):
    """Query with retry logic."""
    return index.query(
        vector=embedding,
        top_k=top_k,
        include_metadata=True,
        **kwargs,
    )
```
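The monitoring item in the checklist can be sketched as a small report helper. Assumption: the stats returned by `describe_index_stats()` are dict-like, with a `namespaces` mapping of namespace name to a summary containing `vector_count` (the example uses a plain dict of that shape; `namespace_report` is an illustrative name):

```python
def namespace_report(stats) -> dict:
    """Summarize per-namespace vector counts from describe_index_stats().

    `stats` is dict-like:
    {"total_vector_count": ..., "namespaces": {name: {"vector_count": n}}}
    """
    namespaces = stats.get("namespaces", {}) if isinstance(stats, dict) else stats.namespaces
    return {
        name: ns["vector_count"] if isinstance(ns, dict) else ns.vector_count
        for name, ns in namespaces.items()
    }

sample = {"total_vector_count": 1200,
          "namespaces": {"prod": {"vector_count": 1000},
                         "staging": {"vector_count": 200}}}
namespace_report(sample)  # {"prod": 1000, "staging": 200}
```

Feed the result into your alerting system and flag unexpected growth or an empty namespace that should be populated.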
- Use environment-specific index names: For example, prod-docs-v2, staging-products, dev-test. This prevents accidental operations on production indexes.
Common Mistakes
- Mismatched dimensions between embedding model and index
- Creating a new Pinecone client per request instead of reusing
- Not including metadata when querying (forgetting `include_metadata=True`)
- Using the wrong distance metric for your embedding model
- Not batching upserts for large datasets
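The first mistake — mismatched dimensions — is cheap to guard against before upserting. A minimal sketch (`check_dimension` is an illustrative helper; you can read the expected dimension from `describe_index_stats()` or from your own config):

```python
def check_dimension(embedding, index_dimension: int) -> None:
    """Fail fast when an embedding does not match the index dimension."""
    if len(embedding) != index_dimension:
        raise ValueError(
            f"Embedding has {len(embedding)} dimensions, "
            f"but the index expects {index_dimension}."
        )

# e.g. guard a 1536-dim index against a 3072-dim embedding:
# check_dimension(embedding, 1536)
```

Failing at upsert time with a clear message is far easier to debug than silently degraded or rejected queries later.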
Course Summary
Congratulations on completing the Pinecone course! You have learned how to set up Pinecone, index and query vectors, build RAG pipelines, and deploy vector search to production. You are now equipped to use Pinecone as the vector backbone for AI-powered applications.