RAG with LangChain
Build a complete Retrieval Augmented Generation pipeline: load documents, split text, create embeddings, store in vector databases, retrieve relevant context, and generate grounded answers.
Document Loaders
LangChain provides over 100 document loaders covering common data sources such as PDFs, web pages, CSV files, and source code:
```python
# PDF files
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("document.pdf")
docs = loader.load()  # Returns a list of Document objects

# Web pages
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://example.com/article")
docs = loader.load()

# CSV files
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader("data.csv")
docs = loader.load()

# Directory of files
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader("./docs/", glob="**/*.md")
docs = loader.load()

# Code files
from langchain_community.document_loaders import TextLoader

loader = TextLoader("app.py")
docs = loader.load()

# Each Document has: page_content (str) + metadata (dict)
print(docs[0].page_content[:100])
print(docs[0].metadata)  # {"source": "document.pdf", "page": 0}
```
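You can also construct Document objects by hand, which is handy for testing splitters and retrievers without loading real files. A minimal sketch (the content and metadata here are illustrative):

```python
from langchain_core.documents import Document

# A hand-built Document: same shape as what the loaders return
doc = Document(
    page_content="LangChain is a framework for building LLM applications.",
    metadata={"source": "manual", "topic": "intro"},
)
print(doc.page_content)
print(doc.metadata["source"])
```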
Text Splitters
Documents are usually too long to embed as a single chunk. Split them into smaller, overlapping pieces:
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# The most commonly used splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Max characters per chunk
    chunk_overlap=200,  # Overlap between chunks for context
    separators=["\n\n", "\n", " ", ""],  # Try to split at these boundaries
)
chunks = splitter.split_documents(docs)
print(f"Split {len(docs)} documents into {len(chunks)} chunks")

# Semantic splitter (uses embeddings to find natural breakpoints)
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

semantic_splitter = SemanticChunker(
    OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile",
)
semantic_chunks = semantic_splitter.split_documents(docs)
```
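For source code such as the app.py file loaded earlier, the same splitter can split at language-aware boundaries (function and class definitions) instead of generic whitespace. A sketch with illustrative chunk sizes:

```python
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Uses Python-specific separators (class/def boundaries) before
# falling back to blank lines and whitespace
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=500,
    chunk_overlap=50,
)
code_chunks = code_splitter.split_documents(docs)
```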
Embeddings
Convert text chunks into numerical vectors for similarity search:
```python
# OpenAI embeddings (hosted, widely used)
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# HuggingFace embeddings (free, runs locally)
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Embed a single text
vector = embeddings.embed_query("What is LangChain?")
print(f"Vector dimension: {len(vector)}")  # 384 for all-MiniLM-L6-v2 (1536 for text-embedding-3-small)
```
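To see why embeddings work for retrieval, you can compare vectors directly: semantically related texts get a higher cosine similarity. A quick sketch using the embeddings object from above (the example texts are illustrative):

```python
import math

v1 = embeddings.embed_query("What is LangChain?")
v2 = embeddings.embed_query("Tell me about the LangChain framework")
v3 = embeddings.embed_query("How do I bake sourdough bread?")

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

print(cosine(v1, v2))  # higher - related questions
print(cosine(v1, v3))  # lower - unrelated topic
```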
Vector Stores
Store embedded chunks in a vector database for efficient similarity search:
```python
# Chroma (simple, local, great for development)
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

# FAISS (fast, in-memory, by Meta)
from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())
vectorstore.save_local("./faiss_index")

# Pinecone (managed, scalable, production-ready)
from langchain_pinecone import PineconeVectorStore

vectorstore = PineconeVectorStore.from_documents(
    chunks, OpenAIEmbeddings(), index_name="my-index"
)
```
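Indexing is usually a one-time step: on later runs you reload the saved store instead of re-embedding everything. A sketch for the Chroma and FAISS stores created above:

```python
# Reopen the persisted Chroma store from ./chroma_db
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OpenAIEmbeddings(),
)

# Reload the saved FAISS index; the flag opts in to pickle
# deserialization, so only use it with files you created yourself
vectorstore = FAISS.load_local(
    "./faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True,
)
```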
Retrievers
Retrievers find the most relevant chunks for a given query:
```python
from langchain_openai import ChatOpenAI

# Basic similarity search retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # Return the top 4 chunks
)

# MMR (Maximal Marginal Relevance) - trades pure relevance for diversity
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20},
)

# Self-query retriever (LLM generates metadata filters from natural language)
from langchain.retrievers.self_query.base import SelfQueryRetriever

retriever = SelfQueryRetriever.from_llm(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    vectorstore=vectorstore,
    document_contents="Technical documentation",
    metadata_field_info=[...],
)

# Test retrieval
docs = retriever.invoke("How do I set up LangChain?")
for doc in docs:
    print(doc.page_content[:100])
```
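The metadata_field_info list describes the filterable metadata fields to the LLM. A hedged sketch of what the entries might look like, assuming your chunks carry the source and page metadata shown in the loaders section (the field names are hypothetical, so adjust them to your own documents):

```python
from langchain.chains.query_constructor.base import AttributeInfo

# Hypothetical schema - one AttributeInfo per filterable metadata field
metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The file the chunk came from",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page number within the source file",
        type="integer",
    ),
]
```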
Building a RAG Chain
Combine all the components into a complete RAG chain using LCEL (the LangChain Expression Language):
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_chroma import Chroma

# 1. Load and index documents (do this once)
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# 2. RAG prompt template
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the following context:

{context}

Question: {question}

Answer:"""
)

# 3. Format retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# 4. Build the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

# 5. Ask questions!
answer = rag_chain.invoke("How do I install LangChain?")
print(answer)
```
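A common variation also returns the retrieved chunks alongside the answer, so you can show the user which sources were used. A sketch building on the chain above (it reuses retriever, rag_prompt, and format_docs from the previous block):

```python
from langchain_core.runnables import RunnableParallel

# Chain that answers from already-retrieved documents
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | rag_prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

# Run retrieval, then answering, keeping the source documents in the output
rag_with_sources = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

result = rag_with_sources.invoke("How do I install LangChain?")
print(result["answer"])
print([doc.metadata for doc in result["context"]])  # source documents
```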
Conversational RAG
Add conversation history to your RAG chain for follow-up questions:
```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains.history_aware_retriever import create_history_aware_retriever
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

llm = ChatOpenAI(model="gpt-4o-mini")

# Contextualize the question using chat history
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given chat history and a follow-up question, reformulate it as a standalone question."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_prompt
)

# Answer prompt
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only this context:\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Build the full conversational RAG chain
question_answer_chain = create_stuff_documents_chain(llm, answer_prompt)
conversational_rag = create_retrieval_chain(
    history_aware_retriever, question_answer_chain
)

# Usage with history
chat_history = []
result = conversational_rag.invoke({
    "input": "What is LangChain?",
    "chat_history": chat_history,
})
print(result["answer"])
```
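To handle follow-up questions, append each completed turn to chat_history; the history-aware retriever then has the context it needs to resolve references like "it". A sketch continuing from the code above:

```python
from langchain_core.messages import AIMessage, HumanMessage

# Record the first turn, then ask a follow-up that relies on the history
chat_history.extend([
    HumanMessage(content="What is LangChain?"),
    AIMessage(content=result["answer"]),
])

followup = conversational_rag.invoke({
    "input": "How do I install it?",  # "it" is resolved via the history
    "chat_history": chat_history,
})
print(followup["answer"])
```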
Use RecursiveCharacterTextSplitter with chunk_size=1000 and chunk_overlap=200 as a starting point. Experiment with chunk sizes based on your documents: technical docs may need smaller chunks, while narrative text can use larger ones.
What's Next?
The next lesson covers Agents and Tools — how to build autonomous LLM agents that can use tools to accomplish tasks.