Introduction to LlamaIndex
LlamaIndex is a data framework for building LLM applications. It provides tools to ingest, structure, and query your data, which makes it easier to build retrieval-augmented generation (RAG) pipelines, chatbots, and agents that work with your private data.
What is LlamaIndex?
LlamaIndex (formerly GPT Index) is a Python framework that connects large language models to external data sources. It handles the entire pipeline: loading documents, parsing them into chunks, creating embeddings, storing them in an index, and querying them with natural language.
LlamaIndex = Data Framework for LLM Applications

Core pipeline: Load → Parse → Index → Query

| Load | Parse | Index | Query |
|---|---|---|---|
| PDFs | Chunking | VectorStore | Natural language questions |
| Web | Metadata | Summary | Chat |
| APIs | Nodes | KnowledgeGraph | |
| DBs | | Tree | |
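
To make the four stages concrete before any library is installed, here is a minimal sketch in plain Python. It is a toy, not the LlamaIndex API: the function names (`load`, `parse`, `build_index`, `query`) are hypothetical, the "documents" are in-memory strings, and a word-count vector stands in for a real embedding.

```python
import re
from collections import Counter


def load(sources):
    """Load step: here the 'documents' are just non-empty in-memory strings."""
    return [text.strip() for text in sources if text.strip()]


def parse(documents, chunk_size=8):
    """Parse step: split each document into chunks of at most chunk_size words."""
    chunks = []
    for doc in documents:
        words = doc.split()
        for start in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """Index step: pair each chunk with a word-count vector (assumed stand-in for an embedding)."""
    return [(chunk, Counter(tokenize(chunk))) for chunk in chunks]


def query(index, question):
    """Query step: return the chunk that shares the most words with the question."""
    q = Counter(tokenize(question))

    def overlap(entry):
        return sum((entry[1] & q).values())

    return max(index, key=overlap)[0]


docs = load([
    "LlamaIndex connects language models to external data sources.",
    "Paris is the capital of France and sits on the Seine.",
])
index = build_index(parse(docs))
best = query(index, "What is the capital of France?")
print(best)
```

A real pipeline would replace the word counts with model-generated embeddings and pass the retrieved chunk to an LLM, but the order of the steps is the same.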
Key Concepts
Documents & Nodes
Documents are your raw data (PDFs, web pages). Nodes are the parsed chunks of a document; each node contains text, metadata, and relationships to other nodes.
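
The document/node split can be pictured with two small dataclasses. This is an illustrative sketch only: `ToyDocument`, `ToyNode`, `split_into_nodes`, and the file name `notes.txt` are hypothetical and do not mirror LlamaIndex's own classes. Each node inherits its document's metadata and records its previous and next neighbour.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyDocument:
    """Raw input: the full text plus metadata about where it came from."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class ToyNode:
    """A parsed chunk with its own text, inherited metadata, and neighbour links."""
    node_id: str
    text: str
    metadata: dict
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def split_into_nodes(doc, doc_id, sentences_per_node=2):
    """Group sentences into nodes, copy the metadata, then link neighbours."""
    sentences = [s.strip() for s in doc.text.split(".") if s.strip()]
    nodes = []
    for i in range(0, len(sentences), sentences_per_node):
        text = ". ".join(sentences[i:i + sentences_per_node]) + "."
        nodes.append(ToyNode(
            node_id=f"{doc_id}-{len(nodes)}",
            text=text,
            metadata=dict(doc.metadata),
        ))
    for left, right in zip(nodes, nodes[1:]):
        left.next_id = right.node_id
        right.prev_id = left.node_id
    return nodes


doc = ToyDocument(
    text="Nodes hold text. Nodes carry metadata. Nodes link to neighbours.",
    metadata={"source": "notes.txt"},  # hypothetical file name
)
nodes = split_into_nodes(doc, doc_id="doc0")
for node in nodes:
    print(node.node_id, node.prev_id, node.next_id, node.text)
```

Keeping the links on the node lets a retriever pull in surrounding context after it finds a single matching chunk.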
Indexes
Data structures that organize your nodes for efficient retrieval. Examples include VectorStoreIndex, SummaryIndex, and KnowledgeGraphIndex.
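
The difference between index types is mostly a difference in how nodes are retrieved. The sketch below contrasts two simplified strategies: a vector-style index that ranks nodes by similarity, and a summary-style index that hands back every node in order. `ToyVectorIndex` and `ToySummaryIndex` are hypothetical names, the bag-of-words `embed` function is an assumed stand-in for a real embedding model, and the behaviour is deliberately reduced compared with the real classes.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Stores one vector per node and returns the top_k most similar nodes."""

    def __init__(self, nodes):
        self.entries = [(node, embed(node)) for node in nodes]

    def retrieve(self, question, top_k=1):
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(e[1], q), reverse=True)
        return [node for node, _ in ranked[:top_k]]


class ToySummaryIndex:
    """Stores nodes as a plain list and returns all of them, in order."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def retrieve(self, question):
        return list(self.nodes)


nodes = [
    "Vector indexes rank chunks by similarity.",
    "Summary indexes read every chunk in order.",
    "Tree indexes arrange chunks hierarchically.",
]
question = "How does a tree index arrange chunks?"
vector_hits = ToyVectorIndex(nodes).retrieve(question, top_k=1)
summary_hits = ToySummaryIndex(nodes).retrieve(question)
print(vector_hits)
print(len(summary_hits))
```

The vector-style index suits targeted questions over large collections, while the summary-style index suits questions that need the whole document, at the cost of sending more text to the model.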
Query Engines
Interfaces for asking questions over your indexed data. They retrieve relevant nodes and synthesize answers using an LLM.
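
The retrieve-then-synthesize pattern can be sketched as a small class that wires a retriever to an LLM call. Everything here is an assumption made for illustration: `ToyQueryEngine`, `retrieve`, and `build_prompt` are hypothetical names, and `fake_llm` is a stub that echoes the first context line so the example runs without a model or an API key.

```python
import re


def words(text):
    """Lowercase alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(nodes, question, top_k=2):
    """Return up to top_k nodes that share at least one word with the question."""
    q_words = words(question)

    def score(node):
        return len(q_words & words(node))

    ranked = sorted(nodes, key=score, reverse=True)
    return [n for n in ranked[:top_k] if score(n) > 0]


def build_prompt(context_nodes, question):
    """Assemble the retrieved nodes and the question into one prompt string."""
    context = "\n".join(f"- {n}" for n in context_nodes)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def fake_llm(prompt):
    """Hypothetical stand-in for a model call: echoes the first context line."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return "I don't know."


class ToyQueryEngine:
    """Retrieves relevant nodes, then asks the LLM to answer from them."""

    def __init__(self, nodes, llm):
        self.nodes = nodes
        self.llm = llm

    def query(self, question):
        context = retrieve(self.nodes, question)
        return self.llm(build_prompt(context, question))


engine = ToyQueryEngine(
    nodes=[
        "LlamaHub hosts data loaders.",
        "LlamaParse parses complex PDFs.",
        "LlamaCloud is a managed platform.",
    ],
    llm=fake_llm,
)
answer = engine.query("Which service parses PDFs?")
print(answer)
```

Passing the LLM in as a plain callable keeps retrieval and synthesis separate, so either half can be swapped out or tested on its own.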
Agents
Autonomous reasoning systems that can use tools, query multiple indexes, and make decisions to answer complex questions.
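
At its simplest, an agent repeats a decide, act, observe cycle over a set of tools. The sketch below shows that cycle with two hypothetical tools (`calculator` and `docs_lookup`) and a hard-coded rule in `choose_tool`. In a real agent the LLM makes that decision; the rule here is only an assumed placeholder so the example runs offline.

```python
import re


def calculator(text):
    """Tool: add up the non-negative integers that appear in the input."""
    return str(sum(int(n) for n in re.findall(r"\d+", text)))


def docs_lookup(text):
    """Tool: search a tiny in-memory fact table (a stand-in for an index)."""
    facts = {
        "llamahub": "LlamaHub is a registry of data loaders.",
        "llamaparse": "LlamaParse is a managed document parsing service.",
    }
    for key, fact in facts.items():
        if key in text.lower():
            return fact
    return "No matching fact."


TOOLS = {"calculator": calculator, "docs_lookup": docs_lookup}


def choose_tool(sub_question):
    """Hypothetical decision rule standing in for the LLM's reasoning step."""
    return "calculator" if re.search(r"\d", sub_question) else "docs_lookup"


def run_agent(task):
    """Split the task into sub-questions, pick a tool for each, and record the result."""
    steps = []
    for sub in [s.strip() for s in task.split("?") if s.strip()]:
        tool_name = choose_tool(sub)         # decide
        observation = TOOLS[tool_name](sub)  # act
        steps.append((tool_name, observation))  # observe
    return steps


steps = run_agent("What is LlamaHub? What is 12 plus 30?")
for tool_name, observation in steps:
    print(tool_name, "->", observation)
```

Registering tools in a dictionary keyed by name mirrors how an LLM-driven agent would select a tool: the model outputs a tool name and an input, and the loop dispatches the call.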
LlamaIndex vs LangChain
| Feature | LlamaIndex | LangChain |
|---|---|---|
| Primary Focus | Data indexing and retrieval | General LLM application chains |
| RAG | Best-in-class, purpose-built | Good support, more manual |
| Index Types | Vector, Summary, KG, Tree, SQL | Mainly vector stores |
| Agents | Data-aware agents with tools | Extensive agent framework |
| Learning Curve | Moderate (focused API) | Steeper (many abstractions) |
| Best For | RAG, data-heavy applications | General LLM orchestration |
The LlamaIndex Ecosystem
- LlamaIndex Core: The main framework with document loaders, indexes, query engines, and agents. The foundation for all LlamaIndex applications.
- LlamaHub: A registry of data loaders (readers) for 150+ data sources: Google Docs, Slack, Notion, databases, APIs, and more.
- LlamaParse: A managed document parsing service for complex files (PDFs with tables, charts, and multi-column layouts). Superior to basic text extraction.
- LlamaCloud: Managed RAG-as-a-service platform. Upload documents, and LlamaCloud handles indexing, retrieval, and scaling.
What's Next?
In the next lesson, we will install LlamaIndex, configure an LLM provider, load our first document, and run a basic query.