Beginner

Introduction to LlamaIndex

LlamaIndex is a data framework for building LLM applications. It provides tools to ingest, structure, and query your data, which makes it easier to build RAG pipelines, chatbots, and agents that work with your private data.

What is LlamaIndex?

LlamaIndex (formerly GPT Index) is a Python framework that connects large language models to external data sources. It handles the entire pipeline: loading documents, parsing them into chunks, creating embeddings, storing them in an index, and querying them with natural language.

LlamaIndex at a Glance
LlamaIndex = Data Framework for LLM Applications

# Core Pipeline
Load   ->  Parse     ->  Index           ->  Query
 |          |             |                   |
PDFs       Chunking      VectorStore         Natural
Web        Metadata      Summary             Language
APIs       Nodes         KnowledgeGraph      Questions
DBs                      Tree                Chat
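
To make the four stages concrete, here is a minimal sketch in plain Python. It does not use the LlamaIndex API: the file names, the sample text, and the helper functions (parse, build_index, query) are all hypothetical, and word-count vectors stand in for real embeddings.

```python
import math
import re
from collections import Counter

# Load: in a real pipeline this text would come from PDFs, web pages, APIs, or DBs.
# The file names and contents below are made up for illustration.
raw_documents = {
    "pets.txt": "Cats sleep most of the day. Dogs enjoy long walks outside.",
    "space.txt": "Mars is called the red planet. Jupiter is the largest planet.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def parse(documents):
    """Parse: split each document into sentence-sized chunks with source metadata."""
    chunks = []
    for source, text in documents.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                chunks.append({"text": sentence, "source": source})
    return chunks


def build_index(chunks):
    """Index: pair every chunk with a word-count vector (a stand-in for an embedding)."""
    return [(Counter(tokenize(chunk["text"])), chunk) for chunk in chunks]


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query(index, question, top_k=1):
    """Query: rank chunks by similarity to the question and return the best ones."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(index, key=lambda entry: cosine(q_vec, entry[0]), reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


index = build_index(parse(raw_documents))
best = query(index, "Which planet is the largest?")[0]
print(best["source"], "->", best["text"])
# prints: space.txt -> Jupiter is the largest planet.
```

A real application would replace the word counts with learned embeddings and pass the retrieved chunk to an LLM to write the final answer; the sketch stops at retrieval.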

Key Concepts

📄

Documents & Nodes

Documents are your raw data (PDFs, web pages). Nodes are the parsed chunks that contain text, metadata, and relationships.
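
The relationship between a document and its nodes can be sketched with two small dataclasses. These are hypothetical stand-ins rather than the LlamaIndex classes, and the fixed-size word chunking (chunk_size) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Raw source data plus metadata (hypothetical stand-in, not the LlamaIndex class)."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Node:
    """One parsed chunk: text, inherited metadata, and links to related nodes."""
    node_id: str
    text: str
    metadata: dict
    relationships: dict = field(default_factory=dict)


def parse_into_nodes(document, chunk_size=5):
    """Split a document into fixed-size word chunks and link neighbouring nodes."""
    words = document.text.split()
    nodes = []
    for start in range(0, len(words), chunk_size):
        nodes.append(
            Node(
                node_id=f"{document.doc_id}-{len(nodes)}",
                text=" ".join(words[start:start + chunk_size]),
                metadata=dict(document.metadata),  # each node gets its own copy
                relationships={"source": document.doc_id},
            )
        )
    # Record reading order so a retriever could later pull in neighbouring chunks.
    for previous, current in zip(nodes, nodes[1:]):
        previous.relationships["next"] = current.node_id
        current.relationships["previous"] = previous.node_id
    return nodes


doc = Document(
    doc_id="handbook",
    text="Employees accrue vacation days monthly and may carry unused days into the next year",
    metadata={"file_name": "handbook.pdf"},
)
nodes = parse_into_nodes(doc)
for node in nodes:
    print(node.node_id, "|", node.text, "|", node.relationships)
```

The 14-word document yields three nodes. Each one carries the file_name metadata of its source and knows which node comes before and after it.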

📦

Indexes

Data structures that organize your nodes for efficient retrieval. Examples include VectorStoreIndex, SummaryIndex, and KnowledgeGraphIndex.

🔎

Query Engines

Interfaces for asking questions over your indexed data. They retrieve relevant nodes and synthesize answers using an LLM.
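
The retrieve-then-synthesize pattern can be illustrated with a toy query engine. Everything here is an assumption made for the sketch: KeywordRetriever, QueryEngine, and fake_llm are hypothetical names, keyword overlap replaces real relevance scoring, and fake_llm only reports how much context it received instead of calling a model.

```python
import re


def tokenize(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


class KeywordRetriever:
    """Ranks nodes by how many question words they share (a toy relevance score)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def retrieve(self, question, top_k=2):
        q_words = tokenize(question)
        scored = [(len(q_words & tokenize(node)), node) for node in self.nodes]
        scored = [pair for pair in scored if pair[0] > 0]  # drop irrelevant nodes
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [node for _, node in scored[:top_k]]


def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call; it only reports what it was given."""
    context_lines = [line for line in prompt.splitlines() if line.startswith("- ")]
    return f"Answer based on {len(context_lines)} context chunk(s)."


class QueryEngine:
    """Retrieve relevant nodes, build a prompt from them, and ask the LLM to answer."""

    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm

    def build_prompt(self, question, nodes):
        context = "\n".join(f"- {node}" for node in nodes)
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    def query(self, question):
        nodes = self.retriever.retrieve(question)
        answer = self.llm(self.build_prompt(question, nodes))
        return {"answer": answer, "source_nodes": nodes}


nodes = [
    "Refunds are issued within 14 days of a return.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]
engine = QueryEngine(KeywordRetriever(nodes), fake_llm)
result = engine.query("How long do refunds take?")
print(result["answer"])
print(result["source_nodes"])
```

Returning the source nodes alongside the answer lets the caller show which chunks the response was grounded in.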

🤖

Agents

Autonomous reasoning systems that can use tools, query multiple indexes, and make decisions to answer complex questions.
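
As a rough illustration of tool use across multiple data sources, the sketch below routes each part of a compound question to a different tool. It is deliberately simplified: a real agent would ask an LLM to plan and choose tools, whereas this one uses keyword overlap with each tool's description. Tool, RuleBasedAgent, hr_lookup, and finance_lookup are hypothetical names, and the returned facts are invented sample data.

```python
import re


def words(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


class Tool:
    """A named capability the agent can call, with a description used for selection."""

    def __init__(self, name, description, func):
        self.name = name
        self.description = description
        self.func = func


def hr_lookup(question):
    # Hypothetical stand-in for a query engine over HR documents.
    return "Employees receive 20 vacation days per year."


def finance_lookup(question):
    # Hypothetical stand-in for a query engine over finance documents.
    return "The travel budget is 1500 USD per trip."


class RuleBasedAgent:
    """Splits a question into parts, picks one tool per part, and collects the results."""

    def __init__(self, tools):
        self.tools = tools

    def choose_tool(self, sub_question):
        q = words(sub_question)
        best = max(self.tools, key=lambda tool: len(q & words(tool.description)))
        # Only use the tool if at least one word actually matched.
        return best if q & words(best.description) else None

    def run(self, question):
        steps = []
        for part in re.split(r"\band\b", question):
            part = part.strip(" ?")
            if not part:
                continue
            tool = self.choose_tool(part)
            if tool is None:
                steps.append((part, None, "No suitable tool found."))
            else:
                steps.append((part, tool.name, tool.func(part)))
        return steps


agent = RuleBasedAgent([
    Tool("hr_docs", "vacation leave holidays benefits policy", hr_lookup),
    Tool("finance_docs", "budget expenses travel reimbursement", finance_lookup),
])
for part, tool_name, observation in agent.run(
    "How many vacation days do I get and what is the travel budget?"
):
    print(f"{tool_name}: {observation}")
```

The first half of the question is routed to hr_docs and the second to finance_docs, which mirrors how an agent can combine answers from several indexes.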

LlamaIndex vs LangChain

Feature          LlamaIndex                        LangChain
Primary Focus    Data indexing and retrieval       General LLM application chains
RAG              Best-in-class, purpose-built      Good support, more manual
Index Types      Vector, Summary, KG, Tree, SQL    Mainly vector stores
Agents           Data-aware agents with tools      Extensive agent framework
Learning Curve   Moderate (focused API)            Steeper (many abstractions)
Best For         RAG, data-heavy applications      General LLM orchestration

The LlamaIndex Ecosystem

  1. LlamaIndex Core

    The main framework with document loaders, indexes, query engines, and agents. The foundation for all LlamaIndex applications.

  2. LlamaHub

    A registry of data loaders (readers) for 150+ data sources: Google Docs, Slack, Notion, databases, APIs, and more.

  3. LlamaParse

    A managed document parsing service for complex files such as PDFs with tables, charts, and multi-column layouts, where basic text extraction tends to lose structure.

  4. LlamaCloud

    A managed RAG-as-a-service platform. You upload documents, and LlamaCloud handles indexing, retrieval, and scaling.

When to choose LlamaIndex: Pick LlamaIndex when your application is primarily about connecting LLMs to data. It excels at RAG, document Q&A, knowledge bases, and structured data querying. Use LangChain when you need general orchestration beyond data retrieval.

What's Next?

In the next lesson, we will install LlamaIndex, configure an LLM provider, load our first document, and run a basic query.