Memory Systems

Memory is what transforms a generic chatbot into a personal assistant. By remembering your preferences, past conversations, and important context, your assistant becomes more useful and personalized over time.

Types of Memory

| Memory Type  | Duration       | Content                               | Storage                    |
|--------------|----------------|---------------------------------------|----------------------------|
| Conversation | Single session | Current conversation messages         | In-memory array            |
| Short-term   | Hours to days  | Recent interactions, current tasks    | Cache or database          |
| Long-term    | Permanent      | User profile, preferences, facts      | Database, vector store     |
| Episodic     | Permanent      | Past conversations, events, decisions | Vector store with metadata |
| Semantic     | Permanent      | Knowledge, facts, domain information  | RAG system, knowledge base |

User Profile Memory

The most impactful memory for a personal assistant is a user profile that grows over time:

  • Preferences: Communication style (brief vs detailed), work hours, timezone, dietary restrictions, travel preferences
  • Relationships: Key contacts with context (manager's name, spouse, frequent collaborators)
  • Ongoing projects: What the user is currently working on, deadlines, collaborators
  • Past decisions: Choices the user has made that inform future recommendations
  • Corrections: When the user corrects the assistant, store the correction to avoid repeating mistakes
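A profile covering these categories can start as a plain nested dictionary. The field names and values below are illustrative, not a required schema:

```python
# Illustrative user profile; every field name and value here is an example.
user_profile = {
    "preferences": {
        "communication_style": "brief",
        "timezone": "Europe/Berlin",
        "dietary": ["vegetarian"],
    },
    "relationships": {
        "manager": "Dana",
        "frequent_collaborators": ["Sam", "Priya"],
    },
    "projects": [
        {"name": "Q3 launch", "deadline": "2025-09-30"},
    ],
    "corrections": [
        "Prefers 24-hour time format, not AM/PM.",
    ],
}

def profile_summary(profile):
    """Flatten the profile into short lines suitable for a system prompt."""
    return "\n".join(f"{section}: {content}" for section, content in profile.items())
```

Keeping the structure flat and human-readable makes it easy to inject into prompts and to let users inspect or edit it later.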
💡 Automatic memory extraction: The best approach is to have the LLM automatically identify information worth remembering during conversations. After each conversation, run a secondary pass that extracts facts, preferences, and important context into structured memory.
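A minimal sketch of that secondary pass, assuming your model is exposed as a plain text-in, text-out callable (`llm_complete` below is a placeholder, not a real API):

```python
import json

# Hypothetical extraction prompt; tune the wording for your model.
EXTRACTION_PROMPT = """Review the conversation below and extract facts,
preferences, and corrections worth remembering about the user.
Respond with a JSON array of objects with "type" and "text" keys.

Conversation:
{conversation}"""

def extract_memories(conversation, llm_complete):
    """Run a post-conversation extraction pass.

    llm_complete is any function mapping a prompt string to a completion string.
    Returns a list of {"type": ..., "text": ...} dicts, or [] on malformed output.
    """
    response = llm_complete(EXTRACTION_PROMPT.format(conversation=conversation))
    try:
        memories = json.loads(response)
    except json.JSONDecodeError:
        return []  # model returned malformed JSON; skip this pass
    return [m for m in memories if isinstance(m, dict) and {"type", "text"} <= m.keys()]
```

Validating the model's JSON before storing it keeps one bad completion from polluting long-term memory.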

Implementing Conversation Memory

For conversations that exceed the context window, you need a strategy:

  • Sliding window: Keep the most recent N messages in context. Simple but loses early context.
  • Summarization: Periodically summarize older messages into a condensed form. Balances context retention with token usage.
  • Retrieval-augmented: Store all messages in a vector database. Retrieve the most relevant past messages for each new query.
  • Hybrid: Keep recent messages in full, summarize medium-term history, and use retrieval for long-term history.
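The sliding-window and summarization strategies combine naturally: keep the last N messages verbatim and fold anything older into a running summary. A minimal sketch, where `summarize` stands in for an LLM summarization call:

```python
class ConversationMemory:
    """Keep the last `window` messages verbatim; fold evicted ones into a summary."""

    def __init__(self, window=10, summarize=None):
        self.window = window
        # In practice `summarize` would be an LLM call; the default just concatenates.
        self.summarize = summarize or (lambda old, new: (old + " " + new).strip())
        self.summary = ""
        self.messages = []

    def add(self, message):
        self.messages.append(message)
        if len(self.messages) > self.window:
            evicted = self.messages.pop(0)
            self.summary = self.summarize(self.summary, evicted)

    def context(self):
        """Return the summary (if any) followed by the recent messages."""
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        return parts + self.messages
```

Token usage stays bounded by the window size plus the summary, while early context survives in condensed form.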

Vector Store for Long-Term Memory

Vector databases enable semantic search across all past interactions:

Python - Memory Storage and Retrieval
# Assumes an embedding model (`embed_model`) and a vector database client
# (`vector_db`) are already initialized; the client API shown is illustrative.
import uuid
from datetime import datetime

# Store a memory
def save_memory(text, metadata):
    embedding = embed_model.encode(text)
    vector_db.upsert(
        id=str(uuid.uuid4()),
        vector=embedding,
        metadata={
            "text": text,
            "type": metadata["type"],  # "preference", "fact", or "event"
            "timestamp": datetime.now().isoformat(),
            **metadata,
        },
    )

# Retrieve the memories most relevant to a query
def recall(query, top_k=5):
    embedding = embed_model.encode(query)
    results = vector_db.query(vector=embedding, top_k=top_k)
    return [r.metadata["text"] for r in results]

Memory Management

  • Memory importance scoring: Not all memories are equally valuable. Score memories by frequency of access, recency, and relevance.
  • Conflict resolution: When new information contradicts old memories, update rather than duplicate. "User prefers tea" should replace "User prefers coffee."
  • Privacy controls: Give users the ability to view, edit, and delete specific memories. Provide a "forget this" command.
  • Memory decay: Reduce the weight of old, unaccessed memories over time to keep the most relevant information prominent.
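Importance scoring and decay can be combined into one number per memory. The sketch below uses exponential recency decay and log-scaled access frequency; the weights and half-life are arbitrary starting points, not recommendations:

```python
import math
from datetime import datetime

def memory_score(access_count, last_accessed, relevance, now=None, half_life_days=30):
    """Score a memory by access frequency, recency, and query relevance.

    `relevance` is assumed to be a 0-1 similarity from vector search.
    The 0.5/0.3/0.2 weights are illustrative tuning knobs.
    """
    now = now or datetime.now()
    age_days = (now - last_accessed).total_seconds() / 86400
    recency = 0.5 ** (age_days / half_life_days)  # halves every `half_life_days`
    frequency = math.log1p(access_count)          # diminishing returns on repeats
    return 0.5 * relevance + 0.3 * recency + 0.2 * frequency
```

At retrieval time, re-rank the vector search results by this score so that stale, rarely used memories gradually fall out of prompts without being deleted.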
💡 Start simple: Begin with a basic user profile stored as a JSON file that gets appended to the system prompt. Add vector search and sophisticated memory management only when you have enough conversation history to make it valuable.
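That starting point is only a few lines. A sketch, where the file path and prompt wording are examples:

```python
import json
from pathlib import Path

BASE_PROMPT = "You are a helpful personal assistant."

def build_system_prompt(profile_path="profile.json"):
    """Load the user profile from disk (if present) and append it to the system prompt."""
    path = Path(profile_path)
    if not path.exists():
        return BASE_PROMPT  # no profile yet; fall back to the bare prompt
    profile = json.loads(path.read_text())
    return BASE_PROMPT + "\n\nKnown about the user:\n" + json.dumps(profile, indent=2)
```

Because the profile is a plain JSON file, users can open it, see exactly what the assistant remembers, and edit or delete entries by hand.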