Memory Systems

Memory is what transforms a generic chatbot into a personal assistant. By remembering your preferences, past conversations, and important context, your assistant becomes more useful and personalized over time.

Types of Memory

| Memory Type  | Duration       | Content                               | Storage                    |
|--------------|----------------|---------------------------------------|----------------------------|
| Conversation | Single session | Current conversation messages         | In-memory array            |
| Short-term   | Hours to days  | Recent interactions, current tasks    | Cache or database          |
| Long-term    | Permanent      | User profile, preferences, facts      | Database, vector store     |
| Episodic     | Permanent      | Past conversations, events, decisions | Vector store with metadata |
| Semantic     | Permanent      | Knowledge, facts, domain information  | RAG system, knowledge base |

User Profile Memory

The most impactful memory for a personal assistant is a user profile that grows over time:

  • Preferences: Communication style (brief vs detailed), work hours, timezone, dietary restrictions, travel preferences
  • Relationships: Key contacts with context (manager's name, spouse, frequent collaborators)
  • Ongoing projects: What the user is currently working on, deadlines, collaborators
  • Past decisions: Choices the user has made that inform future recommendations
  • Corrections: When the user corrects the assistant, store the correction to avoid repeating mistakes
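A profile covering these categories can start as a plain nested dictionary. The field names and values below are illustrative, not a required schema:

```python
# Illustrative user profile; every field name and value here is an example.
user_profile = {
    "preferences": {
        "communication_style": "brief",
        "timezone": "Europe/Berlin",
        "dietary": ["vegetarian"],
    },
    "relationships": {
        "manager": "Dana",
        "frequent_collaborators": ["Sam", "Priya"],
    },
    "projects": [
        {"name": "Q3 launch", "deadline": "2025-09-30"},
    ],
    "corrections": [
        "Prefers 24-hour time format, not AM/PM.",
    ],
}

def profile_summary(profile):
    """Flatten the profile into short lines suitable for a system prompt."""
    return "\n".join(f"{section}: {content}" for section, content in profile.items())
```

Keeping the structure flat and human-readable makes it easy to inject into prompts and to let users inspect or edit it later.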
💡 Automatic memory extraction: The best approach is to have the LLM automatically identify information worth remembering during conversations. After each conversation, run a secondary pass that extracts facts, preferences, and important context into structured memory.
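A minimal sketch of that secondary pass, assuming your model is exposed as a plain text-in, text-out callable (`llm_complete` below is a placeholder, not a real API):

```python
import json

# Hypothetical extraction prompt; tune the wording for your model.
EXTRACTION_PROMPT = """Review the conversation below and extract facts,
preferences, and corrections worth remembering about the user.
Respond with a JSON array of objects with "type" and "text" keys.

Conversation:
{conversation}"""

def extract_memories(conversation, llm_complete):
    """Run a post-conversation extraction pass.

    llm_complete is any function mapping a prompt string to a completion string.
    Returns a list of {"type": ..., "text": ...} dicts, or [] on malformed output.
    """
    response = llm_complete(EXTRACTION_PROMPT.format(conversation=conversation))
    try:
        memories = json.loads(response)
    except json.JSONDecodeError:
        return []  # model returned malformed JSON; skip this pass
    return [m for m in memories if isinstance(m, dict) and {"type", "text"} <= m.keys()]
```

Validating the model's JSON before storing it keeps one bad completion from polluting long-term memory.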

Implementing Conversation Memory

For conversations that exceed the context window, you need a strategy:

  • Sliding window: Keep the most recent N messages in context. Simple but loses early context.
  • Summarization: Periodically summarize older messages into a condensed form. Balances context retention with token usage.
  • Retrieval-augmented: Store all messages in a vector database. Retrieve the most relevant past messages for each new query.
  • Hybrid: Keep recent messages in full, summarize medium-term history, and use retrieval for long-term history.
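The sliding-window and summarization strategies combine naturally: keep the last N messages verbatim and fold anything older into a running summary. A minimal sketch, where `summarize` stands in for an LLM summarization call:

```python
class ConversationMemory:
    """Keep the last `window` messages verbatim; fold evicted ones into a summary."""

    def __init__(self, window=10, summarize=None):
        self.window = window
        # In practice `summarize` would be an LLM call; the default just concatenates.
        self.summarize = summarize or (lambda old, new: (old + " " + new).strip())
        self.summary = ""
        self.messages = []

    def add(self, message):
        self.messages.append(message)
        if len(self.messages) > self.window:
            evicted = self.messages.pop(0)
            self.summary = self.summarize(self.summary, evicted)

    def context(self):
        """Return the summary (if any) followed by the recent messages."""
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        return parts + self.messages
```

Token usage stays bounded by the window size plus the summary, while early context survives in condensed form.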

Vector Store for Long-Term Memory

Vector databases enable semantic search across all past interactions:

Python - Memory Storage and Retrieval
# Assumes an embedding model (`embed_model`) and a vector database client
# (`vector_db`) are already initialized; the client API shown is illustrative.
import uuid
from datetime import datetime

# Store a memory
def save_memory(text, metadata):
    embedding = embed_model.encode(text)
    vector_db.upsert(
        id=str(uuid.uuid4()),
        vector=embedding,
        metadata={
            "text": text,
            "type": metadata["type"],  # "preference", "fact", or "event"
            "timestamp": datetime.now().isoformat(),
            **metadata,
        },
    )

# Retrieve the memories most relevant to a query
def recall(query, top_k=5):
    embedding = embed_model.encode(query)
    results = vector_db.query(vector=embedding, top_k=top_k)
    return [r.metadata["text"] for r in results]

Memory Management

  • Memory importance scoring: Not all memories are equally valuable. Score memories by frequency of access, recency, and relevance.
  • Conflict resolution: When new information contradicts old memories, update rather than duplicate. "User prefers tea" should replace "User prefers coffee."
  • Privacy controls: Give users the ability to view, edit, and delete specific memories. Provide a "forget this" command.
  • Memory decay: Reduce the weight of old, unaccessed memories over time to keep the most relevant information prominent.
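Importance scoring and decay can be combined into one number per memory. The sketch below uses exponential recency decay and log-scaled access frequency; the weights and half-life are arbitrary starting points, not recommendations:

```python
import math
from datetime import datetime

def memory_score(access_count, last_accessed, relevance, now=None, half_life_days=30):
    """Score a memory by access frequency, recency, and query relevance.

    `relevance` is assumed to be a 0-1 similarity from vector search.
    The 0.5/0.3/0.2 weights are illustrative tuning knobs.
    """
    now = now or datetime.now()
    age_days = (now - last_accessed).total_seconds() / 86400
    recency = 0.5 ** (age_days / half_life_days)  # halves every `half_life_days`
    frequency = math.log1p(access_count)          # diminishing returns on repeats
    return 0.5 * relevance + 0.3 * recency + 0.2 * frequency
```

At retrieval time, re-rank the vector search results by this score so that stale, rarely used memories gradually fall out of prompts without being deleted.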
💡 Start simple: Begin with a basic user profile stored as a JSON file that gets appended to the system prompt. Add vector search and sophisticated memory management only when you have enough conversation history to make it valuable.
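That starting point is only a few lines. A sketch, where the file path and prompt wording are examples:

```python
import json
from pathlib import Path

BASE_PROMPT = "You are a helpful personal assistant."

def build_system_prompt(profile_path="profile.json"):
    """Load the user profile from disk (if present) and append it to the system prompt."""
    path = Path(profile_path)
    if not path.exists():
        return BASE_PROMPT  # no profile yet; fall back to the bare prompt
    profile = json.loads(path.read_text())
    return BASE_PROMPT + "\n\nKnown about the user:\n" + json.dumps(profile, indent=2)
```

Because the profile is a plain JSON file, users can open it, see exactly what the assistant remembers, and edit or delete entries by hand.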