Intermediate
Building AI Assistants
A practical guide to building AI assistants using LLM APIs. From choosing the model and designing system prompts to implementing function calling and managing conversations.
Choosing the LLM
| Requirement | Recommended | Why |
|---|---|---|
| Best quality, safety-critical | Claude Sonnet 4 | Strong instruction following, safety, 200K context |
| Multimodal (images + audio) | GPT-4o | Native multimodal, fast, good quality |
| Long documents / large context | Gemini 2.5 Pro | Up to 2M token context window |
| High volume, low cost | GPT-4o mini / Gemini Flash | Cheapest quality models |
| Self-hosted / privacy | Llama 3.3 70B | Best open-weight model for the size |
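In code, the table above often reduces to a small routing map. A minimal sketch, assuming illustrative model IDs (the providers' current names may differ):

```python
# Map a requirement category to a model ID; IDs here are illustrative.
MODEL_BY_REQUIREMENT = {
    "quality": "claude-sonnet-4-20250514",
    "multimodal": "gpt-4o",
    "long_context": "gemini-2.5-pro",
    "low_cost": "gpt-4o-mini",
    "self_hosted": "llama-3.3-70b",
}

def pick_model(requirement: str, default: str = "gpt-4o-mini") -> str:
    """Return the model ID for a requirement, falling back to a cheap default."""
    return MODEL_BY_REQUIREMENT.get(requirement, default)
```

Keeping this mapping in one place makes it easy to swap models later without touching the rest of the assistant.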
System Prompt Design
The system prompt is the most important component of your assistant. It defines the assistant's identity, capabilities, boundaries, and behavior.
System Prompt - Customer Support Assistant
You are a customer support assistant for TechStore,
an online electronics retailer.
## Your Role
- Help customers with orders, returns, products, and account questions
- Be friendly, professional, and efficient
- Always prioritize the customer's satisfaction
## Guidelines
- Use the customer's name when available
- For order issues, always look up the order first
- Never share other customers' information
- If you cannot resolve an issue, offer to escalate to a human agent
- Do not make promises about refunds or replacements without checking the policy tool first
## Boundaries
- Only discuss TechStore products and services
- Do not provide advice on competitors' products
- Do not engage in personal conversations
- If asked about topics outside your scope, politely redirect to the relevant resource
## Tone
- Warm but professional
- Clear and concise
- Empathetic when customers are frustrated
Building with Anthropic Messages API
Python - Anthropic Assistant
```python
import anthropic

client = anthropic.Anthropic()

class SupportAssistant:
    def __init__(self):
        self.model = "claude-sonnet-4-20250514"
        self.system = """You are a customer support assistant for TechStore..."""
        self.messages = []
        self.tools = [
            {
                "name": "lookup_order",
                "description": "Look up order details by order ID",
                "input_schema": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
            {
                "name": "check_return_policy",
                "description": "Check the return policy for a product category",
                "input_schema": {
                    "type": "object",
                    "properties": {"category": {"type": "string"}},
                    "required": ["category"],
                },
            },
        ]

    def _run_tool(self, block):
        # Dispatch to your real order and policy backends here;
        # these stubs just keep the example self-contained.
        if block.name == "lookup_order":
            return f"Order {block.input['order_id']}: shipped, arriving Thursday"
        if block.name == "check_return_policy":
            return f"{block.input['category']}: 30-day return window"
        return f"Unknown tool: {block.name}"

    def chat(self, user_message):
        self.messages.append({"role": "user", "content": user_message})

        # Agent loop: keep calling the model until it stops requesting tools
        while True:
            response = client.messages.create(
                model=self.model,
                max_tokens=1024,
                system=self.system,
                tools=self.tools,
                messages=self.messages,
            )
            self.messages.append({"role": "assistant", "content": response.content})

            if response.stop_reason != "tool_use":
                return response.content[0].text

            # Execute the requested tools and feed the results back to the model
            results = []
            for block in response.content:
                if block.type == "tool_use":
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": self._run_tool(block),
                    })
            self.messages.append({"role": "user", "content": results})

# Usage
assistant = SupportAssistant()
reply = assistant.chat("Where is my order #12345?")
print(reply)
```
Threading and Conversation Management
- Session management: Create unique session IDs for each conversation. Store messages per session.
- Context window management: Monitor token count. When approaching limits, summarize older messages or use a sliding window.
- Conversation state: Track metadata like customer ID, issue type, resolution status alongside the message history.
- Persistence: Store conversations in a database (PostgreSQL, Redis) for continuity and analytics.
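A sketch of the sliding-window approach, assuming a rough 4-characters-per-token estimate; production code would use the provider's tokenizer or token-counting endpoint instead:

```python
def estimate_tokens(messages):
    """Very rough token estimate: ~4 characters per token."""
    return sum(len(str(m.get("content", ""))) for m in messages) // 4

def trim_history(messages, max_tokens=8000, keep_recent=6):
    """Drop the oldest messages until under budget, keeping recent turns."""
    while len(messages) > keep_recent and estimate_tokens(messages) > max_tokens:
        messages.pop(0)
    return messages
```

In practice you would trim on conversation-turn boundaries so that paired tool_use and tool_result messages are never separated, and summarize the dropped turns rather than discard them outright.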
File Handling
Modern assistants can process uploaded files:
- Documents: PDFs, Word docs, spreadsheets — extract text and pass to the LLM
- Images: Use multimodal models (GPT-4o, Claude, Gemini) to analyze images directly
- Code files: Parse and analyze source code for coding assistants
- Implementation: Extract content, chunk if needed, pass as context or use RAG
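For images, the usual pattern is to base64-encode the file and send it as an image content block alongside a text question. A minimal sketch of building such a message for the Anthropic Messages API (the helper name is ours):

```python
import base64

def image_message(image_bytes: bytes, question: str, media_type: str = "image/jpeg"):
    """Build a Messages-API user turn containing an image plus a question."""
    data = base64.standard_b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": media_type, "data": data},
            },
            {"type": "text", "text": question},
        ],
    }
```

The returned dict goes straight into the `messages` list of a `client.messages.create(...)` call with a multimodal model; GPT-4o and Gemini use the same idea with slightly different payload shapes.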
Start simple: Build the simplest version first — a system prompt, message handling, and one or two tools. Add complexity (RAG, file handling, multi-channel) only after the basic assistant works well. Premature complexity is the enemy of good assistant design.
Lilly Tech Systems