How to Build an AI Chatbot with Claude or GPT-4o in 2026
Building a chatbot that actually works in production is very different from the demo you built in 30 minutes. Here's what the production version requires.
Choosing a Model: Claude vs GPT-4o
Claude (Anthropic):
- Stronger on instruction-following and refusing harmful requests
- Better for business contexts where consistency matters
- High output quality in Russian
- Context window: 200K tokens (Claude 3.5+)
- Better for long document analysis
GPT-4o (OpenAI):
- Faster on average
- Slightly more reliable tool use in complex chains
- Vision input support
- Large ecosystem of documentation
For most business chatbots: both work. Use Claude for CIS markets (better Russian), GPT-4o for international.
Prompt Engineering That Actually Works
System prompt structure
You are [ROLE] for [COMPANY_NAME].
Your job is to [PRIMARY_TASK].
Rules:
- [CONSTRAINT 1]
- [CONSTRAINT 2]
- If asked about [X], say [Y]
- Never [PROHIBITED_ACTION]
Knowledge:
[COMPANY_SPECIFIC_FACTS]
Be specific. "You are a helpful assistant" is useless. "You are a sales consultant for Aunimeda Software. You help potential clients understand our services and pricing. You do not discuss competitor pricing." That works.
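As a sketch, the template above can be filled programmatically so every deployment gets a consistent prompt. The helper and its field names here are illustrative, not part of any SDK:

```javascript
// Illustrative helper: fills the system-prompt template from config.
// buildSystemPrompt and its config fields are hypothetical names.
function buildSystemPrompt({ role, company, task, rules, knowledge }) {
  return [
    `You are ${role} for ${company}.`,
    `Your job is to ${task}.`,
    "Rules:",
    ...rules.map((r) => `- ${r}`),
    "Knowledge:",
    knowledge,
  ].join("\n");
}

const prompt = buildSystemPrompt({
  role: "a sales consultant",
  company: "Aunimeda Software",
  task: "help potential clients understand our services and pricing",
  rules: [
    "Do not discuss competitor pricing.",
    "If you don't know an answer, say so and offer to connect a human.",
  ],
  knowledge: "We build custom chatbots for small and mid-size businesses.",
});
```

Keeping the prompt in config rather than hardcoded strings also makes it easy to A/B test wording changes.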
Temperature
- 0.0–0.3 for factual Q&A, support bots
- 0.5–0.7 for conversational, natural-feeling responses
- 0.8+ for creative content only
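For example, a support bot's request should pin the temperature low. The request shape below follows the common chat-completion pattern; the model ID is a placeholder and exact field names vary between providers, so check your SDK's docs:

```javascript
// Illustrative request body for a factual support bot.
// "your-model-id" is a placeholder; field names vary by provider.
const supportRequest = {
  model: "your-model-id",
  temperature: 0.2, // low: factual Q&A should be deterministic
  max_tokens: 1024,
  system: "You are a support assistant for Aunimeda Software.",
  messages: [{ role: "user", content: "What is your refund policy?" }],
};
```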
Context Management
The biggest production problem: conversations get expensive and slow as they grow.
Strategy 1: Sliding window. Keep last N messages. Simple, loses older context.
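A minimal sketch of the sliding window. The `{ role, content }` message shape mirrors common chat APIs, but the helper itself is illustrative (note that some providers, like Anthropic, take the system prompt as a separate parameter rather than a message):

```javascript
// Keep the system message plus the last N conversation messages.
// Illustrative helper, not from any SDK.
function slidingWindow(messages, maxMessages) {
  const [system, ...rest] = messages;
  return [system, ...rest.slice(-maxMessages)];
}

const history = [
  { role: "system", content: "You are a support bot." },
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello!" },
  { role: "user", content: "Where is my order?" },
];
// Keeps the system message plus the 2 most recent messages.
const trimmed = slidingWindow(history, 2);
```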
Strategy 2: Summary compression. When conversation exceeds threshold, summarize older messages into compact form, keep summary + recent messages.
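Summary compression can be sketched like this; `summarize` stands in for a real LLM summarization call, and all names here are illustrative:

```javascript
// Illustrative summary compression: once the history exceeds a
// threshold, fold older messages into one compact summary message.
// `summarize` is a stand-in for a real LLM call.
function compressHistory(messages, threshold, keepRecent, summarize) {
  if (messages.length <= threshold) return messages;
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(-keepRecent);
  return [
    { role: "system", content: `Summary of earlier conversation: ${summarize(older)}` },
    ...recent,
  ];
}
```

In practice the summarization is itself an LLM call ("Summarize this conversation in under 200 tokens"), best run in the background so it never blocks the user's next message.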
Strategy 3: RAG (Retrieval-Augmented Generation). Store conversation history in vector database (Pinecone, Qdrant), retrieve semantically relevant past context. Best for long-running relationships with customers.
Tool Use: How Agents Take Actions
Modern LLMs can call functions/tools you define. Pattern:
const tools = [{
  name: "check_order_status",
  description: "Get the current status of a customer order",
  parameters: {
    type: "object",
    properties: {
      order_id: { type: "string", description: "The order ID" }
    },
    required: ["order_id"]
  }
}];
// LLM decides when to call this tool based on conversation
// Your code executes the actual function and returns result to LLM
The LLM doesn't execute code: it signals intent, your backend executes the function, and you return the result. This is how agents query databases, send emails, and update CRMs.
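That round trip can be sketched as a dispatch step on your backend. The handler map and the tool-call shape here are simplified assumptions; consult your SDK's docs for the exact format it emits:

```javascript
// Map of tool names to backend implementations (illustrative).
const toolHandlers = {
  check_order_status: ({ order_id }) => {
    // In production this would query your orders database.
    return { order_id, status: "shipped" };
  },
};

// When the model responds with a tool call, execute the matching
// handler; the return value is sent back to the model as the result.
function executeToolCall(call) {
  const handler = toolHandlers[call.name];
  if (!handler) throw new Error(`Unknown tool: ${call.name}`);
  return handler(call.input);
}

const result = executeToolCall({
  name: "check_order_status",
  input: { order_id: "A123" },
});
```

Keeping handlers in a plain map makes it easy to add tools, and the unknown-tool check guards against the model hallucinating a tool name you never defined.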
Production Checklist
- Rate limiting (per-user message limits)
- Content filtering for harmful outputs
- Fallback when API is unavailable
- Logging all conversations (legal/audit requirements)
- User feedback mechanism ("Was this helpful?")
- Cost monitoring - LLM costs scale with usage
- PII detection before logging
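As one example from the checklist, per-user rate limiting can be sketched with a fixed-window counter. In production you'd usually back this with Redis or enforce it at the API gateway; this in-memory version is illustrative:

```javascript
// Illustrative fixed-window rate limiter keyed by user ID.
// In production, back this with Redis so limits survive restarts
// and apply across multiple server instances.
function createRateLimiter(maxMessages, windowMs) {
  const windows = new Map(); // userId -> { start, count }
  return function allow(userId, now = Date.now()) {
    const w = windows.get(userId);
    if (!w || now - w.start >= windowMs) {
      windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (w.count < maxMessages) {
      w.count += 1;
      return true;
    }
    return false; // over the limit: reject or queue the message
  };
}
```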
Cost at Scale
Claude Sonnet 4.6: ~$3 per 1M input tokens, ~$15 per 1M output tokens.
Average conversation: ~2K tokens total → $0.006–0.03 per conversation, depending on the input/output split.
At 1,000 conversations/day: $6–30/day, $180–900/month.
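The arithmetic above can be captured in a small helper. The default prices are the ones quoted in this article and will change over time, so treat them as assumptions:

```javascript
// Estimate monthly LLM cost from token usage (illustrative).
// Default prices in USD per 1M tokens, from the figures above.
function estimateMonthlyCost({
  inputTokensPerConv,
  outputTokensPerConv,
  conversationsPerDay,
  inputPricePerM = 3,
  outputPricePerM = 15,
}) {
  const perConv =
    (inputTokensPerConv / 1e6) * inputPricePerM +
    (outputTokensPerConv / 1e6) * outputPricePerM;
  return perConv * conversationsPerDay * 30;
}

// e.g. 1,500 input + 500 output tokens per conversation, 1,000/day
const monthly = estimateMonthlyCost({
  inputTokensPerConv: 1500,
  outputTokensPerConv: 500,
  conversationsPerDay: 1000,
});
```

Plugging in your own traffic numbers before launch beats discovering the bill afterwards.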
For most business chatbots: negligible. For high-volume consumer apps: design for efficiency.