Early NLP: Building Basic Chatbots Before the LLM Era

#nlp#chatbot#natural language processing#machine learning#naive bayes#2013#ai


In 2013, a retail client asked us to build an automated customer support system. They had 3 support agents handling 200-300 repetitive questions per day: "Where is my order?", "How do I return this?", "What are your store hours?"

They didn't need AI. They needed automation. We built it. In doing so, we went deeper into Natural Language Processing than we'd expected.

The Rule-Based Layer

The simplest approach first: pattern matching. If the message matches a pattern, return a canned response.

import re

class PatternMatcher:
    """
    Rule-based pattern matching for FAQ responses.
    Handles the common questions that have simple, fixed answers.
    """
    
    def __init__(self):
        # Rules: (compiled_regex, response, confidence)
        self.rules = [
            (
                re.compile(r'\b(where|status|track|tracking)\b.*\border\b', re.IGNORECASE),
                "To track your order, please visit our order tracking page at /orders or provide your order number.",
                0.95
            ),
            (
                re.compile(r'\b(return|refund|exchange|send back)\b', re.IGNORECASE),
                "Our return policy allows returns within 30 days of purchase. Visit /returns to start a return.",
                0.90
            ),
            (
                re.compile(r'\b(hours?|open|close|closing|opening)\b', re.IGNORECASE),
                "We are open Monday-Friday 9am-6pm and Saturday 10am-4pm. Closed Sundays.",
                0.95
            ),
            (
                re.compile(r'\b(shipping|delivery|ship|deliver|how long)\b', re.IGNORECASE),
                "Standard shipping takes 3-5 business days. Express shipping (1-2 days) is available at checkout.",
                0.88
            ),
        ]
    
    def match(self, text):
        matches = []
        for pattern, response, confidence in self.rules:
            if pattern.search(text):
                matches.append({
                    'response': response,
                    'confidence': confidence
                })
        
        if not matches:
            return None
        
        # Return highest-confidence match
        return max(matches, key=lambda x: x['confidence'])

This handled ~60% of incoming messages with high accuracy. The remaining 40% required understanding intent from more varied phrasing.
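To see how overlapping rules resolve, here is a minimal, self-contained sketch. The two patterns and their confidence values mirror the rules above; the `best_match` helper, the short labels standing in for the canned responses, and the sample messages are illustrative assumptions, not part of the production system.

```python
import re

# Two of the rules from above, reduced to (pattern, label, confidence).
# The labels are hypothetical shorthand for the canned responses.
RULES = [
    (re.compile(r'\b(return|refund|exchange|send back)\b', re.IGNORECASE), 'returns', 0.90),
    (re.compile(r'\b(hours?|open|close|closing|opening)\b', re.IGNORECASE), 'hours', 0.95),
]

def best_match(text):
    """Return (label, confidence) of the highest-confidence matching rule, or None."""
    hits = [(label, conf) for pattern, label, conf in RULES if pattern.search(text)]
    return max(hits, key=lambda hit: hit[1]) if hits else None

print(best_match("Are you open on Saturday?"))
print(best_match("Can I return this before you close today?"))
print(best_match("My discount code is not working"))
```

The second message touches two topics, and the fixed confidence values decide which rule wins regardless of what the customer actually wanted. That is one reason static rules alone stop scaling once phrasing gets more varied.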

Naive Bayes Intent Classification

For messages that didn't match fixed patterns, we trained a Naive Bayes classifier on labeled examples. The math: given the words in a message, what's the probability it belongs to each intent class?
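Written out, the score the classifier below computes for an intent class $c$, given message tokens $t_1, \dots, t_n$, is the log prior plus the sum of Laplace-smoothed log likelihoods. The notation is ours, chosen to match the code: $N_c$ is the number of training messages labeled $c$, $N$ the total number of training messages, $\mathrm{count}(t, c)$ how often token $t$ appears in class $c$, $W_c$ the total token count for class $c$, and $|V|$ the vocabulary size.

```latex
\[
\mathrm{score}(c) = \log \frac{N_c}{N}
  + \sum_{i=1}^{n} \log \frac{\mathrm{count}(t_i, c) + 1}{W_c + |V|},
\qquad
\hat{c} = \arg\max_{c} \, \mathrm{score}(c)
\]
```

Working in log space avoids numerical underflow from multiplying many small probabilities, and the +1 keeps a single unseen word from zeroing out a class.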

import math
import re
from collections import defaultdict

class NaiveBayesClassifier:
    """
    Text classification using Naive Bayes.
    Trained on (message, intent) pairs.
    """
    
    def __init__(self):
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.class_counts = defaultdict(int)
        self.vocabulary = set()
        self.total_docs = 0
    
    def tokenize(self, text):
        """Simple tokenization: lowercase, split on non-alpha, remove stops."""
        stop_words = {'i', 'me', 'my', 'the', 'a', 'an', 'is', 'are', 'was',
                     'be', 'been', 'have', 'has', 'do', 'does', 'to', 'of', 
                     'and', 'or', 'but', 'in', 'on', 'at', 'for', 'with'}
        words = re.findall(r'[a-z]+', text.lower())
        return [w for w in words if w not in stop_words and len(w) > 1]
    
    def train(self, text, label):
        """Add a training example."""
        tokens = self.tokenize(text)
        self.class_counts[label] += 1
        self.total_docs += 1
        for token in tokens:
            self.word_counts[label][token] += 1
            self.vocabulary.add(token)
    
    def predict(self, text):
        """Return the most likely class and probabilities for all classes."""
        tokens = self.tokenize(text)
        V = len(self.vocabulary)
        
        scores = {}
        for label, count in self.class_counts.items():
            # Prior probability: log(count/total)
            score = math.log(count / self.total_docs)
            
            # Get total words for this class
            total_words = sum(self.word_counts[label].values())
            
            for token in tokens:
                # Laplace smoothing: add 1 to avoid log(0)
                word_count = self.word_counts[label].get(token, 0) + 1
                score += math.log(word_count / (total_words + V))
            
            scores[label] = score
        
        # Return class with highest score
        best_label = max(scores, key=scores.get)
        return best_label, scores


# Training the classifier
classifier = NaiveBayesClassifier()

training_data = [
    ("I never received my package", "missing_package"),
    ("my order hasn't arrived", "missing_package"),
    ("package missing", "missing_package"),
    ("where is my delivery", "order_status"),
    ("track my shipment", "order_status"),
    ("when will my order come", "order_status"),
    ("wrong item received", "wrong_item"),
    ("you sent me the wrong product", "wrong_item"),
    ("this is not what I ordered", "wrong_item"),
    ("item is broken", "damaged_item"),
    ("product arrived damaged", "damaged_item"),
    ("it arrived in bad condition", "damaged_item"),
    ("how do I cancel my order", "cancel_order"),
    ("I want to cancel", "cancel_order"),
]

for text, intent in training_data:
    classifier.train(text, intent)

With 20-30 examples per intent class (the list above is an abbreviated sample), accuracy on held-out test data was ~72% - not reliable enough to use alone, but good enough as a fallback layer behind the pattern matcher.
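An accuracy figure like this comes from scoring predictions against labeled messages that were held out of training. A minimal sketch of that measurement follows. The `accuracy` helper, the keyword-based stand-in predictor, and the three test messages are all hypothetical, used only so the snippet runs on its own.

```python
def accuracy(predict, labeled_examples):
    """Fraction of (text, expected_label) pairs where predict(text) matches."""
    if not labeled_examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, expected in labeled_examples if predict(text) == expected)
    return correct / len(labeled_examples)

def keyword_predict(text):
    """Stand-in predictor (assumption): any callable mapping text to a label works."""
    return "cancel_order" if "cancel" in text.lower() else "order_status"

# Hypothetical held-out messages, never used for training
held_out = [
    ("please cancel order 12345", "cancel_order"),
    ("where is my parcel", "order_status"),
    ("the box came crushed", "damaged_item"),
]

print(f"accuracy: {accuracy(keyword_predict, held_out):.2f}")
```

With the classifier above, `lambda text: classifier.predict(text)[0]` would serve as the `predict` argument, since `predict` returns the best label together with the per-class scores.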

The Response Pipeline

Combining rule-based matching and ML classification:

class ChatbotEngine:
    
    def __init__(self):
        self.pattern_matcher = PatternMatcher()
        self.classifier = NaiveBayesClassifier()
        self._load_classifier_training_data()
        
        # Intent to response templates
        self.intent_responses = {
            'missing_package': (
                "I'm sorry your package hasn't arrived. "
                "Please provide your order number and I'll look into it right away."
            ),
            'wrong_item': (
                "I apologize for sending the wrong item. "
                "Please reply with your order number and a photo of what you received, "
                "and we'll ship the correct item immediately."
            ),
            'damaged_item': (
                "I'm sorry your item arrived damaged. "
                "Please send your order number and a photo of the damage, "
                "and we'll arrange a replacement or refund."
            ),
            'cancel_order': (
                "To cancel your order, please provide your order number. "
                "Please note: orders can only be cancelled within 2 hours of placement."
            ),
            'order_status': None,  # Handled by pattern matcher
        }
    
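    def _load_classifier_training_data(self):
        # Train the classifier on the labeled (message, intent) pairs defined above
        for text, intent in training_data:
            self.classifier.train(text, intent)
    
    def process(self, message, session_context=None):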
        # Step 1: Try pattern matching first (high precision)
        pattern_match = self.pattern_matcher.match(message)
        if pattern_match and pattern_match['confidence'] > 0.90:
            return {
                'response': pattern_match['response'],
                'confidence': pattern_match['confidence'],
                'source': 'pattern',
                'escalate': False
            }
        
        # Step 2: ML classification
        intent, scores = self.classifier.predict(message)
        
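        # Get confidence as the normalized posterior probability of the top intent.
        # Scores are log-probabilities (negative), so exponentiate them, shifted by
        # the max for numerical stability, and normalize before thresholding.
        max_score = max(scores.values())
        exp_scores = {label: math.exp(s - max_score) for label, s in scores.items()}
        confidence = exp_scores[intent] / sum(exp_scores.values())
        
        # Step 3: Decide whether to respond or escalate
        # (also escalate when the intent has no response template, e.g. order_status)
        if confidence < 0.6 or not self.intent_responses.get(intent):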
            return {
                'response': "Let me connect you with a human agent who can help with this.",
                'confidence': confidence,
                'source': 'escalation',
                'escalate': True,
                'detected_intent': intent
            }
        
        response_template = self.intent_responses.get(intent)
        return {
            'response': response_template,
            'confidence': confidence,
            'source': 'ml',
            'escalate': False,
            'detected_intent': intent
        }
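Naive Bayes scores are log-probabilities, so they are negative, and a raw ratio of two of them is hard to read as a confidence. One common option, sketched here as an assumption rather than as what the 2013 system shipped, is to normalize the scores into posterior probabilities with a numerically stable softmax. The `posterior_confidence` name and the example scores are illustrative.

```python
import math

def posterior_confidence(log_scores):
    """Turn per-class log scores into (best_label, probability of best_label)."""
    best_label = max(log_scores, key=log_scores.get)
    max_score = log_scores[best_label]
    # Subtract the max before exponentiating so exp() cannot underflow to all zeros
    weights = {label: math.exp(s - max_score) for label, s in log_scores.items()}
    return best_label, weights[best_label] / sum(weights.values())

# Illustrative log scores for three intents
scores = {"missing_package": -5.0, "order_status": -6.0, "cancel_order": -9.0}
label, confidence = posterior_confidence(scores)
print(label, round(confidence, 3))

# For comparison, a ratio of the top two log scores comes out negative here
ratio_confidence = 1 - (-6.0 / -5.0)
print(round(ratio_confidence, 3))
```

Naive Bayes posteriors tend to be overconfident because of the word-independence assumption, so any threshold applied to them is best tuned against labeled conversations rather than chosen by intuition.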

Entity Extraction

Responses like "provide your order number" were only useful if the chatbot could also extract the order number from the user's reply:

def extract_entities(text):
    """Extract structured information from text."""
    entities = {}
    
    # Order number: format ORD-XXXXX or a bare 5-8 digit number
    order_pattern = re.search(r'\b(?:ORD-)?(\d{5,8})\b', text)
    if order_pattern:
        entities['order_id'] = order_pattern.group(1)
    
    # Email address
    email_pattern = re.search(r'\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b', text)
    if email_pattern:
        entities['email'] = email_pattern.group(0)
    
    # Phone number
    phone_pattern = re.search(r'[\+]?[\d\s\-\(\)]{10,}', text)
    if phone_pattern:
        phone = re.sub(r'[^\d+]', '', phone_pattern.group(0))
        if len(phone) >= 10:
            entities['phone'] = phone
    
    return entities

The full conversation with entity extraction:

User: "my package hasn't arrived"
Bot: "I'm sorry. Please provide your order number."

User: "it's 84721"
Bot: *extracts order_id: "84721"*
     "Looking up order 84721..."
     *queries order API*
     "Your order #84721 shipped on Nov 10 and is expected to arrive November 14. 
      Current status: In transit, last scan in Almaty at 9:14am."
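
The two-turn exchange above depends on remembering which intent is still waiting for an order number. A minimal sketch of that session handling follows. The `Session` class, `handle_turn`, and the `lookup_order` stub (standing in for the order API, with a made-up return value) are hypothetical; only the order-number regex comes from `extract_entities`.

```python
import re
from dataclasses import dataclass
from typing import Optional

ORDER_RE = re.compile(r'\b(?:ORD-)?(\d{5,8})\b')

@dataclass
class Session:
    """Per-conversation state: the intent still waiting for an order number."""
    pending_intent: Optional[str] = None

def lookup_order(order_id):
    """Stub for the order API (assumption); a real system would query a backend."""
    return f"Order #{order_id}: in transit"

def handle_turn(session, message, detected_intent=None):
    """Ask for an order number once, then resolve it on a later turn."""
    match = ORDER_RE.search(message)
    if session.pending_intent and match:
        session.pending_intent = None
        return lookup_order(match.group(1))
    if detected_intent == "missing_package":
        session.pending_intent = detected_intent
        return "I'm sorry. Please provide your order number."
    return "Let me connect you with a human agent who can help with this."

session = Session()
print(handle_turn(session, "my package hasn't arrived", "missing_package"))
print(handle_turn(session, "it's 84721"))
```

Keeping the pending intent in an explicit object makes the follow-up turn independent of the classifier: "it's 84721" carries no intent on its own, so the session state is what gives it meaning.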

The Metrics After 3 Months

Metric	Before chatbot	After chatbot
Messages requiring human response	280/day	68/day
Average response time	4.2 hours	8 seconds (bot) / 2.8 hours (escalated)
Customer satisfaction (1-5)	3.1	3.8
Messages correctly handled by bot	-	71%
False escalations (bot could have answered)	-	11%
Incorrect bot responses	-	6%

The 6% incorrect response rate was the sensitive metric. Wrong answers damaged trust more than slow answers. We tuned the confidence thresholds to escalate more aggressively - bringing incorrect responses down from 6% to 2%, at the cost of false escalations rising from 11% to 18%.
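
The tradeoff between wrong answers and escalations can be made concrete by sweeping the escalation threshold over logged predictions. The sketch below uses a handful of made-up (confidence, was_correct) pairs purely for illustration; they are not the client's data, and `sweep` is a hypothetical helper.

```python
def sweep(predictions, thresholds):
    """For each threshold, return (threshold, incorrect share, escalated share).

    predictions: list of (confidence, was_correct) pairs from logged bot decisions.
    Messages below the threshold count as escalated; the rest count as answered.
    """
    total = len(predictions)
    rows = []
    for threshold in thresholds:
        answered = [(c, ok) for c, ok in predictions if c >= threshold]
        incorrect = sum(1 for _, ok in answered if not ok)
        escalated = total - len(answered)
        rows.append((threshold, incorrect / total, escalated / total))
    return rows

# Made-up illustration data: (classifier confidence, was the answer correct?)
logged = [
    (0.95, True), (0.91, True), (0.88, True), (0.84, False), (0.80, True),
    (0.76, True), (0.71, False), (0.66, True), (0.62, False), (0.55, True),
]

for threshold, incorrect_rate, escalation_rate in sweep(logged, [0.6, 0.75, 0.9]):
    print(f"threshold={threshold:.2f}  incorrect={incorrect_rate:.0%}  escalated={escalation_rate:.0%}")
```

Raising the threshold lowers the incorrect share and raises the escalated share, so the right setting depends on how much a wrong answer costs relative to an agent's time.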

What LLMs Changed

In 2023, a GPT-4 powered chatbot does what our 2013 system did - but without the training data, without the pattern library, without the classifier tuning. It understands context across multiple turns without explicit session management. It handles language variations we never anticipated.

The 2013 system took 6 weeks to build, 3 months to tune, and handled 71% of cases. A 2023 LLM-based system takes 2 days to deploy and handles 90%+ of cases out of the box.

What the 2013 work taught us: the problem was always decomposable into intent classification, entity extraction, response generation, and escalation logic. LLMs handle all four at once. But understanding the decomposition makes you a better user of LLMs: you know what to test (intent accuracy, entity extraction, edge cases), what to tune (confidence thresholds, escalation triggers), and what to measure (incorrect response rate, unnecessary escalations).

Aunimeda builds AI-powered solutions - chatbots, AI agents, voice assistants, and automation systems for businesses.
