The premise in early 2013 was sensible: employees submit expense reports by photographing receipts with their phones. The app reads the receipt, extracts the amount, date, and vendor, and auto-fills the expense form.
The execution required understanding Optical Character Recognition (OCR) at a level we hadn't anticipated.
Tesseract OCR: The Open Source Standard
Tesseract was developed at Hewlett-Packard in the 1980s, open-sourced in 2005, and adopted by Google in 2006. By 2013, Tesseract 3.02 was the most accurate open-source OCR engine available.
We compiled it for iOS (Tesseract had no official iOS port; there were community Objective-C wrappers):
// Using tesseract-ios wrapper (2013)
#import "Tesseract.h"

@implementation ReceiptOCRProcessor

- (NSString *)extractTextFromImage:(UIImage *)image {
    Tesseract *tesseract = [[Tesseract alloc] initWithLanguage:@"eng"];

    // Set page segmentation mode
    // PSM_AUTO (3) = Tesseract decides the layout
    // PSM_SINGLE_BLOCK (6) = Treat as single block of text (good for receipts)
    [tesseract setVariableValue:@"6" forKey:@"tessedit_pageseg_mode"];

    // Whitelist characters (receipts have numbers, punctuation, basic letters)
    // Restricting the character set improves accuracy significantly.
    // '#', '%' and '*' are included because store numbers, tax rates and
    // masked card numbers use them.
    [tesseract setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.,/$:-()#%*"
                         forKey:@"tessedit_char_whitelist"];

    [tesseract setImage:image];
    [tesseract recognize];

    NSString *text = [tesseract recognizedText];
    return text;
}

@end
Raw result on a real receipt photo (iPhone 5 camera, decent lighting):
TARGEf ST0RE #1823
1234 MAIN ST
ANYTOWN CA 90210
GROCERIES
MILK 2% 1GAL 3.49
BREAD WHOLE WHEAT 2.29
EGGS LARGE DOZ 4.19
APPLES FUJI 3LB 3.99
SUBTOTAL 13.96
TAX 8.5% 1.19
TOTAL S15.15
VISA *4231
APPROVED
"TARGEf" instead of "TARGET". "S15.15" instead of "$15.15". 68% character accuracy on 20 test receipts - meaning roughly 1 in 3 characters was wrong or missing. Not usable as raw output.
The Preprocessing Pipeline
OCR accuracy depends heavily on image quality. Tesseract was designed for scanned documents - horizontal, high-resolution, evenly lit. Phone camera photos of receipts were none of these things: skewed, low-contrast, unevenly lit, sometimes crumpled.
We built a preprocessing pipeline in OpenCV (which we also compiled for iOS):
#import <opencv2/opencv.hpp>
// UIImageToMat / MatToUIImage live in opencv2/highgui/ios.h (OpenCV 2.4)
// or opencv2/imgcodecs/ios.h (OpenCV 3 and later)
#import <opencv2/highgui/ios.h>

@implementation ImagePreprocessor

- (UIImage *)preprocessForOCR:(UIImage *)inputImage {
    // Convert UIImage to OpenCV Mat (4-channel RGBA for color images)
    cv::Mat mat;
    UIImageToMat(inputImage, mat);

    // Step 1: Convert to grayscale
    cv::Mat gray;
    cv::cvtColor(mat, gray, cv::COLOR_RGBA2GRAY);

    // Step 2: Increase resolution if too small
    // Tesseract works best at 300+ DPI; scale up small images
    if (gray.cols < 1000) {
        double scale = 1000.0 / gray.cols;
        cv::resize(gray, gray, cv::Size(), scale, scale, cv::INTER_CUBIC);
    }

    // Step 3: Deskewing - detect and correct rotation
    gray = [self deskewImage:gray];

    // Step 4: Adaptive thresholding
    // Converts grayscale to black-and-white, handling uneven lighting
    cv::Mat thresh;
    cv::adaptiveThreshold(
        gray, thresh, 255,
        cv::ADAPTIVE_THRESH_GAUSSIAN_C,
        cv::THRESH_BINARY,
        11,  // Block size: neighborhood used to compute each pixel's threshold
        2    // C: constant subtracted from the Gaussian-weighted local mean
    );

    // Step 5: Noise removal (3x3 median filter removes isolated specks)
    cv::Mat denoised;
    cv::medianBlur(thresh, denoised, 3);

    return MatToUIImage(denoised);
}

- (cv::Mat)deskewImage:(cv::Mat)image {
    // Find edges
    cv::Mat edges;
    cv::Canny(image, edges, 50, 150, 3);

    // Hough line transform - find dominant lines
    std::vector<cv::Vec4i> lines;
    cv::HoughLinesP(edges, lines, 1, CV_PI/180, 50, 50, 10);
    if (lines.empty()) return image;

    // Calculate average angle of detected lines
    double totalAngle = 0;
    int count = 0;
    for (auto& line : lines) {
        double angle = atan2(line[3] - line[1], line[2] - line[0]) * 180 / CV_PI;
        // Only consider near-horizontal lines (text lines on a receipt run horizontally)
        if (fabs(angle) < 15) {
            totalAngle += angle;
            count++;
        }
    }
    if (count == 0) return image;
    double avgAngle = totalAngle / count;

    // Rotate to correct the skew
    cv::Point2f center(image.cols / 2.0, image.rows / 2.0);
    cv::Mat rotationMatrix = cv::getRotationMatrix2D(center, avgAngle, 1.0);
    cv::Mat rotated;
    cv::warpAffine(image, rotated, rotationMatrix, image.size(),
                   cv::INTER_CUBIC, cv::BORDER_REPLICATE);
    return rotated;
}

@end
After preprocessing, character accuracy improved from 68% to 81%. Not perfect, but better.
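To see why step 4 uses an adaptive threshold rather than a single global cutoff, it helps to look at a toy example. The sketch below is pure Python on one made-up row of pixel values with a lighting gradient (bright on the left, shadowed on the right). It uses a plain local mean instead of the Gaussian weighting in the OpenCV call, and it truncates the window at the borders, so treat it as an illustration of the idea and not as a reimplementation of cv::adaptiveThreshold.

```python
def global_threshold(row, thresh=128):
    """White (255) if the pixel is brighter than one fixed cutoff, else black (0)."""
    return [255 if v > thresh else 0 for v in row]


def adaptive_threshold(row, block_size=5, c=10):
    """White if the pixel is brighter than (mean of its neighborhood - c)."""
    half = block_size // 2
    out = []
    for i, v in enumerate(row):
        window = row[max(0, i - half): i + half + 1]  # truncated at the edges
        local_mean = sum(window) / len(window)
        out.append(255 if v > local_mean - c else 0)
    return out


def ink_positions(binary_row):
    """Indices classified as ink (black)."""
    return [i for i, v in enumerate(binary_row) if v == 0]


# Background fades from 220 to 88; real ink sits at indices 2 and 9 only.
row = [220, 208, 116, 184, 172, 160, 148, 136, 124, 32, 100, 88]

print("global:  ", ink_positions(global_threshold(row)))
print("adaptive:", ink_positions(adaptive_threshold(row)))
```

On this row the global cutoff marks indices 2, 8, 9, 10 and 11 as ink, swallowing the shadowed background on the right, while the adaptive version marks only 2 and 9. The same effect on a real photo is what turns a shadowed corner of a receipt into a solid black blob under global thresholding.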
Extracting Structured Data
Raw OCR text was unstructured. We needed to extract specific fields: date, total amount, vendor name.
# Python post-processing (ran on server after uploading OCR text from iOS)
import re

def extract_receipt_data(ocr_text):
    """Extract structured data from raw OCR text."""
    result = {
        'vendor': None,
        'date': None,
        'total': None,
        'subtotal': None,
        'items': []
    }
    lines = [l.strip() for l in ocr_text.split('\n') if l.strip()]

    # Vendor: typically the first non-empty line
    if lines:
        result['vendor'] = lines[0]

    # Date patterns: various formats on receipts
    date_patterns = [
        r'\b(\d{1,2})[/\-](\d{1,2})[/\-](\d{2,4})\b',  # 12/25/2013
        r'\b(\d{4})[/\-](\d{1,2})[/\-](\d{1,2})\b',    # 2013-12-25
        r'\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\.?\s+(\d{1,2}),?\s+(\d{4})\b',
    ]
    for line in lines:
        if any(re.search(p, line, re.IGNORECASE) for p in date_patterns):
            result['date'] = line  # Store the matched line for review
            break

    # Total amount: look for a "TOTAL" keyword followed by a dollar amount.
    # The leading \b stops "total" from matching inside "subtotal".
    total_patterns = [
        r'\b(?:grand total)[:\s]+\$?\s*(\d+[\.,]\d{2})',
        r'\b(?:total|amount due|balance due)[:\s]+\$?\s*(\d+[\.,]\d{2})',
    ]
    full_text = ' '.join(lines).lower()
    for pattern in total_patterns:
        match = re.search(pattern, full_text)
        if match:
            amount_str = match.group(1).replace(',', '.')
            result['total'] = float(amount_str)
            break

    # If no explicit total found, look for largest dollar amount
    # (often the total on receipts)
    if result['total'] is None:
        amounts = re.findall(r'\$?\s*(\d+\.\d{2})', full_text)
        if amounts:
            result['total'] = max(float(a) for a in amounts)

    return result
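A pitfall worth checking with keyword patterns like these: "total" is also a substring of "subtotal", and the subtotal line usually comes first on a receipt. The sketch below runs the same amount pattern with and without a leading word boundary on three lines of the sample receipt. The "$" is typed correctly here, unlike in the raw OCR output, and the helper name `first_total` is illustrative.

```python
import re

ocr_text = """SUBTOTAL 13.96
TAX 8.5% 1.19
TOTAL $15.15"""

# Same normalization as the extraction function: one lowercase string.
full_text = " ".join(ocr_text.split("\n")).lower()

# Separator, optional dollar sign, then an amount with two decimals.
AMOUNT = r"[:\s]+\$?\s*(\d+[.,]\d{2})"


def first_total(pattern):
    """Return the first amount captured by the pattern, or None."""
    match = re.search(pattern, full_text)
    return float(match.group(1).replace(",", ".")) if match else None


unanchored = first_total(r"total" + AMOUNT)   # can match inside "subtotal"
anchored = first_total(r"\btotal" + AMOUNT)   # requires a word boundary first

print(unanchored, anchored)
```

The unanchored pattern returns 13.96 (the subtotal) and the anchored one returns 15.15. With the OCR misread "S15.15", the anchored pattern finds no match at all, which is the case the largest-amount fallback exists for.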
The Error Correction Layer
81% OCR accuracy on characters meant some field extractions were still wrong. We added a human review step - not manual entry, but confirmation:
// iOS: Show extracted data for user confirmation
- (void)showExtractionConfirmation:(NSDictionary *)extracted {
    // Pre-fill the form with extracted data
    self.vendorField.text = extracted[@"vendor"] ?: @"";
    self.amountField.text = extracted[@"total"] ?
        [NSString stringWithFormat:@"%.2f", [extracted[@"total"] doubleValue]] : @"";
    self.dateField.text = extracted[@"date"] ?: @"";

    // Highlight fields the extractor could not fill in
    if (!extracted[@"total"]) {
        [self highlightFieldAsNeedsReview:self.amountField];
    }

    // Show the original photo alongside for comparison
    self.receiptImageView.image = self.capturedImage;

    // User confirms or edits before submitting
    [self showConfirmationView];
}
This pattern - machine extraction + human confirmation - increased user acceptance. Users trusted the app because they could see the photo and verify the extracted data. The confirmation step also collected training data: corrections became labeled examples for improving the model.
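The source does not describe how those corrections were stored, so the following is only a sketch of one way to do it: compare what the extractor produced with what the user confirmed and keep a record for each field that changed. The `FieldCorrection` type, the `collect_corrections` function, the receipt ID, and the date value are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class FieldCorrection:
    receipt_id: str
    field: str
    extracted: Optional[str]  # what the extractor produced (None if nothing)
    confirmed: str            # what the user submitted


def collect_corrections(receipt_id, extracted, confirmed):
    """Return one record per field the user changed before submitting."""
    records = []
    for field, confirmed_value in confirmed.items():
        extracted_value = extracted.get(field)
        if extracted_value != confirmed_value:
            records.append(
                FieldCorrection(receipt_id, field, extracted_value, confirmed_value)
            )
    return records


# Illustrative values only: the vendor was misread and the date was missing.
extracted = {"vendor": "TARGEf ST0RE #1823", "total": "15.15", "date": None}
confirmed = {"vendor": "TARGET STORE #1823", "total": "15.15", "date": "2013-03-14"}

records = collect_corrections("r-001", extracted, confirmed)
for record in records:
    print(json.dumps(asdict(record)))
```

Fields the user left untouched produce no record, so the log stays small and every entry is a concrete (wrong, right) pair.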
What Came After
By 2016, cloud services such as the Google Cloud Vision API had made receipt OCR close to trivial, and Amazon Textract followed in 2019. Send a photo to an API endpoint, receive structured JSON with extracted text, bounding boxes, and confidence scores. Accuracy was 95%+.
By 2020, on-device ML models (Core ML on iOS, TensorFlow Lite on Android) could run receipt OCR entirely offline with accuracy matching the 2016 cloud APIs.
The 2013 system - Tesseract, OpenCV preprocessing, regex extraction, human confirmation - was replaced by one API call.
The learning wasn't wasted. Understanding image preprocessing taught us why input quality matters more than the choice of algorithm in computer vision tasks. Understanding Tesseract's failures (poor performance on cursive fonts, low contrast, skewed text) made us informed users of the APIs that replaced it. "Garbage in, garbage out" applies to ML systems at every level of sophistication.
Aunimeda builds AI-powered solutions - chatbots, AI agents, voice assistants, and automation systems for businesses.