Engineering · May 19, 2011 · 5 min read

Debugging Production Without APM: Logging Strategies Before New Relic Existed (2011)

Aunimeda

New Relic launched in 2008. Datadog in 2010. Sentry in 2010. But in 2011, "add an APM agent to your PHP app" was either expensive, immature, or something your team simply hadn't done yet.

When production broke, you had three things: SSH into the server, tail -f /var/log/apache2/error.log, and whatever you had decided to log beforehand. If you'd logged nothing useful, you were reconstructing the incident from access.log timestamps and guesswork.

We built a logging discipline out of necessity. Here's what it looked like.


The structured log format we landed on

Raw PHP error logs ([Thu May 19 14:32:01 2011] [error] [client 84.204.10.5] PHP Fatal error: ...) gave you the error but no context: which user, what they were doing, which state led there.

We moved to application-level structured logging before "structured logging" was a common term:

class AppLogger {
    private static $context = [];
    
    // Called at request start: attach session context to every log line
    public static function setContext(array $ctx): void {
        self::$context = $ctx;
    }
    
    public static function log(string $level, string $message, array $data = []): void {
        $entry = array_merge([
            'ts'      => date('c'),           // ISO 8601
            'level'   => $level,
            'msg'     => $message,
            'req_id'  => self::$context['req_id'] ?? null,
            'user_id' => self::$context['user_id'] ?? null,
            'url'     => $_SERVER['REQUEST_URI'] ?? null,
            'ip'      => $_SERVER['REMOTE_ADDR'] ?? null,
        ], $data);
        
        // One JSON line per log entry — grep-friendly
        error_log(json_encode($entry, JSON_UNESCAPED_UNICODE));
    }
    
    public static function info(string $msg, array $data = []): void  { self::log('INFO', $msg, $data); }
    public static function warn(string $msg, array $data = []): void  { self::log('WARN', $msg, $data); }
    public static function error(string $msg, array $data = []): void { self::log('ERROR', $msg, $data); }
}

// Bootstrap: attach request context
AppLogger::setContext([
    'req_id'  => substr(md5(uniqid()), 0, 8), // Short ID to correlate log lines per request
    'user_id' => $_SESSION['user_id'] ?? null,
]);

// Usage
AppLogger::info('Order created', ['order_id' => $order->id, 'amount' => $order->total]);
AppLogger::error('Payment failed', ['order_id' => $id, 'error' => $e->getMessage(), 'gateway' => 'paypal']);

The req_id was the key idea. Every log line from the same HTTP request shared an ID, so grep '"req_id":"a3f91c"' /var/log/app.log showed the complete timeline for that single request — all SQL queries, all external calls, all decisions.
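When we wanted that timeline parsed rather than raw, the same filtering was a few lines of PHP. A standalone sketch (our illustration here, not part of the original tooling) that rebuilds one request's log lines from the JSON log:

```php
<?php
// Rebuild a single request's timeline from a JSON-lines log —
// the same job as the grep, but parsed and easier to post-process.
function requestTimeline(string $logText, string $reqId): array {
    $out = [];
    foreach (explode("\n", trim($logText)) as $line) {
        $entry = json_decode($line, true);
        // Skip malformed lines and lines from other requests
        if (is_array($entry) && ($entry['req_id'] ?? null) === $reqId) {
            $out[] = ($entry['ts'] ?? '?') . ' '
                   . ($entry['level'] ?? '?') . ' '
                   . ($entry['msg'] ?? '');
        }
    }
    return $out;
}
```

Feeding it the log file contents and a short ID returns the ordered lines for that one request, ready to print or diff against a healthy request's timeline.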


Slow query logging: the most valuable 10 lines

Most production incidents in 2011 were slow MySQL queries under load, not PHP errors. The error log showed nothing. The slow query log showed everything:

# /etc/mysql/my.cnf
[mysqld]
slow_query_log        = 1
slow_query_log_file   = /var/log/mysql/slow.log
long_query_time       = 0.5          # Log queries slower than 500ms
log_queries_not_using_indexes = 1    # Critical: catches full table scans
min_examined_row_limit = 100         # Avoid logging fast queries on tiny tables

Our deployment checklist included mysqldumpslow -s t -t 10 /var/log/mysql/slow.log — the top 10 slowest queries by total time — run before every production deploy and after every incident.

The second tool: EXPLAIN. Every query that appeared in slow logs:

EXPLAIN SELECT p.*, u.name 
FROM posts p 
JOIN users u ON p.user_id = u.id 
WHERE p.category_id = 5 
ORDER BY p.created_at DESC 
LIMIT 20;

-- If "type" column shows "ALL" → full table scan → missing index
-- If "rows" shows 50000+ → problem even with an index → query needs restructuring
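For a query shaped like the one above, the usual fix was a composite index covering both the filter and the sort. A sketch, with column names following the example:

```sql
-- Lets MySQL filter on category_id and read rows already ordered by
-- created_at, so the ORDER BY ... LIMIT 20 needs no filesort
ALTER TABLE posts ADD INDEX idx_category_created (category_id, created_at);
```

Re-running EXPLAIN afterwards should show type "ref" instead of "ALL", and a rows estimate close to the LIMIT.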

The "heartbeat" endpoint

We added a /health endpoint to every application. Not for external monitoring services (we didn't have those yet) — for a cron job that hit it every 60 seconds and wrote to a local log.

// health.php — no authentication, read-only checks only
$checks = [];

// Database connectivity
try {
    $db->query("SELECT 1");
    $checks['db'] = 'ok';
} catch (Exception $e) {
    $checks['db'] = 'error: ' . $e->getMessage();
}

// Memcached
$mc = new Memcache();
$checks['cache'] = $mc->connect('127.0.0.1', 11211) ? 'ok' : 'error';

// Disk space
$free = disk_free_space('/');
$total = disk_total_space('/');
$checks['disk_pct'] = round(($free / $total) * 100);
$checks['disk_ok'] = $checks['disk_pct'] > 10; // Alert if < 10% free

header('Content-Type: application/json');
// 'db' reports 'error: <message>' on failure, so compare against 'ok' explicitly
$allOk = $checks['db'] === 'ok' && $checks['cache'] === 'ok' && $checks['disk_ok'];
http_response_code($allOk ? 200 : 503);
echo json_encode(['status' => $allOk ? 'ok' : 'degraded', 'checks' => $checks]);
# crontab -e — prefix each line with a timestamp so failures can be dated
* * * * * echo "$(date) $(curl -s http://localhost/health)" >> /var/log/healthcheck.log 2>&1

When something broke we could run grep '"db":"error"' /var/log/healthcheck.log and see exactly when the database started failing. Primitive by modern standards. Exactly what we needed at the time.


Exception capture before Sentry

Sentry's PHP SDK existed in 2011 but wasn't widely used yet. We built a minimal version: uncaught exceptions wrote to a database table, and a daily cron emailed us the previous day's errors grouped by message.
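The backing table was nothing fancy. An assumed schema matching the fields inserted below:

```sql
CREATE TABLE error_log (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    message    VARCHAR(500) NOT NULL,
    file       VARCHAR(255) NOT NULL,
    line       INT UNSIGNED NOT NULL,
    trace      TEXT,
    url        VARCHAR(255),
    user_id    INT UNSIGNED NULL,
    created_at DATETIME NOT NULL,
    KEY idx_created (created_at)   -- the daily digest filters on this
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```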

// Global exception handler
set_exception_handler(function (Throwable $e) use ($db) {
    $db->insert('error_log', [
        'message'    => $e->getMessage(),
        'file'       => $e->getFile(),
        'line'       => $e->getLine(),
        'trace'      => $e->getTraceAsString(),
        'url'        => $_SERVER['REQUEST_URI'] ?? '',
        'user_id'    => $_SESSION['user_id'] ?? null,
        'created_at' => date('Y-m-d H:i:s'),
    ]);
    
    // Show user a friendly error page, not a stack trace
    include 'views/500.php';
    exit;
});
// Daily digest cron
$errors = $db->query(
    "SELECT message, file, line, COUNT(*) as count
     FROM error_log 
     WHERE created_at >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
     GROUP BY message, file, line
     ORDER BY count DESC
     LIMIT 20"
)->fetchAll();

if ($errors) {
    mail('team@company.com', 'Daily Error Report', renderErrorDigest($errors));
}
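renderErrorDigest isn't shown above; a minimal plain-text version might have looked like this (assumed shape of $errors matching the SELECT):

```php
<?php
// Format the grouped error rows as one line per distinct error,
// highest count first (the query already sorts by count DESC).
function renderErrorDigest(array $errors): string {
    $lines = [];
    foreach ($errors as $e) {
        $lines[] = sprintf('%4dx  %s  (%s:%d)',
            $e['count'], $e['message'], $e['file'], $e['line']);
    }
    return implode("\n", $lines);
}
```

Plain text mattered: the digest had to be readable in any mail client, and greppable once it landed in an inbox folder.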

This caught real bugs: a missing null check that threw on 0.1% of requests was invisible in normal testing but showed up as 40 occurrences in the daily digest.


The discipline that transferred

When modern APM tools came — New Relic, then Sentry, then Datadog — we adopted them immediately. But the habits built without them translated directly: think about what you'll need to know when something breaks in production, and log it before the incident, not after.

The specific tools have changed completely. The question hasn't: when this fails at 3am, what will I need in the logs to understand why? Answer that before you deploy, and debugging production becomes tractable instead of an archaeology expedition.
