Debugging Production Without APM: Logging Strategies Before New Relic Existed (2011)
New Relic launched in 2008. Datadog in 2010. Sentry in 2010. But in 2011, "add an APM agent to your PHP app" was either expensive, immature, or something your team simply hadn't done yet.
When production broke, you had three things: SSH access to the server, tail -f /var/log/apache2/error.log, and whatever you had decided to log beforehand. If you'd logged nothing useful, you were reconstructing the incident from access.log timestamps and guesswork.
We built a logging discipline out of necessity. Here's what it looked like.
The structured log format we landed on
Raw PHP error logs ([Thu May 19 14:32:01 2011] [error] [client 84.204.10.5] PHP Fatal error: ...) gave you the error but no context: which user, what they were doing, which state led there.
We moved to application-level structured logging before "structured logging" was a common term:
class AppLogger {
    private static $context = array();

    // Called at request start: attach session context to every log line
    public static function setContext(array $ctx) {
        self::$context = $ctx;
    }

    public static function log($level, $message, array $data = array()) {
        $entry = array_merge(array(
            'ts'      => date('c'), // ISO 8601
            'level'   => $level,
            'msg'     => $message,
            'req_id'  => isset(self::$context['req_id']) ? self::$context['req_id'] : null,
            'user_id' => isset(self::$context['user_id']) ? self::$context['user_id'] : null,
            'url'     => isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : null,
            'ip'      => isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : null,
        ), $data);
        // One JSON line per log entry — grep-friendly
        error_log(json_encode($entry));
    }

    public static function info($msg, array $data = array())  { self::log('INFO', $msg, $data); }
    public static function warn($msg, array $data = array())  { self::log('WARN', $msg, $data); }
    public static function error($msg, array $data = array()) { self::log('ERROR', $msg, $data); }
}
// Bootstrap: attach request context
// Keep application logs in their own file, separate from Apache's error log
ini_set('error_log', '/var/log/app.log');

AppLogger::setContext(array(
    'req_id'  => substr(md5(uniqid()), 0, 8), // Short ID to correlate log lines per request
    'user_id' => isset($_SESSION['user_id']) ? $_SESSION['user_id'] : null,
));

// Usage
AppLogger::info('Order created', array('order_id' => $order->id, 'amount' => $order->total));
AppLogger::error('Payment failed', array('order_id' => $id, 'error' => $e->getMessage(), 'gateway' => 'paypal'));
The req_id was the key idea. Every log line from the same HTTP request shared an ID. Running grep '"req_id":"a3f91c' /var/log/app.log showed the complete timeline for that single request — all SQL queries, all external calls, all decisions.
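One JSON object per line means standard shell tools can reconstruct a request's timeline. A small sketch with illustrative entries and a temporary log path:

```shell
# Two interleaved requests; pull the full timeline for just one of them
cat > /tmp/app.log <<'EOF'
{"ts":"2011-05-19T14:32:01+02:00","level":"INFO","msg":"Cart loaded","req_id":"77be02d1"}
{"ts":"2011-05-19T14:32:01+02:00","level":"INFO","msg":"Order created","req_id":"a3f91c2e"}
{"ts":"2011-05-19T14:32:02+02:00","level":"ERROR","msg":"Payment failed","req_id":"a3f91c2e"}
EOF

# Prefix match on the request ID isolates that one request's lines
grep '"req_id":"a3f91c' /tmp/app.log
```

In production the same grep ran against the real log file; the request IDs and messages here are made up for the example.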
Slow query logging: the most valuable 10 lines
Most production incidents in 2011 were slow MySQL queries under load, not PHP errors. The error log showed nothing. The slow query log showed everything:
# /etc/mysql/my.cnf
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.5 # Log queries slower than 500ms
log_queries_not_using_indexes = 1 # Critical: catches full table scans
min_examined_row_limit = 100 # Only log queries that examined 100+ rows (cuts noise from tiny tables)
Our deployment checklist included mysqldumpslow -s t -t 10 /var/log/mysql/slow.log (the top 10 queries by total time), run before every production deploy and again after every incident.
The second tool was EXPLAIN, run on every query that appeared in the slow log:
EXPLAIN SELECT p.*, u.name
FROM posts p
JOIN users u ON p.user_id = u.id
WHERE p.category_id = 5
ORDER BY p.created_at DESC
LIMIT 20;
-- If "type" column shows "ALL" → full table scan → missing index
-- If "rows" shows 50000+ → problem even with an index → query needs restructuring
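The usual fix for that pattern is a composite index covering both the filter and the sort. For the query above, a hypothetical example (the index name and column types are assumptions based on the query):

```sql
-- Hypothetical index: lets MySQL resolve the WHERE on category_id and the
-- ORDER BY created_at from the same index, avoiding a full scan and a filesort
ALTER TABLE posts ADD INDEX idx_category_created (category_id, created_at);
```

Re-running EXPLAIN afterwards should show type: ref and a far smaller rows estimate; MySQL can read the index in reverse to satisfy ORDER BY created_at DESC.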
The "heartbeat" endpoint
We added a /health endpoint to every application. Not for external monitoring services (we didn't have those yet) — for a cron job that hit it every 60 seconds and wrote to a local log.
// health.php - no authentication, read-only checks only
$checks = array();

// Database connectivity
try {
    $db->query("SELECT 1");
    $checks['db'] = 'ok';
} catch (Exception $e) {
    $checks['db'] = 'error: ' . $e->getMessage();
}

// Memcached ("@" suppresses the connection warning; we only want the boolean)
$mc = new Memcache();
$checks['cache'] = @$mc->connect('127.0.0.1', 11211) ? 'ok' : 'error';

// Disk space
$free  = disk_free_space('/');
$total = disk_total_space('/');
$checks['disk_pct'] = round(($free / $total) * 100); // percent FREE, not used
$checks['disk'] = $checks['disk_pct'] > 10 ? 'ok' : 'error: low disk'; // alert if < 10% free

// A check failed if its value starts with "error" (covers "error: <details>" too)
$allOk = true;
foreach ($checks as $value) {
    if (strpos((string) $value, 'error') === 0) {
        $allOk = false;
    }
}

header('Content-Type: application/json');
header($allOk ? 'HTTP/1.1 200 OK' : 'HTTP/1.1 503 Service Unavailable');
echo json_encode(array('status' => $allOk ? 'ok' : 'degraded', 'checks' => $checks));
# crontab -e
# Prefix each entry with a timestamp so the log shows when a check started failing
* * * * * echo "$(date) $(curl -s http://localhost/health)" >> /var/log/healthcheck.log 2>&1
When something broke we could run grep '"db":"error' /var/log/healthcheck.log and see when the database checks started failing (the pattern deliberately omits the closing quote, because the value is "error: <details>"). Primitive by modern standards. Exactly what we needed at the time.
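Since the cron appends exactly one line per minute, line offsets double as a coarse clock: onset can be estimated even from an untimestamped log. A sketch with fabricated sample data:

```shell
# Sample healthcheck.log: one entry per minute, the DB check fails from line 4 on
cat > /tmp/healthcheck.log <<'EOF'
{"status":"ok","checks":{"db":"ok","cache":"ok"}}
{"status":"ok","checks":{"db":"ok","cache":"ok"}}
{"status":"ok","checks":{"db":"ok","cache":"ok"}}
{"status":"degraded","checks":{"db":"error: connection refused","cache":"ok"}}
EOF

# Line number of the first failure ~ minutes after the log began
grep -n '"db":"error' /tmp/healthcheck.log | head -n 1 | cut -d: -f1
```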
Exception capture before Sentry
Sentry's PHP SDK existed in 2011 but wasn't widely used yet. We built a minimal version: uncaught exceptions wrote to a database table, and a daily cron emailed us the previous day's errors grouped by message.
// Global exception handler (PHP 5.3 era: type-hint Exception; Throwable arrived with PHP 7)
set_exception_handler(function (Exception $e) use ($db) {
    $db->insert('error_log', array(
        'message'    => $e->getMessage(),
        'file'       => $e->getFile(),
        'line'       => $e->getLine(),
        'trace'      => $e->getTraceAsString(),
        'url'        => isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '',
        'user_id'    => isset($_SESSION['user_id']) ? $_SESSION['user_id'] : null,
        'created_at' => date('Y-m-d H:i:s'),
    ));
    // Show the user a friendly error page, not a stack trace
    header('HTTP/1.1 500 Internal Server Error');
    include 'views/500.php';
    exit;
});
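For reference, a plausible schema for the error_log table. The column names follow the handler's insert and the digest query; the types, sizes, and index are assumptions:

```sql
-- Hypothetical DDL; columns match the handler's insert and the digest's GROUP BY
CREATE TABLE error_log (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    message    TEXT,
    file       VARCHAR(255),
    line       INT,
    trace      TEXT,
    url        VARCHAR(255),
    user_id    INT UNSIGNED NULL,
    created_at DATETIME NOT NULL,
    KEY idx_created_at (created_at)  -- the digest filters on created_at
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```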
// Daily digest cron
$errors = $db->query(
    "SELECT message, file, line, COUNT(*) AS count
     FROM error_log
     WHERE created_at >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
     GROUP BY message, file, line
     ORDER BY count DESC
     LIMIT 20"
)->fetchAll();

if ($errors) {
    mail('team@company.com', 'Daily Error Report', renderErrorDigest($errors));
}
This caught real bugs: a missing null check that threw on 0.1% of requests was invisible in normal testing but showed up as 40 occurrences in the daily digest.
The discipline that transferred
When modern APM tools came — New Relic, then Sentry, then Datadog — we adopted them immediately. But the habits built without them translated directly: think about what you'll need to know when something breaks in production, and log it before the incident, not after.
The specific tools have changed completely. The question hasn't: when this fails at 3am, what will I need in the logs to understand why? Answer that before you deploy, and debugging production becomes tractable instead of an archaeology expedition.