Elasticsearch: The Hidden Cost of Segment Merging
Elasticsearch is built on Lucene, and Lucene is built on segments. When you index a document, it is written to an in-memory buffer and eventually flushed to a new segment on disk. Segments are immutable: once written, they are never modified. That's great for caching, since a segment's contents can never go stale, but it creates a problem: every flush produces another file, and the files pile up fast.
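You can watch this happen with the segments API (assuming an index named my_index, as in the examples below), which lists every segment in every shard along with its size, document count, and deleted-document count:

// Inspecting the segments of each shard
GET /my_index/_segments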
The Merge Policy
To keep the number of segments under control, Elasticsearch runs a background process that merges smaller segments into larger ones. Merging is also when deleted documents are physically removed from disk: a delete merely marks the document as dead in its segment, and the space is only reclaimed when that segment is merged away.
// Tuning the merge policy in 0.90.x.
// max_merge_at_once: how many segments may be combined in a single merge.
// segments_per_tier: how many similarly sized segments are tolerated
// before a merge is triggered. 10 is the default for both.
PUT /my_index/_settings
{
  "index.merge.policy.max_merge_at_once": 10,
  "index.merge.policy.segments_per_tier": 10
}
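Lowering segments_per_tier buys you fewer segments (and cheaper searches) at the price of more merge I/O; raising it trades the other way. Keep segments_per_tier at or above max_merge_at_once, or the policy will be forced to merge constantly.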
The I/O Spike
Merging is I/O intensive. If you're indexing heavily, you might see "merge throttling" messages in your logs: when merges fall behind, Elasticsearch throttles indexing to give the merger room to breathe. Merges fall behind partly because their own I/O is capped by the store throttle (20 MB/s by default), so for an initial bulk load you can lift that cap entirely.
// Disabling merge I/O throttling for initial bulk loads (careful!)
PUT /_cluster/settings
{
  "transient": {
    "indices.store.throttle.type": "none"
  }
}
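Remember to turn it back on once the load finishes. "merge" is the default throttle type, and raising max_bytes_per_sec (the 100mb below is just an illustration) is a gentler option than "none" if disabling it outright makes you nervous:

// Re-enabling merge throttling after the bulk load
PUT /_cluster/settings
{
  "transient": {
    "indices.store.throttle.type": "merge",
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}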
Segment Tiering
Lucene uses a tiered merge policy: it prefers to merge segments of similar size. If you have 10 segments of ~10MB, it will merge them into one ~100MB segment; ten of those ~100MB segments later become a ~1GB segment, and so on up the tiers.
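You can also step outside the policy and force a merge yourself, which is handy once an index has stopped receiving writes. In 0.90.x this is the optimize API (my_index is a stand-in, as above):

// Force-merging a read-only index down to a single segment
POST /my_index/_optimize?max_num_segments=1

Be aware that this schedules exactly the kind of massive merge described next.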
However, a massive merge (e.g., merging two 5GB segments) can saturate your disk throughput and cause search latency to spike. In 2013, with SSDs still relatively expensive for large clusters, tuning index.merge.policy.max_merged_segment is the difference between a stable cluster and one that dies every afternoon.
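As a sketch of that cap: the default is 5gb, and the 2gb value below is only an example. The trade-off is more (smaller) segments overall, but no single merge big enough to drown a spinning disk.

// Capping the largest segment the merge policy will produce
PUT /my_index/_settings
{
  "index.merge.policy.max_merged_segment": "2gb"
}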