Elasticsearch: The Hidden Cost of Segment Merging
Elasticsearch is built on Lucene, and Lucene is built on segments. When you index a document, it is written to an in-memory buffer and eventually flushed to a new segment on disk. Segments are immutable: once written, they are never modified. That's great for caching, since a segment's contents can never go stale, but it creates a problem: every flush produces another file, and the files pile up fast.
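You can watch this happen with the segments API (assuming an index named my_index, as in the examples below), which lists every segment in every shard along with its size, document count, and deleted-document count:

// Inspecting the segments of each shard
GET /my_index/_segments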
The Merge Policy
To keep the number of segments under control, Elasticsearch runs a background process that merges smaller segments into larger ones. Merging is also when deleted documents are physically removed from disk: a delete merely marks the document as dead in its segment, and the space is only reclaimed when that segment is merged away.
// Tuning the merge policy in 0.90.x.
// max_merge_at_once: how many segments may be combined in a single merge.
// segments_per_tier: how many similarly sized segments are tolerated
// before a merge is triggered. 10 is the default for both.
PUT /my_index/_settings
{
  "index.merge.policy.max_merge_at_once": 10,
  "index.merge.policy.segments_per_tier": 10
}
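Lowering segments_per_tier buys you fewer segments (and cheaper searches) at the price of more merge I/O; raising it trades the other way. Keep segments_per_tier at or above max_merge_at_once, or the policy will be forced to merge constantly.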
The I/O Spike
Merging is I/O intensive. If you're indexing heavily, you might see "merge throttling" messages in your logs: when merges fall behind, Elasticsearch throttles indexing to give the merger room to breathe. Merges fall behind partly because their own I/O is capped by the store throttle (20 MB/s by default), so for an initial bulk load you can lift that cap entirely.
// Disabling merge I/O throttling for initial bulk loads (careful!)
PUT /_cluster/settings
{
  "transient": {
    "indices.store.throttle.type": "none"
  }
}
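Remember to turn it back on once the load finishes. "merge" is the default throttle type, and raising max_bytes_per_sec (the 100mb below is just an illustration) is a gentler option than "none" if disabling it outright makes you nervous:

// Re-enabling merge throttling after the bulk load
PUT /_cluster/settings
{
  "transient": {
    "indices.store.throttle.type": "merge",
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}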
Segment Tiering
Lucene uses a tiered merge policy: it prefers to merge segments of similar size. If you have 10 segments of ~10MB, it will merge them into one ~100MB segment; ten of those ~100MB segments later become a ~1GB segment, and so on up the tiers.
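You can also step outside the policy and force a merge yourself, which is handy once an index has stopped receiving writes. In 0.90.x this is the optimize API (my_index is a stand-in, as above):

// Force-merging a read-only index down to a single segment
POST /my_index/_optimize?max_num_segments=1

Be aware that this schedules exactly the kind of massive merge described next.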
However, a massive merge (e.g., merging two 5GB segments) can saturate your disk throughput and cause search latency to spike. In 2013, with SSDs still relatively expensive for large clusters, tuning index.merge.policy.max_merged_segment is the difference between a stable cluster and one that dies every afternoon.
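As a sketch of that cap: the default is 5gb, and the 2gb value below is only an example. The trade-off is more (smaller) segments overall, but no single merge big enough to drown a spinning disk.

// Capping the largest segment the merge policy will produce
PUT /my_index/_settings
{
  "index.merge.policy.max_merged_segment": "2gb"
}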