Technology, May 22, 2013

Elasticsearch: The Hidden Cost of Segment Merging (2013)

Aunimeda

Elasticsearch is built on Lucene, and Lucene is built on "segments." When you index a document, it is written to an in-memory buffer and eventually flushed to a new segment on disk. Segments are immutable: once written, they are never modified. This is great for caching but creates a problem. Every flush adds more files, and every search has to visit every segment.

The Merge Policy

To keep the number of segments under control, Elasticsearch runs a background process to merge smaller segments into larger ones. This is also when deleted documents are actually removed from disk.

// Tuning the tiered merge policy in 0.90.x (10 is the default for both settings)
PUT /my_index/_settings
{
  "index.merge.policy.max_merge_at_once": 10,
  "index.merge.policy.segments_per_tier": 10
}
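To make the "deleted documents are removed at merge time" point concrete, here is a minimal Python sketch. It is a toy model, not Lucene's on-disk format: the Segment class, mark_deleted, and merge are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Toy stand-in for an immutable segment (hypothetical, not Lucene's format)."""
    docs: tuple                        # (doc_id, source) pairs, fixed at write time
    deleted: frozenset = frozenset()   # ids marked deleted; the docs are still stored

    def live_docs(self):
        return [(doc_id, src) for doc_id, src in self.docs if doc_id not in self.deleted]


def mark_deleted(segment, doc_id):
    """A delete only records a marker; the stored documents are untouched."""
    return Segment(segment.docs, segment.deleted | {doc_id})


def merge(segments):
    """Write one new segment that contains only the live docs of its inputs."""
    merged = []
    for seg in segments:
        merged.extend(seg.live_docs())
    return Segment(tuple(merged))


a = Segment(((1, "apple"), (2, "banana")))
b = Segment(((3, "cherry"),))
a = mark_deleted(a, 2)

stored_before = sum(len(seg.docs) for seg in (a, b))
merged = merge([a, b])
print(stored_before, len(merged.docs))  # 3 2
```

The deleted document keeps occupying space until a merge rewrites the segment that holds it, which is why a delete-heavy index does not shrink right away.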

The I/O Spike

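Merging is I/O intensive. If you're indexing heavily, merges compete with searches and new writes for disk bandwidth. To limit the damage, Elasticsearch throttles merge I/O at the store level (indices.store.throttle.type, with indices.store.throttle.max_bytes_per_sec setting the rate). That keeps the disk responsive, but if merges cannot keep up with indexing, segments pile up.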

// Disabling throttling for initial bulk loads (careful!)
PUT /_cluster/settings
{
  "transient": {
    "indices.store.throttle.type": "none"
  }
}
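A rough back-of-the-envelope calculation shows why throttling matters during a bulk load. The Python sketch below assumes a merge is limited only by a write-rate cap and ignores read cost and CPU. The function name merge_seconds and the 20 MB/s figure are illustrative assumptions, not measurements.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def merge_seconds(output_bytes, max_bytes_per_sec):
    """Lower bound on merge duration when writes are capped at a fixed rate."""
    return output_bytes / max_bytes_per_sec


# Writing a 10 GB merged segment under an assumed 20 MB/s throttle.
seconds = merge_seconds(10 * GB, 20 * MB)
print(seconds)       # 512.0
print(seconds / 60)  # about 8.5 minutes
```

Under these assumptions a single large merge occupies the merge thread for several minutes. That is the trade-off behind lifting the throttle for an initial load and restoring it afterwards.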

Segment Tiering

Lucene's default merge policy is tiered: it prefers to merge segments of similar size. If you have 10 segments of ~10MB, it will merge them into one ~100MB segment.
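The tiering idea can be sketched in a few lines of Python. This is a deliberately simplified model: it buckets segments by order of magnitude and merges a bucket once it is full, whereas Lucene's real policy scores candidate merges and accounts for deletes. The helpers tier_of, merge_pass, and flush are hypothetical names for this sketch.

```python
from collections import defaultdict


def tier_of(size_mb, factor=10):
    """Bucket by order of magnitude: 1-9 MB -> 0, 10-99 MB -> 1, and so on."""
    tier = 0
    while size_mb >= factor:
        size_mb //= factor
        tier += 1
    return tier


def merge_pass(sizes_mb, segments_per_tier=10, max_merge_at_once=10):
    """Merge the lowest full tier, if any. Returns (new_sizes, merged_flag)."""
    tiers = defaultdict(list)
    for size in sizes_mb:
        tiers[tier_of(size)].append(size)
    for tier in sorted(tiers):
        members = tiers[tier]
        if len(members) >= segments_per_tier:
            batch = sorted(members)[:max_merge_at_once]
            remaining = list(sizes_mb)
            for size in batch:
                remaining.remove(size)
            remaining.append(sum(batch))
            return remaining, True
    return list(sizes_mb), False


def flush(sizes_mb, new_segment_mb):
    """Add a freshly flushed segment, then merge until no tier is full."""
    sizes = list(sizes_mb) + [new_segment_mb]
    merged = True
    while merged:
        sizes, merged = merge_pass(sizes)
    return sizes


segments = []
for _ in range(10):
    segments = flush(segments, 10)  # ten 10 MB flushes
print(segments)  # [100]
```

Running the same loop for 100 flushes cascades one level further and leaves a single 1000 MB segment. Each document gets rewritten once per tier it climbs through.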

However, a massive merge (e.g., merging two 5GB segments) can saturate your disk throughput and cause search latency to spike. In 2013, with SSDs still relatively expensive for large clusters, tuning index.merge.policy.max_merged_segment, the cap on how large a merged segment may grow, is the difference between a stable cluster and one that dies every afternoon.
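As a sketch of what a size cap does, the Python below checks whether a candidate merge would stay under a limit and builds a settings body for the tiered merge policy's index.merge.policy.max_merged_segment setting. The helper fits_under_cap and the 2gb value are illustrative assumptions; the real policy applies the cap inside its own candidate selection.

```python
import json

GB_IN_MB = 1024


def fits_under_cap(batch_mb, max_merged_segment_mb=5 * GB_IN_MB):
    """True if merging the batch yields a segment no larger than the cap."""
    return sum(batch_mb) <= max_merged_segment_mb


print(fits_under_cap([100] * 10))           # True: ten 100 MB segments -> 1000 MB
print(fits_under_cap([5 * GB_IN_MB] * 2))   # False: two 5 GB segments -> 10 GB

# Body for PUT /my_index/_settings; "2gb" is a hypothetical value, not a recommendation.
settings_body = {"index.merge.policy.max_merged_segment": "2gb"}
print(json.dumps(settings_body))
```

A lower cap leaves more segments on disk but keeps any single merge small enough that it cannot monopolize the disk for long.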
