HBase: Anatomy of a Region Split (2008)

#HBase #Hadoop #NoSQL #Java

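In HBase, a table's rows are partitioned by row key into contiguous, sorted ranges called "Regions," and each region is served by one RegionServer at a time. As you pour gigabytes of data into a table, a single region will eventually grow too large for one server to serve efficiently. This is when the magic of the Region Split happens.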

The Split Trigger

A split is triggered when a StoreFile in a region grows beyond hbase.hregion.max.filesize, which defaults to 256MB (268435456 bytes) in these early versions:

<!-- hbase-site.xml -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>268435456</value>
</property>
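The trigger condition can be sketched in Java as follows. This is a hypothetical illustration, not HBase's own split-check code: SplitCheckSketch and shouldSplit are invented names, and the rule that a region still holding reference files from an earlier split is not split again is included as a simplifying assumption.

```java
import java.util.List;

// Hypothetical sketch of the split trigger; SplitCheckSketch and shouldSplit are
// illustrative names, not HBase classes or methods.
public class SplitCheckSketch {
    // Mirrors hbase.hregion.max.filesize from the config above: 256MB.
    static final long MAX_FILESIZE = 268_435_456L;

    // Returns true when the largest store file in the region exceeds maxFileSize.
    // Assumption: a region still holding reference files from an earlier split
    // is not split again until compaction has removed them.
    static boolean shouldSplit(List<Long> storeFileSizes, boolean hasReferences, long maxFileSize) {
        if (hasReferences) {
            return false;
        }
        long largest = 0L;
        for (long size : storeFileSizes) {
            largest = Math.max(largest, size);
        }
        return largest > maxFileSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(shouldSplit(List.of(100 * mb, 120 * mb), false, MAX_FILESIZE)); // false
        System.out.println(shouldSplit(List.of(100 * mb, 300 * mb), false, MAX_FILESIZE)); // true
        System.out.println(shouldSplit(List.of(300 * mb), true, MAX_FILESIZE));            // false
    }
}
```

Checking the largest single file, rather than the region's total size, follows the StoreFile-based trigger described above.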

The Split Process

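Transaction Start: The RegionServer notifies the Master that the region is splitting. Later HBase versions do this by creating a split znode in ZooKeeper; 2008-era releases predate HBase's ZooKeeper integration and reported splits in the RegionServer's regular messages to the Master.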
Offline: The parent region is closed, which flushes its in-memory writes to disk, and taken offline. It stops accepting new requests until the daughters are online.
Daughter Creation: Two new daughter regions are created, each covering one half of the parent's row-key range. Instead of copying all the data (which would be slow), HBase creates "Reference files."
Reference Files: These are tiny files that record the split row key and which half (top or bottom) of an original parent HFile a daughter owns.

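// Conceptual logic for reading a reference file
if (isReference(path)) {                       // helper that recognizes reference file names
    Reference r = Reference.read(fs, path);    // tiny file: split key plus which half
    byte[] splitKey = r.getSplitKey();         // the split point is a row key, not a byte offset
    Reference.Range half = r.getFileRegion();  // top or bottom half of the parent HFile
    // Only read the parent HFile's rows on the 'half' side of splitKey
}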

Online: The daughter regions are opened and registered in the .META. table, and the parent's .META. entry is marked as offline and split.
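Compaction: Eventually, a compaction rewrites the referenced data into new HFiles owned outright by each daughter region. Until that happens a daughter cannot itself be split; once neither daughter holds references to the parent, the parent's old files are deleted.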
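The daughter-creation, reference-file and compaction steps above can be modeled with a toy in-memory version in Java. Everything here is hypothetical: a parent "HFile" is just a sorted TreeMap of row keys, the names ReferenceSplitSketch, Reference, Range, midKey and compact are illustrative (not the HBase classes of similar name), and the convention that the bottom half holds rows below the split key while the top half holds the split key and above is an assumption of the sketch.

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy, in-memory model of split-by-reference. All names are hypothetical;
// a parent "HFile" is just a sorted map of row key -> value.
public class ReferenceSplitSketch {
    // Assumed convention: BOTTOM holds rows below the split key, TOP holds the split key and above.
    enum Range { TOP, BOTTOM }

    // A "reference file": a pointer to the parent file, the split key and which half it covers.
    static final class Reference {
        final NavigableMap<String, String> parentFile;
        final String splitKey;
        final Range range;

        Reference(NavigableMap<String, String> parentFile, String splitKey, Range range) {
            this.parentFile = parentFile;
            this.splitKey = splitKey;
            this.range = range;
        }

        // Returns a live view of this daughter's half of the parent file; nothing is copied.
        SortedMap<String, String> read() {
            return range == Range.BOTTOM
                    ? parentFile.headMap(splitKey, false)
                    : parentFile.tailMap(splitKey, true);
        }
    }

    // Uses the middle row key of the parent file as the split point (simplification).
    static String midKey(NavigableMap<String, String> file) {
        int index = 0;
        int mid = file.size() / 2;
        for (String key : file.keySet()) {
            if (index++ == mid) {
                return key;
            }
        }
        throw new IllegalArgumentException("cannot split an empty file");
    }

    // "Compaction": materializes the referenced half as a new file owned by the daughter.
    static NavigableMap<String, String> compact(Reference ref) {
        return new TreeMap<>(ref.read());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> parent = new TreeMap<>();
        for (String row : new String[] {"row-a", "row-b", "row-c", "row-d"}) {
            parent.put(row, "value-of-" + row);
        }
        String splitKey = midKey(parent); // "row-c"

        // The split itself: two small reference objects, no rows copied.
        Reference bottom = new Reference(parent, splitKey, Range.BOTTOM);
        Reference top = new Reference(parent, splitKey, Range.TOP);
        System.out.println(bottom.read().keySet()); // [row-a, row-b]
        System.out.println(top.read().keySet());    // [row-c, row-d]

        // Later, compaction gives the top daughter its own copy of its rows.
        NavigableMap<String, String> topDaughterFile = compact(top);
        System.out.println(topDaughterFile.size()); // 2
    }
}
```

Creating the two Reference objects touches no rows; the copy happens only in compact, which mirrors the cost HBase defers until after the split.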

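This "constant-time" split is one reason HBase can scale to petabytes. The split itself typically takes seconds regardless of how much data is in the region, because writing reference files is a metadata operation; the expensive rewrite of the data is deferred to the later compaction.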
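To make the "constant-time" argument concrete, here is a toy cost model in Java. It is an illustration under stated assumptions, not a measurement: it assumes a split writes two reference files (one per daughter) for each parent store file and that compaction rewrites every row exactly once, and the class and method names are hypothetical.

```java
// Toy cost model of the split vs. the deferred compaction. The assumptions
// (two reference files per parent store file, every row rewritten once) and
// all names are illustrative, not HBase internals.
public class SplitCostSketch {
    // Split by reference: one reference per daughter for each parent store file,
    // independent of how many rows those files hold.
    static long referenceFilesWritten(int parentStoreFiles) {
        return 2L * parentStoreFiles;
    }

    // Compaction: each row of the parent lands in exactly one daughter and is rewritten once.
    static long rowsRewrittenByCompaction(long rowsInParent) {
        return rowsInParent;
    }

    public static void main(String[] args) {
        int parentStoreFiles = 3;
        for (long rows : new long[] {1_000L, 1_000_000_000L}) {
            System.out.printf("rows=%d: split writes %d reference files, compaction rewrites %d rows%n",
                    rows, referenceFilesWritten(parentStoreFiles), rowsRewrittenByCompaction(rows));
        }
    }
}
```

With three parent store files the split writes six reference files whether the region holds a thousand rows or a billion; only the deferred compaction grows with the amount of data.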

Aunimeda builds backend systems with optimized database architectures - PostgreSQL, Redis, ClickHouse, and more.
