Databases | November 3, 2008 | 2 min read | Updated: June 22, 2026

HBase: Anatomy of a Region Split (2008)

Aunimeda

In HBase, a table's rows are partitioned by row-key range into "Regions." As you pour gigabytes of data into a table, a single region eventually grows too large to be served efficiently by one RegionServer. That is when HBase performs a Region Split.
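Each region covers a contiguous range of row keys, from its start key (inclusive) up to the next region's start key (exclusive). As a minimal sketch, assuming a hypothetical in-memory map of start keys (real clients look regions up in the .META. table), mapping a row to its region is a "greatest start key not above the row" lookup:

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLookup {
    // Hypothetical region map: start key (inclusive) -> region name.
    // Strings stand in for HBase's byte[] keys; for ASCII keys the ordering matches.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // A row belongs to the region with the greatest start key <= row.
    public String locate(String row) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(row);
        if (entry == null) {
            throw new IllegalStateException("no region covers row " + row);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookup table = new RegionLookup();
        table.addRegion("", "region-A");   // covers ["", "g")
        table.addRegion("g", "region-B");  // covers ["g", "p")
        table.addRegion("p", "region-C");  // covers ["p", end of table)
        System.out.println(table.locate("apple")); // region-A
        System.out.println(table.locate("grape")); // region-B
        System.out.println(table.locate("zebra")); // region-C
    }
}
```

Because the first region starts at the empty key, every row key lands in exactly one region; a split simply replaces one entry with two.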

The Split Trigger

A split is triggered when a StoreFile in a region grows beyond hbase.hregion.max.filesize (256MB by default in these early versions).

<!-- hbase-site.xml -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>268435456</value>
</property>
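The value above is 256 × 1024 × 1024 = 268,435,456 bytes. The trigger rule can be sketched as follows; the class and method names are hypothetical, and the sketch only encodes the rule described above (compare the largest StoreFile against the threshold), not HBase's actual split-policy code.

```java
import java.util.List;

public class SplitCheck {
    // 256 MB, the value configured for hbase.hregion.max.filesize above.
    static final long MAX_FILESIZE = 256L * 1024 * 1024; // 268435456 bytes

    // Hypothetical check: split once the largest StoreFile exceeds the threshold.
    static boolean shouldSplit(List<Long> storeFileSizes, long maxFileSize) {
        long largest = 0;
        for (long size : storeFileSizes) {
            largest = Math.max(largest, size);
        }
        return largest > maxFileSize;
    }

    public static void main(String[] args) {
        System.out.println(MAX_FILESIZE); // 268435456
        System.out.println(shouldSplit(List.of(100_000_000L, 200_000_000L), MAX_FILESIZE)); // false
        System.out.println(shouldSplit(List.of(300_000_000L), MAX_FILESIZE)); // true
    }
}
```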

The Split Process

  1. Transaction Start: The RegionServer creates a split znode in ZooKeeper to notify the Master.
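  2. Offline: The parent region is closed, flushing its MemStore to disk, and taken offline. It stops serving requests.
  3. Daughter Creation: Two daughter regions are created, each covering half of the parent's key range, divided at a split key chosen near the middle of the parent's largest StoreFile. Instead of copying all the data (which would be slow), HBase creates "Reference files."
  4. Reference Files: These are tiny files that point into the original parent HFiles. Each one records the split key and whether it covers the top half (keys at or above the split key) or the bottom half (keys below it).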
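// Conceptual logic for reference file check
if (StoreFile.isReference(path)) {
    Reference r = Reference.read(fs, path);
    byte[] splitKey = r.getSplitKey();   // the row key where the parent was split (not a numeric offset)
    boolean topHalf = r.getFileRegion() == Reference.Range.top;
    // Only read the relevant half of the parent HFile:
    // keys >= splitKey for the top half, keys < splitKey for the bottom half
}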
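  5. Online: The daughter regions are opened and registered with the .META. table, so clients can locate them.
  6. Compaction: Eventually a "Major Compaction" runs on each daughter, rewriting the referenced half of the parent's data into new HFiles that the daughter owns. Once neither daughter references the parent's files anymore, the old parent region and its files are deleted.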
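The reference-file mechanism from steps 3 to 6 can be modelled in a few lines. This is a toy sketch, not HBase code: the parent "HFile" is an in-memory sorted list of row keys, `Reference` and `Range` are hypothetical stand-ins for HBase's own classes, and the split key is simply the middle entry of the list.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceSplitDemo {
    public enum Range { TOP, BOTTOM }

    // Hypothetical stand-in for a reference file: it stores a pointer to the
    // parent file, the split key and which half it covers, never the data itself.
    public static final class Reference {
        private final List<String> parentFile;
        private final String splitKey;
        private final Range range;

        public Reference(List<String> parentFile, String splitKey, Range range) {
            this.parentFile = parentFile;
            this.splitKey = splitKey;
            this.range = range;
        }

        // Reads through to the parent file, keeping only this reference's half:
        // TOP = keys >= splitKey, BOTTOM = keys < splitKey.
        public List<String> read() {
            List<String> out = new ArrayList<>();
            for (String key : parentFile) {
                boolean inTop = key.compareTo(splitKey) >= 0;
                if ((range == Range.TOP) == inTop) {
                    out.add(key);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Parent region's sorted store file (row keys only, for brevity).
        List<String> parent = List.of("a", "c", "e", "g", "k", "m");
        String splitKey = parent.get(parent.size() / 2); // middle key: "g"

        // Steps 3-4: each daughter gets a reference instead of a copy of the data.
        Reference bottom = new Reference(parent, splitKey, Range.BOTTOM);
        Reference top = new Reference(parent, splitKey, Range.TOP);
        System.out.println(bottom.read()); // [a, c, e]
        System.out.println(top.read());    // [g, k, m]

        // Step 6: compaction materializes each half as a file the daughter owns,
        // after which nothing references the parent file and it can be deleted.
        List<String> bottomFile = List.copyOf(bottom.read());
        List<String> topFile = List.copyOf(top.read());
        System.out.println(bottomFile.size() + topFile.size() == parent.size()); // true
    }
}
```

Creating the two `Reference` objects touches no data; only `read()` (or a compaction) walks the parent's keys, which mirrors why the split step itself is cheap.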

This near-constant-time split is one reason HBase can scale to petabytes. The split itself takes seconds regardless of how much data the region holds, because creating reference files is a metadata operation; the expensive rewrite of the data is deferred to compaction.
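A back-of-the-envelope comparison makes the point concrete. The sketch below uses hypothetical names and made-up file sizes; it contrasts the bytes a copy-based split would have to rewrite with the number of small reference files a reference-based split writes, assuming at most two references per parent StoreFile (one per daughter).

```java
import java.util.List;

public class SplitCostDemo {
    // Bytes a naive copy-based split would rewrite: every byte of every store file.
    static long bytesCopiedByNaiveSplit(List<Long> storeFileSizes) {
        long total = 0;
        for (long size : storeFileSizes) {
            total += size;
        }
        return total;
    }

    // Files a reference-based split writes: at most one reference per parent
    // store file per daughter, independent of how large those files are.
    static int referenceFilesWritten(List<Long> storeFileSizes) {
        return 2 * storeFileSizes.size();
    }

    public static void main(String[] args) {
        List<Long> small = List.of(10L << 20, 20L << 20);   // 10 MB + 20 MB
        List<Long> large = List.of(200L << 20, 300L << 20); // 200 MB + 300 MB
        for (List<Long> files : List.of(small, large)) {
            System.out.println(bytesCopiedByNaiveSplit(files) + " bytes copied vs "
                    + referenceFilesWritten(files) + " reference files");
        }
    }
}
```

The copied-bytes figure grows with the region's size, while the reference count depends only on how many store files the parent has.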



Read Also


Apache Storm: Spouts, Bolts, and Topologies (2012)

Hadoop is for batches. Storm is for streams. Let's build a real-time word count that doesn't melt your cluster.


MongoDB 1.6: Scaling Out with Sharding and Replica Sets (2010)

The NoSQL revolution is in full swing. With MongoDB 1.6, horizontal scaling and automated failover are finally production-ready. Let's configure a sharded cluster.


Redis: RDB vs. AOF Persistence (2009)

Redis is fast because it's in-memory, but what happens when the power goes out? Choosing between RDB and AOF is a classic trade-off.
