Analytics

Wednesday, May 8, 2013

B+ trees

As a short follow-up to my last post on B-trees, this post is about a variant that is more popular in practice, the B+ tree. There are two primary differences between the data structures:
  • internal nodes in B+ trees do not store data (only keys); and
  • leaf nodes in B+ trees form a linked list in the order of the keys.
The reasoning behind this is because, as discussed previously, the limiting factor in a looking up a key in an index is the depth of the tree. Each additional step that has to be taken requires an extra disk seek, so minimizing the depth is the most important factor in improving performance. By moving all of the data to the leaves, the internal nodes can have much higher fan-out, thus reducing the depth of the tree. Additionally, by adding the linked list property, it becomes much easier to traverse keys sequentially as is common when doing range queries or index scans in a database.

B-trees and B+ trees are excellent at using block-oriented secondary storage liked spinning hard drives, but the hardware landscape is changing. Next time, I'll discuss a new variant of the B-tree which is optimized for flash storage and solid-state drives.

No comments:

Post a Comment