set
, map
, multiset
and multimap
and follow their interfaces very closely. By packing multiple value pairs into each node of the tree the B+ tree reduces heap fragmentation and utilizes cache-line effects better than the standard red-black binary tree. The tree algorithms are based on the implementation in Cormen, Leiserson and Rivest's Introduction into Algorithms, Jan Jannink's paper and other algorithm resources. The classes contain extensive assertion and verification mechanisms to ensure the implementation's correctness by testing the tree invariants. To illustrate the B+ tree's structure a wxWidgets demo program is included in the source package.The include files are extensively documented using doxygen. The compiled doxygen html documentation is included in the source package. It can also be viewed online at http://idlebox.net/2007/stx-btree/stx-btree-0.8.3/doxygen-html/ (if you are not reading it right now).
The wxWidgets B+ tree demo program is located in the directory wxbtreedemo. Compiled binary versions can be found on the package web page mentioned above.
If bugs should become known they will be posted on the above web page together with patches or corrected versions.
The complete source code is released under the GNU Lesser General Public License v2.1 (LGPL) which can be found in the file COPYING.
In computer science lectures it is often stated that using consecutive bytes in memory would be more cache-efficient, because the CPU's cache levels always fetch larger blocks from main memory. So it would be best to store the keys of a node in one continuous array. This way the inner scanning loop would be accelerated by benefiting from cache effects and pipelining speed-ups. Thus the cost of scanning for a matching key would be lower than in a red-black tree, even though the number of key comparisons are theoretically larger. This second aspect aroused my academic interest and resulted in the speed test experiments.
A third inspiration was that no working C++ template implementation of a B+ tree could be found on the Internet. Now this one can be found.
The base class is then specialized into btree_set, btree_multiset, btree_map and btree_multimap using default template parameters and facade functions. These classes are designed to be drop-in replacements for the corresponding STL containers.
The insertion function splits the nodes on recursion unroll. Erase is largely based on Jannink's ideas. See http://dbpubs.stanford.edu:8090/pub/1995-19 for his paper on "Implementing Deletion in B+-trees".
The two set classes (btree_set and btree_multiset) are derived from the base implementation class btree by specifying an empty struct as data_type. All functions are adapted to provide the base class with empty placeholder objects. Note that it is somewhat inefficient to implement a set or multiset using a B+ tree: a plain B tree (without +) would hold no extra copies of the keys. The main focus was on implementing the maps.
However it also directly generates many problems in implementing the iterators' operators. These return a (writable) reference or pointer to a value_type, which is a std::pair composition. These data/key pairs however are not stored together and thus a temporary copy must be constructed. This copy should not be written to, because it is not stored back into the B+ tree. This effectively prohibits use of many STL algorithms which writing to the B+ tree's iterators. I would be grateful for hints on how to resolve this problem without folding the key and data arrays.
operator=
).operator*
and operator->
of the iterator. See above for a discussion of the problem on separated key/data arrays. Instead of *iter
and iter->
use the new function iter.data()
which returns a writable reference to the data value in the tree.
size_type erase(const key_type &key); // erase all data pairs matching key bool erase_one(const key_type &key); // erase one data pair matching key
The following STL-required functions are not supported:
void erase(iterator iter); void erase(iterator first, iterator last);
// Output the tree in a pseudo-hierarchical text dump to std::cout. This // function requires that BTREE_DEBUG is defined prior to including the btree // headers. Furthermore the key and data types must be std::ostream printable. void print() const; // Run extensive checks of the tree invariants. If a corruption in found the // program will abort via assert(). See below on enabling auto-verification. void verify() const; // Serialize and restore the B+ tree nodes and data into/from a binary image. // This requires that the key and data types are integral and contain no // outside pointers or references. void dump(std::ostream &os) const; bool restore(std::istream &is);
struct btree_default_map_traits { // If true, the tree will self verify it's invariants after each insert() // or erase(). The header must have been compiled with BTREE_DEBUG // defined. static const bool selfverify = false; // If true, the tree will print out debug information and a tree dump // during insert() or erase() operation. The header must have been // compiled with BTREE_DEBUG defined and key_type must be std::ostream // printable. static const bool debug = false; // Number of slots in each leaf of the tree. Estimated so that each node // has a size of about 256 bytes. static const int leafslots = MAX( 8, 256 / (sizeof(_Key) + sizeof(_Data)) ); // Number of slots in each inner node of the tree. Estimated so that each // node has a size of about 256 bytes. static const int innerslots = MAX( 8, 256 / (sizeof(_Key) + sizeof(void*)) ); };