Design Overview

In this section we give an initial overview of the structures present in Haura and how they relate to one another. If you want to understand how the database works in detail, it is best to first gain a general idea of how the individual parts interact with one another.

Haura Structure

An overview of the different layers defined in the betree architecture

Database, Dataset and Object store

At the top of the overview, see above, we can see the user-facing part of Haura. The Database provides the main entry point from which the rest of the modules are used. From an active Database we can then initiate a Dataset or Objectstore. Also notable from the graphic, the Database creates the AllocationHandler, which is subsequently used in the DataManagement layer. Because of this, the Database also contains the storage configuration, which is then used to initiate the StoragePool. The Objectstore is implemented as a wrapper around Dataset, where the keys are chunk IDs and the values are the chunk contents.
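To illustrate the wrapper idea, the sketch below shows one plausible way an object store can map chunks onto a key-value dataset: each key combines an object ID with a chunk index, and the value is the raw chunk content. The key layout and names here are assumptions for illustration, not the actual Haura encoding.

```rust
// Hypothetical chunk-key encoding (illustrative, not the Haura API):
// big-endian encoding keeps all chunks of one object adjacent and
// sorted by chunk index in the key space of the underlying dataset.
fn chunk_key(object_id: u64, chunk_index: u32) -> Vec<u8> {
    let mut key = Vec::with_capacity(12);
    key.extend_from_slice(&object_id.to_be_bytes());
    key.extend_from_slice(&chunk_index.to_be_bytes());
    key
}

fn main() {
    let k0 = chunk_key(7, 0);
    let k1 = chunk_key(7, 1);
    // Keys of the same object sort by chunk index.
    assert!(k0 < k1);
    assert_eq!(k0.len(), 12);
}
```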

B-epsilon-tree
Structure of a B-epsilon-tree: a root node (marked TreeLayer) above internal nodes, child buffers (CB) and leaves

The Dataset interacts mainly with the actual B-epsilon-tree, which receives messages from the Dataset through its root node. By default these messages implement insert, remove and upsert, although this can be exchanged if required. Only the MessageAction trait needs to be implemented on the chosen type to allow its use in the tree. An example of this can be seen in the MetaMessage of the Objectstore.
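To make the message idea concrete, here is a simplified sketch of such a mechanism: a message type that knows how to apply itself to an optional current value. The real betree MessageAction trait has a different, richer signature; trait and variant names here are illustrative.

```rust
// Simplified message-action sketch (illustrative, not the betree trait).
#[derive(Clone, Debug, PartialEq)]
enum Msg {
    Insert(Vec<u8>),
    Remove,
    Upsert(usize, Vec<u8>), // write `data` at byte `offset`
}

trait MessageAction {
    // Apply this message to the current value of a key, if any.
    fn apply(&self, current: Option<Vec<u8>>) -> Option<Vec<u8>>;
}

impl MessageAction for Msg {
    fn apply(&self, current: Option<Vec<u8>>) -> Option<Vec<u8>> {
        match self {
            Msg::Insert(v) => Some(v.clone()),
            Msg::Remove => None,
            Msg::Upsert(off, data) => {
                let mut v = current.unwrap_or_default();
                if v.len() < off + data.len() {
                    v.resize(off + data.len(), 0);
                }
                v[*off..off + data.len()].copy_from_slice(data);
                Some(v)
            }
        }
    }
}

fn main() {
    let v = Msg::Insert(vec![1, 2, 3]).apply(None);
    let v = Msg::Upsert(1, vec![9]).apply(v);
    assert_eq!(v, Some(vec![1, 9, 3]));
    assert_eq!(Msg::Remove.apply(v), None);
}
```

Because every mutation is expressed as such a self-applying message, the tree never needs to know the semantics of insert, remove or upsert itself, which is what makes the message set exchangeable.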

The figure above shows how this tree is structured. We differentiate between three kinds of nodes: Leaves shown in blue, Internal Nodes shown in red, and Child Buffers (abbr. "CB") shown in grey. The root node of the tree is highlighted with TreeLayer to indicate that it is the node providing the interface for Datasets to attach to. Messages enter the tree through the root node.

Once passed to the root, the message propagates down the tree until it reaches a leaf node, where it is applied. This might not happen instantaneously, though: along the way the message may momentarily be held in one of the buffers (ChildBuffers) attached to internal nodes. This avoids additional deep traversals and allows multiple messages to be flushed at once from one buffer node.
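A toy model of such a buffer is sketched below. For simplicity, a later message for the same key replaces an earlier one here, whereas the actual tree keeps or merges buffered messages; the point is only that a flush drains the whole buffer in one pass instead of traversing to a leaf per message.

```rust
use std::collections::BTreeMap;

// Toy child buffer (illustrative, not the betree ChildBuffer type).
struct ChildBuffer {
    msgs: BTreeMap<String, String>,
}

impl ChildBuffer {
    fn insert(&mut self, key: &str, msg: &str) {
        // Simplification: newer messages overwrite older ones per key.
        self.msgs.insert(key.into(), msg.into());
    }

    // Flushing hands every buffered message downward in one batch.
    fn flush(&mut self) -> Vec<(String, String)> {
        std::mem::take(&mut self.msgs).into_iter().collect()
    }
}

fn main() {
    let mut buf = ChildBuffer { msgs: BTreeMap::new() };
    buf.insert("a", "insert 1");
    buf.insert("b", "insert 2");
    buf.insert("a", "insert 3"); // supersedes the earlier message for "a"
    let flushed = buf.flush();
    assert_eq!(flushed.len(), 2);
    assert!(buf.msgs.is_empty());
}
```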

Vital to understanding the handling and movement of nodes and their content within Haura is the object state cycle, which is illustrated at the leaf nodes above and in more detail in the following figure.

State Diagram of the object lifecycle

Adjacent to the internals and construction of B-epsilon-trees are the commonalities between the trees existing in an open database. Hidden from the user, the root tree stores internal information concerning the created datasets (their DatasetIds and ObjectPointers) and Segment information. Segments have not been mentioned before, as they belong to the Data Management layer, but they can be thought of as containers organizing the allocation bitmap for a range of blocks. Additionally, to avoid conflicts with one another, all trees share the same Data Management Unit, ensuring that no irregular state is reached in handling critical on-disk management tasks such as allocating blocks and updating bitmaps.

The root tree holding Object Pointers to the trees of Datasets 1 to 3, all of which share the same Data Management Unit

Data Management

On-disk operations and storage allocation are handled by the Data Management layer. This layer also implements the copy-on-write semantics required for snapshots, realized through delayed deallocation and the accounting of a dead list of blocks.

The Handler manages the allocation bitmaps for all allocations and deallocations and is also responsible for tracking the number of blocks distributed (space accounting).

Bitmaps are used to keep track of the locations of allocated blocks, or rather of the free ranges of blocks. Wrapped in SegmentAllocators, they can be used to allocate block ranges at any position within a specific SegmentId or to request allocations at given offsets.

SegmentIds refer to 1 GiB ranges of blocks on a storage tier; each Id is unique across all storage tiers.
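The arithmetic behind this split can be sketched as follows. The 4 KiB block size is an assumption for illustration (check the source for the actual value); with it, a 1 GiB segment covers 262,144 blocks, and a global block offset decomposes into a segment ID plus an offset into that segment's bitmap.

```rust
// Illustrative segment arithmetic; BLOCK_SIZE is an assumed value.
const BLOCK_SIZE: u64 = 4096; // 4 KiB, assumption for illustration
const SEGMENT_SIZE: u64 = 1 << 30; // 1 GiB per segment
const BLOCKS_PER_SEGMENT: u64 = SEGMENT_SIZE / BLOCK_SIZE;

// Split a global block offset into (segment id, offset within segment).
fn segment_of(block: u64) -> (u64, u64) {
    (block / BLOCKS_PER_SEGMENT, block % BLOCKS_PER_SEGMENT)
}

fn main() {
    assert_eq!(BLOCKS_PER_SEGMENT, 262_144);
    // Block 262,145 is the second block of the second segment.
    assert_eq!(segment_of(262_145), (1, 1));
}
```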

Copy on Write

The Data Management layer is also responsible for copy-on-write preservation. This is handled by checking (via Generation) whether any snapshot of the dataset contains the affected node; if so, the dead_list contained in the root tree is updated to contain the storage location of the old version of the node on the next sync.
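The decision can be sketched roughly as below. All types and names here are illustrative, and the generation comparison is an assumption about the semantics: a snapshot taken at or after the generation in which the node was written still references that node, so its old location must be remembered on the dead list rather than reused immediately.

```rust
// Hedged sketch of the copy-on-write check (illustrative types only).
type Generation = u64;

struct Node {
    written_at: Generation, // generation in which this version was written
    offset: u64,            // storage location of this version
}

// Returns true if the old version must be preserved for a snapshot;
// in that case its location is recorded on the dead list so it can be
// handled on the next sync instead of being freed right away.
fn on_overwrite(node: &Node, snapshots: &[Generation], dead_list: &mut Vec<u64>) -> bool {
    let referenced = snapshots.iter().any(|&s| s >= node.written_at);
    if referenced {
        dead_list.push(node.offset);
    }
    referenced
}

fn main() {
    let mut dead_list = Vec::new();
    // A snapshot at generation 5 still sees a node written at generation 3.
    let old = Node { written_at: 3, offset: 42 };
    assert!(on_overwrite(&old, &[5], &mut dead_list));
    assert_eq!(dead_list, vec![42]);
    // A node written after the snapshot is not referenced by it.
    let newer = Node { written_at: 7, offset: 99 };
    assert!(!on_overwrite(&newer, &[5], &mut dead_list));
}
```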

Storage Pool

As the abstraction over specific hardware types and RAID configurations, the Data Management Unit interacts with the storage pool layer for all I/O operations. Notable here is the division of the layer into storage tiers, Vdevs and LeafVdevs. There are 4 storage tiers available (FASTEST, FAST, SLOW, SLOWEST), each with at maximum 1024 Vdevs. Each Vdev can be one of four variants. First, a singular LeafVdev; this is the equivalent of a disk or any other file-path-backed interface, for example a truncated file or a disk dev/.... Second, a RAID-1-like mirrored configuration. Third, a RAID-5-like striping and parity based setup with multiple disks. Fourth and last, a main-memory-backed buffer, simply held as a vector.
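The shape described above can be summarized in types roughly like the following. These definitions are an illustrative sketch, not the actual Haura declarations; the variant payloads in particular are placeholders.

```rust
// Illustrative sketch of the storage-pool layout (not Haura's types).
#[allow(dead_code)]
enum Vdev {
    Leaf(String),        // single disk- or file-backed device
    Mirror(Vec<String>), // RAID-1-like mirrored configuration
    Parity(Vec<String>), // RAID-5-like striping with parity
    Memory(Vec<u8>),     // main-memory-backed buffer held as a vector
}

struct Tier {
    vdevs: Vec<Vdev>, // at most 1024 entries per tier
}

struct StoragePool {
    // Index 0..4 correspond to FASTEST, FAST, SLOW, SLOWEST.
    tiers: [Tier; 4],
}

fn main() {
    let pool = StoragePool {
        tiers: [
            Tier { vdevs: vec![Vdev::Memory(Vec::new())] },
            Tier { vdevs: Vec::new() },
            Tier { vdevs: Vec::new() },
            Tier { vdevs: Vec::new() },
        ],
    };
    assert_eq!(pool.tiers.len(), 4);
    assert!(pool.tiers.iter().all(|t| t.vdevs.len() <= 1024));
}
```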

Implementation

This section should help you find the module you'll want to implement your changes in by giving you a brief description of each module together with its location and references in the layer design.

Name             Description
cache            Clock cache implementation used internally
compression      Compression logic for indication and usage of compression algorithms (zstd only at the moment)
data_management  Allocation and copy-on-write logic for the underlying storage space
database         The Database layer & Dataset implementation with snapshots
metrics          Basic metric collection
object           The object store wrapper around the dataset store
storage_pool     The storage pool layer which manages the different vdevs
tree             The actual B-epsilon-tree
vdev             Implements the use of different devices for storage (block, file, memory) with different modes (parity, RAID, single)

Note that traits are heavily used to allow interaction between objects of different modules; traits implemented by one module might be located in multiple other modules.