products
Product Architecture
Citrusleaf 2.0 uses the marriage of distributed system and traditional database technologies to ensure exceptional performance, linear scalability and high availability. The Citrusleaf 2.0 product architecture is comprised of three main layers – the Client Layer, the Distribution Layer, and the Data Layer. The Client Layer contains the client libraries. The Distribution Layer is responsible for the cluster communication and cluster management operations. The Data Storage layer delivers data storage capabilities that are optimized to work with the latest in storage technology and storage systems, including DRAM, rotational disk and Flash. Citrusleaf 2.0 is written in C, runs on Linux and can be ported to other server operating systems.
Client Layer
The Citrusleaf Client libraries make up the Client Layer. The Citrusleaf Client is a ‘smart client’ – in addition to implementing the APIs exposed to the transaction, it also tracks cluster configuration and manages the transaction requests, making any change in cluster membership completely transparent to the Application.
Citrusleaf’s Client Layer manages transactions, finds the optimal server for each request – even when confronted with cluster re-configuration - and retrys failures. This allows application developers to focus on key tasks of the application rather than database administration. The Client also implements its own TCP/IP connection pool for transaction efficiency. The Client Layer itself consists only of a linkable library (the ‘Client’) that talks directly to the cluster. Citrusleaf provides full source code to the Clients, as well as documentation on the wire protocol used between the Client and servers. Clients are available in many languages, including C, C#, Java, Ruby, PHP and Python.
The Client Layer has three main functions; providing an API interface for the Application, tracking cluster configuration, and managing transactions between the Application and the Cluster.
Providing an API interface for the Application
The Citrusleaf Client library provides an API interface for database transactions. The APIs also provide several parameters that allow application developers to modify the operation of transaction requests.
Tracking cluster configuration
The Client Layer also tracks the current configuration of the server cluster. The Client communicates periodically with the cluster to maintain an internal list of server nodes. Any changes to the cluster size or configuration are automatically tracked by the Client, and are entirely transparent to the Application.
Managing transactions between the Application and the Cluster
When a query comes in from the application, the Client quickly formats the query into a binary format understandable by the servers. The Client then uses its knowledge of the cluster configuration to determine which server is most likely to contain the requested data and parcels out the request appropriately.
Distribution Layer
In the Distribution Layer, server cluster nodes communicate among themselves to ensure data consistency and data replication across the cluster. The Distribution Layer also ensures that the Citrusleaf cluster remains operational when individual server nodes are removed from or added to the cluster. No server is the master of any other; all participate equally in the cluster.
There are three major modules within the Distribution Layer – the Cluster Management Module, the Data Migration Module, and the Transaction Management Module.
Cluster Management Module
The Cluster Management Module is responsible for keeping track of what nodes are currently in the cluster. The key algorithm is the consensus voting process which determines which nodes are considered part of the cluster, and ensures that all nodes in the cluster maintain a consistent view of the system.
Data Migration Module
The Data Migration Module is responsible for balancing the distribution of data across the cluster nodes, and for ensuring that each piece of data is duplicated across nodes as specified by the system’s configured Replication Factor.
Transaction Processing Module
The Transaction Processing Module manages transaction requests from the Citrusleaf Client. This module has several responsibilities, including fulfilling the request and handing the result back to the client, proxying the request to another node, and resolving conflicts between different copies of the requested data that may occur when the cluster is recovering from being partitioned.
Data Layer
The Data Layer is implemented on each Citrusleaf server node. The Data Layer handles management of stored data on disk, and maintains indices corresponding to the data in the node.
Data Model
At the highest level, data is collected into policy containers called ‘namespaces’, semantically similar to ‘databases’ in an RDBMS system. Namespaces are configured when the cluster is started, and are used to control retention and reliability requirements for a given set of data. One of the most important system configuration policies is the Replication Factor, which controls the number of stored copies of every piece of data. Within a namespace the data is subdivided into ‘sets’ (similar to ‘tables’) and ‘records’ (similar to ‘rows’). Each record has an indexed ‘key’ that is unique in the set, and one or more named ‘bins’ (similar to columns) that hold values associated with the record. Values in the bins are strongly typed, and can include strings, integers, and binary data, as well as language-specific binary blobs that are automatically serialized and de-serialized by the system. The Citrusleaf system is entirely schema-less, sets and bins do not need to be defined up front, but can be added during run-time if additional flexibility is needed.
Data Storage
Citrusleaf can store data in DRAM, traditional rotational media, and SSDs. Significant work has been done to optimize data storage on SSDs, including bypassing the file system to take advantage of low-level SSD read and write patterns.
Citrusleaf’s data storage methodology is optimized for fast transactions. Indices (via the primary Key) are stored in DRAM for instant availability, and data writes to disk are performed in large blocks to minimize latencies that occur on both traditional rotational disk and SSD media. The contents of each namespace are spread evenly across every node in the cluster, and duplicated as specified by the namespace’s configured Replication Factor. For optimal efficiency, the location of any piece of data in the system can be determined algorithmically, without the need for a stored lookup table.
Two additional processes – the Defragmenter and the Evictor – work together to ensure that there is space both in DRAM and disk to write new data. The Defragmenter tracks the number of active records on each block on disk, and reclaims blocks that fall below a minimum level of use. The Evictor is responsible for removing references to expired records, and for reclaiming memory if the system gets beyond a set high water mark.
