Elliptics network is a fault tolerant distributed hash table object storage.
The network does not use dedicated servers to maintain the metadata information, it supports redundant objects storage and implements transactional data update. Small to medium write benchmarks can be found (its the latest to date, other presented earlier) in the appropriate blog section.
Distributed hash table design allows not to use dedicated metadata servers which frequently become points of failure in the classical storage designs, instead user can connect to any server in the network and all requests will be forwarded to the needed nodes, one can also lookup the needed server and connect there directly. It can really be called a cloud of losely connected equivalent nodes. Joining node will automatically connect to the needed servers according to the network topology, it can store data in different configurable backends like BerkeleyDB, Tokio Cabinet IO storage backend, file IO storage or using own IO storage backend.
Protocol allows to implement own data storage using specific features for the deploying project and generally extend data communication with infinite number of the extensions. One of the implemented examples is remote command execution, which can be used as a load balancing job manager.
The network uses ring addressing structure, where each node stores objects which belong to the ID range between given and the next node. When node joins it splits ID range of the neighbour node and copies (if needed) more recent objects from the neighbour.
Network implements transactional approach, which are processed in parallel and asynchronously by client and server. Moreover there is a flexible data storage policies like spreading object over the network, create single object on the single node, store separate transactions and so on. It is possible to manually recover the object (or its state anywhere in the past) by fetching an applying all update transactions.
Configurable number of IO threads and depth of the parallel transaction execution allows to implement fine-grained IO tuning for the different server hardware (and software if it is expected to share the server with other projects).
Each object in the network is indexed by its ID, which can be generated based on the transaction content, object name (like absolute path in the path based filesystems) or anything else. The most common approach for the write transaction to generate ID is to hash its content (with or without some key). Functions which generate ID from the data are called transformation functions, and while they may be simple hash, it is also possible to solve the problem of the geographical relation of the objects. For example we can assign nodes in the same datacenter (country) to have the same highest byte in the ID, and use the transformation function, which will take that into account, for example have 3 transformation functions, each one would hash the data and substitute its highest byte with the appropriate number of the datacenter (country). In this case every write transaction will be stored separately in 3 different datacenters, and when some of them are disconnected, application can remove corresponding transformation functions.
Thus when writing data with 3 different transformation functions data will be placed to 3 different nodes, while reading will try the first transformation, and if reading (or transformation) fails, system will try the next one and so on until either there are no more transformation functions provided or reading request succeeded.
This idea is presented in details in virtual datacenters feature description. It is also described how one can implement geographical linking and guaranteed per-datacenter object copies split.
Linear scalability of the communication channel breaks the limits of the horizontal server extensions which if used with the replication facilities allows to implement any level of the data redundancy.
It is possible to connect to the network from the NATed box as long as there is at least one accessible server, which then can forward IO requests to the other nodes.
Short feature list includes:
- distributed hash tables, no metadata servers, horizontal scaling
- data replication
- transactions and parallel IO
- different IO storage backends, modular architecture which allows to easily implement own transaction storage
- automatic data repartitioning in case of removed or added nodes
- ring addressing structure, ability to implement own transaction ID generation model
- support for NATed connections
- remote node statistics gathering module (usage of the disk, cpu, ram and so on)
- IO update notifications support for any object in the network
- advanced merge strategies
- automatic tests (IO, joining, merge, deletion)
- HTTP frontend (fastcgi application, benchmark)
The project originally started as POHMELFS distributed server backend, but then evolved into the own project, which will be used as backed for the POSIX accessible filesystem and providing own API for the developers.
The elliptics network development can be tracked in its taxonomy section and via GIT tree (the latest snapshot). Sources of the library with the examples are available in archive.
One can read usage example, which shows practical side of the ellitpics network in a nutshell (example/EXAMPLE in the source tree) or check documentation.
TODO list includes:
Recent comments
1 day 4 hours ago
1 day 5 hours ago
1 week 2 days ago
1 week 2 days ago
1 week 4 days ago
1 week 4 days ago
1 week 4 days ago
2 weeks 1 day ago
2 weeks 1 day ago
2 weeks 3 days ago