Storage and beyond

The Elliptics Network

The Elliptics network is a fault-tolerant distributed key/value storage.
With the default key generation policy it implements a hash table object storage.
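The idea behind the default policy can be sketched as hashing an object's name into a fixed-size identifier that determines where the object lives. This is an illustrative sketch only; the sha512-based transform and the helper name below are assumptions, not the exact Elliptics implementation:

```python
import hashlib

def make_object_id(name: str) -> bytes:
    """Map an object name to a fixed-size key, the way a hash table
    object storage would under a default hashing policy (sketch)."""
    return hashlib.sha512(name.encode("utf-8")).digest()

key = make_object_id("example-object")
print(len(key))  # 64-byte identifier
```

Because the identifier depends only on the name, any node can compute it independently, with no metadata server involved.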

The network does not use dedicated servers to maintain metadata, and it supports redundant object storage. Small- to medium-sized write benchmarks can be found on the eblob page.

The distributed hash table design avoids dedicated metadata servers, which are frequent points of failure in classical storage systems. Instead, a client can connect to any server in the network, and all requests will be forwarded to the appropriate nodes.
One can also look up the server responsible for a key and connect to it directly to fetch the data.
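A minimal sketch of how such a lookup can work, assuming a simple consistent-hash ring; the node names, ports, and class are hypothetical illustrations, not the Elliptics API or wire protocol:

```python
import bisect
import hashlib

def ring_position(value: bytes) -> int:
    # Place a value on the ring using the first 8 bytes of its hash.
    return int.from_bytes(hashlib.sha512(value).digest()[:8], "big")

class Ring:
    """Illustrative consistent-hash ring: a key is served by the first
    node at or after the key's position, wrapping around the ring."""
    def __init__(self, nodes):
        self._ring = sorted((ring_position(n.encode()), n) for n in nodes)
        self._points = [pos for pos, _ in self._ring]

    def node_for(self, key: bytes) -> str:
        idx = bisect.bisect_left(self._points, ring_position(key))
        return self._ring[idx % len(self._ring)][1]

ring = Ring(["node-a:1025", "node-b:1025", "node-c:1025"])
owner = ring.node_for(b"some-object")
```

Since every node sees the same topology, any of them can compute `owner` and forward the request, which is why no metadata server is required.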

Elliptics can be described as a cloud of loosely connected, equivalent nodes. A joining node automatically connects to the needed servers according to the network topology. Data can be stored in different configurable backends, such as the file IO storage or the eblob backend, and one can create a custom IO storage backend.

The protocol allows implementing custom data storage with features specific to the deploying project, and data communication can be extended with an unlimited number of extensions. One implemented example is remote command execution, which can be used as a load-balancing job manager.

One can get the 2010 Linux Kongress presentation here and the whitepaper here.
We started a community support site to gather feedback and information.

Linear scalability of the communication channel removes the limits of horizontal server scaling, which, combined with the replication facilities, allows implementing any level of data redundancy.

It is possible to connect to the network from behind NAT as long as there is at least one accessible server, which can then forward IO requests to the other nodes.

Short feature list includes:

  • distributed hash tables, no metadata servers, horizontal scaling
  • data replication
  • column data storage (eblob only)
  • range requests
  • different IO storage backends; modular architecture that makes it easy to implement a custom transaction storage
  • automatic data repartitioning when nodes are added or removed
  • ring addressing structure and the ability to implement custom key generation models
  • support for NATed connections
  • cluster statistics gathering
  • IO notifications support for any object in the network
  • automatic configuration
  • HTTP frontend (fastcgi application, benchmark), C/C++ and Python bindings
  • Google’s Snappy compression support (eblob)
  • server-side scripting extension support
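To illustrate how data replication can work without metadata servers, the sketch below picks one node per replica group using rendezvous hashing. The group layout and node names are hypothetical, and Elliptics' actual placement uses its ring addressing structure rather than this exact scheme:

```python
import hashlib

def _score(key: bytes, node: str) -> int:
    # Rendezvous hashing: score each (key, node) pair deterministically.
    return int.from_bytes(hashlib.sha512(key + node.encode()).digest()[:8], "big")

def place_replicas(key: bytes, groups):
    """One replica per group: within each group, the node with the
    highest score for this key stores the copy (illustrative sketch)."""
    return {gid: max(nodes, key=lambda n: _score(key, n))
            for gid, nodes in groups.items()}

groups = {1: ["a:1025", "b:1025"], 2: ["c:1025", "d:1025"]}
placement = place_replicas(b"object", groups)
```

Every client computes the same placement independently, so a write can be sent to all replicas without consulting any central directory.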

The project originally started as the distributed server backend for POHMELFS, but then evolved into its own project, which will be used as the backend for a POSIX-accessible filesystem while providing its own API for developers.

Sources, configs and examples:
Discussion group:!forum/reverbrain