In the meantime, the elliptics network went into production

We launched the first production elliptics network distributed hash table cloud last week. The configuration is rather simple: there are 3 virtual datacenters, each containing two physical machines with several TB of space each.

I do not know the precise number of HTTP proxies (fastcgi frontends for the elliptics network) installed, but we talked about one or two of them in each datacenter. Each proxy has 5 uploading and 50 downloading processes. The uploading ones are hidden behind a firewall. The downloading proxies are configured to read small objects (like XML and image files) through themselves, while big objects (data files) are downloaded directly from the storage nodes: the proxy only generates XML output with a direct URL to some storage node.
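The split between small and big objects on the downloading side can be sketched roughly as follows. This is only an illustration in Python, not the actual fastcgi frontend: the suffix list, the XML layout, the URL scheme and the `read_object` / `pick_storage_node` callbacks are all assumptions made for the example.

```python
from xml.sax.saxutils import escape

# Assumption: "small" objects are recognised by file suffix.
SMALL_OBJECT_SUFFIXES = (".xml", ".png", ".jpg", ".gif")


def handle_download(name, read_object, pick_storage_node):
    """Decide how a downloading proxy serves one object.

    read_object(name) -> bytes and pick_storage_node(name) -> host are
    hypothetical callbacks standing in for the storage client.
    Returns ("data", bytes) for small objects served through the proxy,
    or ("redirect", xml_text) with a direct storage URL for big ones.
    """
    if name.lower().endswith(SMALL_OBJECT_SUFFIXES):
        # Small object: the proxy reads it and returns the content itself.
        return "data", read_object(name)

    # Big object: the proxy only tells the client where to fetch it.
    node = pick_storage_node(name)
    url = "http://%s/%s" % (node, name)
    return "redirect", "<download><url>%s</url></download>" % escape(url)
```

With this shape the proxy never touches the bytes of a big data file, so its 50 downloading processes stay cheap and the storage nodes carry the heavy traffic.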

This pilot project will host only about 2-4 TB of data. Each object will have 3 copies, each one stored in the appropriate datacenter. We will add geographically spread datacenters soon, which will host only the most popular content, so that clients local to those storage nodes do not have to go to the main datacenter.

Objects are not supposed to be updated during their lifetime: they are uploaded once and removed when the time comes. Data synchronization for failed and new nodes will be done by the fsck application. It does not currently support the advanced merge algorithms we have in the elliptics network; it only checks the number of copies and optionally downloads/uploads objects if something is missing. The fsck application uses a log file to get information about which objects should be checked and how, since the storage does not currently store this metadata with the objects themselves.
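To make the copy check concrete, here is a minimal sketch of that kind of pass, written in Python. It is not the real fsck code: the log format (one object id per line), the in-memory dictionaries standing in for datacenters, and the function names are assumptions for illustration only.

```python
def fsck_object(obj_id, groups, expected_copies=3):
    """Check one object and re-upload it where a copy is missing.

    groups maps a datacenter name to a dict {object id: bytes}; the dicts
    are a stand-in for real storage nodes (assumption).
    """
    holders = [name for name, store in groups.items() if obj_id in store]
    if not holders:
        return "lost"        # no copy left anywhere, nothing to restore from
    if len(holders) >= expected_copies:
        return "ok"

    # Download from any datacenter that still has the object...
    data = groups[holders[0]][obj_id]
    # ...and upload to datacenters that miss it, up to the expected count.
    missing = [name for name in groups if name not in holders]
    for name in missing[: expected_copies - len(holders)]:
        groups[name][obj_id] = data
    return "repaired"


def fsck_from_log(log_lines, groups, expected_copies=3):
    """Run the check for every object id listed in the log."""
    report = {}
    for line in log_lines:
        obj_id = line.strip()
        if obj_id:           # skip empty lines
            report[obj_id] = fsck_object(obj_id, groups, expected_copies)
    return report
```

Since objects are never updated after upload, any surviving copy is a valid source, which is why no merge step is needed here.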

Actually, the background fsck daemon discussed previously will not be very different from this application. Instead, I suppose that a script which parses an object's metadata and invokes the application is a good approach. The fsck checker does not yet work with all possible types of uploaded objects: it has not yet been tested with transaction logs, only with the objects themselves, since that is what we use in the pilot project.

The more I think about transaction merge algorithms, the more I like the idea of merging them using timestamps only. Currently the elliptics network has 5 merge algorithms, which may or may not be appropriate for a given setup. A merge algorithm is invoked when a node joins the storage, to sync its content with what is already in the storage. The idea is to drop this join-time syncing and use a postponed fsck, which will sync the data sometime in the future. Thus having multiple object copies is a prerequisite for such a system, since it can take a while to sync a joining node.
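A timestamp-only merge of transaction logs could look roughly like the sketch below. It is written in Python for brevity and is not one of the existing merge algorithms: the record layout `(timestamp, transaction_id, payload)` and the assumption that each log is already sorted by timestamp are mine.

```python
import heapq


def merge_by_timestamp(*logs):
    """Merge transaction logs from several copies into one history.

    Each log is a list of (timestamp, transaction_id, payload) tuples,
    already sorted by timestamp (assumption). A transaction present in
    more than one log is kept only once.
    """
    merged = []
    seen = set()
    # Order by timestamp first, transaction id second, so payloads are
    # never compared with each other.
    ordered = heapq.merge(*logs, key=lambda rec: (rec[0], rec[1]))
    for timestamp, tid, payload in ordered:
        if tid in seen:
            continue
        seen.add(tid)
        merged.append((timestamp, tid, payload))
    return merged
```

The appeal is that the rule has no per-setup tuning: whatever copy a node holds, replaying the merged list in order gives the same result everywhere, as long as the timestamps can be trusted.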

I understand that it is not possible to keep clocks on multiple machines in sync well enough to get correct timestamps for each update transaction, but this should not be a problem if proper locking is used. Although there is no appropriate distributed locking system yet, it will be implemented anyway (the plan is to write a Paxos locking daemon).

The fsck application will be extended to allow external libraries to merge data. Namely, I have requests to allow external entities to merge data based on its content rather than on the meta information I might store, so fsck will just call external functions loaded from a provided shared library, if one is configured.
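The shape of such a hook can be sketched like this. The real thing would be a function loaded from a shared library; here a plain Python callable stands in for it, and the names `merge_copies` and `content_merge` as well as the `(timestamp, data)` layout are hypothetical.

```python
def merge_copies(copies, content_merge=None):
    """Pick or build the winning version out of diverged copies.

    copies is a list of (timestamp, data) pairs (assumed layout).
    content_merge is an optional external function that receives only the
    data of every copy and returns the merged result; it stands in for a
    function loaded from a configured shared library.
    """
    if content_merge is not None:
        # External merge: decision is based on content, not on metadata.
        return content_merge([data for _, data in copies])
    # Default: fall back to the newest copy by timestamp.
    return max(copies, key=lambda copy: copy[0])[1]
```

For example, passing `content_merge=lambda datas: max(datas, key=len)` would make fsck keep the longest copy regardless of when it was written.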

I have not made a new release yet; we will figure out possible bugs and complete testing of the fsck application first. The next version will not have syncing at join time, so this will be a major update.

Stay tuned!