Towards the new IO model

In the meantime, the <a href="/projects/elliptics">elliptics network</a> gained full scatter/gather IO support as the default, automatic way to store and read data.

The write path did not change noticeably: I just dropped several IO flags and made scatter write the default mode. Every write is now stored in the network as a separate transaction, and the history of the main object is optionally updated. Everything is done asynchronously; only the high-level API function sleeps, waiting for all nodes to reply with an acknowledgement.

Reading changed dramatically. It now has two stages: fetch the history of the requested object, then read the transactions that cover the requested data range. This happens only in the high-level API function, so the history read can be synchronous. All data-reading transactions are requested in parallel from multiple nodes. An application can do the same itself if needed, since all helpers are exported.
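
The two stages can be sketched roughly as follows. This is only an illustration in Python, not the elliptics API: the history layout (a list of `(offset, size, transaction_id)` entries, oldest first), the `fetch` callback and the function names are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed history entry: (offset, size, transaction_id), oldest first.

def transactions_for_range(history, offset, size):
    """Stage 1 result: history entries overlapping [offset, offset + size)."""
    end = offset + size
    return [e for e in history if e[0] < end and e[0] + e[1] > offset]

def read_range(history, fetch, offset, size):
    """Stage 2: fetch the needed transactions in parallel and assemble them.

    fetch(transaction_id) is a hypothetical callback returning the bytes
    stored by that transaction.
    """
    needed = transactions_for_range(history, offset, size)
    with ThreadPoolExecutor(max_workers=8) as pool:
        payloads = list(pool.map(lambda e: fetch(e[2]), needed))
    buf = bytearray(size)  # ranges never written stay zero-filled
    # Apply in history order so later writes overwrite earlier ones.
    for (t_off, t_size, _), data in zip(needed, payloads):
        start = max(t_off, offset)
        stop = min(t_off + t_size, offset + size)
        buf[start - offset:stop - offset] = data[start - t_off:stop - t_off]
    return bytes(buf)
```

Here the history is fetched synchronously by the caller, while the data transactions are requested concurrently, which mirrors the split described above.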

So the elliptics network now effectively reads in parallel by default and takes this problem off application writers' hands. The low-level API functions did not change and still provide asynchronous access with completion callback invocation, as before.

I also plan to simplify reading even further for the IO storage backends: a backend will no longer have to split the data it reads into multiple transaction replies when a chunk is large (currently each IO storage backend splits a record into 10Mb blocks). Instead, each write will scatter the provided data into multiple smaller writes, made in parallel and spread over the network. This mode can be turned off with a special flag if needed, in which case each write, even a huge one, is sent as a single transaction to a single node.
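
A minimal sketch of what scattering a write could look like, in Python. The function name, the `chunk_size` parameter and the `single` flag are hypothetical stand-ins for the planned behaviour and the special flag; the real chunk size and flag name are not decided here.

```python
def scatter(data, base_offset, chunk_size, single=False):
    """Split one write into (offset, chunk) pairs.

    With single=True (a stand-in for the planned special flag) the data
    is kept as one transaction, however large it is.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if single:
        return [(base_offset, data)]
    return [(base_offset + i, data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
```

Each returned pair would then be sent as its own write transaction, so a later read never receives a reply larger than one chunk.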

The <a href="/projects/elliptics">elliptics network</a> is becoming quite a large project, so I plan to implement automatic testing scripts that run the same small set of basic operations with every option that needs checking. Namely, they will read and write data in multiple chunks at different offsets and check the joining protocol. Right now I have to check the md5 sums of the files manually after every change to the 3 storage backends or the IO functions, which is a rather inconvenient and time-consuming task. The only thing the testing tool will do is run several operations and report the result: no tuning, no tweaks, no command line options. Simple and trivial, but automatic.
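
The chunked read/write part of such a check could look roughly like the sketch below. It is written in Python against a hypothetical backend interface (`write(offset, chunk)` and `read(offset, size)`); `MemoryBackend` is only a stand-in so the example runs on its own, and the joining protocol check is not covered.

```python
import hashlib

class MemoryBackend:
    """Stand-in backend; a real run would wrap a storage backend instead."""

    def __init__(self):
        self.data = bytearray()

    def write(self, offset, chunk):
        end = offset + len(chunk)
        if len(self.data) < end:
            # Grow with zero bytes so out-of-order writes land correctly.
            self.data.extend(bytes(end - len(self.data)))
        self.data[offset:end] = chunk

    def read(self, offset, size):
        return bytes(self.data[offset:offset + size])

def check_backend(backend, payload, chunk_size):
    """Write payload in chunks (last chunk first), read back, compare md5."""
    offsets = list(range(0, len(payload), chunk_size))
    for off in reversed(offsets):
        backend.write(off, payload[off:off + chunk_size])
    got = b"".join(backend.read(off, chunk_size) for off in offsets)
    return hashlib.md5(got).hexdigest() == hashlib.md5(payload).hexdigest()
```

Writing the chunks in reverse order exercises writes at non-sequential offsets, and the function returns a single pass/fail result with nothing to tune.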

For the time being, the results are available in the <a href="/cgi-bin/gitweb.cgi?p=elliptics.git;a=summary">git</a> tree.

Hello. It's a very interesting project. I've read all the "Documentation and Architecture"-style posts but couldn't find anything on dynamic reconfiguration, automatic recovery after node corruption, or any method for extracting data in case of unrecoverable cloud disorganization.
I see you are focused on polishing normal Elliptics network activity, but things seldom behave the way they should, so crashes, outages and so on are absolutely inevitable. Any ideas for improving recovery, self-healing, etc.?

It is indeed a very useful feature that is not yet present in the project, but it should not be hard to add.

The plan is to add a node disconnect hook, so that neighbours detect that a node went offline and act on it. Namely, they can send the objects from the data range maintained by the failed node that happen to be present on the alive servers to the node next to the failed one, which now has to maintain the failed node's range.
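
As a rough illustration of what such a hook might compute, here is a Python sketch. Everything in it is an assumption made for the example: integer node and object IDs, the ownership rule (an object belongs to the first node ID greater than or equal to its own ID, wrapping around), and the `copies` map recording which nodes hold each object.

```python
import bisect

def owner(node_ids, key):
    """Assumed rule: the owner is the first node ID >= key, wrapping around."""
    ids = sorted(node_ids)
    return ids[bisect.bisect_left(ids, key) % len(ids)]

def plan_recovery(node_ids, failed, copies):
    """Return (source, target, key) transfers after `failed` goes offline.

    copies maps an object key to the set of nodes known to hold it.
    """
    alive = [n for n in node_ids if n != failed]
    if not alive:
        return []
    plan = []
    for key, holders in sorted(copies.items()):
        if owner(node_ids, key) != failed:
            continue  # object was not in the failed node's range
        target = owner(alive, key)  # node that inherits the range
        if target in holders:
            continue  # the new owner already has a copy
        sources = sorted(h for h in holders if h != failed)
        if sources:  # objects with no surviving copy cannot be recovered
            plan.append((sources[0], target, key))
    return plan
```

Objects whose only copy lived on the failed node do not appear in the plan, which is the case the second approach is meant to address.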

Another approach is to maintain a 'replication counter' for each object and re-replicate the objects that were present on the failed node.

Thanks for the job!