Eblob and large volumes of data
First objection: since eblob stores the whole index in RAM, it could eat all the memory, leaving no room for IO to make progress.
While this sounds plausible, practice says quite the opposite. To store 40 million keys we only need about 6 GB of RAM (and this number can be reduced). We usually keep several times more keys than that on a single machine. Such amounts of RAM are common in commodity servers, not even speaking of high-end machines. System bootstrap takes several minutes, which may be a limitation, but again, that is for 40 million objects loaded into RAM.
Having the index in RAM guarantees fast IO: we have to seek at most once per object. If we put the index on disk, IO cannot be served that fast. For some use cases this guarantee is a hard requirement, while for others we do not need the whole index in RAM.
We have plans to move the index from pure RAM (anonymous memory) into a memory-mapped file. This would solve both problems, but to date we have not yet come close to this limitation, so it has not been implemented.
Another issue people mention is ‘atomic’ writes: since eblob is an append-only storage, the whole data object supposedly has to be ready before it can be written into eblob. This is not true – eblob supports a prepare/commit API. Prepare reserves the requested space in the blob and returns an offset which can be used to write the object in multiple portions; commit will, well, commit your object into the index.
This API is not exported into elliptics yet, since we, well, do not yet work with such large objects. But we have plans to start hosting (or at least testing) multi-gigabyte files this year, so we will extend the elliptics command set to support prepare/commit writes.