As mentioned in the comments, eblob has some possible limitations.
First, since eblob stores the whole index in RAM, the index could take all the memory, leaving no room for IO to make progress.
While this sounds plausible, practice says quite the opposite. Storing 40 million keys takes only about 6 GB of RAM (and we can decrease this number). Usually we have several times more keys on a single machine. Such numbers are quite common for commodity servers, not to mention high-end machines. The system bootstraps in several minutes, which may be a limitation, but again, that is 40 million objects to load into RAM.
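As a rough sanity check of the figures above, the per-key cost can be worked out directly. This is only back-of-the-envelope arithmetic: it assumes "6 GB" means 6 GiB, and the 200 million key count is a hypothetical stand-in for "several times more keys", not a number from a real deployment.

```python
GIB = 1024 ** 3


def bytes_per_key(ram_bytes, key_count):
    """Average index memory consumed by one key."""
    return ram_bytes / key_count


def ram_needed(key_count, per_key_bytes):
    """Index memory needed for key_count keys at a given per-key cost."""
    return key_count * per_key_bytes


# Figures from the text: about 6 GB of RAM for 40 million keys.
per_key = bytes_per_key(6 * GIB, 40_000_000)
print(f"{per_key:.0f} bytes per key")

# Hypothetical machine holding five times as many keys.
print(f"{ram_needed(200_000_000, per_key) / GIB:.0f} GiB for 200 million keys")
```

Depending on whether the gigabytes are read as decimal or binary, this comes to roughly 150 to 160 bytes per key, and the cost grows linearly with the number of keys.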
Having the index in RAM guarantees fast IO: we have to seek at most once per object. If we put the index on disk, IO cannot be served that fast. For some cases this is a valid requirement, while for others we do not need the whole index in RAM.
We plan to move the index from pure RAM (anonymous memory) into a memory-mapped file. This will solve both problems, but to date we have not come close to this limitation, so it has not been implemented yet.
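To illustrate the general idea of an index living in a memory-mapped file, here is a minimal sketch. It is not eblob code: the fixed-size record layout (integer key, data offset, data size), the sorted file and the function names are all assumptions made for the example. The point is that lookups touch only a few pages of the mapping, and the kernel decides which pages stay in RAM.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: 64-bit key, data offset, data size.
RECORD = struct.Struct("<QQQ")


def build_index(path, entries):
    """Write (key, offset, size) tuples to a file, sorted by key."""
    with open(path, "wb") as f:
        for entry in sorted(entries):
            f.write(RECORD.pack(*entry))


def lookup(mm, key):
    """Binary search over the mapped records; returns (offset, size) or None."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        k, offset, size = RECORD.unpack_from(mm, mid * RECORD.size)
        if k == key:
            return offset, size
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None


def demo():
    """Build a tiny index, map it read-only and look up two keys."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "index.bin")
        build_index(path, [(42, 0, 100), (7, 100, 50), (19, 150, 8)])
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return lookup(mm, 19), lookup(mm, 5)
```

Calling `demo()` looks up one key that exists and one that does not. In a real system the index would need to be updated in place as well, which this sketch leaves out.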
Another issue mentioned is 'atomic' writes: since eblob is append-only storage, it supposedly has to have the whole data object ready before it can be written into eblob. This is not true. eblob supports a prepare/commit API, where prepare reserves the requested space in the blob and returns an offset that can be used to write the object in multiple portions. Commit will, well, commit your object into the index.
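The prepare/commit flow described above can be sketched with a toy in-memory model. This is not the eblob API: the class, the method names and their signatures are hypothetical, and a `bytearray` stands in for the blob file. It only shows the idea of reserving space first, filling it in portions, and making the object visible in the index at commit time.

```python
class ToyBlob:
    """Toy append-only blob with a prepare/commit write path (illustration only)."""

    def __init__(self):
        self.data = bytearray()  # stands in for the on-disk blob file
        self.index = {}          # committed objects: key -> (offset, size)
        self.pending = {}        # reserved, not yet committed: key -> (offset, size)

    def prepare(self, key, size):
        """Reserve size bytes at the end of the blob and return their offset."""
        offset = len(self.data)
        self.data.extend(bytes(size))
        self.pending[key] = (offset, size)
        return offset

    def write(self, key, offset_in_object, chunk):
        """Write one portion of the object into its reserved space."""
        base, size = self.pending[key]
        if offset_in_object + len(chunk) > size:
            raise ValueError("chunk does not fit into the reserved space")
        start = base + offset_in_object
        self.data[start:start + len(chunk)] = chunk

    def commit(self, key):
        """Publish the object in the index so that readers can see it."""
        self.index[key] = self.pending.pop(key)

    def read(self, key):
        offset, size = self.index[key]
        return bytes(self.data[offset:offset + size])


blob = ToyBlob()
blob.prepare("video", 10)
blob.write("video", 0, b"hello")
blob.write("video", 5, b"world")
blob.commit("video")
print(blob.read("video"))
```

Until `commit` is called the key is absent from the index, so a reader never sees a half-written object.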
This API is not exported to elliptics yet, since we, well, do not yet work with such large objects. But we plan to start (or at least test) hosting multi-gigabyte files this year, so we will extend elliptics commands to support prepare/commit writes.
Two comments:
* It would be great to have a way to keep only a part of the index in RAM. Maybe the technique Lucene uses to store the term index in RAM might help here. Also, have you looked at Google's leveldb (it uses LSM trees): <a href="http://code.google.com/p/leveldb/" title="http://code.google.com/p/leveldb/">http://code.google.com/p/leveldb/</a>? You could model the index on this data structure.
* The prepare/commit API requires the size to be known before calling prepare. If I am receiving a large data stream and don't know its length up front, then I can't use prepare/commit. Think of using eblob as a backend for an NFS server, for example (this will also apply to POHMELFS in the future).
LSM trees are slow. I suspect that having the index in a memory-mapped file will solve most of the issues with unused entries eating RAM: they will either not be read from disk into memory or will be flushed when memory runs short. But this should be checked in practice, of course; if things go wrong, we will think about how to improve the situation.
Databases that operate with <i>megabytes</i> in their benchmarks are very questionable. Anything that fits into the VFS cache in benchmarks will likely explode when it grows larger. And I mean it: Kyoto Cabinet is extremely fast on smaller databases as well, but as soon as the database grows, performance drops dramatically. I'm afraid the same will happen with leveldb.
As for the prepare/commit API: any input stream actually operates on chunks of data. After all, someone wrote that data from the other end, and each write had a size and an offset.
One can reserve a sufficiently large region for the data and append received chunks there. If it is not enough, eblob will create and append a whole new copy of the data. If that behavior is not desired (as with really large streams), one can write multiple chunks and store their index somewhere (a kind of metadata that 'links' the written blocks together); this can be another record with a different key. After all, that's how filesystems are made.
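The chunk-plus-metadata approach can be sketched as follows. This is a toy illustration, not eblob or elliptics code: a plain dict stands in for the key/value store, and the chunk key naming scheme and the JSON metadata format are assumptions made for the example.

```python
import json


def write_stream(store, key, chunks):
    """Store each chunk under its own key, then a metadata record linking them."""
    chunk_keys = []
    total = 0
    for i, chunk in enumerate(chunks):
        chunk_key = f"{key}.chunk.{i}"  # hypothetical naming scheme
        store[chunk_key] = bytes(chunk)
        chunk_keys.append(chunk_key)
        total += len(chunk)
    # The metadata record is written last, under the object's own key.
    store[key] = json.dumps({"chunks": chunk_keys, "size": total}).encode()


def read_stream(store, key):
    """Follow the metadata record and concatenate the chunks in order."""
    meta = json.loads(store[key].decode())
    return b"".join(store[chunk_key] for chunk_key in meta["chunks"])


store = {}
write_stream(store, "bigfile", [b"first ", b"second ", b"third"])
print(read_stream(store, "bigfile"))
```

The writer never needs to know the total length in advance; it only learns the size once the stream ends, and records it in the metadata.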
There is no silver bullet, but the interface is flexible enough to let you build the desired functionality.