Elliptics network got new on-disk format

Eventually any storage has to go into production, which implies not only storing data but also restricting access to it. Distributed hash table systems, like elliptics network, do not have dedicated servers which could store that information and manage access permissions, so each object should carry its own set of rules. Without a proper security framework on top of the network media this will not guarantee the required access granularity, but even in this model it is still possible to implement IO permissions to some degree.

Until now elliptics network did not have even a rudimentary mechanism for doing this. Even the little metadata it did support was stored in the transaction log and did not allow any kind of extensions or proper updates.

And although I have not yet written a single line of code to deal with metadata, I have already broken the old-style transaction logs, which now contain transaction information and nothing else. There are no metadata objects at all yet, but I will update the appropriate parts of the library to generate them and store them in separate entities.

It is possible to store metadata in separate objects, for example ones indexed by the hash of the original object's name plus some extension, but this would force the system to perform two lookups to find both the object and its metadata.
Another way is to add a new object type next to the existing transaction and history log objects: all metadata will be stored close to the object itself and can be fetched using only the object ID. In the filesystem backend, where each object is stored as a separate file, metadata will be indexed by the '.metadata' extension or similar, just like we have $ID.history for transaction logs. In the database backends (BDB and Tokyo Cabinet, although I am seriously considering dropping the former, since it is unacceptably slow compared to TC) it will be a separate table, indexed by the object ID.
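
To make the difference between the two layouts concrete, here is a minimal Python sketch. It is not elliptics code: the function names are hypothetical, and SHA-1 is only an assumed stand-in for whatever hash the storage actually uses.

```python
import hashlib
from pathlib import Path


def object_id(name: str) -> str:
    # Assumption: SHA-1 stands in for whatever hash the storage really uses.
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def separate_metadata_id(name: str) -> str:
    # Option 1: metadata is its own object, keyed by hash(name + extension).
    # The resulting ID is unrelated to object_id(name), hence a second lookup.
    return object_id(name + ".metadata")


def colocated_paths(root: str, obj_id: str) -> dict:
    # Option 2: every file name is derived from the single object ID.
    base = Path(root)
    return {
        "data": base / obj_id,
        "history": base / (obj_id + ".history"),
        "metadata": base / (obj_id + ".metadata"),
    }
```

With the second layout, a node that already holds the object ID can build the metadata path locally, without hashing anything else or asking the network again.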

Metadata will have a flexible format (maybe even a human-readable one based on strings?) to allow extensions without breaking backwards compatibility.
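
As an illustration of what a string-based, extensible format could look like, here is a small Python sketch of `key=value` lines. The format and the field names are purely hypothetical, nothing here is decided; the point is only that a reader which keeps keys it does not understand stays compatible with newer writers.

```python
def parse_metadata(text: str) -> dict:
    # Parse "key=value" lines; blank lines and '#' comments are skipped.
    # Unknown keys are kept as-is so that newer fields survive a round trip.
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line, ignore it
        fields[key.strip()] = value.strip()
    return fields


def dump_metadata(fields: dict) -> str:
    # Serialize back to text, one field per line, in sorted key order.
    return "".join(f"{key}={value}\n" for key, value in sorted(fields.items()))
```

An older reader that only knows about, say, a `perm` field can still load, modify and rewrite a file that also carries fields added later.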

But first I should fix the background log checker. It currently syncs all kinds of objects (i.e. when some object is missing from the storage, but a copy of it exists under a different ID, it uploads the missing data from that copy), but it does so in a slightly wrong way, namely messing with hashes and producing unneeded additional transaction references. When the checker is ready, the whole storage fsck process will just compile a log based on metadata objects and start the check process for it.
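
The recovery step itself (restore a missing copy from one stored under a different ID) can be sketched in a few lines of Python. This is a toy model and not the real checker: the storage is assumed to be a plain dictionary mapping ID to data, and `restore_missing_copies` is a hypothetical name.

```python
def restore_missing_copies(storage: dict, copy_ids: list) -> list:
    # storage maps object ID -> data; copy_ids lists every ID under which
    # the same object is supposed to be stored.
    present = [oid for oid in copy_ids if oid in storage]
    if not present:
        return []  # no surviving copy, nothing to restore from
    source = storage[present[0]]
    restored = []
    for oid in copy_ids:
        if oid not in storage:
            storage[oid] = source  # upload the missing data from the copy
            restored.append(oid)
    return restored
```

A full fsck pass would then amount to building the `copy_ids` lists from metadata objects and running this step for each of them.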

Stay tuned, we are very close to the next major release, which will draw the line of the serious features and changes!

Hi,

will there be some tarball release for elliptics? This would be easier to package up for some distro :)

- fabian

but is not yet available, since there is no release yet. The features discussed are committed to the tree, but not yet released as a self-contained package.

But you can always build a package from a git snapshot. For Debian/Ubuntu it is quite simple, given that the source tree contains the needed objects :)
I did not put an RPM spec there, since we do not build packages on RHEL: it ships rather old packages which do not satisfy the dependencies, so we use source-compiled objects.