It took a while to prepare a new release of elliptics network, the distributed hash table storage, but here we go. This is still a minor version bump, although the amount of changes is rather large for a small update.
This will likely be the last release in the 2.6 cycle, since in parallel we are cooking up completely new versioning, merge and data synchronization logic. By the way, this release partially breaks the current merge and synchronization logic, but there is a tool to fix things up. This will be automated in the next versions.
Now let's dig into the details and the changelog:
- Data integrity checker. Although it is still mostly undocumented (see the example below), it checks whether a given object is present in the storage with the requested number of copies. If the number of copies found does not match the config, it automatically downloads the data and uploads it under the missing IDs. Later this tool will also be able to upload new data into the storage. The checker will be the base for a background FSCK: a simple script that parses metadata and starts the checker with the given log. It also supports calling an external library to merge requests.
- FCGI frontend: cookies, timeouts, tunable headers, variable content types, more and cleaner XML.
- Rewritten network state and reconnection logic. This makes supporting boxes behind NAT trivial (and we do support it); client nodes became simpler than ever: less code, fewer bugs, everyone is happy.
- Debian debug package.
- A fair number of bug fixes. This version is used in production; if time permits, I will describe that load in detail later.
Apart from fixing possible bugs, the main work is concentrated on the filesystem checker. There are two problems to solve.
The first one is the absence of the transaction log produced by a requested transformation function, or in plain words, a missing copy of the object in the storage. This happens when a node went offline and came back empty, was replaced, or did not return at all. In this case the fsck application checks how many copies are present in the storage, automatically downloads one of them (the first one listed in the config) and uploads it under the missing ID.
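The repair step described above can be sketched as follows. This is a minimal illustration, not the elliptics implementation: the storage is modelled as an in-memory `dict` mapping IDs to bytes, and the transformation functions are assumed to be plain `hashlib` digests of the object name. `object_ids` and `check_and_repair` are hypothetical names.

```python
import hashlib


def object_ids(name: str, funcs: list[str]) -> list[str]:
    """Derive one ID per transformation function, in config order (assumed: plain hashes)."""
    return [hashlib.new(f, name.encode()).hexdigest() for f in funcs]


def check_and_repair(storage: dict[str, bytes], name: str, funcs: list[str]) -> list[str]:
    """Restore missing copies of `name`; return the IDs that were re-uploaded.

    Hypothetical sketch: `storage` stands in for the distributed hash table.
    """
    ids = object_ids(name, funcs)
    present = [i for i in ids if i in storage]
    missing = [i for i in ids if i not in storage]
    if not present:
        raise LookupError(f"no copies of {name!r} left, cannot repair")
    if not missing:
        return []  # the requested number of copies is already there
    # Download the first available copy in config order...
    data = storage[present[0]]
    # ...and upload it under every ID whose copy is gone.
    for i in missing:
        storage[i] = data
    return missing
```

For example, if only the `sha1` copy of an object survived, calling `check_and_repair(store, "photo.jpg", ["sha1", "md5"])` re-creates the `md5` copy and returns its ID. A second call finds every copy present and returns an empty list.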
The second issue to resolve is transaction merge. By default elliptics network uses a transaction for every update, so the storage does not hold the object as is; instead, the reader downloads the transaction log, parses it and selects the transactions that cover the requested object range. This is of course hidden behind the API, but it is possible to select the needed transactions manually, for example to support versioning and data snapshots. As a nice side effect, two fully identical transactions (objects) do not take twice the space, since there are transaction reference counters.
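To make the read path concrete, here is a toy model of a transaction log with range selection and reference-counted storage of identical transactions. It is a sketch under stated assumptions, not the elliptics on-disk format: transaction bodies are assumed to be keyed by their SHA-1 digest, log entries are hypothetical `(offset, size, blob_id)` tuples applied in log order, and ranges never written read back as zero bytes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TxStore:
    """Toy transaction store; names and layout are hypothetical."""
    blobs: dict[str, bytes] = field(default_factory=dict)  # blob id -> data
    refs: dict[str, int] = field(default_factory=dict)     # blob id -> reference count
    log: list[tuple[int, int, str]] = field(default_factory=list)  # (offset, size, blob id)

    def write(self, offset: int, data: bytes) -> None:
        bid = hashlib.sha1(data).hexdigest()
        if bid not in self.blobs:
            self.blobs[bid] = data  # identical bodies are stored only once...
        self.refs[bid] = self.refs.get(bid, 0) + 1  # ...and reference-counted
        self.log.append((offset, len(data), bid))

    def covering(self, start: int, end: int) -> list[tuple[int, int, str]]:
        """Transactions overlapping the half-open range [start, end)."""
        return [t for t in self.log if t[0] < end and t[0] + t[1] > start]

    def read(self, start: int, end: int) -> bytes:
        buf = bytearray(end - start)
        for off, size, bid in self.covering(start, end):  # later entries overwrite earlier
            lo, hi = max(off, start), min(off + size, end)
            buf[lo - start:hi - start] = self.blobs[bid][lo - off:hi - off]
        return bytes(buf)
```

Writing `b"hello"` at offsets 0 and 5 and then `b"XY"` at offset 2 makes `read(0, 10)` return `b"heXYohello"`, while the two `b"hello"` transactions share one stored body with a reference count of 2.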
Currently there are five merge strategies, but practice shows that they cause more harm (or, at best, confusion) than actual good. So I decided to drop them all in favour of a trivial timestamp-based merge algorithm. Of course, it is still possible to merge transactions with a private algorithm, which can be called from the fsck daemon. We also have a request to allow external modules to merge objects based on their actual data.
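A timestamp-based merge of the kind described above can be sketched like this: take the union of the transaction logs found in all copies, drop exact duplicates, order by timestamp and replay, so that the newest write wins wherever ranges overlap. This is an illustration rather than the elliptics algorithm; the `Tx` record, `merge_logs` and `materialize` are hypothetical, and ties on equal timestamps are assumed to be broken by offset and then data, simply to keep the result deterministic.

```python
from typing import NamedTuple


class Tx(NamedTuple):
    """Hypothetical transaction record; tuples sort by (timestamp, offset, data)."""
    timestamp: float
    offset: int
    data: bytes


def merge_logs(*logs: list[Tx]) -> list[Tx]:
    """Union of all copies' logs, oldest first; identical transactions kept once."""
    return sorted(set().union(*logs))


def materialize(log: list[Tx]) -> bytes:
    """Replay a merged log oldest to newest, so newer writes overwrite older ones."""
    size = max((t.offset + len(t.data) for t in log), default=0)
    buf = bytearray(size)
    for t in log:
        buf[t.offset:t.offset + len(t.data)] = t.data
    return bytes(buf)
```

With `copy_a = [Tx(1.0, 0, b"aaaa"), Tx(3.0, 2, b"CC")]` and `copy_b = [Tx(1.0, 0, b"aaaa"), Tx(2.0, 1, b"bb")]`, the merged log has three transactions (the shared one appears once) and materializes to `b"abCC"`. A private merge function, as mentioned above, would replace `merge_logs` with its own ordering or content-based rules.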
This version disables content synchronization when a node joins. Instead, the admin has to run the fsck application with an externally stored log of the uploaded data to check whether things are OK and fix up whatever is broken. This will be automated in the next versions, and no external log will be required.
The fsck application log file should look like this:
3 0,0,0 sha1,md5 object_name
where '3' is the object creation flags value: objects without transactions, just like those created by the FSCK frontend. This field will be removed in the next version.
'0,0,0' is a placeholder for object parsing information: start,end,update_existing. Start and end are the positions of the first and last characters of object_name used to generate the ID; zeroes mean automatic detection. Update_existing is not supported yet; in the next version, if set, it will upload the local file named object_name into the storage regardless of whether its copies are already present.
'sha1,md5' are the transformation functions used to generate IDs from object_name. This setup keeps two copies, each stored under the ID produced by the corresponding hash.
'object_name' is the name of the uploaded object. Its hash (strictly speaking, the transformation of the name by the listed functions, which do not have to be plain hashes) becomes the object ID.
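A parser for this log format might look like the sketch below; it is not the actual fsck code. It assumes fields are separated by whitespace and that object_name is the rest of the line. Two details the format description leaves open are treated as assumptions and marked in the comments: "automatic detection" is taken to mean "use the whole name", and end is taken to be an inclusive character position. `FsckEntry` and `parse_line` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FsckEntry:
    """One parsed fsck log line; field names are hypothetical."""
    flags: int
    start: int
    end: int
    update_existing: bool
    funcs: list[str]
    name: str

    def ids(self) -> list[str]:
        if self.start == 0 and self.end == 0:
            # Assumption: automatic detection means using the whole name.
            key = self.name
        else:
            # Assumption: end is an inclusive character position.
            key = self.name[self.start:self.end + 1]
        # Assumption: each transformation function is a plain hashlib digest.
        return [hashlib.new(f, key.encode()).hexdigest() for f in self.funcs]


def parse_line(line: str) -> FsckEntry:
    """Parse 'flags start,end,update_existing func1,func2 object_name'."""
    flags, pos, funcs, name = line.split(maxsplit=3)
    start, end, update = (int(x) for x in pos.split(","))
    return FsckEntry(int(flags), start, end, bool(update),
                     funcs.split(","), name.rstrip("\r\n"))
```

Parsing the example line `3 0,0,0 sha1,md5 object_name` yields flags 3 and two IDs: the SHA-1 and the MD5 digests of `object_name`, one per copy.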
Stay tuned, work is boiling and results are very close!
Hello,
Please find the SNIA cloud storage community link: http://groups.google.com/group/snia-cloud/web?pli=1. The Cloud Data Management Interface (CDMI) v1.0g DRAFT has been released.
Is it possible to start working on a CDMI interface for this project?
If so, please give us suggestions on how to start working on this interface.
Thanks,
Kiran.
CDMI has interfaces for far wider use cases than elliptics network currently provides or plans to provide in the near future.
Since elliptics is a distributed hash table storage, some of the discussed terms do not map cleanly onto its model, but in general I believe it is possible to find a good correspondence between the interfaces.
For now I have to note that elliptics network supports neither a security model nor a distributed locking system. The latter will be implemented as a stand-alone project accessible by clients, while the former will eventually become part of the storage system.