Its original draft was published earlier, but I believe it has become a little outdated, so some points need highlighting.
But first, let's clarify the status of the fsck log checker. I have completed its implementation, and it can now maintain a consistent number of copies in the storage. It cannot merge different transaction logs yet.
To determine which objects to check, it uses a special text log file which, among other information, contains the name of each object and the transformation functions to work with. Each transformation function produces a unique ID, which is then looked up in the storage. For example, we can list the sha1 and md5 transformation functions there, so we get two IDs equal to the corresponding hashes of the input name (and optionally a hash of the transaction content).
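As a rough illustration of how one name maps to several IDs, here is a minimal Python sketch. The `TRANSFORMS` table and the `object_ids` helper are hypothetical names made up for this example; they are not the elliptics API, and the real checker may encode IDs differently.

```python
import hashlib

# Hypothetical table of transformation functions. The labels mirror the
# sha1/md5 example above; this is not the real elliptics configuration.
TRANSFORMS = {
    "sha1": lambda data: hashlib.sha1(data).hexdigest(),
    "md5": lambda data: hashlib.md5(data).hexdigest(),
}


def object_ids(name, transforms=TRANSFORMS):
    """Return one ID per transformation function for the given object name."""
    raw = name.encode("utf-8")
    return {label: fn(raw) for label, fn in transforms.items()}


# One object name yields one ID per configured transformation function.
ids = object_ids("example.txt")
for label, object_id in sorted(ids.items()):
    print(label, object_id)
```

Each ID is then what the checker would look for in the storage, one copy per transformation function.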
When some objects are not present in the storage, the checker downloads the first existing copy and tries to upload it using the transformation functions corresponding to the missing objects. So, if the object with ID equal to md5(name) is present and sha1(name) isn't, the checker downloads all transactions stored in the existing object and uploads them using the sha1 transformation, thus restoring the requested number of copies.
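The recovery step described above can be sketched as follows. This is only a toy model under stated assumptions: the storage is a plain dictionary mapping IDs to lists of transactions, and `recover_copies` is a hypothetical helper, not a function of the actual checker.

```python
import hashlib


def recover_copies(name, storage, transforms):
    """Re-upload missing copies of `name` from the first copy that exists.

    `storage` is a toy stand-in for the network: a dict mapping object ID
    to a list of transactions. Returns the labels of the transformation
    functions whose copies were restored.
    """
    raw = name.encode("utf-8")
    ids = {label: fn(raw) for label, fn in transforms.items()}

    # Find the first copy that is still present in the storage.
    present = [object_id for object_id in ids.values() if object_id in storage]
    if not present:
        return []  # no surviving copy, nothing to recover from

    source = storage[present[0]]
    restored = []
    for label, object_id in ids.items():
        if object_id not in storage:
            # "Upload" all transactions under the ID of the missing copy.
            storage[object_id] = list(source)
            restored.append(label)
    return restored


transforms = {
    "sha1": lambda data: hashlib.sha1(data).hexdigest(),
    "md5": lambda data: hashlib.md5(data).hexdigest(),
}

# Only the md5 copy exists; the sha1 copy has been lost.
storage = {hashlib.md5(b"example.txt").hexdigest(): ["write-1", "write-2"]}
restored = recover_copies("example.txt", storage, transforms)
print(restored)
```

Note that the sketch copies transactions as they are; it does not attempt to merge diverging transaction logs, which the checker does not support yet either.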
The checker currently requires a log file to get its information from and an admin to start the process.
Background fsck is supposed to eliminate both needs.
The basic idea is to store some metadata with each object, which records the origin of the object and how it was supposed to be stored in the elliptics network. Thus we can periodically or on request parse the metadata of all objects on a given node (or only part of them), create a log file and run the existing checker against it.
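A minimal sketch of that scan, assuming each object's metadata is available as a small dictionary. The metadata keys (`name`, `transforms`) and the one-line-per-object log layout are invented for illustration; the real log file format is not described here.

```python
def build_check_log(objects):
    """Turn per-object metadata into checker log lines.

    `objects` is an iterable of metadata dicts with a hypothetical layout:
    {"name": <original object name>, "transforms": [<function names>]}.
    """
    lines = []
    for meta in objects:
        # One line per object: the name, then the functions to check it with.
        lines.append("{} {}".format(meta["name"], ",".join(meta["transforms"])))
    return lines


# Metadata as it might be collected from the objects stored on one node.
node_objects = [
    {"name": "example.txt", "transforms": ["sha1", "md5"]},
    {"name": "photo.jpg", "transforms": ["sha1"]},
]

for line in build_check_log(node_objects):
    print(line)
```

The resulting lines would then be fed to the existing checker, so no admin has to prepare the log by hand.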
This is similar to extended attributes in existing filesystems. Metadata can describe not only what the object is, but also its IO permissions or access policies, owner information and anything else we would like to have there, which will allow us to implement at least a basic security model for the elliptics network as well as simplify the POHMELFS port.
If none of the ID functions resolve to the current node, do you delete the object after ensuring that the other copies are there?
Not yet, it will be done in automatic fsck.
That's a key part of rebalancing when you add nodes. New nodes will cause the hash functions to change where they place objects, which creates orphaned objects. But you have to check that you haven't accidentally orphaned all the copies of an object. If you do that, the object will disappear until fsck brings it back.
An alternative to the fsck scheme is to have object expiration times. When the timer on an object expires, you check that all of the copies are in place and then delete your copy if you are an orphan. Object reliability can be enhanced with short expiration times and more copies. Object expiration basically creates a continuous low level of fsck noise on the net. BitTorrent uses object expiration.
Yes, it is possible to accidentally orphan all copies of some objects, but the node addition policy may, for example, forbid adding more servers at a time than the common (or smallest) number of copies, or the admin will have to quickly start the fsck process manually.
With proper policies such a situation can be avoided, but generally one has to watch for it.
BT expiration times will not work in elliptics as is, since nodes cover a range of IDs, while in BT it is a set. When a range collapses, the node does not even try to check whether the requested ID exists in its backing storage, since it cannot tell the difference between its own range having collapsed and having always had that range.
How are you synchronizing the hash functions on all of the nodes? fsck will fail if everyone isn't using the same hash functions.
They are written in the metadata, along with the original object name, for all transactions, so we can check the required number of copies in the background.