Tag Archives: recovery

Server-side operations

Elliptics distributed storage is being built as a client-server architecture, and although servers may discover themselves, exchange various statistic information and forward requests, they most of the time serve client’s requests.

Recovery in a distributed storage is a tricky operation which requires serious thinking on which keys have to be copied to which destinations. In Elliptics recovery is another client process which iterates remote nodes, reads data to be recovered and update needed keys.

But there are cases when this round trip to client is useless. For example when you require missing replica, or when you have a set of keys you want to copy or move to new destination.

Thus I introduced two server-side operations which allow to send content from one server to multiple replicas. It is intended for various recovery tools which optimize by not copying data from local node to recovery temporary location, instead they may tell remote node to send data directly to required locations. It can also be used to move data from one low-level backend (for example eblob) to a newer version or different backend without server interruption.

There is a new iterator type now which sends all keys being iterated to set of remote groups. It does it with the speed of network or disk (what it slower), in local tests iteration over 200Gb blobs sending data over the network to one remote node via write commands ended up with ~78MB/s sustained speed. There were pikes though, especially when remote node synced caches. Both sender and recipient had 30Gb of RAM. Rsync shows ~32MB/s speed on these machines, but not because it is that slow, but because of ssh which maxed out CPU by packet encryption.
Iterator sends dnet_iterator_response structure for each write result for every key it has processed just like for usual iterator, neither API nor ABI is broken.

Second server-send command accepts vector of keys. It searches for all remote nodes/backends which host given keys in the one specified group, splits keys into per-node/backend basis and tells remote backends to send appropriate keys to specified remote groups. The same iterator response is generated for every key which has been processed.

All operations are async and can run in background with other client requests being handled in parallel.

There are 3 operation modes:
1. default – writing data to remote node using compare-and-swap, i.e. only write data if it either doesn’t exist or it is the same on remote servers. Server sending (iterator or per-key) running in this mode is especially useful for recovery – there is no way it can overwrite newer copy with the old data.
2. overwrite – when special flag is set, it overwrites data (clears compare-and-swap logic)
3. move – if write has been successful, remove local key

There is example tool in examples which iterates over remote node and backends and performs copy/move of the keys being iterated. Next step is to update our Backrunner HTTP proxy to use this new logic to automatically recover all buckets in background.

Stay tuned!

Data recovery in Elliptics

As you know, Elliptics includes built-in utility for data synchronization both within one replica and between number of replicas. In this article I describe different modes in which synchronization could be run, how it works and give details on situations when it could be used.

dnet_recovery is the main utility for data synchronization in Elliptics. It is distributed with the package elliptics-client. dnet_recovery supports 2 modes: `merge` and `dc`. `merge` is intended for data synchronization within one replica and `dc` – for data synchronization between number of replicas. Each of the modes can automatically search keys for synchronization or use keys from dump file.

Most of described bellow can be found at recovery doxygen documentation. All doxygen documentation will be published onĀ doc.reverbrain.com soon.

Continue reading

Automatic recovery in elliptics network

Eventually consistent systems are very scalable – its performance by orders of magnitude higher that that in synchronous systems, and its design allows true horizontal scalability without complexities and bugs.

But it comes for price – data recovery is postponed in time, and sometimes quite for a long period. All distributed systems provide some kind of data replication, so this is usually not a problem.
But having complete set of fully recovered replicas is not only safe, it also provides better performance, since clients may read data in parallel from different replicas (in those systems like elliptics which provide this functionality).

To close the gap between data recovery and current data requests, we implemented on-demand recovery in elliptics. This is simply a data writeback into the storage, when we have read the object and detected that one or more replicas are missing.

We preserve timestamps of the records. This method does not imply that there is no need to perform regular data check (with data recovery if needed) anymore, instead it speeds up recovery for the most actively used objects.