Data recovery in Elliptics

As you know, Elliptics includes a built-in utility for data synchronization, both within one replica and between several replicas. In this article I describe the modes in which synchronization can be run, how it works, and the situations in which it should be used.

dnet_recovery is the main utility for data synchronization in Elliptics. It is distributed with the elliptics-client package. dnet_recovery supports 2 modes: `merge` and `dc`. `merge` is intended for data synchronization within one replica and `dc` for data synchronization between replicas. Each mode can automatically search for keys to synchronize or take keys from a dump file.

Most of what is described below can also be found in the recovery doxygen documentation. All doxygen documentation will be published on doc.reverbrain.com soon.

How does dnet_recovery work?

It is better to consider different modes separately.

`merge` synchronization

To run `merge` synchronization, use one of the following commands:

  1. dnet_recovery merge -r %host%:%port%:%family% -g 1,2,3
    Synchronization will be performed on all nodes from groups 1, 2 and 3. Keys will be moved according to the route list.
  2. dnet_recovery merge -o %host%:%port%:%family% -g 1
    Synchronization will be performed only on the node specified by -o, and keys will be moved from it according to the route list.
  3. dnet_recovery merge -r %host%:%port%:%family% -g 1,2,3 -f /var/tmp/dump_file
    Synchronization will move all keys from dump_file from old nodes to the proper nodes according to the route list.
  4. dnet_recovery merge -o %host%:%port%:%family% -g 1 -f /var/tmp/dump_file
    Synchronization will move all keys from dump_file from the node specified by -o according to the route list.

In the first two cases dnet_recovery performs the following steps for each node:

  1. iterates over the node's keys and filters out, by metadata, the keys that should not be on the node
  2. for each filtered key (see the sketch after this list):
    1. looks up the key on the node that is responsible for it according to the route list
    2. compares the key's metadata from both nodes
    3. if necessary, reads the key from the current node and writes it to the proper node
    4. removes the key from the current node
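
To make steps 2.1–2.4 concrete, here is a minimal Python sketch of the per-key logic. It is illustrative only: Metadata and the session helpers are hypothetical stand-ins for the real Elliptics calls, and the actual implementation is more involved.

    from collections import namedtuple

    # Hypothetical stand-in for the metadata Elliptics keeps per key.
    Metadata = namedtuple('Metadata', ['timestamp', 'size'])

    def merge_key(key, current_node, proper_node, session):
        """Move one filtered key from the wrong node to the proper one."""
        local_meta = session.lookup(current_node, key)   # present: we just iterated it
        remote_meta = session.lookup(proper_node, key)   # None if the key is missing there

        # Copy only if the proper node misses the key or holds an older version.
        if remote_meta is None or remote_meta.timestamp < local_meta.timestamp:
            data = session.read(current_node, key)
            session.write(proper_node, key, data)

        # Either way, the key must not stay on a node that does not own it.
        session.remove(current_node, key)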

In the third and fourth cases dnet_recovery performs the following steps for each specified group:

  1. for each key from the dump file (see the sketch after this list):
    1. looks up the key on each node in the group
    2. compares the key's metadata from the different nodes and determines the newest version
    3. if necessary, reads/writes the newest version of the key to the proper node according to the route list
    4. removes the key from all improper nodes
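
These steps are essentially a reduction over all replicas of a key inside one group. A sketch under the same assumptions as above (hypothetical session helpers; the newest version decided by metadata timestamp):

    def merge_key_from_dump(key, group_nodes, proper_node, session):
        """Recover one dump-file key within a single group."""
        # Collect (node, metadata) for every node of the group that has the key.
        replicas = [(node, session.lookup(node, key)) for node in group_nodes]
        replicas = [(node, meta) for node, meta in replicas if meta is not None]
        if not replicas:
            return  # the key is absent from the whole group

        # The replica with the newest timestamp wins.
        newest_node, _ = max(replicas, key=lambda r: r[1].timestamp)

        # Make sure the proper node ends up with the newest version.
        if newest_node != proper_node:
            session.write(proper_node, key, session.read(newest_node, key))

        # Remove the key from all improper nodes.
        for node, _ in replicas:
            if node != proper_node:
                session.remove(node, key)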

`dc` synchronization

To run `dc` synchronization, use one of the following commands:

  1. dnet_recovery dc -r %host%:%port%:%family% -g 1,2,3
    Synchronization will be performed on all keys from groups 1, 2 and 3.
  2. dnet_recovery dc -o %host%:%port%:%family% -g 1,2,3
    Synchronization will be performed on all keys from the ranges for which the node specified by -o is responsible. These keys will be synchronized in groups 1, 2 and 3.
  3. dnet_recovery dc -r %host%:%port%:%family% -g 1,2,3 -f /var/tmp/dump_file
    Synchronization will be performed on all keys from dump_file. These keys will be synchronized in groups 1, 2 and 3.

`dc` synchronization can be customized by specifying a custom recovery module via `-C`. By default `dc` uses the built-in recovery module `dc_recovery.py`, which can be used as an example or adapted/rewritten for working with user data.
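
For example (the module path here is purely illustrative):

  • dnet_recovery dc -r %host%:%port%:%family% -g 1,2,3 -C /path/to/my_recovery.py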

In the first two cases dnet_recovery performs the following steps:

  1. using the route list, determines for each small range the node in each group that is responsible for it; if run with -o, keeps only the ranges for which the node specified by -o is responsible
  2. iterates over all nodes from all groups according to the determined ranges
  3. for each iterated key collects metadata: the key itself, the list of nodes that have the key, and the key's metadata from each of those nodes
  4. saves this metadata for all keys to a temporary file (see the record sketch after this list)
  5. runs the custom recovery module, or the built-in one
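
Conceptually, each record in the temporary file from step 4 ties a key to every replica found during iteration. The sketch below shows only the shape of such a record; the real on-disk format is internal to dnet_recovery:

    from collections import namedtuple

    # Illustrative shape of one collected record, not the real serialization.
    KeyInfo = namedtuple('KeyInfo', [
        'key',       # key id as a hex string
        'replicas',  # list of (node, metadata) pairs for nodes that have the key
    ])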

The built-in recovery module does the following:

  1. for each key from the temporary file:
    1. determines three lists of groups (see the sketch after this list):
      1. groups that contain the newest version of the key
      2. groups that contain an old version of the key
      3. groups that do not contain the key
    2. reads the newest version of the key
    3. if the key is an index shard:
      1. reads all versions of the key
      2. merges the read versions
      3. writes the final version to all groups
    4. otherwise writes the key to all groups that miss it or have an old version of it
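
Step 1.1 boils down to comparing each replica's timestamp with the newest one. A sketch of that partitioning, assuming the KeyInfo record shape from the previous section and a hypothetical group_of helper that maps a node to its group:

    def classify_groups(key_info, all_groups, group_of):
        """Split all_groups into (newest, outdated, missing) for one key."""
        # Metadata per group; each key maps to exactly one node per group.
        have = {group_of(node): meta for node, meta in key_info.replicas}

        missing = [g for g in all_groups if g not in have]
        if not have:
            return [], [], missing

        newest_ts = max(meta.timestamp for meta in have.values())
        newest = [g for g, meta in have.items() if meta.timestamp == newest_ts]
        outdated = [g for g, meta in have.items() if meta.timestamp < newest_ts]
        return newest, outdated, missing

Note that index shards are handled separately (steps 3.1–3.3): their versions are merged rather than the newest one simply overwriting the rest.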

In the third case dnet_recovery performs the following steps:

  1. skips the iteration steps; instead, for each key from the dump file it performs a lookup in all groups and collects the same metadata: the key itself, the list of nodes that have the key, and the key's metadata from each of those nodes
  2. saves this metadata for all keys to a temporary file
  3. runs the custom recovery module, or the built-in one (the same as in the first two cases)

Different situations in which dnet_recovery should be used

Let our Elliptics cluster consist of 3 groups with 3 nodes in each group.

In more detail:

  • group 1 has nodes:
    • host_1_1:1025:2
    • host_1_2:1025:2
    • host_1_3:1025:2
  • group 2 has nodes:
    • host_2_1:1025:2
    • host_2_2:1025:2
    • host_2_3:1025:2
  • group 3 has nodes:
    • host_3_1:1025:2
    • host_3_2:1025:2
    • host_3_3:1025:2

Merge recovering after hardware failures

Suppose a hardware problem occurred on one of the nodes in our cluster. For some time one group (let it be group #1) was working without one node, and during this time the remaining serviceable nodes were responding for keys from the problematic node. When the node is restored, some of its keys will be outdated or missing. To deal with this situation, run `merge` recovery on all serviceable nodes from the problematic group.

Let host_1_2:1025:2 be the node with the hardware issue. To synchronize data within group 1 after node host_1_2:1025:2 has been restored, run dnet_recovery with the following parameters:

  • dnet_recovery merge -r host_1_1:1025:2 -g 1

or

  • dnet_recovery merge -o host_1_1:1025:2 -g 1
  • dnet_recovery merge -o host_1_3:1025:2 -g 1

The second way allows running several dnet_recovery instances in parallel, each of them directly near the node it processes.

If hardware issues occurred in several groups (let them be 1 and 2), use the following parameters:

  • dnet_recovery merge -r host_1_1:1025:2 -g 1,2

Merge recovering after adding new empty nodes

Suppose we decided that the current cluster capacity is not enough and we should add several nodes to group #3.

After configuring and starting the new nodes, we need to move keys from the old nodes to the new ones.

To do this, use `merge` recovery with the following parameters:

  • dnet_recovery merge -r host_3_1:1025:2 -g 3

or

  • dnet_recovery merge -o host_3_1:1025:2 -g 3
  • dnet_recovery merge -o host_3_2:1025:2 -g 3
  • dnet_recovery merge -o host_3_3:1025:2 -g 3

The second way allows running several dnet_recovery instances in parallel, each of them directly near the node it processes.

If several groups (let them be 1 and 3) have to be enlarged by adding new nodes, use the following parameters:

  • dnet_recovery merge -r host_1_1:1025:2 -g 1,3

DC recovering after network or hardware failures

Suppose all 3 groups of our cluster are located in 3 different datacenters. For some time the connection to one of the datacenters (an Elliptics group) was lost. After the connection is restored, data in this replica can be outdated and/or missing.

Let group 2 be the problematic group. To restore consistency between replicas, use `dc` recovery with the following parameters:

  • dnet_recovery dc -o host_2_1:1025:2 -g 1,2,3
  • dnet_recovery dc -o host_2_2:1025:2 -g 1,2,3
  • dnet_recovery dc -o host_2_3:1025:2 -g 1,2,3

If the timestamp of the issue is known, dnet_recovery can check only the keys changed since this timestamp by using the -t option.
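
For example, to limit recovery to keys changed after the failure (with %timestamp% in a format accepted by dnet_recovery):

  • dnet_recovery dc -o host_2_1:1025:2 -g 1,2,3 -t %timestamp%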

DC recovering from dump file after some failures

Suppose that as a result of scanning logs we have found that some keys are missing/outdated in some groups. Grep the hex ids of all such keys and write them to a dump file, one key per line, like the following:
1f40fc92da241694750979ee6cf582f2d5d7d28e18335de05abc54d0560e0f5302860c652bf08d560252aa5e74210546f369fbbbce8c12cfc7957b2652fe9a75
5267768822ee624d48fce15ec5ca79cbd602cb7f4c2157a516556991f22ef8c7b5ef7b18d1ff41c59370efb0858651d44a936c11b7b144c48fe04df3c6a3e8da
acc28db2beb7b42baa1cb0243d401ccb4e3fce44d7b02879a52799aadff541522d8822598b2fa664f9d5156c00c924805d75c3868bd56c2acb81d37e98e35adc
….
5ae625665f3e0bd0a065ed07a41989e4025b79d13930a2a8c57d6b4325226707d956a082d1e91b4d96a793562df98fd03c9dcf743c9c7b4e3055d4f9f09ba015
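
One quick way to produce such a dump file is to extract the ids from the logs programmatically. A minimal sketch, assuming the ids are 128 hex characters as in the example above and that the dump file holds one key per line:

    import re
    import sys

    # The key ids in the example above are 128 hex characters (512-bit ids);
    # adjust the pattern if your ids differ.
    KEY_RE = re.compile(r'\b[0-9a-f]{128}\b')

    def build_dump_file(log_path, dump_path):
        """Collect unique key ids from a log into a dump file, one per line."""
        keys = set()
        with open(log_path) as log:
            for line in log:
                keys.update(KEY_RE.findall(line))
        with open(dump_path, 'w') as dump:
            for key in sorted(keys):
                dump.write(key + '\n')

    if __name__ == '__main__':
        build_dump_file(sys.argv[1], sys.argv[2])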

To synchronize only these keys across all groups (replicas), use `dc` recovery with the following parameters:

  • dnet_recovery dc -r host_1_1:1025:2 -g 1,2,3 -f /path/to/dump_file

Merge recovering from dump file after some failures

As in the previous section, suppose log scanning revealed keys that are missing/outdated in some groups. Grep the hex ids of all such keys and write them to a dump file, one key per line:

1f40fc92da241694750979ee6cf582f2d5d7d28e18335de05abc54d0560e0f5302860c652bf08d560252aa5e74210546f369fbbbce8c12cfc7957b2652fe9a75
5267768822ee624d48fce15ec5ca79cbd602cb7f4c2157a516556991f22ef8c7b5ef7b18d1ff41c59370efb0858651d44a936c11b7b144c48fe04df3c6a3e8da
acc28db2beb7b42baa1cb0243d401ccb4e3fce44d7b02879a52799aadff541522d8822598b2fa664f9d5156c00c924805d75c3868bd56c2acb81d37e98e35adc
….
5ae625665f3e0bd0a065ed07a41989e4025b79d13930a2a8c57d6b4325226707d956a082d1e91b4d96a793562df98fd03c9dcf743c9c7b4e3055d4f9f09ba015

To synchronize only these keys within group 1, use `merge` recovery with the following parameters:

  • dnet_recovery merge -r host_1_1:1025:2 -g 1 -f /path/to/dump_file