Eventual consistency is often considered a bad design choice, but sometimes updates are unavoidable: one has to overwrite data and cannot always write under new keys.
How harmful could this be at large scale? At Facebook’s scale.
The paper “Measuring and Understanding Consistency at Facebook” measures consistency anomalies in Facebook’s TAO storage system, i.e. cases where results returned by the eventually consistent TAO differ from what stronger consistency models would allow.
TAO has a fairly sophisticated hierarchy of caches and storage tiers; a common update goes through 7 steps, each of which may introduce a temporary inconsistency.
And it turns out that an eventually consistent system, even at Facebook scale, is quite harmless: anomalies stay at the noise level, around 5 requests per million violating linearizability, i.e. you overwrite data with new content but read back the older value.
You may also check the shorter companion write-up describing how Facebook TAO works, how the consistency errors were measured (by building an update graph for a random sample of requests out of billions of Facebook updates and proving there are no cycles), and the final results.
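The cycle check can be sketched roughly like this (the graph encoding and names are illustrative, not the paper’s actual checker): each operation becomes a node, real-time and reads-from ordering constraints become edges, and any cycle means no valid total order exists – i.e. an anomaly.

```python
def has_cycle(graph):
    """Iterative DFS cycle detection over an adjacency-list graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if color.get(child, WHITE) == GRAY:
                    return True          # back edge -> cycle -> anomaly
                if color.get(child, WHITE) == WHITE:
                    color[child] = GRAY
                    stack.append((child, iter(graph.get(child, []))))
                    break
            else:
                color[node] = BLACK      # fully explored, no cycle through it
                stack.pop()
    return False

# The "overwrite but read the old value" anomaly: write w1 finished before
# write w2 started, w2 finished before read r started, yet r returned w1's
# value -- so r must also precede w2, closing a cycle.
graph = {
    "w1": ["w2"],        # real-time order: w1 before w2
    "w2": ["r"],         # real-time order: w2 before r
    "r":  ["w2"],        # r saw w1's value, so r must precede w2
}
assert has_cycle(graph)  # no linearizable order exists
```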
There is also an interesting USENIX paper about Eiger – a new consistency system for distributed storage.
The short gist of it is simple: the client tracks what it has seen, and subsequent operations must respect the dependencies on those objects.
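That client-side tracking might look roughly like this minimal sketch (class and method names are hypothetical, not Eiger’s actual API): the client remembers the version of everything it has read or written and attaches that set to its next write.

```python
class Store:
    """A single in-memory replica; versions are monotonically increasing ints."""
    def __init__(self):
        self.data = {}                    # key -> (value, version, deps)
        self.clock = 0

    def read(self, key):
        value, version, _deps = self.data[key]
        return value, version

    def write(self, key, value, deps):
        # a real server would wait until all `deps` are locally visible
        # before exposing the write; here we just record them
        self.clock += 1
        self.data[key] = (value, self.clock, deps)
        return self.clock


class Client:
    """Tracks causal dependencies: everything this client read or wrote."""
    def __init__(self, store):
        self.store = store
        self.deps = {}                    # key -> last version this client saw

    def read(self, key):
        value, version = self.store.read(key)
        self.deps[key] = version          # reads become dependencies
        return value

    def write(self, key, value):
        version = self.store.write(key, value, dict(self.deps))
        # the new write transitively depends on everything before it,
        # so the dependency set can be collapsed to just this write
        self.deps = {key: version}
        return version
```

Collapsing the dependency set after a write is the nearest-dependency optimization from the COPS/Eiger line of work: the write itself transitively carries all earlier dependencies.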
Each datacenter maintains a full replica of the data; within a datacenter, data cannot be lost and updates are strongly consistent. This is achieved by running Paxos inside the datacenter.
Eiger is often compared to Spanner – Google’s much-hyped distributed storage with atomic clocks, GPS and other such cool stuff – but Eiger takes a different route: it is a follow-up to COPS, built on Cassandra, and needs no special hardware.
Instead of synchronized physical clocks Eiger uses logical (Lamport) clocks: timestamps are made unique across all datacenters by appending a node id to the clock value. Given those unique IDs, servers can order operations and the client can track dependencies.
Eiger is a step beyond a simple key-value store: it supports a column-family data model and read-only/write-only transactions. The transactions are built on top of the dependency tracking.
Write operations are replicated between datacenters by the server that received the data from the client. Replication simply sends the data to servers in other datacenters, which compare the unique timestamps; if the incoming timestamp is older than the one already stored in the replica, the update is discarded – the last writer wins.
I did not really dig into how the transactions are implemented, but note that consistency across datacenters is causal rather than strong: there is no synchronous cross-datacenter coordination. In real life we either have to deal with eventual (or at best causal) consistency, or give up scaling to web sizes.
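The last-writer-wins replication path can be sketched as follows (a toy model, assuming a per-replica Lamport clock made unique by a replica id – not Eiger’s actual code):

```python
class LWWReplica:
    """Last-writer-wins register store with (Lamport clock, replica id) stamps."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.clock = 0
        self.data = {}                    # key -> (value, (clock, replica_id))

    def local_write(self, key, value):
        self.clock += 1
        stamp = (self.clock, self.replica_id)
        self.data[key] = (value, stamp)
        return key, value, stamp          # what gets replicated out

    def apply_remote(self, key, value, stamp):
        self.clock = max(self.clock, stamp[0])   # Lamport clock merge
        current = self.data.get(key)
        if current is None or stamp > current[1]:
            self.data[key] = (value, stamp)      # newer stamp wins
        # else: the incoming update is older -> discarded

# Two replicas concurrently write the same key, then exchange updates;
# tuple comparison breaks the clock tie by replica id, so both converge.
a, b = LWWReplica(0), LWWReplica(1)
ua = a.local_write("k", "from-a")
ub = b.local_write("k", "from-b")
a.apply_remote(*ub)
b.apply_remote(*ua)
assert a.data["k"] == b.data["k"]
```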
Elliptics uses an eventual consistency model, albeit with the ability to read the most recently updated data among multiple replicas, and so far that is the only way to handle web-scale volumes (our largest cluster hosts 36+ billion records – about 2 Instagrams).
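Reading the latest version among replicas can be sketched like this (illustrative only, not the actual Elliptics API): query every replica, compare version stamps, return the newest value.

```python
def read_latest(replicas, key):
    """Return the (value, version) pair with the highest version across replicas."""
    best = None
    for replica in replicas:
        record = replica.get(key)          # (value, version) or None
        if record is None:
            continue                       # this replica never saw the key
        if best is None or record[1] > best[1]:
            best = record                  # keep the newest version so far
    return best

# Three replicas in different states of convergence: the read still
# returns the freshest value even though some replicas lag behind.
replicas = [
    {"k": ("stale", 1)},
    {"k": ("fresh", 3)},
    {"k": ("older", 2)},
]
assert read_latest(replicas, "k") == ("fresh", 3)
```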