I will bring back various tech article analysis in this blog
Let’s start with Mesa – Google’s adwords storage and query system. Albeit the article is very self-advertising and a little bit pathos, it still shows some very interesting details on how to design systems which should scale to trillions operations a day.
In particular, data is versioned and never overwritten. Old versions are immutable. Queries aka transactions operate on given version, which are asynchronously propagated over the google’s world datacenters. The only sync part is versioning database – committer updates/pins new version after update transaction criteria has been met, like number of datacenters have a copy and so on. As usual – it is Paxos. Looks like everything in Google uses Paxos, not Raft. Recovery, data movement and garbage collection tasks are also performed asynchronously. Since the only sync operation is version update, committer (updating client) can write heavy data in parallel into the storage and only then ‘attach’ version to those chunks.
Data is stored in (table, key, value) format, and keys are split in chunks. Data is stored in compressed columnar format within each chunk. Well, Mesa uses LevelDB, it explains and simplifies such things.
Mesa uses very interesting versioning – besides so called singletons – single version update, there are also ranges like [0, v0], [v1, v5], [v6, vN] named ‘deltas’ which are basically batches of key updates with appropriate versions. There are async processes which updates version ranges: compaction [0, vX] and new batch creation processes. For example to operate a query on version 7 in example above one could ‘join’ versions [0, v0], [v1, v5] and apply singletons v6 and v7. Async processes calculate various ‘common’ ranges of versions to speed up querying.
Because of versioning Mesa doesn’t need locks between query and update processing – both can be run heavily in parallel.