Data has a natural live cycle – the newer the data the most actively it is accessed (read or viewed). This is exceptionally the case for digital content like pictures, movies, letters and messages. Facebook has found that the most recent 8% of the user generated content accounts for more than 80% of data access.
But the old data still has to be saved for rare access. Those servers which store the old data are virtually idle and kind of useless – the only thing they do most of the time is heating the air and eating an electricity, which is quite pricey.
Facebook experiments with moving this old data (actually only a backup copy of data) into special “Cold storage” which has two particular features: it can shutdown already written disks and data can be copied from those backup disks into Blu-Ray rack of disks with special robot handling blu-ray drives loading.
Besides the fact that blu-ray disk is much more robust than spinning disks (it doesn’t afraid of water or dust for example, it can store data for 50 years until something new appears) the rack of disks doesn’t eat pricey electricity.
Everyone knows Redis – high-performance persistent cache system. Here is an article on how Redis is being used in Twitter.
It happens that Twitter not only forked and extended 1 year old Redis version, but looks like it doesn’t have plans to upgrade. Redis and its latencies are much-much-much faster than Twitter infrastructure written in Java because of GC in JVM. This allows to put a bunch of proxies on top of Redis caching cluster to do cluster management, the thing Redis misses for a while.
Also Twitter uses Redis only to cache data, doesn’t care about consistency issues, doesn’t use persistent caching, at least article says data is being thrown away when server goes offline.
It is client responsibility to read data from disk storage if there is no data in the cache.
Article desribes Twitter timeline architecture, and that’s quite weird to me: instead of having list of semifixed (or limited by size) chunks of timeline which are loaded on demand, they created a bunch of realtime updated structures in Redis, found non-trivial consistency issues and eventually ended up with the same simple approach of having ‘chunks’ of timeline stored in cache.
I started to compare cache management in Twitter using Redis with what we have in Reverbrain for caching: our Elliptics SLRU cache. It uses persistent caching system (which was also described a bit in article in comparison with memcache), but also uses persistent storage to backup cache, and while cache is actually segmented LRU, its backing store can be arbitrary large at size compared to Redis.
Although article is written as ‘set of facts’ somehow cut out of context (it was interview with the twitter employee), it is a good reading to think about caching, JVM, Redis and cache cluster architecture.
A good entry-level article on how replication (including multidatacenter) is implemented in Cassandra.
Basic idea is random (iirc there is weighted RR algorithm too) remote node selection which behaves like local storage (commit log + memory table) and ‘proxy’ which asynchronously sends data to remote replicas. Depending on strategies those remote replicas can be updated synchronously and can be in a different datacenter.
Seems good except that rack-awareness and complexity of the setup is not covered. Also synchronous update latencies like writing data to one node and then propagate update to others as well as local queue management are not covered at all.
But that’s a good start.