
How Redis is used at Twitter

Everyone knows Redis, the high-performance persistent cache system. Here is an article on how Redis is used at Twitter.

It turns out that Twitter not only forked and extended a year-old version of Redis, but apparently has no plans to upgrade. Redis latencies are much, much lower than those of Twitter's Java infrastructure, which suffers from JVM garbage-collection pauses. That headroom lets Twitter put a set of proxies on top of the Redis caching cluster to handle cluster management, something Redis has lacked for a long time.
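The article does not say how those proxies decide which Redis server owns a key. One common technique for this kind of proxy is consistent hashing, sketched below as an illustration only, not as Twitter's implementation. The shard names and the number of virtual nodes are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys to cache shards (illustrative sketch)."""

    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (hash, shard) pairs, vnodes per shard
        for shard in shards:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of MD5 as an unsigned int: stable across processes.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def shard_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect_left(self._hashes, self._hash(key))
        if idx == len(self._hashes):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]

    def remove(self, shard):
        # Drop a failed shard; only keys that lived on it move to other shards.
        self._ring = [(h, s) for h, s in self._ring if s != shard]
        self._hashes = [h for h, _ in self._ring]
```

For example, `HashRing(["redis-a", "redis-b", "redis-c"]).shard_for("timeline:42")` names the shard a proxy would forward that key to. The point of the ring is that losing one server only remaps that server's keys, which suits a cache that is allowed to lose data.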

Twitter also uses Redis purely as a cache: it does not care about consistency issues and does not use persistence; at least, the article says data is thrown away when a server goes offline.
It is the client's responsibility to read data from disk storage when it is not in the cache.
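This is the classic cache-aside read path. A minimal sketch follows, assuming a plain dict stands in for the Redis connection and a hypothetical `load_from_disk` callable stands in for the disk storage; with a real client the dict lookups would become Redis GET/SET calls.

```python
from typing import Callable, Dict, Optional


class CacheAsideReader:
    """Client-side read path: try the cache first, fall back to disk on a miss."""

    def __init__(self, cache: Dict[str, bytes],
                 load_from_disk: Callable[[str], Optional[bytes]]) -> None:
        self.cache = cache                    # stands in for Redis
        self.load_from_disk = load_from_disk  # hypothetical disk-storage reader
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[bytes]:
        value = self.cache.get(key)
        if value is not None:
            self.hits += 1
            return value
        self.misses += 1
        value = self.load_from_disk(key)
        if value is not None:
            self.cache[key] = value  # repopulate so the next read is a hit
        return value
```

Clearing the cache (a server going offline and losing its data) only costs extra misses: the next `get` reloads from disk and refills the cache.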

The article describes Twitter's timeline architecture, and it looks quite odd to me: instead of keeping a list of semi-fixed (or size-limited) timeline chunks that are loaded on demand, they built a set of realtime-updated structures in Redis, ran into non-trivial consistency issues, and eventually ended up with the same simple approach of storing timeline ‘chunks’ in the cache.
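To make the chunked approach concrete, here is a small sketch of a timeline kept as fixed-size chunks, where each full chunk is written to the cache once and never updated again, and readers load only as many chunks as they need. The key scheme `timeline:{user}:{n}`, the tiny `CHUNK_SIZE`, and keeping the still-filling head chunk in memory are all assumptions for illustration.

```python
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


class ChunkedTimeline:
    """A timeline kept as fixed-size, write-once chunks in a key-value cache."""

    def __init__(self, cache: Dict[str, List[int]], user: str) -> None:
        self.cache = cache          # stands in for Redis or another cache
        self.user = user
        self.head: List[int] = []   # newest, still-filling chunk (oldest first)
        self.sealed = 0             # number of full chunks written to the cache

    def _key(self, n: int) -> str:
        return f"timeline:{self.user}:{n}"  # hypothetical key scheme

    def append(self, tweet_id: int) -> None:
        self.head.append(tweet_id)
        if len(self.head) == CHUNK_SIZE:
            # Seal the chunk: written once, never modified in place afterwards.
            self.cache[self._key(self.sealed)] = list(self.head)
            self.sealed += 1
            self.head = []

    def page(self, chunks_back: int) -> List[int]:
        """Newest-first tweets: the head plus up to `chunks_back` sealed chunks."""
        result = list(reversed(self.head))
        oldest = max(self.sealed - chunks_back, 0)
        for n in range(self.sealed - 1, oldest - 1, -1):
            result.extend(reversed(self.cache[self._key(n)]))  # loaded on demand
        return result
```

Because sealed chunks are immutable, there is nothing to keep consistent between them; only the head chunk ever changes.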

I started to compare Twitter's Redis-based cache management with what we use for caching at Reverbrain: our Elliptics SLRU cache. It is also a persistent caching system (persistence is touched on in the article's comparison with memcache), but it additionally uses persistent storage to back the cache: while the cache itself is a segmented LRU, its backing store can be arbitrarily large compared to Redis.
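For readers unfamiliar with segmented LRU, the sketch below shows the general technique over a backing store. It is a textbook-style illustration, not the Elliptics code: a dict stands in for persistent storage, writes go through to it, and the segment sizes are arbitrary. New entries land in a probationary segment, a second hit promotes them to a protected segment, and entries evicted from memory can still be reloaded from the backing store.

```python
from collections import OrderedDict
from typing import Dict


class SLRUCache:
    """Segmented LRU (probationary + protected) over an unbounded backing store."""

    def __init__(self, probation_size: int, protected_size: int, backing: Dict) -> None:
        self.probation: OrderedDict = OrderedDict()  # LRU first, MRU last
        self.protected: OrderedDict = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.backing = backing  # stands in for persistent storage

    def put(self, key, value) -> None:
        self.backing[key] = value  # write through to the backing store
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
            return
        self.probation[key] = value
        self.probation.move_to_end(key)
        self._trim()

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            # Second hit: promote from probation to protected.
            value = self.probation.pop(key)
            self.protected[key] = value
            self._trim()
            return value
        if key in self.backing:
            # Memory miss: reload from the backing store into probation.
            value = self.backing[key]
            self.probation[key] = value
            self._trim()
            return value
        return None

    def _trim(self) -> None:
        # Protected overflow is demoted to the MRU end of probation.
        while len(self.protected) > self.protected_size:
            key, value = self.protected.popitem(last=False)
            self.probation[key] = value
        # Probation overflow is dropped from memory; the backing store keeps it.
        while len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)
```

The backing store is what lets the total data set grow far beyond the in-memory segments.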

Although the article reads as a ‘set of facts’ somewhat cut out of context (it was an interview with a Twitter employee), it is good food for thought about caching, the JVM, Redis and cache cluster architecture.

Twitter realtime search engine

Twitter uses humans (a pool of in-house ‘turks’ on Mechanical Turk) every time a new trending topic is propagated to search results: http://engineering.twitter.com/2013/01/improving-twitter-search-with-real-time.html

Every time.

And Storm is used only to gather statistics and detect trending topics. It uses Thrift to upload each newly active search term to Amazon’s Mechanical Turk.
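The Twitter post does not spell out the detection logic in this summary, so the following is only a simple heuristic of the kind such a statistics job might compute: count each term in the current time window and flag it when it spikes relative to the previous window. The thresholds `min_count` and `ratio`, and the +1 smoothing, are assumptions.

```python
from collections import Counter
from typing import Iterable, List


def trending_terms(previous: Iterable[str], current: Iterable[str],
                   min_count: int = 3, ratio: float = 2.0) -> List[str]:
    """Flag terms whose frequency in the current window spiked versus the previous one."""
    prev_counts = Counter(previous)
    cur_counts = Counter(current)
    trending = []
    for term, count in cur_counts.items():
        if count < min_count:
            continue  # too rare to be meaningful
        baseline = prev_counts.get(term, 0) + 1  # +1 smoothing avoids division by zero
        if count / baseline >= ratio:
            trending.append(term)
    # Most frequent trending terms first.
    return sorted(trending, key=lambda t: -cur_counts[t])
```

A steadily popular term such as one seen 5 times, then 6 times, is not flagged; a term that jumps from 0 to 10 occurrences is.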

We, instead, want Grape, our realtime processing engine, to be able to perform much more complicated tasks. In particular, we are implementing secondary indexes and realtime search in Elliptics on top of Grape.

Grape’s ultimate goal is to be a platform for every kind of realtime processing task. To that end we are developing technology for guaranteed event processing, pipeline restart, event order preservation and so on. In realtime search this would look roughly as follows: uploading a new document to the Elliptics distributed storage emits a new event, that event triggers the whole indexing pipeline (stemming, language detection, inverted index updates and so on), and if one of those steps fails, we restart indexing from the failed step and proceed.
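The restart-from-the-failed-step idea can be sketched with a per-document checkpoint that records how many steps have completed. This is not Grape's API: the handler name, the toy language detector and "stemmer", and the in-memory checkpoint and index dicts are all hypothetical, and the sketch assumes the partially processed document is passed again on retry (a real system would persist intermediate results).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical in-memory stand-ins for persistent state.
inverted_index: Dict[str, Set[str]] = defaultdict(set)
checkpoints: Dict[str, int] = {}  # doc id -> number of completed steps


def detect_language(doc: dict) -> None:
    doc["lang"] = "en"  # placeholder: a real detector would inspect the text


def stem(doc: dict) -> None:
    # Toy "stemmer": lowercase and strip trailing "s" characters.
    doc["terms"] = [w.lower().rstrip("s") for w in doc["text"].split()]


def update_index(doc: dict) -> None:
    for term in doc["terms"]:
        inverted_index[term].add(doc["id"])


STEPS: List[Tuple[str, Callable[[dict], None]]] = [
    ("detect_language", detect_language),
    ("stem", stem),
    ("update_index", update_index),
]


def on_document_uploaded(doc: dict, steps=STEPS) -> None:
    """Event handler: run the steps, resuming after the last completed one."""
    for i in range(checkpoints.get(doc["id"], 0), len(steps)):
        _name, step = steps[i]
        step(doc)                       # if this raises, the checkpoint stays at i
        checkpoints[doc["id"]] = i + 1  # record progress only after success
```

If `update_index` fails, re-delivering the same event skips language detection and stemming and retries only the index update.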