Facebook’s scale is out of the radar for the vast majority of cases, but yet it is very interesting to lookup new ideas there.
Audience insight is a way to show us pages which were are about to like with higher probability. It is based on previously liked pages, gender, location and many other features.
In particular at Facebook’s scale it is about 35 Tb of raw data and query ‘give me several pages which should be shown to user X’ must be completed within hundreds of milliseconds. That requires to process hundreds of billions of likes and billions of pages – numbers beyond reasonable – and in fraction of a second.
It happens that 168 nodes can handle it with some magic like columnar storage, in-memory data, bitmap indexes, GPU and caches. Very interesting reading!