First elliptics network HTTP frontend benchmark results


We have been running performance tests for quite a while, but the system was not yet tuned for maximum performance. I wanted to wait and publish the best numbers, but that has been postponed a bit, so I will post what we already have now and follow up with another set in a day or so.

The HTTP proxy was configured as an elliptics node that performed a node lookup for every incoming request, so it returned a small XML document with download information rather than the object itself. We set up a single lighttpd process and 100 FastCGI daemons. It ran on a 2-way 32-bit Xeon machine (2 physical + 2 HT processors) with 8 GB of RAM.

The elliptics storage network consisted of 8 nodes on 4 physical servers, each with 8 64-bit Xeon cores and 8 GB of RAM. Each elliptics node ran on top of its own SCSI disk and was configured with the file storage backend, i.e. each written object was stored as a separate file. I used reiserfs because of its good performance for this workload.
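To make the "one object, one file" backend a bit more concrete, here is a minimal Python sketch. The layout is purely an assumption for illustration: the object ID is taken as the SHA-1 hex digest of the key, and files are fanned out into subdirectories named after the first two hex characters. This is not necessarily the real elliptics on-disk layout, and `object_path`, `write_object` and `read_object` are hypothetical names. It mainly shows why the filesystem's handling of many small files matters for this workload.

```python
# Hypothetical "one object = one file" layout, for illustration only:
# SHA-1 of the key as object ID, 2-hex-char directory fan-out. This is not
# necessarily how the elliptics file backend lays out its files.
import hashlib
import tempfile
from pathlib import Path


def object_path(root: Path, key: str) -> Path:
    """Map a key to its file: <root>/<first 2 hex chars>/<full hex ID>."""
    oid = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return root / oid[:2] / oid


def write_object(root: Path, key: str, data: bytes) -> Path:
    """Store one object as one file: write a temp file, then rename it."""
    path = object_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)  # rename(2) within one directory, atomic on POSIX
    return path


def read_object(root: Path, key: str) -> bytes:
    """Read an object back; raises FileNotFoundError if it was never written."""
    return object_path(root, key).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = write_object(Path(d), "photo.jpg", b"x" * 10_240)  # ~10 KB object
        print(p.relative_to(d))
        print(len(read_object(Path(d), "photo.jpg")))
```

Each write creates one directory entry and one inode, so a run with thousands of small objects stresses exactly the small-file paths of the filesystem.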

Everything was connected by a 1 Gbit Ethernet network.

I do not know what the client software is, but it provides detailed reply-time statistics and can draw nice graphs (in Flash, though).

So, the first results: reply time and rate. We maxed out at 5k rps, and up to 4k rps the behaviour was very good. At 5k rps the reply time started to degrade, although the system was still able to keep up with the request rate. At this point the single lighttpd process could not dispatch requests fast enough: it got close to 100% CPU usage, while the elliptics FastCGI processes idled at 3-5% CPU at most. That is why I want to rerun this test with more lighttpd processes (namely 4, for 2 CPUs + 2 HT CPUs).

The first graph shows the number of replies per second over time (together with the number of requests per second). In the legend, green is the momentary number of replies, blue is the median number of replies, and red is the load in rps. The second graph shows reply time: dark blue is the momentary reply time in ms, orange is the median reply time, and red is the load in rps.


Number of replies and reply time

The next graphs show the reply time distribution and the HTTP reply status codes as a function of workload. The first one shows the workload scheme and the reply time distribution in ms; the second one shows the HTTP status code distribution.


Reply time and HTTP status code distribution
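The statistics in these graphs can be derived from raw per-request samples roughly as follows. This is a hypothetical Python sketch: the sample format `(timestamp_s, reply_ms, status)`, the bucket edges and all function names are assumptions, since I do not know how the client software actually computes its numbers.

```python
# Hypothetical sketch: the sample format and bucket edges are assumptions,
# not the format used by the (unknown) load-testing client.
from bisect import bisect_right
from collections import Counter
from statistics import median

# One sample per reply: (timestamp in seconds, reply time in ms, HTTP status).
Sample = tuple[float, float, int]


def replies_per_second(samples: list[Sample]) -> dict[int, int]:
    """Count replies falling into each whole second (the "momentary" rate)."""
    return dict(Counter(int(ts) for ts, _, _ in samples))


def median_reply_ms(samples: list[Sample]) -> float:
    """Median reply time in milliseconds over all samples."""
    return median(ms for _, ms, _ in samples)


def reply_time_histogram(samples: list[Sample], edges_ms: list[float]) -> list[int]:
    """Bucket i counts replies with edges_ms[i-1] <= time < edges_ms[i];
    the last bucket counts everything >= edges_ms[-1]. edges_ms must be sorted."""
    counts = [0] * (len(edges_ms) + 1)
    for _, ms, _ in samples:
        counts[bisect_right(edges_ms, ms)] += 1
    return counts


def status_code_counts(samples: list[Sample]) -> dict[int, int]:
    """How many replies carried each HTTP status code."""
    return dict(Counter(status for _, _, status in samples))


samples = [(0.1, 3.0, 200), (0.5, 12.0, 200), (1.2, 7.0, 200), (1.9, 55.0, 502)]
print(replies_per_second(samples))                 # {0: 2, 1: 2}
print(median_reply_ms(samples))                    # 9.5
print(reply_time_histogram(samples, [5, 10, 50]))  # [1, 1, 1, 1]
print(status_code_counts(samples))                 # {200: 3, 502: 1}
```

A median is less sensitive to a few slow outliers than a mean, which is presumably why the graphs show it next to the momentary values.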

This was a redirect test, i.e. the elliptics node on the proxy server did not download data. It only asked remote nodes whether they contained the requested object and returned a small formatted XML document with download information, which the client could parse to build a direct URL to the data object to be fetched. The proxy server also generated cookies and secure authentication codes.
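On the client side, turning such a redirect reply into a direct URL could look like the Python sketch below. The XML element names (`host`, `port`, `path`, `signature`, `cookie`), the `sig` query parameter and the `direct_url` function are invented for illustration; the actual reply format produced by the proxy is not described here.

```python
# Hypothetical client-side sketch: the XML element names and the "sig"
# query parameter are invented for illustration, not the proxy's real format.
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode, urlunsplit


def direct_url(reply_xml: str) -> tuple[str, Optional[str]]:
    """Build a direct download URL from a redirect reply.

    Returns (url, cookie); cookie is None if the reply carries none.
    """
    root = ET.fromstring(reply_xml)
    host = root.findtext("host")
    path = root.findtext("path")
    if not host or not path:
        raise ValueError("redirect reply lacks host or path")
    port = root.findtext("port", default="80")
    signature = root.findtext("signature")
    # Pass the authentication code along as a query parameter (assumption).
    query = urlencode({"sig": signature}) if signature else ""
    url = urlunsplit(("http", f"{host}:{port}", path, query, ""))
    return url, root.findtext("cookie")


reply = """<download>
  <host>10.0.0.5</host>
  <port>8080</port>
  <path>/data/1f3a</path>
  <signature>abc123</signature>
  <cookie>session=xyz</cookie>
</download>"""

url, cookie = direct_url(reply)
print(url)     # http://10.0.0.5:8080/data/1f3a?sig=abc123
print(cookie)  # session=xyz
```

The point of the redirect scheme is that the proxy only answers "where", so its per-request work stays small while the bulk data transfer goes directly to the storage node.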

The next task is to run a test where data is actually downloaded through the elliptics network HTTP proxy daemon. The plan is to upload several thousand small files (5-15 KB each) and fetch them through this proxy.

And of course, tune the server software for maximum performance. I want to get 10k rps in the redirect test from that rather old 2-way machine. I have another test machine with a fair 8 modern cores, where I can set up this proxy too, so those results will also be interesting.
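For a rough sense of what the 10k rps goal means for the front end, here is the arithmetic as a small Python helper. It assumes that one saturated lighttpd process handles about 5k rps (the observed limit) and that load is split evenly across worker processes; the function names are just illustrative.

```python
# Back-of-the-envelope CPU budget for the lighttpd front end.
# Assumptions: one saturated process handles ~5k rps (the observed limit)
# and requests are spread evenly across worker processes.


def per_request_budget_us(rps_per_process: float) -> float:
    """CPU time in microseconds one fully busy process can spend per request."""
    return 1_000_000 / rps_per_process


def required_rps_per_process(target_rps: float, processes: int) -> float:
    """Rate each process must sustain to reach target_rps with an even split."""
    return target_rps / processes


observed = per_request_budget_us(5_000)
per_worker = required_rps_per_process(10_000, 4)
print(f"1 process at 5k rps: {observed:.0f} us per request")
print(f"10k rps over 4 workers: {per_worker:.0f} rps each, "
      f"{per_request_budget_us(per_worker):.0f} us per request")
```

A saturated single process leaves roughly 200 µs of CPU per request. Spreading 10k rps over 4 workers means 2.5k rps each, or about 400 µs per request per worker. If the per-request cost stays near 200 µs, the goal needs about two fully busy CPUs' worth of lighttpd time. The HT siblings are not full CPUs, and this ignores the CPU used by the FastCGI daemons and the kernel, so the real margin is smaller.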

Stay tuned!