I decided to find out how NFS manages to get such fast random read performance (with such slow sequential reads), and started an 8 GB random IO test in IOzone. And the machine started to die. I have already killed it three times today, and the reason is likely in the NFS server. This is what
slabtop shows on the server:
 Active / Total Objects (% used)    : 4741969 / 4755356 (99.7%)
 Active / Total Slabs (% used)      : 201029 / 201049 (100.0%)
 Active / Total Caches (% used)     : 91 / 162 (56.2%)
 Active / Total Size (% used)       : 750871.15K / 753121.28K (99.7%)
 Minimum / Average / Maximum Object : 0.01K / 0.16K / 4096.00K

    OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 1798890 1798672  99%    0.12K  59963       30    239852K cred_jar
 1798320 1798307  99%    0.25K 119888       15    479552K size-256
 1091430 1091401  99%    0.05K  16290       67     65160K buffer_head
   18824   17997  95%    0.28K   1448       13      5792K radix_tree_node
The size-256 slab count grew constantly during the test, so I suppose there is a leak in the current kernel (IIRC there were no leaks in .28). I'm waiting for comments from Trond Myklebust.
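To watch the growth without keeping slabtop open, something like this can poll /proc/slabinfo directly. This is just a quick diagnostic sketch of mine, not part of any patch; the cache name default comes from the slabtop output above and the five-second interval is arbitrary:

/* slabwatch.c -- poll /proc/slabinfo and report how one cache grows.
 * A diagnostic sketch, not part of any kernel patch.
 * Build: gcc -Wall -o slabwatch slabwatch.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return num_objs for @cache, or -1 if not found. In the slabinfo 2.x
 * format the second numeric column is the total object count. */
static long cache_objs(const char *cache)
{
	FILE *f = fopen("/proc/slabinfo", "r");
	char line[512], name[64];
	long active, total;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%63s %ld %ld", name, &active, &total) == 3 &&
		    !strcmp(name, cache)) {
			fclose(f);
			return total;
		}
	}
	fclose(f);
	return -1;
}

int main(int argc, char *argv[])
{
	const char *cache = argc > 1 ? argv[1] : "size-256";
	long prev = -1;

	for (;;) {
		long cur = cache_objs(cache);

		if (cur < 0) {
			fprintf(stderr, "cache %s not found\n", cache);
			return 1;
		}
		if (prev >= 0)
			printf("%s: %ld objects (%+ld)\n", cache, cur,
			       cur - prev);
		prev = cur;
		sleep(5);	/* arbitrary polling interval */
	}
}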
Meanwhile I thought about how NFS is capable of higher random read throughput than sequential. My main theory is request combining on the client: when the system joins two random but close enough requests, the server sends not only the requested data but also the region between them, or some similar logic. Essentially this amounts to increased readahead on both the client and the server.
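To make the theory concrete, here is how I imagine the combining decision. All names and the 64k gap threshold are mine, not taken from the NFS client code:

/* Hypothetical request-combining sketch: if two reads land close enough
 * together, issue one request covering both plus the gap between them. */
#include <stdint.h>

struct io_req {
	uint64_t start;
	uint64_t len;
};

#define COMBINE_GAP_MAX	(64 * 1024)	/* arbitrary threshold */

/* Fill @out with a single request covering @a and @b and return 1 when
 * the extra data read for the gap stays under the threshold, else 0. */
static int try_combine(const struct io_req *a, const struct io_req *b,
		       struct io_req *out)
{
	uint64_t a_end = a->start + a->len;
	uint64_t b_end = b->start + b->len;
	uint64_t lo = a->start < b->start ? a->start : b->start;
	uint64_t hi = a_end > b_end ? a_end : b_end;

	/* The covering span minus the two requests is the wasted gap. */
	if (hi - lo > a->len + b->len + COMBINE_GAP_MAX)
		return 0;

	out->start = lo;
	out->len = hi - lo;
	return 1;
}

The trade-off is the usual one for readahead: a combined request wastes bandwidth on the gap, but saves a round trip, which pays off as long as the gap stays small relative to the request cost.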
If this theory is correct, then a simple way to get the same behaviour is to increase readahead in POHMELFS, or rather not to shrink it under certain conditions. I will try this idea…
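In the meantime, the effect is easy to approximate from userspace with Linux's readahead(2) syscall, which prefetches file data into the page cache. A sketch; the 256k window size is an arbitrary number of mine:

/* Hint the kernel to read a window around @offset before the application
 * actually reads there. readahead(2) is Linux-specific and best-effort. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

static void prefetch_window(int fd, off_t offset)
{
	const size_t window = 256 * 1024;	/* arbitrary window size */
	off_t start = offset > (off_t)(window / 2) ?
		      offset - (off_t)(window / 2) : 0;

	/* Ignore the return value: if the prefetch fails, the subsequent
	 * read() still works, just without the readahead benefit. */
	readahead(fd, start, window);
}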