It turned out that my previous idea of using socket buffers and VFS pages was badly wrong, mainly because of the transactional nature of POHMELFS. A transaction must stay in memory until the remote server acknowledges its data.
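To make the constraint concrete, here is a minimal toy model in userspace Python, not kernel code. The names `Transaction`, `TransactionLog`, `start`, `ack` and `unacked` are hypothetical and only illustrate the rule that a payload can be dropped only once the server has acknowledged it.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """One in-flight write; the payload is kept so it can be resent."""
    tid: int
    offset: int
    data: bytes


@dataclass
class TransactionLog:
    """Holds every transaction until the server acknowledges it."""
    pending: dict = field(default_factory=dict)
    next_tid: int = 0

    def start(self, offset, data):
        # Keep an immutable snapshot of the payload for possible resends.
        trans = Transaction(self.next_tid, offset, bytes(data))
        self.pending[trans.tid] = trans
        self.next_tid += 1
        return trans

    def ack(self, tid):
        # Only an acknowledgement lets us drop the payload.
        return self.pending.pop(tid, None)

    def unacked(self):
        # Candidates for resending after a timeout or other error.
        return list(self.pending.values())
```

Anything still returned by `unacked()` is data the client may have to send again, which is why the memory behind it cannot simply be reused.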
But what happens when a second write is about to update the same area? We cannot overwrite the data, since we would then lose the previous transaction and have no way to resend it, or to store it elsewhere, on a timeout or other error. Instead we should allocate a new buffer and copy the data there. But this is not that simple, since we have to update the VFS page cache, and thus evict the previous page first. Also, all pages have to be linked somehow, so that when a transaction is committed, the appropriate pages can be freed.
Other filesystems, namely btrfs, wait until writeback is over on the page about to be overwritten. This may or may not be a good idea for an overwrite workload, and I actually expect it to be a bad one, especially for high-latency storage, but it is noticeably simpler to implement. Buffer heads, used to track partial page updates, are quite heavy and not really needed in my case, so I will implement trivial tags attached to pages. When an overwrite is about to happen, the system will wait for the pages in question to be flushed to the remote server, and then overwrite them in place, creating a new transaction.
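The wait-then-overwrite scheme could look roughly like the sketch below. It is a userspace Python model under stated assumptions: `Page`, `Inode`, the `tag` field and the `timeout` parameter are all hypothetical, and a `threading.Event` stands in for whatever wait mechanism the kernel code would really use.

```python
import threading


class Page:
    """Toy page: a buffer plus a tag naming the transaction that covers it."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.tag = None                  # id of the unacked transaction, if any
        self.clean = threading.Event()   # set while no transaction is in flight
        self.clean.set()


class Inode:
    def __init__(self):
        self.pages = {}      # page index -> Page
        self.next_tid = 0

    def write(self, index, offset, data, timeout=None):
        page = self.pages.setdefault(index, Page())
        if offset + len(data) > len(page.data):
            raise ValueError("write crosses the page boundary")
        # Overwrite of a tagged page: stall until the old transaction is acked.
        if not page.clean.wait(timeout):
            raise TimeoutError("previous transaction still unacknowledged")
        page.data[offset:offset + len(data)] = data   # overwrite in place
        page.tag = self.next_tid                      # new transaction owns the page
        page.clean.clear()
        self.next_tid += 1
        return page.tag

    def ack(self, index, tid):
        page = self.pages[index]
        if page.tag == tid:
            page.tag = None
            page.clean.set()    # wake any writer stalled in write()
```

The cost of this simplicity is visible in `write()`: a second write to the same page blocks for a full round trip to the server, which is exactly the concern with high-latency storage.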
These tags are needed for the usual writeback: we will not really write data at writeback time; instead we will find the transactions that refer to the given page and resend them. In the perfect case, which I expect to happen most of the time, there should be no such stalled transactions at all, since they will be acked soon after write time, when we send the data to the server. It is still possible that acks do not arrive quickly, though, so writeback can still fire for the inode.
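As a rough illustration of writeback as "resend, don't write", here is another toy Python model. Everything in it is an assumption made for the example: the `send` callable, the `tags` map from page index to transaction id, and the method names are hypothetical, not POHMELFS interfaces.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    tid: int
    pages: frozenset   # indexes of the pages this transaction covers
    data: bytes


class Inode:
    def __init__(self, send):
        self.send = send     # hypothetical callable that transmits a transaction
        self.unacked = {}    # tid -> Transaction, kept until the server acks
        self.tags = {}       # page index -> tid of the transaction covering it

    def write(self, tid, pages, data):
        trans = Transaction(tid, frozenset(pages), data)
        self.unacked[tid] = trans
        for index in trans.pages:
            self.tags[index] = tid
        self.send(trans)     # data goes out at write time, not at writeback

    def ack(self, tid):
        trans = self.unacked.pop(tid, None)
        if trans is not None:
            for index in trans.pages:
                if self.tags.get(index) == tid:
                    del self.tags[index]

    def writeback(self, dirty_pages):
        # No page data is written here: only stalled transactions are resent.
        stalled = {self.tags[i] for i in dirty_pages if i in self.tags}
        for tid in sorted(stalled):
            self.send(self.unacked[tid])
        return sorted(stalled)
```

In the common case every transaction has been acked before writeback runs, so `writeback()` finds no tags and sends nothing.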
That’s the plan; now back to the drawing board to actually find out how pages should be attached to transactions… Stay tuned!