pohmelfs: network raid1 example
Pohmelfs configuration is actually trivial:
# mount -t pohmelfs -o "server=172.16.136.1:1025:2,fsid=xxx,groups=3:2:1,noatime,noreadcsum,successful_write_count=1,sync_timeout=600,readdir_allocation=5" none /mnt/
where the ‘server’ mount option specifies an IP address in the form address:port:family (2 – ipv4, 6 – ipv6). It is OK to specify only a subset of all cluster IP addresses – pohmelfs will download the route table itself; it only needs at least one alive node at connection time to discover the other nodes.
The ‘groups’ mount option specifies the groups you want to write data into. A group is a kind of replica ID.
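The ‘server’ and ‘groups’ option formats above can be sketched in a few lines of python – the helper names here are mine for illustration, not part of pohmelfs:

```python
# Hypothetical helpers: parse the pohmelfs 'server' and 'groups' mount
# option values described above. Not part of pohmelfs itself.

def parse_server(opt):
    """Split 'address:port:family' into its components.

    rsplit from the right keeps this working even for IPv6 addresses,
    which contain colons themselves.
    """
    addr, port, family = opt.rsplit(':', 2)
    return addr, int(port), int(family)  # family: 2 - ipv4, 6 - ipv6

def parse_groups(opt):
    """Split 'g1:g2:...' into a list of group (replica) IDs."""
    return [int(g) for g in opt.split(':')]

print(parse_server('172.16.136.1:1025:2'))  # ('172.16.136.1', 1025, 2)
print(parse_groups('3:2:1'))                # [3, 2, 1]
```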
That’s all for pohmelfs.
Let’s configure elliptics and create 2 groups (with IDs 2 and 3, for example), which will store essentially identical replicas (they may differ, since writes can be unordered, or one group may be down for some time).
There are 2 configuration files – the elliptics server config (let’s call it ioserv.conf) and the server-side script environment (we use python, so it is python.init).
Here is ioserv.conf.
I will highlight the parameters which differ between the groups.
# log file
# set to 'syslog' without inverted commas if you want elliptics to log through syslog
log = syslog

# log mask
#log_mask = 10
log_mask = 15

# specifies whether to join storage network
join = 1

# config flags
# bits start from 0, bit 0 is unused (it is actually the above join flag)
# bit 1 - do not request remote route table
# bit 2 - mix states before read operations according to state's weights
# bit 3 - do not checksum data on upload and check it during data read
# bit 4 - do not update metadata at all
# bit 5 - randomize states for read requests
flags = 4

# node will join nodes in this group
group = 2

# list of remote nodes to connect
# address:port:family where family is either 2 (AF_INET) or 6 (AF_INET6)
# address can be host name or IP
remote = 172.16.136.1:1025:2 172.16.136.2:1025:2

# local address to bind to
# port 0 means random port
#addr = localhost:1025:2
addr = 172.16.136.1:1025:2

# wait timeout specifies number of seconds to wait for command completion
wait_timeout = 60

# this timeout specifies number of seconds to wait before killing
# an unacked transaction
check_timeout = 60

# number of IO threads in processing pool
io_thread_num = 64

# number of IO threads in processing pool dedicated to nonblocking operations
# they are invoked from recursive commands like DNET_CMD_EXEC, when a script
# tries to read/write some data using the same id/key as in the original exec command
nonblocking_io_thread_num = 32

# number of threads in network processing pool
net_thread_num = 64

# specifies history environment directory
# it will host a file with generated IDs
# and server-side execution scripts
history = /opt/elliptics/history.2

# specifies whether to go into background
daemon = 1

# authentication cookie
# if this string (32 bytes long max) does not match on server nodes,
# a new node can not join and serve IO
auth_cookie = qwerty

# Background jobs (replica checks and recovery) IO priorities
# ionice for background operations (disk scheduler should support it)
# class - number from 0 to 3
# 0 - default class
# 1 - realtime class
# 2 - best-effort class
# 3 - idle class
bg_ionice_class = 3
# prio - number from 0 to 7, sets priority inside class
bg_ionice_prio = 0

# IP priorities
# man 7 socket for IP_PRIORITY
# server_net_prio is set for all joined (server) connections
# client_net_prio is set for other connections
# only turned on when non-zero
server_net_prio = 1
client_net_prio = 6

# anything below this line will be processed
# by the backend's parser and will not be able to
# change global configuration

# backend can be 'filesystem' or 'blob'
backend = blob

# zero here means 'sync on every write'
# a positive number means data and metadata updates
# are synced every @sync seconds
sync = 300

# eblob objects prefix. System will append .NNN and .NNN.index to new blobs
data = /opt/elliptics/eblob.2/data

# Maximum blob size. A new file will be opened after the current one
# grows beyond the @blob_size limit
# Supports K, M and G modifiers
blob_size = 500G

# Maximum number of records in blob.
# When the number of records reaches this level,
# the blob is closed and a sorted index is generated.
# Its meaning is similar to @blob_size above,
# except that it operates on records and not bytes.
records_in_blob = 10000000
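The ‘flags’ value above is a plain bitmask over the documented bits – flags = 4 sets only bit 2 (mix states by weight). A minimal sketch of how such a value is composed; the helper is illustrative, not part of elliptics:

```python
# Illustrative helper (not part of elliptics): compose the ioserv.conf
# 'flags' bitmask from the bit numbers documented in the config comments.
# bit 1 - no remote route table, bit 2 - mix states by weight,
# bit 3 - skip checksums, bit 4 - no metadata updates, bit 5 - randomize reads

def make_flags(*bits):
    value = 0
    for bit in bits:
        value |= 1 << bit
    return value

print(make_flags(2))     # 4, the value used in the config above
print(make_flags(2, 3))  # 12, mix states and skip checksums
```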
Our second replica will live in group 3, so you should change the ‘group’ parameter above to 3, as well as the node’s address and optionally the ‘remote’ parameter, which is the list of nodes to connect to. It can include the local address itself.
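As a sketch, the group-3 config can be derived from the group-2 one by rewriting the differing parameters. The exact substitution set below (addresses, history and data paths) is an assumption based on the config shown above – adjust it for your own cluster:

```python
# Hypothetical sketch: derive the group-3 ioserv.conf from the group-2 one
# by rewriting the parameters that differ between replicas. The paths and
# addresses follow the example config above and are assumptions.

def make_group3_conf(group2_conf):
    replacements = {
        'group = 2': 'group = 3',
        'addr = 172.16.136.1:1025:2': 'addr = 172.16.136.2:1025:2',
        'history = /opt/elliptics/history.2': 'history = /opt/elliptics/history.3',
        'data = /opt/elliptics/eblob.2/data': 'data = /opt/elliptics/eblob.3/data',
    }
    for old, new in replacements.items():
        group2_conf = group2_conf.replace(old, new)
    return group2_conf

sample = 'group = 2\naddr = 172.16.136.1:1025:2\n'
print(make_group3_conf(sample))
```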
The second configuration file is python.init.
It must live in the directory specified in the ‘history’ parameter above.
You should put all srw/pohmelfs* scripts into the ‘history’ path too.
import sys
sys.path.append('/tmp/dnet/lib')
sys.path.append('/opt/elliptics/history.2')

from libelliptics_python import *

# groups used in metadata write
pohmelfs_groups = [1, 2, 3]

pohmelfs_log_file = '/opt/elliptics/history.2/python.log'

log = elliptics_log_file(pohmelfs_log_file, 10)
n = elliptics_node_python(log)

# we should only add our own local group, since we do not want all updates
# to be repeated for all groups
# this should be changed to 3 for group number 3
n.add_groups([2])

# this is the IP address of the local node, i.e. the server, which belongs to group 2
# you may specify multiple addresses with multiple calls
n.add_remote('172.16.136.1', 1025)

__return_data = 'unused'

import gc
import struct

# python sstable implementation
from sstable2 import sstable

import logging
FORMAT = "%(asctime)-15s %(process)d %(script)s %(dentry_name)s %(message)s"
logging.basicConfig(filename=pohmelfs_log_file, level=logging.DEBUG, format=FORMAT)

pohmelfs_offset = 0
pohmelfs_size = 0

# do not check csum
#pohmelfs_ioflags_read = 256
pohmelfs_ioflags_read = 0
pohmelfs_ioflags_write = 0

# do not lock operation, since we are 'inside' DNET_CMD_EXEC command already
pohmelfs_aflags = 16

pohmelfs_column = 0
pohmelfs_link_number_column = 2
pohmelfs_inode_info_column = 3

pohmelfs_group_id = 0

def pohmelfs_write(parent_id, content):
    n.write_data(parent_id, content, pohmelfs_offset, pohmelfs_aflags, pohmelfs_ioflags_write)
    n.write_metadata(parent_id, '', pohmelfs_groups, pohmelfs_aflags)
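The magic numbers in the script above are bitmask values; restated in pure python (the symbolic names here are my guesses – the numeric values and their meanings come from the script's own comments):

```python
# Pure-python restatement of the flag values used in python.init above.
# The names are illustrative, not taken from libelliptics_python headers;
# the values and meanings come from the comments in the script itself.

IOFLAGS_NOCSUM = 256  # skip checksum verification on read (bit 8)
AFLAGS_NOLOCK = 16    # do not lock, we are already inside DNET_CMD_EXEC (bit 4)

print(IOFLAGS_NOCSUM == 1 << 8)  # True
print(AFLAGS_NOLOCK == 1 << 4)   # True
```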
Putting this initialization script (you may edit the one in the source tree) together with the pohmelfs scripts adds support for server-side scripts executed within the above context.
There is a pool of processes which pick up execution contexts to run your requests.
That’s it, feel free to ask if you hit any problem!