Elliptics python

I extended the C++ and Python bindings for the elliptics library, although the Python part was a little messy at first.

Python is a massively ... single-threaded language: the GIL is a tricky global lock monster that makes it hard to implement not only threads but also async communications. Of course Python has threads, but they are internal entities that cannot easily be worked with from outside system threads.

The elliptics network library, on the contrary, is a multi-threaded application, and the main problem with Python was its async completion notifications. While a transaction is being processed or when it finishes, the remote side can send multiple replies about its state (chunks of data being read, for example), and these are processed in a different thread than the one that originally sent the request.
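The threading issue described above can be mimicked in pure Python. This is only a sketch written for Python 3: start_async_read() is a hypothetical stand-in for a network read, not an elliptics call, and the two byte chunks are made up for illustration. It shows that the per-chunk callback runs in a worker thread, not in the thread that started the request.

```python
import threading

def start_async_read(chunks, on_chunk, on_complete):
    """Hypothetical async read: each reply is delivered from a worker thread."""
    def worker():
        for chunk in chunks:
            on_chunk(chunk)   # one notification per reply chunk
        on_complete()         # final completion notification
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

received = []
callback_threads = set()
done = threading.Event()

def on_chunk(chunk):
    # Runs in the worker thread, not in the thread that sent the request.
    received.append(chunk)
    callback_threads.add(threading.get_ident())

worker_thread = start_async_read([b"12345", b"67890"], on_chunk, done.set)
done.wait()            # the sending thread waits for the completion notification
worker_thread.join()
data = b"".join(received)
```

In the real bindings the worker is a native thread created by the C++ library, which is where the extra GIL handling comes from.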

Python does not expect to be interrupted by those callbacks (even if we properly wrap them into Python classes). But we can still invoke async Python callbacks from C/C++ code and external threads, although it could be called a hack.

Python may have multiple execution threads, or states, and at startup we have to select the one that will be used to invoke our C++ callbacks. In older Python versions this took quite a bit of effort: stack selection, saving it somewhere in private data, switching to and from it, and so on. In newer Python versions it is as simple as calling PyEval_InitThreads(). The Python thread that called it first will be selected as the one to dispatch external callbacks. Then just doing

PyGILState_STATE st = PyGILState_Ensure();
this->get_override("some_virtual_callback_invoked_from_cpp")(its, data);
PyGILState_Release(st);

will schedule the callback invocation from C++. The PyGILState_Ensure()/PyGILState_Release() pair takes care of the thread state and the GIL.

And when I finally managed to implement all the wrappers and helpers for async bidirectional C++-to-Python communication, I dropped its support, simply because it is much simpler to read and write data using blocking calls, which I believe is the most common Python programming model.
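As a rough illustration of why blocking calls are easier on the Python side, here is a minimal Python 3 sketch that hides a callback-style API behind a blocking function. async_fetch() and blocking_fetch() are hypothetical names invented for this example, and the reply payload is made up. The bindings themselves do the blocking inside the C++ layer.

```python
import threading

def async_fetch(key, on_complete):
    """Hypothetical callback-style read: the reply arrives in a worker thread."""
    def worker():
        on_complete(b"data for " + key)
    threading.Thread(target=worker).start()

def blocking_fetch(key, timeout=5.0):
    """Wait for the completion callback so the caller sees a plain blocking call."""
    done = threading.Event()
    result = []

    def on_complete(data):
        result.append(data)
        done.set()

    async_fetch(key, on_complete)
    if not done.wait(timeout):
        raise TimeoutError("no reply within %.1f seconds" % timeout)
    return result[0]

reply = blocking_fetch(b"some-key")
```

The caller never sees the second thread, so no callback classes or thread-state handling leak into user code.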

This is how it works in Python now:

#!/usr/bin/python

from libelliptics_python import *
from array import *
import sys

id = array('B')
for x in xrange(0, 20) :
	id.append(x + 1)

trans = array('B')
for x in xrange(0, 20) :
	trans.append(1)

try:
	log = elliptics_log_file("/dev/stderr", 15)
	n = elliptics_node_python(id.buffer_info()[0], log)

	t = elliptics_transform_openssl("sha1")

	n.add_transform(t)
	# weird thing happens if I write n.add_transform(elliptics_transform_openssl("sha1"))
	# we crash somewhere inside c++ binding, probably because I implemented lazy
	# reference counting model (i.e. not at all :)
	# thus object MUST live after this function is completed
	# this should be fixed of course with proper copy constructors
	# the same applies to logger actually

	n.add_remote("devfs8", 1025)

	#n.write_file(trans.buffer_info()[0], "/tmp/test_file", 0, 0, 0)
	#n.read_file(trans.buffer_info()[0], "/tmp/test_file.read", 0, 0)

	data = array('B', "1234567890")
	n.write_data(trans.buffer_info()[0], data.buffer_info()[0], 0, data.buffer_info()[1])

	read = array('B')
	for x in xrange(0, len(data)) : read.append(0)

	n.read_data(trans.buffer_info()[0], read.buffer_info()[0], 0, read.buffer_info()[1])

	for x in xrange(0, len(data)) :
		print data[x], " ", read[x]
except:
	print "Ooops, error:", sys.exc_info()[0]

$ ./test.py  # written and read data from example above
49   49
50   50
51   51
52   52
53   53
54   54
55   55
56   56
57   57
48   48
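The numbers in that output are the byte values of the ASCII characters "1234567890", and buffer_info() is what hands the raw memory address and element count to the bindings. The short Python 3 sketch below uses only the standard array and ctypes modules (no elliptics code) to show both points. ctypes.string_at() stands in for what a C/C++ function would do with the pointer.

```python
import ctypes
from array import array

# Python 3 needs a bytes initializer here; the Python 2 example above passes a str.
data = array('B', b"1234567890")

# buffer_info() returns (memory address, number of elements).
address, length = data.buffer_info()

# Read the same memory back through the raw address, as native code would.
raw = ctypes.string_at(address, length)

# Each element is an unsigned byte: '1' is 49, ..., '9' is 57, '0' is 48.
codes = list(data)
```

The address is only valid while the array object is alive and not resized, which is the same lifetime caveat the comments in the example mention for the transform and logger objects.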

I also finished proper object copying for the logger: the binding now clones the logger, and once the proper methods are implemented one will be able to create private loggers written in Python. But those are details.

To date I consider the Python bindings, as well as the C++ ones, fully finished. The C++ bindings have async callbacks as well as blocking sync IO operations.

You know, once upon a time I almost posted links to Libatomic to the Python-Dev list. I thought "hey, one of the reasons Python has to be careful with threads is the reference counting, maybe atomic INCREF/DECREF would help."

It looks like one of the core developers has since taken a look at it:
http://mail.python.org/pipermail/python-ideas/2009-November/006599.html

Still, I hope you get curious and take a look at Python's internals. The worst that could happen would be nothing ;)