ioremap.net

Storage and beyond

Python loving psto

I love languages with a rich standard library. Python is just awesome in this regard.

But the number of extensions already written is outstanding: I once parsed HTML using regexps in Lisp, but in Python with python-lxml it took just a couple of hours to parse rather broken HTML using XPath and a few small string-matching calls.
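
To give an idea of what that looks like, here is a minimal sketch, not the actual parser I wrote. The markup, the `word` and `junk` class names and the `extract_words` helper are all made up for illustration; only `lxml.html.fromstring` and `.xpath()` are real lxml calls. It assumes lxml is installed.

```python
from lxml import html

# Hypothetical, deliberately broken markup: unclosed <li> tags,
# no closing </ul>, </body> or </html>.
BROKEN = """
<html><body>
<ul>
<li class="word"> cat
<li class="junk">advert
<li class="word">house !!
"""


def extract_words(markup):
    """Pull the text of every <li class="word"> out of sloppy HTML."""
    # lxml's HTML parser is forgiving and builds a tree despite the missing tags.
    tree = html.fromstring(markup)
    words = []
    for text in tree.xpath('//li[@class="word"]/text()'):
        # "Small string matching calls": trim whitespace and trailing junk symbols.
        cleaned = text.strip().rstrip("!").strip()
        if cleaned:
            words.append(cleaned)
    return words


print(extract_words(BROKEN))
```

The XPath expression does the structural work, and plain string methods clean up whatever noise is left in the text nodes.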

I spent one day writing a parser for unstructured morphological data (frequently with unexpected junk symbols or extra tags inside) from aot.ru to build a fairly large (300k+ morphemes) Russian dictionary, store it in a prefix array and output it as an XML file.
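
A rough sketch of the storage part, under loud assumptions: "prefix array" is taken here to mean a sorted list in which entries sharing a prefix sit next to each other and can be found with binary search, and the `dictionary` and `morpheme` element names are invented for the example. The real data layout and XML schema may well differ.

```python
import bisect
import xml.etree.ElementTree as ET


def build_prefix_array(words):
    """Sorted, de-duplicated list: entries with a common prefix become adjacent."""
    return sorted(set(words))


def with_prefix(array, prefix):
    """Return all entries that start with `prefix`."""
    # Binary search finds the first entry that is >= prefix.
    start = bisect.bisect_left(array, prefix)
    end = start
    # Matching entries are contiguous in sorted order, so scan until one fails.
    while end < len(array) and array[end].startswith(prefix):
        end += 1
    return array[start:end]


def to_xml(array):
    """Serialize the array as XML (element names are hypothetical)."""
    root = ET.Element("dictionary")
    for word in array:
        ET.SubElement(root, "morpheme").text = word
    return ET.tostring(root, encoding="unicode")


array = build_prefix_array(["домик", "дом", "кот", "дом", "дорога"])
print(with_prefix(array, "дом"))
print(to_xml(array))
```

A sorted list plus `bisect` is about the simplest structure that supports prefix lookups; a trie would be the obvious alternative if memory or lookup speed mattered more.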

Yes, default CPython sucks with threads, and it is not (yet) suitable for trivial audio processing (play and stop a sound when pressing/releasing a key), but it is just bloody ubergood at high-level prototyping.

Returning to morphological analysis, I’m about to start rewriting my experimental knowledge extraction and grammar generation ‘engine’ from Lisp to Python. I expect to have some cool results with it soon.


2 Responses to “Python loving psto”

  • Anonymous says:

    Hey Evgeniy,
    this might be of interest to you: http://blip.tv/file/2232410 . An insightful talk about the GIL and CPython threading. If you haven’t seen it and are interested in the topic, it is an absolute must.

    Great to see you’re as productive as ever! BTW is there any chance you might change the license of libeblob to LGPL (or similar)?
    Johann

  • zbr says:

    It is quite long, a whole hour, but I suppose I must watch it. Thanks for the link.

    I see no problem with relicensing libeblob, although it uses the list.h header from the kernel source tree, which is GPL-licensed. I suppose I can relicense my work and replace the header if there are complaints…