
Using reinforcement learning to find maze exit

This task may look rather straightforward (just use A* or a similar algorithm), but real life can make the problem much more complex: you have to find the exit from a maze without knowing where it is.

You can only observe the maze (or grid world, a term widely used in the RL community) at the point where you are at the moment or, to some extent, at points where you have been before. You know neither the size of the maze nor where the exit is. This makes the A* family of algorithms useless, since they assume a known goal.

The grid world environment gives you a reward, for example -1 for every step and +100 for reaching the exit, and it can also suddenly move you or affect you in some other way (think of a sudden gust of wind pushing your helicopter, or a road slope changing how your car moves).
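A minimal sketch of such an environment, assuming a rectangular grid given as strings (`#` for walls, `E` for the exit), the -1 / +100 rewards mentioned above, and a hypothetical `wind` probability with which a gust replaces the chosen move by a random one. `WindyGridWorld` and its parameters are illustrative names, not part of the author’s solver. Note that `step` reveals only the next cell and the reward, which matches the limited observation described earlier.

```python
import random

# Moves as (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class WindyGridWorld:
    """Hypothetical maze environment: '#' is a wall, 'E' is the exit."""

    def __init__(self, rows, wind=0.1, seed=None):
        self.grid = [list(row) for row in rows]
        self.wind = wind  # probability that a gust replaces the chosen move
        self.rng = random.Random(seed)

    def is_free(self, cell):
        r, c = cell
        return (0 <= r < len(self.grid) and 0 <= c < len(self.grid[r])
                and self.grid[r][c] != "#")

    def step(self, cell, action):
        """Return (next_cell, reward, done); the agent observes nothing else."""
        if self.rng.random() < self.wind:
            action = self.rng.choice(list(ACTIONS))  # sudden wind blow
        dr, dc = ACTIONS[action]
        nxt = (cell[0] + dr, cell[1] + dc)
        if not self.is_free(nxt):
            nxt = cell  # walls and borders keep the agent in place
        if self.grid[nxt[0]][nxt[1]] == "E":
            return nxt, 100, True
        return nxt, -1, False


world = WindyGridWorld(["....", ".##.", "...E"], wind=0.0)
print(world.step((2, 2), "right"))  # ((2, 3), 100, True)
```

With `wind=0.0` the world is deterministic; raising it makes the same action lead to different cells, which is exactly what a learning agent has to cope with.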

I created a rather simple reinforcement learning solver for this kind of world. Solving a maze with it still takes exponential time, and it needs quite a lot of steps to converge, but this is only the beginning.

Every path the agent takes is added to a global convergence map using the Bellman equation; the map needs about 100k random episodes to converge (the arrows show the steepest path to the most rewarding destination discovered so far). Since every episode runs from start to finish, following either the steepest path learned so far or a random exploration path, this is equivalent to a Monte Carlo method. It converges rather slowly and should be replaced with the temporal difference algorithm TD(λ), which I expect to be about 100 times faster.
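The Monte Carlo scheme described above can be sketched in a few dozen lines. This is a hedged illustration, not the author’s solver: it assumes a tiny deterministic maze (no wind), first-visit return averaging, ε-greedy exploration and hypothetical parameters (`epsilon=0.2`, discount `gamma=0.95`, `episodes=2000`, a 100-step cap per episode). Each episode’s return is accumulated backwards with G = r + γG, the same recursion the Bellman equation is built on, and the greedy action in each cell plays the role of the arrows on the map.

```python
import random
from collections import defaultdict

# Hypothetical maze: S = start, E = exit, # = wall (wind omitted for brevity).
MAZE = ["S..#",
        ".#..",
        "...E"]
START = (0, 0)
ACTIONS = {"^": (-1, 0), "v": (1, 0), "<": (0, -1), ">": (0, 1)}


def step(cell, action):
    """Deterministic move with -1 per step and +100 for the exit."""
    r, c = cell[0] + ACTIONS[action][0], cell[1] + ACTIONS[action][1]
    if not (0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])) or MAZE[r][c] == "#":
        r, c = cell  # walls and borders keep the agent in place
    if MAZE[r][c] == "E":
        return (r, c), 100, True
    return (r, c), -1, False


def greedy(q, cell):
    """Action with the highest estimated value (the 'arrow' for this cell)."""
    return max(ACTIONS, key=lambda a: q[(cell, a)])


def train(episodes=2000, epsilon=0.2, gamma=0.95, max_steps=100, seed=1):
    """First-visit Monte Carlo control with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)     # Q(cell, action): mean return observed so far
    counts = defaultdict(int)  # how many returns were averaged into each Q
    for _ in range(episodes):
        cell, trajectory = START, []
        for _ in range(max_steps):  # play one whole episode
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # random exploration
            else:
                action = greedy(q, cell)            # best path learned so far
            nxt, reward, done = step(cell, action)
            trajectory.append((cell, action, reward))
            cell = nxt
            if done:
                break
        g, returns = 0.0, {}
        for s, a, r in reversed(trajectory):
            g = r + gamma * g    # discounted return from this step on
            returns[(s, a)] = g  # earlier visits overwrite later: first-visit MC
        for key, ret in returns.items():
            counts[key] += 1
            q[key] += (ret - q[key]) / counts[key]  # incremental average
    return q


def arrow_map(q):
    """Render the greedy action of every free cell, like the arrows on the map."""
    return "\n".join(
        "".join(ch if ch in "#E" else greedy(q, (r, c)) for c, ch in enumerate(line))
        for r, line in enumerate(MAZE)
    )


print(arrow_map(train()))
```

Because every estimate is an average over complete episodes, nothing is learned until an episode ends; TD methods such as TD(λ) update their estimates during the episode, which is the usual reason they converge faster.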

Stay tuned!

Crafting knowledge base the right way

Google is automatically building its next generation knowledge graph named Knowledge Vault

Although the article is very pop-science (not really science at all) and contains no technical details, it makes Google’s idea clear, as well as the direction information retrieval systems are heading. Automatic knowledge gathering and fact extraction is also what I originally aimed for at Reverbrain, although my idea was much simpler: I wanted to automatically build a language model and factual relations between words in order to understand natural language questions.

On Aug 25 there will be a presentation of Google’s Knowledge Vault. I’m very tempted to watch it and try to gather and understand bits of information on how it is implemented internally.

Update: here is a paper on Knowledge Vault: “Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion”.

Machine learning, optimization and event stream management

This is a somewhat different post: it is not about storage per se, but in the end it actually is.

Let me start from the other side. I’ve read the Netflix article about how they built their excellent recommendation service (http://techblog.netflix.com/2012/06/netflix-recommendations-beyond-5-stars.html), and it raised a question in my head: if a system can learn, is there a way for it to heal itself?

Well, machine learning is not quite learning: it is an explicitly defined mathematical problem. Still, I wonder whether it can be applied to an optimization problem.
To simplify things, let’s move away from storage to robots.

Consider a simple scenario: a robot moves along a road toward its destination. It has a number of sensors which report speed, road edge, current coordinates, wheel state and so on to its control system. It also has two rotating wheels.

At some point the robot’s right wheel drops into a hole. The robot starts tipping over, and in a few moments it will fall.
How do we fix this problem?

There are numerous robot control algorithms which, very roughly, say ‘if sensor X says Y, do Z’, and their parameters may vary and be tuned by a higher-level management optimization.
But what if we step back to a more generic solution, where we do not have such algorithms? We can still change wheel rotation and perform some other actions, but we do not know in advance how they affect the current situation.

A solution to this problem would solve a storage problem too: if a storage control system can write to multiple disks, which one should it select so that read and write performance is maximized, space is used efficiently, and so on?

A bad solution is to use heuristics: if disk A is slow at the moment, use disk B; if the robot falls to the right, rotate the right wheel; and so on. This doesn’t really work in practice.

Another, naive solution is to ‘try every possible solution’: rotate the right wheel, and if things get worse, stop it. Rotate the left wheel, check the sensors, and if the situation changed, react accordingly. Combine all possible controls (at random or in some order), adjust each control’s ‘weight’, and hopefully this will fix things. But when the feature space is very large, i.e. when we have many controls, this solution doesn’t scale.
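A minimal sketch of this naive ‘try every control’ loop, written as a coordinate-wise hill climb under our own assumptions: the controls are a list of numbers, the hypothetical `satisfaction` function can only be evaluated (not differentiated), and each round nudges every control up and down by a fixed `step`, keeping a change only if satisfaction improves. The `tilted_robot` target values are invented purely for illustration.

```python
def naive_search(satisfaction, controls, step=0.1, rounds=100):
    """Try each control in both directions; keep whatever improves satisfaction."""
    controls = list(controls)
    best = satisfaction(controls)
    evaluations = 1
    for _ in range(rounds):
        improved = False
        for i in range(len(controls)):
            for delta in (step, -step):
                trial = controls.copy()
                trial[i] += delta
                score = satisfaction(trial)
                evaluations += 1
                if score > best:
                    controls, best, improved = trial, score, True
                    break  # keep this change, move on to the next control
        if not improved:
            break  # no single-control nudge helps any more
    return controls, best, evaluations


# Hypothetical disturbance: after the right wheel drops into the hole, the robot
# is assumed to stay upright only with left speed 1.5 and right speed 0.5.
def tilted_robot(u):
    return -((u[0] - 1.5) ** 2 + (u[1] - 0.5) ** 2)


controls, score, evaluations = naive_search(tilted_robot, [1.0, 1.0])
print([round(c, 2) for c in controls], evaluations)
```

With n controls each round costs up to 2n evaluations, and on a real robot every evaluation means acting and waiting for the sensors, which is why this approach stops scaling once there are many controls.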

So, what is a generic way to solve such a problem?
Or, putting it a bit more mathematically: we have a ‘satisfaction’ function of several variables (we move in the given direction, all sensors are green). Suddenly something out of our control starts happening and makes the satisfaction function much worse (sensors turn red). How do we use our variables (wheels and other controls) to bring things back to normal?
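The same question can be written as a formula. The symbols are our own and do not come from the post: u for the controls we can change, d for the disturbance we cannot control, and S for the satisfaction function. The catch is that S(·, d₁) is not known in closed form; we can only sample it by applying some u and reading the sensors.

```latex
\begin{aligned}
&u \in \mathbb{R}^n && \text{controls we can change (wheels, \dots)}\\
&d && \text{disturbance outside our control (the hole)}\\
&S(u, d) && \text{satisfaction: high when we move in the given direction and all sensors are green}\\
&S(u_0, d_1) \ll S(u_0, d_0) && \text{the disturbance } d_0 \to d_1 \text{ turned the sensors red}\\
&u^{*} = \arg\max_{u} S(u, d_1) && \text{controls that bring things back to normal}
\end{aligned}
```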