Friday, December 22, 2006

Functional Python Sorting Using Short Circuited Lambda

Given a python list there are two inbuilt ways that the list can be sorted. One is using the lists sort() method. This will sort the list in-place. The second is the sorted() function introduced in Python 2.4. The sorted function returns a sorted list, but leaves the original list untouched. It is intended to be used functionally or in list comprehensions. So: sorted()
>>> a=[40,-1,3,100,-24,0.3,-99]
>>> from math import pi, sqrt
>>> [sqrt(val*pi) for val in sorted(a) if val>=0]
[0.97081295627784958, 3.0699801238394655, 11.209982432795858, 17.724538509055161]
notice list 'a' remains unchanged
>>> a
[40, -1, 3, 100, -24, 0.29999999999999999, -99]
sort() The sort method sorts the list in place.
>>> a.sort()
>>> a
[-99, -24, -1, 0.29999999999999999, 3, 40, 100]
cmp Both of these functions take a 'cmp' argument that passes in a comparison function. The function will be passed two of the objects for comparison. By definition it should return a value less than 0 if the first value is less than the second, a value greater than 0 if the first value is more than the second, and 0 if the values are the same. A slow but illustrative example is the following:
def our_compare(x, y):
 if x>y:
   return 1
 elif x<y:
   return -1
 else:
   return 0

and then perform our sort as

a.sort(our_compare)
or
sorted(a,our_compare)

It would be great if we could wrap this comparison in a lambda expression. But lambdas, being a functional paradigm, only allow expression evaluation. One way that is commonly referred to is to do something like the following

a.sort( lambda x,y: x-y )

The disadvantages of this method is firstly it isn't very flexible, and secondly if the result of the subtraction is outside the range allowed by the variable type, then the sort can not be relied upon. The technique that can help us here is short circuited boolean expressions.

Short Circuited Booleans

For those of you not familiar with expression short circuiting, it can help you construct conditional blocks in a single expression. It can effectively give lambdas an if/then/else construct. It is not just limited to lambdas though. It can be used in regular code as well, of course.

Have a look at the boolean 'or' table below

A B A or B
False False False
False True True
True False True
True True True

If we imagine we are the computer for a second and we are evaluating the expression 'A or B'. Notice that from the table, if A is true then 'A or B' is always true. That is, if we evaluate 'A' to be true, then there is no need to evaluate 'B' at all, because the outcome is always True. This is the essence of short circuited boolean evaluation. The step where the computer can definitely know the result is the step where it stops evaluation and returns the result.

If the value of A is False, then B needs to be evaluated and that outcome is the final evaluation. You can see that 'or' acts as if it were an 'if not' in a conditional structure. It also acts like an 'else'. More on that in a second.

Lets look at an 'and' and see how that behaves in a short circuited language.

A B A and B
False False False
False True False
True False False
True True True

Imagining we are the computer again, if we evaluate 'A' and find it to be false, then we can skip the evaluation of 'B' because the answer will always be false. However if we evaluate 'A' and it turns out to be true then we must evaluate 'B', and the final result is then whatever 'B' is evaluated to. In this way an 'and' behaves like an 'if' statement.

By using these two operators, complex if/elif/else conditional blocks can be expressed as a single expression. Evaluating the expression is equivalent to running the equivalent conditional code. In summary we have the following equivalencies. Don't memorise this table. You should be able to 'think short circuited' and mimic the computers evaluation procedure when writing your code. But here's a nice table for you to get a grip on it.

Python Conditional Equivalent Expression
if A: B A and B
if not A: B A or B
if A: B else: C (A and B) or C
if A: B elif C: D else: E (A and B) or (C and D) or E
The disadvantage of these short circuit expressions is readability, which can degrade significantly with complex nesting, so comment liberally. They are also purely functional, which means A, B, C and D cannot be statements, only expressions. Also beware when the expression after the 'and' statement could be interperated as the boolean false. For example, in "(A and B) or C", if B is False or 0 or "" or None or [] or () then C will be evaluated and returned regardless of the truth of A, because the whole of (A and B) is evaluated to be false. As a brief side note this problem can be accounted for by making the returns be a list of one element like [(A and [B]) or [C])[0]. Even with these all these limitations, they help us in many instances to construct more effective lambdas. So our_compare can be rewritten as a single expression: "(x>y) or (x<y and -1)". Notice there is no need to say (x>y or 1) because x>y already evaluates to True(1) if it is correct. Also there is no need for a trailing "or 0" because the result will be false if neither of the earlier conditionals are true. This leads us to write our sort as:
a.sort( lambda x,y: x>y or (x<y and -1) )
give this a go and you will see it works. Applications So how can we apply this to save ourselves time and bugs? One application is for code that requires multiple sorts on lists of objects. If we have a list of objects that always need to be sorted the same way, then we can override the objects _cmp_ method in its class definition and place our cmp function in there. However sometimes we may have a list of objects that we want to sort in different ways, on different values, at different points in the code. This comes up quite often when dealing with groups of game objects. For instance:
# a list of the game object from top of screen to bottom ERRATA: see comments for correction
toptobottom=objlist[:].sort(lambda a,b: a.y>b.y or (a.y<b.y and -1))

# a list of the same objects from left of screen to right
lefttoright=sorted(objlist, lambda a,b: a.x>b.x or (a.x<b.x and -1))

# draw all the game objects in z order
[obj.Draw(renderer) for obj in sorted( objlist, lambda a,b: a.zindex>b.zindex
  or (a.zindex<b.zindex and -1) ) ]
This is just scratching the surface of what can be done. All kinds of conditional structures can be reduced by short circuiting, and when combined with methods like sort() and even more so with list comprehensions, extremely complex operations can be performed with much less code (and bugs) than an ordinary procedural approach. Try bringing some of these functional techniques into your code. Try and think of ways that you can use it to reduce your burden. It doesn't take long to grasp the idea, but it does take practice to be efficient at it and also to recognise some of the more esoteric places it can be used. Give it a go, and merry christmas!