python sorting and cmp_to_key

I learned something cool today related to sorting lists in python. One way to do this is to use a function called "sorted":

>> sorted([3, 1, 4, 2, 5])
[1, 2, 3, 4, 5]

The function takes a collection and, optionally, a key - a function which, for a given element in the collection, returns a key by which that element can be sorted. This can be useful when you want to sort by a specific attribute on an object, for instance:

def sort_by_height(collection):
    return sorted(collection, lambda obj: obj.height)

At times, though, it's not obvious how to produce a key mapping. For example, today I did day 13 of Advent of Code 2022 (better late than never), which asks you to implement a rather convoluted algorithm for comparing "packets", which consist of arbitrarily nested lists of integers. Once you've implemented this comparison function, applying it to a pair of packets tells you whether one is "larger" than another. However, there's no obvious way to derive a key for each packet.

Advent of Code day 13

In fact, this kind of comparison function used to be the canonical way to run custom sorts back in python 2. "Old-style" comparison functions take two elements of your collection—let's call them a and b—and return an integer indicating the result of the comparison: 0 if a equals b, -1 a is less than b, and +1 if a is greater than b.

As you can imagine, translating these comparison functions to key functions manually would make migrating from python 2 to 3 pretty miserable. Fortunately, the python standard library provides a handy helper function in the functools module. cmp_to_key takes an old style comparison function as its argument, and returns instead a python 3 compatible key function!

Now, if you're anything like me, you may have done a double-take on reading that last sentence. It derives orderable keys from a comparison function? But... how?? I mean, purely from a mathematical perspective... that doesn't seem like it should be possible! After all, it's pretty simple to construct a non-transitive comparison function, and theres no way to map that to a set of well ordered keys!

And to be fair, if you pass a non-transitive sort function to cmp_to_key you'll get some silly results... but for reasonable comparison functions, it will work as expected, and the way it works is quite clever!

The fundamental assumption that threw me off was the idea that the function output by cmp_to_key would have to provide a "primitive" keying—given a collection element, it would have to return some absolutely sortable thing, like and integer. In actuality, all it needs to return for the sorting to work is a key object which can be compared to other key objects using python's built-in comparison operators.

And there's a really easy way to achieve this in python: magic methods! In fact, all you need to sort a collection based on some comparison function called "my_cmp" is the following:

class MyCmpKey():

    def __init__(self, obj):
        self.obj = obj	

    def __lt__(self, other):
        return my_cmp(self.obj, other.obj) < 0

What this means is that every time the underlying implementation of sorted (or whatever other function you're keying) compares two keys to determine their order, MyCmpKey's magic method will call my_cmp to determine the order! In practice, cmp_to_key populates the magic methods for greater-than, equality, and others, but sorted only required less-than to actually work.

Pretty clever huh! Shout out to this StackOverflow answer, which explains it very well.

A StackOverflow answer which explains how cmp_to_key works

-- ouro, 2022-12-18