I was working on a client/server project where we send collections of data across the wire. I needed a method of matching datasets on the client and server, and the python hash function seemed ideal. I suspected that the hash function might have different behaviour on different systems, but conveniently forgot to test it until after I tried to deploy it.
I expected differences, but I didn’t really know to what extent, so I did a little research. So far, ints are the only thing I have found that hash the same, because int’s __hash__ function just returns the int value. Otherwise, Python’s hash functions depend on multiplication using long ints.
While doing my research, I found a page discussing hashing in Python 2.3. The algorithms are similar to the C implementations in Python 2.6.
Of course, I got bit because Python 2.5 on OS X 10.4 and 64bit RedHat 5 didn’t hash my objects the same. In the end, I serialized the data’s metadata and performed a md5 instead, which requres more CPU cycles, but at least it works…