hash() differences for 32bit and 64bit systems

Apr 28 2009 Published by under Enthought Tool Suite, Python

I was working on a client/server project where we send collections of data across the wire. I needed a way to match datasets on the client and server, and Python’s built-in hash() function seemed ideal. I suspected that hash() might behave differently on different systems, but conveniently forgot to test it until after I tried to deploy it.

I expected differences, but I didn’t really know to what extent, so I did a little research. So far, ints are the only thing I have found that hash the same, because int’s __hash__ function just returns the int value. Otherwise, Python’s hash functions rely on multiplication in C long arithmetic, which overflows differently depending on whether a C long is 32 or 64 bits wide.
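
To make the effect of the C long width concrete, here is a rough sketch of the classic CPython 2 string hashing loop, emulated in Python 3 with an explicit bit width. It is reconstructed from memory of the C source rather than copied from it, so treat it as an illustration only. The function name `py2_string_hash` and the `bits` parameter are made up for this example, and the sketch ignores the optional hash randomization added in later 2.x releases.

```python
def py2_string_hash(data, bits):
    """Emulate the classic CPython 2 str hash for a C long that is `bits` wide.

    `data` is a bytes object; `bits` is 32 or 64 (hypothetical parameter).
    """
    if not data:
        return 0
    mask = (1 << bits) - 1          # emulate wrap-around of a C long
    x = (data[0] << 7) & mask
    for byte in data:
        x = ((1000003 * x) ^ byte) & mask
    x = (x ^ len(data)) & mask
    if x >= 1 << (bits - 1):        # reinterpret the bit pattern as signed
        x -= 1 << bits
    if x == -1:                     # -1 is reserved as an error code in C
        x = -2
    return x


print(py2_string_hash(b"a", 64))  # 12416037344
print(py2_string_hash(b"a", 32))  # -468864544
```

The same input produces two different values because the multiplication wraps at a different point, which is why the hashes cannot be compared across a 32bit client and a 64bit server.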

While doing my research, I found a page discussing hashing in Python 2.3. The algorithms are similar to the C implementations in Python 2.6.

Of course, I got bitten because Python 2.5 on OS X 10.4 and on 64bit RedHat 5 didn’t hash my objects the same way. In the end, I serialized the data’s metadata and computed an md5 digest instead, which requires more CPU cycles, but at least it works…
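
A minimal sketch of the md5 approach might look like the following. The post does not say how the metadata was serialized, so the JSON encoding, the `dataset_fingerprint` name, and the example metadata fields are all assumptions made for illustration.

```python
import hashlib
import json


def dataset_fingerprint(metadata):
    """Return an md5 hex digest of a metadata dict (hypothetical helper).

    Keys are sorted so that the same metadata always serializes to the
    same bytes, regardless of dict ordering or platform word size.
    """
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Example metadata; the field names are invented for this sketch.
client = {"name": "temps", "rows": 1200, "columns": ["t", "value"]}
server = {"columns": ["t", "value"], "rows": 1200, "name": "temps"}

print(dataset_fingerprint(client) == dataset_fingerprint(server))  # True
```

The digest depends only on the serialized bytes, so it comes out the same on 32bit and 64bit machines as long as both sides serialize the metadata identically.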

3 responses so far

  • Even ints do not yield the same hash. Technically, an int returns its value as its hash. However, depending on the size of the platform’s C long, a number that fits in an int on one system might become a Python long on another.

    32bit Linux:

    >>> 2…
    >>> type(2…
    >>> hash(2…
    >>> 2…
    >>> type(2…
    >>> hash(2…

  • Thomas K. says:

    How can I convert a hash generated on a 32bit system to a 64bit hash (or vice versa)?

  • Bryce Hendrix says:

    Thomas, I don’t think you can. You can write your own platform-independent __hash__ functions instead, if that’s really what you need.
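
As a sketch of what a platform-independent __hash__ could look like, one option is to derive the hash from a digest of the object’s identifying fields and keep only 32 bits of it, so the value fits in a C long everywhere. The `Dataset` class and its `name` and `version` fields are hypothetical, and this is only one of several ways to do it.

```python
import hashlib
import struct


class Dataset(object):
    """Hypothetical class whose hash does not depend on the C long width."""

    def __init__(self, name, version):
        self.name = name
        self.version = version

    def _key(self):
        # Bytes that identify this object; the format is an assumption.
        return ("%s:%d" % (self.name, self.version)).encode("utf-8")

    def __eq__(self, other):
        return isinstance(other, Dataset) and self._key() == other._key()

    def __hash__(self):
        digest = hashlib.md5(self._key()).digest()
        # Interpret the first 4 bytes as a signed big-endian 32-bit integer,
        # which fits in a C long on both 32bit and 64bit builds.
        return struct.unpack(">i", digest[:4])[0]


a = Dataset("temps", 3)
b = Dataset("temps", 3)
print(hash(a) == hash(b))  # True
```

Truncating to 32 bits raises the chance of collisions compared with the full digest, so this suits dict and set lookups better than it suits matching datasets across the wire.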
