hash() differences for 32bit and 64bit systems

I was working on a client/server project where we send collections of data across the wire. I needed a way to match datasets on the client and server, and Python’s built-in hash() function seemed ideal. I suspected that hash() might behave differently on different systems, but conveniently forgot to test it until after I tried to deploy.

I expected differences, but I didn’t really know to what extent, so I did a little research. So far, ints are the only thing I have found that hash the same, because int’s __hash__ function just returns the int value. Otherwise, Python’s hash functions depend on multiplication of C long values, and the width of a C long differs between 32bit and 64bit builds.
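To illustrate why the word size matters, here is a minimal sketch of a multiplicative hash whose arithmetic wraps at a chosen bit width. It is modeled on my recollection of the Python 2 string hash, so treat the constant 1000003, the initial shift, and the function names as assumptions rather than the exact C implementation. It is written for Python 3, where iterating over bytes yields ints.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def multiplicative_hash(data, bits):
    """Hash a bytes object using arithmetic that wraps at `bits` bits."""
    if not data:
        return 0
    mask = (1 << bits) - 1
    x = (data[0] << 7) & mask
    for byte in data:
        # The multiplication overflows quickly; the mask emulates a C long of this width.
        x = ((1000003 * x) ^ byte) & mask
    x ^= len(data)
    return to_signed(x, bits)


h32 = multiplicative_hash(b"hello", 32)
h64 = multiplicative_hash(b"hello", 64)
print(h32, h64)
```

The two results share their low 32 bits, but the 64bit value keeps the high bits that the 32bit arithmetic throws away, so the hashes do not compare equal.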

While doing my research, I found a page discussing hashing in Python 2.3. The algorithms are similar to the C implementations in Python 2.6.

Of course, I got bitten because Python 2.5 on OS X 10.4 and 64bit RedHat 5 didn’t hash my objects the same. In the end, I serialized the data’s metadata and computed an MD5 digest instead, which requires more CPU cycles, but at least it works…
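A rough sketch of the MD5 approach, assuming the metadata can be expressed as a JSON-serializable dict. The post does not say which serialization was actually used, so the JSON step, the function name metadata_digest, and the sample fields are all hypothetical.

```python
import hashlib
import json


def metadata_digest(metadata):
    """Return an MD5 hex digest of a JSON-serializable metadata mapping."""
    # sort_keys and fixed separators keep the serialized form independent
    # of the order in which the dict was built.
    serialized = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


client_side = {"name": "temperature", "rows": 1024, "columns": ["time", "value"]}
server_side = {"columns": ["time", "value"], "rows": 1024, "name": "temperature"}

# Both sides describe the same dataset, so the digests should match.
print(metadata_digest(client_side) == metadata_digest(server_side))
```

Because the digest is computed from bytes rather than from the interpreter's hash(), it does not depend on the word size of the machine that produced it.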

3 thoughts on “hash() differences for 32bit and 64bit systems”

  1. Christian Heimes

    Even ints do not yield the same hash. Technically, an int returns its value as its hash. However, depending on the size of the OS’s C long, a number might become a long.

    32bit Linux:

    >>> 2>> type(2
    >>> hash(2>> 2>> type(2
    >>> hash(2

  2. Bryce Hendrix

    Thomas, I don’t think you can. You can write your own __hash__ functions to be platform independent instead, if that’s really what you need.

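The suggestion in the last comment, writing a platform-independent __hash__, might look something like the sketch below. The Dataset class and its name and version fields are hypothetical, and CRC32 is just one convenient stable checksum, not something the comment prescribes. The idea is to derive the hash from bytes and keep it within 31 bits so it fits a 32bit C long.

```python
import zlib


class Dataset(object):
    """Hypothetical dataset identified by a name and a version."""

    def __init__(self, name, version):
        self.name = name
        self.version = version

    def __eq__(self, other):
        return (isinstance(other, Dataset)
                and (self.name, self.version) == (other.name, other.version))

    def __hash__(self):
        key = ("%s:%d" % (self.name, self.version)).encode("utf-8")
        # Keep only the low 31 bits: a non-negative value that fits in a
        # 32bit C long, so it needs no truncation on either platform.
        return zlib.crc32(key) & 0x7FFFFFFF


a = Dataset("temperature", 3)
b = Dataset("temperature", 3)
print(hash(a) == hash(b))
```

A 31bit hash collides far more readily than an MD5 digest, so this is suitable for dict and set lookups, less so as the only check that two datasets match.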
