Fast Protocol Buffers in Python

Jul 01 2010 Published by Bryce Hendrix under NumPy

A couple of years ago I worked on a project which needed to transport a large dataset over the wire. I looked at a number of technologies, and Google Protocol Buffers looked very interesting. Over the past week, I’ve been asked about my experience a couple of times, so I hope this provides a little bit of insight into how to use Protocol Buffers in Python when performance matters.

I wrote a little test case to model the serialization of the data I wanted to send: a list of 100 pairs of arrays, where each array contained 250,000 elements. The raw data size was 381 MB.
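As a rough sketch of that test data (not the original test case), the snippet below builds one pair of arrays and works out the raw size of the full set. It assumes 8-byte double-precision elements, which the post does not state but which matches the quoted 381 MB when read as binary megabytes; the names `N_PAIRS`, `N_ELEMENTS`, and `make_pair` are made up for illustration. The standard-library `array` module stands in for NumPy here so the snippet has no dependencies; with NumPy, `np.zeros(n, dtype=np.float64)` would play the same role.

```python
import array

N_PAIRS = 100          # number of array pairs in the dataset (from the post)
N_ELEMENTS = 250_000   # elements per array (from the post)


def make_pair(n):
    """Return one pair of zero-filled arrays of n doubles (assumed element type)."""
    xs = array.array("d", [0.0]) * n
    ys = array.array("d", [0.0]) * n
    return xs, ys


# Build a single pair and scale up, rather than allocating the whole dataset.
xs, ys = make_pair(N_ELEMENTS)
bytes_per_pair = (len(xs) + len(ys)) * xs.itemsize
raw_bytes = N_PAIRS * bytes_per_pair
raw_mib = raw_bytes / 2**20

print(f"raw size: {raw_bytes} bytes, about {raw_mib:.0f} MiB")
```

Under the 8-byte assumption the total is 400,000,000 bytes, which is where a figure of roughly 381 MB comes from.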

First, I ran the pure Python test: the write took 83 seconds and the read took 202 seconds. Not good.

Next I tested the same data in C++: the write took 4.4 seconds and the read took 2.8 seconds. Impressive.
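Write and read timings like the ones above could be collected with a small harness along these lines. This is only a sketch: `time_roundtrip`, `write_raw`, and `read_raw` are hypothetical names, and the raw-bytes serializer is a stand-in so the example runs without protobuf installed. In a real comparison the two callables would wrap the generated message's serialize and parse calls.

```python
import array
import time


def time_roundtrip(write, read, data):
    """Time one write and one read; return (write_s, read_s, restored)."""
    start = time.perf_counter()
    blob = write(data)
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = read(blob)
    read_s = time.perf_counter() - start
    return write_s, read_s, restored


# Stand-in serializer (an assumption for illustration, not protobuf):
# dump the doubles as raw bytes and load them back.
def write_raw(values):
    return values.tobytes()


def read_raw(blob):
    out = array.array("d")
    out.frombytes(blob)
    return out


sample = array.array("d", range(1000))
write_s, read_s, restored = time_roundtrip(write_raw, read_raw, sample)
print(f"write: {write_s:.6f} s, read: {read_s:.6f} s")
```

Timing the write and the read separately matters here because, as the numbers in the post show, the two directions can differ by more than a factor of two.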

The obvious path, then, was to write the serialization code in C++ and expose it to Python through an extension module. The read function, including putting all of the data into NumPy arrays, now takes 7.5 seconds. I only needed the read function from Python, but the write function should take about the same time.
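The C++ extension itself is not shown in the post, so the following is not the author's code. It is a Python-only sketch of one reason a bulk read path can be fast: in the Protocol Buffers wire format, a packed repeated `double` field is a tag, a varint byte length, and then contiguous 8-byte little-endian values, so the payload can be copied into an array in one step instead of being decoded element by element. Whether the original messages used packed fields is an assumption, and every function name below is hypothetical.

```python
import array
import struct
import sys

WIRE_TYPE_LENGTH_DELIMITED = 2  # wire type used by packed repeated fields


def encode_varint(value):
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_packed_doubles(field_number, values):
    """Encode one packed repeated double field: key, byte length, payload."""
    payload = struct.pack("<%dd" % len(values), *values)
    key = encode_varint((field_number << 3) | WIRE_TYPE_LENGTH_DELIMITED)
    return key + encode_varint(len(payload)) + payload


def decode_packed_doubles(buf):
    """Decode one packed repeated double field with a single bulk copy."""
    key, pos = decode_varint(buf, 0)
    field_number, wire_type = key >> 3, key & 0x7
    if wire_type != WIRE_TYPE_LENGTH_DELIMITED:
        raise ValueError("not a length-delimited field")
    length, pos = decode_varint(buf, pos)
    values = array.array("d")
    values.frombytes(buf[pos:pos + length])  # bulk copy, no per-element loop
    if sys.byteorder != "little":
        values.byteswap()  # wire format is little-endian
    return field_number, values


message = encode_packed_doubles(1, [1.0, 2.5, -3.0])
field_number, decoded = decode_packed_doubles(message)
print(field_number, list(decoded))
```

With NumPy, the bulk copy could be replaced by `np.frombuffer` on the payload slice with a little-endian float64 dtype, which avoids the copy altogether.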

Responses

  • Why not put the code somewhere?

    It looks like a common use case (NumPy + protobuf).

    L.

  • Mark Smith says:

    What does the code look like? Could you post it?

  • Bryce Hendrix says:

    I fully intend to post the code, but I was at a conference and didn’t have the code handy. Look for it next week.

  • Stu says:

    It would be interesting to see how the Cython equivalent of this code performs (when the code is out).

  • dripton says:

    What version of protobuf were you using when you saw these results? The latest (2.3) is supposed to be 10-25 times faster for Python than previous versions. I haven’t benchmarked this myself, though.

  • Bryce Hendrix says:

    I used protobuf 2.3. I see the notes in their change log, but I didn’t see any significant improvement. Maybe it depends on the message, and they didn’t optimize for large array-like data structures?

  • Sheila says:

    Did you ever figure out the reason you didn’t see any improvement? I posted a link to your blog and subsequent entry with the code to the protobuf mailing list, since I am very curious. No replies yet, so wondering if you’ve discovered anything else.

    I don’t use pure Python protobufs in a production environment. We use Java compiled messages. I use Jython to concoct test messages based on the Java versions, and started playing around with pure Python out of curiosity.

  • Bryce Hendrix says:

    Sheila, no, I never found the cause. To be honest, I didn’t spend too much time on it since I haven’t needed to update the code in the last 12-18 months.
