Fast Protocol Buffers in Python

A couple of years ago I worked on a project that needed to transport a large dataset over the wire. I looked at a number of technologies, and Google Protocol Buffers looked very interesting. Over the past week I’ve been asked about my experience a couple of times, so I hope this provides a little insight into how to use Protocol Buffers in Python when performance matters.

I wrote a little test case to model the serialization of the data I wanted to send: a list of 100 pairs of arrays, where each array contained 250,000 elements. The raw data size was 381 MB.
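As a sanity check on that figure, here is a minimal sketch of the data shape and its raw size. It assumes each element is an 8-byte double, which the post doesn't state outright but which matches the 381 MB number; the helper names and the (x, y) pair layout are made up for illustration.

```python
from array import array

NUM_PAIRS = 100                # from the post
ELEMENTS_PER_ARRAY = 250_000   # from the post
BYTES_PER_ELEMENT = 8          # assumption: 64-bit floating point values


def raw_size_bytes(num_pairs, elements_per_array,
                   bytes_per_element=BYTES_PER_ELEMENT):
    """Raw payload size: two arrays per pair, no framing overhead."""
    return num_pairs * 2 * elements_per_array * bytes_per_element


def make_dataset(num_pairs, elements_per_array):
    """Build a list of (x, y) pairs of double arrays (hypothetical layout)."""
    return [
        (array("d", range(elements_per_array)),
         array("d", range(elements_per_array)))
        for _ in range(num_pairs)
    ]


full_bytes = raw_size_bytes(NUM_PAIRS, ELEMENTS_PER_ARRAY)
full_mib = full_bytes / 2**20
print(f"{full_bytes:,} bytes = {full_mib:.1f} MiB")

# A scaled-down dataset keeps the sketch quick to run.
small = make_dataset(num_pairs=3, elements_per_array=1_000)
small_bytes = sum(len(x) * x.itemsize + len(y) * y.itemsize for x, y in small)
```

Under that assumption the full dataset is 400,000,000 bytes, which is about 381 when counted in units of 2^20 bytes.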

First, I ran the pure Python test: the write took 83 seconds and the read took 202 seconds. Not good.

Next I tested the same data in C++: the write took 4.4 seconds and the read took 2.8 seconds. Impressive.

The obvious path, then, was to write the serialization code in C++ and expose it to Python through an extension. The read function, including putting all of the data into NumPy arrays, now takes 7.5 seconds. I only needed the read function from Python, but the write function should take about the same time.
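To put those timings side by side, here is a small sketch that turns them into throughput and a speedup ratio. The numbers are the ones reported above; the dictionary labels and the helper function are only illustrative.

```python
RAW_MB = 381  # raw data size reported in the post

# Timings in seconds, as reported in the post.
timings = {
    "pure Python write": 83.0,
    "pure Python read": 202.0,
    "C++ write": 4.4,
    "C++ read": 2.8,
    "C++ extension read (into NumPy)": 7.5,
}


def throughput_mb_per_s(seconds, size_mb=RAW_MB):
    """Raw megabytes processed per second for one timing."""
    return size_mb / seconds


for label, seconds in timings.items():
    print(f"{label:32s} {throughput_mb_per_s(seconds):6.1f} MB/s")

# How much faster the extension read is than the pure Python read.
speedup = timings["pure Python read"] / timings["C++ extension read (into NumPy)"]
print(f"extension read vs pure Python read: {speedup:.1f}x")
```

By this measure the extension read is roughly 27 times faster than the pure Python read, though still slower than the plain C++ read, since it also includes copying the data into NumPy arrays.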

9 thoughts on “Fast Protocol Buffers in Python”

  1. Pingback: Tweets that mention Fast Protocol Buffers in Python --

  2. Bryce Hendrix

    I fully intend to post the code, but I was at a conference and didn’t have the code handy. Look for it next week.

  3. dripton

    What version of protobuf were you using when you saw these results? The latest (2.3) is supposed to be 10-25 times faster for Python than previous versions. I haven’t benchmarked this myself, though.

  4. Bryce Hendrix

    I used protobuf 2.3. I saw the notes in their change log, but I didn’t see any significant improvement. Maybe it depends on the message, and they didn’t optimize for large array-like data structures?

  5. Sheila

    Did you ever figure out the reason you didn’t see any improvement? I posted a link to your blog and subsequent entry with the code to the protobuf mailing list, since I am very curious. No replies yet, so wondering if you’ve discovered anything else.

    I don’t use pure Python protobufs in a production environment. We use Java-compiled messages. I use Jython to concoct test messages based on the Java versions, and started playing around with pure Python out of curiosity.

  6. Bryce Hendrix

    Sheila, no, I never found the cause. To be honest, I didn’t spend too much time on it since I haven’t needed to update the code in the last 12-18 months.

