Fast Numpy/Protobuf deserialization example

July 7th, 2010

In my last post I wrote about how I was able to improve protobuf deserialization using a C++ extension. I’ve boiled down the code to its essence to show how I did it. Rather than zip everything up in a file, the code is short enough to show in its entirety.

Here’s the simplified protobuf message which is used to represent a time series as 2 arrays:

package timeseries;

message TimeSeries {
    repeated double times = 2;
    repeated double values = 3;
}

I then wrote a test app in Python and C++ to provide a benchmark. Here is the Python version:

import numpy
import time_series_pb2

def write_test():
    ts = time_series_pb2.TimeSeries()
    for i in range(10000000):
        ts.times.append(i)
        ts.values.append(i*10.0)

    import time
    start = time.time();

    f = open("ts.bin", "wb")
    f.write(ts.SerializeToString())
    f.close()

    print time.time() - start

def read_test():
    ts = time_series_pb2.TimeSeries()

    import time
    start = time.time();

    f = open("ts.bin", "rb")
    ts.ParseFromString(f.read())
    f.close()

    t = numpy.array(ts.times._values)
    v = numpy.array(ts.values._values)

    print 'Read time:', time.time() - start
    print "Read time series of length %d" % len(ts.times)

if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print "usage:   %s <--read> | <--write>" % sys.argv[0]

    if sys.argv[1] == "--read":
        read_test()
    else:
        write_test()

I will spare you the C++ standalone code, since it was only a stepping stone. Instead here is the C++ extension, with 2 exposed methods, one which deserializes a string and the other which operates on a file.

#include <fcntl.h>

#include <Python.h>
#include <numpy/arrayobject.h>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

#include "time_series.pb.h"

static PyObject* construct_numpy_arrays(timeseries::TimeSeries* ts)
{
    // returns a tuple (t,v) where t and v are double arrays of the same length

    PyObject* data_tuple = PyTuple_New(2);

    long array_size = ts->times_size();
    double* times = new double[array_size];
    double* values = new double[array_size];

    // the data must be copied because the tsid will go away and its mutable data
    // will too
    memcpy(times, ts->times().data(), ts->times_size()*sizeof(double));
    memcpy(values, ts->values().data(), ts->values_size()*sizeof(double)); 

    // put the arrays into numpy array objects
    npy_intp dims[1] = {array_size};
    PyObject* time_array = PyArray_SimpleNewFromData(1, dims, PyArray_DOUBLE, times);
    PyObject* value_array = PyArray_SimpleNewFromData(1, dims, PyArray_DOUBLE, values);

    PyTuple_SetItem(data_tuple, 0, time_array);
    PyTuple_SetItem(data_tuple, 1, value_array);

    return data_tuple;
}

static PyObject* TimeSeries_load(PyObject* self, PyObject* args)
{
    char* filename = NULL;

    if (! PyArg_ParseTuple(args, "s", &filename))
    {
        return NULL;
    }

    timeseries::TimeSeries ts;

    int fd = open(filename, O_RDONLY);
    google::protobuf::io::FileInputStream fs(fd);
    google::protobuf::io::CodedInputStream coded_fs(&fs);
    coded_fs.SetTotalBytesLimit(500*1024*1024, -1);
    ts.ParseFromCodedStream(&coded_fs);
    fs.Close();
    close(fd);

    return construct_numpy_arrays(&ts);
}

static PyObject* TimeSeries_deserialize(PyObject* self, PyObject* args)
{
    int buffer_length;
    char* serialization = NULL;

    if (! PyArg_ParseTuple(args, "t#", &serialization, &buffer_length))
    {
        return NULL;
    }
    google::protobuf::io::ArrayInputStream input(serialization, buffer_length);

    google::protobuf::io::CodedInputStream coded_fs(&input);
    coded_fs.SetTotalBytesLimit(500*1024*1024, -1);

    timeseries::TimeSeries ts;
    ts.ParseFromCodedStream(&coded_fs);
    return construct_numpy_arrays(&ts);
}

static PyMethodDef TSMethods[] = {
    {"load", TimeSeries_load, METH_VARARGS, "loads a TimeSeries from a file"},
    {"deserialize", TimeSeries_deserialize, METH_VARARGS, "loads a TimeSeries from a string"}
};

#ifndef PyMODINIT_FUNC  /* declarations for DLL import/export */
#define PyMODINIT_FUNC void
#endif
PyMODINIT_FUNC inittimeseries(void)
{
    import_array();
    (void) Py_InitModule("timeseries", TSMethods);
}

Calling the exension from python is trivial:

import time
import timeseries
start = time.time()
t, v = timeseries.load('ts.bin')
print "read and converted to numpy array in %f" % (time.time()-start)
print "timeseries contained %d values" % len(v)

Fast Protocol Buffers in Python

July 1st, 2010

A couple of years ago I worked on a project which needed to transport a large dataset over the wire. I looked at a number of technologies, and Google Protocol Buffers looked very interesting. Over the past week, I’ve been asked about my experience a couple of times, so I hope this provides a little bit of insight into how to use Protocol Buffers in Python when performance matters.

I wrote a little test case to model the serialization of the data I wanted to send, a list of 100 pairs of arrays, where each array contained 250,000 elements. The raw data size was 381 MB.

First, I ran the pure python test: the write took 83 seconds, the read took 202 seconds. Not good.

Next I tested the same data in C++: the write took 4.4 seconds and the read took 2.8 seconds. Impressive.

The obvious path then was to write the serialization code in C++ and expose it through an extension point. The read function, including putting all of the data into numpy arrays now takes 7.5 seconds. I only needed the read function from Python, but the write function should take about the same time.

Travis Oliphant announces…

July 1st, 2010

Travis announces project to extend NumPy/SciPy to .Net

Travis announces project to extend NumPy/SciPy to .Net

Travis Oliphant kicked off today’s SciPy 2010 Day 2 with a great keynote talk. He told the story of his own path to Python, filling his slides with the faces and work of other developers, scientists, and mathematicians — inspiration, teachers, and collaborators. He explained how his academic trajectory, from electrical engineering, through a brief affair with neuroscience, to a biomedical engineering PhD, both drove and competed with his work creating NumPy.
Last, but not least, Travis closed his talk with rather large announcement: Enthought has undertaken the extension of NumPy and SciPy to the .NET framework. For details on the project refer to the official release.

SciPy 2010 underway!

July 1st, 2010

Everyone minus Ian, the most valiant photographer!

Everyone minus Ian, the most valiant photographer!

We were thrilled to host SciPy 2010 in Austin this year. Everyone seems to be enjoying the cool weather (so what if it’s borne of thunderstorms?) and the plush conference center/hotel (even if we had to retrain their A/V team).
After two days of immensely informative Tutorials, the General Session began yesterday with speaker Dave Beazley’s awesome keynote on Python concurrency. In addition to the solid line-up of talks at the main conference, we had two very well-attended specialized tracks: Glen Otero, chaired the Bioinformatics track, while Brian Granger and Ken Elkabany coordinated the Parallel Processing & Cloud Computing talks. The day then closed with a conference reception and guacamole-fueled Birds of a Feather sessions.
SciPy 2010

When a Spring is not a spring

March 29th, 2010

TraitsUI has a lot of layout magic, which is usually good enough, but sometimes you need to apply a little extra effort to get your GUI layout correct. One trick I often employ is to use springs between Items to align them. That works most of the time, but on occasion I need a fixed width spacer.

Lets look at the spring code:

class Spring ( Item ):
    # An item that is a layout "spring".

    # Name of the trait the item is editing
    name = 'spring'

    # Should a label be displayed?
    show_label = Bool( False )

    # Editor to use for the item
    editor = Instance( 'enthought.traits.ui.api.NullEditor', () )

    # Should the item use extra space along its Group's
    # layout orientation?
    springy = True

spring = Spring()

So what would happen if we tried to create a Spring with the width set and springy disabled? We get a fixed width spacer!

from enthought.traits.api import HasTraits, Int
from enthought.traits.ui.api import View, Item, VGroup, HGroup, \
     Spring, spring

class ExampleView(HasTraits):
    a = Int
    b = Int

    def trait_view(self, parent=None):
       regular_group = HGroup(
                         HGroup(Item('a'),
                                show_border=True),
                         HGroup(Item('b'),
                                show_border=True),
                         show_border=True,
                         label='regular')

       flex_group = HGroup(
                       HGroup(Item('a'),
                              show_border=True),
                       spring,
                       HGroup(Item('b'),
                              show_border=True),
                       show_border=True,
                       label='flexible')

       fixed_group = HGroup(
                       HGroup(Item('a'),
                              show_border=True),
                       Spring(width=200, springy=False),
                       HGroup(Item('b'),
                              show_border=True),
                       show_border=True,
                       label='fixed')

       return View(VGroup(regular_group,
                          flex_group,
                          fixed_group),
                   resizable=True)

ExampleView().configure_traits()

spring_example

Simplified creation of NumPy arrays from pre-allocated memory.

March 22nd, 2010

About 18 months ago, I wrote a blog-post that illustrates how to create a NumPy array that refers to another block of pre-allocated memory by creating a small low-level Python object that wraps around the memory and correctly deallocates the memory once NumPy is done with it.

The basic idea was to create a new Python object in C and point the base member of the NumPy array structure to this newly created object after using PyArray_SimpleNewFromData to create the array from a block of pre-allocated memory. NumPy will decref the object pointed to by its base member when it is destroyed. The example I provided created a new Python object in C and pointed the base member to it. That way, when the NumPy array is destroyed, the tp_dealloc function will be called on the newly created object.

The solution I provided is very flexible and also illustrates how to create a new Python object in C. However, as Lisandro Dalcin pointed out in the comments to that post, there is a simpler way to do it if you just need to call a particular deallocation function to free the underlying memory (or decrease a reference) after the NumPy array is destroyed.

The simple solution is to use a low-level Python object that holds a pointer to anything as well as a destructor. This low-level object is called a CObject in Python 2.x. The equivalent in Python 3.x (backported to Python 2.7) is the Capsule object.

Creating a CObject in C code is very simple. You just call PyCObject_FromVoidPtr with an arbitrary pointer as the first argument and a destructor function as the second argument. The signature of this destructor function should take the arbitrary pointer as its only argument and return nothing. The Python Object returned can be assigned to the base member of the ndarray directly.

Using the CObject removes the need to create your own “dummy” Python Object just to handle calling some needed code after NumPy is done with the memory. Thus, the code example can be updated to just:

int nd=2;
npy_intp dims[2]={10,20};
size_t size;
PyObject arr=NULL;
void *mymem;

size = PyArray_MultiplyList(dims, nd);
mymem = _aligned_malloc(size, 16);
arr = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, mymem);
if (arr == NULL) {
    _aligned_free(mymem);
    Py_XDECREF(arr);
}
PyArray_BASE(arr) = PyCObject_FromVoidPtr(mymem, _aligned_free);

For completeness, the original code containing the _aligned_malloc and
_aligned_free function calls is reproduced here:

#include <errno.h>
#define uintptr_t size_t

#define _NOT_POWER_OF_TWO(n) (((n) & ((n) - 1)))
#define _UI(p) ((uintptr_t) (p))
#define _CP(p) ((char *) p)

#define _PTR_ALIGN(p0, alignment)			\
((void *) (((_UI(p0) + (alignment + sizeof(void*)))	\
& (~_UI(alignment - 1)))))

/* pointer must sometimes be aligned; assume sizeof(void*) is a power of two */
#define _ORIG_PTR(p) (*(((void **) (_UI(p) & (~_UI(sizeof(void*) - 1)))) - 1))

static void *_aligned_malloc(size_t size, size_t alignment) {
    void *p0, *p;

    if (_NOT_POWER_OF_TWO(alignment)) {
        errno = EINVAL;
        return ((void *) 0);
    }

    if (size == 0) {
        return ((void *) 0);
    }
    if (alignment < sizeof(void *)) {
        alignment = sizeof(void *);
    }

    /* including the extra sizeof(void*) is overkill on a 32-bit
       machine, since malloc is already 8-byte aligned, as long
       as we enforce alignment >= 8 ...but oh well */
    p0 = malloc(size + (alignment + sizeof(void *)));
    if (!p0) {
        return ((void *) 0);
    }
    p = _PTR_ALIGN(p0, alignment);
    _ORIG_PTR(p) = p0;
    return p;
}

static void _aligned_free(void *memblock) {
    if (memblock) {
        free(_ORIG_PTR(memblock));
    }
}

Thanks to Lisandro for pointing out this simplified approach.

EPD 6.1: MKL on Linux, Windows, & OSX

March 2nd, 2010

epd-6-1-long
There were several reasons we initially decided to include MKL, an extensively threaded, highly optimized library, in the Enthought Python Distribution. For one thing, we like that MKL detects the processing capability of the machine and then runs optimal algorithm for that hardware. Secondly, we knew that MKL would offer faster linear algebra routines than the ATLAS framework, previously used for EPD Linux and Windows, and Accelerate library, previously used for OSX.

We didn’t anticipate, however, just how dramatic that speed up would be. Our benchmarking tests document the astounding increases in processing speed that MKL lends to EPD.

In EPD 6.1, NumPy and SciPy are dynamically linked against the MKL linear algebra routines. This allows EPD users to seamlessly benefit from the highly optimized BLAS and LAPACK routines in the MKL. In addition, EPD 6.1 comes bundled with all of the MKL run-time libraries so that advanced users can take advantage (with ctypes) of even more of the MKL library such as fast Fourier transforms, trust-region optimization methods, sparse solvers, and vector math.

We’re really pleased with the optimizations MKL offers to our EPD users. Try out EPD 6.1 for yourself!

PyCon 2010 Report

February 23rd, 2010

I just got back from PyCon, where I gave my tutorial on Traits, met all sorts of smart and interesting people, listened to some great talks, ate more food than I probably should have, and got less sleep than I probably needed.  It was really nice to see talks and posters about things that people are doing with Python in science at a general-purpose conference.

Highlights for me:
Dave Beazley’s awesome talk on Python’s Global Interpreter Lock (GIL) and its curious behaviour.  If you do any programming with threads in Python you have to make sure that you watch the video — it’s a clear, understandable and unflinching look at some of the weird behaviour that you can get from CPU-bound threads in Python.  The take-home message of his talk for scientific programmers: With current versions of Python, there is little point of running intensive pure-Python computations in parallel using threads — and doing so on multiple core machines may severely degrade computation time.  Dave showed a simple example where a computation split into two threads on two cores took twice as long as the same computation in one thread on one core.  Fortunately NumPy releases the GIL for most of its operations, but his talk highlights something that we should all keep in mind.

• Wes McKinney’s new Pandas package for data series analysis.  I didn’t get to see his talk, but I got to spend a fair amount of time talking to Wes and his package was generating quite a bit of buzz in the financial and numerical computing open spaces.  It’s an early release, but there have been several projects I’ve been involved in over the past couple of years where the Pandas data structures would have been invaluable.  I’m looking forward to being able to use it in the future in projects that I’m working on.

Altogether, a great time.  Thanks to everyone who dropped by the Enthought booth and talked with us, and we’ll see you in Atlanta at PyCon 2011 next year!

PyCon Giveaway Winners Announced

February 22nd, 2010

Corran, Ilan, and Travis had a great time at PyCon 2010 in Atlanta. We collected names for two drawings at our booth and just drew the winners this morning. Joseph Tennies, from Esterline, won the seat at an Enthought Open Training Course.

Meanwhile, Basic subscriptions will go out to:

  • Jason Zillioux
  • R. Mark Sharp, Southwest Foundation for Biomedical Research
  • Piotr Adam Zolnierczuk, Oak Ridge National Lab
  • Daniel Schep, Savannah River National Lab

Congratulations to all the winners!

Save the Date: SciPy 2010, June 28-July 3

February 11th, 2010

The annual US Scientific Computing with Python Conference, SciPy, has been held at Caltech since it began in 2001. While we always love an excuse to go to California, it’s also important to make sure that we allow everyone an opportunity to attend the conference. So, as Jarrod announced last fall, we’ll begin rotating the conference location and hold the 2010 conference in Austin, Texas.

As you may know, Enthought is headquartered in Austin. So in addition to our standard SciPy sponsorship, this year we’ll also be undertaking a great deal of the planning and organization.

To begin with, we’re thrilled to announce that we’ve secured several corporate sponsorships that will allow us to host the conference at the brand new AT&T Executive Education and Conference Center on campus at the University of Texas. It’s a wonderful facility in Central Austin and provides easy access to an array of great restaurants, parks, and music venues.

We will also be able to provide stipends for our Tutorial presenters for the first time. These community members provide us with an invaluable learning experience every year, and we’re very excited to be able to compensate them for their efforts. And of course, we’ll also continue to offer student sponsorships to support active academic contributors who want to attend the conference.

So mark your calendars, folks! June 28 – July 3. Early registration open now.