SciPy for .NET & IronPython

Aug 29 2011 Published by jmccampbell under News, NumPy, Python, SciPy

This week Enthought and Microsoft are announcing the availability of NumPy and SciPy for IronPython and the .NET Framework, released in conjunction with Microsoft’s Python Tools for Visual Studio. NumPy is a fast and flexible multi-dimensional array package, and SciPy is a large collection of scientific and numerical algorithms built on top of that array; together they are two fundamental building blocks for technical computing in Python. CPython users are familiar with both, but until now they have not been available in other Python environments. This is because both packages include many heavily optimized native-code implementations of their algorithms (typically in C or Fortran), which require custom wrappers to integrate them into each Python environment.

The result of the project is IronPython ports of NumPy and SciPy, which are full .NET ports and include custom C#/C interfaces to a common native C core. This means that the full functionality is available not only to IronPython but to all .NET languages such as C# or F# by directly accessing the C# interface objects.

Completing the port required considerably more work than just writing new interfaces for IronPython.  To start with, large parts of the NumPy C code were written specifically for CPython and made free use of the CPython APIs.  To support additional Python environments, the codebase had to be refactored into a Python-independent core library plus a CPython-specific interface layer.  A big advantage of this architectural separation is that the core NumPy library can now be used much more easily from applications written in C or C++, and ports to other Python environments will require less work.

The second supporting project was an extension of Cython to support IronPython.  Cython, not to be confused with CPython, is a tool that allows C extensions to be written for Python using a language based on Python with extensions for interacting with native code libraries.  Up until now Cython has only supported the CPython environment.  We have extended Cython to be able to generate C++/CLI code interfaces for .NET, and thus IronPython.  Using a single Cython input file it is now possible to target both CPython and IronPython environments.

The main driver for the Cython project was the nature of the SciPy library.  SciPy consists of dozens of different packages, many of which have a hand-written C interface for CPython.  One option was to write a matching C# interface for each of these to support IronPython.  While this would have taken less time, the result would have been significant duplication of code and more maintenance work in the future. Instead we chose to port Cython to .NET and then rewrite each C module in Cython so that only a single source file is needed.

The first releases of SciPy and NumPy for .NET are available now as binary distributions from SciPy.org or directly from Enthought.  All of the code for these and the supporting projects is open source and available at the links below.

Fast Numpy/Protobuf Deserialization Example

Jul 07 2010 Published by Bryce Hendrix under NumPy

In my last post I wrote about how I was able to improve protobuf deserialization using a C++ extension. I’ve boiled down the code to its essence to show how I did it. Rather than zip everything up in a file, the code is short enough to show in its entirety.

Here’s the simplified protobuf message which is used to represent a time series as 2 arrays:

package timeseries;

message TimeSeries {
    repeated double times = 2;
    repeated double values = 3;
}
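
As a rough illustration of what this message looks like on the wire (a sketch, not part of the original code): the .proto file above has no packed option, so under proto2 defaults each element of a repeated double field should be written as a one-byte tag followed by eight little-endian bytes. The helper names below are hypothetical and exist only for this illustration; real code should use the generated time_series_pb2 classes.

```python
import struct


def encode_repeated_double(field_number, values):
    """Hand-encode a non-packed repeated double field (hypothetical helper)."""
    tag = (field_number << 3) | 1  # wire type 1 = 64-bit
    assert tag < 128, "sketch only handles single-byte tags"
    out = bytearray()
    for v in values:
        out.append(tag)
        out += struct.pack("<d", v)  # 8 bytes, little-endian
    return bytes(out)


def encode_time_series(times, values):
    """Fields are written in field-number order: times (2), then values (3)."""
    return encode_repeated_double(2, times) + encode_repeated_double(3, values)


payload = encode_time_series([0.0, 1.0], [0.0, 10.0])
print(len(payload))  # 4 elements * 9 bytes each
```

At nine bytes per element, the 10,000,000-point series used in the benchmark comes to roughly 180 MB, which is presumably why the C++ code further down raises the parser's total-bytes limit.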

I then wrote a test app in Python and C++ to provide a benchmark. Here is the Python version:

import numpy
import time_series_pb2

def write_test():
    ts = time_series_pb2.TimeSeries()
    for i in range(10000000):
        ts.times.append(i)
        ts.values.append(i*10.0)

    import time
    start = time.time()

    f = open("ts.bin", "wb")
    f.write(ts.SerializeToString())
    f.close()

    print 'Write time:', time.time() - start

def read_test():
    ts = time_series_pb2.TimeSeries()

    import time
    start = time.time()

    f = open("ts.bin", "rb")
    ts.ParseFromString(f.read())
    f.close()

    t = numpy.array(ts.times._values)
    v = numpy.array(ts.values._values)

    print 'Read time:', time.time() - start
    print "Read time series of length %d" % len(ts.times)

if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print "usage:   %s <--read> | <--write>" % sys.argv[0]
        # stop here; otherwise sys.argv[1] below raises IndexError
        sys.exit(1)

    if sys.argv[1] == "--read":
        read_test()
    else:
        write_test()

I will spare you the C++ standalone code, since it was only a stepping stone. Instead, here is the C++ extension, which exposes two functions: one deserializes a string and the other reads from a file.

#include <Python.h>  // include before any standard headers
#include <numpy/arrayobject.h>

#include <fcntl.h>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

#include "time_series.pb.h"

static PyObject* construct_numpy_arrays(timeseries::TimeSeries* ts)
{
    // returns a tuple (t,v) of 1-D double arrays, or NULL with an exception set

    npy_intp time_dims[1] = {ts->times_size()};
    npy_intp value_dims[1] = {ts->values_size()};

    // let NumPy allocate and own the buffers so they are freed with the arrays
    // (buffers handed to PyArray_SimpleNewFromData are never freed by NumPy)
    PyObject* time_array = PyArray_SimpleNew(1, time_dims, NPY_DOUBLE);
    PyObject* value_array = PyArray_SimpleNew(1, value_dims, NPY_DOUBLE);
    PyObject* data_tuple = PyTuple_New(2);
    if (!time_array || !value_array || !data_tuple)
    {
        Py_XDECREF(time_array);
        Py_XDECREF(value_array);
        Py_XDECREF(data_tuple);
        return NULL;
    }

    // the data must be copied because ts, and the storage behind its
    // repeated fields, goes away when the caller returns
    memcpy(PyArray_DATA((PyArrayObject*)time_array), ts->times().data(),
           ts->times_size()*sizeof(double));
    memcpy(PyArray_DATA((PyArrayObject*)value_array), ts->values().data(),
           ts->values_size()*sizeof(double));

    // PyTuple_SetItem steals the references to the arrays
    PyTuple_SetItem(data_tuple, 0, time_array);
    PyTuple_SetItem(data_tuple, 1, value_array);

    return data_tuple;
}

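static PyObject* TimeSeries_load(PyObject* self, PyObject* args)
{
    char* filename = NULL;

    if (! PyArg_ParseTuple(args, "s", &filename))
    {
        return NULL;
    }

    int fd = open(filename, O_RDONLY);
    if (fd < 0)
    {
        // report the failure to Python instead of parsing an invalid descriptor
        return PyErr_SetFromErrnoWithFilename(PyExc_IOError, filename);
    }

    timeseries::TimeSeries ts;

    google::protobuf::io::FileInputStream fs(fd);
    google::protobuf::io::CodedInputStream coded_fs(&fs);
    // raise the message size limit so the large benchmark file can be parsed
    coded_fs.SetTotalBytesLimit(500*1024*1024, -1);
    bool parsed = ts.ParseFromCodedStream(&coded_fs);
    fs.Close();  // also closes fd, so no separate close(fd) is needed

    if (!parsed)
    {
        PyErr_SetString(PyExc_ValueError, "could not parse TimeSeries message");
        return NULL;
    }

    return construct_numpy_arrays(&ts);
}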

static PyObject* TimeSeries_deserialize(PyObject* self, PyObject* args)
{
    int buffer_length;
    char* serialization = NULL;

    if (! PyArg_ParseTuple(args, "t#", &serialization, &buffer_length))
    {
        return NULL;
    }
    google::protobuf::io::ArrayInputStream input(serialization, buffer_length);

    google::protobuf::io::CodedInputStream coded_fs(&input);
    coded_fs.SetTotalBytesLimit(500*1024*1024, -1);

    timeseries::TimeSeries ts;
    if (!ts.ParseFromCodedStream(&coded_fs))
    {
        PyErr_SetString(PyExc_ValueError, "could not parse TimeSeries message");
        return NULL;
    }
    return construct_numpy_arrays(&ts);
}

static PyMethodDef TSMethods[] = {
    {"load", TimeSeries_load, METH_VARARGS, "loads a TimeSeries from a file"},
    {"deserialize", TimeSeries_deserialize, METH_VARARGS, "loads a TimeSeries from a string"},
    {NULL, NULL, 0, NULL}  /* sentinel: marks the end of the method table */
};

#ifndef PyMODINIT_FUNC  /* declarations for DLL import/export */
#define PyMODINIT_FUNC void
#endif
PyMODINIT_FUNC inittimeseries(void)
{
    import_array();
    (void) Py_InitModule("timeseries", TSMethods);
}
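
To make the parsing step a little less opaque, here is a pure-Python sketch of the same idea: walk the serialized bytes once and append each double straight into a contiguous typed buffer. It is an illustration only, assuming the non-packed proto2 encoding with one-byte tags (0x11 for times, 0x19 for values). The name decode_time_series is hypothetical and not part of the extension, and a loop like this will be far slower than the C++ code.

```python
import struct
from array import array

TIMES_TAG = 0x11   # (field 2 << 3) | wire type 1 (64-bit)
VALUES_TAG = 0x19  # (field 3 << 3) | wire type 1 (64-bit)


def decode_time_series(payload):
    """Decode non-packed repeated doubles into two array('d') buffers."""
    data = bytearray(payload)  # indexing yields ints on Python 2 and 3
    times, values = array("d"), array("d")
    targets = {TIMES_TAG: times, VALUES_TAG: values}
    pos = 0
    while pos < len(data):
        target = targets.get(data[pos])
        if target is None or pos + 9 > len(data):
            raise ValueError("unsupported or truncated field at byte %d" % pos)
        target.append(struct.unpack_from("<d", data, pos + 1)[0])
        pos += 9  # one tag byte plus eight payload bytes
    return times, values


# one (time, value) pair, encoded by hand for the demonstration
sample = b"\x11" + struct.pack("<d", 1.5) + b"\x19" + struct.pack("<d", 15.0)
t, v = decode_time_series(sample)
print(list(t), list(v))
```

Because array('d') stores its elements contiguously, the result has the same shape as what the extension hands to NumPy: two flat blocks of doubles.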

Calling the extension from Python is trivial:

import time
import timeseries
start = time.time()
t, v = timeseries.load('ts.bin')
print "read and converted to numpy array in %f" % (time.time()-start)
print "timeseries contained %d values" % len(v)
