Tag Archives: High Performance Python

PyGotham Sneak Peek: The Three “P’s”

As we announced last week, Enthought is sponsoring an open-source High Performance Python track at PyGotham this year. The video above is meant to give you a sneak peek at the Parallel Python class. Please remember that the class will cover a broader array of topics (as described below), so don’t worry if you aren’t familiar with MPI as discussed in the video.

The talk itself will offer advice every developer should know for writing effective parallel Python. Many of the examples you find on parallel Python focus on the mechanics of getting parallel infrastructure working with your code, not on actually building good Portable Parallel Python (the 3 P's!). This talk is intended as a broad introduction, well suited to both the beginner and the veteran developer.

For example:

  1. Leverage existing code wherever possible. Most likely, someone has already solved a subset of your problem, and it was probably their primary interest. As such, their code is probably better optimized for performance within its defined scope than your code will be. Use the standards: NumPy, SciPy, etc. Don't do anything fancy unless it's absolutely necessary.
  2. Build efficient models. Evolve code from for-loops to list comprehensions and generator expressions, and then to NumPy and Cython.
  3. Optimize your code for speed and memory performance using profilers.
  4. Keep the minimum set of information you need at hand at all times. Whatever you do, don’t trap a piece of necessary information in a dark corner of your code. The rediscovery of information is very expensive.
  5. Separate the consumers and producers of information from the communication mechanism. Build the simplest possible data structures, using or deriving from basic types. Use plain functions to operate on a data type or to map between data types. Keep any diagnostic and monitoring tools separate from your data and functions. Good design here can significantly increase how much of your code can run in parallel.
  6. Learn the different parallel communication technologies (multiprocessing, MPI, zmq, GPU, cloud, …) and use parallel map and/or map-reduce.
  7. Stay namespace-aware. Ensure your code has self-contained key-value pairs for name=object definitions at the level of the parallel map. Use imports and definitions wisely. Know Python's scoping rules and how they apply to parallelization.
  8. When the above fails, lean on a good serializer.
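Point 2 above can be sketched concretely. The following is a minimal, hypothetical example (summing squares) showing the same computation evolved from a for-loop to a generator expression to NumPy:

```python
import numpy as np

# Stage 1: explicit for-loop -- clear, but the slowest form
def sum_squares_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total

# Stage 2: generator expression -- no intermediate list is built
def sum_squares_gen(values):
    return sum(v * v for v in values)

# Stage 3: NumPy -- vectorized, typically fastest for large inputs
def sum_squares_numpy(values):
    arr = np.asarray(values, dtype=float)
    return (arr * arr).sum()

data = list(range(1000))
assert sum_squares_loop(data) == sum_squares_gen(data) == int(sum_squares_numpy(data))
```

Each stage computes the same result; the payoff of the later stages grows with the size of the input.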
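For point 3, the standard library's `cProfile` and `pstats` modules are a common starting point. A minimal sketch, using a deliberately slow hypothetical function as the hotspot:

```python
import cProfile
import io
import pstats

def slow_concat(n):
    # Quadratic string building -- the kind of hotspot a profiler surfaces
    s = ""
    for i in range(n):
        s += str(i)
    return s

profiler = cProfile.Profile()
profiler.enable()
slow_concat(10000)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # show the top 5 entries by cumulative time
print(stream.getvalue())
```

For memory performance, third-party tools such as `memory_profiler` play a similar role.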
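Point 5 can be illustrated with a small, hypothetical sketch: the producer and consumer below work on plain dicts and never see the transport, so an in-memory buffer could later be swapped for a multiprocessing queue or a zmq socket without touching them:

```python
# Producer and consumer operate on basic types only.
def produce(n):
    return [{"id": i, "value": i * i} for i in range(n)]

def consume(record):
    return record["value"] + 1

# The communication mechanism is injected as a pair of callables.
def run(channel_put, channel_get, n):
    for record in produce(n):
        channel_put(record)
    return [consume(channel_get()) for _ in range(n)]

# In-memory "channel" built from a plain list:
buffer = []
results = run(buffer.append, lambda: buffer.pop(0), 4)
print(results)  # [1, 2, 5, 10]
```

Because neither side depends on the channel, the same functions are reusable serially, across processes, or across machines.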
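Points 6 and 7 meet in the parallel map. A minimal sketch with the standard library's `multiprocessing.Pool` (the worker function here is hypothetical): the worker must be defined at module top level so each child process's namespace can reconstruct it by name:

```python
from multiprocessing import Pool

# Top-level definition: self-contained and importable by worker processes.
def simulate(params):
    x, n_steps = params
    for _ in range(n_steps):
        x = 0.5 * x + 1.0  # iterate toward the fixed point x = 2.0
    return x

if __name__ == "__main__":
    tasks = [(float(i), 100) for i in range(8)]
    with Pool(processes=4) as pool:
        results = pool.map(simulate, tasks)
    print(results)
```

A lambda or a closure defined inside another function would fail here, because it cannot be pickled and re-imported in the child processes.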
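And for point 8, the standard library's `pickle` is the baseline serializer when objects must cross a process boundary. A minimal sketch:

```python
import pickle

# pickle handles most built-in types and plain classes; for functions and
# more exotic objects, third-party serializers (e.g. dill) can pick up the slack.
payload = {"params": [1.0, 2.5], "label": "run-42"}
blob = pickle.dumps(payload)   # bytes, safe to send over a pipe or socket
restored = pickle.loads(blob)
assert restored == payload
```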

We look forward to seeing you at PyGotham! Let us know if there’s anything you’d like to see covered (within the scope of each respective talk) and stay tuned for future “sneak peeks.”

Enthought at PyGotham: June 8th & 9th

To the PyCluster!

Enthought is a proud sponsor of the second annual PyGotham conference in New York City (June 8th and 9th). As part of our commitment, we are also offering a High Performance Python track that will illustrate how to build applications and use parallel computing techniques with various open-source projects. Stay tuned for more details as they become available.

Here’s the lineup so far:

  • Python with Parallelism in Mind. Rarely does code just happen to be “embarrassingly parallel.” We will discuss some simple rules, structural changes, and diagnostic tools that can help optimize the parallel efficiency of your code. This session will also introduce several common parallel communication technologies that can lower the barrier to parallel computing.
  • GPU Computing with CLyther. GPU computing has come a long way over the past few years but still requires knowledge of CUDA or OpenCL. Similar to Cython, CLyther is a Python language extension that makes writing OpenCL code as easy as Python itself.
  • MapReduce with the Disco Project. MapReduce frameworks provide a powerful abstraction for distributed data storage and processing. Our friend, Chris Mueller, will talk about the Disco Project, a refreshing alternative to the Hadoop hegemony that uses Python for the front-end and Erlang for the back-end. More importantly, he will discuss when a MapReduce framework makes sense and when it doesn’t.
  • Interactive Plotting with Chaco. Most “big data” problems don’t stop with distributed computation. You have to render your results in a way that a larger audience can understand. Chaco is an open source library that helps developers generate performant, interactive data visualizations.
  • Declarative UIs with Enaml. Enaml is Pythonic UI development done right. Enaml shares Python's goals of simplicity, conciseness, and elegance. Enaml implements a constraint-based layout system that ensures UIs built with Enaml behave and appear identically on Windows, Linux, and OS X. This introduction to Enaml will get you started on the path to writing non-trivial UIs in an afternoon.
  • Tie It Together: Build An App. In an updated version of his Pycon talk, Jonathan Rocher ties together time series data — from storage to analysis to visualization — in a demo application. We’ll also walk through a more computationally demanding application to illustrate concepts introduced in the previous talks.

We look forward to seeing everyone there!