As we announced last week, Enthought is sponsoring an open-source High Performance Python track at PyGotham this year. The video above is meant to give you a sneak peek at the Parallel Python class. Please remember that the class will cover a broader array of topics (as described below), so don’t worry if you aren’t familiar with MPI as discussed in the video.
The talk itself will offer advice every developer should know for writing effective parallel Python. Many of the examples you find on parallel Python focus on the mechanics of getting parallel infrastructure working with your code, and not on actually building good Portable Parallel Python (the 3 P’s!). This talk is intended as a broad introduction, well suited to both the beginner and the veteran developer.
- Leverage existing code wherever possible. Most likely someone has already solved a subset of your problem, and it was probably their primary interest. As such, their code is probably better optimized for performance within their defined scope than yours will be. Use the standards: NumPy, SciPy, etc. Don’t do anything fancy unless it’s absolutely necessary.
- Build efficient models. Evolve code from for-loops to list comprehensions and generator comprehensions to NumPy and Cython.
- Optimize your code for speed and memory performance using profilers.
- Keep the minimum set of information you need at hand at all times. Whatever you do, don’t trap a piece of necessary information in a dark corner of your code. The rediscovery of information is very expensive.
- Separate the consumer and producer of information from the communication mechanism. Build simple-as-possible data structures, using or deriving from basic types. Use function-based operations on a data type or to map between data types. Keep any diagnostic and monitoring tools separate from your data and functions. Good design here can increase the parallel nature of your code significantly.
- Learn the different parallel communication technologies (multiprocessing, MPI, zmq, GPU, cloud, …) and use parallel map and/or map-reduce.
- Stay namespace-aware. Ensure your code has self-contained key-value pairs for name=object definitions at the level of the parallel map. Use imports and definitions wisely. Know the scoping rules and how they apply to parallelization.
- When the above fails, lean on a good serializer.
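To make the "evolve your code" advice a little more concrete, here is a minimal sketch of the first few steps: a for-loop, a list comprehension, and a generator expression that all compute the same squares. The function names are made up for illustration, and the NumPy and Cython steps are left out so the sketch stays dependency-free.

```python
import sys


def squares_loop(n):
    """Explicit for-loop: builds the list one append at a time."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_listcomp(n):
    """List comprehension: same result, less bookkeeping."""
    return [i * i for i in range(n)]


def squares_generator(n):
    """Generator expression: yields values lazily, never holds the full list."""
    return (i * i for i in range(n))


n = 10_000
as_list = squares_listcomp(n)
as_gen = squares_generator(n)

# The generator object stays small no matter how large n is.
print("list bytes:", sys.getsizeof(as_list))
print("generator bytes:", sys.getsizeof(as_gen))

# All three forms agree on the result.
print(sum(squares_loop(n)) == sum(as_list) == sum(as_gen))
```

The generator form matters most when the values are consumed once, for example by a reduction, since only one item is alive at a time.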
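For the profiling advice, one possible starting point is the standard library's `cProfile` and `pstats` modules. The sketch below profiles a deliberately naive function; `slow_sum_squares` and `profile_call` are hypothetical helpers, not part of any library. It only looks at time; memory would need a separate tool such as `tracemalloc`.

```python
import cProfile
import io
import pstats


def slow_sum_squares(n):
    """Deliberately naive loop so there is something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_sum_squares, 100_000)
print(report)
```

Profile first, then optimize the lines the report actually points at.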
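The points about separating producers and consumers from the communication mechanism, and about using a parallel map, can be sketched together. In the example below the computation is written against any map-like callable, so the serial built-in `map` and `multiprocessing.Pool.map` are interchangeable. The names `square` and `sum_of_squares` are illustrative assumptions, and `multiprocessing` is just one of the technologies listed above.

```python
from multiprocessing import Pool


def square(x):
    """Pure function on a basic type: easy to ship to a worker process."""
    return x * x


def sum_of_squares(values, map_func=map):
    """Map-reduce written against any map-like callable.

    The function knows nothing about how the map is executed.
    """
    return sum(map_func(square, values))


if __name__ == "__main__":
    data = range(10)

    # Serial run using the built-in map.
    serial = sum_of_squares(data)

    # Same code, with the map step handed to a pool of worker processes.
    with Pool(processes=2) as pool:
        parallel = sum_of_squares(data, map_func=pool.map)

    print(serial, parallel)
```

Because the map is passed in, moving to a different backend should only require supplying a different map-like callable, not rewriting the computation.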
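Namespace awareness and serialization tend to show up together: a parallel map usually has to serialize the function it sends to workers. As a rough illustration, the sketch below uses the standard library's `pickle` (an assumption; the talk does not name a serializer here) to check what can be shipped. `is_picklable` is a hypothetical helper.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def double(x):
    """Module-level function: pickled by reference to its importable name."""
    return 2 * x


# Basic types serialize without trouble.
print("dict:", is_picklable({"a": 1}))

# A named, module-level function can be found again by name on the worker.
print("module-level function:", is_picklable(double))

# A lambda has no importable name, so standard pickle rejects it.
print("lambda:", is_picklable(lambda x: 2 * x))
```

Keeping the functions you map at module level, with their imports resolvable by name, avoids most of these failures before a more capable serializer is needed.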
We look forward to seeing you at PyGotham! Let us know if there’s anything you’d like to see covered (within the scope of each respective talk) and stay tuned for future “sneak peeks.”