Enthought awarded $1M DOE SBIR grant to develop open-source Python HPC framework

May 20 2013 Published by under General

We are excited to announce that Enthought is undertaking a multi-year project to bring the strengths of NumPy to high-performance distributed computing. The goal is to provide a more intuitive and user-friendly interface to both distributed array computing and high-performance parallel libraries. We will release the project as open source, adding another tool for data processing, modeling, and simulation in the realm of big data. The project is funded under a Phase II grant from the DOE SBIR program [0] [1] and is headed by Kurt Smith.

The project will develop three packages designed to work in concert as a high-performance computing framework. To maximize interoperability and extensibility, the project will design a distributed array protocol akin to the Python PEP 3118 buffer protocol [2], so that other libraries and projects can interoperate with ODIN and PyTrilinos distributed data structures. The protocol will also allow interoperability with the Global Arrays and Global Arrays in NumPy (GAIN) projects based at Pacific Northwest National Laboratory (PNNL). Computational scientist Jeff Daily, who leads GAIN development at PNNL, will help in this effort.
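For readers unfamiliar with the buffer protocol that the distributed array protocol is modeled on, here is a minimal sketch of what PEP 3118 already provides on a single machine. It uses only NumPy and the built-in memoryview; it does not show the planned distributed protocol, whose design is not described in this post.

```python
import numpy as np

# A 3x4 array of 64-bit floats; NumPy arrays export the PEP 3118 buffer protocol.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

# memoryview is the standard consumer of the protocol: it sees the array's
# shape, strides (in bytes), and element format without copying any data.
view = memoryview(a)
print(view.shape)    # (3, 4)
print(view.strides)  # (32, 8)
print(view.format)   # d

# Another consumer can rebuild an array on top of the same memory.
b = np.asarray(view)
b[0, 0] = 99.0
print(a[0, 0])       # 99.0, because a and b share one buffer
```

A distributed analogue would presumably also have to describe which process owns which part of the array, something the single-process buffer protocol has no notion of.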

The three components are described in more detail below.

Optimized Distributed NumPy (ODIN)

ODIN provides a NumPy-like interface for distributed array computations.  It provides

  • distributed parallel computing on array expressions;

  • specification of an array’s domain decomposition, whether for processing or for storage across files, with sensible defaults;

  • specification of the processes involved in specific array computations;

  • features for specifying the locality of computations, whether global or local;

  • support for out-of-core computations;

  • interoperability with existing NumPy-based packages.
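To make the ideas of domain decomposition and local versus global computation concrete, here is a small single-process sketch in plain NumPy. It is not ODIN code: the helper names block_decompose and distributed_sum are hypothetical, and each block simply stands in for the data one process would own.

```python
import numpy as np

def block_decompose(global_array, n_procs):
    """Split a 1-D array into n_procs contiguous blocks (hypothetical helper)."""
    return np.array_split(global_array, n_procs)

def distributed_sum(blocks):
    """Reduce each block locally, then combine the partial results."""
    partial_sums = [block.sum() for block in blocks]  # local step, one per "process"
    return sum(partial_sums)                          # global step, needs communication

x = np.arange(10, dtype=np.float64)
blocks = block_decompose(x, 3)

# An elementwise (ufunc) operation is purely local: no block needs another's data.
local_roots = [np.sqrt(block) for block in blocks]

# A reduction is global: every block contributes to a single result.
total = distributed_sum(blocks)
print([len(block) for block in blocks])  # [4, 3, 3]
print(total)                             # 45.0
```

In a real distributed setting the global step would be a message-passing reduction rather than a Python sum over a list.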

Expressions involving ODIN arrays will allow users to perform sophisticated array computations in a distributed fashion, including basic array computations, array slicing and fancy-indexing computations, finite-difference-style computations, and more. ODIN's road map includes array expression analysis and loop fusion to optimize distributed computations. ODIN will provide built-in capabilities for distributed ufunc calculations as well as reduction and accumulation-type computations. ODIN is designed to be extensible and adaptable to existing libraries, and will allow domain experts to make their distributed algorithms available to a much wider audience on a common platform. The package will build on existing technologies and take inspiration from several existing distributed array libraries and languages, including Chapel, X10, Fortress, High Performance Fortran, and Julia. ODIN will interoperate with the Trilinos suite of HPC solvers via PyTrilinos, and will provide a high-level interface to make Trilinos and PyTrilinos easier to use.
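Finite-difference-style computations are a useful example of why distribution is harder than it looks: each block needs one neighboring value from either side (a "halo" or "ghost" cell). The sketch below imitates that pattern in a single process with plain NumPy. The function name and the blocking scheme are assumptions made for illustration, not ODIN's API.

```python
import numpy as np

def blockwise_second_difference(global_array, n_procs):
    """Second difference computed block by block, each block padded with halo cells."""
    n = len(global_array)
    # Block boundaries: block i owns indices bounds[i] .. bounds[i + 1] - 1.
    bounds = np.linspace(0, n, n_procs + 1).astype(int)
    pieces = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        lo = max(start - 1, 0)   # one halo cell on the left, if it exists
        hi = min(stop + 1, n)    # one halo cell on the right, if it exists
        padded = global_array[lo:hi]
        # Stencil u[i-1] - 2*u[i] + u[i+1] on the interior of the padded block.
        pieces.append(padded[:-2] - 2.0 * padded[1:-1] + padded[2:])
    return np.concatenate(pieces)

x = np.arange(12, dtype=np.float64) ** 2
blockwise = blockwise_second_difference(x, 3)
serial = x[:-2] - 2.0 * x[1:-1] + x[2:]
print(np.allclose(blockwise, serial))  # True
```

Here the halo values are read straight from the global array; across real processes they would have to be exchanged with neighbors before the stencil is applied.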

ODIN will be tested on the Texas Advanced Computing Center's Stampede supercomputer, and scaling tests will be run on Stampede's Intel Xeon Phi coprocessors.

PyTrilinos improvements and enhancements

Trilinos is a suite of dozens of HPC packages that provide access to state-of-the-art distributed solvers, and PyTrilinos is the Python interface to several of the Trilinos packages.  The Trilinos packages, developed primarily at Sandia National Laboratories, allow scientists to solve partial differential equations and large linear, nonlinear, and optimization problems in parallel, from desktops to distributed clusters to supercomputers, with active research on modern architectures such as GPUs.  Bill Spotz, senior research scientist at Sandia, will lead the PyTrilinos portion of the project to improve and continue to expand the PyTrilinos interfaces, making Trilinos easier to use.

Seamless

Seamless provides functionality to speed up Python via JIT compilation and makes integration between Python and other languages nearly effortless. Built on LLVM, Seamless uses LLVM's introspection capabilities to wrap existing C and C++ (and eventually Fortran) libraries while minimizing code duplication, combining many of the best features of Cython, ctypes, SWIG, and PyPy.
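As a point of comparison, this is what wrapping a single C function looks like today with ctypes from the standard library. It is only a sketch of the manual step that Seamless aims to reduce; it does not use or depict Seamless itself, and it assumes a POSIX system where the C library can be loaded this way.

```python
import ctypes
import ctypes.util

# find_library may return None; on POSIX, CDLL(None) then exposes the symbols
# already loaded into the current process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# The C signature "size_t strlen(const char *)" has to be restated by hand.
# This is the kind of duplication that header introspection could avoid.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"Trilinos")
print(length)  # 8
```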

We are very excited to have the opportunity to work on this Python HPC framework, and look forward to working with the Scientific Python community to move NumPy into the next age of distributed scientific computing. We will post project updates on Enthought's website. We would like to thank the Department of Energy's SBIR program for the opportunity to develop these packages, and the collaborators and industry partners whose support made this possible.

[0] http://science.energy.gov/sbir/awards/

[1] http://science.energy.gov/~/media/sbir/excel/2013_Phase_II_Release_1.xlsx

[2] http://www.python.org/dev/peps/pep-3118/


Introducing Enthought Canopy

Apr 11 2013 Published by under General

-Eric Jones, Enthought CEO

Yesterday we launched Enthought Canopy, our next-generation, Python-based analysis environment and our follow-on to EPD.

Since 2003, the Enthought Python Distribution (EPD) has helped hundreds of thousands of scientists, engineers and analysts develop and deploy with Python. 2013 is its 10th anniversary! It’s hard to believe it’s been that long. Time flies when you are having fun.

Over the years, we've watched customers use Python for a variety of scientific and analytic computing applications. Their tasks fell into three areas: exploration, development, and visualization. We developed Canopy specifically to help with these three key analysis tasks.

Exploration:

Python makes it straightforward to acquire and manage data from a variety of sources like the web, databases and files on the computer. Using Canopy’s IPython interactive prompt, you can interactively access, combine and explore data from all these sources. With the IPython Notebook, you can capture your sessions, re-use them later or share them with others.

Development:

Analysts develop algorithms and scripts to work with their data and to deliver results. Canopy’s editor includes syntax highlighting to help identify elements in the Python code, auto completion to speed programming, and error checking to reduce time spent hunting bugs. You can execute entire scripts or selected lines from the editor at the IPython prompt to test and debug your code.

Python’s scientific and analytic computing ecosystem is quite large, with hundreds, if not thousands, of packages. Canopy itself ships with over 100 math, science, and analytic packages. And the ecosystem continues to grow and evolve. The Canopy Package Manager makes it simple to find, install and update these Python packages.

Visualization:

Beyond the numbers and the scripts, one of the best ways to understand and explain data is with visualization. Python provides many tools and packages to help visualize data and results. Within Canopy, Matplotlib works seamlessly with IPython and provides an extensive toolset for data plotting and presentation.

For more extensive graphical user interfaces, Canopy includes 2-D and 3-D visualization packages like Chaco and Mayavi.

We're really excited about Canopy's design. It brings together the full power of Python in an easy-to-use analysis environment today and provides an analysis platform on which to build for the next 10 years. For more details, see the Canopy web page.

 
