
Avoiding “Excel Hell!” using a Python-based Toolchain

Update (Feb 6, 2014):  Enthought is now the exclusive distributor of PyXLL, a solution that helps users avoid “Excel Hell” by making it easy to develop add-ins for Excel in Python. Learn more here.

Didrik Pinte gave an informative, provocatively titled presentation at the second in-person New York Quantitative Python User’s Group (NY QPUG) meeting earlier this month.

There are a lot of examples in the press of Excel workflow mess-ups and spreadsheet errors contributing to some eye-popping mishaps in the finance world (e.g., spreadsheet issues at JP Morgan may have contributed to the massive 2012 “London Whale” trading loss). Most of these can be traced to a few fundamental issues:

  • Data referencing/traceability

  • Numerical errors

  • Error-prone manual operations (cut & paste, …)

  • Tracing I/O in libraries/APIs

  • Missing version control

  • A toolchain that doesn’t meet the needs of researchers, analysts, IT, etc.

Python, both the language and its tool ecosystem, can provide a good solution to these challenges, and many organizations are already turning to Python-based workflows in response. And with integration tools like PyXLL (to execute Python functions within Excel) and others, organizations can adopt Python-based workflows incrementally and start improving their current “Excel Hell” situation quickly.
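As a taste of that incremental approach, PyXLL exposes an ordinary Python function to Excel as a worksheet function via a decorator. A minimal sketch (the function itself is our example; the decorator and signature string follow PyXLL’s documented usage):

from pyxll import xl_func

@xl_func("float x, float y: float")
def py_add(x, y):
    """Usable in a cell as =PY_ADD(A1, B1) once the add-in loads this module."""
    return x + y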

For the details, check out the video of Didrik’s NY QPUG presentation, in which he demonstrates an example solution using PyXLL and Enthought Canopy.

http://vimeo.com/67327735

And grab the PDF of his slides here: QPUG_20130514_ExcelHell_Slides

It would be great to hear your stories about “Excel Hell”. Let us know below.

–Brett Murphy

Enthought awarded $1M DOE SBIR grant to develop open-source Python HPC framework

We are excited to announce that Enthought is undertaking a multi-year project to bring the strengths of NumPy to high-performance distributed computing.  The goal is to provide a more intuitive and user-friendly interface to both distributed array computing and to high-performance parallel libraries.  We will release the project as open source, providing another tool in our toolbox to help with data processing, modeling, and simulation capabilities in the realm of big data.  The project is funded under a Phase II grant from the DOE SBIR program [0] [1], and is headed by Kurt Smith.

The project will develop three packages designed to work in concert to provide a high-performance computing framework.  To maximize interoperability and extensibility, the project will design a distributed array protocol akin to Python’s PEP-3118 buffer protocol [2], making it possible for other libraries and projects to easily interoperate with ODIN and PyTrilinos distributed data structures. The protocol will allow interoperability with the Global Arrays and Global Arrays in NumPy (GAIN) projects based out of Pacific Northwest National Laboratory (PNNL). Computational scientist Jeff Daily, who leads GAIN development at PNNL, will help in this effort.
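By way of analogy, the existing PEP-3118 buffer protocol lets two libraries share one block of memory without copying because both speak the same protocol; the planned distributed array protocol aims to do the same for domain-decomposed arrays. A quick illustration of the buffer protocol with NumPy:

import numpy as np

a = np.arange(10, dtype=np.float64)
view = memoryview(a)                       # any PEP-3118 consumer accepts this
b = np.frombuffer(view, dtype=np.float64)  # zero-copy round trip

b[0] = 42.0
assert a[0] == 42.0                        # both names share the same buffer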

The three components are described in more detail below.

Optimized Distributed NumPy (ODIN)

ODIN provides a NumPy-like interface for distributed array computations.  It offers:

  • distributed parallel computing on array expressions;

  • specification of an array’s domain decomposition, whether for processing or for storage across files, with sensible defaults;

  • specification of the processes involved in specific array computations;

  • features for specifying the locality of computations, whether global or local;

  • support for out-of-core computations;

  • interoperability with existing NumPy-based packages.

Expressions involving ODIN arrays will allow users to perform sophisticated array computations in a distributed fashion, including basic array computations, array slicing and fancy-indexing computations, finite-difference-style computations, and more.  ODIN’s road map includes array expression analysis and loop fusion to optimize distributed computations.  ODIN will provide built-in capabilities for distributed ufunc calculations as well as reduction- and accumulation-type computations.  ODIN is designed to be extensible and adaptable to existing libraries, and will allow domain experts to make their distributed algorithms easily available to a much wider audience based on a common platform.  The package will build on existing technologies and takes inspiration from several distributed array libraries and languages already in existence, including Chapel, X10, Fortress, High Performance Fortran, and Julia.  ODIN will interoperate with the Trilinos suite of HPC solvers via PyTrilinos, and will provide a high-level interface to make Trilinos and PyTrilinos easier to use.
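For concreteness, here is the kind of serial NumPy computation ODIN aims to run in a distributed fashion; this is a sketch of the targeted workload, not ODIN’s API (which is not yet published). In ODIN the arrays would be domain-decomposed across processes, but the user-facing code is intended to look essentially like this:

import numpy as np

# Serial NumPy version of the kind of computation ODIN targets:
# slicing, a finite-difference stencil, a ufunc, and a reduction.
u = np.linspace(0.0, 1.0, 1000000)
du = u[2:] - u[:-2]              # centered finite-difference stencil via slicing
energy = np.sqrt(du * du).sum()  # ufunc followed by a reduction
print(energy)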

ODIN will be tested on the Texas Advanced Computing Center’s Stampede supercomputer, and scaling tests will be run on Stampede’s Intel Xeon Phi accelerators.

PyTrilinos improvements and enhancements

Trilinos is a suite of dozens of HPC packages that provide access to state-of-the-art distributed solvers, and PyTrilinos is the Python interface to several of the Trilinos packages.  The Trilinos packages, developed primarily at Sandia National Laboratories, allow scientists to solve partial differential equations and large linear, nonlinear, and optimization problems in parallel, from desktops to distributed clusters to supercomputers, with active research on modern architectures such as GPUs.  Bill Spotz, senior research scientist at Sandia, will lead the PyTrilinos portion of the project to improve and continue to expand the PyTrilinos interfaces, making Trilinos easier to use.
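For a flavor of the existing PyTrilinos interface the project builds on, here is a minimal sketch using the Epetra package (exact signatures vary across Trilinos versions). Run under MPI, the vector’s elements are distributed across processes automatically:

from PyTrilinos import Epetra

comm = Epetra.PyComm()            # wraps the MPI communicator (serial fallback)
emap = Epetra.Map(1000, 0, comm)  # 1000 global elements, index base 0
x = Epetra.Vector(emap)           # vector distributed according to the map
x.PutScalar(1.0)
print(comm.MyPID(), x.Norm2())    # each rank reports the global 2-norm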

Seamless

Seamless provides functionality to speed up Python via JIT compilation and makes integration between Python and other languages nearly effortless. Seamless is based on LLVM and uses its introspection capabilities to wrap existing C and C++ (and eventually Fortran) libraries while minimizing code duplication, combining many of the best features of Cython, ctypes, SWIG, and PyPy.
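Seamless’s API has not been published at the time of this announcement, so the following is a purely hypothetical illustration of the decorator-style JIT workflow such a tool might offer (the seamless.jit name is our placeholder, not a released interface):

import seamless  # hypothetical module name, for illustration only

@seamless.jit    # placeholder decorator: compile the hot path via LLVM
def saxpy(a, x, y):
    # Unchanged Python semantics; native code generated on first call.
    return [a * xi + yi for xi, yi in zip(x, y)]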

We are very excited to have the opportunity to work on this Python HPC framework, and look forward to working with the Scientific Python community to move NumPy into the next age of distributed scientific computing.  We will be updating Enthought’s website with project progress and updates.  We would like to thank the Department of Energy’s SBIR program for the opportunity to develop these packages, and the collaborators and industry partners whose support made this possible.

[0] http://science.energy.gov/sbir/awards/

[1] http://science.energy.gov/~/media/sbir/excel/2013_Phase_II_Release_1.xlsx

[2] http://www.python.org/dev/peps/pep-3118/

Introducing Enthought Canopy

-Eric Jones, Enthought CEO

Yesterday we launched Enthought Canopy, our next-generation, Python-based analysis environment and our follow-on to EPD.

Since 2003, the Enthought Python Distribution (EPD) has helped hundreds of thousands of scientists, engineers and analysts develop and deploy with Python. 2013 is its 10th anniversary! It’s hard to believe it’s been that long. Time flies when you are having fun.

Over the years, we’ve watched customers use Python for a variety of scientific and analytic computing applications. Their tasks fell into three areas: exploration, development, and visualization. We developed Canopy specifically to help with these three key analysis tasks.

Exploration:

Python makes it straightforward to acquire and manage data from a variety of sources like the web, databases and files on the computer. Using Canopy’s IPython interactive prompt, you can interactively access, combine and explore data from all these sources. With the IPython Notebook, you can capture your sessions, re-use them later or share them with others.
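For example, a quick interactive session at the IPython prompt might look like this (illustrative only; the file name and column name are placeholders):

import pandas as pd

df = pd.read_csv('measurements.csv')   # could equally come from a URL or database
df.describe()                          # summary statistics at a glance
df[df['temperature'] > 25].head()      # slice out the interesting rows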

Development:

Analysts develop algorithms and scripts to work with their data and to deliver results. Canopy’s editor includes syntax highlighting to help identify elements in the Python code, auto completion to speed programming, and error checking to reduce time spent hunting bugs. You can execute entire scripts or selected lines from the editor at the IPython prompt to test and debug your code.

Python’s scientific and analytic computing ecosystem is quite large, with hundreds, if not thousands, of packages. Canopy itself ships with over 100 math, science, and analytic packages. And the ecosystem continues to grow and evolve. The Canopy Package Manager makes it simple to find, install and update these Python packages.

Visualization:

Beyond the numbers and the scripts, one of the best ways to understand and explain data is with visualization. Python provides many tools and packages to help visualize data and results. Within Canopy, Matplotlib works seamlessly with IPython and provides an extensive toolset for data plotting and presentation.
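A minimal Matplotlib session of the kind Canopy’s IPython integration supports:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x), label='sin(x)')
plt.legend()
plt.show()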

For more extensive graphical user interfaces, Canopy includes 2-D and 3-D visualization packages like Chaco and Mayavi.
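And a one-liner 3-D example with Mayavi’s mlab interface:

import numpy as np
from mayavi import mlab

t = np.linspace(0, 4 * np.pi, 100)
mlab.plot3d(np.sin(t), np.cos(t), 0.1 * t, t)  # a helix, colored by t
mlab.show()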

We’re really excited about Canopy’s design. It brings together the full extent and power of Python in an easy-to-use analysis environment today, and provides an analysis platform to build on for the next 10 years. For more details, check out the Canopy web page.


Python Helps Win at Hardware Hackathon

–Jack Minardi, Enthought Developer

I recently participated in the Upverter + YCombinator Hardware Hackathon. My team placed first overall, and the power of Python made it possible.

The hackathon lasted about 10 hours, and the goal was to design and build a prototype hardware device. For our entry, my team built a wearable force-feedback glove. When worn, the glove can simulate the feeling of holding a physical object. Potential uses for this device include gaming, surgical assistance, and other applications in the augmented-reality space. I have been interested in expanding the human-computer interface for a while, and this hack allowed me to explore the world of haptic feedback.

Mechanical Design
—————–
A length of twine is connected from the glove’s fingertips, through two guiding braces, back to a hobby servo. This is repeated for each finger. The servo is connected to a platform mounted on the back of the glove. When the servo is actuated, it pulls back on the twine, holding the fingers open. Through this process we are able to simulate the resistive force of an object holding the wearer’s hand open.

For the demo at the competition, we used a distance sensor to set the hand position. The closer your hand was to the distance sensor, the more your fingers were pulled open. This simulated the feeling of squeezing a virtual object in your hand.

[Drawing by Tom Sherlock]

Hardware I/O
—————-
To control servos and read from sensors, you usually use GPIO (General Purpose Input/Output) pins. For output, the basic idea is that you can set the voltage on a pin high (5 volts on many boards, 3.3 volts on the Raspberry Pi) or low (0 volts). Sending the correct sequence of high and low pulses to a servo will cause it to move to a certain position. For input, you can read whether a certain pin is high or low. If you connect a sensor to an input pin, it can communicate information by sending specific sequences of high and low pulses.
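Here is a minimal sketch of both directions with the RPi.GPIO library (pin numbers are illustrative):

import time
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)       # refer to pins by Broadcom channel number
GPIO.setup(18, GPIO.OUT)     # an output pin to drive high/low
GPIO.setup(17, GPIO.IN)      # an input pin to read a sensor

GPIO.output(18, GPIO.HIGH)   # one high pulse...
time.sleep(0.001)
GPIO.output(18, GPIO.LOW)    # ...then low again

level = GPIO.input(17)       # True while the input pin is high
GPIO.cleanup()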

Control Software
——–
In this case, to control the servo and read from the distance sensor, we used a Raspberry Pi running Python. The script that reads from the sensor and sets the servo position can be found here: https://gist.github.com/jminardi/5022297

This script uses a library I wrote called RobotBrain. It sits on top of RPi.GPIO and provides a higher-level interface for controlling individual pins and motors. The only module used in this project was Servo, which makes it easy to set a servo to a given position. The Servo module uses the ServoBlaster kernel module under the covers, which exposes the servos as a device in the filesystem.
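Because ServoBlaster exposes the servos as a device file, setting a position is just a write. A minimal sketch (the servo index is illustrative; ServoBlaster’s units are 10 µs steps, so 150 is roughly a 1.5 ms center pulse):

def set_servo(servo, pulse_width_steps):
    # Write "<servo>=<width>" to ServoBlaster's device file.
    with open('/dev/servoblaster', 'w') as dev:
        dev.write('%d=%d\n' % (servo, pulse_width_steps))

set_servo(0, 150)  # drive servo 0 to its center position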

Conclusion
———-
In the end I had a great time at the hackathon. I think we were able to put together a winning demo in part because of the power of Python. With just a few libraries we were able to reason at a high level about what we wanted our servo to do and what our sensors were seeing. If you have ever tried doing hardware control in a lower-level language, you know just how hard that can be, and how easy Python makes it by comparison.

Read more here: http://techcrunch.com/2013/02/26/y-combinator-hardware-hackathon-winner/

To see more hardware control using Python, stop by Enthought’s booth at PyCon 2013 in Santa Clara. I will be demoing reading from sensors and controlling actuators using Python and a Raspberry Pi. We will also be giving away six Raspberry Pis, so stop by and enter your name.


A Python-based Framework for the EnergizAIR Program

The EnergizAIR project collects and publishes public-friendly interpretations of energy production statistics for various renewable energy sources (photovoltaic, thermal, wind) across several European countries.

Enthought’s role was to create the data-management framework for this project.  This framework was required to:

  • retrieve raw energy data and meteorological data from various online sources at regular intervals
  • process raw data: in particular, aggregating over time (e.g., what was the mean daily energy production for January 2012) and space (e.g., what was the total energy production for the Lyon region of France)
  • store raw and processed data in a reusable and easily searchable format
  • provide interpretations of processed data in ‘real-world’ terms (e.g., “in Belgium in March 2012, the average household equipped with photovoltaic solar panels produced enough energy to run a refrigerator and 2 televisions”).
  • create and distribute reports in various formats on a regular basis (e.g., a daily email sent to a particular email address, a weekly summary in XML format posted to an FTP server, monthly and yearly reports to various channels, etc.)

The output from the framework and the energy input sources could vary, so the task was to create a *flexible*, *extensible* framework that provided all of the infrastructure for the above requirements.

Challenges

Some of the challenges involved in this project:

  • Creating the framework from scratch meant putting significant thought and energy into the design, and iterating several times until we were satisfied that we had something that was clean, robust and flexible.
  • The deliverables needed to be usable and extensible by non-expert Python programmers:  that is, our end users were programmers rather than people interacting with a UI.  This meant investing a lot of time and energy in making sure that the API was well thought out and meticulously documented, and providing several example uses of the framework for those end-users to build on.
  • The data retrieval and report publishing components needed to be robust and give sane results in the face of external server problems.
  • We needed to be able to deal with inputs from various sources:  e.g., one set of energy inputs was available by downloading selected files from an FTP server;  another was available in JSON form through a RESTful web interface; yet another had to be retrieved directly from a JavaScript interface on an existing web page.  Moreover, the framework needed to be able to cope with the addition of new types of data sources at a later time.
  • Similarly, the framework needed to allow for various report distribution mechanisms (e.g., by email, HTTP, FTP).
  • The data storage backend needed to be potentially accessible by several processes and threads at once; we thus needed a ‘broker’ architecture to serialize read and write requests to the underlying storage.
  • Many of the tasks needed to happen at particular times and dates:  e.g., we needed to check a particular FTP server for new files after 7am on every weekday, or send an email report with the previous week’s statistics every Monday morning.  For this we developed a scheduler in Python, which became a core part of the solution.

Using Agile

As with almost every Enthought project, we used an agile approach to crafting the eventual deliverables:

  • Rapid iterations, together with continuous feedback from the customer, allowed us to converge on a working solution fast.  The customer had full read and write access to the development repository and issue tracker.
  • We used test-driven development for a large portion of the project, resulting in code with a high level of test coverage and a high degree of confidence in its reliability. This was of great value to the EnergizAIR team when the project was handed off to them, as it allowed them to extend the framework without fear of breaking the system.
  • Much of the code was developed using pair programming.  This was especially true during the early stages, where we were iterating over the design;  it resulted in a well thought out, consistent, easy-to-use API.

HDF5 and ZeroMQ

Using standard and well-tested solutions for data storage and inter-process communication allowed us to build a working solution rapidly.  We chose to use HDF5 for storage of the raw and processed time-series data, via the excellent PyTables Python package.  ZeroMQ, with its existing Python bindings, provided us with a lightweight and flexible solution for communication between the HDF5 broker process (serializing concurrent reads and writes to the HDF5 backend) and its clients.
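As a minimal sketch of that broker pattern (the port and message format are our illustration, not the project’s actual protocol), a single ZeroMQ REP socket serializes requests from any number of REQ clients, so only one HDF5 read or write is in flight at a time:

import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind('tcp://*:5555')

while True:
    request = socket.recv_json()       # e.g. {'op': 'read', 'series': 'pv_be'}
    # ... perform the PyTables/HDF5 read or write here, one request at a time ...
    socket.send_json({'status': 'ok'})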

A Clean, User-Oriented OOP Design

The solution revolved around a number of Traits-based Python class hierarchies, designed to be easily accessible and comprehensible to the programmers who would have to maintain and use the code.  Key base classes:

  • the Provider class represented a raw data source, encapsulating the information necessary to retrieve data from that source;
  • the Report class represented a generated report;
  • the Publisher class was responsible for publishing reports once created;
  • and so on.
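An illustrative sketch of what these Traits-based base classes look like (the attribute names here are our illustration, not the project’s exact API):

from traits.api import HasTraits, List, Str

class Provider(HasTraits):
    """Encapsulates how to retrieve raw data from one source."""
    name = Str

    def retrieve(self):
        raise NotImplementedError

class Publisher(HasTraits):
    """Distributes a generated report through one channel."""
    recipients = List(Str)

    def publish(self, report):
        raise NotImplementedError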

A Python-Based DSL

A major challenge was to produce a solution that was flexible and extensible, while remaining readable and requiring minimal Python knowledge to extend and modify.  Part of our solution was to create a Python-based Domain Specific Language (DSL).  Here’s some example code that sets up an application retrieving photovoltaic, meteorological and wind data from various French and Belgian sources, and publishing daily and weekly reports by email.  This code also serves to highlight the OOP design described above.


dms = DMS(
    logging_level = logging.DEBUG,
    storage       = HDF5TimeSeriesStorageClient(),
    schedule      = [
        # Actions to import time series...
        Daily(
            at     = datetime.time(hour=8, minute=30),
            action = ImportTimeSeries(
                providers = [
                    EpicePvyieldProvider(country=Belgium),
                    EpiceWeatherProvider(),
                    FranceWindProvider(),
                    IndexisWindProvider(),
                ]
            ),
        ),

        # Actions to publish reports...
        Daily(
            at     = datetime.time(hour=11),
            action = PublishReport(
                publisher = EmailPublisher(
                    recipients = RECIPIENTS,
                    subject    = u'Daily France Wind Report'
                ),
                report = DailyFranceWindReport(),
            )
        ),

        Daily(
            first_date = datetime.date.today(),
            at         = datetime.time(hour=11),
            action     = PublishReport(
                publisher = EmailPublisher(
                    recipients = RECIPIENTS,
                    subject    = u'Weekly Belgium Report'
                ),
                report = DailyWebReport(
                    # Wind model.
                    speed_db        = GEOGRAPHIC_DB,
                    geographic_db   = REGIONS_DB,
                    wind_farm_db    = WIND_FARM_DB,
                    wind_turbine_db = WIND_TURBINE_DB,

                    # Photovoltaic model.
                    appliances_db            = APPLIANCES_DB,
                    equivalent_appliances_db = EQUIVALENT_APPLIANCES_DB,
                    systems_db               = SYSTEMS_DB,

                    # Thermal model.
                    solar_point_names = SOLAR_POINTS_DB,
                    consumption_level = 140,

                    country = Belgium
                )
            )
        ),
    ]
)


More About EnergizAIR

The basic principle of the project was to use renewable energy indicators (in this case photovoltaic, solar thermal, and wind energy) in an everyday weather forecast.  Put simply: how well can wind and sun meet our energy requirements? Belgium, France, Italy, Portugal and Slovenia are all participating in the project and now provide renewable energy indicators online and in local weather reports.

See the project website for more examples of renewable energy indicators, or watch the video overview of the project.

SciPy Early Bird Ends Today!

SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science, and engineering. The annual SciPy Conference allows participants from both academic and commercial organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.

SciPy 2012 is a month away. Did you know…

You can save $50 with early bird registration, but it ends today, June 18. Register now to get this discount.

We have great rooms available while they last: Stay on-site at the AT&T Executive Education and Conference Center on the University of Texas Campus for $104 a night plus tax.  Current rates for other downtown Austin hotels range from $125-$279 per night, with most hotels quoting around $200.  Make your reservations now because at this rate, rooms will be sold out soon.

Interested in sponsoring SciPy2012? Sponsorship opportunities are still available.  Your sponsorship of the conference provides vital financial support to the community.  Let us know as soon as possible if you would like to sponsor.

Consider presenting your latest contributions to scientific computing. If you would like to submit a poster for consideration, send an e-mail with your proposed title, abstract and other information to 2012submissions “at” scipy.org.