EPD for OS X: public beta available

July 2nd, 2008

I am pleased to announce that we have just made available our first public beta of the Enthought Python Distribution for Mac OS X! This is a universal binary (x86, PPC) for OS X 10.4 and 10.5. It contains all the same libraries and components as our other EPD platform releases, but does not yet contain the EPD documentation or the ETS examples. Those will be added soon.

You can download a trial, academic version, or make a purchase here.

Download and mount the .dmg, read the “EPD README.txt” in the root of the image, then run the “EPD.mpkg” to install. This will not overwrite your OS X system Python, nor a previous install of MacPython. It does install itself as the default Python on your system, but a few symlink changes will allow you to control that. Uninstall instructions are documented in the “EPD README.txt”.

Please report any problems, comments, or suggestions on our EPD Trac site at https://svn.enthought.com/epd

About EPD
The Enthought Python Distribution (EPD) is a “kitchen-sink-included” distribution of the Python™ Programming Language, including over 60 additional tools and libraries. The EPD bundle includes the following major packages:

Python - Core Python
NumPy - Multidimensional arrays and fast numerics for Python
SciPy - Scientific Library for Python
Enthought Tool Suite (ETS) - A suite of tools including:
   Traits - Manifest typing, validation, visualization, delegation, etc.
   Mayavi - 3D interactive data exploration environment.
   Chaco - Advanced 2D plotting toolkit for interactive 2D visualization.
   Kiva  - 2D drawing library in the spirit of DisplayPDF.
   Enable - Object-based canvas for interacting with 2D components and widgets.
Matplotlib - 2D plotting library
wxPython - Cross-platform windowing and widget library.
Visualization Toolkit (VTK) - 3D visualization framework

There are many more included packages as well.  Please see the complete list of the packages.

License
EPD is a bundle of software, every piece of which is available for free under various open-source licenses. The bundle itself is offered as a free download for academic and individual hobbyist use. Commercial entities and non-degree-granting institutions and agencies may purchase individual subscriptions for the bundle or contact Enthought to discuss an Enterprise license. Please see the FAQ for further explanation about how the software came together.

A Perspective on Numpy

July 1st, 2008

In a former life, before joining Enthought, I was a researcher in operator algebras, a field of mathematics which can be briefly summed up as the study of certain sets of “infinite-dimensional matrices.” My first reaction when seeing numpy (actually, numerical python, this was a while back) was “A matrix module! In Python! Show me more!”

But then I looked a little deeper, and my second reaction was “Why on earth would you not have the product of two arrays be the matrix product?” And then: “And why is function application element-wise?”

From the point of view of an operator algebraist everything interesting comes out of non-commutative multiplication, and clearly the “right” way to apply a function to a matrix is

f(A) = U f(D) U*

where A = U D U* is a unitary diagonalization of A, and f is applied just to entries on the diagonal of D. The fact that this only works for unitarily diagonalizable matrices was a minor concern for me at the time… after all I was a pure mathematician!

Since then, particularly in a more recent life as a scientist at a data visualization company, I’ve come to appreciate the way that numpy slices and dices arrays of numbers. And numpy now meets me half-way with a first-class matrix type, so I can at least write A*B and have it mean what I want it to.
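
A small sketch of the distinction, using plain arrays and np.dot rather than the matrix type (the numbers are arbitrary and chosen only for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# On plain arrays, * multiplies matching entries.
elementwise = A * B

# np.dot gives the matrix product (rows of A times columns of B).
matrix_product = np.dot(A, B)

# The matrix product is non-commutative in general.
commutes = np.allclose(np.dot(A, B), np.dot(B, A))
```

Here the element-wise product zeroes out the diagonal, while the matrix product swaps the columns of A, and the two orders of multiplication disagree.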

But for those who still think that numpy made the wrong choice for applying functions to arrays…

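from numpy import allclose, asmatrix, cos, diag, exp, linalg, sin

def mfunc(func):
    """Turn an element-wise function into a function of normal matrices."""
    def new_func(x):
        x = asmatrix(x)
        if not allclose(x*x.H, x.H*x):
            # matrix is not normal-ish
            raise ValueError("matrix must be normal")
        # matrix is normal-ish: diagonalize, then apply func to the eigenvalues
        e, U = linalg.eig(x)
        return U*asmatrix(diag(func(e)))*U.H
    return new_func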

Which you can then use as:

mexp = mfunc(exp)

for any ufunc, or as a decorator on your own functions:

@mfunc
def f(x):
    return cos(x) + 1.0j*sin(x)

When I showed this to Travis Oliphant, he pointed out that scipy already has a function linalg.funm which does something similar to the above, but not as a decorator. So if you have scipy, you can instead use:

import scipy.linalg

def mfunc(func):
    def new_func(x):
        return scipy.linalg.funm(x, func)
    return new_func
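
As a quick sanity check of the diagonalization idea, here is a minimal sketch restricted to Hermitian matrices, where np.linalg.eigh is guaranteed to return a unitary matrix of eigenvectors. The helper name apply_to_hermitian is made up for this illustration and is not part of numpy or scipy:

```python
import numpy as np

def apply_to_hermitian(func, a):
    """Apply func to a Hermitian matrix through its eigendecomposition."""
    a = np.asarray(a)
    if not np.allclose(a, a.conj().T):
        raise ValueError("matrix must be Hermitian")
    # eigh returns real eigenvalues and a unitary matrix of eigenvectors.
    e, u = np.linalg.eigh(a)
    return u @ np.diag(func(e)) @ u.conj().T

# Symmetric matrix with eigenvalues 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

root = apply_to_hermitian(np.sqrt, A)

# The matrix square root squares back to A ...
squares_back = np.allclose(root @ root, A)

# ... while the element-wise square root does not.
elementwise_squares_back = np.allclose(np.sqrt(A) @ np.sqrt(A), A)
```

The same check works for other functions, for example confirming that the matrix exponentials of A and -A multiply to the identity.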

Blocks and Contexts

June 27th, 2008

My background has been in engineering and scientific circles mainly in academia, so programming has been something I have primarily used to solve problems. At Enthought we solve problems, but most of the time we are creating solutions to help others solve problems. We are trying to create the tools that someone like me would want to use.

There are several compelling ideas and tools buried in the Enthought Tool Suite that we have used to build commercial scientific applications over the past several years. Two of these ideas are the Block and the Context. These concepts have wiggled into the foundation of almost all of our most recent applications because they seem to solve fundamental problems with how to integrate scientists’ and engineers’ typically procedural workflows into a larger GUI application in flexible and re-usable ways.

Eric developed the ideas behind the Block and the Context over 18 months ago and presented them at the SciPy 2007 conference. His slides for that talk are here. Our implementation of these ideas has been slowly but steadily improving as we have learned more about how to make use of them. They are becoming a common abstraction that we use all the time in thinking about new problems. In this way they are a bit like the reactive programming and automatic GUI building of Traits, the pluggable, interactive plotting of Chaco, and the plugin-based application building of Envisage in their importance.

In order to bring to the surface the Block and Context abstractions that have been buried in the BlockCanvas project, two days ago I created a new project in the Enthought Tool Suite called CodeTools. This project contains the two packages enthought.blocks and enthought.contexts. While I’m thinking about these ideas, I wanted to demonstrate a little bit of how you can use these tools, as well as showcase a few things that I’ve added to enthought.blocks which make working with them even easier. I think there are some good ideas here, but I’m not sure our implementation is the best one yet. I’m anxious to engage more people in the discussion about how we should be doing this sort of thing.

To get CodeTools you need to get the latest version of ETS. Instructions for how to do this are here. You don’t actually need to make sure all of ETS is working; Traits is enough, except for the units package, which is still in BlockCanvas but which I will be moving to SciMath soon.

As explained by Eric in his slides, a Block is a set of executable instructions: an intelligent version of the string you might pass to the exec statement. The intelligence of the blocks comes from the code analysis they perform when they are created. With these smarts, you get three very nice things:

  1. Information about the “inputs” required and “outputs” delivered by the code block.
  2. The ability to execute the same code over and over again using different namespaces (thus taking inputs from different sources and delivering outputs to a dictionary-like object of your choice).
  3. The ability to restrict execution to the portion of the code block that depends on a specified set of inputs and/or delivers a specified set of outputs.
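
The first two points can be illustrated with the standard ast module. This is only a rough sketch of the idea, not the enthought.blocks implementation; the analyze helper is hypothetical, handles straight-line code only, and would report builtins such as len as inputs:

```python
import ast

def analyze(code):
    """Return (inputs, outputs) for straight-line code.

    inputs: names read before any assignment; outputs: names assigned.
    """
    inputs, outputs = set(), set()
    for stmt in ast.parse(code).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Handle reads before writes so that `x = x + 1` counts x as an input.
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id not in outputs:
                inputs.add(n.id)
        for n in names:
            if isinstance(n.ctx, ast.Store):
                outputs.add(n.id)
    return inputs, outputs

code = "c = a + b\nd = a - b\n"
inputs, outputs = analyze(code)

# The same compiled code can run against different namespaces.
compiled = compile(code, "<block>", "exec")
ns1 = {"a": 1, "b": 2}
ns2 = {"a": 10, "b": 3}
exec(compiled, {}, ns1)
exec(compiled, {}, ns2)
```

After running this, each namespace dictionary holds its own values of c and d alongside the inputs it was given.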

Over the past few days, besides moving the blocks into their own project, I’ve added a few enhancements to the blocks that make them easier (in my mind) to use. These enhancements are:

  1. Decorators to turn the body of a function into a Block (or a string that could be later turned into a block).
  2. Separation from the outputs of the variables imported from somewhere. These are now stored as “fromimports” attributes.
  3. Cataloging of any variables which are assigned constant expressions in the Block.
  4. A method to return a function from the block given a list of inputs to vary and the list of outputs to return. This function will only execute that portion of the block needed to produce the requested outputs.

With the decorators, you can now create a block very easily:

from enthought.blocks.api import func2block

@func2block
def block():
    c = a + b
    d = a - b
    g = f + e
    h = f - e

This little decorator means no more little code snippets running
around in strings messing up the syntax highlighting and the tab
indentation capabilities of your favorite editor. There is also a
func2str decorator which will take the body of a function and produce
a string (with the first level of indentation removed).

The other little improvement I want to advertise is the ability of the
block to now return a function that evaluates the (restricted) block
based on the list of inputs and outputs provided. Thus,

func1 = block.get_function(inputs=['a','b'], outputs='c')
func2 = block.get_function(inputs=['f','e'], outputs=['g','h'])

returns two functions equivalent to

def func1(a,b):
    c = a + b
    return c

def func2(f,e):
    g = f + e
    h = f - e
    return g, h

The big deal about blocks is really the code restriction. You can
take a scientist’s script and re-evaluate (restricted) versions of it
for “what-if” analysis, optimization, or even stochastic processing.
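
The restriction idea can be sketched in a few lines with the standard ast module. This is a hypothetical illustration, not how enthought.blocks does it: the names restrict and make_function are invented here, and the sketch assumes straight-line code with no loops, branches, or side effects.

```python
import ast

BLOCK = """
c = a + b
d = a - b
g = f + e
h = f - e
"""

def restrict(code, outputs):
    """Return the top-level statements needed to compute `outputs`."""
    needed = set(outputs)
    kept = []
    # Walk backwards so each kept statement can pull in what it reads.
    for stmt in reversed(ast.parse(code).body):
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        stores = {n.id for n in names if isinstance(n.ctx, ast.Store)}
        loads = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        if stores & needed:
            kept.append(stmt)
            # Names defined here are satisfied; names read here become needed.
            needed = (needed - stores) | loads
    return kept[::-1]

def make_function(code, inputs, outputs):
    """Build a function that runs only the statements `outputs` depend on."""
    module = ast.Module(body=restrict(code, outputs), type_ignores=[])
    compiled = compile(module, "<block>", "exec")

    def func(*args):
        namespace = dict(zip(inputs, args))
        exec(compiled, {}, namespace)
        results = tuple(namespace[name] for name in outputs)
        return results[0] if len(results) == 1 else results
    return func

func1 = make_function(BLOCK, ["a", "b"], ["c"])
func2 = make_function(BLOCK, ["f", "e"], ["g", "h"])
```

Because func1 only executes the statement that produces c, it can be called without supplying f or e, which the full block would require.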

It’s a cool idea, but it will have to be left to another blog post to
make this more clear.

My Scientific Tool Stack

June 24th, 2008

Numenta is distributing some of my favorite tools with their NuPIC toolkit including Python itself (on windows), wxPython, IPython, Traits and matplotlib.  They’re also building some of their prototype and example applications using Traits. Enthought’s typical tool stack for building scientific apps looks like this (with the stuff Numenta currently distributes in white letters):

Yesterday, I gave a brief talk [.pdf] about the stack of tools that Enthought uses for building scientific applications at Numenta’s HTM Workshop (a gathering of over 200 folks interested in their library which implements Jeff Hawkins’ Hierarchical Temporal Memory ideas). My talk included a very gentle introduction to Traits/TraitsGUI.

I also briefly demonstrated some of the capabilities of the Chaco library (2D visualization) by showing some of the examples, and Mayavi (3D visualization) as an example of a Traits and Envisage (plugin framework) application.

Of particular interest to me today: there was a presentation by Subutai Ahmad showing a nice TraitsGUI-based node inspector for Numenta’s HTM library.  It reminded me a lot of the “GUI for free” that happened with the TConfig project.

Compile decorator

June 17th, 2008

The PyPy project has some very nice things in it. Among those things is the translator module. The purpose of this module is to compile a restricted subset of Python (RPython) into C and other lower level languages. Within the PyPy project, the translator is used to compile the PyPy interpreter. However, it is also possible to compile individual Python functions at runtime and dynamically execute the compiled function. The translator module makes this task very easy. I have written a class which allows using the compiled version of a Python function simply by adding a decorator to the function:

@compdec
def foo(x, y):
    return x + y

The nice thing about doing things this way is that all the code is pure Python code, and switching between the compiled and uncompiled version of the function is as simple as possible. You can find the definition of this decorator on my personal enthought site.
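
The decorator's actual definition is not reproduced here. As a rough sketch of the general pattern only, the following class-based decorator compiles lazily on first call and can be switched back to the plain Python function. The translation step is a placeholder: fake_compile and compdec_sketch are hypothetical names, and nothing below calls into PyPy.

```python
import functools

def fake_compile(func):
    """Placeholder for a real translation step; returns func unchanged."""
    return func

class compdec_sketch:
    """Decorator that swaps in a 'compiled' version of a function."""

    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func
        self.compiled = None
        self.use_compiled = True

    def __call__(self, *args):
        if not self.use_compiled:
            # Fall back to the original pure-Python function.
            return self.func(*args)
        if self.compiled is None:
            # Compile once, on first use, and cache the result.
            self.compiled = fake_compile(self.func)
        return self.compiled(*args)

@compdec_sketch
def foo(x, y):
    return x + y

first = foo(1, 2)          # goes through the (placeholder) compiled path
foo.use_compiled = False
second = foo(2, 3)         # goes through the original function
```

Keeping the original function on the wrapper is what makes switching between the two versions a one-line change.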

My Kind of People: Summer Python Gatherings

June 17th, 2008

You used to have to identify fellow “technical” folks by the velcro on their shoes (hey, velcro is very efficient), but today there are a myriad of ways to spot and mingle with like-minded people.  So if you agree that an evening sharing tales of code vectorization is an evening well spent, there are a few special gatherings you may be interested to attend this year.  I’ve listed the meetings that I’m going to (or helping with) and some important dates:


EuroSciPy 2008

EuroSciPy 2008: Scientific Computing with Python, European Style

This is the first annual Europe-hosted SciPy Conference. It will match the style and content of the popular US version (see below), which has been around for 7 years now. It will be in Leipzig, Germany. Travis Oliphant is delivering the keynote talk.

  • June 20: Early-bird registration deadline
  • July 26-27: Conference

SciPy 2008

SciPy 2008: Scientific Computing with Python

This will be the 7th annual gathering.  Proceedings will be published for the first time this year.  Alex Martelli is delivering the keynote talk.

  • June 27: Abstracts Due
  • July 11: Early-bird registration deadline
  • August 19-20: Tutorials
  • August 21-22: Conference
  • August 23-24: Sprints

PyCamp

Texas Python Unconference (link to last year’s site)


An informal, self-organizing gathering.  It will be hosted in Austin this year.

  • October 4-5 (tentative)

Feel free to list any other meetings in the comments.

Traits for doing Reactive Programming

May 30th, 2008

I have done event-driven programming. I have done structured programming. I’ve done OOD that was exceeded only by the Platonic Ideal. I’ve done spaghetti code that would make an Italian chef proud. But I’ve never, until coming to Enthought, done Reactive Programming.

Enthought’s Traits module leads to doing a different kind of software development. I think Reactive Programming is the best way to describe Traits programming. It’s possible to do some really elegant things with it: clean, tight, transparent. And it’s possible to do some evil things too: hidden-dependencies, and obfuscated code.

I’ll see about getting some examples up soon. But not today. Not on my Friday afternoon.

Project Estimation Guidelines

May 19th, 2008

When preparing to embark on a new project, it is useful to establish a proper expectation of the project's cost, time, and quality. So here are a few things that you should think about and document as part of your planning process:

  1. Project scope statement: What exactly will the project accomplish?
  2. Project management plan: How will the project be monitored? Will a single person keep track of the hours and scheduling? How will the customer, or stake-holders, be kept informed of how the project is proceeding?
  3. Work breakdown structure: The chunks necessary to complete the tasks assigned in the scope. These may be Trac tickets, code modules, etc.
  4. Resource allocation: Who will be on the project and when?
  5. Risk analysis: What are the most likely things to go wrong? Often it’s that the code takes longer to write than expected, but what about unexpected hardware problems, time constraints, or specific customer issues?

When doing your analysis, it’s easy to see the big picture. However, there are a lot of small and not-so-small tasks that you need to remember. There’s a good chance you’ll need to take into account the following tasks, even if individual task sizes will vary based on the size of the development team and the scope of the application.

Project overhead

Status meetings, phone calls, talking to customers, managing people, generating reports, mid-stream application redesigns all add up to a big chunk of change. Some projects can get by with 10% overhead, others will push 50%. It depends on the size of the project and the degree of reporting desired. On some projects I was able to get the overhead down to about 8-10 hours for every 120 hours logged. But the numbers went up markedly at the start and the end of the project life-cycle.
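
To put those figures in percentage terms, here is a small worked calculation. The 8, 10, and 120 hour figures come from the paragraph above; the 400-hour task list and the helper names are hypothetical, and overhead is assumed to be counted as a share of all hours logged:

```python
def overhead_share(overhead_hours, logged_hours):
    """Overhead as a fraction of all hours logged."""
    return overhead_hours / logged_hours

def total_hours(work_hours, share):
    """Total hours to budget when `share` of the total goes to overhead."""
    return work_hours / (1.0 - share)

low = overhead_share(8, 120)    # about 6.7%
high = overhead_share(10, 120)  # about 8.3%

# Hypothetical example: 400 hours of estimated work at 10% overhead.
budget = total_hours(400, 0.10)
```

Note that the budget divides by one minus the share rather than multiplying by one plus it, because the overhead is a share of the total, not of the work hours alone.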

Time to create the project task estimates

Since Enthought is a software company, we estimate how long it will take to write or provide a software application. Estimating how long it will take to write code is one of the riskiest estimates in a project since ongoing development often reveals additional features or hurdles that were not originally planned for. Creating the task estimates requires having some of the design work completed to make sure all subtasks have been identified. Therefore, the risk of estimate overruns can be reduced by doing a design phase to identify, design, and estimate the subtasks. But extensive design phases can be expensive and possibly wasteful if the project scope is changed to not include the estimated subtasks.

Software build deliverable

Generating a build of the application for the customer should be a trivial task but historically all kinds of problems crop up with libraries, paths, versioning, you name it. Make sure to plan enough time to deliver to the customer the product or service they requested.

Team member training

There will need to be an initial training phase for any new team members joining the project. If the team is already trained then the estimates can be low, but if there is a lot of new domain knowledge to be learned, or the initial development environment takes a long time to set up, or there will be a lot of members joining and leaving the team then this number could be much higher.

Demos and project realignment

In a dynamic and iterative development environment like we have at Enthought, the customers play an active role in the guidance of the tools we write. That means we take time to demo the application, and integrate the feedback they provide into what has already been written and where the project is going. The demos and realignments take time, so include them in your schedule and budget. One possible time-sink is rewriting GUIs that already work, but need to be redesigned based upon customer feedback.

Integration Testing

Integration testing is different from unit testing, in that problems can arise when pieces start fitting together. Integration can mean different things for specific projects. For example, integration testing might verify that Envisage Plugins that were written separately do actually work together, or it could be testing that the application integrates with the customer’s target server, or it could be integrating an old code base into a new environment.

Quality Assurance

Before delivering the final product it should be rigorously tested to verify that everything works as expected. New features may have accidentally broken something that was tested in the past, so the application should be locked-down and tested one last time before being sent out.

Documentation

Documentation is an important component of a project, yet often forgotten by developers when doing a work breakdown analysis. When estimating documentation costs, be sure to include the time that the technical writer will spend on writing the documentation, AND the time that developers will spend being interviewed by the documentation team.

Known Unknowns

A time reserve, or “contingency allowance,” should also be created. This pool of resources is then managed by the project manager to be used as needed. Unexpected events occur in projects with such regularity that they are often included in the initial estimates and called “known unknowns,” so that other task estimates can be exceeded without a need to modify the project schedule. Some developers calculate a value by multiplying the total estimates by a scaling factor, but the multiplier is only useful if the original estimates are correct. Whether the reserve resources are actually included in the budget plan depends upon who the estimator is. A developer shouldn’t be able to dip into the contingency allowance without the project manager’s approval. But then the Management Team may have a second contingency allowance for “unknown unknowns” that even the project manager does not have access to without approval.

Greg Wilson speaking at the Austin Python User Group meeting

May 12th, 2008

Wednesday, May 14th, Greg Wilson will be joining us in Austin for the monthly APUG meeting.  He’ll be talking about Beautiful Code.  If you’re in the area, swing by Enthought’s Offices right downtown at the corner of 6th and Congress (the Epicenter for Weirdness, as we like to call it).

There’s more information at the python.org wiki page for the User Group and at the meetup site for APUG.

Facelift for code.enthought.com: of Commodities, Communities and Mullets

May 2nd, 2008

We’ve recently refreshed the look and content of our code.enthought.com site, so I thought I’d provide my thoughts about what the site is about and what it means to Enthought (and the world). I’ll apologize in advance for rambling.

code.enthought.com

For several years Enthought has hosted code.enthought.com as a site for the Open Source tools that we’ve developed and used in our business of custom application development. The collective set of tools is called the Enthought Tool Suite (ETS). The impetus for the site came from some technologies that we believe will change the landscape of scientific computing and which we believe should be software commodities.

What do we mean by Software Commodity?

A commodity is a product that you don’t ship and your competition does —Jonathan Schwartz, Sun Microsystems

There are many aspects of traditional commodities that provide nuance to the idea of commodity software, like equilibrium price, interoperability, fungibility, substitution, etc. My working definition for the purpose of Enthought’s mission hearkens to the French commodité, or even better, the Latin commoditas, which both connote convenience, appropriateness, suitability, general usefulness, or common utility. So any piece of software about which you may say “No one should ever have to write another ____” fits into my meaning. Some examples we’ve come up with:

  • No one should ever have to write another 2-D visualization library (Chaco)
  • No one should ever have to write another 3-D visualization library/general application (Mayavi, thanks to Prabhu’s vision and hard work)
  • No one should ever have to write another event model (Traits)
  • No one should ever have to write another adapter library (Traits)
  • No one should ever have to write another interface library for Python (Traits)
  • No one should ever have to write another GUI abstraction layer (TraitsGUI)
  • and many more…

So what’s our mechanism for the commoditization of software? In three words, the BSD license. There’s nothing new here. Commodity is an important by-product of Open Source. It’s a beautiful way to compete very aggressively on price (free!) and marshal a community of academic and commercial interests. It doesn’t alienate (most of) the Open Source world and it doesn’t frighten (most of) the commercial world. It provides the most liberal license for an extremely free use of the software while allowing us to protect the aspects of our clients’ business that they view as a competitive advantage (no copyleft). To my thinking, it not only creates commodity, it creates abundance. While there are many benefits to open licensing, the most important to us is the community it creates.

Speaking of Community…

Forming and working on the experiment that we call “Enthought” has been a wild ride. I can’t imagine a more lasting thing I’ll help create (except my kids; and my contribution there was stunningly trivial). We’ve worked with a community of others to create tools that should outlive us all. The truth about open source community interaction, however, is that it’s hard. It’s messy. It’s not immune to politics. It takes work. We’ve been fortunate in that the SciPy, ETS and the other Python communities are made of extraordinarily bright, thoughtful, slightly crazy people—so tedious squabbles are very rare. Good things happen because a community can rally around a vision of common utility, esteem for contributions is based on the calculus of that utility and, for the most part, everyone’s goals are aligned. It’s a virtuous circle.

Our hope as an organization is that we can help foster this environment—to equip the community without directing it, to contribute without owning, to create common utility. How does this jibe with the following facts?

  1. Enthought is in business to make money.
  2. We don’t work for free.
  3. We are self interested.
  4. We need to feed those kids somehow.

We founded our company with the quaint, yet earnest, belief that we could do well by doing good. And you know what? It worked. We’ll let others judge the amount of good we’re doing, but we have created a thriving business with a model that leans heavily on using and creating Open Source software.

We fully recognize that there will be folks who are skeptical about what Enthought is about—they should be. Any representations we make about our company should be borne out by our actions. We invest heavily in open source software. We do see the benefits—both in leveraged utility and in the area of marketing our company.

So what does this have to do with the changes at code.enthought.com?

We’re excited about finally committing some resources to spruce up the site and provide some content that’s actually useful. We’ll continue to invest in the tools and in helping the growing community of developers and users. That said, the site is the product of a community effort and we hope that our efforts are just priming the pump.

There are actually two venues of content available for folks interested in these tools: the more-static, more-business-like code.enthought.com site that was recently refurbished, and the constantly-changing, wild party that happens on the trac instance (issue-tracking, wikis, roadmaps, etc.) that supports the community development efforts. These two areas cooperate in what has been eloquently called a mullet strategy by some (business in the front, party in the back; thanks to Janet for the reference). The mullet approach makes a lot of sense for a community site (and it’s fully neologism compliant: see the pro-am movement, the long tail, crowdsourcing, etc. ;-) ). Cynicism aside, we think it’s a nice way to balance content consistency and relevance.

Both the code.enthought.com site and the trac instance are community sites. If you can get some utility from participating in the community or if you want to pay commodity prices (or less), join us.

There is serious work happening in both professional and academic settings using these tools. The productivity gains afforded are nothing short of astonishing. I encourage anyone who is interested to read the front matter of the site and provide feedback. We hope it makes a strong business case for utilizing the tools.

All that said, there are many great ideas that have yet to be implemented. If you’re interested in joining the conversation and getting elbow-deep in code I encourage you to get involved. Come join the party out back.