Monthly Archives: July 2011

Import registry hooks in EPD

At the end of April (2011), Travis and I were playing with Python importhooks. The goal was to create an import registry mechanism which wouldallow multiple versions of a package to be installed and easily switchingbetween those versions. The idea is not new, setuptools provides such amechanism by installing packages into egg directories and using theeasy-install.pth file. However, as EPD packages are (as of EPD 5.0) notbeing installed using setuptools anymore (because of all the problems wehad with it), we were looking for a better alternative.

Starting with EPD 7.1 (including EPD Free), we have added an importhook registry mechanism to the core Python interpreter. This mechanismis useful for setting up different environments, i.e. Python runtime
environments in which a specific set of packages are importable.
For example, it is possible to set up one environment in which numpy 1.5.1
and pyzmq 2.1.7 are present, and another environment in which numpy 1.6.0
and no pyzmq is present. This can be useful is may different scenarios,
e.g. testing ipython with or without different versions of optional
dependencies installed. The alternative would be to set up multiple
Python interpreters and install the desired packages into each one.
virtualenv can help here, but adds it’s own set of problems.

This mechanism is based on the not very well known sys.meta_path listof finder objects, see PEP 302 for the original specification. By default sys.meta_path isan empty list, in any regular Python and EPD. However, when adding a file <sys.prefix>/registry.txt (or setting the environment variable EPDREGISTRY to a special registry file), the new registry mechanism is activated, and sys.meta_path now contains an instance of our new PackageRegistry object. As sys.meta_path is searched before any implicit default finders or sys.path, the mechanism allows importing different modules (and packages) than the ones installed into site-packages. The PackageRegistry class is quite simple. Basically, the find_module method checks whether the module name is contained in the registry, and the load_module method uses imp.load_module for the actual importing. The code can be found in <sys.prefix>/lib/python2.7/custom_tools/, which also defines a main function for setting up the hook, which is called from Python’s

Let’s return to the above example, suppose you want to have two environments:

  • env1: numpy 1.5.1, pyzmq 2.1.7
  • env2: numpy 1.6.0, no pyzmq

First, you need to download (or create) the corresponding eggs. Now theseeggs need to be installed in a special location (not into site-packages,or any other directory on the PYTHONPATH):

$ egginst --hook numpy-1.5.1-2.egg
$ egginst --hook numpy-1.6.0-5.egg
$ egginst --hook pyzmq-2.1.7-1.egg

This will install the eggs into versioned directories in <sys.prefix>/pkgs. During the install process egginst also creates a registry file for each package, e.g. <sys.prefix>/pkgs/numpy-1.5.1-2/EGG-INFO/registry.txt. Now simply concatenate the desired registry files to create the registry file EPDREGISTRY is going to point to:

$ cd <sys.prefix>
$ cat pkgs/numpy-1.5.1-2/EGG-INFO/registry.txt > registry1.txt
$ cat pkgs/pyzmq-2.1.7-1/EGG-INFO/registry.txt >> registry1.txt
$ cp pkgs/numpy-1.6.0-5/EGG-INFO/registry.txt registry2.txt
$ EPDREGISTRY=registry1.txt python
>>> import numpy
>>> numpy.version
>>> import zmq
>>> exit()
$ EPDREGISTRY=registry2.txt python
>>> import numpy
>>> numpy.version
>>> import zmq # this is supposed to fail
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module names zmq

The files registry1.txt and registry2.txt do not have to reside in sys.prefix, they can be anywhere on your system. All that is important is that EPDREGISTRY points to their path.

We have tested the new import registry mechanism with all packages in EPD and initially considered installing all EPD packages in this manner. However Robert Kern was against it, and we came to the realization that he was right. The multiversion directory layout for packages is not good for development, as one is never really sure what versions are active. This is the same reason I created egginst in the first place, with the ability to install packages “splatted out” into site-packages and remove them or upgrade them cleanly.