Installing and Managing a Central Python Install with Enthought Canopy v1.1

Author: Jason McCampbell

In the last post we talked about virtual environments and how we have back-ported venv from Python 3 and extended it in Canopy 1.1. This post will now walk through how we use virtual environments to provide new options to organizations and workgroups who want to install Canopy on a multi-user network and how Canopy provides a flexible Python environment on large compute clusters without sacrificing performance.

Multi-user Network Installs

In a standard, single-user installation, Canopy creates two virtual environments, System and User. System is used for running the GUI itself and User is the main Python environment for running user code. The package set in User is completely under the user’s control (ie, won’t break the GUI).

With the 1.1 release, Canopy supports the creation of shared versions of the System and User virtual environments. These virtual environments, referred to as Common System and Common User, can be centrally managed, providing an easy means of managing a consistent set of package versions and dramatically reducing disk usage by having shared copies of the packages. Each individual user’s System and User virtual environment are layered on top of the common installs as shown below.

Canopy venv layout

In this case, Canopy Core and the two virtual environments “Common System” and “Common User” are installed in a central networked disk. Typically, all of the standard packages would be installed in “Common User”, making them available to all users. When each user first starts Canopy, the per-user virtual environments “User’s System” and “User’s User” are automatically created. Users have the freedom to install new packages and alternate package versions in their own virtual environments while still benefitting from the centrally managed package set.

To set up this structure, after installing Canopy, an administrator first runs Canopy and creates the System (“Common System”) and User (“Common User”) virtual environment in the desired location as one would in a single-user environment. Changes to the package set in User can be made by this administrative user. To make these environments available to all users, the following command is run, again as the administrative user:

canopy_cli –common-install

This writes a file named ‘location.cfg’ to Canopy Core. Now whenever a user starts Canopy, the per-user environments will be layered on top of the common environments.

The initial setup of the virtual environments, by default, uses the Canopy GUI, which is not always available or desired. To address these cases, Canopy now supports a new switch “–no-gui-setup’. See the Canopy Users Guide for more details.

Cluster Installs

Large compute clusters are an interesting special case of the multi-user network because a large number of nodes may be requiring the same resources at the same time. Starting a 1000-node job where a large number of files are required from a networked disk can increase startup time substantially, wasting precious time on an expensive cluster. Ideally, most or all of the files will be local to each node.

We can use a modified version of the multi-user setup above to address this. After installing Canopy on each node, we want to create the System and User virtual environments with all of the standard packages installed. Running the GUI to install to 1000+ machines is … inefficient… so we will use the non-GUI setup option (assuming Canopy is installed in /usr/local/Canopy on each machine):

ssh node1 /usr/local/Canopy/bin/canopy_cli –no-gui-setup –install-dir /usr/local/Canopy –common-install

Running this command once for each node in the cluster results in the virtual environments being installed to /usr/local/Canopy/Canopy_64bit on each machine. Large packages such as NumPy and SciPy can now be loaded from the local disk instead of being pulled over the network.

How do users add their own packages? When each user starts Canopy from the same or similar core install, Canopy will create the user-specific virtual environments layered on top of the ones in /usr/local/Canopy/Canopy_64bit. This gives us the structure shown in the diagram below where Canopy Core and the common virtual environments are local to each node (ie, fast I/O access) and the user environments are on a networked file system.

Canopy cluster install

It should be noted that while the Canopy GUI may be available on the cluster one would typically not use the GUI on the compute nodes. Instead, the “User’s User” virtual environment can be used like a standard Python distribution, such as EPD, to execute the Python application. But the big advantage to this structure over a plain Python installation is that we have the performance advantage of having most of the Python packages local to each node while also providing an easy means for users to customize their environments. Users can run the Canopy GUI on their desktop to prototype an application and then run the same application on the compute cluster using the same package set — no additional configuration needed.

For more, get Canopy v1.1 and try it out.

3 responses so far

  • avatar dharhas says:

    Hi,

    The cluster stuff seems really exciting. Have y’all tried using it on rocks cluster (http://www.rocksclusters.org/) ? In case you haven’t heard of it, it is one of the most popular distributions for managing HPC clusters (See http://www.rocksclusters.org/rocks-register/ for a list of folks using it). It is basically a customized version of CentOS 6.

    Anyway, we have a small cluster running rocks and I’m interested in trying out Canopy on it. Not sure how the paradigm listed above will work with rocks yet. Basically with rocks you set up the head node with everything you need and then compute nodes are kickstarted from the frontend. Anything installed directly on the node will disappear on the next kickstart.

    So for this to work with Canopy there would either need to be an rpm that could be deployed to the compute nodes or maybe just add ‘ssh node1 blah’ command as one of the post install instructions.

    The ideal awesomeness would be if there was a Canopy ‘Roll’ for Rocks. ‘Rolls’ are the rocks ways of installing various things like say the intel fortran compilers on the cluster.

    • I’m not aware of anyone using Canopy on a Rocks cluster yet, but I’d be very interested to understand more about it. From your description and a quick read-through of some of the docs, I think our model fits well.

      We don’t have an RPM available (though some users have made their own), but I think it should work if Canopy plus the common virtual environments are installed on the head node and then copied verbatim to the compute nodes. Each user’s virtual environments will reference copies local to each compute node.

      Feel free to shoot me email (jmccampbell@enthought.com or canopy.support@enthought.com) and we can talk more specifics if you like.

      Jason

Leave a Reply

Featuring Advanced Search Functions plugin by YD