Tag Archives: Canopy Core

Installing and Managing a Central Python Install with Enthought Canopy v1.1

Author: Jason McCampbell

In the last post we talked about virtual environments and how we have back-ported venv from Python 3 and extended it in Canopy 1.1. This post will now walk through how we use virtual environments to provide new options to organizations and workgroups who want to install Canopy on a multi-user network and how Canopy provides a flexible Python environment on large compute clusters without sacrificing performance.

Multi-user Network Installs

In a standard, single-user installation, Canopy creates two virtual environments, System and User. System is used for running the GUI itself and User is the main Python environment for running user code. The package set in User is completely under the user’s control (i.e., changes there won’t break the GUI).

With the 1.1 release, Canopy supports the creation of shared versions of the System and User virtual environments. These virtual environments, referred to as Common System and Common User, can be centrally managed, providing an easy means of managing a consistent set of package versions and dramatically reducing disk usage by having shared copies of the packages. Each individual user’s System and User virtual environments are layered on top of the common installs as shown below.

Canopy venv layout

In this case, Canopy Core and the two virtual environments “Common System” and “Common User” are installed in a central networked disk. Typically, all of the standard packages would be installed in “Common User”, making them available to all users. When each user first starts Canopy, the per-user virtual environments “User’s System” and “User’s User” are automatically created. Users have the freedom to install new packages and alternate package versions in their own virtual environments while still benefitting from the centrally managed package set.

To set up this structure, after installing Canopy, an administrator first runs Canopy and creates the System (“Common System”) and User (“Common User”) virtual environments in the desired location, as one would in a single-user environment. Changes to the package set in User can be made by this administrative user. To make these environments available to all users, the following command is run, again as the administrative user:

canopy_cli --common-install

This writes a file named ‘location.cfg’ to Canopy Core. Now whenever a user starts Canopy, the per-user environments will be layered on top of the common environments.

The initial setup of the virtual environments uses the Canopy GUI by default, which is not always available or desired. To address these cases, Canopy now supports a new switch, “--no-gui-setup”. See the Canopy Users Guide for more details.

Cluster Installs

Large compute clusters are an interesting special case of the multi-user network because a large number of nodes may require the same resources at the same time. Starting a 1000-node job where a large number of files are required from a networked disk can increase startup time substantially, wasting precious time on an expensive cluster. Ideally, most or all of the files will be local to each node.

We can use a modified version of the multi-user setup above to address this. After installing Canopy on each node, we want to create the System and User virtual environments with all of the standard packages installed. Running the GUI to install to 1000+ machines is … inefficient… so we will use the non-GUI setup option (assuming Canopy is installed in /usr/local/Canopy on each machine):

ssh node1 /usr/local/Canopy/bin/canopy_cli --no-gui-setup --install-dir /usr/local/Canopy --common-install

Running this command once for each node in the cluster results in the virtual environments being installed to /usr/local/Canopy/Canopy_64bit on each machine. Large packages such as NumPy and SciPy can now be loaded from the local disk instead of being pulled over the network.
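As a minimal sketch of "once for each node", the command above can be wrapped in a small shell loop. The function names and the node names (node1, node2, node3) are hypothetical placeholders, and the script defaults to a dry run that only prints each command; set DRY_RUN=0 to actually execute them over ssh.

```shell
#!/bin/sh
# Sketch only: repeat the non-GUI setup on each node of a cluster.
# CANOPY_DIR matches the install location assumed in the text.
CANOPY_DIR="/usr/local/Canopy"
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# setup_node NODE: build the setup command for one node, then print or run it.
setup_node() {
    node="$1"
    # Store the full command in the function's positional parameters.
    set -- ssh "$node" "$CANOPY_DIR/bin/canopy_cli" \
        --no-gui-setup --install-dir "$CANOPY_DIR" --common-install
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_nodes NODE...: process every node, reporting failures without stopping.
setup_nodes() {
    for n in "$@"; do
        setup_node "$n" || echo "setup failed on $n" >&2
    done
}

# Hypothetical node names; substitute the cluster's real host list.
setup_nodes node1 node2 node3
```

Running the nodes one after another keeps the sketch simple; on a large cluster the same command would more likely be dispatched through whatever parallel job or configuration tool the site already uses.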

How do users add their own packages? When each user starts Canopy from the same or similar core install, Canopy will create the user-specific virtual environments layered on top of the ones in /usr/local/Canopy/Canopy_64bit. This gives us the structure shown in the diagram below where Canopy Core and the common virtual environments are local to each node (i.e., fast I/O access) and the user environments are on a networked file system.

Canopy cluster install

Although the Canopy GUI may be available on the cluster, one would typically not use it on the compute nodes. Instead, the “User’s User” virtual environment can be used like a standard Python distribution, such as EPD, to execute the Python application. The big advantage of this structure over a plain Python installation is performance: most of the Python packages are local to each node, while users still have an easy means of customizing their environments. Users can run the Canopy GUI on their desktop to prototype an application and then run the same application on the compute cluster using the same package set, with no additional configuration needed.

For more, get Canopy v1.1 and try it out.

“venv” in Python 2.7 and how it Simplifies Life

Virtual environments, specifically ‘venv’ which we backported from Python 3.x, are a technology that enables the creation of multiple, lightweight, independent Python environments. Each virtual environment appears to be a self-contained Python installation, but loads the Python standard library and other common resources from a common base Python installation. Optionally, a virtual environment can also load packages from its base Python environment, whether that’s Canopy Core itself or another virtual environment.

What makes virtual environments so interesting? Well, they reduce disk space by not having to duplicate the full Python environment each time. But more than that, making Python environments far “lighter” enables several interesting capabilities.

First, the most common use of virtual environments is to allow separate projects to run in separate environments with different package requirements. Each Python application runs in a separate virtual environment so package updates needed for one application don’t break the others. This model has long been used by web developers as well as a few scientific software developers.

The second case is specifically enabled by Canopy. Sharp-eyed readers will have noted in the first paragraph that we said that a virtual environment can have Canopy Core or another virtual environment as the base. But virtual environments can’t be layered, right? Now they can.

We have extended venv to support arbitrary numbers of layers, so we can do something like this:

‘venv’ in Canopy

‘Project1’ can be created with the following Canopy command:

canopy_cli setup ./Project1

Canopy constructs Project1 with all of the standard Canopy packages installed, and Project1 can now be customized to run the application. Once we’ve got Project1 working with a particular Python configuration, what if we want to see if the application works with the latest version of NumPy? We could update and potentially break the stable environment. Or, we can do this:

./Project1/bin/venv -s ./Project1_play

Now ‘Project1_play’ is a virtual environment which has by default all of Project1’s packages and package versions available. We can now update NumPy or other packages in Project1_play and test the application. If it doesn’t work, no big deal, we just delete it. We now have the ability to rapidly experiment with different (safe) Python environments without breaking our stable working area.
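Put together, the experiment cycle described here might look like the following sketch. The run helper, the try_in_scratch_env function, and the dry-run default are illustrative assumptions rather than part of Canopy, and using rm to discard the scratch environment is simply one way to "just delete it". The package update step is left as a comment because the exact command depends on how packages are managed in that environment.

```shell
#!/bin/sh
# Sketch only: create a stable environment, layer a scratch copy on top,
# experiment, then throw the scratch copy away.
# DRY_RUN=1 (the default) prints each step instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

try_in_scratch_env() {
    # 1. Stable environment with the standard Canopy packages.
    run canopy_cli setup ./Project1
    # 2. Scratch environment that inherits Project1's packages.
    run ./Project1/bin/venv -s ./Project1_play
    # 3. Update NumPy (or anything else) inside Project1_play and run the
    #    application's tests here.
    # 4. If the experiment fails, discard the scratch environment;
    #    Project1 itself is untouched.
    run rm -rf ./Project1_play
}

try_in_scratch_env
```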

Canopy makes use of virtual environments to provide a protected Python environment for the Canopy GUI application to run in, and to provide one or more User Python environments which you can customize and run your own code in. Canopy Core is the base for each of these virtual environments, providing the core Python components and several common, large packages such as Qt and PySide. This structure means that the Canopy GUI can be updated without impacting your code, and any package updates you install won’t destabilize the Canopy GUI.

Canopy Core can be updated if you want, such as to move to a new version of Python, and each of the virtual environments will be updated automatically as well. This eliminates the need to install a new Python environment and then re-install any third-party packages into that new environment just to update Python.

For more information on how to set up virtual environments with Canopy, check the online docs, or get Canopy v1.1 and try it out.

Our next post will detail how to use Canopy and virtual environments to set up multi-user networks and cluster environments.

For all you EPD Users: Canopy v1.1

EPD (Enthought Python Distribution) provided a simple install of Python for scientific computing on the major platforms: Windows, Linux and Mac OS. Those looking for a clean, straightforward Python stack to unpack into a particular directory found EPD to be pretty ideal.

With the introduction of Enthought Canopy, we began addressing users who are more engineer or scientist than programmer and are much less familiar with command-line interfaces. The Canopy desktop (in the vein of MATLAB or Spyder) aims at these technical users who want to use Python, but more as an application or IDE. To implement the desktop in Python and to allow both it and a user-defined Python environment to co-exist and be separately updated, we used virtual environments. As a consequence, Canopy can feel a bit foreign to EPD users. With 1.1 we have added a new command line interface (CLI) that will hopefully make EPD users feel more at home in Canopy while retaining many of the Canopy advantages such as in-place update and virtual environment support.

Now, EPD users who just want to use Canopy as a plain Python environment with their own tools or IDE can easily create one or more Python environments. For example, from the command line on Windows:

        Canopy_cli.exe setup C:\Python27

or on Linux:

        canopy_cli setup ~/canopy

The target directory can be any you choose. If you want to make this Python environment the default on your system, you can specify the --default switch, and Canopy will add the appropriate bin directory (Scripts directory on Windows) to your PATH environment variable. On Mac OS and Linux systems, Canopy does this by appending a line to your ~/.bash_profile file which activates the correct virtual environment. On Windows, this Python environment is also added to the system registry so third-party tools can correctly find it.
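On Mac OS or Linux, one quick way to confirm that an environment's bin directory ended up on PATH is a check like the sketch below. The path_has_dir helper and the ENV_DIR variable are hypothetical names introduced for illustration, and ~/canopy is just the example target directory used above. Because ~/.bash_profile is read by login shells, a freshly opened terminal may be needed before the change is visible.

```shell
#!/bin/sh
# Sketch only: report whether an environment's bin directory is on PATH.
# ENV_DIR defaults to the example target directory from the text.
ENV_DIR="${ENV_DIR:-$HOME/canopy}"

# path_has_dir DIR LIST: succeed if DIR is an entry of the
# colon-separated LIST (exact match, not a prefix match).
path_has_dir() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_has_dir "$ENV_DIR/bin" "$PATH"; then
    echo "$ENV_DIR/bin is on PATH"
else
    echo "$ENV_DIR/bin is not on PATH; try a new login shell or check ~/.bash_profile"
fi
```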

Since we use virtual environments, the installation layout for Canopy is different from EPD’s. With Canopy we install what is referred to as “Canopy Core”: the core Python environment and a minimum set of packages needed to bootstrap Canopy itself. With it we can lock down the Canopy environment, facilitate the automatic update mechanism, and provide reliable startup and fail-safe recovery. User code runs in a separate environment layered on top of Canopy Core. This means when a Python update comes out, it is no longer necessary to install a whole new environment plus all of your packages and get everything working again. Instead, simply update Canopy and go back to working: all of your packages are still installed but Python has been upgraded.

To complete an install, Canopy creates two virtual environments named ‘System’ and ‘User’. System is where the Canopy GUI runs; no user code runs in this environment. Updates to this virtual environment are done via the Canopy update mechanisms. The User environment is where the kernel and all user code run. This virtual environment is managed by Package Manager from the desktop or by enpkg from the command line; any packages can be updated and installed without fear of disrupting the GUI. Similarly, updates to the Canopy GUI will not affect packages installed in the User environment or break your code.

So why stick with virtual environments for an “EPD-like” install? One of the big challenges with the old, “flat” EPD installation method was updating an install, or trying out different package configurations. With virtual environments, you can create a new environment which inherits packages from another virtual environment, and try out a few package changes. When you are satisfied, it’s straightforward to throw away the experimentation area and make the changes to the original, stable virtual environment.

For more details, check out Creating an EPD-like Python environment in our online docs. And you can download Canopy v1.1 now.