Category Archives: General

Exploring NumPy/SciPy with the “House Location” Problem

Author: Aaron Waters

I created a Notebook that describes how to examine, illustrate, and solve a geometric problem called “House Location” using Python's mathematical and numeric libraries. The discussion uses symbolic computation, visualization, and numerical computation to solve the problem while exercising the NumPy, SymPy, Matplotlib, IPython and SciPy packages.

I hope that this discussion will be accessible to people with a minimal background in programming and a high-school level background in algebra and analytic geometry. There is a brief mention of complex numbers, but the use of complex numbers is not important here except as “values to be ignored”. I also hope that this discussion illustrates how to combine different mathematically oriented Python libraries and explains how to smooth out some of the rough edges between the library interfaces.

http://nbviewer.ipython.org/urls/raw.github.com/awatters/CanopyDemoArchive/master/misc/house_locations.ipynb

Advanced Cython Recorded Webinar: Typed Memoryviews

Author: Kurt Smith

Typed memoryviews are a new Cython feature for accessing memory buffers, such as NumPy arrays, without any Python overhead. This makes them very useful for manipulating blocks of memory in Cython directly without calling into the Python/C API. Typed memoryviews have a clean declaration syntax and a NumPy-like look and feel, supporting slicing, striding and indexing.
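As a quick taste of the syntax, here is a minimal sketch (my own illustration, not taken from the webinar) of a Cython function that sums a 2-D buffer through a typed memoryview; a NumPy array can be passed straight in without copying:

# sum2d.pyx -- a minimal sketch, not from the webinar
def sum2d(double[:, :] data):
    # data can be any object exposing a 2-D buffer of doubles, e.g. a NumPy array
    cdef Py_ssize_t i, j
    cdef double total = 0.0
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            total += data[i, j]
    return total

Slicing such a memoryview (for example data[:, 0]) yields another typed memoryview over the same memory, again without copying.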

I go into more detail and provide some specific examples on how to use typed memoryviews in this webinar: “Advanced Cython: Using the new Typed Memoryviews”.

If you would like to watch the recorded webinar, you can find a link below (the different formats play directly in different browsers, so check which one works for you; you won’t have to download the whole recording ahead of time):

“venv” in Python 2.7 and how it Simplifies Life

Virtual environments, specifically the ‘venv’ support we backported from Python 3.x, enable the creation of multiple, lightweight, independent Python environments. Each virtual environment appears to be a self-contained Python installation, but loads the Python standard library and other common resources from a common base Python installation. Optionally, a virtual environment can also load packages from its base Python environment, whether that’s Canopy Core itself or another virtual environment.

What makes virtual environments so interesting? Well, they save disk space by not duplicating the full Python environment each time. But more than that, making Python environments far “lighter” enables several interesting capabilities.

First, the most common use of virtual environments is to allow separate projects to run in separate environments with different package requirements. Each Python application runs in a separate virtual environment so package updates needed for one application don’t break the others. This model has long been used by web developers as well as a few scientific software developers.

The second case is specifically enabled by Canopy. Sharp-eyed readers will have noted in the first paragraph that we said that a virtual environment can have Canopy Core or another virtual environment as the base. But virtual environments can’t be layered, right? Now they can.

We have extended venv to support arbitrary numbers of layers, so we can do something like this:

‘venv’ in Canopy

‘Project1’ can be created with the following Canopy command:

canopy_cli setup ./Project1

Canopy constructs Project1 with all of the standard Canopy packages installed, and Project1 can now be customized to run the application. Once we’ve got Project1 working with a particular Python configuration, what if we want to see if the application works with the latest version of NumPy? We could update and potentially break the stable environment. Or, we can do this:

./Project1/bin/venv -s ./Project1_play

Now ‘Project1_play’ is a virtual environment which has, by default, all of Project1’s packages and package versions available. We can now update NumPy or other packages in Project1_play and test the application. If it doesn’t work, no big deal, we just delete it. We now have the ability to rapidly experiment with different (safe) Python environments without breaking our stable working area.

Canopy makes use of virtual environments to provide a protected Python environment for the Canopy GUI application to run in, and to provide one or more User Python environments which you can customize and run your own code in. Canopy Core is the base for each of these virtual environments, providing the core Python components and several common, large packages such as Qt and PySide. This structure means that the Canopy GUI can be updated without impacting your code, and any package updates you install won’t destabilize the Canopy GUI.

Canopy Core can be updated if you want, such as to move to a new version of Python, and each of the virtual environments will be updated automatically as well. This eliminates the need to install a new Python environment and then re-install any third-party packages into that new environment just to update Python.

For more information on how to set up virtual environments with Canopy, check the online docs, or get Canopy v1.1 and try it out.

Our next post will detail how to use Canopy and virtual environments to set up multi-user networks and cluster environments.

Canopy v1.1 – Linux, Command Line Interface and More


With version 1.1, Enthought Canopy now:

1) addresses, much more completely, the command line use cases that EPD users and IT managers expect from their Python distributions,

2) makes Linux support generally available,

3) streamlines installation for users without internet access with full, single-click installers,

4) supports multiple virtual environments for advanced users via “venv” backported to Python 2.7, and

5) provides updates like numpy 1.7.1, matplotlib 1.3.0 and more.

It’s been just over 4 months since Canopy v1.0 shipped with the new desktop analysis environment and our updated Python distribution for scientific computing. Canopy’s analysis environment seems to be well-received by users looking for a simpler GUI environment, but the Canopy graphical installation process left something to be desired by our EPD users.

Along with the Canopy desktop for users that don’t want to work directly from a command line, Canopy version 1.1 now provides command-line utilities that streamline the installation of a complete Python scientific stack for current EPD users who want to work from the shell or command line. In addition, IT groups or tools specialists that need to manage a central install of Python for a workgroup or department now have the tools they need to install and maintain Canopy. Version 1.1’s command-line installation and setup (and the 1-click, full installers detailed below) are much better for supporting Canopy installations on clusters as well.

Canopy for Linux is now fully released. We have full, tested support for RedHat 5, CentOS 5, and Ubuntu 12.04. Linux distros and versions beyond those work as well (anecdotally and based on some in-house use), but those are our tested versions.

With Canopy v1.0 we implemented a 2-step installation process. The installer includes the Canopy desktop, the Python packages needed by Canopy itself, and other core scientific Python stack packages for a minimal install (the libraries in Canopy Express). For those with a subscription, the second step requires downloading any additional packages using the Package Manager. This 2-step process is problematic for users that don’t have easy internet access or need to install centrally for a group. To help, we now provide full installers with all the Python packages we support included. This provides a streamlined 1-step install process for those who need it or want it.

To ensure users can install any package updates they wish without messing up package dependencies for Canopy itself, we use virtual environments under the hood. With v1.1 we now provide command-line access to our backport of “venv”. The new CLI provides utilities to create, upgrade, activate and deactivate your own virtual environments. Now it’s much easier to try out new Python environments or set up multiple configurations for a workgroup.

Canopy v1.1 ships many updates to packages and many new ones: OpenCV, LLVM, Bottleneck, gevent, msgpack, py, pytest, six, NLTK, Numba, Mock, patsy and more. You can see the full details on the Canopy Package Index page.

We hope you find version 1.1 useful!

SciPy 2013 Cython Tutorial

Author: Kurt Smith

Thanks to everyone who attended the Cython tutorial at this year’s SciPy conference, and thanks to the conference organizers and tutorial chairs for ensuring everything ran smoothly. The enthusiasm of the students came through in their questions, and there were several good conversations after the tutorial throughout the week.

If you want the tutorial experience from the comfort of your couch, you can download the tutorial slides, exercises, and demos, and follow along with the videos. Please read the setup instructions on the tutorial webpage. The easiest way to satisfy the requirements for the tutorial is to download and install an existing scientific Python distribution, such as Enthought Canopy. You will need Cython version 0.19 or greater.
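A quick way to check which Cython you have is a one-liner at the Python prompt:

import Cython
print Cython.__version__  # the tutorial material needs 0.19 or greater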

SciPy Talk

Tutorial Highlights

Cython is a language for adding static type information to Python with the objective of improving Python’s performance; it is a compiler (the cython command) for generating Python extension modules; and it helps wrap C and C++ libraries in a nice and Pythonic way with a minimum of overhead. These three aspects of Cython are tightly integrated with each other and shouldn’t be thought about in isolation: it is common to have Cython code that is intended to both speed up a pure-Python algorithm and that also calls out to C or C++ libraries.

Compared with some of the newer Python JIT compilers that are on the rise, Cython is relatively mature — not SWIG mature, but it certainly has been around long enough that it has grown features beyond its core functionality: annotated source files for compile-time performance profiling, runtime profiling that integrates with Python’s profilers, cross-language debugging capabilities, integration with IPython and the IPython notebook via the %%cython magic and other magic commands, the pyximport import hook support, Python 2 and Python 3 support, parallelization support via OpenMP, and others. I wanted this tutorial to touch on several of these extra capabilities that make Cython easier to use.
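To give one concrete example of those conveniences, the pyximport import hook compiles a .pyx file on the fly the first time you import it; a minimal sketch (the module name here is hypothetical):

import pyximport
pyximport.install()     # compile .pyx modules automatically at import time
import my_fast_module   # hypothetical: builds my_fast_module.pyx on first import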

The tutorial was in the advanced track because I wanted to dive into newer and more advanced Cython features, especially typed memoryviews. Typed memoryviews are Cython’s interface to PEP-3118 buffers, the new buffer protocol for accessing and passing around (possibly strided) blobs of memory without copying. This is of considerable interest to scientific computing audiences for whom non-copying array operations are essential. NumPy arrays support this protocol, and are the primary object used with this protocol. Cython’s syntax for typed memoryviews is nice, and taking a slice of a typed memoryview yields another typed memoryview and, as you would expect, does not copy memory.

I was able to cover typed memoryviews towards the end of the tutorial, but didn’t have quite enough time to demonstrate their full power. Typed memoryviews are in every way superior to the existing numpy buffer support in Cython (the cdef np.ndarray[double, ndim=2] declarations) — they are faster, they are supported in function signatures for every kind of Cython function definition (def, cdef, and cpdef), they are easier to declare and use, and they do not have any external dependencies (i.e., you do not have to cimport anything to use them, and you do not have to add extra include flags when compiling).

Another advanced topic I touched on in the tutorial was wrapping templated C++ classes. Cython’s syntax for wrapping templated C++ is fairly easy to work with if you are wrapping just one template instantiation. Once you need to wrap several template instantiations, I recommend you use a code generation tool like cheetah or jinja2 to avoid manual code duplication. It is often helpful to provide a top-level wrapper class for a more Pythonic experience. Examples of this approach are in the tutorial material zip file. Cython’s fused types can alleviate the need for these workarounds, but can require some gymnastics of their own to use. The “real” solution to wrapping many instantiations of C++ templates is a templating system or macro system in Cython itself, which is hard to get right and is likely beyond the scope of the project.
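As a rough sketch of that code-generation approach (the C++ header, class name, and type list below are made up for illustration), a short jinja2 script can stamp out one Cython extern block per template instantiation:

from jinja2 import Template

# Hypothetical example: generate Cython declarations for Vector<int> and Vector<double>.
template = Template("""
{% for ctype, suffix in types %}
cdef extern from "my_vector.hpp":
    cdef cppclass Vector{{ suffix }} "Vector<{{ ctype }}>":
        void push_back({{ ctype }} value)
        {{ ctype }} get(int i)
{% endfor %}
""")

with open("vector_wrappers.pxi", "w") as f:
    f.write(template.render(types=[("int", "Int"), ("double", "Double")]))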

I also wanted to demonstrate how to wrap modern Fortran: user derived types, assumed shape and assumed size arrays, module procedures, etc. You can accomplish this using the ISO_C_BINDING module that is part of the Fortran 2003 standard and supported by nearly every modern Fortran compiler. Alas, this had to be cut due to time constraints. I want to emphasize that it is possible to use Cython to provide very nice wrappers for modern Fortran, keeping Fortran relevant as a viable performance-oriented language that gains expressivity with Cython-generated Python wrappers.

Cython Webinar

I will be giving a Cython webinar to cover some of the topics that were skipped during the SciPy tutorial. I will likely cover typed memoryviews in more depth, and perhaps give more detail on getting Cython to work with modern Fortran and templated C++. If you have a subscription to Enthought Canopy, you will receive a notification for the webinar, so stay tuned. Or sign up for an Enthought account to get a notification.

Raspberry Pi Sensor and Actuator Control

Author: Jack Minardi

I gave a talk at SciPy 2013 titled open('dev/real_world') Raspberry Pi Sensor and Actuator Control. You can find the video on YouTube and the slides on Google Drive, and I will summarize the content here.

Typically as a programmer you will work with data on disk, and if you are lucky you will draw pictures on the screen. This is in contrast to physical computing which allows you as a programmer to work with data sensed from the real world and with data sent to control devices that move in the real world.

Mars Rover

physical computing at work. (source)

Goal

Use a Raspberry Pi to read in accelerometer values and to control a servo motor.

Definitions

  • Raspberry Pi
    • Small $35 Linux computer with 2 USB ports, HDMI out, Ethernet, and most importantly…
  • GPIO Pins
    • General Purpose Input/Output Pins
    • This is the component that truly enables “physical computing”. You as a programmer can set the voltage high or low on each pin, which is how you talk to actuators. You can also read the current voltage on each pin, which is how sensors talk back to you. It is important to note that each pin represents a binary state: you can only output a 0 or a 1, nothing in between.

In this article I will go over four basic Python projects to demonstrate the hardware capabilities of the Raspberry Pi. Those projects are:

  • Blink an LED.
  • Read a pot (potentiometer).
  • Stream data.
  • Control a servo.

Blink an LED.

An LED is a Light Emitting Diode. A diode is a circuit element that allows current to flow in one direction but not the other. Light emitting means … it emits light. Your typical LED needs current in the range of 10-30 mA and will drop about 2-3 volts. If you connect an LED directly to your Pi’s GPIO it will source much more than 30 mA and will probably fry your LED. To prevent this we have to put a resistor in series with the LED. If you want to do the math, you can calculate the appropriate resistance using the following equation:

R = (Vs - Vd) / I
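For example, with a 3.3 V supply (Vs), roughly 2 V dropped across the LED (Vd), and a target current of 10 mA, the arithmetic looks like this (the numbers are only illustrative):

Vs = 3.3    # GPIO high level, volts
Vd = 2.0    # approximate voltage drop across the LED, volts
I = 0.010   # target current, amps (10 mA)
R = (Vs - Vd) / I
print R     # 130 ohms; a larger resistor simply runs the LED dimmer (and safer)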

But if you don’t want to do the math, pick a resistor between 500 and 1500 ohms. Once you’ve gathered up all your circuit elements (LED and resistor), build this circuit on a breadboard:

LED Circuit

that’s not so bad, is it?

The code is also pretty simple. But first you will need to install RPi.GPIO. (It might come preinstalled on your OS.)

import time
from itertools import cycle
import RPi.GPIO as io

io.setmode(io.BCM)    # use Broadcom (BCM) pin numbering
io.setup(12, io.OUT)  # configure pin 12 as an output
o = cycle([1, 0])     # alternate 1, 0, 1, 0, ... forever
while True:
    io.output(12, o.next())  # 1 drives the pin to 3.3 V, 0 to ground
    time.sleep(0.5)

The important lines basically are:

io.setup(12, io.OUT)
io.output(12, 1)

These lines of code set up pin 12 as an output and then output a 1 (3.3 volts). Run the above code connected to the circuit and you should see your LED blinking on and off every half second.


Read a pot.

A pot is short for potentiometer, which is a variable resistor. This is just a fancy word for knob. Basically, by turning the knob you change the resistance, which changes the voltage across the pot (V = IR, remember?). Changing a voltage in response to some physical value is how many sensors work, and this class of sensor is known as an analog sensor. Remember when I said the GPIO pins can only represent a binary state? We will have to call in the aid of some more silicon to convert that analog voltage value into a binary stream of bits our Pi can handle.

That chunk of silicon is referred to as an Analog-to-Digital Converter (ADC). The one I like is the MCP3008, which has eight 10-bit channels, meaning we can read eight sensor values with a resolution of 1024 (2^10) each. This maps an input voltage of 0 – 3.3 volts to an integer between 0 and 1023.
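Going the other way, converting a raw reading back into a voltage, is a one-line calculation; a quick sketch:

def adc_to_volts(reading, vref=3.3):
    # 10-bit ADC: 0 maps to 0 V and 1023 maps to the reference voltage
    return reading * vref / 1023.0

print adc_to_volts(512)   # about 1.65 V, i.e. the pot near mid-travel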

ADC Circuit

I’ve turned the Pi into ephemeral yellow labels to simplify the diagram

To talk to the chip we will need a Python package called spidev. For more information about the package and how it works with the MCP3008, check out this great blog post.

With spidev installed and the circuit built, run the following program to read live sensor values and print them to stdout.

import spidev
import time

spi = spidev.SpiDev()
spi.open(0, 0)  # open SPI bus 0, chip-select 0

def readadc(adcnum):
    # Read channel 0-7 of the MCP3008 and return a 10-bit value.
    if not 0 <= adcnum <= 7:
        return -1
    # Send: start bit, single-ended mode + channel number, one padding byte.
    r = spi.xfer2([1, (8+adcnum)<<4, 0])
    # The 10-bit result is split across the last two bytes of the reply.
    adcout = ((r[1] & 3) << 8) + r[2]
    return adcout

while True:
    val = readadc(0)
    print val
    time.sleep(0.5)

The most important parts are these two lines:

r = spi.xfer2([1, (8+adcnum)<<4, 0])
adcout = ((r[1] & 3) << 8) + r[2]

They send the read command and extract the relevant returned bits. See the blog post I linked above for more information on what is going on here.


Stream data.

To stream data over the wire we will be using the ØMQ networking library and implementing the REQUEST/REPLY pattern. ØMQ makes it super simple to set up a client and server in Python. The following is a complete working example.

Server

import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)  # REP: the reply side of the REQUEST/REPLY pattern
socket.bind('tcp://*:1980')       # listen on port 1980 on all interfaces
while True:
    message = socket.recv()   # wait for a request from a client
    print message
    socket.send("I'm here")   # every recv must be answered with a send

Client

import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)  # REQ: the request side of the pattern
a = 'tcp://192.168.1.6:1980'      # address of the Pi running the server
socket.connect(a)
for request in range(10):
    socket.send('You home?')
    message = socket.recv()   # block until the server replies
    print message

Now we can use traits and enaml to make a pretty UI on the client side. Check out the acc_plot demo in the GitHub repo to see an example of the Pi streaming data over the wire to be plotted by a client.


Control a servo

Servos are (often small) motors that you can drive to specific positions. For example, for a given servo you may be able to set the drive shaft anywhere from 0 to 180 degrees. As you can imagine, this could be useful for a lot of tasks, not least of which is robotics.

Shaft rotation is controlled by Pulse Width Modulation (PWM), in which you encode information in the duration of a high-voltage pulse on a GPIO pin. Most hobby servos follow a standard pulse-width convention: a 0.5 ms pulse means go to the minimum position and a 2.5 ms pulse means go to the maximum position. Repeat the pulse every 20 ms and you’re controlling a servo.
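To make that mapping concrete, here is a tiny sketch (a hypothetical helper, not part of any library) that converts a normalized position into the corresponding pulse width:

def position_to_pulse_ms(position, min_ms=0.5, max_ms=2.5):
    # 0.0 -> 0.5 ms (minimum angle), 1.0 -> 2.5 ms (maximum angle)
    return min_ms + position * (max_ms - min_ms)

print position_to_pulse_ms(0.5)   # 1.5 ms, the centre position; the pulse repeats every 20 ms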

PWM Diagram

The pulse width is much more critical than the frequency

This kind of timing is not possible with Python alone. In fact, it isn’t really possible with a modern operating system: an interrupt could come in at any time in your control code, causing a longer-than-desired pulse and a jitter in your servo. To meet the timing requirements we have to enter the fun world of kernel modules. ServoBlaster is a kernel module that makes use of the DMA control blocks to bypass the CPU entirely. When loaded, the kernel module opens a device file at /dev/servoblaster that you can write position commands to.
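Once the module is loaded, you drive a servo by writing a short text command to that device file. Assuming ServoBlaster’s default command format and step size of 10 microseconds (check the ServoBlaster README for the exact details), a raw write looks roughly like this:

# Assumed format: "<servo number>=<position in 10 us steps>"; 150 steps = a 1.5 ms pulse.
with open('/dev/servoblaster', 'w') as f:
    f.write('0=150\n')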

I’ve written a small object-oriented layer around this that makes servo control simpler. You can find my library here:

https://github.com/jminardi/RobotBrain

Simply connect the servo to 5 V and ground on your Pi, and then connect the control wire to pin 4.

Servo Diagram

The Python code is quite simple:

import time
import numpy as np
from robot_brain.servo import Servo

servo = Servo(0, min=60, max=200)   # servo 0, with min and max limits for this servo
for val in np.arange(0, 1, 0.05):   # sweep from 0 to 1 in 5% steps
    servo.set(val)
    time.sleep(0.1)

All you have to do is instantiate a servo and call its set() method with a floating point value between 0 and 1. Check out the servo_slider demo on GitHub to see servo control implemented over the network.

“Why We Built Enthought Canopy, An Inside Look” Recorded Webinar

We posted a recording of a 30-minute webinar that we did on the 20th covering what Canopy is and why we developed it. There are a few minutes of Brett Murphy (Product Manager at Enthought) discussing the “why” with some slides, and then Jason McCampbell (Development Manager for Canopy) gets into the interesting part with a 15+ minute demo of some of the key capabilities and workflows in Canopy. If you would like to watch the recorded webinar, you can find it here (the different formats play directly in different browsers, so check which one works for you; you won’t have to download the whole recording first):

Summed up in one line: Canopy provides the minimal set of tools for non-programmers to access, analyze and visualize data in an open-source Python environment.

The challenge in the past for scientists, engineers and analysts who wanted to use Python had been pulling together a working, integrated Python environment for scientific computing. Finding compatible versions of the dozens of Python packages, compiling them and integrating them all was very time-consuming. That’s why we released the Enthought Python Distribution (EPD) many years back. It provided a single install of all the major packages you needed to do scientific and analytic computing with Python.

But the primary interface for a user of EPD was the command line. For a scientist or analyst used to an environment like MATLAB or one of the R IDEs, the command line is a little unapproachable and makes Python challenging to adopt. This is why we developed Canopy.

Enthought Canopy is both a Python distribution (like EPD) and an analysis environment. The analysis environment includes an integrated editor and IPython prompt to facilitate script development & testing and data analysis & plotting. The graphical package manager becomes the main interface to the Python ecosystem with its package search, install and update capabilities. And the documentation browser makes online documentation for Canopy, Python and the popular Python packages available on the desktop.

Check out the Canopy demo in the recorded webinar (link above). We hope it’s helpful.

Avoiding “Excel Hell!” using a Python-based Toolchain

Update (Feb 6, 2014):  Enthought is now the exclusive distributor of PyXLL, a solution that helps users avoid “Excel Hell” by making it easy to develop add-ins for Excel in Python. Learn more here.

Didrik Pinte gave an informative, provocatively-titled presentation at the second in-person New York Quantitative Python User’s Group (NY QPUG) meeting earlier this month.

There are a lot of examples in the press of Excel workflow mess-ups and spreadsheet errors contributing to some eye-popping mishaps in the finance world (e.g., JP Morgan’s spreadsheet issues may have contributed to the massive 2012 “London Whale” trading loss). Most of these can be traced to similar fundamental issues:

  • Data referencing/traceability

  • Numerical errors

  • Error-prone manual operations (cut & paste, …)

  • Tracing IOs in libraries/APIs

  • Missing version control

  • Toolchain that doesn’t meet the needs of researchers, analysts, IT, etc.

Python, the language together with its tool ecosystem, can provide a nice solution to these challenges, and many organizations are already turning to Python-based workflows in response. With integration tools like PyXLL (which executes Python functions within Excel) and others, organizations can adopt Python-based workflows incrementally and start improving their current “Excel Hell” situation quickly.
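As a small illustration of what that integration looks like (a sketch based on PyXLL’s function-decorator API; the worksheet function itself is made up), a Python function can be exposed directly as an Excel formula:

from pyxll import xl_func

@xl_func("float principal, float rate, int years: float")
def future_value(principal, rate, years):
    # Hypothetical example: callable from a cell as =future_value(1000, 0.05, 10)
    # once the PyXLL add-in is loaded.
    return principal * (1 + rate) ** years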

For the details, check out the video of Didrik’s NY QPUG presentation. He demonstrates an example solution using PyXLL and Enthought Canopy.

http://vimeo.com/67327735

And grab the PDF of his slides here.


It would be great to hear your stories about “Excel Hell”. Let us know below.

–Brett Murphy