Category Archives: Python

Webinar: Fast Forward Through the “Dirty Work” of Data Analysis: New Python Data Import and Manipulation Tool Makes Short Work of Data Munging Drudgery

Python Import & Manipulation Tool Intro Webinar

No matter whether you are a data scientist, quantitative analyst, or an engineer, whether you are evaluating consumer purchase behavior, stock portfolios, or design simulation results, your data analysis workflow probably looks a lot like this:

Acquire > Wrangle > Analyze and Model > Share and Refine > Publish

The problem is that often 50 to 80 percent of time is spent wading through the tedium of the first two steps, acquiring and wrangling data, before even getting to the real work of analysis and insight. (See The New York Times, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.”)


Enthought Canopy Data Import Tool

Try the Data Import Tool with your own data. Download here.

In this webinar we’ll demonstrate how the new Canopy Data Import Tool can significantly reduce the time you spend on data analysis “dirty work,” by helping you:

  • Load various data file types and URLs containing embedded tables into Pandas DataFrames
  • Perform common data munging tasks that improve raw data
  • Handle complicated and/or messy data
  • Extend the work done with the tool to other data files
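
For a feel of the kind of work the tool automates, here is a minimal hand-written pandas sketch of the load-and-clean steps listed above. The column names and sample rows are invented for illustration and are not taken from the webinar data sets, and the code the Data Import Tool actually generates may look quite different.

```python
import io

import pandas as pd

# Hypothetical raw data: padded header names, dates stored as text,
# and one row with a missing depth value.
RAW_CSV = """well_id , spud_date , depth_ft
A-1, 2015-03-01, 5200
A-2, 2015-04-15,
A-3, 2015-05-20, 6100
"""


def load_and_clean(text):
    """Load CSV text into a DataFrame and apply a few common munging steps."""
    df = pd.read_csv(io.StringIO(text), skipinitialspace=True)
    # Remove stray whitespace around the column names.
    df.columns = df.columns.str.strip()
    # Convert the text dates into real datetime values.
    df["spud_date"] = pd.to_datetime(df["spud_date"])
    # Drop rows with no depth recorded and renumber the index.
    return df.dropna(subset=["depth_ft"]).reset_index(drop=True)


wells = load_and_clean(RAW_CSV)
print(wells)
```

The same function can be reused on another file with the same layout by passing an open file object's contents, which mirrors the last bullet: work done once can be extended to other data files.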


Webinar sample data sets:

Download a zip file of the example data sets

  1. Example 1 data set: bob-ross-elements
  2. Example 2 data set: pigeon-racing-results
  3. Example 3 data set: Oklahoma oil and gas well data


Simply download Canopy and click on the “Data Import Tool” icon on the Welcome Screen.

Canopy Data Import Tool - Free Trial

Just Released: PyXLL v 3.0 (Python in Excel). New Real Time Data Stream Capabilities, Excel Ribbon Integration, and More.

Download a free 30 day trial of PyXLL and try it with your own data.

Since PyXLL was first released back in 2010 it has grown hugely in popularity and is used by businesses in many different sectors.

The original motivation for PyXLL was to be able to use all the best bits of Excel combined with a modern programming language for scientific computing, in a way that fits naturally and works seamlessly.

Since the beginning, PyXLL development has focused on the things that really matter for creating useful real-world spreadsheets: worksheet functions and macro functions. Without these, all you can do is drive Excel by poking numbers in and reading numbers out. At the time the first version of PyXLL was released, that was already possible using COM, so providing yet another API to do the same thing was seen as adding little value. On the other hand, being able to write functions and macros in Python opens up possibilities that were previously available only in VBA or by writing complicated Excel add-ins in C++ or C#.

With the release of PyXLL 3, integrating your Python code into Excel has become more enjoyable than ever. Many things have been simplified to get you up and running faster, and there are some major new features to explore.

  • If you are new to PyXLL have a look at the Getting Started section of the documentation.
  • All the features of PyXLL, including these new ones, can be found in the Documentation.


1. Ribbon Customization

Ever wanted to write an add-in that uses the Excel ribbon interface? Previously the only way to do this was to write a COM add-in, which requires a lot of knowledge, skill and perseverance! Now you can do it with PyXLL by defining your ribbon as an XML document and adding it to your PyXLL config. All the callbacks between Excel and your Python code are handled for you.

See Customizing the Ribbon for more detailed information or try the example included in the download.

2. RTD (Real Time Data) Functions


PyXLL can stream live data into your spreadsheet without you having to write any extra services or register any COM controls. Any Python function exposed to Excel through PyXLL can return a new RTD type that acts as a ticking data source; Excel updates whenever the returned RTD publishes new data.

See Real Time Data for more detailed information or try the example included in the download.

3. Function Signatures and Type Annotation

xl_func and xl_macro need to know the argument and return types to be able to tell Excel how they should be called. In previous versions that was always done by passing a ‘signature’ string to these decorators.

Now in PyXLL 3 the signature is entirely optional. If a signature is not supplied PyXLL will inspect the function and determine the signature for you.

If you use Python type annotations when declaring the function, PyXLL will use those when determining the function signature. Otherwise all arguments and the return type will be assumed to be `var`.
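
The idea of deriving a signature from annotations can be sketched with the standard library's `inspect` module. This is only an illustration of the concept and not PyXLL's implementation: the helper name `describe_signature` is hypothetical, and the output format is loosely modelled on a "type name, ...: return type" signature string.

```python
import inspect


def describe_signature(func):
    """Build a 'type name, ...: return_type' string from a function's annotations.

    Illustration only, not PyXLL code. Anything without an annotation
    falls back to 'var', matching the behaviour described above.
    """
    sig = inspect.signature(func)

    def type_name(annotation):
        # inspect uses a sentinel to mark "no annotation given".
        if annotation is inspect.Signature.empty:
            return "var"
        return getattr(annotation, "__name__", str(annotation))

    args = ", ".join(
        f"{type_name(param.annotation)} {name}"
        for name, param in sig.parameters.items()
    )
    return f"{args}: {type_name(sig.return_annotation)}"


def annotated(x: float, n: int) -> float:
    return x ** n


def unannotated(x, n):
    return x ** n


print(describe_signature(annotated))    # float x, int n: float
print(describe_signature(unannotated))  # var x, var n: var
```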

4. Default Keyword Arguments

Python functions with default keyword arguments now preserve their default value when called from Excel with missing arguments. This means that a function like the one below, when called from Excel with b or c missing, will be invoked with the correct default values for b and c.

    def func_with_kwargs(a, b=1, c=2):
        return a + b + c

5. Deep Reloading

If you’ve used PyXLL for a while you will have noticed that when you reload PyXLL only the modules listed in your pyxll.cfg file get reloaded. If you are working on a project that has multiple modules and not all of them are added to the config, the unlisted ones won’t get reloaded, even if modules that are listed in the config file import them.

PyXLL can now track all the imports made by each module listed in the config file, and when you reload PyXLL all of those modules will be reloaded in the right order.

This feature is enabled in the config file by setting

deep_reload = 1
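
To see why "the right order" matters, consider a sketch of dependency-ordered reloading using the standard library's `graphlib` (Python 3.9+). The module names and the `IMPORTS` mapping are hypothetical, and this is not how PyXLL tracks imports internally; it only shows that a module should be reloaded after everything it imports.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical project: each module maps to the set of modules it imports.
# Only "excel_functions" would be listed in the config file.
IMPORTS = {
    "excel_functions": {"models", "helpers"},
    "models": {"helpers"},
    "helpers": set(),
}


def reload_order(imports):
    """Return module names so that every module comes after its dependencies."""
    return list(TopologicalSorter(imports).static_order())


print(reload_order(IMPORTS))  # ['helpers', 'models', 'excel_functions']
```

Reloading in the opposite order would leave `excel_functions` bound to stale versions of `models` and `helpers`.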

6. Error Caching

Sometimes it’s not convenient to have to pick through the log file to determine why a particular cell is failing to calculate.

The new function get_last_error takes an XLCell or a COM Range and returns the last exception (and traceback) to have occurred in that cell.

This can be used in menu functions or other worksheet functions to give end users better feedback about any errors in the worksheet.
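
The general technique of caching the last error per cell can be sketched in plain Python. Everything here (`cache_errors`, `last_error`, the cell-address strings) is a hypothetical stand-in written for illustration; it is not PyXLL's get_last_error and does not use XLCell or COM Range objects.

```python
import functools
import traceback

# Hypothetical cache: cell address -> (exception, formatted traceback).
_last_errors = {}


def cache_errors(cell_address):
    """Decorator that remembers the last exception raised for a given cell."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                _last_errors[cell_address] = (exc, traceback.format_exc())
                raise
            # A successful calculation clears any stale error for the cell.
            _last_errors.pop(cell_address, None)
            return result
        return wrapper
    return decorator


def last_error(cell_address):
    """Return (exception, traceback text) for the cell, or None if it is clean."""
    return _last_errors.get(cell_address)


@cache_errors("Sheet1!A1")
def ratio(a, b):
    return a / b


try:
    ratio(1, 0)
except ZeroDivisionError:
    pass

exc, tb = last_error("Sheet1!A1")
print(type(exc).__name__)  # ZeroDivisionError
```

A menu function could then look up the selected cell's entry and show the traceback to the user instead of sending them to the log file.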

7. Python Functions for Reload and Rebind

PyXLL can now be reloaded or it can rebind its Excel functions using the new Python functions reload and rebind.

8. Better win32com and comtypes Support

PyXLL has always had some integration with the pythoncom module, but it required some user code to make it really useful. It didn’t have any direct integration with the higher level win32com package or the comtypes package.

The new function xl_app returns the current Excel Application instance either as a pythoncom PyIDispatch instance, a win32com.client.Dispatch instance or a wrapped comtypes POINTER(IUnknown) instance.

You may specify which COM library you want to use with PyXLL in the pyxll.cfg file

com_package = <win32com, comtypes or pythoncom>

Download a free 30 day trial of PyXLL and see how PyXLL can help you use the power of Python to make Excel an even more powerful data analysis tool.

Plotting in Excel with PyXLL and Matplotlib

Author: Tony Roberts, creator of PyXLL, a Python library that makes it possible to write add-ins for Microsoft Excel in Python. Download a FREE 30 day trial of PyXLL here.

Python has a broad range of tools for data analysis and visualization. While Excel is able to produce various types of plots, sometimes it’s either not quite good enough or it’s just preferable to use matplotlib.

Users already familiar with matplotlib will be aware that when a Python script shows a plot, the script stops while the plot is displayed and continues only once the user has closed it. In an IPython console, by contrast, control returns to the prompt as soon as the plot is shown, which is useful for interactive development.

Something that has been asked a couple of times is how to use matplotlib within Excel using PyXLL. As matplotlib is just a Python package like any other it can be imported and used in the same way as from any Python script. The difficulty is that when showing a plot the call to matplotlib blocks and so control isn’t returned to Excel until the user closes the window.

This blog shows how to plot data from Excel using matplotlib and PyXLL so that Excel can continue to be used while a plot window is active, and so that same window can be updated whenever the data in Excel is updated. Continue reading

Webinar: Work Better, Smarter, and Faster in Python with Enthought Training on Demand

Join Us For a Webinar

Enthought Training on Demand Webinar

We’ll demonstrate how Enthought Training on Demand can help both new Python users and experienced Python developers be better, smarter, and faster at the scientific and analytic computing tasks that directly impact their daily productivity and drive results.

View a recording of the Work Better, Smarter, and Faster in Python with Enthought Training on Demand webinar here.

What You’ll Learn

Continue reading

The Latest and Greatest Pandas Features (since v 0.11)

On May 28, 2014 Phillip Cloud, core contributor for the Pandas data analytics Python library, spoke at a joint meetup of the New York Quantitative Python User’s Group (NY QPUG) and the NY Finance PUG. Enthought hosted and about 60 people joined us to listen to Phillip present some of the less-well-known, but really useful features that have come out since Pandas version 0.11 and some that are coming soon. We all learned more about how to take full advantage of the Pandas Python library, and got a better sense of how excited Phillip was to discover Pandas during his graduate work.

Pandas to MATLAB

After a fairly comprehensive overview of Pandas, Phillip got into the new features. In version 0.11 he covered: Continue reading

PyXLL: Deploy Python to Excel Easily

PyXLL Solution Home | Buy PyXLL | Press Release

Today Enthought announced that it is now the worldwide distributor for PyXLL, and we’re excited to offer this key product for deploying Python models, algorithms and code to Excel. Technical teams can use the full power of Enthought Canopy, or another Python distro, and end-users can access the results in their familiar Excel environment. And it’s straightforward to set up and use.

Installing PyXLL from Enthought Canopy

PyXLL is available as a package subscription (with significant discounts for multiple users). Once you’ve purchased a subscription you can easily install it via Canopy’s Package Manager as shown in the screenshots below (note that at this time PyXLL is only available for Windows users). The rest of the configuration instructions are in the Quick Start portion of the documentation. PyXLL itself is a plug-in to Excel. When you start Excel, PyXLL loads into Excel and reads in Python modules that you have created for PyXLL. This makes PyXLL especially useful for organizations that want to manage their code centrally and deploy to multiple Excel users.

Enthought Canopy Package Manager   Install PyXLL from Enthought Canopy's Package Manager

Creating Excel Functions with PyXLL

To create a PyXLL Python Excel function, you use the @xl_func decorator to tell PyXLL the following function should be registered with Excel, what its argument types are, and optionally what its return type is. PyXLL also reads the function’s docstring and provides that in the Excel function description. As an example, I created a module and registered it with PyXLL via the Continue reading

Enthought Canopy v1.2 is Out: PTVS, Mavericks, and Qt

Author: Jason McCampbell

Canopy 1.2 is out! The release of Mac OS “Mavericks” as a free update broke a few features, primarily IPython, so we held the release to try to make sure everything worked. That ended up taking longer than we wanted, but 1.2 is finally out and adds support for Mavericks. We are still working on one Mavericks-specific Qt font issue, which causes the wrong system font to be selected so that UIs look less polished than they should.

Enthought Canopy integrated into PTVS

The biggest new feature is integration with Microsoft’s Python Tools for Visual Studio (PTVS) package. PTVS is a full, professional-grade development IDE for Python based on Visual Studio and provides mixed Python/C debugging. The ability to do mixed-mode debugging is a huge boon to software developers creating C (or FORTRAN) extensions to Python. Canopy v1.2 includes a custom DLL that allows us to integrate more completely with PTVS and solves some issues with auto-completion of Python standard library calls.

Beyond PTVS, we have added the Qt development tools, such as qmake and the UIC compiler, to the Canopy installation tree. These tools are available on all platforms now and enable Qt developers to access them from Canopy directly rather than having to build the tools themselves.

Canopy 1.2 includes a large number of smaller additions and stability improvements. Highlights can be found in the release notes and we encourage all users to update existing installs. As always, thanks for using Canopy and please don’t hesitate to drop us a note letting us know what you like or what you would like to see improved. You can contact us via the Help -> Suggestions/Feedback menu item or by sending email to

And you can download Canopy from the Enthought Store page.

Python at Inflection Point in HPC

Authors: Kurt Smith, Robert Grant, and Lauren Johnson

We attended SuperComputing 2013, held November 17-22 in Denver, and saw huge interest around Python. There were several Python-related events, including the “Python in HPC” tutorial (Monday), the Python BoF (Tuesday), and a “Python for HPC” workshop held in parallel with the tutorial on Monday. But we had some of our best conversations on the trade show floor.

Python Buzz on the Floor

The Enthought booth had a prominent “Python for HPC: High Productivity Computing” headline, and we looped videos of our parallelized 2D Julia set rendering GUI (video below).  The parallelization used Cython’s OpenMP functionality, came in at around 200 lines of code, and generated lots of discussions.  We also used a laptop to display an animated 3D Julia set rendered in Mayavi and to demo Canopy.

Many people came up to us after seeing our banner and video and asked “I use Python a little bit, but never in HPC – what can you tell me?”  We spoke with hundreds of people and had lots of good conversations.

It really seems like Python has reached an inflection point in HPC.

Python in HPC Tutorial, Monday

Kurt Smith presented a quarter-day section on Cython, which was a shortened version of what he presented at SciPy 2013. In addition, Andy Terrel presented “Introduction to Python”; Aron Ahmadia presented “Scaling Python with MPI”; and Travis Oliphant presented “Python and Big Data”. You can find all the material on the website.

The tutorial was generally well attended: about 100–130 people.  A strong majority of attendees were already programming in Python, with about half using Python in a performance-critical area and perhaps 10% running Python on supercomputers or clusters directly.

In the Cython section of the tutorial, Kurt went into more detail on how to use OpenMP with Cython, which was of interest to many based on questions during the presentation. For the exercises, students were given temporary accounts on  Stampede (TACC’s latest state-of-the-art supercomputer) to help ensure everyone was able to get their exercise environment working.

Andy’s section of the day went well, covering the basics of using Python. Aron’s section was good for establishing that Python+MPI4Py can scale to ~65,000 nodes on massive supercomputers, and also for addressing people’s concerns regarding the import challenge.

Python in HPC workshop, Monday

There was a day-long workshop of presentations on “Python in HPC” which ran in parallel with the “Python for HPC” tutorial. Of particular interest were the talks on “Doubling the performance of NumPy” and “Bohrium: Unmodified NumPy code on CPU, GPU, and Cluster”.

Python for High Performance and Scientific Computing BoF, Tuesday

Andy Terrel, William Scullin, and Andreas Schreiber organized a Birds-of-a-Feather session on Python, which had about 150 attendees (many thanks to all three for organizing a great session!).  Kurt gave a lightning talk on Enthought’s SBIR work.  The other talks focused on applications of Python in HPC settings, as well as IPython notebooks on the basics of the Navier-Stokes equations.

It was great to see so much interest in Python for HPC!