Avoiding “Excel Hell!” using a Python-based Toolchain

Update (Feb 6, 2014):  Enthought is now the exclusive distributor of PyXLL, a solution that helps users avoid “Excel Hell” by making it easy to develop add-ins for Excel in Python. Learn more here.

Didrik Pinte gave an informative, provocatively-titled presentation at the second, in-person New York Quantitative Python User’s Group (NY QPUG) meeting earlier this month.

There are a lot of examples in the press of Excel workflow mess-ups and spreadsheet errors contributing to some eye-popping mishaps in the finance world (e.g. JP Morgan’s spreadsheet issues may have led to the 2012 massive loss at “the London Whale”). Most of these can be traced to similar fundamental issues:

  • Data referencing/traceability

  • Numerical errors

  • Error-prone manual operations (cut & paste, …)

  • Tracing IO’s in libraries/API’s

  • Missing version control

  • Toolchain that doesn’t meet the needs of researchers, analysts, IT, etc.

Python, the coding language and its tool ecosystem, can provide a nice solution to these challenges, and many organizations are already turning to Python-based workflows in response. And with integration tools like PyXLL (to execute Python functions within Excel) and others, organizations can adopt Python-based workflows incrementally and start improving their current “Excel Hell” situation quickly.

For the details check out the video of Didrik’s NY QPUG presentation.  He demonstrates a an example solution using PyXLL and Enthought Canopy.

[vimeo 67327735 http://vimeo.com/67327735]

And grab the PDF of his slides here.

QPUG_20130514_ExcelHell_Slides

It would be great to hear your stories about “Excel Hell”. Let us know below.

–Brett Murphy

4 thoughts on “Avoiding “Excel Hell!” using a Python-based Toolchain

  1. avatarVladan

    You just made me laugh 🙂

    I didn’t watch whole video (actually just the first 5min), but as someone who loves Python and uses Excel too, let me express that you are way beyond, as both are different universes.

    And why? To sell shareware PyXLL instead using VBA or VSTO? Or because a user didn’t know how to drive the application, as UI made him/her think that application reads minds? Such people will make same mistake with whatever tool they are using.

    If you can afford Excel, and load a database in second and do Pivot report in next, it is silly not to use it and to start writing scripts. Use Python where applicable, but don’t underestimate something way bigger then you are. It’s no good in general and no good toward Python, as it is excellent for what is designed but it can’t replace everything, even if you append the word Hell on it.

    Reply
  2. avatarDidrik

    Hi Vladan,

    Thanks for your comment. It think you exactly got the point of the talk 😉 The talk is not about Excel being a hell to use and Python being the rosy alternative. I was focusing on business case we see so often in the industry where people create a hell around their usage of Excel.

    Excel is great and no company in the world as yet created a usable alternative to it. Users just need to be careful about the way they use it. Loading data from a database and doing some pivot table, charting, etc is definitely a perfect example of the good stuff Excel provides.

    I mainly focused on the massive issues caused by burying complex dependency graphs within Excel and erros that can come out of such mess. The talk shows that one potential solution could be based on Python (or any decent programing language). It could be used for the modeling, processing and Excel used as a frontend for the easy interaction with data, easy charting, etc. The key point is to ensure the critical dependency graph is controlled and tested.

    Last but not least, with Python or/and Excel, you can always create a Hell. It is all about people choosing the right tool and taking good design decision in their implementation (but this is about code design, something that Excel users typically don’t know well).

    Reply
  3. avatarKameshwar Rao

    Python will not gain by trashing Excel or google spreadsheet which are basically office productivity tools .Excel does not protect against “carelessness” niether does Python or R.
    The If mega corporate entities were to follow the basic rule of having the computations validated/verified by a second “method” , the white whales would go away .This is Quality Assurance /Testing in essence.

    Knowing Excel’s limitations and knowing that JP Morgan did have extensive resources including python , R etc to implement their trading algorithms in parallel, they were a “sitting duck” for “excel hell”.
    So the Ph. Ds in Maths , Computer Science and Physics blame Excel for their carelssness in “copying /pasting” !!!

    Reply
  4. avatarjess nault

    Actually before moving to Python, we had a lot of excel errors. Or mostly user errors. People get interrupted and when they go back to work, they use a vlookup instead of a sumif. that kind of thing. Constantly interrupted. But when you’ve got python in front of you, they can’t get nosey and play with the data. They have to wait until you’ve processed it, then they can go back to their own desk and dink with it. And when you get interrupted, you really have to concentrate on what you’re coding.

    I turned a weeks worth of excel work into about an hour of running functions. I can load databases from excel. Or download data, manipulate it, group it, and spit it back out into excel on appropriate tabs. With formulas. If I need a pivot I can create one, or I can use an excel template and refresh the data (from python). Excel is great- and when you have been given basically a database of data to work on (500,000 rows in just one workbook, and working with more ) and need to do formulas- excel is hell. A weeks worth of work in about an hour. Multiplied by 52 weeks- that’s a lot of frustration saved. As well as time. I’ll pick python over excel. I have better things to do.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Please leave these two fields as-is:

Protected by Invisible Defender. Showed 403 to 104,629 bad guys.