Author Archives: Tim Diller

avatar

About Tim Diller

Tim is Director of the IIoT Solutions Group at Enthought. Tim holds a Ph.D. in mechanical engineering, specializing in thermal and fluid sciences. He has worked in the automotive industry with emissions measurement, modeling, and control in addition to vehicle dynamics modeling and simulation. He also held a post-doctoral research position at the University of Texas, where he developed a numerical model and simulation of thermal transport processes in the laser sintering rapid-prototyping process at the LFF.

Enthought at National Instruments’ NIWeek 2017: An Inside Look

This week I had the distinct privilege of representing Enthought at National Instruments‘ 23rd annual user conference, NIWeek 2017. National Instruments is a leader in test, measurement, and control solutions, and we share many common customers among our global scientific and engineering user base.

NIWeek kicked off on Monday with Alliance Day, where my colleague Andrew Collette and I went on stage to receive the LabVIEW Tools Network 2017 Product of the Year Award for Enthought’s Python Integration Toolkit, which provides a bridge between Python and LabVIEW, allowing you to create VI’s (virtual instruments) that make Python function and object method calls. Since its release last year, the Python Integration Toolkit has opened up access to a broad range of new capabilities for LabVIEW users,  by combining the best of Python with the best of LabVIEW. It was also inspiring to hear about the advances being made by other National Instruments partners. Congratulations to the award winners in other categories (Wineman Technology, Bloomy, and Moore Good Ideas)!

On Wednesday, Andrew gave a presentation titled “Building and Deploying Python-Powered LabVIEW Applications” to a standing-room only crowd.  He gave some background on the relative strengths of Python and LabVIEW (some of which is covered in our March 2017 webinar “Using Python and LabVIEW to Rapidly Solve Engineering Problems“) and then showcased some of the capabilities provided by the toolkit, such as plotting data acquisition results live to a web server using plotly, which is always a crowd-pleaser (you can learn more about that in the blog post “Using Plotly from LabVIEW via Python”).  Other demos included making use of the Python scikit-learn library for machine learning, (you can see Enthought’s CEO Eric Jones run that demo here, during the 2016 NIWeek keynotes.)

For a mechanical engineer like me, attending NIWeek is a bit like giving a kid a holiday in a candy shop.  There was much to admire on the expo floor, with all kinds of mechatronic gizmos and gadgets.  I was most interested by the lightning-fast video and image processing possible with NI’s FPGA systems, like the part sorting system shown below.  Really makes me want to play around with nifpga.

Another thing really gaining traction is the implementation of machine learning for a number of applications. I attended one talk titled “Deep Learning With LabVIEW and Acceleration on FPGAs” that demonstrated image classification using a neural network and talked about strategies to reduce the code size to get it to fit on an FPGA.

Finally, of course, I was really excited by all of the activity in the Industrial Internet of Things (IIoT), which is an area of core focus for Enthought.  We have been in the big data analytics game for a long time, and writing software for hard science is in our company DNA. But this year especially, starting with the AIChE 2017 Spring Meeting and now at NIWeek 2017, it has been really energizing to meet with industry leaders and see some of the amazing things that are being implemented in the IIoT.  National Instruments has been a leader in the test and measurement sector for a long time, and they have been pioneers in IIoT.  Now it is easy to download and install an interface to Amazon S3 for LabVIEW, and just like that, your sensor is now a connected sensor … and your data is ready for analysis in Enthought’s Canopy Data platform.

After immersion in NIWeek, I guess you could say, I’ve been “LabVIEWed”:

Enthought Presents the Canopy Platform at the 2017 American Institute of Chemical Engineers (AIChE) Spring Meeting

by: Tim Diller, Product Manager and Scientific Software Developer, Enthought

Last week I attended the AIChE (American Institute of Chemical Engineers) Spring Meeting in San Antonio, Texas. It was a great time of year to visit this cultural gem deep in the heart of Texas (and just down the road from our Austin offices), with plenty of good food, sights and sounds to take in on top of the conference and its sessions.

The AIChE Spring Meeting focuses on applications of chemical engineering in industry, and Enthought was invited to present a poster and deliver a “vendor perspective” talk on the Canopy Platform for Process Monitoring and Optimization as part of the “Big Data Analytics” track. This was my first time at AIChE, so some of the names were new, but in a lot of ways it felt very similar to many other engineering conferences I have participated in over the years (for instance, ASME (American Society of Mechanical Engineers), SAE (Society of Automotive Engineers), etc.).

This event underscored that regardless of industry, engineers are bringing the same kinds of practical ingenuity to bear on similar kinds of problems, and with the cost of data acquisition and storage plummeting in the last decade, many engineers are now sitting on more data than they know how to effectively handle.

What exactly is “big data”? Does it really matter for solving hard engineering problems?

One theme that came up time and again in the “Big Data Analytics” sessions Enthought participated in was what exactly “big data” is. In many circles, a good working definition of what makes data “big” is that it exceeds the size of the physical RAM on the machine doing the computation, so that something other than simply loading the data into memory has to be done to make meaningful computations, and thus a working definition of some tens of GB delimits “big” data from “small”.

For others, and many at the conference indeed, a more mundane definition of “big” means that the data set doesn’t fit within the row or column limits of a Microsoft Excel Worksheet.

But the question of whether your data is “big” is really a moot one as far as we at Enthought are concerned; really, being “big” just adds complexity to an already hard problem, and the kind of complexity is an implementation detail dependent on the details of the problem at hand.

And that relates to the central message of my talk, which was that an analytics platform (in this case I was talking about our Canopy Platform) should abstract away the tedious complexities, and help an expert get to the heart of the hard problem at hand.

At AIChE, the “hard problems” at hand seemed invariably to involve one or both of two things: (1) increasing safety/reliability, and (2) increasing plant output.

To solve these problems, two general kinds of activity were on display: different pattern recognition algorithms and tools, and modeling, typically through some kind of regression-based approach. Both of these things are straightforward in the Canopy Platform.

The Canopy Platform is a collection of related technologies that work together in an integrated way to support the scientist/analyst/engineer.

What is the Canopy Platform?

If you’re using Python for science or engineering, you have probably used or heard of Canopy, Enthought’s Python-based data analytics application offering an integrated code editor and interactive command prompt, package manager, documentation browser, debugger, variable browser, data import tool, and lots of hidden features like support for many kinds of proxy systems that work behind the scenes to make a seamless work environment in enterprise settings.

However, this is just one part of the Canopy Platform. Over the years, Enthought has been building other components and related technologies that work together in an integrated way to support the engineer/analyst/scientist solving hard problems.

At the center of the this is the Enthought Python Distribution, with runtime interpreters for Python 2.7 and 3.x and over 450 pre-built Python packages for scientific computing, including tools for machine learning and the kind of regression modeling that was shown in some of the other presentations in the Big Data sessions. Other components of the Canopy Platform include interface modules for Excel (PyXLL) and for National Instruments’ LabView software (Python Integration Toolkit for LabVIEW), among others.

A key component of our Canopy Platform is our Deployment Server, which simplifies the tricky tasks of deploying proprietary applications and packages or creating customized, reproducible Python environments inside an organization, especially behind a firewall or an air-gapped network.

Finally, (and this is what we were really showing off at the AIChE Big Data Analytics session) there are the Data Catalog and the Cloud Compute layers within the Canopy Platform.

The Data Catalog provides an indexed interface to potentially heterogeneous data sources, making them available for search and query based on various kinds of metadata.

The Data Catalog provides an indexed interface to potentially heterogeneous data sources. These can range from a simple network directory with a collection of HDF5 files to a server hosting files with the Byzantine complexity of the IRIG 106 Ch. 10 Digital Recorder Standard used by US military test flight ranges. The nice thing about the Data Catalog is that it lets you query and select data based on computed metadata, for example “factory A, on Tuesdays when Ethylene output was below 10kg/hr”, or in a test flight data example “test flights involving a T-38 that exceeded 10,000 ft but stayed subsonic.”

With the Cloud Compute layer, an expert user can write code and test it locally on some subset of data from the Data Catalog. Then, when it is working to satisfaction, he or she can publish the code as a computational kernel to run on some other, larger subset of the data in the Data Catalog, using remote compute resources, which might be an HPC cluster or an Apache Spark server. That kernel is then available to other users in the organization, who do not have to understand the algorithm to run it on other data queries.

In the demo below, I showed hooking up the Data Catalog to some historical factory data stored on a remote machine.

Data Catalog View The Data Catalog allows selection of subsets of the data set for inspection and ad hoc analysis. Here, three channels are compared using a time window set on the time series data shown on the top plot.

Then using a locally tested and developed compute kernel, I did a principal component analysis on the frequencies of the channel data for a subset of the data in the Data Catalog. Then I published the kernel and ran it on the entire data set using the remote compute resource.

After the compute kernel has been published and run on the entire data set, then the result explorer tool enables further interactions.

Ultimately, the Canopy Platform is for building and distributing applications that solve hard problems.  Some of the products we have built on the platform are available today (for instance, Canopy Geoscience and Virtual Core), others are in prototype stage or have been developed for other companies with proprietary components and are not publicly available.

It was exciting to participate in the Big Data Analytics track this year, to see what others are doing in this area, and to be a part of many interesting and fruitful discussions. Thanks to Ivan Castillo and Chris Reed at Dow for arranging our participation.

New Year, New Enthought Products!

We’ve had a number of major product development efforts underway over the last year, and we’re pleased to share a lot of new announcements for 2017:

A New Chapter for the Enthought Python Distribution (EPD):
Python 3 and Intel MKL 2017

In 2004, Enthought released the first “Python: Enthought Edition,” a Python package distribution tailored for a scientific and analytic audience. In 2008 this became the Enthought Python Distribution (EPD), a self-contained installer with the "enpkg" command-line tool to update and manage packages. Since then, over a million users have benefited from Enthought’s tested, pre-compiled set of Python packages, allowing them to focus on their science by eliminating the hassle of setting up tools.

Enthought Python Distribution logo

Fast forward to 2017, and we now offer over 450 Python packages and a new era for the Enthought Python Distributionaccess to all of the packages in the new EPD is completely free to all users and includes packages and runtimes for both Python 2 and Python 3 with some exciting new additions. Our ever-growing list of packages includes, for example, the 2017 release of the MKL (Math Kernel Library), the fruit of an ongoing collaboration with Intel.

The New Enthought Deployment Server:
Secure, Onsite Access to EPD and Private Packages

enthought-deployment-server-centralized-management-illustration-v2

For those who are interested in having a private copy of the Enthought Python Distribution behind their firewall, as well as the ability to upload and manage internal private packages alongside it, we now offer the Enthought Deployment Server, an onsite version of the server we have been using for years to serve millions of Python packages to our users.

enthought-deployment-server-logoWith a local Enthought Deployment Server, your private copy will periodically synchronize with our master repository, on a schedule of your choosing, to keep you up to date with the latest releases. You can also set up private package repositories and control access to them using your existing LDAP or Active Directory service in a way that suits your organization.  We can even give you access to the packages (and their historical versions) inside of air-gapped networks! See our webinar introducing the Enthought Deployment Server.

Command Line Access to the New EPD and Flat Environments
via the Enthought Deployment Manager (EDM)

In 2013, we expanded the original EPD to introduce Enthought Canopy, coupling an integrated analysis environment with additional features such as a graphical package manager, documentation browser, and other user-friendly tools together with the Enthought Python Distribution to provide even more features to help “make science and analysis easy.”

With its MATLAB-like experience, Canopy has enabled countless engineers, scientists and analysts to perform sophisticated analysis, build models, and create cutting-edge data science algorithms. The all-in-one analysis platform for Python has also been widely adopted in organizations who want to provide a single, unified platform that can be used by everyone from data analysts to software engineers.

But we heard from a number of you that you also still wanted the capability to have flat, standalone environments not coupled to any editor or graphical tool. And we listened!  

enthought-deployment-manager-cli-screenshot2So last year, we finished building out our next-generation command-line tool that makes producing flat, standalone Python environments super easy.  We call it the Enthought Deployment Manager (or EDM for short), because it’s a tool to quickly deploy one or multiple Python environments with the full control over package versions and runtime environments.

EDM is also a valuable tool for use cases such as command line deployment on local machines or servers, web application deployment on AWS using Ansible and Amazon CloudFormation, rapid environment setup on continuous integration systems such as Travis-CI, Appveyor, or Jenkins/TeamCity, and more.

Finally, a new state-of-the-art package dependency solver included in the tool guarantees the consistency of your environment, and if your workflow requires switching between different environments, its sandboxed architecture makes it a snap to switch contexts.  All of this has also been designed with a focus on providing robust backward compatibility to our customers over time.  Find out more about EDM here.

Enthought Canopy 2.0:
Python 3 packages and New EDM Back End Infrastructure

Enthought Canopy LogoThe new Enthought Python Distribution (EPD) and Enthought Deployment Manager (EDM) will also provide additional benefits for Canopy.  Canopy 2.0 is just around the corner, which will be the first version to include Python 3 packages from EPD.

In addition, we have re-worked Canopy’s graphical package manager to use EDM as its back end, to take advantage of both the consistency and stability of the environments EDM provides, as well as its new package dependency solver.  By itself, this will provide a big boost in stability for users (ever found yourself wrapped up in a tangle of inconsistent package versions?).  Alongside the conversion of Canopy’s back end infrastructure to EDM, we have also included a substantial number of stability improvements and bug fixes.

Canopy’s Graphical Debugger adds external IPython kernel debugging support

On the integrated analysis environment side of Canopy, the graphical debugger and variable browser, first introduced in 2015, has gotten some nifty new features, including the ability to connect to and debug an external IPython kernel, in addition to a number of stability improvements.  (Weren’t aware you could connect to an external process?  Look for the context menu in the IPython console, use it to connect to the IPython kernel running, say, a Jupyter notebook, and debug away!)

Canopy Data Import Tool adds CSV exports and input file templates

Enthought Canopy Data Import ToolAlso, we’ve continued to add new features to the Canopy Data Import Tool since its initial release in May of 2016. The Data Import Tool allows users to quickly and easily import CSVs and other structured text files into Pandas DataFrames through a graphical interface, manipulate the data, and create reusable Python scripts to speed future data wrangling.

The latest version of the tool (v. 1.0.9, shipping with Canopy 2.0) has some nice new features like CSV exporting, input file templates, and more. See Enthought’s blog for some great examples of how the Data Import Tool speeds data loading, wrangling and analysis.

What to Look Forward to in 2017

So where are we headed in 2017?  We have put a lot of effort into building a strong foundation with our core suite of products, and now we’re focused on continuing to deliver new value (our enterprise users in particular have a number of new features to look forward to).  First up, for example, you can look for expanded capabilities around Python environments, making it easy to manage multiple environments, or even standardize and distribute them in your organization.  With the tremendous advancements in our core products that took place in 2016, there are a lot of follow-on features we can deliver. Stay tuned for updates!

Have a specific feature you’d like to see in one of Enthought’s products? E-mail our product team at canopy.support@enthought.com and tell us about it!