Enthought Announces Canopy 2.1: A Major Milestone Release for the Python Analysis Environment and Package Distribution

Python 3 and multi-environment support, new state of the art package dependency solver, and over 450 packages now available free for all users

Enthought Canopy logoEnthought is pleased to announce the release of Canopy 2.1, a significant feature release that includes Python 3 and multi-environment support, a new state of the art package dependency solver, and access to over 450 pre-built and tested scientific and analytic Python packages completely free for all users. We highly recommend that all current Canopy users upgrade to this new release.

Ready to dive in? Download Canopy 2.1 here.


For those currently familiar with Canopy, in this blog we’ll review the major new features in this exciting milestone release, and for those of you looking for a tool to improve your workflow with Python, or perhaps new to Python from a language like MATLAB or R, we’ll take you through the key reasons that scientists, engineers, data scientists, and analysts use Canopy to enable their work in Python.

First, let’s talk about the latest and greatest in Canopy 2.1!

  1. Support for Python 3 user environments: Canopy can now be installed with a Python 3.5 user environment. Users can benefit from all the Canopy features already available for Python 2.7 (syntax checking, debugging, etc.) in the new Python 3 environments. Python 3.6 is also available (and will be the standard Python 3 in Canopy 2.2).
  2. All 450+ Python 2 and Python 3 packages are now completely free for all users: Technical support, full installers with all packages for offline or shared installation, and the premium analysis environment features (graphical debugger and variable browser and Data Import Tool) remain subscriber-exclusive benefits. See subscription options here to take advantage of those benefits.
  3. Built in, state of the art dependency solver (EDM or Enthought Deployment Manager): the new EDM back end (which replaces the previous enpkg) provides additional features for robust package compatibility. EDM integrates a specialized dependency solver which automatically ensures you have a consistent package set after installation, removal, or upgrade of any packages.
  4. Environment bundles, which allow users to easily share environments directly with co-workers, or across various deployment solutions (such as the Enthought Deployment Server, continuous integration processes like Travis-CI and Appveyor, cloud solutions like AWS or Google Compute Engine, or deployment tools like Ansible or Docker). EDM environment bundles not only allow the user to replicate the set of installed dependencies but also support persistence for constraint modifiers, the list of manually installed packages, and the runtime version and implementation.
  5. Multi-environment support: with the addition of Python 3 environments and the new EDM back end, Canopy now also supports managing multiple Python environments from the user interface. You can easily switch between Python 2.7 and 3.5, or between multiple 2.7 or 3.5 environments. This is ideal especially for those migrating legacy code to Python 3, as it allows you to test as you transfer and also provides access to historical snapshots or libraries that aren’t yet available in Python 3.


Why Canopy is the Python platform of choice for scientists and engineers

Since 2001, Enthought has focused on making the scientific Python stack accessible and easy to use for both enterprises and individuals. For example, Enthought released the first scientific Python distribution in 2004, added robust and corporate support for NumPy on 64-bit Windows in 2011, and released Canopy 1.0 in 2013.

Since then, with its MATLAB-like experience, Canopy has enabled countless engineers, scientists and analysts to perform sophisticated analysis, build models, and create cutting-edge data science algorithms. Canopy’s all-in-one package distribution and analysis environment for Python has also been widely adopted in organizations who want to provide a single, unified platform that can be used by everyone from data analysts to software engineers.

Here are five of the top reasons that people choose Canopy as their tool for enabling data analysis, data modelling, and data visualization with Python:

Continue reading

SciPy 2017 Conference to Showcase Leading Edge Developments in Scientific Computing with Python

Renowned scientists, engineers and researchers from around the world to gather July 10-16, 2017 in Austin, TX to share and collaborate to advance scientific computing tool


AUSTIN, TX – June 6, 2017 –
Enthought, as Institutional Sponsor, today announced the SciPy 2017 Conference will be held July 10-16, 2017 in Austin, Texas. At this 16th annual installment of the conference, scientists, engineers, data scientists and researchers will participate in tutorials, talks and developer sprints designed to foster the continued rapid growth of the scientific Python ecosystem. This year’s attendees hail from over 25 countries and represent academia, government, national research laboratories, and industries such as aerospace, biotechnology, finance, oil and gas and more.

“Since 2001, the SciPy Conference has been a highly anticipated annual event for the scientific and analytic computing community,” states Dr. Eric Jones, CEO at Enthought and SciPy Conference co-founder. “Over the last 16 years we’ve witnessed Python emerge as the de facto open source programming language for science, engineering and analytics with widespread adoption in research and industry. The powerful tools and libraries the SciPy community has developed are used by millions of people to advance scientific inquest and innovation every day.”

Special topical themes for this year’s conference are “Artificial Intelligence and Machine Learning Applications” and the “Scientific Python (SciPy) Tool Stack.” Keynote speakers include:

  • Kathryn Huff, Assistant Professor in the Department of Nuclear, Plasma, and Radiological Engineering at the University of Illinois at Urbana-Champaign  
  • Sean Gulick, Research Professor at the Institute for Geophysics at the University of Texas at Austin
  • Gaël Varoquaux, faculty researcher in the Neurospin brain research institute at INRIA (French Institute for Research in Computer Science and Automation)

In addition to the special conference themes, there will also be over 100 talk and poster paper speakers/presenters covering eight mini-symposia tracks including: Astronomy; Biology, Biophysics, and Biostatistics; Computational Science and Numerical Techniques; Data Science; Earth, Ocean, and Geo Sciences; Materials Science and Engineering; Neuroscience; and Open Data and Reproducibility.

New for 2017 is a sold-out “Teen Track,” a two-day curriculum designed to inspire the scientists of tomorrow.  From July 10-11, high school students will learn more about the Python language and how developers solve real world scientific problems using Python and its scientific libraries.

Conference and tutorial registration is open at https://scipy2017.scipy.org.

About the SciPy Conference

SciPy 2017, the sixteenth annual Scientific Computing with Python conference, will be held July 10-16, 2017 in Austin, Texas. SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science and engineering. The annual SciPy Conference allows participants from all types of organizations to showcase their latest projects, learn from skilled users and developers and collaborate on code development. For more information or to register, visit https://scipy2017.scipy.org.

About Enthought

Enthought is a global leader in scientific and analytic software, consulting and training solutions serving a customer base comprised of some of the most respected names in the oil and gas, manufacturing, financial services, aerospace, military, government, biotechnology, consumer products and technology industries. The company was founded in 2001 and is headquartered in Austin, Texas, with additional offices in Cambridge, United Kingdom and Pune, India. For more information visit www.enthought.com and connect with Enthought on Twitter, LinkedIn, Google+, Facebook and YouTube.

 

 

Enthought at National Instruments’ NIWeek 2017: An Inside Look

This week I had the distinct privilege of representing Enthought at National Instruments‘ 23rd annual user conference, NIWeek 2017. National Instruments is a leader in test, measurement, and control solutions, and we share many common customers among our global scientific and engineering user base.

NIWeek kicked off on Monday with Alliance Day, where my colleague Andrew Collette and I went on stage to receive the LabVIEW Tools Network 2017 Product of the Year Award for Enthought’s Python Integration Toolkit, which provides a bridge between Python and LabVIEW, allowing you to create VI’s (virtual instruments) that make Python function and object method calls. Since its release last year, the Python Integration Toolkit has opened up access to a broad range of new capabilities for LabVIEW users,  by combining the best of Python with the best of LabVIEW. It was also inspiring to hear about the advances being made by other National Instruments partners. Congratulations to the award winners in other categories (Wineman Technology, Bloomy, and Moore Good Ideas)!

On Wednesday, Andrew gave a presentation titled “Building and Deploying Python-Powered LabVIEW Applications” to a standing-room only crowd.  He gave some background on the relative strengths of Python and LabVIEW (some of which is covered in our March 2017 webinar “Using Python and LabVIEW to Rapidly Solve Engineering Problems“) and then showcased some of the capabilities provided by the toolkit, such as plotting data acquisition results live to a web server using plotly, which is always a crowd-pleaser (you can learn more about that in the blog post “Using Plotly from LabVIEW via Python”).  Other demos included making use of the Python scikit-learn library for machine learning, (you can see Enthought’s CEO Eric Jones run that demo here, during the 2016 NIWeek keynotes.)

For a mechanical engineer like me, attending NIWeek is a bit like giving a kid a holiday in a candy shop.  There was much to admire on the expo floor, with all kinds of mechatronic gizmos and gadgets.  I was most interested by the lightning-fast video and image processing possible with NI’s FPGA systems, like the part sorting system shown below.  Really makes me want to play around with nifpga.

Another thing really gaining traction is the implementation of machine learning for a number of applications. I attended one talk titled “Deep Learning With LabVIEW and Acceleration on FPGAs” that demonstrated image classification using a neural network and talked about strategies to reduce the code size to get it to fit on an FPGA.

Finally, of course, I was really excited by all of the activity in the Industrial Internet of Things (IIoT), which is an area of core focus for Enthought.  We have been in the big data analytics game for a long time, and writing software for hard science is in our company DNA. But this year especially, starting with the AIChE 2017 Spring Meeting and now at NIWeek 2017, it has been really energizing to meet with industry leaders and see some of the amazing things that are being implemented in the IIoT.  National Instruments has been a leader in the test and measurement sector for a long time, and they have been pioneers in IIoT.  Now it is easy to download and install an interface to Amazon S3 for LabVIEW, and just like that, your sensor is now a connected sensor … and your data is ready for analysis in Enthought’s Canopy Data platform.

After immersion in NIWeek, I guess you could say, I’ve been “LabVIEWed”:

Enthought Receives 2017 Product of the Year Award From National Instruments LabVIEW Tools Network

Python Integration Toolkit for LabVIEW recognized for extending LabVIEW connectivity and bringing the power of Python to applications in Test, Measurement and the Industrial Internet of Things (IIoT)

AUSTIN, TX – May 24, 2017 Enthought, a global leader in scientific and analytic computing solutions, was honored this week by National Instruments with the LabVIEW Tools Network Platform Connectivity 2017 Product of the Year Award for its Python Integration Toolkit for LabVIEW.

Python Integration Toolkit for LabVIEWFirst released at NIWeek 2016, the Python Integration Toolkit enables fast, two-way communication between LabVIEW and Python. With seamless access to the Python ecosystem of tools, LabVIEW users are able to do more with their data than ever before. For example, using the Toolkit, a user can acquire data from test and measurement tools with LabVIEW, perform signal processing or apply machine learning algorithms in Python, display it in LabVIEW, then share results using a Python-enabled web dashboard.

Enthought-Python-Integration-Toolkit-for-LabVIEW-Machine-Learning

Click to see the webinar “Using Python and LabVIEW to Rapidly Solve Engineering Problems” to learn more about adding capabilities such as machine learning by extending LabVIEW applications with Python.

“Python is ideally suited for scientists and engineers due to its simple, yet powerful syntax and the availability of an extensive array of open source tools contributed by a user community from industry and R&D,” said Dr. Tim Diller, Director, IIoT Solutions Group at Enthought. “The Python Integration Toolkit for LabVIEW unites the best elements of two major tools in the science and engineering world and we are honored to receive this award.”

Continue reading

Webinar: Python for Data Science: A Tour of Enthought’s Professional Training Course

DataView Python for Data Science Webinar
What: A guided walkthrough and Q&A about Enthought’s technical training course “Python for Data Science and Machine Learning” with VP of Training Solutions, Dr. Michael Connell

Who Should Watch: individuals, team leaders, and learning & development coordinators who are looking to better understand the options to increase professional capabilities in Python for data science and machine learning applications

VIEW


Enthought’s Python for Data Science training course is designed to accelerate the development of skill and confidence in using Python’s core data science tools — including the standard Python language, the fast array programming package NumPy, and the Pandas data analysis package, as well as tools for database access (DBAPI2, SQLAlchemy), machine learning (scikit-learn), and visual exploration (Matplotlib, Seaborn).

In this webinar, we give you the key information and insight you need to evaluate whether Enthought’s Python for Data Science course is the right solution to advance your professional data science skills in Python, including:

  • Who will benefit most from the course
  • A guided tour through the course topics
  • What skills you’ll take away from the course, how the instructional design supports that
  • What the experience is like, and why it is different from other training alternatives (with a sneak peek at actual course materials)
  • What previous course attendees say about the course

VIEW


michael_connell-enthought-vp-trainingPresenter: Dr. Michael Connell, VP, Enthought Training Solutions

Ed.D, Education, Harvard University
M.S., Electrical Engineering and Computer Science, MIT


Considering Moving to Python for Data Science?

Then Enthought’s Python for Data Science training course is definitely for you! This class has been particularly appealing to people who have been using other tools like R or SAS (or even Excel) for their data science work and want to start applying their analytic skills using the Python toolset.  And it’s no wonder — Python has been identified as the most popular coding language for five years in a row for good reason.

One reason for Python’s broad popularity across a range of disciplines is its efficiency and ease-of-use. Many people consider Python more fun to work in than other languages (and we agree!). Another reason for its popularity among data analysts and data scientists in particular is Python’s extensive (and growing) open source library of powerful tools for preparing, visualizing, analyzing, and modeling data.

Python is also an extraordinarily comprehensive toolset – it supports everything from interactive analysis to automation to software engineering to web app development within a single language and plays very well with other languages like C/C++ or FORTRAN so you can continue leveraging your existing code libraries written in those other languages.

Many organizations are moving to Python so they can consolidate all of their technical work streams under a single comprehensive toolset. In the first part of this class we’ll give you the fundamentals you need to switch from another language to Python and then we cover the core tools that will enable you to do in Python what you were doing with other tools, only faster!

Additional Resources

Upcoming Open Python for Data Science Sessions:
Austin, TX, June 12-16, 2017
San Jose, CA, July 17-21, 2017Learn MoreHave a group interested in training? We specialize in group and corporate training. Contact us or call 512.536.1057.
Download Enthought’s Machine Learning with Python’s Scikit-Learn Cheat Sheets
Enthought's Machine Learning with Python Cheat Sheets
Download Enthought’s Pandas Cheat SheetsEnthought's Pandas Cheat Sheets

Handling Missing Values in Pandas DataFrames: the Hard Way, and the Easy Way

This is the second blog in a series. See the first blog here: Loading Data Into a Pandas DataFrame: The Hard Way, and The Easy Way

No dataset is perfect and most datasets that we have to deal with on a day-to-day basis have values missing, often represented by “NA” or “NaN”. One of the reasons why the Pandas library is as popular as it is in the data science community is because of its capabilities in handling data that contains NaN values.

But spending time looking up the relevant Pandas commands might be cumbersome when you are exploring raw data or prototyping your data analysis pipeline. This is one of the places where the Canopy Data Import Tool helps make data munging faster and easier, by simplifying the task of identifying missing values in your raw data and removing/replacing them.

Why are missing values a problem you ask? We can answer that question in the context of machine learning. scikit-learn and TensorFlow are popular and widely used libraries for machine learning in Python. Both of them caution the user about missing values in their datasets. Various machine learning algorithms expect all the input values to be numerical and to hold meaning. Both of the libraries suggest removing rows and/or columns that contain missing values.

If removing the missing values is not an option, given the size of your dataset, then they suggest replacing the missing values. The scikit-learn library provides an Imputer class, which can be used to replace missing values. See the sci-kit learn documentation for an example of how the Imputer class is used. Similarly, the decode_csv function in the TensorFlow library can be passed a record_defaults argument, which will replace missing values in the dataset. See the TensorFlow documentation for specifics.

The Data Import Tool provides capabilities to handle missing values in your dataset because we strongly believe that discovering and handling missing values in your dataset is a part of the data import and cleaning phase and not the analysis phase of the data science process.

Digging into the specifics, here we’ll compare how you can go about handling missing values with three typical scenarios, first using the Pandas library, then contrasting with the Data Import Tool:

  1. Identifying missing values in data
  2. Replacing missing values in data, and
  3. Removing missing values from data.

Note : Pandas’ internal representation of your data is called a DataFrame. A DataFrame is simply a tabular data structure, similar to a spreadsheet or a SQL table.

Continue reading

Webinar- Get More From Your Core: Applying Artificial Intelligence to CT, Photo, and Well Log Analysis with Virtual Core

What: Presentation, demo, and Q&A with Brendon Hall, Geoscience Product Manager, Enthought

Who should watch this webinar:

  • Oil and gas industry professionals who are looking for ways to extract more value from expensive science wells
  • Those interested in learning how artificial intelligence and machine learning techniques can be applied to core analysis

VIEW 


Geoscientists and petroleum engineers rely on accurate core measurements to characterize reservoirs, develop drilling plans and de-risk play assessments. Whole-core CT scans are now routinely performed on extracted well cores, however the data produced from these scans are difficult to visualize and integrate with other measurements.

Virtual Core automates aspects of core description for geologists, drastically reducing the time and effort required for core description, and its unified visualization interface displays cleansed whole-core CT data alongside core photographs and well logs. It provides tools for geoscientists to analyze core data and extract features from sub-millimeter scale to the entire core.

In this webinar and demo, we’ll start by introducing the Clear Core processing pipeline, which automatically removes unwanted artifacts (such as tubing) from the CT image. We’ll then show how the machine learning capabilities in Virtual Core can be used to describe the core, extracting features such as bedding planes and dip angle. Finally, we’ll show how the data can be viewed and analyzed alongside other core data, such as photographs, wellbore images, well logs, plug measurements, and more.

What You’ll Learn:

  • How core CT data, photographs, well logs, borehole images, and more can be integrated into a digital core workshop
  • How digital core data can shorten core description timelines and deliver business results faster
  • How new features can be extracted from digital core data using artificial intelligence
  • Novel workflows that leverage these features, such as identifying parasequences and strategies for determining net pay

VIEW 

Presenter:

Brendon Hall, Geoscience Applications Engineer, Enthought Brendon Hall, Enthought
Geoscience Product Manager and Application Engineer

Continue reading

Enthought Presents the Canopy Platform at the 2017 American Institute of Chemical Engineers (AIChE) Spring Meeting

by: Tim Diller, Product Manager and Scientific Software Developer, Enthought

Last week I attended the AIChE (American Institute of Chemical Engineers) Spring Meeting in San Antonio, Texas. It was a great time of year to visit this cultural gem deep in the heart of Texas (and just down the road from our Austin offices), with plenty of good food, sights and sounds to take in on top of the conference and its sessions.

The AIChE Spring Meeting focuses on applications of chemical engineering in industry, and Enthought was invited to present a poster and deliver a “vendor perspective” talk on the Canopy Platform for Process Monitoring and Optimization as part of the “Big Data Analytics” track. This was my first time at AIChE, so some of the names were new, but in a lot of ways it felt very similar to many other engineering conferences I have participated in over the years (for instance, ASME (American Society of Mechanical Engineers), SAE (Society of Automotive Engineers), etc.).

This event underscored that regardless of industry, engineers are bringing the same kinds of practical ingenuity to bear on similar kinds of problems, and with the cost of data acquisition and storage plummeting in the last decade, many engineers are now sitting on more data than they know how to effectively handle.

What exactly is “big data”? Does it really matter for solving hard engineering problems?

One theme that came up time and again in the “Big Data Analytics” sessions Enthought participated in was what exactly “big data” is. In many circles, a good working definition of what makes data “big” is that it exceeds the size of the physical RAM on the machine doing the computation, so that something other than simply loading the data into memory has to be done to make meaningful computations, and thus a working definition of some tens of GB delimits “big” data from “small”.

For others, and many at the conference indeed, a more mundane definition of “big” means that the data set doesn’t fit within the row or column limits of a Microsoft Excel Worksheet.

But the question of whether your data is “big” is really a moot one as far as we at Enthought are concerned; really, being “big” just adds complexity to an already hard problem, and the kind of complexity is an implementation detail dependent on the details of the problem at hand.

And that relates to the central message of my talk, which was that an analytics platform (in this case I was talking about our Canopy Platform) should abstract away the tedious complexities, and help an expert get to the heart of the hard problem at hand.

At AIChE, the “hard problems” at hand seemed invariably to involve one or both of two things: (1) increasing safety/reliability, and (2) increasing plant output.

To solve these problems, two general kinds of activity were on display: different pattern recognition algorithms and tools, and modeling, typically through some kind of regression-based approach. Both of these things are straightforward in the Canopy Platform.

The Canopy Platform is a collection of related technologies that work together in an integrated way to support the scientist/analyst/engineer.

What is the Canopy Platform?

If you’re using Python for science or engineering, you have probably used or heard of Canopy, Enthought’s Python-based data analytics application offering an integrated code editor and interactive command prompt, package manager, documentation browser, debugger, variable browser, data import tool, and lots of hidden features like support for many kinds of proxy systems that work behind the scenes to make a seamless work environment in enterprise settings.

However, this is just one part of the Canopy Platform. Over the years, Enthought has been building other components and related technologies that work together in an integrated way to support the engineer/analyst/scientist solving hard problems.

At the center of the this is the Enthought Python Distribution, with runtime interpreters for Python 2.7 and 3.x and over 450 pre-built Python packages for scientific computing, including tools for machine learning and the kind of regression modeling that was shown in some of the other presentations in the Big Data sessions. Other components of the Canopy Platform include interface modules for Excel (PyXLL) and for National Instruments’ LabView software (Python Integration Toolkit for LabVIEW), among others.

A key component of our Canopy Platform is our Deployment Server, which simplifies the tricky tasks of deploying proprietary applications and packages or creating customized, reproducible Python environments inside an organization, especially behind a firewall or an air-gapped network.

Finally, (and this is what we were really showing off at the AIChE Big Data Analytics session) there are the Data Catalog and the Cloud Compute layers within the Canopy Platform.

The Data Catalog provides an indexed interface to potentially heterogeneous data sources, making them available for search and query based on various kinds of metadata.

The Data Catalog provides an indexed interface to potentially heterogeneous data sources. These can range from a simple network directory with a collection of HDF5 files to a server hosting files with the Byzantine complexity of the IRIG 106 Ch. 10 Digital Recorder Standard used by US military test flight ranges. The nice thing about the Data Catalog is that it lets you query and select data based on computed metadata, for example “factory A, on Tuesdays when Ethylene output was below 10kg/hr”, or in a test flight data example “test flights involving a T-38 that exceeded 10,000 ft but stayed subsonic.”

With the Cloud Compute layer, an expert user can write code and test it locally on some subset of data from the Data Catalog. Then, when it is working to satisfaction, he or she can publish the code as a computational kernel to run on some other, larger subset of the data in the Data Catalog, using remote compute resources, which might be an HPC cluster or an Apache Spark server. That kernel is then available to other users in the organization, who do not have to understand the algorithm to run it on other data queries.

In the demo below, I showed hooking up the Data Catalog to some historical factory data stored on a remote machine.

Data Catalog View The Data Catalog allows selection of subsets of the data set for inspection and ad hoc analysis. Here, three channels are compared using a time window set on the time series data shown on the top plot.

Then using a locally tested and developed compute kernel, I did a principal component analysis on the frequencies of the channel data for a subset of the data in the Data Catalog. Then I published the kernel and ran it on the entire data set using the remote compute resource.

After the compute kernel has been published and run on the entire data set, then the result explorer tool enables further interactions.

Ultimately, the Canopy Platform is for building and distributing applications that solve hard problems.  Some of the products we have built on the platform are available today (for instance, Canopy Geoscience and Virtual Core), others are in prototype stage or have been developed for other companies with proprietary components and are not publicly available.

It was exciting to participate in the Big Data Analytics track this year, to see what others are doing in this area, and to be a part of many interesting and fruitful discussions. Thanks to Ivan Castillo and Chris Reed at Dow for arranging our participation.