Tag Archives: Python

Enthought Announces Canopy 2.1: A Major Milestone Release for the Python Analysis Environment and Package Distribution

Python 3 and multi-environment support, new state of the art package dependency solver, and over 450 packages now available free for all users

Enthought Canopy logoEnthought is pleased to announce the release of Canopy 2.1, a significant feature release that includes Python 3 and multi-environment support, a new state of the art package dependency solver, and access to over 450 pre-built and tested scientific and analytic Python packages completely free for all users. We highly recommend that all current Canopy users upgrade to this new release.

Ready to dive in? Download Canopy 2.1 here.


For those currently familiar with Canopy, in this blog we’ll review the major new features in this exciting milestone release, and for those of you looking for a tool to improve your workflow with Python, or perhaps new to Python from a language like MATLAB or R, we’ll take you through the key reasons that scientists, engineers, data scientists, and analysts use Canopy to enable their work in Python.

First, let’s talk about the latest and greatest in Canopy 2.1!

  1. Support for Python 3 user environments: Canopy can now be installed with a Python 3.5 user environment. Users can benefit from all the Canopy features already available for Python 2.7 (syntax checking, debugging, etc.) in the new Python 3 environments. Python 3.6 is also available (and will be the standard Python 3 in Canopy 2.2).
  2. All 450+ Python 2 and Python 3 packages are now completely free for all users: Technical support, full installers with all packages for offline or shared installation, and the premium analysis environment features (graphical debugger and variable browser and Data Import Tool) remain subscriber-exclusive benefits. See subscription options here to take advantage of those benefits.
  3. Built in, state of the art dependency solver (EDM or Enthought Deployment Manager): the new EDM back end (which replaces the previous enpkg) provides additional features for robust package compatibility. EDM integrates a specialized dependency solver which automatically ensures you have a consistent package set after installation, removal, or upgrade of any packages.
  4. Environment bundles, which allow users to easily share environments directly with co-workers, or across various deployment solutions (such as the Enthought Deployment Server, continuous integration processes like Travis-CI and Appveyor, cloud solutions like AWS or Google Compute Engine, or deployment tools like Ansible or Docker). EDM environment bundles not only allow the user to replicate the set of installed dependencies but also support persistence for constraint modifiers, the list of manually installed packages, and the runtime version and implementation.
  5. Multi-environment support: with the addition of Python 3 environments and the new EDM back end, Canopy now also supports managing multiple Python environments from the user interface. You can easily switch between Python 2.7 and 3.5, or between multiple 2.7 or 3.5 environments. This is ideal especially for those migrating legacy code to Python 3, as it allows you to test as you transfer and also provides access to historical snapshots or libraries that aren’t yet available in Python 3.


Why Canopy is the Python platform of choice for scientists and engineers

Since 2001, Enthought has focused on making the scientific Python stack accessible and easy to use for both enterprises and individuals. For example, Enthought released the first scientific Python distribution in 2004, added robust and corporate support for NumPy on 64-bit Windows in 2011, and released Canopy 1.0 in 2013.

Since then, with its MATLAB-like experience, Canopy has enabled countless engineers, scientists and analysts to perform sophisticated analysis, build models, and create cutting-edge data science algorithms. Canopy’s all-in-one package distribution and analysis environment for Python has also been widely adopted in organizations who want to provide a single, unified platform that can be used by everyone from data analysts to software engineers.

Here are five of the top reasons that people choose Canopy as their tool for enabling data analysis, data modelling, and data visualization with Python:

Continue reading

SciPy 2017 Conference to Showcase Leading Edge Developments in Scientific Computing with Python

Renowned scientists, engineers and researchers from around the world to gather July 10-16, 2017 in Austin, TX to share and collaborate to advance scientific computing tool


AUSTIN, TX – June 6, 2017 –
Enthought, as Institutional Sponsor, today announced the SciPy 2017 Conference will be held July 10-16, 2017 in Austin, Texas. At this 16th annual installment of the conference, scientists, engineers, data scientists and researchers will participate in tutorials, talks and developer sprints designed to foster the continued rapid growth of the scientific Python ecosystem. This year’s attendees hail from over 25 countries and represent academia, government, national research laboratories, and industries such as aerospace, biotechnology, finance, oil and gas and more.

“Since 2001, the SciPy Conference has been a highly anticipated annual event for the scientific and analytic computing community,” states Dr. Eric Jones, CEO at Enthought and SciPy Conference co-founder. “Over the last 16 years we’ve witnessed Python emerge as the de facto open source programming language for science, engineering and analytics with widespread adoption in research and industry. The powerful tools and libraries the SciPy community has developed are used by millions of people to advance scientific inquest and innovation every day.”

Special topical themes for this year’s conference are “Artificial Intelligence and Machine Learning Applications” and the “Scientific Python (SciPy) Tool Stack.” Keynote speakers include:

  • Kathryn Huff, Assistant Professor in the Department of Nuclear, Plasma, and Radiological Engineering at the University of Illinois at Urbana-Champaign  
  • Sean Gulick, Research Professor at the Institute for Geophysics at the University of Texas at Austin
  • Gaël Varoquaux, faculty researcher in the Neurospin brain research institute at INRIA (French Institute for Research in Computer Science and Automation)

In addition to the special conference themes, there will also be over 100 talk and poster paper speakers/presenters covering eight mini-symposia tracks including: Astronomy; Biology, Biophysics, and Biostatistics; Computational Science and Numerical Techniques; Data Science; Earth, Ocean, and Geo Sciences; Materials Science and Engineering; Neuroscience; and Open Data and Reproducibility.

New for 2017 is a sold-out “Teen Track,” a two-day curriculum designed to inspire the scientists of tomorrow.  From July 10-11, high school students will learn more about the Python language and how developers solve real world scientific problems using Python and its scientific libraries.

Conference and tutorial registration is open at https://scipy2017.scipy.org.

About the SciPy Conference

SciPy 2017, the sixteenth annual Scientific Computing with Python conference, will be held July 10-16, 2017 in Austin, Texas. SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science and engineering. The annual SciPy Conference allows participants from all types of organizations to showcase their latest projects, learn from skilled users and developers and collaborate on code development. For more information or to register, visit https://scipy2017.scipy.org.

About Enthought

Enthought is a global leader in scientific and analytic software, consulting and training solutions serving a customer base comprised of some of the most respected names in the oil and gas, manufacturing, financial services, aerospace, military, government, biotechnology, consumer products and technology industries. The company was founded in 2001 and is headquartered in Austin, Texas, with additional offices in Cambridge, United Kingdom and Pune, India. For more information visit www.enthought.com and connect with Enthought on Twitter, LinkedIn, Google+, Facebook and YouTube.

 

 

Handling Missing Values in Pandas DataFrames: the Hard Way, and the Easy Way

This is the second blog in a series. See the first blog here: Loading Data Into a Pandas DataFrame: The Hard Way, and The Easy Way

No dataset is perfect and most datasets that we have to deal with on a day-to-day basis have values missing, often represented by “NA” or “NaN”. One of the reasons why the Pandas library is as popular as it is in the data science community is because of its capabilities in handling data that contains NaN values.

But spending time looking up the relevant Pandas commands might be cumbersome when you are exploring raw data or prototyping your data analysis pipeline. This is one of the places where the Canopy Data Import Tool helps make data munging faster and easier, by simplifying the task of identifying missing values in your raw data and removing/replacing them.

Why are missing values a problem you ask? We can answer that question in the context of machine learning. scikit-learn and TensorFlow are popular and widely used libraries for machine learning in Python. Both of them caution the user about missing values in their datasets. Various machine learning algorithms expect all the input values to be numerical and to hold meaning. Both of the libraries suggest removing rows and/or columns that contain missing values.

If removing the missing values is not an option, given the size of your dataset, then they suggest replacing the missing values. The scikit-learn library provides an Imputer class, which can be used to replace missing values. See the sci-kit learn documentation for an example of how the Imputer class is used. Similarly, the decode_csv function in the TensorFlow library can be passed a record_defaults argument, which will replace missing values in the dataset. See the TensorFlow documentation for specifics.

The Data Import Tool provides capabilities to handle missing values in your dataset because we strongly believe that discovering and handling missing values in your dataset is a part of the data import and cleaning phase and not the analysis phase of the data science process.

Digging into the specifics, here we’ll compare how you can go about handling missing values with three typical scenarios, first using the Pandas library, then contrasting with the Data Import Tool:

  1. Identifying missing values in data
  2. Replacing missing values in data, and
  3. Removing missing values from data.

Note : Pandas’ internal representation of your data is called a DataFrame. A DataFrame is simply a tabular data structure, similar to a spreadsheet or a SQL table.

Continue reading

Traits and TraitsUI: Reactive User Interfaces for Rapid Application Development in Python

The Enthought Tool Suite team is pleased to announce the release of Traits 4.6. Together with the release of TraitsUI 5.1 last year, these core packages of Enthought’s open-source rapid application development tools are now compatible with Python 3 as well as Python 2.7.  Long-time fans of Enthought’s open-source offerings will be happy to hear about the recent updates and modernization we’ve been working on, including the recent release of Mayavi 4.5 with Python 3 support, while newcomers to Python will be pleased that there is an easy way to get started with GUI programming which grows to allow you to build applications with sophisticated, interactive 2D and 3D visualizations.

A Brief Introduction to Traits and TraitsUI

Traits is a mature reactive programming library for Python that allows application code to respond to changes on Python objects, greatly simplifying the logic of an application.  TraitsUI is a tool for building desktop applications on top of the Qt or WxWidgets cross-platform GUI toolkits. Traits, together with TraitsUI, provides a programming model for Python that is similar in concept to modern and popular Javascript frameworks like React, Vue and Angular but targeting desktop applications rather than the browser.

Traits is also the core of Enthought’s open source 2D and 3D visualization libraries Chaco and Mayavi, drives the internal application logic of Enthought products like Canopy, Canopy Geoscience and Virtual Core, and Enthought’s consultants appreciate its the way it facilitates the rapid development of desktop applications for our consulting clients. It is also used by several open-source scientific software projects such as the HyperSpy multidimensional data analysis library and the pi3Diamond application for controlling diamond nitrogen-vacancy quantum physics experiments, and in commercial projects such as the PyRX Virtual Screening software for computational drug discovery.

 The open-source pi3Diamond application built with Traits, TraitsUI and Chaco by Swabian Instruments.

The open-source pi3Diamond application built with Traits, TraitsUI and Chaco by Swabian Instruments.

Traits is part of the Enthought Tool Suite of open source application development packages and is available to install through Enthought Canopy’s Package Manager (you can download Canopy here) or via Enthought’s new edm command line package and environment management tool. Running

edm install traits

at the command line will install Traits into your current environment.

Traits

The Traits library provides a new type of Python object which has an event stream associated with each attribute (or “trait”) of the object that tracks changes to the attribute.  This means that you can decouple your application model much more cleanly: rather than an object having to know all the work which might need to be done when it changes its state, instead other parts of the application register the pieces of work that each of them need when the state changes and Traits automatically takes care running that code.  This results in simpler, more modular and loosely-coupled code that is easier to develop and maintain.

Traits also provides optional data validation and initialization that dramatically reduces the amount of boilerplate code that you need to write to set up objects into a working state and ensure that the state remains valid.  This makes it more likely that your code is correct and does what you expect, resulting in fewer subtle bugs and more immediate and useful errors when things do go wrong.

When you consider all the things that Traits does, it would be reasonable to expect that it may have some impact on performance, but the heart of Traits is written in C and knows more about the structure of the data it is working with than general Python code. This means that it can make some optimizations that the Python interpreter can’t, the net result of which is that code written with Traits is often faster than equivalent pure Python code.

Continue reading

Webinar: An Exclusive Peek “Under the Hood” of Enthought Training and the Pandas Mastery Workshop

See the webinar

Enthought’s Pandas Mastery Workshop is designed to accelerate the development of skill and confidence with Python’s Pandas data analysis package — in just three days, you’ll look like an old pro! This course was created ground up by our training experts based on insights from the science of human learning, as well as what we’ve learned from over a decade of extensive practical experience of teaching thousands of scientists, engineers, and analysts to use Python effectively in their everyday work.

In this webinar, we’ll give you the key information and insight you need to evaluate whether the Pandas Mastery Workshop is the right solution to advance your data analysis skills in Python, including:

  • Who will benefit most from the course
  • A guided tour through the course topics
  • What skills you’ll take away from the course, how the instructional design supports that
  • What the experience is like, and why it is different from other training alternatives (with a sneak peek at actual course materials)
  • What previous workshop attendees say about the course

See the Webinar


michael_connell-enthought-vp-trainingPresenter: Dr. Michael Connell, VP, Enthought Training Solutions

Ed.D, Education, Harvard University
M.S., Electrical Engineering and Computer Science, MIT


Continue reading

Loading Data Into a Pandas DataFrame: The Hard Way, and The Easy Way

This is the first blog in a series. See the second blog here: Handling Missing Values in Pandas DataFrames: the Hard Way, and the Easy Way

Data exploration, manipulation, and visualization start with loading data, be it from files or from a URL. Pandas has become the go-to library for all things data analysis in Python, but if your intention is to jump straight into data exploration and manipulation, the Canopy Data Import Tool can help, instead of having to learn the details of programming with the Pandas library.

The Data Import Tool leverages the power of Pandas while providing an interactive UI, allowing you to visually explore and experiment with the DataFrame (the Pandas equivalent of a spreadsheet or a SQL table), without having to know the details of the Pandas-specific function calls and arguments. The Data Import Tool keeps track of all of the changes you make (in the form of Python code). That way, when you are done finding the right workflow for your data set, the Tool has a record of the series of actions you performed on the DataFrame, and you can apply them to future data sets for even faster data wrangling in the future.

At the same time, the Tool can help you pick up how to use the Pandas library, while still getting work done. For every action you perform in the graphical interface, the Tool generates the appropriate Pandas/Python code, allowing you to see and relate the tasks to the corresponding Pandas code.

With the Data Import Tool, loading data is as simple as choosing a file or pasting a URL. If a file is chosen, it automatically determines the format of the file, whether or not the file is compressed, and intelligently loads the contents of the file into a Pandas DataFrame. It does so while taking into account various possibilities that often throw a monkey wrench into initial data loading: that the file might contain lines that are comments, it might contain a header row, the values in different columns could be of different types e.g. DateTime or Boolean, and many more possibilities as well.

Importing files or data into Pandas with the Canopy Data Import Tool

The Data Import Tool makes loading data into a Pandas DataFrame as simple as choosing a file or pasting a URL.

A Glimpse into Loading Data into Pandas DataFrames (The Hard Way)

The following 4 “inconvenience” examples show typical problems (and the manual solutions) that might arise if you are writing Pandas code to load data, which are automatically solved by the Data Import Tool, saving you time and frustration, and allowing you to get to the important work of data analysis more quickly.

Continue reading

Mayavi (Python 3D Data Visualization and Plotting Library) adds major new features in recent release

Key updates include: Jupyter notebook integration, movie recording capabilities, time series animation, updated VTK compatibility, and Python 3 support

by Prabhu Ramachandran, core developer of Mayavi and director, Enthought India

The Mayavi development team is pleased to announce Mayavi 4.5.0, which is an important release both for new features and core functionality updates.

Mayavi is a general purpose, cross-platform Python package for interactive 2-D and 3-D scientific data visualization. Mayavi integrates seamlessly with NumPy (fast numeric computation library for Python) and provides a convenient Pythonic wrapper for the powerful VTK (Visualization Toolkit) library. Mayavi provides a standalone UI to help visualize data, and is easy to extend and embed in your own dialogs and UIs. For full information, please see the Mayavi documentation.

Mayavi is part of the Enthought Tool Suite of open source application development packages and is available to install through Enthought Canopy’s Package Manager (you can download Canopy here).

Mayavi 4.5.0 is an important release which adds the following features:

  1. Jupyter notebook support: Adds basic support for displaying Mayavi images or interactive X3D scenes
  2. Support for recording movies and animating time series
  3. Support for the new matplotlib color schemes
  4. Improvements on the experimental Python 3 support from the previous release
  5. Compatibility with VTK-5.x, VTK-6.x, and 7.x. For more details on the full set of changes see here.

Let’s take a look at some of these new features in more detail:

Jupyter Notebook Support

This feature is still basic and experimental, but it is convenient. The feature allows one to embed either a static PNG image of the scene or a richer X3D scene into a Jupyter notebook. To use this feature, one should first initialize the notebook with the following:

from mayavi import mlab
mlab.init_notebook()

Subsequently, one may simply do:

s = mlab.test_plot3d()
s

This will embed a 3-D visualization producing something like this:

Mayavi in a Jupyter Notebook

Embedded 3-D visualization in a Jupyter notebook using Mayavi

When the init_notebook method is called it configures the Mayavi objects so they can be rendered on the Jupyter notebook. By default the init_notebook function selects the X3D backend. This will require a network connection and also reasonable off-screen support. This currently will not work on a remote Linux/OS X server unless VTK has been built with off-screen support via OSMesa as discussed here.

For more documentation on the Jupyter support see here.

Animating Time Series

This feature makes it very easy to animate a time series. Let us say one has a set of files that constitute a time series (files of the form some_name[0-9]*.ext). If one were to load any file that is part of this time series like so:

from mayavi import mlab
src = mlab.pipeline.open('data_01.vti')

Animating these is now very easy if one simply does the following:

src.play = True

This can also be done on the UI. There is also a convenient option to synchronize multiple time series files using the “sync timestep” option on the UI or from Python. The screenshot below highlights the new features in action on the UI:

Time Series Animation in Mayavi

New time series animation feature in the Python Mayavi 3D visualization library.

Recording Movies

One can also create a movie (really a stack of images) while playing a time series or running any animation. On the UI, one can select a Mayavi scene and navigate to the movie tab and select the “record” checkbox. Any animations will then record screenshots of the scene. For example:

from mayavi import mlab
f = mlab.figure()
f.scene.movie_maker.record = True
mlab.test_contour3d_anim()

This will create a set of images, one for each step of the animation. A gif animation of these is shown below:

Recording movies with Mayavi

Recording movies as gif animations using Mayavi

More than 50 pull requests were merged since the last release. We are thankful to Prabhu Ramachandran, Ioannis Tziakos, Kit Choi, Stefano Borini, Gregory R. Lee, Patrick Snape, Ryan Pepper, SiggyF, and daytonb for their contributions towards this release.

Additional Resources on Mayavi:

Geophysical Tutorial: Facies Classification using Machine Learning and Python

Published in the October 2016 edition of The Leading Edge magazine by the Society of Exploration Geophysicists. Read the full article here.

By Brendon Hall, Enthought Geosciences Applications Engineer 
Coordinated by Matt Hall, Agile Geoscience

ABSTRACT

There has been much excitement recently about big data and the dire need for data scientists who possess the ability to extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist’s toolbox, much of which used to be only available in proprietary (and expensive) software platforms.

One of the best examples is scikit-learn, a collection of tools for machine learning in Python. What is machine learning? You can think of it as a set of data-analysis methods that includes classification, clustering, and regression. These algorithms can be used to discover features and trends within the data without being explicitly programmed, in essence learning from the data itself.

Well logs and facies classification results from a single well.

Well logs and facies classification results from a single well.

In this tutorial, we will demonstrate how to use a classification algorithm known as a support vector machine to identify lithofacies based on well-log measurements. A support vector machine (or SVM) is a type of supervised-learning algorithm, which needs to be supplied with training data to learn the relationships between the measurements (or features) and the classes to be assigned. In our case, the features will be well-log data from nine gas wells. These wells have already had lithofacies classes assigned based on core descriptions. Once we have trained a classifier, we will use it to assign facies to wells that have not been described.

See the tutorial in The Leading Edge here.

ADDITIONAL RESOURCES: