
Geophysical Tutorial: Facies Classification using Machine Learning and Python

Published in the October 2016 edition of The Leading Edge magazine by the Society of Exploration Geophysicists. Read the full article here.

By Brendon Hall, Enthought Geosciences Applications Engineer 
Coordinated by Matt Hall, Agile Geoscience


There has been much excitement recently about big data and the dire need for data scientists who possess the ability to extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist’s toolbox, offering capabilities that used to be available only in proprietary (and expensive) software platforms.

One of the best examples is scikit-learn, a collection of tools for machine learning in Python. What is machine learning? You can think of it as a set of data-analysis methods that includes classification, clustering, and regression. These algorithms can be used to discover features and trends within the data without being explicitly programmed, in essence learning from the data itself.
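For example, every scikit-learn model follows the same construct–fit–predict pattern; a minimal sketch on synthetic data (the parameter choices here are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)                   # 100 samples, 2 features
model = KMeans(n_clusters=3, n_init=10)      # construct the estimator
model.fit(X)                                 # learn cluster centers from the data
labels = model.predict(X)                    # cluster assignment for each sample
```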

Well logs and facies classification results from a single well.

In this tutorial, we will demonstrate how to use a classification algorithm known as a support vector machine to identify lithofacies based on well-log measurements. A support vector machine (or SVM) is a type of supervised-learning algorithm, which needs to be supplied with training data to learn the relationships between the measurements (or features) and the classes to be assigned. In our case, the features will be well-log data from nine gas wells. These wells have already had lithofacies classes assigned based on core descriptions. Once we have trained a classifier, we will use it to assign facies to wells that have not been described.
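The article and its accompanying notebook give the full workflow; the sketch below shows its general shape, with illustrative column names and parameters rather than the tutorial’s exact ones:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical training file: log measurements plus core-based facies labels.
data = pd.read_csv("training_data.csv")
features = data[["GR", "ILD_log10", "DeltaPHI", "PHIND", "PE"]]
labels = data["Facies"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)       # SVMs are sensitive to feature scale
clf = SVC(C=10, gamma=1)                     # illustrative hyperparameters
clf.fit(scaler.transform(X_train), y_train)

# Accuracy on held-out data gives a first sense of classifier quality.
print(clf.score(scaler.transform(X_test), y_test))
```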

See the tutorial in The Leading Edge here.


Webinar: Introducing the NEW Python Integration Toolkit for LabVIEW

See a recording of the webinar:

LabVIEW is a software platform made by National Instruments, used widely in industries such as semiconductors, telecommunications, aerospace, manufacturing, electronics, and automotive for test and measurement applications. In August 2016, Enthought released the Python Integration Toolkit for LabVIEW, which is a “bridge” between the LabVIEW and Python environments.

In this webinar, we’ll demonstrate:

  1. How the new Python Integration Toolkit for LabVIEW from Enthought seamlessly brings the power of the Python ecosystem of scientific and engineering tools to LabVIEW
  2. Examples of how you can extend LabVIEW with Python, including using Python for signal and image processing, cloud computing, web dashboards, machine learning, and more


The Latest Features in Virtual Core: CT Scan, Photo, and Well Log Co-visualization

Enthought is pleased to announce Virtual Core 1.8. Virtual Core automates aspects of core description for geologists, drastically reducing the time and effort required, and its unified visualization interface displays cleansed whole-core CT data alongside core photographs and well logs. It provides tools for geoscientists to analyze core data and extract features from the sub-millimeter scale up to the entire core.

NEW VIRTUAL CORE 1.8 FEATURE: Rotational Alignment on Core CT Sections

Virtual Core 1.8 introduces the ability to perform rotational alignment on core CT sections.  Core sections can become misaligned during extraction and data acquisition.   The alignment tool allows manual realignment of the individual core sections.  Wellbore image logs (like FMI) can be imported and used as a reference when aligning core sections.  The Digital Log Interchange Standard (DLIS) is now fully supported, and can be used to import and export data.

Whole-core CT scans are routinely performed on extracted well cores.  The data produced from these scans is typically presented as static 2D images of cross sections and video scans.  Images are limited to those provided by the vendor, and the raw data, if supplied, is difficult to analyze. However, the CT volume is a rich 3D dataset of compositional and textural information that can be incorporated into core description and analysis workflows.

Enthought’s proprietary Clear Core technology is used to process the raw CT data, which is notoriously difficult to analyze. Raw CT data is stored in three-foot sections, with each section consisting of many thousands of individual slice images, each approximately 0.2 mm thick (a three-foot section, about 914 mm, at 0.2 mm per slice is roughly 4,500 slices).

The Latest and Greatest Pandas Features (since v 0.11)

On May 28, 2014, Phillip Cloud, a core contributor to the Pandas data-analytics Python library, spoke at a joint meetup of the New York Quantitative Python User’s Group (NY QPUG) and the NY Finance PUG. Enthought hosted, and about 60 people joined us to listen to Phillip present some of the less-well-known but really useful features that have come out since Pandas version 0.11, along with some that are coming soon. We all learned more about how to take full advantage of the Pandas library, and got a better sense of how excited Phillip was to discover Pandas during his graduate work.

Pandas to MATLAB

After a fairly comprehensive overview of Pandas, Phillip got into the new features, starting with those introduced in version 0.11. A small taste of the kind of additions that followed is sketched below.
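The feature list itself is cut off in this excerpt, but to give a flavor of the post-0.11 additions discussed, here is a short sketch using DataFrame.query() and eval(), which arrived in pandas 0.13 (the data here is synthetic):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.random.randn(1000),
                   "b": np.random.randn(1000)})

# Expression-based filtering: the condition is a string, parsed and
# evaluated efficiently rather than built from boolean masks by hand.
subset = df.query("a > 0 and b < 0")

# eval() compiles column arithmetic, avoiding Python-level temporaries.
df["c"] = df.eval("a + 2 * b")
```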

Implied Volatility with Python’s Pandas Library AND Python in Excel

Authors: Brett Murphy and Aaron Waters

The March 6 New York Quantitative Python User’s Group (NY QPUG) Meetup included presentations by NAG (Numerical Algorithms Group), known for its high-quality numerical computing software and high-performance computing (HPC) services, and Enthought, a provider of scientific computing solutions powered by Python.

Brian Spector, a technical consultant at NAG, presented “Implied Volatility using Python’s Pandas Library.” He covered a technique and script for calculating the implied volatility of option prices under the Black–Scholes formula using Pandas and nag4py. With this technique, you can determine the volatility at which the Black–Scholes price equals the market price; that volatility is the implied volatility observed in the market. Brian fitted polynomials of varying degree to the volatility curves, then examined the volatility surface and its sensitivity with respect to the interest rate. See the full presentation in the video below.
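Brian’s script used nag4py for the numerics; a rough sketch of the same root-finding idea, using SciPy in place of the NAG library (the function names and parameter values below are illustrative, not Brian’s):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call option."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, T, r):
    """Volatility at which the model price matches the market price."""
    return brentq(lambda sigma: bs_call(S, K, T, r, sigma) - price,
                  1e-6, 5.0)   # bracket the root between tiny and huge vol

# Example: an at-the-money one-year call quoted at 10.0
print(implied_vol(price=10.0, S=100.0, K=100.0, T=1.0, r=0.01))
```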


Python at Inflection Point in HPC

Authors: Kurt Smith, Robert Grant, and Lauren Johnson

We attended SuperComputing 2013, held November 17-22 in Denver, and saw huge interest around Python. There were several Python related events, including the “Python in HPC” tutorial (Monday), the Python BoF (Tuesday), and a “Python for HPC” workshop held in parallel with the tutorial on Monday. But we had some of our best conversations on the trade show floor.

Python Buzz on the Floor

The Enthought booth had a prominent “Python for HPC: High Productivity Computing” headline, and we looped videos of our parallelized 2D Julia set rendering GUI (video below).  The parallelization used Cython’s OpenMP functionality, came in at around 200 lines of code, and generated lots of discussions.  We also used a laptop to display an animated 3D Julia set rendered in Mayavi and to demo Canopy.
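The booth demo’s Cython/OpenMP source isn’t reproduced here, but a serial NumPy sketch of the underlying escape-time computation looks roughly like this (the parallel version distributes the per-pixel loop across threads with Cython’s prange):

```python
import numpy as np

# Escape-time counts for the Julia set of f(z) = z**2 + c.
c = -0.835 - 0.2321j                          # a classic Julia parameter
x = np.linspace(-1.5, 1.5, 800)
y = np.linspace(-1.5, 1.5, 800)
z = x[np.newaxis, :] + 1j * y[:, np.newaxis]  # complex grid of pixel values
counts = np.zeros(z.shape, dtype=int)

for i in range(100):
    mask = np.abs(z) <= 2.0                   # pixels that have not escaped yet
    counts[mask] = i                          # record the latest bounded step
    z[mask] = z[mask] ** 2 + c                # iterate only the live pixels
```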

Many people came up to us after seeing our banner and video and asked “I use Python a little bit, but never in HPC – what can you tell me?”  We spoke with hundreds of people and had lots of good conversations.

It really seems like Python has reached an inflection point in HPC.

Python in HPC Tutorial, Monday

Kurt Smith presented a quarter-day section on Cython, a shortened version of what he presented at SciPy 2013.  In addition, Andy Terrel presented “Introduction to Python”; Aron Ahmadia presented “Scaling Python with MPI”; and Travis Oliphant presented “Python and Big Data”. You can find all the material on the tutorial website.

The tutorial was generally well attended: about 100–130 people.  A strong majority of attendees were already programming in Python, with about half using Python in a performance-critical area and perhaps 10% running Python on supercomputers or clusters directly.

In the Cython section of the tutorial, Kurt went into more detail on how to use OpenMP with Cython, which was of interest to many based on questions during the presentation. For the exercises, students were given temporary accounts on  Stampede (TACC’s latest state-of-the-art supercomputer) to help ensure everyone was able to get their exercise environment working.

Andy’s section of the day went well, covering the basics of using Python.  Aron’s section was good for establishing that Python + mpi4py can scale to ~65,000 nodes on massive supercomputers, and also for addressing people’s concerns regarding the import challenge.

Python in HPC Workshop, Monday

There was a day-long workshop of presentations on “Python in HPC” which ran in parallel with the “Python for HPC” tutorial. Of particular interest were the talks on “Doubling the performance of NumPy” and “Bohrium: Unmodified NumPy code on CPU, GPU, and Cluster”.

Python for High Performance and Scientific Computing BoF, Tuesday

Andy Terrel, William Scullin, and Andreas Schreiber organized a Birds-of-a-Feather session on Python, which had about 150 attendees (many thanks to all three for organizing a great session!).  Kurt gave a lightning talk on Enthought’s SBIR work.  The other talks focused on applications of Python in HPC settings, as well as on IPython notebooks covering the basics of the Navier–Stokes equations.

It was great to see so much interest in Python for HPC!

PyQL and QuantLib: A Comprehensive Finance Framework

Authors: Kelsey Jordahl, Brett Murphy

Earlier this month at the first New York Finance Python User’s Group (NY FPUG) meetup, Kelsey Jordahl talked about how PyQL streamlines the development of Python-based finance applications using QuantLib. There were about 30 people attending the talk at the Cornell Club in New York City. We have a recording of the presentation below.

FPUG Meetup Presentation Screenshot

QuantLib is a free, open-source (BSD-licensed) quantitative finance package. It provides tools for financial instruments, yield curves, pricing engines, simulations, and date/time management. There is a lot more detail on the QuantLib website, along with the latest downloads. Kelsey refers to a really useful blog / open-source book by one of the core QuantLib developers on implementing QuantLib. QuantLib also comes with bindings for several languages, including Python.

So why use PyQL if QuantLib already ships Python bindings? In short, PyQL provides a much more Pythonic set of APIs. Kelsey discusses some of the differences between the original QuantLib Python API and the PyQL API, and how PyQL streamlines the resulting Python code: you get better integration with other packages like NumPy, better namespace usage, and better documentation. PyQL is available on GitHub in the PyQL repo. Kelsey uses the IPython notebooks in the examples directory to explore PyQL and QuantLib, and compares the use of PyQL versus the standard (SWIG) QuantLib Python APIs.
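To give a flavor of the SWIG-based API that PyQL aims to streamline, here is a rough sketch of pricing a European call with the standard QuantLib Python bindings (details may vary across QuantLib versions; the market data below is made up):

```python
import QuantLib as ql

today = ql.Date(15, ql.June, 2014)
ql.Settings.instance().evaluationDate = today

# Flat market data wrapped in QuantLib's handle machinery.
spot = ql.QuoteHandle(ql.SimpleQuote(100.0))
rate = ql.YieldTermStructureHandle(
    ql.FlatForward(today, 0.01, ql.Actual365Fixed()))
vol = ql.BlackVolTermStructureHandle(
    ql.BlackConstantVol(today, ql.TARGET(), 0.20, ql.Actual365Fixed()))

process = ql.BlackScholesProcess(spot, rate, vol)
option = ql.VanillaOption(
    ql.PlainVanillaPayoff(ql.Option.Call, 105.0),
    ql.EuropeanExercise(today + ql.Period(1, ql.Years)))
option.setPricingEngine(ql.AnalyticEuropeanEngine(process))

print(option.NPV())   # present value of the option
```

The handles and explicit engine wiring are exactly the kind of boilerplate a more Pythonic layer can hide.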

PyQL remains a work in progress, with goals of more complete QuantLib coverage, an even more Pythonic API, and a working build on Windows (it works on Mac OS and Linux now). It’s open source, so feel free to step up and contribute!

For the details, check out the video of Kelsey’s presentation (44 minutes).

And here are the slides online if you want to check the links in the presentation.

If you are interested in working on either QuantLib or PyQL, let the maintainers know!

Exploring NumPy/SciPy with the “House Location” Problem

Author: Aaron Waters

I created a Notebook that describes how to examine, illustrate, and solve a geometric mathematical problem called “House Location” using Python mathematical and numeric libraries. The discussion uses symbolic computation, visualization, and numerical computations to solve the problem while exercising the NumPy, SymPy, Matplotlib, IPython and SciPy packages.

I hope that this discussion will be accessible to people with a minimal background in programming and a high-school level background in algebra and analytic geometry. There is a brief mention of complex numbers, but the use of complex numbers is not important here except as “values to be ignored”. I also hope that this discussion illustrates how to combine different mathematically oriented Python libraries and explains how to smooth out some of the rough edges between the library interfaces.
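The notebook itself isn’t reproduced in this excerpt, but the library-bridging pattern it describes looks roughly like this: solve symbolically with SymPy, discard the complex roots, and hand the result to NumPy via lambdify. The objective function below is a stand-in, not the notebook’s:

```python
import sympy as sp

x = sp.symbols("x")
f = (x - 1)**2 + (x**3 - 2)**2          # a stand-in objective, not the notebook's

roots = sp.solve(sp.diff(f, x), x)      # stationary points, some complex
real_roots = [r for r in roots if r.is_real]   # complex roots are "ignored"

f_num = sp.lambdify(x, f, "numpy")      # bridge from SymPy to NumPy
print([(float(r), float(f_num(float(r)))) for r in real_roots])
```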