By Brendon Hall, Enthought Geosciences Applications Engineer
Coordinated by Matt Hall, Agile Geoscience
There has been much excitement recently about big data and the dire need for data scientists who can extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist’s toolbox, offering capabilities that used to be available only in proprietary (and expensive) software platforms.
One of the best examples is scikit-learn, a collection of tools for machine learning in Python. What is machine learning? You can think of it as a set of data-analysis methods that includes classification, clustering, and regression. These algorithms can be used to discover features and trends within the data without being explicitly programmed, in essence learning from the data itself.
Well logs and facies classification results from a single well.
In this tutorial, we will demonstrate how to use a classification algorithm known as a support vector machine to identify lithofacies based on well-log measurements. A support vector machine (or SVM) is a type of supervised-learning algorithm, which needs to be supplied with training data to learn the relationships between the measurements (or features) and the classes to be assigned. In our case, the features will be well-log data from nine gas wells. These wells have already had lithofacies classes assigned based on core descriptions. Once we have trained a classifier, we will use it to assign facies to wells that have not been described.
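The training-and-prediction workflow described above can be sketched with scikit-learn's `SVC`. This is a minimal illustration, not the tutorial's code: the three log features, their values, and the two facies labels are synthetic placeholders (the tutorial uses real logs from nine gas wells with core-derived facies), and the feature scaling and the `C`/`gamma` settings are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical training data: each row is one depth sample with three
# log features (e.g. gamma ray, resistivity, porosity). Values are synthetic.
n = 200
sand = rng.normal(loc=[40.0, 20.0, 0.20], scale=[8.0, 5.0, 0.03], size=(n, 3))
shale = rng.normal(loc=[110.0, 5.0, 0.10], scale=[10.0, 2.0, 0.03], size=(n, 3))
X = np.vstack([sand, shale])
y = np.array(["sandstone"] * n + ["shale"] * n)  # stand-in for core-based facies

# Hold out part of the labeled data to estimate accuracy on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize features so no single log dominates the RBF kernel distance.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")

# Assign facies to samples from an undescribed (hypothetical) well.
new_samples = np.array([[45.0, 18.0, 0.19], [105.0, 6.0, 0.11]])
print(clf.predict(new_samples))
```

With real logs the classes overlap far more than in this toy data, so held-out accuracy will be much lower and the choice of `C` and `gamma` matters.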
Enter the machine learning contest: your mission, should you choose to accept it, is to make the best lithology prediction you can. We want you to try to beat the accuracy score Brendon Hall achieved in his Geophysical Tutorial (The Leading Edge, October 2016). See the full contest details here.
LabVIEW is a software platform made by National Instruments, used widely in industries such as semiconductors, telecommunications, aerospace, manufacturing, electronics, and automotive for test and measurement applications. In August 2016, Enthought released the Python Integration Toolkit for LabVIEW, which is a “bridge” between the LabVIEW and Python environments.
Presented by: Brendon Hall, Geoscience Applications Engineer, Enthought, and Andrew Govert, Geologist, Cimarex Energy
It has become an industry standard to collect whole-core X-ray computed tomography (CT) scans over cored intervals. The resulting data is typically presented as static 2D images, video scans, and 1D density curves.
CT scans of cores before and after processing to remove artifacts and normalize features.
However, the CT volume is a rich data set of compositional and textural information that can be incorporated into core description and analysis workflows. To access this information, the raw CT data must first be processed to remove artifacts such as the aluminum tubing, wax casing, and mud filtrate. CT scanning effects such as beam hardening are also corrected for. The resulting data is combined into a contiguous volume of CT intensity values, which can be directly calibrated to plug bulk density.
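The actual processing chain is not described here, so the following is only a minimal NumPy sketch of two of the ideas, under loudly stated assumptions: the core is centered in the slice, a crude radial mask stands in for removing the tubing and casing, and CT intensity is assumed to relate linearly to bulk density. The helper names, the synthetic slice, and the plug values are all hypothetical.

```python
import numpy as np


def mask_outside_core(ct_slice, core_radius_px):
    """Hypothetical helper: set pixels outside an assumed centered core radius to NaN.

    A crude stand-in for removing aluminum tubing and wax casing.
    """
    ny, nx = ct_slice.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0
    r = np.hypot(yy - cy, xx - cx)
    out = ct_slice.astype(float)  # float copy so NaN can be stored
    out[r > core_radius_px] = np.nan
    return out


def fit_linear_density_calibration(ct_at_plugs, plug_density):
    """Least-squares fit of density = slope * CT + intercept (linearity is an assumption)."""
    slope, intercept = np.polyfit(ct_at_plugs, plug_density, deg=1)
    return slope, intercept


# Synthetic 64 x 64 slice: core material of 1800 CT units inside a 25 px radius,
# plus a bright ring (3000 units) between 27 and 30 px standing in for tubing.
n = 64
yy, xx = np.mgrid[0:n, 0:n]
r = np.hypot(yy - (n - 1) / 2.0, xx - (n - 1) / 2.0)
ct = np.where(r <= 25, 1800.0, 0.0)
ct[(r > 27) & (r <= 30)] = 3000.0

clean = mask_outside_core(ct, core_radius_px=25)

# Hypothetical plug data: mean CT value at each plug location and the
# lab-measured bulk density in g/cm^3.
plug_ct = np.array([1500.0, 1800.0, 2100.0])
plug_rho = np.array([2.30, 2.45, 2.60])
slope, intercept = fit_linear_density_calibration(plug_ct, plug_rho)

density = slope * clean + intercept  # masked (NaN) pixels stay NaN
print(f"mean core bulk density: {np.nanmean(density):.2f} g/cm^3")
```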
Since PyXLL was first released in 2010, it has grown hugely in popularity and is used by businesses in many different sectors.
The original motivation for PyXLL was to be able to use all the best bits of Excel combined with a modern programming language for scientific computing, in a way that fits naturally and works seamlessly.
Since the beginning, PyXLL development has focused on the things that really matter for creating useful real-world spreadsheets: worksheet functions and macro functions. Without these, all you can do is drive Excel by poking numbers in and reading numbers out. When the first version of PyXLL was released, that was already possible using COM, so providing yet another API to do the same was seen as adding little value. Being able to write functions and macros in Python, on the other hand, opens up possibilities that were previously available only in VBA or by writing complicated Excel add-ins in C++ or C#.
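As an illustration of a worksheet function, here is a minimal sketch assuming PyXLL's `xl_func` decorator with a type-signature string. The density-porosity function and the import fallback (a hypothetical no-op decorator that lets the module load and be tested outside Excel) are illustrative additions, not part of PyXLL itself.

```python
try:
    # Only available when the module is loaded by PyXLL inside Excel.
    from pyxll import xl_func
except ImportError:
    # Hypothetical stand-in for use outside Excel: returns the function unchanged.
    def xl_func(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@xl_func("float rho_bulk, float rho_matrix, float rho_fluid: float")
def density_porosity(rho_bulk, rho_matrix, rho_fluid):
    """Density porosity: (rho_matrix - rho_bulk) / (rho_matrix - rho_fluid)."""
    if rho_matrix == rho_fluid:
        raise ValueError("matrix and fluid densities must differ")
    return (rho_matrix - rho_bulk) / (rho_matrix - rho_fluid)
```

Once PyXLL is configured to load the module, a cell formula such as `=density_porosity(2.4, 2.65, 1.0)` would return about 0.15.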
With the release of PyXLL 3, integrating your Python code into Excel has become more enjoyable than ever. Many things have been simplified to get you up and running faster, and there are some major new features to explore.
If you are new to PyXLL, have a look at the Getting Started section of the documentation.
All the features of PyXLL, including these new ones, can be found in the Documentation.
NEW FEATURES IN PYXLL V. 3.0
1. Ribbon Customization
Ever wanted to write an add-in that uses the Excel ribbon interface? Previously the only way to do this was to write a COM add-in, which requires a lot of knowledge, skill and perseverance! Now you can do it with PyXLL by defining your ribbon as an XML document and adding it to your PyXLL config. All the callbacks between Excel and your Python code are handled for you.
Enthought is pleased to announce Virtual Core 1.8. Virtual Core automates aspects of core description for geologists, drastically reducing the time and effort required for core description, and its unified visualization interface displays cleansed whole-core CT data alongside core photographs and well logs. It provides tools for geoscientists to analyze core data and extract features from sub-millimeter scale to the entire core.
NEW VIRTUAL CORE 1.8 FEATURE: Rotational Alignment on Core CT Sections
Virtual Core 1.8 introduces the ability to perform rotational alignment on core CT sections. Core sections can become misaligned during extraction and data acquisition. The alignment tool allows manual realignment of the individual core sections. Wellbore image logs (like FMI) can be imported and used as a reference when aligning core sections. The Digital Log Interchange Standard (DLIS) is now fully supported, and can be used to import and export data.
Whole-core CT scans are routinely performed on extracted well cores. The data produced from these scans is typically presented as static 2D images of cross sections and video scans. Images are limited to those provided by the vendor, and the raw data, if supplied, is difficult to analyze. However, the CT volume is a rich 3D dataset of compositional and textural information that can be incorporated into core description and analysis workflows.
Enthought’s proprietary Clear Core technology is used to process the raw CT data, which is notoriously difficult to analyze. Raw CT data is stored in 3-foot sections, each consisting of many thousands of individual slice images approximately 0.2 mm thick.
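As a rough check on "many thousands", assuming exactly 3 ft per section and a uniform 0.2 mm slice spacing:

```latex
N \approx \frac{3\ \text{ft} \times 304.8\ \text{mm/ft}}{0.2\ \text{mm}}
  = \frac{914.4\ \text{mm}}{0.2\ \text{mm}}
  \approx 4572\ \text{slices per section}
```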
Today we officially release Canopy Geoscience 0.10.0, our Python-based analysis environment for geoscience data.
Canopy Geoscience integrates data I/O, visualization, and programming in an easy-to-use environment. Canopy Geoscience is tightly integrated with Enthought Canopy’s Python distribution, giving you access to hundreds of high-performance scientific libraries to extract information from your data.
The Canopy Geoscience environment allows easy exploration of your data in 2D or 3D. The data is accessible from the embedded Python environment, and can be analyzed, modified, and immediately visualized with simple Python commands.
Feature and capability highlights for Canopy Geoscience version 0.10.0 include:
Read and write common geoscience data formats (LAS, SEG-Y, Eclipse, …)
3D and 2D visualization tools
Well log visualization
Conversion from depth to time domain is integrated in the visualization tools using flexible depth-time models
Integrated IPython shell to programmatically access and analyze the data
Integrated with the Canopy editor for scripting
Extensible with custom-made plugins to fit your personal workflow
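Canopy Geoscience's own depth-time models are not described here; the following is a generic sketch of the underlying idea, piecewise-linear interpolation through a depth-time table, using `numpy.interp`. The table values are invented for illustration.

```python
import numpy as np

# Hypothetical depth-time table (e.g. from a checkshot survey):
# depth in meters and two-way time (TWT) in milliseconds, both increasing.
model_depth_m = np.array([0.0, 500.0, 1000.0, 2000.0])
model_twt_ms = np.array([0.0, 500.0, 900.0, 1600.0])


def depth_to_time(depth_m, table_depth, table_twt):
    """Return TWT at each depth by piecewise-linear interpolation.

    np.interp clamps values outside the table to the end points, so
    depths beyond the deepest control point are not extrapolated.
    """
    return np.interp(depth_m, table_depth, table_twt)


def time_to_depth(twt_ms, table_depth, table_twt):
    """Inverse mapping; valid because TWT increases monotonically with depth."""
    return np.interp(twt_ms, table_twt, table_depth)


log_depth_m = np.array([250.0, 750.0, 1500.0])
log_twt_ms = depth_to_time(log_depth_m, model_depth_m, model_twt_ms)
print(log_twt_ms)  # 250, 700 and 1250 ms
```

Swapping in a different table (or a velocity-derived one) changes the model without touching the conversion code, which is one reason to keep the depth-time relationship as data.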
Python has a broad range of tools for data analysis and visualization. While Excel is able to produce various types of plots, sometimes it’s either not quite good enough or it’s just preferable to use matplotlib.
Users already familiar with matplotlib will know that when a Python script shows a plot, the script pauses until the user closes the plot window. In an IPython console, by contrast, control returns to the prompt as soon as the plot is shown, which is useful for interactive development.
A question that has come up a couple of times is how to use matplotlib within Excel using PyXLL. As matplotlib is just a Python package like any other, it can be imported and used in the same way as from any Python script. The difficulty is that showing a plot blocks, so control isn’t returned to Excel until the user closes the window.
This post shows how to plot data from Excel using matplotlib and PyXLL so that Excel can continue to be used while a plot window is open, and so that the same window can be updated whenever the data in Excel changes.
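The PyXLL-specific solution is in the full post; the sketch below shows only the general matplotlib pattern of a non-blocking plot that reuses one window, assuming an interactive (GUI) backend. The function name and figure label are hypothetical, and keeping the window responsive between updates still requires the host application to run a GUI event loop, which this sketch does not do.

```python
import matplotlib.pyplot as plt

FIGURE_LABEL = "Excel data"  # hypothetical window label reused on every call


def show_or_update_plot(xs, ys):
    """Plot ys against xs in one reusable figure and return without blocking."""
    # plt.figure(label) returns the existing figure with this label, or
    # creates it, so repeated calls redraw the same window.
    fig = plt.figure(FIGURE_LABEL)
    fig.clf()
    ax = fig.add_subplot()
    (line,) = ax.plot(xs, ys, marker="o")
    fig.canvas.draw_idle()
    plt.show(block=False)  # return immediately instead of waiting for the window to close
    plt.pause(0.001)  # let the GUI event loop process pending draw events briefly
    return line
```

Calling it again with new values, for example after cells change, clears and redraws the same figure rather than opening a second window.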
On May 28, 2014, Phillip Cloud, a core contributor to the Pandas data analytics Python library, spoke at a joint meetup of the New York Quantitative Python User’s Group (NY QPUG) and the NY Finance PUG. Enthought hosted, and about 60 people joined us to hear Phillip present some of the lesser-known but really useful features that have come out since Pandas version 0.11, as well as some that are coming soon. We all learned more about how to take full advantage of the Pandas Python library, and got a better sense of how excited Phillip was to discover Pandas during his graduate work.
After a fairly comprehensive overview of Pandas, Phillip got into the new features. In version 0.11 he covered: