Canopy Data Import Tool: New Updates

In May of 2016 we released the Canopy Data Import Tool, a significant new feature of our Canopy graphical analysis environment software. With the Data Import Tool, users can now quickly and easily import CSVs and other structured text files into Pandas DataFrames through a graphical interface, manipulate the data, and create reusable Python scripts to speed future data wrangling.

Watch a 2-minute demo video to see how the Canopy Data Import Tool works:

With the latest version of the Data Import Tool released this month (v. 1.0.4), we’ve added new capabilities and enhancements, including:

  1. The ability to select and import a specific table from among multiple tables on a webpage,
  2. Intelligent alerts regarding the saved state of exported Python code, and
  3. Unlimited file sizes supported for import.

Download Canopy and start a free 7-day trial of the Data Import Tool

New: Choosing from multiple tables on a webpage

The latest release of the Canopy Data Import Tool supports the selection of a specific table from a webpage for import, such as this Wikipedia page.

In addition to CSVs and structured text files, the Canopy Data Import Tool (the Tool) provides the ability to load tables from a webpage. If the webpage contains multiple tables, by default the Tool loads the first table.

With this release, the Tool lets you choose which table to import, using a scrollable index parameter to select the table of interest.

Example: loading and working with tables from a Wikipedia page

For example, let’s load a table from the Demography of the United Kingdom Wikipedia page using the Tool. In total, there are 10 tables on that page.

  • As you can see in the screenshot below, the Tool initially loads the first table on the wiki page.
  • However, we are interested in loading the table ‘Vital statistics since 1960’, which is the fifth table on the page. (Note that indexing starts at 0; for a quick history lesson on why Python uses zero-based indexing, see Guido van Rossum’s explanation here.)
  • After the initial read-in, we can click on the ‘Table index on page’ scroll bar, choose ‘4’, and click on ‘Refresh Data’ to load the table of interest in the Data Import Tool. (A sketch of the equivalent pandas code appears below.)
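Under the hood, the Tool builds on pandas. A minimal hand-written sketch of the equivalent operation, assuming pandas and its optional lxml dependency are installed (this is illustrative, not the Tool’s auto-generated code verbatim):

    import pandas as pd

    # read_html returns a list of DataFrames, one per table found on the page
    url = 'https://en.wikipedia.org/wiki/Demography_of_the_United_Kingdom'
    tables = pd.read_html(url)

    # Indexing starts at 0, so the fifth table on the page is tables[4]
    df = tables[4]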

See how the Canopy Data Import Tool loads a table from a webpage and prepares the data for manipulation and interaction:

The Data Import Tool allows you to select a specific table from a webpage where multiple are present, with a simple drop-down menu. Once you’ve selected your table, you can readily toggle between three views: the Pandas DataFrame generated by the Tool, the raw data, and the corresponding auto-generated Python code. You can then export the DataFrame to the IPython console for plotting and further analysis.

  • Further, as you can see, the first row contains the column names and the first column looks like an index for the DataFrame. You can therefore select the ‘First row is column names’ checkbox and again click on ‘Refresh Data’ to prompt the Tool to re-read the table, this time using the data in the first row as column names. Then, you can right-click on the first column and select the ‘Set as Index’ option to make column 0 the index of the DataFrame (a pandas sketch of these two steps follows this list).
  • You can toggle between the DataFrame, Raw Data and Python Code tabs in the Tool, to peek at the raw data being loaded by the Tool and the corresponding Python code auto-generated by the Tool.
  • Finally, you can click on the ‘Use DataFrame’ button in the bottom right to send the DataFrame to the IPython kernel in the Canopy User Environment for plotting and further analysis.
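In pandas terms, those two steps correspond to re-reading the table with a header row and then setting the index. A hedged sketch (again, not the Tool’s generated code verbatim):

    import pandas as pd

    url = 'https://en.wikipedia.org/wiki/Demography_of_the_United_Kingdom'

    # header=0 uses the first row of the table as column names
    df = pd.read_html(url, header=0)[4]

    # 'Set as Index' on the first column is equivalent to:
    df = df.set_index(df.columns[0])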

New: Keeping track of exported Python scripts

The Tool generates Python commands for all operations performed by the user and provides the ability to save the generated Python script. With this update, the Tool keeps track of the saved and current states of the generated script and intelligently alerts users who click the ‘Use DataFrame’ button without having saved their changes to the Python script.

New: Unlimited file sizes supported for import

In the initial release, we limited imports to files of 70 MB or less to ensure optimal performance. With this release, we have removed that restriction, so files of any size can be imported with the Tool. For files over 70 MB, the Tool now warns the user that interaction, manipulation, and operations on the imported DataFrame might be slower than normal, and lets them choose whether to continue or to begin with a smaller subset of the data, developing a script that can later be applied to the full data set. (A pandas sketch of that subset-first workflow follows.)
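The subset-first workflow maps directly onto pandas. A minimal sketch (the file name and row count here are placeholders):

    import pandas as pd

    # Develop and debug your munging steps against a small slice first...
    sample = pd.read_csv('large_file.csv', nrows=10000)

    # ...then apply the finished script to the full file
    full = pd.read_csv('large_file.csv')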

Additions and Fixes

Along with the feature additions discussed above, based on continued user feedback, we implemented a number of UI/UX improvements and bug fixes in this release. For a complete list of changes introduced in version 1.0.4 of the Data Import Tool, please refer to the Release Notes page in the Tool’s documentation. If you have any feedback regarding the Data Import Tool, we’d love to hear from you at canopy.support@enthought.com.

Additional resources:

Download Canopy and start a free 7-day trial of the Data Import Tool

See the Webinar “Fast Forward Through Data Analysis Dirty Work” for examples of how the Canopy Data Import Tool accelerates data munging:

 

Webinar: Introducing the NEW Python Integration Toolkit for LabVIEW

See a recording of the webinar:

LabVIEW is a software platform made by National Instruments, used widely in industries such as semiconductors, telecommunications, aerospace, manufacturing, electronics, and automotive for test and measurement applications. In August 2016, Enthought released the Python Integration Toolkit for LabVIEW, which is a “bridge” between the LabVIEW and Python environments.

In this webinar, we’ll demonstrate:

  1. How the new Python Integration Toolkit for LabVIEW from Enthought seamlessly brings the power of the Python ecosystem of scientific and engineering tools to LabVIEW
  2. Examples of how you can extend LabVIEW with Python, including using Python for signal and image processing, cloud computing, web dashboards, machine learning, and more (a sketch of such a Python function appears below)
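The Toolkit’s own calls happen on the LabVIEW side, so they are not shown here; the functions you expose are ordinary Python. A hedged sketch of the kind of signal-processing routine you might call from LabVIEW (the function name and parameters are illustrative, not part of the Toolkit’s API):

    import numpy as np
    from scipy import signal

    def lowpass(samples, sample_rate_hz, cutoff_hz):
        """Apply a 4th-order Butterworth low-pass filter to a 1D signal."""
        nyquist = 0.5 * sample_rate_hz
        b, a = signal.butter(4, cutoff_hz / nyquist)
        return signal.filtfilt(b, a, np.asarray(samples, dtype=float))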

Continue reading

5 Simple Steps to Create a Real-Time Twitter Feed in Excel using Python and PyXLL

PyXLL 3.0 introduced a new, simpler way of streaming real-time data to Excel from Python.

Excel has had support for real time data (RTD) for a long time, but it requires a certain knowledge of COM to get it to work. With the new RTD features in PyXLL 3.0 it is now a lot simpler to get streaming data into Excel without having to write any COM code.

This blog will show how to build a simple real time data feed from Twitter in Python using the tweepy package, and then show how to stream that data into Excel using PyXLL.

(Note: the code from this blog is available on GitHub at https://github.com/pyxll/pyxll-examples/tree/master/twitter)

Create a real-time Twitter feed in Excel using Python and PyXLL.
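PyXLL’s RTD support works by returning an RTD object from a worksheet function: whenever the object’s value attribute is assigned, Excel updates the cell. A minimal sketch following the documented PyXLL 3.0 pattern (the tweepy wiring is omitted here; see the GitHub link above for the full example):

    from pyxll import xl_func, RTD

    class TwitterFeedRTD(RTD):
        def __init__(self, topic):
            super(TwitterFeedRTD, self).__init__(value='Waiting for tweets...')
            self.topic = topic

        def connect(self):
            # Excel has started watching this cell: start the tweepy stream
            # here and assign each new tweet's text to self.value.
            pass

        def disconnect(self):
            # No cell references this feed any more: stop the stream.
            pass

    @xl_func('string topic: rtd')
    def twitter_feed(topic):
        return TwitterFeedRTD(topic)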

Continue reading

AAPG 2016 Conference Technical Presentation: Unlocking Whole Core CT Data for Advanced Description and Analysis

Microscale Imaging for Unconventional Plays Track Technical Presentation:

Unlocking Whole Core CT Data for Advanced Description and Analysis

American Association of Petroleum Geologists (AAPG)
2016 Annual Convention and Exposition Technical Presentation
Tuesday, June 21st at 4:15 PM, Hall B, Room 2, BMO Centre, Calgary

Presented by: Brendon Hall, Geoscience Applications Engineer, Enthought, and Andrew Govert, Geologist, Cimarex Energy

PRESENTATION ABSTRACT:

It has become an industry standard for whole-core X-ray computed tomography (CT) scans to be collected over cored intervals. The resulting data is typically presented as static 2D images, video scans, and 1D density curves.

CT scans of cores before and after processing to remove artifacts and normalize features.

However, the CT volume is a rich data set of compositional and textural information that can be incorporated into core description and analysis workflows. To access this information, the raw CT data first has to be processed to remove artifacts such as the aluminum tubing, wax casing, and mud filtrate. CT scanning effects such as beam hardening are also accounted for. The resulting data is combined into a contiguous volume of CT intensity values, which can be directly calibrated to plug bulk density.
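At its simplest, that calibration is a linear fit between CT intensity and measured plug bulk density. A hedged illustration with made-up values (the real workflow is more involved):

    import numpy as np

    # Mean CT intensity at each plug depth, and the measured plug
    # bulk densities (g/cc) at the same depths -- illustrative numbers
    ct_intensity = np.array([1520.0, 1610.0, 1685.0, 1760.0])
    plug_density = np.array([2.31, 2.42, 2.50, 2.58])

    # Fit density = slope * intensity + intercept
    slope, intercept = np.polyfit(ct_intensity, plug_density, 1)

    # Apply the calibration to the whole CT volume
    def density_from_ct(ct_volume):
        return slope * ct_volume + intercept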

Continue reading

Webinar: Fast Forward Through the “Dirty Work” of Data Analysis: New Python Data Import and Manipulation Tool Makes Short Work of Data Munging Drudgery

Python Import & Manipulation Tool Intro Webinar

Whether you are a data scientist, quantitative analyst, or engineer, and whether you are evaluating consumer purchase behavior, stock portfolios, or design simulation results, your data analysis workflow probably looks a lot like this:

Acquire > Wrangle > Analyze and Model > Share and Refine > Publish

The problem is that often 50 to 80 percent of the time is spent wading through the tedium of the first two steps – acquiring and wrangling data – before even getting to the real work of analysis and insight. (See The New York Times, For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.)

WHAT YOU’LL LEARN:


Try the Data Import Tool with your own data. Download here.

In this webinar we’ll demonstrate how the new Canopy Data Import Tool can significantly reduce the time you spend on data analysis “dirty work,” by helping you:

  • Load various data file types and URLs containing embedded tables into Pandas DataFrames
  • Perform common data munging tasks that clean up raw data (see the sketch after this list)
  • Handle complicated and/or messy data
  • Extend the work done with the tool to other data files
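For a flavor of what those munging tasks look like in pandas, a small hedged sketch (the file and column names are made up):

    import pandas as pd

    df = pd.read_csv('purchases.csv')

    # Typical clean-up: drop empty rows, coerce types, parse dates
    df = df.dropna(how='all')
    df['price'] = pd.to_numeric(df['price'], errors='coerce')
    df['date'] = pd.to_datetime(df['date'])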

WEBINAR RECORDING:
Continue reading

Just Released: PyXLL v 3.0 (Python in Excel). New Real Time Data Stream Capabilities, Excel Ribbon Integration, and More.

Download a free 30-day trial of PyXLL and try it with your own data.

Since PyXLL was first released back in 2010 it has grown hugely in popularity and is used by businesses in many different sectors.

The original motivation for PyXLL was to be able to use all the best bits of Excel combined with a modern programming language for scientific computing, in a way that fits naturally and works seamlessly.

Since the beginning, PyXLL development has focused on the things that really matter for creating useful real-world spreadsheets: worksheet functions and macro functions. Without these, all you can do is drive Excel by poking numbers in and reading numbers out. When the first version of PyXLL was released, that was already possible using COM, so providing yet another API to do the same added little value. Being able to write functions and macros in Python, on the other hand, opens up possibilities that were previously available only in VBA or by writing complicated Excel add-ins in C++ or C#.

With the release of PyXLL 3, integrating your Python code into Excel has become more enjoyable than ever. Many things have been simplified to get you up and running faster, and there are some major new features to explore.

  • If you are new to PyXLL, have a look at the Getting Started section of the documentation.
  • All the features of PyXLL, including these new ones, can be found in the Documentation.

NEW FEATURES IN PYXLL V. 3.0

1. Ribbon Customization


Ever wanted to write an add-in that uses the Excel ribbon interface? Previously the only way to do this was to write a COM add-in, which requires a lot of knowledge, skill and perseverance! Now you can do it with PyXLL by defining your ribbon as an XML document and adding it to your PyXLL config. All the callbacks between Excel and your Python code are handled for you.
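A hedged sketch of what this looks like on the Python side (the module name, function name, and XML attribute values are illustrative; consult the PyXLL documentation for the exact ribbon schema):

    # mymodule.py -- referenced from the ribbon XML, e.g.:
    #   <button id='refresh' label='Refresh' onAction='mymodule.on_refresh_pressed'/>
    from pyxll import xl_app

    def on_refresh_pressed(control):
        # Called by Excel when the ribbon button is pressed.
        xl = xl_app()    # the Excel Application COM object
        xl.Calculate()   # e.g. force a workbook recalculation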

Continue reading

The Latest Features in Virtual Core: CT Scan, Photo, and Well Log Co-visualization

Enthought is pleased to announce Virtual Core 1.8. Virtual Core automates aspects of core description, drastically reducing the time and effort required of geologists, and its unified visualization interface displays cleansed whole-core CT data alongside core photographs and well logs. It provides tools for geoscientists to analyze core data and extract features from sub-millimeter scale to the entire core.

NEW VIRTUAL CORE 1.8 FEATURE: Rotational Alignment on Core CT Sections

Virtual Core 1.8 introduces the ability to perform rotational alignment on core CT sections. Core sections can become misaligned during extraction and data acquisition. The alignment tool allows manual realignment of the individual core sections. Wellbore image logs (like FMI) can be imported and used as a reference when aligning core sections. The Digital Log Interchange Standard (DLIS) is now fully supported and can be used to import and export data.

Whole-core CT scans are routinely performed on extracted well cores.  The data produced from these scans is typically presented as static 2D images of cross sections and video scans.  Images are limited to those provided by the vendor, and the raw data, if supplied, is difficult to analyze. However, the CT volume is a rich 3D dataset of compositional and textural information that can be incorporated into core description and analysis workflows.

Enthought’s proprietary Clear Core technology is used to process the raw CT data, which is notoriously difficult to analyze. Raw CT data is stored in 3-foot sections, with each section consisting of many thousands of individual slice images, each approximately 0.2 mm thick.
Continue reading

Canopy Geoscience: Python-Based Analysis Environment for Geoscience Data

Today we officially release Canopy Geoscience 0.10.0, our Python-based analysis environment for geoscience data.

Canopy Geoscience integrates data I/O, visualization, and programming, in an easy-to-use environment. Canopy Geoscience is tightly integrated with Enthought Canopy’s Python distribution, giving you access to hundreds of high-performance scientific libraries to extract information from your data.


The Canopy Geoscience environment allows easy exploration of your data in 2D or 3D. The data is accessible from the embedded Python environment, and can be analyzed, modified, and immediately visualized with simple Python commands.

Feature and capability highlights for Canopy Geoscience version 0.10.0 include:

  • Read and write common geoscience data formats (LAS, SEG-Y, Eclipse, …)
  • 3D and 2D visualization tools
  • Well log visualization
  • Conversion from depth to time domain is integrated in the visualization tools using flexible depth-time models
  • Integrated IPython shell to programmatically access and analyze the data
  • Integrated with the Canopy editor for scripting
  • Extensible with custom-made plugins to fit your personal workflow

Contact us to learn more about Canopy Geoscience!
Continue reading