Category Archives: Pandas

Webinar: An Exclusive Peek “Under the Hood” of Enthought Training and the Pandas Mastery Workshop

pandas-mastery-workshop-webinar-email-header-900x300comp

Enthought’s Pandas Mastery Workshop is designed to accelerate the development of skill and confidence with Python’s Pandas data analysis package — in just three days, you’ll look like an old pro! This course was created ground up by our training experts based on insights from the science of human learning, as well as what we’ve learned from over a decade of extensive practical experience of teaching thousands of scientists, engineers, and analysts to use Python effectively in their everyday work.

In this webinar, we’ll give you the key information and insight you need to evaluate whether the Pandas Mastery Workshop is the right solution to advance your data analysis skills in Python, including:

  • Who will benefit most from the course
  • A guided tour through the course topics
  • What skills you’ll take away from the course, how the instructional design supports that
  • What the experience is like, and why it is different from other training alternatives (with a sneak peek at actual course materials)
  • What previous workshop attendees say about the course

Date and Registration Info:
January 26, 2017, 11-11:45 AM CT
Register (if you can’t attend, register and we’ll be happy to send you a recording of the session)

Register


michael_connell-enthought-vp-trainingPresenter: Dr. Michael Connell, VP, Enthought Training Solutions

Ed.D, Education, Harvard University
M.S., Electrical Engineering and Computer Science, MIT


Why Focus on Pandas:

Python has been identified as the most popular coding language for five years in a row. One reason for its popularity, especially among data analysts, data scientists, engineers, and scientists across diverse industries, is its extensive library of powerful tools for data manipulation, analysis, and modeling. For anyone working with tabular data (perhaps currently using a tool like Excel, R, or SAS), Pandas is the go-to tool in Python that not only makes the bulk of your work easier and more intuitive, but also provides seamless access to more specialized packages like statsmodels (statistics), scikit-learn (machine learning), and matplotlib (data visualization). Anyone looking for an entry point into the general scientific and analytic Python ecosystem should start with Pandas!

Who Should Attend: 

Whether you’re a training or learning development coordinator who wants to learn more about our training options and approach, a member of a technical team considering group training, or an individual looking for engaging and effective Pandas training, this webinar will help you quickly evaluate how the Pandas Mastery Workshop can meet your needs.


Additional Resources

Upcoming Open Pandas Mastery Workshop Sessions:

London, UK, Feb 22-24
Chicago, IL, Mar 8-10
Albuquerque, NM, Apr 3-5
Washington, DC May 10-12
Los Alamos, NM, May 22-24
New York City, NY, Jun 7-9

Learn More

Have a group interested in training? We specialize in group and corporate training. Contact us or call 512.536.1057.

Download Enthought’s Pandas Cheat Sheets

Loading Data Into a Pandas DataFrame: The Hard Way, and The Easy Way

Data exploration, manipulation, and visualization start with loading data, be it from files or from a URL. Pandas has become the go-to library for all things data analysis in Python, but if your intention is to jump straight into data exploration and manipulation, the Canopy Data Import Tool can help, instead of having to learn the details of programming with the Pandas library.

The Data Import Tool leverages the power of Pandas while providing an interactive UI, allowing you to visually explore and experiment with the DataFrame (the Pandas equivalent of a spreadsheet or a SQL table), without having to know the details of the Pandas-specific function calls and arguments. The Data Import Tool keeps track of all of the changes you make (in the form of Python code). That way, when you are done finding the right workflow for your data set, the Tool has a record of the series of actions you performed on the DataFrame, and you can apply them to future data sets for even faster data wrangling in the future.

At the same time, the Tool can help you pick up how to use the Pandas library, while still getting work done. For every action you perform in the graphical interface, the Tool generates the appropriate Pandas/Python code, allowing you to see and relate the tasks to the corresponding Pandas code.

With the Data Import Tool, loading data is as simple as choosing a file or pasting a URL. If a file is chosen, it automatically determines the format of the file, whether or not the file is compressed, and intelligently loads the contents of the file into a Pandas DataFrame. It does so while taking into account various possibilities that often throw a monkey wrench into initial data loading: that the file might contain lines that are comments, it might contain a header row, the values in different columns could be of different types e.g. DateTime or Boolean, and many more possibilities as well.

Importing files or data into Pandas with the Canopy Data Import Tool

The Data Import Tool makes loading data into a Pandas DataFrame as simple as choosing a file or pasting a URL.

A Glimpse into Loading Data into Pandas DataFrames (The Hard Way)

The following 4 “inconvenience” examples show typical problems (and the manual solutions) that might arise if you are writing Pandas code to load data, which are automatically solved by the Data Import Tool, saving you time and frustration, and allowing you to get to the important work of data analysis more quickly.

Continue reading

Using the Canopy Data Import Tool to Speed Cleaning and Transformation of Data & New Release Features

Enthought Canopy Data Import Tool

Download Canopy to try the Data Import Tool

In November 2016, we released Version 1.0.6 of the Data Import Tool (DIT), an addition to the Canopy data analysis environment. With the Data Import Tool, you can quickly import structured data files as Pandas DataFrames, clean and manipulate the data using a graphical interface, and create reusable Python scripts to speed future data wrangling.

For example, the Data Import Tool lets you delete rows and columns containing Null values or replace the Null values in the DataFrame with a specific value. It also allows you to create new columns from existing ones. All operations are logged and are reversible in the Data Import Tool so you can experiment with various workflows with safeguards against errors or forgetting steps.


What’s New in the Data Import Tool November 2016 Release

Pandas 0.19 support, re-usable templates for data munging, and more.

Over the last couple of releases, we added a number of new features and enhanced a number of existing ones. A few notable changes are:

  1. The Data Import Tool now supports the recently released Pandas version 0.19.0. With this update, the Tool now supports Pandas versions 0.16 through 0.19.
  2. The Data Import Tool now allows you to delete empty columns in the DataFrame, similar to existing option to delete empty rows.
  3. Tdelete-empty-columnshe Data Import Tool allows you to choose how to delete rows or columns containing Null values: “Any” or “All” methods are available.
  4. autosaved_scripts

    The Data Import Tool automatically generates a corresponding Python script for data manipulations performed in the GUI and saves it in your home directory re-use in future data wrangling.

    Every time you successfully import a DataFrame, the Data Import Tool automatically saves a generated Python script in your home directory. This way, you can easily review and reproduce your earlier work.

  5. The Data Import Tool generates a Template with every successful import. A Template is a file that contains all of the commands or actions you performed on the DataFrame and a unique Template file is generated for every unique data file. With this feature, when you load a data file, if a Template file exists corresponding to the data file, the Data Import Tool will automatically perform the operations you performed the last time. This way, you can save progress on a data file and resume your work.

Along with the feature additions discussed above, based on continued user feedback, we implemented a number of UI/UX improvements and bug fixes in this release. For a complete list of changes introduced in Version 1.0.6 of the Data Import Tool, please refer to the Release Notes page in the Tool’s documentation.

 

 


Example Use Case: Using the Data Import Tool to Speed Data Cleaning and Transformation

Now let’s take a look at how the Data Import Tool can be used to speed up the process of cleaning up and transforming data sets. As an example data set, let’s take a look at the Employee Compensation data from the city of San Francisco.

NOTE: You can follow the example step-by-step by downloading Canopy and starting a free 7 day trial of the data import tool

Step 1: Load data into the Data Import Tool

import-data-canopy-menuFirst we’ll download the data as a .csv file from the San Francisco Government data website, then open it from File -> Import Data -> From File… menu item in the Canopy Editor (see screenshot at right).

After loading the file, you should see the DataFrame below in the Data Import Tool:
data-frame-view
Continue reading

Webinar: Fast Forward Through the “Dirty Work” of Data Analysis: New Python Data Import and Manipulation Tool Makes Short Work of Data Munging Drudgery

Python Import & Manipulation Tool Intro Webinar

Whether you are a data scientist, quantitative analyst, or an engineer, or if you are evaluating consumer purchase behavior, stock portfolios, or design simulation results, your data analysis workflow probably looks a lot like this:

Acquire > Wrangle > Analyze and Model > Share and Refine > Publish

The problem is that often 50 to 80 percent of time is spent wading through the tedium of the first two stepsacquiring and wrangling data – before even getting to the real work of analysis and insight. (See The New York Times, For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights)

WHAT YOU’LL LEARN:

Enthought Canopy Data Import Tool

Try the Data Import Tool with your own data. Download here.

In this webinar we’ll demonstrate how the new Canopy Data Import Tool can significantly reduce the time you spend on data analysis “dirty work,” by helping you:

  • Load various data file types and URLs containing embedded tables into Pandas DataFrames
  • Perform common data munging tasks that improve raw data
  • Handle complicated and/or messy data
  • Extend the work done with the tool to other data files

WEBINAR RECORDING:
Continue reading

Just Released: PyXLL v 3.0 (Python in Excel). New Real Time Data Stream Capabilities, Excel Ribbon Integration, and More.

Download a free 30 day trial of PyXLL and try it with your own data.

Since PyXLL was first released back in 2010 it has grown hugely in popularity and is used by businesses in many different sectors.

The original motivation for PyXLL was to be able to use all the best bits of Excel combined with a modern programming language for scientific computing, in a way that fits naturally and works seamlessly.

Since the beginning, PyXLL development focused on the things that really matter for creating useful real-world spreadsheets; worksheet functions and macro functions. Without these all you can do is just drive Excel by poking numbers in and reading numbers out. At the time the first version of PyXLL was released, that was already possibly using COM, and so providing yet another API to do the same was seen as little value add. On the other hand, being able to write functions and macros in Python opens up possibilities that previously were only available in VBA or writing complicated Excel Addins in C++ or C#.

With the release of PyXLL 3, integrating your Python code into Excel has become more enjoyable than ever. Many things have been simplified to get you up and running faster, and there are some major new features to explore.

  • If you are new to PyXLL have a look at the Getting Started section of the documentation.
  • All the features of PyXLL, including these new ones, can be found in the Documentation

NEW FEATURES IN PYXLL V. 3.0

1. Ribbon Customization

Screen Shot 2016-02-29 at 15.57.12

Ever wanted to write an add-in that uses the Excel ribbon interface? Previously the only way to do this was to write a COM add-in, which requires a lot of knowledge, skill and perseverance! Now you can do it with PyXLL by defining your ribbon as an XML document and adding it to your PyXLL config. All the callbacks between Excel and your Python code are handled for you.

Continue reading

Plotting in Excel with PyXLL and Matplotlib

Author: Tony Roberts, creator of PyXLL, a Python library that makes it possible to write add-ins for Microsoft Excel in Python. Download a FREE 30 day trial of PyXLL here.


Plotting in Excel with PyXLL and MatplotlibPython has a broad range of tools for data analysis and visualization. While Excel is able to produce various types of plots, sometimes it’s either not quite good enough or it’s just preferable to use matplotlib.

Users already familiar with matplotlib will be aware that when showing a plot as part of a Python script the script stops while a plot is shown and continues once the user has closed it. When doing the same in an IPython console when a plot is shown control returns to the IPython prompt immediately, which is useful for interactive development.

Something that has been asked a couple of times is how to use matplotlib within Excel using PyXLL. As matplotlib is just a Python package like any other it can be imported and used in the same way as from any Python script. The difficulty is that when showing a plot the call to matplotlib blocks and so control isn’t returned to Excel until the user closes the window.

This blog shows how to plot data from Excel using matplotlib and PyXLL so that Excel can continue to be used while a plot window is active, and so that same window can be updated whenever the data in Excel is updated. Continue reading