Tag Archives: CSV

Loading Data Into a Pandas DataFrame: The Hard Way, and The Easy Way

Data exploration, manipulation, and visualization start with loading data, be it from files or from a URL. Pandas has become the go-to library for all things data analysis in Python, but if your intention is to jump straight into data exploration and manipulation, the Canopy Data Import Tool can help, instead of having to learn the details of programming with the Pandas library.

The Data Import Tool leverages the power of Pandas while providing an interactive UI, allowing you to visually explore and experiment with the DataFrame (the Pandas equivalent of a spreadsheet or a SQL table), without having to know the details of the Pandas-specific function calls and arguments. The Data Import Tool keeps track of all of the changes you make (in the form of Python code). That way, when you are done finding the right workflow for your data set, the Tool has a record of the series of actions you performed on the DataFrame, and you can apply them to future data sets for even faster data wrangling in the future.

At the same time, the Tool can help you pick up how to use the Pandas library, while still getting work done. For every action you perform in the graphical interface, the Tool generates the appropriate Pandas/Python code, allowing you to see and relate the tasks to the corresponding Pandas code.

With the Data Import Tool, loading data is as simple as choosing a file or pasting a URL. If a file is chosen, it automatically determines the format of the file, whether or not the file is compressed, and intelligently loads the contents of the file into a Pandas DataFrame. It does so while taking into account various possibilities that often throw a monkey wrench into initial data loading: that the file might contain lines that are comments, it might contain a header row, the values in different columns could be of different types e.g. DateTime or Boolean, and many more possibilities as well.

Importing files or data into Pandas with the Canopy Data Import Tool

The Data Import Tool makes loading data into a Pandas DataFrame as simple as choosing a file or pasting a URL.

A Glimpse into Loading Data into Pandas DataFrames (The Hard Way)

The following 4 “inconvenience” examples show typical problems (and the manual solutions) that might arise if you are writing Pandas code to load data, which are automatically solved by the Data Import Tool, saving you time and frustration, and allowing you to get to the important work of data analysis more quickly.

Continue reading

Using the Canopy Data Import Tool to Speed Cleaning and Transformation of Data & New Release Features

Enthought Canopy Data Import Tool

Download Canopy to try the Data Import Tool

In November 2016, we released Version 1.0.6 of the Data Import Tool (DIT), an addition to the Canopy data analysis environment. With the Data Import Tool, you can quickly import structured data files as Pandas DataFrames, clean and manipulate the data using a graphical interface, and create reusable Python scripts to speed future data wrangling.

For example, the Data Import Tool lets you delete rows and columns containing Null values or replace the Null values in the DataFrame with a specific value. It also allows you to create new columns from existing ones. All operations are logged and are reversible in the Data Import Tool so you can experiment with various workflows with safeguards against errors or forgetting steps.


What’s New in the Data Import Tool November 2016 Release

Pandas 0.19 support, re-usable templates for data munging, and more.

Over the last couple of releases, we added a number of new features and enhanced a number of existing ones. A few notable changes are:

  1. The Data Import Tool now supports the recently released Pandas version 0.19.0. With this update, the Tool now supports Pandas versions 0.16 through 0.19.
  2. The Data Import Tool now allows you to delete empty columns in the DataFrame, similar to existing option to delete empty rows.
  3. Tdelete-empty-columnshe Data Import Tool allows you to choose how to delete rows or columns containing Null values: “Any” or “All” methods are available.
  4. autosaved_scripts

    The Data Import Tool automatically generates a corresponding Python script for data manipulations performed in the GUI and saves it in your home directory re-use in future data wrangling.

    Every time you successfully import a DataFrame, the Data Import Tool automatically saves a generated Python script in your home directory. This way, you can easily review and reproduce your earlier work.

  5. The Data Import Tool generates a Template with every successful import. A Template is a file that contains all of the commands or actions you performed on the DataFrame and a unique Template file is generated for every unique data file. With this feature, when you load a data file, if a Template file exists corresponding to the data file, the Data Import Tool will automatically perform the operations you performed the last time. This way, you can save progress on a data file and resume your work.

Along with the feature additions discussed above, based on continued user feedback, we implemented a number of UI/UX improvements and bug fixes in this release. For a complete list of changes introduced in Version 1.0.6 of the Data Import Tool, please refer to the Release Notes page in the Tool’s documentation.

 

 


Example Use Case: Using the Data Import Tool to Speed Data Cleaning and Transformation

Now let’s take a look at how the Data Import Tool can be used to speed up the process of cleaning up and transforming data sets. As an example data set, let’s take a look at the Employee Compensation data from the city of San Francisco.

NOTE: You can follow the example step-by-step by downloading Canopy and starting a free 7 day trial of the data import tool

Step 1: Load data into the Data Import Tool

import-data-canopy-menuFirst we’ll download the data as a .csv file from the San Francisco Government data website, then open it from File -> Import Data -> From File… menu item in the Canopy Editor (see screenshot at right).

After loading the file, you should see the DataFrame below in the Data Import Tool:
data-frame-view
Continue reading