Tag Archives: Pandas

Webinar: Python for Data Science: A Tour of Enthought’s Professional Training Course

DataView Python for Data Science Webinar
What: A guided walkthrough and Q&A about Enthought’s technical training course “Python for Data Science and Machine Learning” with VP of Training Solutions, Dr. Michael Connell

Who Should Watch: individuals, team leaders, and learning & development coordinators who are looking to better understand the options to increase professional capabilities in Python for data science and machine learning applications

VIEW


Enthought’s Python for Data Science training course is designed to accelerate the development of skill and confidence in using Python’s core data science tools — including the standard Python language, the fast array programming package NumPy, and the Pandas data analysis package, as well as tools for database access (DBAPI2, SQLAlchemy), machine learning (scikit-learn), and visual exploration (Matplotlib, Seaborn).

In this webinar, we give you the key information and insight you need to evaluate whether Enthought’s Python for Data Science course is the right solution to advance your professional data science skills in Python, including:

  • Who will benefit most from the course
  • A guided tour through the course topics
  • What skills you’ll take away from the course, how the instructional design supports that
  • What the experience is like, and why it is different from other training alternatives (with a sneak peek at actual course materials)
  • What previous course attendees say about the course

VIEW


michael_connell-enthought-vp-trainingPresenter: Dr. Michael Connell, VP, Enthought Training Solutions

Ed.D, Education, Harvard University
M.S., Electrical Engineering and Computer Science, MIT


Considering Moving to Python for Data Science?

Then Enthought’s Python for Data Science training course is definitely for you! This class has been particularly appealing to people who have been using other tools like R or SAS (or even Excel) for their data science work and want to start applying their analytic skills using the Python toolset.  And it’s no wonder — Python has been identified as the most popular coding language for five years in a row for good reason.

One reason for Python’s broad popularity across a range of disciplines is its efficiency and ease-of-use. Many people consider Python more fun to work in than other languages (and we agree!). Another reason for its popularity among data analysts and data scientists in particular is Python’s extensive (and growing) open source library of powerful tools for preparing, visualizing, analyzing, and modeling data.

Python is also an extraordinarily comprehensive toolset – it supports everything from interactive analysis to automation to software engineering to web app development within a single language and plays very well with other languages like C/C++ or FORTRAN so you can continue leveraging your existing code libraries written in those other languages.

Many organizations are moving to Python so they can consolidate all of their technical work streams under a single comprehensive toolset. In the first part of this class we’ll give you the fundamentals you need to switch from another language to Python and then we cover the core tools that will enable you to do in Python what you were doing with other tools, only faster!

Additional Resources

Upcoming Open Python for Data Science Sessions:
Austin, TX, June 12-16, 2017
San Jose, CA, July 17-21, 2017Learn MoreHave a group interested in training? We specialize in group and corporate training. Contact us or call 512.536.1057.
Download Enthought’s Machine Learning with Python’s Scikit-Learn Cheat Sheets
Enthought's Machine Learning with Python Cheat Sheets
Download Enthought’s Pandas Cheat SheetsEnthought's Pandas Cheat Sheets

Handling Missing Values in Pandas DataFrames: the Hard Way, and the Easy Way

This is the second blog in a series. See the first blog here: Loading Data Into a Pandas DataFrame: The Hard Way, and The Easy Way

No dataset is perfect and most datasets that we have to deal with on a day-to-day basis have values missing, often represented by “NA” or “NaN”. One of the reasons why the Pandas library is as popular as it is in the data science community is because of its capabilities in handling data that contains NaN values.

But spending time looking up the relevant Pandas commands might be cumbersome when you are exploring raw data or prototyping your data analysis pipeline. This is one of the places where the Canopy Data Import Tool helps make data munging faster and easier, by simplifying the task of identifying missing values in your raw data and removing/replacing them.

Why are missing values a problem you ask? We can answer that question in the context of machine learning. scikit-learn and TensorFlow are popular and widely used libraries for machine learning in Python. Both of them caution the user about missing values in their datasets. Various machine learning algorithms expect all the input values to be numerical and to hold meaning. Both of the libraries suggest removing rows and/or columns that contain missing values.

If removing the missing values is not an option, given the size of your dataset, then they suggest replacing the missing values. The scikit-learn library provides an Imputer class, which can be used to replace missing values. See the sci-kit learn documentation for an example of how the Imputer class is used. Similarly, the decode_csv function in the TensorFlow library can be passed a record_defaults argument, which will replace missing values in the dataset. See the TensorFlow documentation for specifics.

The Data Import Tool provides capabilities to handle missing values in your dataset because we strongly believe that discovering and handling missing values in your dataset is a part of the data import and cleaning phase and not the analysis phase of the data science process.

Digging into the specifics, here we’ll compare how you can go about handling missing values with three typical scenarios, first using the Pandas library, then contrasting with the Data Import Tool:

  1. Identifying missing values in data
  2. Replacing missing values in data, and
  3. Removing missing values from data.

Note : Pandas’ internal representation of your data is called a DataFrame. A DataFrame is simply a tabular data structure, similar to a spreadsheet or a SQL table.

Continue reading

Webinar: An Exclusive Peek “Under the Hood” of Enthought Training and the Pandas Mastery Workshop

See the webinar

Enthought’s Pandas Mastery Workshop is designed to accelerate the development of skill and confidence with Python’s Pandas data analysis package — in just three days, you’ll look like an old pro! This course was created ground up by our training experts based on insights from the science of human learning, as well as what we’ve learned from over a decade of extensive practical experience of teaching thousands of scientists, engineers, and analysts to use Python effectively in their everyday work.

In this webinar, we’ll give you the key information and insight you need to evaluate whether the Pandas Mastery Workshop is the right solution to advance your data analysis skills in Python, including:

  • Who will benefit most from the course
  • A guided tour through the course topics
  • What skills you’ll take away from the course, how the instructional design supports that
  • What the experience is like, and why it is different from other training alternatives (with a sneak peek at actual course materials)
  • What previous workshop attendees say about the course

See the Webinar


michael_connell-enthought-vp-trainingPresenter: Dr. Michael Connell, VP, Enthought Training Solutions

Ed.D, Education, Harvard University
M.S., Electrical Engineering and Computer Science, MIT


Continue reading

Loading Data Into a Pandas DataFrame: The Hard Way, and The Easy Way

This is the first blog in a series. See the second blog here: Handling Missing Values in Pandas DataFrames: the Hard Way, and the Easy Way

Data exploration, manipulation, and visualization start with loading data, be it from files or from a URL. Pandas has become the go-to library for all things data analysis in Python, but if your intention is to jump straight into data exploration and manipulation, the Canopy Data Import Tool can help, instead of having to learn the details of programming with the Pandas library.

The Data Import Tool leverages the power of Pandas while providing an interactive UI, allowing you to visually explore and experiment with the DataFrame (the Pandas equivalent of a spreadsheet or a SQL table), without having to know the details of the Pandas-specific function calls and arguments. The Data Import Tool keeps track of all of the changes you make (in the form of Python code). That way, when you are done finding the right workflow for your data set, the Tool has a record of the series of actions you performed on the DataFrame, and you can apply them to future data sets for even faster data wrangling in the future.

At the same time, the Tool can help you pick up how to use the Pandas library, while still getting work done. For every action you perform in the graphical interface, the Tool generates the appropriate Pandas/Python code, allowing you to see and relate the tasks to the corresponding Pandas code.

With the Data Import Tool, loading data is as simple as choosing a file or pasting a URL. If a file is chosen, it automatically determines the format of the file, whether or not the file is compressed, and intelligently loads the contents of the file into a Pandas DataFrame. It does so while taking into account various possibilities that often throw a monkey wrench into initial data loading: that the file might contain lines that are comments, it might contain a header row, the values in different columns could be of different types e.g. DateTime or Boolean, and many more possibilities as well.

Importing files or data into Pandas with the Canopy Data Import Tool

The Data Import Tool makes loading data into a Pandas DataFrame as simple as choosing a file or pasting a URL.

A Glimpse into Loading Data into Pandas DataFrames (The Hard Way)

The following 4 “inconvenience” examples show typical problems (and the manual solutions) that might arise if you are writing Pandas code to load data, which are automatically solved by the Data Import Tool, saving you time and frustration, and allowing you to get to the important work of data analysis more quickly.

Continue reading

Webinar: Fast Forward Through the “Dirty Work” of Data Analysis: New Python Data Import and Manipulation Tool Makes Short Work of Data Munging Drudgery

Python Import & Manipulation Tool Intro Webinar

Whether you are a data scientist, quantitative analyst, or an engineer, or if you are evaluating consumer purchase behavior, stock portfolios, or design simulation results, your data analysis workflow probably looks a lot like this:

Acquire > Wrangle > Analyze and Model > Share and Refine > Publish

The problem is that often 50 to 80 percent of time is spent wading through the tedium of the first two stepsacquiring and wrangling data – before even getting to the real work of analysis and insight. (See The New York Times, For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights)

WHAT YOU’LL LEARN:

Enthought Canopy Data Import Tool

Try the Data Import Tool with your own data. Download here.

In this webinar we’ll demonstrate how the new Canopy Data Import Tool can significantly reduce the time you spend on data analysis “dirty work,” by helping you:

  • Load various data file types and URLs containing embedded tables into Pandas DataFrames
  • Perform common data munging tasks that improve raw data
  • Handle complicated and/or messy data
  • Extend the work done with the tool to other data files

WEBINAR RECORDING:
Continue reading