
Cheat Sheets: Pandas, the Python Data Analysis Library

Download all 8 Pandas Cheat Sheets

Learn more about the Python for Data Analysis and Pandas Mastery Workshop training courses

Pandas (the Python Data Analysis library) provides a powerful and comprehensive toolset for working with data. Fundamentally, Pandas provides a data structure, the DataFrame, that closely matches real-world data such as experimental results, SQL tables, and Excel spreadsheets, and that no other mainstream Python package provides. In addition, it includes tools for reading and writing diverse files, data cleaning and reshaping, analysis and modeling, and visualization. Using Pandas effectively can give you super powers, regardless of whether you’re working in data science, finance, neuroscience, economics, advertising, web analytics, statistics, social science, or engineering.

However, learning Pandas can be a daunting task because the API is so rich and large. This is why we created a set of cheat sheets built around the data analysis workflow illustrated below. Each cheat sheet focuses on a given task. It shows you the 20% of functions you will be using 80% of the time, accompanied by simple and clear illustrations of the different concepts. Use them to speed up your learning, or as a quick reference to refresh your mind.

Here’s the summary of the content of each cheat sheet:

  1. Reading and Writing Data with Pandas: This cheat sheet presents common usage patterns when reading data from text files with read_table, from Excel documents with read_excel, from databases with read_sql, or when scraping web pages with read_html. It also introduces how to write data to disk as text files, into an HDF5 file, or into a database.
  2. Pandas Data Structures: Series and DataFrames: It presents the two main data structures, the DataFrame and the Series. It explains how to think about them in terms of common Python data structures and how to create them. It gives guidelines about how to select subsets of rows and columns, with clear explanations of the difference between label-based indexing, with .loc, and position-based indexing, with .iloc.
  3. Plotting with Series and DataFrames: This cheat sheet presents some of the most common kinds of plots together with their arguments. It also explains the relationship between Pandas and matplotlib and how to use them effectively. It highlights the similarities and differences of plotting data stored in Series or DataFrames.
  4. Computation with Series and DataFrames: This one codifies the behavior of DataFrames and Series as following 3 rules: alignment first, element-by-element mathematical operations, and column-based reduction operations. It covers the built-in methods for most common statistical operations, such as mean or sum. It also covers how missing values are handled by Pandas.
  5. Manipulating Dates and Times Using Pandas: The first part of this cheat sheet describes how to create and manipulate time series data, one of Pandas’ most celebrated features. Having a Series or DataFrame with a DatetimeIndex allows for easy time-based indexing and slicing, as well as for powerful resampling and data alignment. The second part covers “vectorized” string operations, which is the ability to apply string transformations on each element of a column, while automatically excluding missing values.
  6. Combining Pandas DataFrames: The sixth cheat sheet presents the tools for combining Series and DataFrames together, with SQL-type joins and concatenation. It then goes on to explain how to clean data with missing values, using different strategies to locate, remove, or replace them.
  7. Split/Apply/Combine with DataFrames: “Group by” operations involve splitting the data based on some criteria, applying a function to each group to aggregate, transform, or filter them and then combining the results. It’s an incredibly powerful and expressive tool. The cheat sheet also highlights the similarity between “group by” operations and window functions, such as resample, rolling and ewm (exponentially weighted functions).
  8. Reshaping Pandas DataFrames and Pivot Tables: The last cheat sheet introduces the concept of “tidy data”, where each observation, or sample, is a row, and each variable is a column. Tidy data is the optimal layout when working with Pandas. It illustrates various tools, such as stack, unstack, melt, and pivot_table, to reshape data into a tidy form or to a “wide” form.
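
A few of the ideas summarized in the list above can be sketched in a handful of lines. This is a minimal illustration, not an excerpt from the cheat sheets: the DataFrame, its column names, and its values are invented here. It shows label-based versus position-based selection, alignment before arithmetic, a "group by" aggregation, and a round trip between a "wide" and a tidy layout.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Austin", "Austin", "Houston", "Houston"],
        "year": [2016, 2017, 2016, 2017],
        "sales": [10.0, 12.0, 7.0, 9.0],
    },
    index=["a", "b", "c", "d"],
)

# Label-based vs. position-based selection (cheat sheet 2).
by_label = df.loc["b", "sales"]   # row labelled "b", column "sales"
by_position = df.iloc[1, 2]       # second row, third column

# Alignment first, then element-by-element arithmetic (cheat sheet 4).
s1 = pd.Series([1.0, 2.0], index=["x", "y"])
s2 = pd.Series([10.0, 20.0], index=["y", "z"])
aligned_sum = s1 + s2             # only "y" is in both; "x" and "z" become NaN

# Split/apply/combine: mean sales per city (cheat sheet 7).
mean_by_city = df.groupby("city")["sales"].mean()

# Reshaping: tidy -> wide with pivot_table, then wide -> tidy with melt (cheat sheet 8).
wide = df.pivot_table(index="city", columns="year", values="sales")
tidy = wide.reset_index().melt(id_vars="city", var_name="year", value_name="sales")

print(by_label, by_position)
print(aligned_sum)
print(mean_by_city)
print(wide)
print(tidy)
```

Both selections return the same cell here because the row labelled "b" happens to be the second row; with a different index, .loc and .iloc would generally disagree.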

Download all 8 Pandas Cheat Sheets

Data Analysis Workflow

Ready to accelerate your skills with Pandas?

Enthought’s Pandas Mastery Workshop (for experienced Python users) and Python for Data Analysis (for those newer to Python) classes are ideal for those who work heavily with data. Contact us to learn more about onsite corporate or open class sessions.

 

Webinar: Machine Learning Mastery Workshop: An Exclusive Peek “Under the Hood” of Enthought Training

What: A guided walkthrough and live Q&A about Enthought’s new “Machine Learning Mastery Workshop” training course.

Who Should Watch: If predictive modeling and analytics would be valuable in your work, come to the webinar to find out what all the fuss is about and what there is to know. Whether you are looking to get started with machine learning, interested in refining your machine learning skills, or want to transfer your skills from another toolset to Python, come to the webinar to find out if Enthought’s highly interactive, expertly taught Machine Learning Mastery Workshop might be a good fit for accelerating your development!

View


Why Has Machine Learning Become So Popular?

Artificial Intelligence and Machine Learning are defining features of the 21st century and are quickly becoming a key factor in gaining and maintaining competitive advantage in every industry that incorporates them. Why is machine learning so beneficial? Because it provides a fast and flexible way to build models that can surface signal, find patterns, and predict future behavior. These powerful models are used for:

  • Forecasting supply chain availability
  • Clustering product defects for QA
  • Anticipating movements in financial markets
  • Predicting chemical tolerances
  • Optimizing the placement of advertisements
  • Managing process engineering
  • Modeling reservoir production
  • and much more.

In response to growing demand for Machine Learning expertise, Enthought has developed an intensive 3-day guided practicum to bring you up to speed quickly on key concepts and skills in this exciting realm. Join us in this webinar for an in-depth overview of Enthought’s Machine Learning Mastery Workshop — a training course designed to accelerate the development of intuition, skill, and confidence in applying machine learning methods to solve real-world problems.

In the webinar we’ll describe how Enthought’s training course combines conceptual knowledge of machine learning models with intensive practice applying them to real-world data. The goal is to build skill with Python’s machine learning tools, such as the scikit-learn package, for making predictions about complicated phenomena from the information contained in numerical data, natural language, 2D images, and discrete categories.

The hands-on, interactive course was created from the ground up by our training experts to enable you to develop transferable skills in Machine Learning that you can apply back at work the next day.

In this webinar, we’ll give you the key information and insight you need to quickly evaluate whether Enthought’s Machine Learning Mastery Workshop course is the right solution for you to build skills in using Python for advanced analytics, including:

  • Who will benefit most from the course, and what pre-requisite knowledge is required
  • What topics the course covers – a guided tour
  • What new knowledge, skills, and capabilities you’ll take away, and how the course design supports those outcomes
  • What the (highly interactive) learning experience is like
  • Why this course is different from other training alternatives (with a preview of actual course materials!)
  • What previous workshop attendees say about our courses

View


Presenter: Dr. Dillon Niederhut, Enthought Training Instructor

Ph.D., University of California at Berkeley

 


 

Additional Resources

Upcoming Open Machine Learning Mastery Workshop Sessions:

Austin, TX, Feb. 21-23, 2017
Houston, TX, Apr. 18-20, 2018
Cambridge, UK, May 9-11, 2018

Upcoming Open Python for Data Science Sessions:

New York City, NY, Dec. 4-8, 2018
London, UK, Feb. 19-23, 2018
Washington, DC, Apr. 23-27, 2018
San Jose, CA, May 14-18, 2018

Have a group interested in training? We specialize in group and corporate training. Contact us or call 512.536.1057.

Download Enthought’s Machine Learning with Python’s Scikit-Learn Cheat Sheets


Additional Webinars in the Training Series:

Python for MATLAB Users: What You Need to Know

Python for Scientists and Engineers: A Tour of Enthought’s Professional Technical Training Course

Python for Data Science: A Tour of Enthought’s Professional Technical Training Course

Python for Professionals: The Complete Guide to Enthought’s Technical Training Courses

An Exclusive Peek “Under the Hood” of Enthought Training and the Pandas Mastery Workshop

Enthought at the 2017 Society of Exploration Geophysicists (SEG) Conference

2017 will be Enthought’s 11th year at the SEG (Society of Exploration Geophysicists) Annual Meeting, and we couldn’t be more excited to be at the leading edge of the digital transformation in oil & gas being driven by the capabilities provided by machine learning and artificial intelligence.

Now in its 87th year, the SEG Annual Meeting will be held in Houston, Texas on September 24-27, 2017 at the George R. Brown Convention Center. It will be the first large conference to take place in Houston since Hurricane Harvey and its devastating floods, and we’re so pleased to be a small part of getting Houston back “open for business.”

Pre-Event Kickoff: The Machine Learning Geophysics Hackathon

We had such a great experience at the EAGE Subsurface Hackathon in Paris in June that when we heard our friends at Agile Geoscience were planning a machine learning in geophysics hackathon for the US, we had to join! Brendon Hall, Enthought’s Energy Solutions Group Director, will be there as a participant and coach, and Enthought CEO Eric Jones will be on the judging panel.

Come Meet Us on the SEG Expo Floor & Learn About Our AI-Enabled Solutions for Oil & Gas

Presentations in Enthought Booth #318 (just to the left of the main entrance, before the main aisle):

  • Monday, Sept 25, 12-12:45 PM: Lessons Learned From the Front Line: Moving AI From Research to Application
  • Tues, Sept 26, 1-1:45 PM: Canopy Geoscience: Building Innovative, AI-Enabled Geoscience Applications
  • Wed, Sept 27, 12-12:45 PM: Applying Artificial Intelligence to CT, Photo, and Well Log Analysis with Virtual Core

Hart Energy’s E&P Magazine Features Canopy Geoscience

Canopy Geoscience, Enthought’s cross-domain AI platform for oil & gas, was featured in the September 2017 edition of E&P magazine. See the coverage in the online SEG Technology Showcase, in the September print edition, or in the online E&P Flipbook.


Enthought's Canopy Geoscience featured in E&P's September 2017 edition

Webinar: Python for MATLAB Users: What You Need To Know

What:  A guided walkthrough and Q&A about how to migrate from MATLAB® to Python with Enthought Lead Instructor, Dr. Alexandre Chabot-Leclerc.

Who Should Watch: MATLAB® users who are considering migrating to Python, either partially or completely.

View the Webinar


Python has a lot of momentum. Many high-profile projects use it and more are migrating to it all the time. Why? One reason is that Python is free, but more importantly, it is because Python has a thriving ecosystem of packages that allow developers to work faster and more efficiently. They can go from prototyping to production to scale on hardware ranging from a Raspberry Pi (or maybe a microcontroller) to a cluster, all using the same language. A large part of Python’s growth is driven by its excellent support for work in the fields of science, engineering, machine learning, and data science.

You and your organization might be thinking about migrating from MATLAB to Python to get access to the ecosystem and increase your productivity, but you might also have some outstanding questions and concerns, such as: How do I get started? Will any of my knowledge transfer? How different are Python and MATLAB? How long will it take me to become proficient? Is it too big of a shift? Can I transition gradually or do I have to do it all at once? These are all excellent questions.

We know people put a lot of thought into the tools they select and that changing platforms is a big deal. We created this webinar to help you make the right choice.

In this webinar, we’ll give you the key information and insight you need to quickly evaluate whether Python is the right choice for you, your team, and your organization, including:

  • How to get started
  • What you need in order to replicate the MATLAB experience
  • Important conceptual differences between MATLAB and Python
  • Important similarities between MATLAB and Python: What MATLAB knowledge will transfer
  • Strategies for converting existing MATLAB code to Python
  • How to accelerate your transition

View the Webinar


Presenter: Dr. Alexandre Chabot-Leclerc, Enthought Lead Instructor

Ph.D., Electrical Engineering, Technical University of Denmark

 


Python for Scientists & Engineers Training: The Quick Start Approach to Turbocharging Your Work

If you are tired of running repeatable processes manually and want to (semi-)automate them to increase your throughput and decrease pilot error, or you want to spend less time debugging code and more time writing clean code in the first place, or you are simply tired of using a multitude of tools and languages for different parts of a task and want to replace them with one comprehensive language, then Enthought’s Python for Scientists and Engineers is definitely for you!

This class has been particularly appealing to people who have been using other tools like MATLAB or even Excel for their computational work and want to start applying their skills using the Python toolset.  And it’s no wonder — Python has been identified as the most popular coding language for five years in a row for good reason.

One reason for its broad popularity is its efficiency and ease-of-use. Many people consider Python more fun to work in than other languages (and we agree!). Another reason for its popularity among scientists, engineers, and analysts in particular is Python’s support for rapid application development and extensive (and growing) open source library of powerful tools for preparing, visualizing, analyzing, and modeling data as well as simulation.

Python is also an extraordinarily comprehensive toolset – it supports everything from interactive analysis to automation to software engineering to web app development within a single language and plays very well with other languages like C/C++ or FORTRAN so you can continue leveraging your existing code libraries written in those other languages.

Many organizations are moving to Python so they can consolidate all of their technical work streams under a single comprehensive toolset. In the first part of this class we’ll give you the fundamentals you need to switch from another language to Python and then we cover the core tools that will enable you to do in Python what you were doing with other tools, only faster and better!

Additional Resources

Upcoming Open Python for Scientists & Engineers Sessions:

Washington, DC, Sept 25-29
Los Alamos, NM, Oct 2-6, 2017
Cambridge, UK, Oct 16-20, 2017
San Diego, CA, Oct 30-Nov 3, 2017
Albuquerque, NM, Nov 13-17, 2017
Los Alamos, NM, Dec 4-8, 2017
Austin, TX, Dec 11-15, 2017

Have a group interested in training? We specialize in group and corporate training. Contact us or call 512.536.1057.

Learn More

Download Enthought’s MATLAB to Python White Paper

Additional Webinars in the Training Series:

Python for Scientists & Engineers: A Tour of Enthought’s Professional Technical Training Course

Python for Data Science: A Tour of Enthought’s Professional Technical Training Course

Python for Professionals: The Complete Guide to Enthought’s Technical Training Courses

An Exclusive Peek “Under the Hood” of Enthought Training and the Pandas Mastery Workshop

Download Enthought’s Machine Learning with Python’s Scikit-Learn Cheat Sheets

Webinar: A Tour of Enthought’s Latest Enterprise Python Solutions

When: Thursday, July 20, 2017, 11-11:45 AM CT (Live webcast)

What: A comprehensive overview and live demonstration of Enthought’s latest tools for Python for the enterprise with Enthought’s Chief Technical & Engineering Officer, Didrik Pinte

Who Should Attend: Python users (or those supporting Python users) who are looking for a universal solution set that is reliable and “just works”; scientists, engineers, and data science teams trying to answer the question “how can I more easily build and deploy my applications”; organizations looking for an alternative to MATLAB that is cost-effective, robust, and powerful

REGISTER  (if you can’t attend we’ll send all registrants a recording)


For over 15 years, Enthought has been empowering scientists, engineers, analysts, and data scientists to create amazing new technologies, to make new discoveries, and to do so faster and more effectively than they dreamed possible. Along the way, hand in hand with our customers in aerospace, biotechnology, finance, oil and gas, manufacturing, national laboratories, and more, we’ve continued to “build the science tools we wished we had,” and share them with the world.

For 2017, we’re pleased to announce the release of several major new products and tools, specifically designed to make Python more powerful and accessible for users like you who are building the future of science, engineering, artificial intelligence, and data analysis.

WHAT YOU’LL SEE IN THE WEBINAR

In this webinar, Enthought’s Chief Technical & Engineering Officer will share a comprehensive overview and live demonstration of Enthought’s latest products and how they provide the foundation for scientific computing and artificial intelligence applications with Python.

We’ll also walk through specific use cases so you can quickly see how Enthought’s Enterprise Python tools can impact your workflows and productivity.

REGISTER  (if you can’t attend we’ll send all registrants a recording)


Presenter: Didrik Pinte, Chief Technical & Engineering Officer, Enthought

 

 

 

Related Blogs:

Blog: Enthought Announces Canopy 2.1: A Major Milestone Release for the Python Analysis Environment and Package Distribution (June 2017)

Blog: Enthought Presents the Canopy Platform at the 2017 American Institute of Chemical Engineers (AIChE) Spring Meeting (April 2017)

Blog: New Year, New Enthought Products (Jan 2017)


What’s New in the Canopy Data Import Tool Version 1.1

New features in the Canopy Data Import Tool Version 1.1:
Support for Pandas v. 0.20, Excel / CSV export capabilities, and more

We’re pleased to announce a significant new feature release of the Canopy Data Import Tool, version 1.1. The Data Import Tool allows users to quickly and easily import CSVs and other structured text files into Pandas DataFrames through a graphical interface, manipulate the data, and create reusable Python scripts to speed future data wrangling. Here are some of the notable updates in version 1.1:

1. Support for PyQt
The Data Import Tool now supports both PyQt and PySide backends. Python 3 support will also be available shortly.

2. Exporting DataFrames to csv/xlsx file formats
We understand that data exploration and manipulation are only one part of your data analysis process, which is why the Data Import Tool now provides a way for you to save the DataFrame as a CSV/XLSX file. This way, you can share processed data with your colleagues or feed this processed file to the next step in your data analysis pipeline.

3. Column Sort Indicators
In earlier versions of the Data Import Tool, it was not obvious that clicking on the right end of a column header sorted that column. With this release, we added sort indicators on every column, which can be clicked to sort the column in ascending or descending order. And given the complex nature of the data we get, we know sorting the data based on a single column is never enough, so we also made sorting in the Data Import Tool stable (i.e., sorting preserves any existing order in the DataFrame).
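
To see why a stable sort matters for multi-column sorting, here is a minimal pandas sketch. It is not the Data Import Tool's own implementation; the DataFrame and its column names are hypothetical, and kind="mergesort" is used simply because it is a stable algorithm available in pandas' sort_values.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame(
    {
        "well": ["W2", "W1", "W2", "W1"],
        "depth": [300, 200, 100, 400],
    }
)

# Step 1: sort by the secondary key.
by_depth = df.sort_values("depth", kind="mergesort")

# Step 2: sort by the primary key with a stable algorithm.
# Rows that share the same "well" keep the depth order from step 1.
by_well_then_depth = by_depth.sort_values("well", kind="mergesort")

# The two successive stable sorts match a single two-key sort.
two_key_sort = df.sort_values(["well", "depth"])

print(by_well_then_depth)
print(by_well_then_depth.equals(two_key_sort))
```

With an unstable algorithm, the second sort would be free to scramble the depth order within each well, so sorting one column after another would not reliably build up a multi-column ordering.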


SciPy 2017 Conference to Showcase Leading Edge Developments in Scientific Computing with Python

Renowned scientists, engineers and researchers from around the world to gather July 10-16, 2017 in Austin, TX to share and collaborate to advance scientific computing tools


AUSTIN, TX – June 6, 2017 – Enthought, as Institutional Sponsor, today announced the SciPy 2017 Conference will be held July 10-16, 2017 in Austin, Texas. At this 16th annual installment of the conference, scientists, engineers, data scientists and researchers will participate in tutorials, talks and developer sprints designed to foster the continued rapid growth of the scientific Python ecosystem. This year’s attendees hail from over 25 countries and represent academia, government, national research laboratories, and industries such as aerospace, biotechnology, finance, oil and gas and more.

“Since 2001, the SciPy Conference has been a highly anticipated annual event for the scientific and analytic computing community,” states Dr. Eric Jones, CEO at Enthought and SciPy Conference co-founder. “Over the last 16 years we’ve witnessed Python emerge as the de facto open source programming language for science, engineering and analytics with widespread adoption in research and industry. The powerful tools and libraries the SciPy community has developed are used by millions of people to advance scientific inquest and innovation every day.”

Special topical themes for this year’s conference are “Artificial Intelligence and Machine Learning Applications” and the “Scientific Python (SciPy) Tool Stack.” Keynote speakers include:

  • Kathryn Huff, Assistant Professor in the Department of Nuclear, Plasma, and Radiological Engineering at the University of Illinois at Urbana-Champaign  
  • Sean Gulick, Research Professor at the Institute for Geophysics at the University of Texas at Austin
  • Gaël Varoquaux, faculty researcher in the Neurospin brain research institute at INRIA (French Institute for Research in Computer Science and Automation)

In addition to the special conference themes, there will also be over 100 talk and poster paper speakers/presenters covering eight mini-symposia tracks including: Astronomy; Biology, Biophysics, and Biostatistics; Computational Science and Numerical Techniques; Data Science; Earth, Ocean, and Geo Sciences; Materials Science and Engineering; Neuroscience; and Open Data and Reproducibility.


Enthought Receives 2017 Product of the Year Award From National Instruments LabVIEW Tools Network

Python Integration Toolkit for LabVIEW recognized for extending LabVIEW connectivity and bringing the power of Python to applications in Test, Measurement and the Industrial Internet of Things (IIoT)

AUSTIN, TX – May 24, 2017 – Enthought, a global leader in scientific and analytic computing solutions, was honored this week by National Instruments with the LabVIEW Tools Network Platform Connectivity 2017 Product of the Year Award for its Python Integration Toolkit for LabVIEW.

First released at NIWeek 2016, the Python Integration Toolkit enables fast, two-way communication between LabVIEW and Python. With seamless access to the Python ecosystem of tools, LabVIEW users are able to do more with their data than ever before. For example, using the Toolkit, a user can acquire data from test and measurement tools with LabVIEW, perform signal processing or apply machine learning algorithms in Python, display the results in LabVIEW, then share them using a Python-enabled web dashboard.


Click to see the webinar “Using Python and LabVIEW to Rapidly Solve Engineering Problems” to learn more about adding capabilities such as machine learning by extending LabVIEW applications with Python.

“Python is ideally suited for scientists and engineers due to its simple, yet powerful syntax and the availability of an extensive array of open source tools contributed by a user community from industry and R&D,” said Dr. Tim Diller, Director, IIoT Solutions Group at Enthought. “The Python Integration Toolkit for LabVIEW unites the best elements of two major tools in the science and engineering world and we are honored to receive this award.”
