Tag Archives: scikit-learn

Webinar: Machine Learning Mastery Workshop: An Exclusive Peek “Under the Hood” of Enthought Training

What: A guided walkthrough and live Q&A about Enthought’s new “Machine Learning Mastery Workshop” training course.

Who Should Attend: If predictive modeling and analytics would be valuable in your work, come to the webinar to find out what all the fuss is about and what there is to know. Whether you are looking to get started with machine learning, interested in refining your machine learning skills, or want to transfer your skills from another toolset to Python, come to the webinar to find out if Enthought’s highly interactive, expertly taught Machine Learning Mastery Workshop might be a good fit for accelerating your development!


Why Has Machine Learning Become So Popular?

Artificial Intelligence and Machine Learning are a defining feature of the 21st century and are quickly becoming a key factor in gaining and maintaining competitive advantage in each industry which incorporates them. Why is machine learning so beneficial?  Because it provides a fast and flexible way to build models that can surface signal, find patterns, and predict future behavior.  These powerful models are used for:

  • Forecasting supply chain availability
  • Clustering product defects for QA
  • Anticipating movements in financial markets
  • Predicting chemical tolerances
  • Optimizing the placement of advertisements
  • Managing process engineering
  • Modeling reservoir production
  • and much more.

In response to growing demand for Machine Learning expertise, Enthought has developed an intensive 3-day guided practicum to bring you up to speed quickly on key concepts and skills in this exciting realm. Join us in this webinar for an in-depth overview of Enthought’s Machine Learning Mastery Workshop — a training course designed to accelerate the development of intuition, skill, and confidence in applying machine learning methods to solve real-world problems.

In the webinar we’ll describe how Enthought’s training course combines conceptual knowledge of machine learning models with intensive experience applying them to real-world data to develop skill in applying Python’s machine learning tools, such as the scikit-learn package, to make predictions about complicated phenomena by leveraging the information contained in numerical data, natural language, 2D images, and discrete categories.

The hands-on, interactive course was created ground up by our training experts to enable you to develop transferable skills in Machine Learning that you can apply back at work the next day.

In this webinar, we’ll give you the key information and insight you need to quickly evaluate whether Enthought’s Machine Learning Mastery Workshop course is the right solution for you to build skills in using Python for advanced analytics, including:

  • Who will benefit most from the course, and what pre-requisite knowledge is required
  • What topics the course covers – a guided tour
  • What new knowledge, skills, and capabilities you’ll take away, and how the course design supports those outcomes
  • What the (highly interactive) learning experience is like
  • Why this course is different from other training alternatives (with a preview of actual course materials!)
  • What previous workshop attendees say about our courses


Presenter: Dr. Dillon Niederhut,

Enthought Training Instructor

Ph.D., University of California at Berkeley



Additional Resources

Upcoming Open Machine Learning Mastery Workshop Sessions:

Austin, TX, Feb. 21-23, 2017
Houston, TX, Apr. 18-20, 2018
Cambridge, UK, May 9-11, 2018

Upcoming Open Python for Data Science Sessions:

New York City, NY, Dec. 4-8, 2018
London, UK, Feb. 19-23, 2018
Washington, DC, Apr. 23-27, 2018
San Jose, CA, May 14-18, 2018

Have a group interested in training? We specialize in group and corporate training. Contact us or call 512.536.1057.

Download Enthought’s Machine Learning with Python’s Scikit-Learn Cheat Sheets

Enthought's Machine Learning with Python Cheat Sheets

Additional Webinars in the Training Series:

Python for MATLAB Users: What You Need to Know

Python for Scientists and Engineers: A Tour of Enthought’s Professional Technical Training Course

Python for Data Science: A Tour of Enthought’s Professional Technical Training Course

Python for Professionals: The Complete Guide to Enthought’s Technical Training Courses

An Exclusive Peek “Under the Hood” of Enthought Training and the Pandas Mastery Workshop

Webinar: Python for Data Science: A Tour of Enthought’s Professional Training Course

View Python for Data Science Webinar
What: A guided walkthrough and Q&A about Enthought’s technical training course “Python for Data Science and Machine Learning” with VP of Training Solutions, Dr. Michael Connell

Who Should Watch: individuals, team leaders, and learning & development coordinators who are looking to better understand the options to increase professional capabilities in Python for data science and machine learning applications


Enthought’s Python for Data Science training course is designed to accelerate the development of skill and confidence in using Python’s core data science tools — including the standard Python language, the fast array programming package NumPy, and the Pandas data analysis package, as well as tools for database access (DBAPI2, SQLAlchemy), machine learning (scikit-learn), and visual exploration (Matplotlib, Seaborn).

In this webinar, we give you the key information and insight you need to evaluate whether Enthought’s Python for Data Science course is the right solution to advance your professional data science skills in Python, including:

  • Who will benefit most from the course
  • A guided tour through the course topics
  • What skills you’ll take away from the course, how the instructional design supports that
  • What the experience is like, and why it is different from other training alternatives (with a sneak peek at actual course materials)
  • What previous course attendees say about the course


michael_connell-enthought-vp-trainingPresenter: Dr. Michael Connell, VP, Enthought Training Solutions

Ed.D, Education, Harvard University
M.S., Electrical Engineering and Computer Science, MIT

Continue reading

Handling Missing Values in Pandas DataFrames: the Hard Way, and the Easy Way

The Data Import Tool can highlight missing value cells, helping you easily identify columns or rows containing NaN valuesThis is the second blog in a series. See the first blog here: Loading Data Into a Pandas DataFrame: The Hard Way, and The Easy Way

No dataset is perfect and most datasets that we have to deal with on a day-to-day basis have values missing, often represented by “NA” or “NaN”. One of the reasons why the Pandas library is as popular as it is in the data science community is because of its capabilities in handling data that contains NaN values.

But spending time looking up the relevant Pandas commands might be cumbersome when you are exploring raw data or prototyping your data analysis pipeline. This is one of the places where the Canopy Data Import Tool helps make data munging faster and easier, by simplifying the task of identifying missing values in your raw data and removing/replacing them.

Why are missing values a problem you ask? We can answer that question in the context of machine learning. scikit-learn and TensorFlow are popular and widely used libraries for machine learning in Python. Both of them caution the user about missing values in their datasets. Various machine learning algorithms expect all the input values to be numerical and to hold meaning. Both of the libraries suggest removing rows and/or columns that contain missing values.

If removing the missing values is not an option, given the size of your dataset, then they suggest replacing the missing values. The scikit-learn library provides an Imputer class, which can be used to replace missing values. See the sci-kit learn documentation for an example of how the Imputer class is used. Similarly, the decode_csv function in the TensorFlow library can be passed a record_defaults argument, which will replace missing values in the dataset. See the TensorFlow documentation for specifics.

The Data Import Tool provides capabilities to handle missing values in your dataset because we strongly believe that discovering and handling missing values in your dataset is a part of the data import and cleaning phase and not the analysis phase of the data science process.

Digging into the specifics, here we’ll compare how you can go about handling missing values with three typical scenarios, first using the Pandas library, then contrasting with the Data Import Tool:

  1. Identifying missing values in data
  2. Replacing missing values in data, and
  3. Removing missing values from data.

Note : Pandas’ internal representation of your data is called a DataFrame. A DataFrame is simply a tabular data structure, similar to a spreadsheet or a SQL table.

Continue reading

Geophysical Tutorial: Facies Classification using Machine Learning and Python

Published in the October 2016 edition of The Leading Edge magazine by the Society of Exploration Geophysicists. Read the full article here.

By Brendon Hall, Enthought Geosciences Applications Engineer 
Coordinated by Matt Hall, Agile Geoscience


There has been much excitement recently about big data and the dire need for data scientists who possess the ability to extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist’s toolbox, much of which used to be only available in proprietary (and expensive) software platforms.

One of the best examples is scikit-learn, a collection of tools for machine learning in Python. What is machine learning? You can think of it as a set of data-analysis methods that includes classification, clustering, and regression. These algorithms can be used to discover features and trends within the data without being explicitly programmed, in essence learning from the data itself.

Well logs and facies classification results from a single well.

Well logs and facies classification results from a single well.

In this tutorial, we will demonstrate how to use a classification algorithm known as a support vector machine to identify lithofacies based on well-log measurements. A support vector machine (or SVM) is a type of supervised-learning algorithm, which needs to be supplied with training data to learn the relationships between the measurements (or features) and the classes to be assigned. In our case, the features will be well-log data from nine gas wells. These wells have already had lithofacies classes assigned based on core descriptions. Once we have trained a classifier, we will use it to assign facies to wells that have not been described.

See the tutorial in The Leading Edge here.