Tag Archives: Machine Learning

SciPy 2017 Conference to Showcase Leading Edge Developments in Scientific Computing with Python

Renowned scientists, engineers and researchers from around the world to gather July 10-16, 2017 in Austin, TX to share and collaborate to advance scientific computing tool


AUSTIN, TX – June 6, 2017 –
Enthought, as Institutional Sponsor, today announced the SciPy 2017 Conference will be held July 10-16, 2017 in Austin, Texas. At this 16th annual installment of the conference, scientists, engineers, data scientists and researchers will participate in tutorials, talks and developer sprints designed to foster the continued rapid growth of the scientific Python ecosystem. This year’s attendees hail from over 25 countries and represent academia, government, national research laboratories, and industries such as aerospace, biotechnology, finance, oil and gas and more.

“Since 2001, the SciPy Conference has been a highly anticipated annual event for the scientific and analytic computing community,” states Dr. Eric Jones, CEO at Enthought and SciPy Conference co-founder. “Over the last 16 years we’ve witnessed Python emerge as the de facto open source programming language for science, engineering and analytics with widespread adoption in research and industry. The powerful tools and libraries the SciPy community has developed are used by millions of people to advance scientific inquest and innovation every day.”

Special topical themes for this year’s conference are “Artificial Intelligence and Machine Learning Applications” and the “Scientific Python (SciPy) Tool Stack.” Keynote speakers include:

  • Kathryn Huff, Assistant Professor in the Department of Nuclear, Plasma, and Radiological Engineering at the University of Illinois at Urbana-Champaign  
  • Sean Gulick, Research Professor at the Institute for Geophysics at the University of Texas at Austin
  • Gaël Varoquaux, faculty researcher in the Neurospin brain research institute at INRIA (French Institute for Research in Computer Science and Automation)

In addition to the special conference themes, there will also be over 100 talk and poster paper speakers/presenters covering eight mini-symposia tracks including: Astronomy; Biology, Biophysics, and Biostatistics; Computational Science and Numerical Techniques; Data Science; Earth, Ocean, and Geo Sciences; Materials Science and Engineering; Neuroscience; and Open Data and Reproducibility.

New for 2017 is a sold-out “Teen Track,” a two-day curriculum designed to inspire the scientists of tomorrow.  From July 10-11, high school students will learn more about the Python language and how developers solve real world scientific problems using Python and its scientific libraries.

Conference and tutorial registration is open at https://scipy2017.scipy.org.

About the SciPy Conference

SciPy 2017, the sixteenth annual Scientific Computing with Python conference, will be held July 10-16, 2017 in Austin, Texas. SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science and engineering. The annual SciPy Conference allows participants from all types of organizations to showcase their latest projects, learn from skilled users and developers and collaborate on code development. For more information or to register, visit https://scipy2017.scipy.org.

About Enthought

Enthought is a global leader in scientific and analytic software, consulting and training solutions serving a customer base comprised of some of the most respected names in the oil and gas, manufacturing, financial services, aerospace, military, government, biotechnology, consumer products and technology industries. The company was founded in 2001 and is headquartered in Austin, Texas, with additional offices in Cambridge, United Kingdom and Pune, India. For more information visit www.enthought.com and connect with Enthought on Twitter, LinkedIn, Google+, Facebook and YouTube.

 

 

Webinar: Python for Data Science: A Tour of Enthought’s Professional Training Course

DataView Python for Data Science Webinar
What: A guided walkthrough and Q&A about Enthought’s technical training course “Python for Data Science and Machine Learning” with VP of Training Solutions, Dr. Michael Connell

Who Should Watch: individuals, team leaders, and learning & development coordinators who are looking to better understand the options to increase professional capabilities in Python for data science and machine learning applications

VIEW


Enthought’s Python for Data Science training course is designed to accelerate the development of skill and confidence in using Python’s core data science tools — including the standard Python language, the fast array programming package NumPy, and the Pandas data analysis package, as well as tools for database access (DBAPI2, SQLAlchemy), machine learning (scikit-learn), and visual exploration (Matplotlib, Seaborn).

In this webinar, we give you the key information and insight you need to evaluate whether Enthought’s Python for Data Science course is the right solution to advance your professional data science skills in Python, including:

  • Who will benefit most from the course
  • A guided tour through the course topics
  • What skills you’ll take away from the course, how the instructional design supports that
  • What the experience is like, and why it is different from other training alternatives (with a sneak peek at actual course materials)
  • What previous course attendees say about the course

VIEW


michael_connell-enthought-vp-trainingPresenter: Dr. Michael Connell, VP, Enthought Training Solutions

Ed.D, Education, Harvard University
M.S., Electrical Engineering and Computer Science, MIT


Considering Moving to Python for Data Science?

Then Enthought’s Python for Data Science training course is definitely for you! This class has been particularly appealing to people who have been using other tools like R or SAS (or even Excel) for their data science work and want to start applying their analytic skills using the Python toolset.  And it’s no wonder — Python has been identified as the most popular coding language for five years in a row for good reason.

One reason for Python’s broad popularity across a range of disciplines is its efficiency and ease-of-use. Many people consider Python more fun to work in than other languages (and we agree!). Another reason for its popularity among data analysts and data scientists in particular is Python’s extensive (and growing) open source library of powerful tools for preparing, visualizing, analyzing, and modeling data.

Python is also an extraordinarily comprehensive toolset – it supports everything from interactive analysis to automation to software engineering to web app development within a single language and plays very well with other languages like C/C++ or FORTRAN so you can continue leveraging your existing code libraries written in those other languages.

Many organizations are moving to Python so they can consolidate all of their technical work streams under a single comprehensive toolset. In the first part of this class we’ll give you the fundamentals you need to switch from another language to Python and then we cover the core tools that will enable you to do in Python what you were doing with other tools, only faster!

Additional Resources

Upcoming Open Python for Data Science Sessions:
Austin, TX, June 12-16, 2017
San Jose, CA, July 17-21, 2017Learn MoreHave a group interested in training? We specialize in group and corporate training. Contact us or call 512.536.1057.
Download Enthought’s Machine Learning with Python’s Scikit-Learn Cheat Sheets
Enthought's Machine Learning with Python Cheat Sheets
Download Enthought’s Pandas Cheat SheetsEnthought's Pandas Cheat Sheets

Handling Missing Values in Pandas DataFrames: the Hard Way, and the Easy Way

This is the second blog in a series. See the first blog here: Loading Data Into a Pandas DataFrame: The Hard Way, and The Easy Way

No dataset is perfect and most datasets that we have to deal with on a day-to-day basis have values missing, often represented by “NA” or “NaN”. One of the reasons why the Pandas library is as popular as it is in the data science community is because of its capabilities in handling data that contains NaN values.

But spending time looking up the relevant Pandas commands might be cumbersome when you are exploring raw data or prototyping your data analysis pipeline. This is one of the places where the Canopy Data Import Tool helps make data munging faster and easier, by simplifying the task of identifying missing values in your raw data and removing/replacing them.

Why are missing values a problem you ask? We can answer that question in the context of machine learning. scikit-learn and TensorFlow are popular and widely used libraries for machine learning in Python. Both of them caution the user about missing values in their datasets. Various machine learning algorithms expect all the input values to be numerical and to hold meaning. Both of the libraries suggest removing rows and/or columns that contain missing values.

If removing the missing values is not an option, given the size of your dataset, then they suggest replacing the missing values. The scikit-learn library provides an Imputer class, which can be used to replace missing values. See the sci-kit learn documentation for an example of how the Imputer class is used. Similarly, the decode_csv function in the TensorFlow library can be passed a record_defaults argument, which will replace missing values in the dataset. See the TensorFlow documentation for specifics.

The Data Import Tool provides capabilities to handle missing values in your dataset because we strongly believe that discovering and handling missing values in your dataset is a part of the data import and cleaning phase and not the analysis phase of the data science process.

Digging into the specifics, here we’ll compare how you can go about handling missing values with three typical scenarios, first using the Pandas library, then contrasting with the Data Import Tool:

  1. Identifying missing values in data
  2. Replacing missing values in data, and
  3. Removing missing values from data.

Note : Pandas’ internal representation of your data is called a DataFrame. A DataFrame is simply a tabular data structure, similar to a spreadsheet or a SQL table.

Continue reading

Webinar- Get More From Your Core: Applying Artificial Intelligence to CT, Photo, and Well Log Analysis with Virtual Core

What: Presentation, demo, and Q&A with Brendon Hall, Geoscience Product Manager, Enthought

Who should watch this webinar:

  • Oil and gas industry professionals who are looking for ways to extract more value from expensive science wells
  • Those interested in learning how artificial intelligence and machine learning techniques can be applied to core analysis

VIEW 


Geoscientists and petroleum engineers rely on accurate core measurements to characterize reservoirs, develop drilling plans and de-risk play assessments. Whole-core CT scans are now routinely performed on extracted well cores, however the data produced from these scans are difficult to visualize and integrate with other measurements.

Virtual Core automates aspects of core description for geologists, drastically reducing the time and effort required for core description, and its unified visualization interface displays cleansed whole-core CT data alongside core photographs and well logs. It provides tools for geoscientists to analyze core data and extract features from sub-millimeter scale to the entire core.

In this webinar and demo, we’ll start by introducing the Clear Core processing pipeline, which automatically removes unwanted artifacts (such as tubing) from the CT image. We’ll then show how the machine learning capabilities in Virtual Core can be used to describe the core, extracting features such as bedding planes and dip angle. Finally, we’ll show how the data can be viewed and analyzed alongside other core data, such as photographs, wellbore images, well logs, plug measurements, and more.

What You’ll Learn:

  • How core CT data, photographs, well logs, borehole images, and more can be integrated into a digital core workshop
  • How digital core data can shorten core description timelines and deliver business results faster
  • How new features can be extracted from digital core data using artificial intelligence
  • Novel workflows that leverage these features, such as identifying parasequences and strategies for determining net pay

VIEW 

Presenter:

Brendon Hall, Geoscience Applications Engineer, Enthought Brendon Hall, Enthought
Geoscience Product Manager and Application Engineer

Continue reading

Geophysical Tutorial: Facies Classification using Machine Learning and Python

Published in the October 2016 edition of The Leading Edge magazine by the Society of Exploration Geophysicists. Read the full article here.

By Brendon Hall, Enthought Geosciences Applications Engineer 
Coordinated by Matt Hall, Agile Geoscience

ABSTRACT

There has been much excitement recently about big data and the dire need for data scientists who possess the ability to extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist’s toolbox, much of which used to be only available in proprietary (and expensive) software platforms.

One of the best examples is scikit-learn, a collection of tools for machine learning in Python. What is machine learning? You can think of it as a set of data-analysis methods that includes classification, clustering, and regression. These algorithms can be used to discover features and trends within the data without being explicitly programmed, in essence learning from the data itself.

Well logs and facies classification results from a single well.

Well logs and facies classification results from a single well.

In this tutorial, we will demonstrate how to use a classification algorithm known as a support vector machine to identify lithofacies based on well-log measurements. A support vector machine (or SVM) is a type of supervised-learning algorithm, which needs to be supplied with training data to learn the relationships between the measurements (or features) and the classes to be assigned. In our case, the features will be well-log data from nine gas wells. These wells have already had lithofacies classes assigned based on core descriptions. Once we have trained a classifier, we will use it to assign facies to wells that have not been described.

See the tutorial in The Leading Edge here.

ADDITIONAL RESOURCES: