Well, DataGotham is over. The conference featured a wide cross-section of the data community in NYC. Talks spanned topics from “urban science” to “finding racism on FourSquare” to “creating an API for spaces.” Don’t worry, the videos will be online soon so you can investigate for yourself. The organizers did a great job putting a conference of this size together on relatively short notice. Bravo, NYC data crunchers!

One thing I somehow missed was a network graph created by the organizers to illustrate the tools used by attendees. I am happy to see Python leading the way! The thickness of each edge indicates the number of people using both tools. It seems there are a lot of people trying to make Python and R “two great tastes that go great together.” I’m curious as to why more Python users aren’t using NumPy and SciPy. Food for thought…

Got tools?

One thought on “DataGotham…Complete!”

  1. Jeff Knisley

    As someone who uses both Python and R on a regular basis, I think I can provide some insight into why more Python users are not using SciPy and NumPy. Note: my bias is toward Python with SciPy/NumPy, which I use exclusively whenever possible.

    But here are my 4 big reasons for using R instead of Python, and #2 implies #1, #3 implies #2, and so on.

    1. Big Data initiatives (e.g., the Sage Bionetworks breast cancer challenge) require the use of R. Or equivalently, clients require it, because it is easy to work with.

    2. RStudio. I repeat: RStudio. Spyder is the closest thing in Python. IPython is about to get there, but transparency and documentation are huge, huge issues that the Python community mostly ignores.

    3. R is transparent in how to load and work with data. Python often requires searching through the documentation to find the one small key or similar detail necessary to get the data into the proper NumPy format. An excellent example: Bioconductor ExpressionSet objects for array expression data.

    4. R is a well-documented (= Sweave) collection of tools built on a uniform platform of core tools and constructs. There are anywhere from 3 to 7 versions of Python. Even simple things, like 3/2 = 1 instead of 3/2 = 1.5, are huge issues.
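The 3/2 point in item 4 comes down to how the `/` operator treats two integers in different Python versions. Here is a minimal sketch, assuming a Python 3 interpreter; the variable names are only illustrative:

```python
# Python 3 semantics: "/" is true division, "//" is floor division.
true_quotient = 3 / 2    # 1.5 under Python 3
floor_quotient = 3 // 2  # 1 under both Python 2 and Python 3

# Under Python 2, 3 / 2 evaluates to 1 unless one operand is a float
# or the module starts with `from __future__ import division`.
float_quotient = 3 / 2.0  # 1.5 under both Python 2 and Python 3

print(true_quotient, floor_quotient, float_quotient)  # prints: 1.5 1 1.5
```

Writing `//` whenever floor division is intended, and making one operand a float otherwise, keeps the result the same across versions.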

    When everything moves to Python 3.2, where there is some uniformity in the language, and IPython notebook or Spyder or something similar emerges to make scientific work more accessible (we’re scientists who program, not hackers who do science), then we may see Python users like me using R more exclusively all the time.

