Jul 16 2018

PyMC3 Install on Windows 10

Hey what’s up? Let’s not talk about the length of time since my last post.

I wanted to share, before I forget, the steps I endured to get PyMC3 installed and working on my Windows 10 laptop (it’s a Lenovo ThinkPad). PyMC3 is a tool for doing probabilistic programming in Python and looks super cool. However, it has been challenging for me to get it fully installed both at home and at work. I think I’ve got it now, so let me review what I have learned. Here are the steps I took (I have Python 3.6 installed):

1. pip install pymc3 <-gets pymc3, theano and necessary packages
2. During the pip install I got a few warnings: a) two additional packages were needed (I pip installed them too); b) a tool had been installed to my Python scripts directory, but that directory is not included in my Windows PATH. The latter is interesting because the directory is in fact included in my "User Variables" PATH but not my "System Variables" PATH. For those new to all this, you can change these variables by searching for "Env" and choosing Edit the System Environment Variables:
sysenv
3. In the command prompt I started an IPython/Jupyter notebook: jupyter notebook
4. Then I began by importing pymc3 in the first cell in my notebook: import pymc3. Sadly, this yields an error saying “no g++ compiler detected”. Without a “found” compiler, pymc3 (theano) will be slow. In fact, I do have Visual Studio 2015 installed, but am not sure whether I selected to install Visual C++, so I’ll install one to be safe.
5. To install g++: At home I chose the Anaconda route, ultimately uninstalling Python first to try to get the paths correct. Once you do that you can conda install m2w64-toolchain. At work today, I simply installed mingw64. This is easy, BUT CRITICALLY YOU MUST CHOOSE x86-64 AND NOT THE DEFAULT i686 architecture from the installation dropdown. This mistake cost me hours earlier.
6. Although the installer asked me a lot about whether to add things to the path, mingw64 did not seem to do so (maybe I need a reboot), so I manually added the bin directory to my path. For me that was:
7. I re-ran the first cell in my notebook and it didn’t complain. Good.
8. As a first test, I began entering the code in the example http://docs.pymc.io/notebooks/sampler-stats.html
9. All went well until it was time for the NUTS sampler (I found this at home too). When run with “cores=2” or greater it stops with an error, specifically an error with pickling. The single comment on that StackOverflow page seems to have fixed it for me. Unfortunately, first I had to…
10. Install Git. This went smoothly, except that I wanted Notepad++ as my editor so I installed that so I could choose it during install.
11. Now the command pip3 install -U git+https://github.com/pymc-devs/pymc3.git will uninstall the earlier version of pymc3 and grab the latest one from git. NOTE: I added https:// to the URL from the comment in StackOverflow.
12. Voila!
jupyter
13. Now I can learn about this cool tool. Thanks to the internet and to people that take the time to explain how they fixed problems!
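The PATH and compiler checks from the steps above can be sketched in a short script. This is a minimal sketch that uses only the Python standard library; the function names dir_on_path and report are made up for illustration and are not part of PyMC3 or Theano. It only looks at the PATH that the current process sees, so open a fresh command prompt after editing your environment variables before running it.

```python
import os
import shutil
import sysconfig


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string.

    Hypothetical helper: if `path_value` is None, the PATH of the current
    process is used.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalise both sides so trailing slashes (and case on Windows) do not matter.
    target = os.path.normcase(os.path.normpath(directory))
    entries = [p for p in path_value.split(os.pathsep) if p]
    return any(os.path.normcase(os.path.normpath(p)) == target for p in entries)


def report():
    """Print whether the Python scripts directory and g++ are visible on PATH."""
    scripts_dir = sysconfig.get_path("scripts")
    gpp = shutil.which("g++")  # None when no g++ executable is found on PATH
    print("Python scripts directory:", scripts_dir)
    print("  on PATH:", dir_on_path(scripts_dir))
    print("g++:", gpp if gpp else "not found on PATH")


if __name__ == "__main__":
    report()
```

If g++ shows up as not found, that matches the "no g++ compiler detected" situation from step 4, and adding the mingw64 bin directory to PATH (step 6) is the thing to try.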


Nov 7 2012

Welcome the Data Era: How 2012 shows the future of all elections, and sports, and business…

Nate Silver's Election Map

President Obama’s reelection last night happened amidst a confluence of trends in our nation’s demographics, economics, even some-would-say our meteorology. The most important of these is not a trend, but a meta-trend: the true onset of the Data Era. Should Florida go to the President (as it is leaning), Nate Silver, long-time author of the fivethirtyeight blog, will have correctly predicted all 50 states’ presidential vote outcomes, and all but two of the senate races. This accuracy marks a triumph of Statistics and Data Science, and signals the future for our elections, and many other things.

Data Data everywhere
It is important to note that we have long honed our ability to analyze data, with required statistics classes for many different majors on a college campus (hello #RITEMBAStats!). In fact, Silver’s method includes analysis of error using sum of squares just the same as introductory stats students learn in their regression unit. Over the years, “Database Marketing”, “Data Science”, “Big Data”, “Business Intelligence”, “Data Analytics” and other terms have become catch phrases to refer to the trend of analyzing data to discover useful patterns.
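For anyone who has not sat through that regression unit, the sum of squares idea can be sketched in a few lines of Python. This is only an illustration with made-up numbers and hypothetical function names (fit_line, sum_of_squared_errors); it is not Silver’s actual model, just the textbook least-squares calculation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_of_squared_errors(xs, ys, slope, intercept):
    """Add up the squared gaps between each observed y and the line's prediction."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up data points, purely for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

slope, intercept = fit_line(xs, ys)
sse = sum_of_squared_errors(xs, ys, slope, intercept)
print("slope:", slope, "intercept:", intercept, "SSE:", sse)
```

The smaller the sum of squared errors, the more closely the line tracks the data; the least-squares line is the one that makes this number as small as possible.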

While our attention to data has existed for decades, our ability to make such visible results on a grand scale is relatively recent. The biggest reason for this is the new availability of data to large audiences. The most obvious source of this new information is social media. Twitter, for example, hit a new peak of 66,019 tweets per minute around 9:30pm last night. This massive data set is continuously broadcast for anyone who is interested to read, or analyze it. Many commercial ventures study it closely. Dell Computer’s 1st Chief Listening Officer Susan Beebe uses a wide variety of analysis tools to track the vast array of social media data relevant to her company.

Another major source of this new information is the companies that actively collect, and then share, their data. In 2009, the $1 Million Netflix prize was awarded to team BellKor’s Pragmatic Chaos for their algorithm predicting how well a viewer will enjoy a movie based on their movie preferences. Since then, crowd-sourced data analysis has become mainstream with websites like Kaggle offering new competitions as well as educational materials to enhance our ability to predict the future based on the past. A quick read of Kaggle’s forums provides a list of new analysis tools that are available (and inexpensive) like the programming language R, Wolfram Alpha or Orange.

Moneyball Poster

Statistics with a relatable face

The signs
But it is none of this that signals the new era of data. While some have argued how boring and uninteresting the details are, it is hard to miss how prevalent all of this has become. The opposite is in fact true: the advent of the Data Era is this moment when the data, the tools and people have created a user interface that everyone can see, understand and embrace. At its most mundane, we see this in an exploding number of “infographics” explaining all manner of information in a visual format that is appropriate for audiences with a short amount of time. We know we could present tables full of numbers but people won’t consume them. An election map is known to be the way to go.

But it goes much further than infographics. A 2009 NY Times article quotes Hal Varian, chief economist at Google as saying “I keep saying that the sexy job in the next 10 years will be statisticians…and I am not kidding.” Statisticians, or at least their output, have become increasingly visual and widely embraced. The recent hit movie, Moneyball, shows the impact of statistical thinking on baseball while providing Brad Pitt on the posters.

Nate Silver has effectively become a rock star among the data connoisseurs and others. He appeared this year on The Daily Show, The Colbert Report, and other television programs, all the while authoring a bestselling book. His name was trending on Twitter during the election results coverage, and a search on the #natesilverfacts hashtag will show you how heroic he has become.

Naters gonna Nate

Nate Silver meme

In the wake of the 2012 election, we enter a new era. Nate Silver has become a modern hero by predicting politics and baseball with remarkable accuracy, but it is the accessibility of his thinking that we should pay attention to. Having passed through this moment, there is no way back. The availability of data, tools and new people to work with them is only increasing going forward. All elections will have simulation-powered visual, interpretable results. Statisticians will be much more commonplace and seen as they explain sophisticated computer models.

It is our ability to make the esoteric details of an analysis clear through our charts, words and actions that has crossed through this threshold.

Sources:
For Today’s Graduate, Just One Word: Statistics By STEVE LOHR

Pundit Forecasts All Wrong, Silver Perfectly Right. Is Punditry Dead?

Meme from: @riondotnu