Apologies to my readers for not having a post last weekend. I was on the road for my job and had to put blogging on the back burner.
My post this week is a review of the book The Signal and the Noise: Why So Many Predictions Fail—but Some Don't by Nate Silver. This is a book about the science and art of forecasting in a wide variety of fields; Nate Silver runs the website FiveThirtyEight, which does exactly that. The title of this post was inspired by the fact that March Madness is coming soon.
Overview of the book
The Signal and the Noise (abbreviated S&N below) is really well written and well organized. Nate Silver does a great job of explaining mathematical details of statistics and modelling in an understandable way without sacrificing accuracy. Each chapter deals with predictions in a different field, so they can be read alone, but also fit together really effectively. I found it to be good for reading on planes and in airports (where I've been spending a lot of time this month)—it held my attention but was also divided into convenient-sized segments so that it was easy to put down and return to without losing the flow. I got into it enough that when I lost my Kindle, I picked up the paperback copy so I could finish reading it.
I was hooked by the introduction in S&N. It hearkens back to the invention of the printing press for insight into the present day, since both eras are defined by an explosion of new information. The printing press brought innumerable benefits but it also spurred some bloody religious conflicts. The rise of "Big Data" also carries both promise and peril, per Nate Silver.
As mentioned, each chapter focuses on predictions/forecasts in a different field. The fields covered include sports, politics, economics, weather, earthquakes, poker, climate change, and terrorism. I found the chapter on weather to be one of the most interesting because it's something that everyone can relate to. Nate Silver describes the techniques used to produce weather forecasts, including how probabilistic forecasts are derived from deterministic models, and then discusses how much more accurate they have become in recent decades. He also reviews how well calibrated the forecasts from different sources are: of all the days given a 15% chance of rain, on what percentage does it actually rain? Finally, he reveals that 10 or more days in advance, the historical averages for a location are more accurate than the long-range temperature forecast.
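To make the calibration question concrete, here is a minimal Python sketch of how one might tabulate it. The forecast record is entirely made up (it is not data from the book). The sketch groups days by the stated chance of rain and reports how often it actually rained in each group; a well-calibrated source would show an observed frequency close to each stated probability.

```python
from collections import defaultdict

# Made-up forecast record (NOT from the book): (stated chance of rain, did it rain?)
# 20 days forecast at 15%, of which 3 were rainy; 20 days forecast at 70%, of which 10 were rainy
FORECASTS = [(0.15, day < 3) for day in range(20)] + [(0.70, day < 10) for day in range(20)]


def calibration_table(forecasts):
    """Map each stated probability to the fraction of those days on which it actually rained."""
    counts = defaultdict(lambda: [0, 0])  # stated probability -> [rainy days, total days]
    for stated, rained in forecasts:
        counts[stated][0] += int(rained)
        counts[stated][1] += 1
    return {stated: rainy / total for stated, (rainy, total) in sorted(counts.items())}


for stated, observed in calibration_table(FORECASTS).items():
    print(f"stated {stated:.0%} -> observed {observed:.0%}")
```

In this invented record, the 15% forecasts are well calibrated, while the 70% forecasts are overconfident: it rained on only half of those days.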
I also found the chapter on the stock market to have some good insights. And the chapter on climate change was among the best coverage I've ever read on the debate.
About the author
Some of the chapters in S&N are quite biographical. Nate Silver is probably most famous for his U.S. election predictions (his website, FiveThirtyEight, takes its name from the number of electors in the U.S. Electoral College), and the parts of the book dealing with political forecasts draw on this experience and describe some of the techniques he uses. However, he really got his start analyzing baseball statistics and also spent a few years making a living playing online poker, so those chapters contain a lot of personal anecdotes too.
What I learned
From this book, I got four big ideas about forecasting, as well as several other tips to keep in mind.
Four big ideas
Overfitting: Overfitting is likely to become a growing problem in the age of Big Data. It happens when a statistical model is tuned to fit "noisy" data too closely: by trying to explain specific random fluctuations, the model does a worse job when it is generalized or extrapolated.
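Here is a minimal Python sketch of overfitting, using made-up data. The true signal is a straight line, y = 2x + 1, plus a fixed set of invented noise values. A least-squares line is compared with a Lagrange interpolating polynomial that passes through every training point exactly. The interpolant is my own stand-in for an overfit model, not a technique from the book. It has zero training error, but when asked to extrapolate just beyond the data it goes badly astray.

```python
# Made-up "noisy" observations: true signal y = 2x + 1 plus fixed, invented noise values
TRAIN_X = list(range(10))
NOISE = [0.5, -0.3, 0.8, -0.6, 0.2, -0.9, 0.4, 0.1, -0.7, 0.6]
TRAIN_Y = [2 * x + 1 + e for x, e in zip(TRAIN_X, NOISE)]

# Two points just beyond the training range, taken from the noise-free signal
TEST_X = [10, 11]
TEST_Y = [2 * x + 1 for x in TEST_X]


def fit_line(xs, ys):
    """Ordinary least-squares line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Degree-(n-1) Lagrange polynomial that passes through every training point exactly."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(TRAIN_X, TRAIN_Y)
wiggly = fit_interpolant(TRAIN_X, TRAIN_Y)
for name, model in [("line", line), ("interpolant", wiggly)]:
    print(f"{name}: train MSE={mse(model, TRAIN_X, TRAIN_Y):.3f}, "
          f"test MSE={mse(model, TEST_X, TEST_Y):.1f}")
```

The interpolant "explains" every bit of noise in the training data, and that is exactly why its predictions outside the data are so poor.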
Bayes' Theorem: The second half of S&N makes heavy use of Bayes' Theorem. Nate Silver promotes this theorem (and the paradigm of statistics that goes with it) because it explicitly accounts for our uncertainty and our subjective points of view, and because it can incorporate assumptions that are otherwise hard to quantify. Unlike the frequentist approach, the Bayesian approach to forecasting incorporates context and plausibility, so it is less likely to identify spurious correlations.
Bayes' Theorem is as follows:
$$x_{N+1} = \frac{x_N \, y}{x_N \, y + z \, (1 - x_N)}$$
where $x_N$ is the prior probability of A being true; $y$ is the probability of B occurring given that A is true; $z$ is the probability of B occurring given that A is not true; and the posterior probability, $x_{N+1}$, is the probability of A being true given that B has occurred.
I've added the subscript notation to the equation Nate provides in his book to make the iterative application of Bayes' Theorem explicit: each posterior becomes the prior for the next piece of evidence. Iterating this way means that a range of reasonable "priors" should converge toward the true result as more information is added.
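The iterative idea can be sketched in a few lines of Python. The likelihoods (y = 0.8, z = 0.3) and the stream of observations are invented for illustration. When B is observed, the update is the formula above; when B is not observed, I assume the standard complement, replacing y and z with 1 - y and 1 - z. Three very different priors end up close together once the evidence accumulates.

```python
def bayes_update(prior, y, z):
    """One application of the formula: P(A | B) from P(A) = prior, P(B | A) = y, P(B | not A) = z."""
    return (prior * y) / (prior * y + z * (1 - prior))


# Hypothetical likelihoods: B is more common when A is true than when A is false
Y, Z = 0.8, 0.3

# Hypothetical stream of evidence: True means B occurred, False means it did not
OBSERVATIONS = [True, True, False, True, True, True, False, True, True, True,
                False, True, True, True, True, False, True, True, False, True]


def run(prior):
    """Feed each observation through Bayes' Theorem; each posterior becomes the next prior."""
    x = prior
    for saw_b in OBSERVATIONS:
        if saw_b:
            x = bayes_update(x, Y, Z)
        else:
            # Assumption: if B did not occur, use P(not B | A) and P(not B | not A)
            x = bayes_update(x, 1 - Y, 1 - Z)
    return x


priors = [0.05, 0.5, 0.95]
posteriors = [run(p) for p in priors]
for p, post in zip(priors, posteriors):
    print(f"prior {p:.2f} -> posterior {post:.4f}")
```

Because each update multiplies the odds by the same likelihood ratio, the order of the observations doesn't affect the final posteriors; only the number of times B did and didn't occur matters.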
Given a couple of previous posts I've written, I was interested to learn that Thomas Bayes was part of the Scottish Enlightenment: as a Nonconformist, he was excluded from Oxbridge and so went to Edinburgh to study.
Pooled/Aggregate Predictions: Another theme from The Signal and the Noise concerns the benefits of pooling predictions together for improved accuracy. This principle underlies FiveThirtyEight's political forecasts, which are built on a kind of poll of polls. It also explains the usefulness of "prediction markets", like the late, lamented InTrade.
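As a rough illustration of why pooling helps, here is a Python sketch that combines several hypothetical polls (the numbers are made up) into a sample-size-weighted average. This is far simpler than FiveThirtyEight's actual methodology, which I'm not trying to reproduce: it treats every poll as an unbiased random sample and ignores pollster house effects. Even so, it shows the pooled estimate's standard error falling below that of any single poll.

```python
import math

# Hypothetical polls (made-up numbers): (candidate's vote share, sample size)
POLLS = [(0.52, 800), (0.48, 1200), (0.51, 600), (0.49, 1500), (0.53, 400)]


def standard_error(share, n):
    """Standard error of a proportion estimated from a simple random sample of size n."""
    return math.sqrt(share * (1 - share) / n)


def pooled_estimate(polls):
    """Sample-size-weighted average share, treating all polls as one big random sample."""
    total_n = sum(n for _, n in polls)
    share = sum(p * n for p, n in polls) / total_n
    return share, standard_error(share, total_n)


share, pooled_se = pooled_estimate(POLLS)
best_single_se = min(standard_error(p, n) for p, n in POLLS)
print(f"pooled share {share:.3f} +/- {pooled_se:.3f} "
      f"(best single poll: +/- {best_single_se:.3f})")
```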
Be clear about uncertainty: To be a good forecaster, one should practice thinking probabilistically. Rather than simply stating whether you think a certain event will happen, consider how much confidence you assign to your prediction. For example, in the long run, about 70% of the predictions you make with 70% confidence should turn out to be correct.
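One standard way to check this kind of probabilistic honesty (a scoring rule I'm bringing in, not one taken from the book) is the Brier score: the mean squared difference between the stated probability and the outcome, coded as 1 if the event happened and 0 if it didn't, so lower is better. In the made-up example below, 7 of 10 events occur. A forecaster who says 70% every time beats both one who hedges at 50% and one who claims certainty.

```python
# Made-up outcomes: 7 of 10 events happened (1) and 3 did not (0)
OUTCOMES = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]


def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


forecasters = {
    "calibrated (70%)": [0.7] * len(OUTCOMES),
    "hedging (50%)": [0.5] * len(OUTCOMES),
    "overconfident (100%)": [1.0] * len(OUTCOMES),
}
scores = {name: brier_score(probs, OUTCOMES) for name, probs in forecasters.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

The overconfident forecaster's score comes entirely from the three misses, each of which costs the maximum penalty, which echoes the point that overstating confidence is counterproductive.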
Other tips for making good predictions
- Avoid "out of sample" situations, i.e., predictions about circumstances that differ from those in which the source data was collected
- Be a "fox" rather than a "hedgehog" (I think this is the idea behind the FiveThirtyEight logo): able to separate "is" from "ought" when making predictions, so that more information improves accuracy rather than just helping to justify pre-existing biases
- Continually revise/update forecasts as new data comes in
- Communicate uncertainty
- Don't throw out data
- Overstating confidence is counterproductive
Finally, there were a few other tidbits from S&N or the FiveThirtyEight website that I wanted to share (or at least bookmark for myself):
- Nate Silver makes use of Donald Rumsfeld's rubric of "unknown unknowns"
- The provocatively-titled "Why Most Published Research Findings Are False" paper is referenced
- One of the chapter titles is taken from a poem by the Danish mathematician Piet Hein
- Some forecasting principles by Scott Armstrong are referenced
- A fun feature on the FiveThirtyEight website last year was the search for America's best burrito; on a recent trip to Birmingham, Alabama, I visited one of the featured restaurants (pictured below)