Like "Freakonomics," this book over-relies on a catchy phrase as a substitute for a thorough exploration of the concepts and issues. The list of concerns includes:
1. Vague definition of the term "supercrunching." Is it "super" because the author wants us to think all statistics are super, or (what I had hoped) is there something about the type of statistics he refers to that is in fact different from the statistics used in decision making for the last 40 years? All the talk of large datasets implies that supercrunching is a matter of size, but then why does the very first example of regression involve a model with only two predictors? No need for large datasets for that kind of model, right? Unless the effect size is tiny, but then what good is the model? Tell us how things really are new and different now.
2. The book reads like a list of (mostly internet) companies and how fabulous and smart they are for using statistics. Actuarial science has been around for many, many years, and again we see little discussion of how the actuarial tradition has become more available outside the insurance industry. The whole book seems more like a stream of consciousness than an organized conceptual framework for the emergence of statistics as a guide to commercial, medical, and policy decision making over time.
3. While perhaps an excellent lawyer and professor, the author makes so many misleading or inaccurate remarks about statistics that it could be difficult for someone with a statistics background to enjoy the book. For example, regression is discussed as a technique distinct from the "randomized test," when in fact the randomized test is a design, and regression (more generally the "general linear model," which includes regression, analysis of variance, and linear structural modeling) is the inferential statistical technique used to evaluate the results of that design. Early in the book, the author talks about how amazing regression is, and then gives an example of how a bank evaluates the probability of a customer's future actions on the phone based on past phone behavior. This very first example after introducing regression does not involve regression as a prediction technique, but rather actuarial base rates! In fact, I found it very disappointing that actuarial science, probability, and Bayes' theorem (all at least as relevant to data-driven decision making as the randomized trial) were given so little attention in the book.
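To make the distinction in point 3 concrete, here is a minimal sketch of the two prediction styles the book conflates. The data and variable names are my own invention, not an example from the book: a base-rate prediction ignores the predictor entirely and assigns every customer the same historical proportion, while even a one-predictor regression conditions its prediction on the customer's behavior.

```python
# Toy data (invented): (minutes_on_phone, accepted_offer) for past customers.
history = [(2, 0), (5, 0), (8, 1), (12, 1), (15, 1), (3, 0)]
n = len(history)

# Actuarial base rate: the historical proportion of acceptances,
# applied identically to every future customer.
base_rate = sum(y for _, y in history) / n

# Simple least-squares regression of outcome on minutes:
# a fitted line y_hat = b0 + b1 * x, estimated from the same data.
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in history)
      / sum((x - mean_x) ** 2 for x, _ in history))
b0 = mean_y - b1 * mean_x

def predict(minutes):
    """Regression prediction: varies with the customer's own behavior."""
    return b0 + b1 * minutes

print(round(base_rate, 2))   # -> 0.5, the same score for everyone
print(round(predict(10), 2)) # higher than the base rate for a heavy user
print(round(predict(2), 2))  # lower than the base rate for a light user
```

The point of the sketch is only that the two techniques answer different questions: the base rate is unconditional, the regression is conditional on a predictor, and a book introducing regression should not illustrate it with the former.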
4. Finally, the great irony--and part of the "this book is so bad I have to finish it" quality--is that the author writes in an incredibly intuitive manner. The book is full of cognitively biased representations of the topic, owing mainly to the "availability" heuristic--for example, the author's excessive attention to the work of his friends, his roommates, his enemies, his daughter, or the companies he shops from. Better (or at least more rational) scholarship would have involved an initial sampling of all the relevant examples and a final selection of the ones that best illustrate the concepts (which I never really understood--see points 1 and 2). As other reviewers have pointed out, there is also "confirmation bias" all over the place (presenting only the facts that fit one's idea)--why aren't the counterarguments and counter-evidence better presented? The author must know that people buying a book on statistics will be smart enough to weigh the different sides of an issue. Like I said, I read to the end just to see if there was a "punch line" where the author confesses to his unapologetically intuitive approach to writing--that the book was a joke on the reader.
I would recommend this only to people who know very little about statistics and are unaware of how companies like amazon.com use statistics to improve business. Such readers will be impressed. For everyone else... there are so many better books out there. Paul Meehl would be super-disappointed in this work.