This book aims to offer a practical, hands-on introduction to data analysis for pragmatic readers without a strong scientific or statistical background. Some basic programming experience is required. The author provides many personal (and sometimes useful) comments about different tools and procedures in data analysis.
However, a careful reading reveals many problems, especially an obscure presentation of key concepts. In my opinion, the target audience for this book is people with no previous contact with data analysis, so it is all the more important to present the core elements correctly. Otherwise, the book is useless for them.
In particular:
- Few pages are actually dedicated to presenting open source tools that support the different graphs and techniques included in the book. From the title, I expected a more complete tour of the available open source tools for data analysis.
- The book gives no clues about how to obtain most of the graphs and results it presents. No related data sets are available for download, either. A book like this is useless if we cannot learn how to replicate all the examples.
- The formula for the variance of a sample is just wrong. One must divide by n-1, not n; see "Applied Statistics and Probability for Engineers" (Montgomery and Runger 2006).
- The author presents one of the most obscure explanations of the median I've ever come across. Resorting to an RFC (RFC 2330) to explain such a simple concept is really awkward.
- In Chapter 3 and Appendix B, the text discusses natural logarithms (base e), while the graphs plot powers of 10. This is definitely not the right way to convey correct concepts and methods.
- I concur with a previous review that the "Workshop" sections just present an ultra-short overview of some open source tools. A quick search in your favourite search engine will turn up much more informative introductions (even quick start guides).
- Today, effective data analysis depends heavily on using the best possible implementation. While it may be educational to learn how some of these implementations work, in a real situation it is much better to rely on the precise implementations of algorithms that are already available (e.g. libraries in GNU R).
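To make the variance, median and logarithm points above concrete, here is a minimal Python sketch. The sample values are made up for illustration and are not taken from the book; the only library used is Python's standard `statistics` module, which is my choice here rather than anything the book covers.

```python
import math
import statistics

# Made-up sample, for illustration only (not from the book).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

# Dividing by n treats the data as the whole population.
population_variance = sum_sq / n
# Dividing by n - 1 gives the usual sample variance.
sample_variance = sum_sq / (n - 1)

# The standard library keeps the two cases apart as well.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

# Median: sort the values and take the middle one
# (or the mean of the two middle ones when n is even).
ordered = sorted(data)
mid = n // 2
if n % 2 == 1:
    median = ordered[mid]
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2
assert median == statistics.median(data)

# Natural and base-10 logarithms differ only by the constant factor ln(10),
# but text and graphs should still agree on which one they use.
x = 1000.0
assert math.isclose(math.log(x), math.log(10) * math.log10(x))

print(population_variance, sample_variance, median)
```

The gap between the two variance values shrinks as n grows, which is precisely why an introductory text should state which one it means.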
All in all, I still recommend "R in a Nutshell" for a gentle introduction to data analysis with an open source tool (GNU R). It also has some inaccuracies and typos, but at least it's much more informative and clear. Besides, it does include an R package with all datasets and examples, ready to be installed and explored.