- Paperback: 375 pages
- Publisher: Manning; Edition: Pap/Psc (October 12, 2011)
- Language: English
- ISBN-10: 1935182684
- ISBN-13: 978-1935182689
- Size and/or weight: 18.7 x 2.3 x 23.5 cm
- Average customer rating: 3 customer reviews
- Amazon Bestsellers Rank: No. 152,716 in Foreign Language Books
Mahout in Action (English) Paperback – October 12, 2011
About the author and other contributors
Sean Owen has been a practicing software engineer for 9 years, most recently at Google, where he helped build and launch Mobile Web search. He joined Apache's Mahout machine learning project in 2008 as a primary committer and works as a Mahout consultant. Robin Anil is a committer at Mahout and works as a full-time Software Engineer at Google.
Because the book is split into three parts and lacks an overall structure, some topics are repeated in each part (e.g. the explanations of distance metrics), wasting space and reading time. Some important aspects are explained in well-hidden fourth-level sections, while other, in my opinion less important, aspects get much more attention in main sections of their own.
What I missed throughout the book was an explanation of how to write useful unit tests for newly developed code. Mahout and the Mahout examples ship some good material in their source code, so I really can't think of a reason to leave that out of the book.
Nevertheless, if you are going to look at Mahout, this book is still worth its money, mainly because of the time it saves you and because it lets you play around with all the algorithms in the library to gain some experience.
The book contains many examples (written against the latest released version, 0.5) that are very useful while reading and can be reused in your own projects. So if you're interested in machine learning on big data, this book is highly recommended.
Mahout abstracts math complexity away from dev work, and this book makes Mahout look easy.
Most helpful customer reviews on Amazon.com (beta)
Mahout in Action is written and explained so well, with simple real-life explanations and code that definitely runs, that all the techniques you've heard or read about come within your grasp. Just extend your arms and reach for that recommender or clusterer.
A big thanks to every Mahout contributor and double thanks to the authors.
Oh by the way! Order the book. At whatever price, this will save you hundreds of hours of reading and coding.
This book doesn't provide deep coverage of the theoretical foundations of machine learning (for more background I would recommend other books, such as Introduction to Machine Learning (Adaptive Computation and Machine Learning series), Machine Learning in Action, or Programming Collective Intelligence: Building Smart Web 2.0 Applications). Instead it concentrates on explaining how to use Apache Mahout ([...]) to solve some machine learning problems: making recommendations, clustering data, and classification.
For each of these classes of problems, the description starts with the basics and continues with more complex examples, including complete solutions that can easily be adapted to your own machine learning problems. All examples that come with the book were checked against the current release of Apache Mahout (version 0.5).
The book is written in succinct but understandable language and provides many code snippets that make the topics much easier to understand. An interesting feature of the e-book version of Mahout in Action is the inclusion of audio and video snippets that explain or show the "hard places". There is also an interesting description of one of Mahout's real-world deployments, where it is used in e-commerce.
So I recommend this book if you're interested in solving machine learning problems that involve very large data sets.
First, the book uses Mahout 0.5, while version 0.7 is the current release. Much of the example code no longer works. Keeping the example code updated on the site would be a huge plus.
A useful start would have been to discuss theory in chapter 7. Instead, the theory is discussed in chapter 9.
Chapter 7 is a mishmash of distance measures, similarity and examples.
A thorough explanation of the output produced by clusterdumper would have been useful. With some knowledge of the algorithm you can figure out what c and r are and what the numbers assigned to the vectors mean. But taking a simple example and showing the actual hand calculation would be very useful to someone totally new to clustering.
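For readers puzzling over the same output, the kind of hand calculation asked for here can be sketched in a few lines. This is a minimal illustration in plain Python, not Mahout code. It assumes that `c` is the cluster centroid and that `r` is a per-dimension radius, taken here as the population standard deviation of the member points. The sample points and the function name `center_and_radius` are made up for the example.

```python
import math

def center_and_radius(points):
    """Return (c, r): per-dimension mean and population standard deviation."""
    n = len(points)
    dims = len(points[0])
    # c: average of each coordinate over all member points
    c = [sum(p[d] for p in points) / n for d in range(dims)]
    # r: spread of the members around c, computed separately per dimension
    r = [math.sqrt(sum((p[d] - c[d]) ** 2 for p in points) / n)
         for d in range(dims)]
    return c, r

# Hypothetical cluster of four 2-D points (assumption, not from the book)
cluster = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0)]
c, r = center_and_radius(cluster)
print("n=%d c=%s r=%s" % (len(cluster), c, r))
```

By hand: each coordinate averages to (1 + 3 + 1 + 3) / 4 = 2, and every point lies exactly 1 away from that mean in each dimension, so the radius is 1 in both dimensions.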
I don't like to be overly critical. The book has some good information, but it's much more difficult to extract than it should be.
I was delightfully surprised: this book covers a lot of the learning algorithms in thorough detail. It is great for people with no prior knowledge of how machine learning works, like I was. If you already understand some things about machine learning, you will probably get bored fast.
I did have a few gripes though:
I felt like the clustering chapters did a great job explaining the k-means algorithm, but only did a little hand-waving for the more advanced algorithms. For example, the explanation of the canopy algorithm did not make sense to me after reading it twice, and the Latent Dirichlet Allocation algorithm made no sense at all. I learned what these algorithms are good for, but still don't completely understand how they work under the hood. Perhaps they are just too complicated to explain in the book, or maybe they belong in an appendix; I don't know.
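For what it's worth, the core idea of canopy clustering fits in a short sketch. The version below is one common formulation in plain Python, not Mahout's implementation and not the book's code. It assumes two thresholds `t1 > t2`: a point within `t1` of a canopy center joins that canopy, and a point within `t2` is also taken off the candidate list so it cannot start a canopy of its own. The function names, thresholds and sample points are all illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy_centers(points, t1, t2, distance=euclidean):
    """Group points into (center, members) canopies; requires t1 > t2."""
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # any remaining point may start a canopy
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)          # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(p)  # not tightly close: stays a candidate
        remaining = still_remaining
        canopies.append((center, members))
    return canopies

# Hypothetical 2-D points (assumption, chosen to form two obvious groups)
points = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0), (10.0, 10.0), (10.5, 10.0)]
canopies = canopy_centers(points, t1=3.0, t2=1.0)
for center, members in canopies:
    print(center, members)
```

Because the pass is cheap and only needs one distance function, the resulting centers are typically used as starting points for a more expensive algorithm such as k-means.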
I'm reading the classification chapters now, and I must admit that they are a bit verbose. The authors repeat themselves way too much in chapter 13. I think multiple authors contributed to chapter 13 without looking at each other's work. On the plus side, I feel like I understand it.
I have not tried doing anything yet; some other reviewers said the examples are out of date. I have come to expect that with Hadoop books: the libraries are evolving fast, and unfortunately Hadoop is awful at keeping things backwards compatible. I expect to have to "interpret" the examples when I go and try them.
My only other wish is that the authors had given a little more advice regarding Maven dependencies and imports. I would like to see the fully qualified class names of some of the imports in the examples. The Maven section should stress the importance of using the same jar that is installed on your machine; you may need to add the dependency on the CDH version of Mahout or on the plain version of Mahout. It would be nice if they showed an example pom.
I'm giving this four stars because the explanations of the machine learning algorithms are pretty good. I will try revising this when I have done more.
After having tried a little more, here are my gripes:
1. The book is a little too text-oriented: it focuses so much on how to cluster and classify text documents that it doesn't give you enough advice on how to work with data that is not text documents.
2. It offers no useful advice on how to normalize input data that is not a text document.
3. Sometimes it tells you how to do something using the command line, but doesn't tell you how to do it in code.
4. It doesn't explain how to make your own cluster input files. It explains the canopy method, a way to do that on the command line (assuming text documents), but never shows you how to create that file in code. For example, if I want the clusters to start at (0,0), (0,1), (1,0), and (1,1), the book doesn't explain how to make that sequence file; I had to figure it out on my own.
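Points 2 and 4 can be illustrated together in a short sketch: rescale numeric, non-text records to a common range, then run k-means from hand-picked starting centers. This is plain Python under stated assumptions, not Mahout code. It does not write a Hadoop sequence file, and the record layout (two numeric columns on very different scales) and all function names are hypothetical. Min-max scaling is only one of several reasonable normalizations.

```python
def min_max_normalize(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    dims = len(rows[0])
    lo = [min(r[d] for r in rows) for d in range(dims)]
    hi = [max(r[d] for r in rows) for d in range(dims)]
    return [
        tuple((r[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
              for d in range(dims))
        for r in rows
    ]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, initial_centers, iterations=10):
    """Lloyd-style k-means that starts from explicitly chosen centers."""
    centers = [tuple(c) for c in initial_centers]
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (a center with an empty group stays where it is)
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups

# Hypothetical records with two numeric columns on different scales
raw = [(20, 1000), (22, 1200), (20, 9000), (23, 8800),
       (60, 1100), (58, 1000), (60, 9000), (59, 8700)]
normalized = min_max_normalize(raw)

# Start the clusters at the four corners, as in the example above
seeds = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
centers, groups = k_means(normalized, seeds)
for center, group in zip(centers, groups):
    print(center, len(group))
```

Without the normalization step, the second column would dominate every distance, and seeds placed in the unit square would be meaningless for the raw values.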