Agile Data Science: Building Data Analytics Applications with Hadoop

Russell Jurney

Kindle price: EUR 17.30, including VAT and free wireless delivery via Amazon Whispernet

Other editions

                 Amazon price   New from   Used from
Kindle Edition   EUR 17.30
Paperback        EUR 24.95



Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop.

Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.

  • Create analytics applications by using the agile big data development methodology
  • Build value from your data in a series of agile sprints, using the data-value stack
  • Gain insight by using several data structures to extract multiple features from a single dataset
  • Visualize data with charts, and expose different aspects through interactive reports
  • Use historical data to predict the future, and translate predictions into action
  • Get feedback from users after each sprint to keep your project on track

About the Author and Other Contributors

Russell Jurney cut his data teeth in casino gaming, building web apps to analyze the performance of slot machines in the US and Mexico. After dabbling in entrepreneurship, interactive media, and journalism, he moved to Silicon Valley to build analytics applications at scale at Ning and LinkedIn. He lives on the ocean in Pacifica, California, with his wife Kate and two fuzzy dogs.


Most helpful customer reviews on (beta): 3.8 out of 5 stars, 6 reviews
8 of 8 customers found the following review helpful
5.0 out of 5 stars: Absolute required reading for all new data scientists (January 3, 2014)
By chad - Published on
Format: Paperback | Verified Purchase
I was once told by a chief data scientist that they would rather teach a mathematician programming than a programmer math (to be a data scientist). After being a data scientist for some time now I would have to respectfully disagree. 85% of data science is plumbing and I wouldn't hire a physicist to be a plumber. Indeed modern data scientists really do need to be full-stack developers trapped in an academic's body.
Jurney nails it! He offers tools and methodologies adapted to common data science workflows and their associated pitfalls wherein we spend 85% of our time plumbing and 15% of our time integrating some off-the-shelf algorithm to find deep insight.
So, for new data scientists or 3rd-4th year grad students who have balanced their Twitter API hack with NSF grant deadlines, this is ABSOLUTELY REQUIRED READING.
7 of 7 customers found the following review helpful
4.0 out of 5 stars: Chapter 3 alone is worth the price of the entire book. (February 5, 2014)
By Carsten Jørgensen - Published on
Book review - Agile Data Science by Russell Jurney, O'Reilly Media

The subtitle of this book, "Building Data Analytics Applications with Hadoop", says more about the book than the actual title, "Agile Data Science". However, the subtitle will probably fool most people. Before reading this book I believed that Hadoop was synonymous with the distributed file system HDFS. If you are looking for a book about building applications on top of HDFS, then this book IS NOT for you. It turns out that Hadoop is much more than just HDFS.

Do not buy this book to learn about agile software development methodologies. There are some rather strange comments about personal and private space requirements for creative workers, as well as statements such as "Easy access to large-format printing is a requirement for the agile environment." The discussion about agile methods for working with data science is interesting. The basic question is whether it is possible to bridge agile methods and data science, since science by its nature does not consist of a predefined set of tasks. It seems to me that the tools and software used in chapter 3 are called agile and hence the process is considered agile. In Part II of the book, the application built in chapter 3 is refined in a number of steps that the author calls iterative. But again, that does not make the process agile. I am not saying that the author is wrong, but the point about the agile method, and how process and tools interact to make the development agile, is not entirely clear to me.

This is NOT a book about the inner workings of Hadoop. Please refer to "Hadoop: The Definitive Guide" by Tom White (O'Reilly Media) for a thorough introduction to Hadoop. Instead, the book takes a very practical approach and shows us how to build agile applications using various Hadoop components such as Pig, MapReduce, and the Avro serialization framework. In addition, you will see how to move data into the popular NoSQL database MongoDB and how to use ElasticSearch to search the data. Finally, all the collected data is accessed through a lightweight web application built with Python and Flask, with visual enhancements made in Bootstrap and D3.

Agile Data Science covers a lot of material and uses lots of different software and tools. If you want to run the examples in the book, you have two options: 1) a user-contributed Linux Vagrant image is available with most of the required software, or 2) you can follow the instructions given in the book and the accompanying GitHub project and install the software yourself. In either case you have to pay close attention to software versions. All of the examples work, but it does require some effort to get them running, and if you feel uncomfortable using a terminal and command line you might have a hard time playing with the examples.

Being able to work in an agile way with data science is quite important, but the author's attempt did not convince me that the suggested framework will work in a practical setting.
The main value of this book is definitely chapter 3, where Jurney shows us how to go from zero to a working data science application. The application is literally built from the ground up, starting with data collection, moving on to data storage, and ending with a web front end. This chapter alone is worth the price of the entire book.

Part II of the book contains interesting material about data visualizations and prediction models. For many readers, some prior knowledge of Naive Bayes and the Natural Language Toolkit would most likely be useful to fully understand the implications of the predictions made about what makes an email likely to receive a response.

I review for the O'Reilly Reader Review Program and I want to be transparent about my reviews, so you should know that I received a free copy of this ebook in exchange for my review.
3 of 3 customers found the following review helpful
1.0 out of 5 stars: He sent this with NON-WORKING CODE (June 9, 2014)
By Sean Franks - Published on
Format: Kindle Edition | Verified Purchase
The story is nice, but the code that forms the basis of the entire project behind the book DOESN'T COMPILE. As of today (June 9, 2014), the author has removed all of the GitHub references to the project.

I'm halfway through the book, have been practicing Agile development techniques for several years, and I am not quite sure what in particular makes this book about Data Science 'Agile' based.

One thing that he does nicely is explain the Pig code he uses, but I can't use those programs because the Python programs that gather the data that feed Pig will not compile, even after I debugged his code for several hours. (Example: the author made reference to an RFC inline in the Python code that would have NEVER compiled. NEVER. Line 11 from call to email utilities)
1 of 1 customers found the following review helpful
5.0 out of 5 stars: This book plunges right into the meat of data science (March 1, 2014)
By gatorgirl - Published on
Format: Paperback | Verified Purchase
I really like the introduction - it gives a solid and good overview of how a data shop functions, and the different types of organizational roles.

After that, the book moves pretty fast. The sections are brief, but thorough. The author is forward thinking enough to put his examples on GitHub, so that was really helpful.

Overall, I really recommend this book. There was one section I had a lot of trouble with, mostly because of versioning errors with Hadoop and Mongo; other than that, everything's been pretty straightforward.
3.0 out of 5 stars: Could be useful for data scientists who need a familiarity with deployment environments. (June 17, 2014)
By K. Luangkesorn - Published on
One of the problems with data science is that any description of what is encountered takes on the appearance of a mythical unicorn: no one person could possibly have all of the skills required. And it gets worse when you add to the standard set of statistics, domain knowledge, and programming the ability to deploy the application into a high-speed environment. This book is not going to make a data scientist an expert in running a data center, but it is useful for giving someone who has the rest of the skills an understanding of the environment their work will be deployed into.

One of the conflicts between the data scientist/analyst and information technology groups is that while the data scientist gives the data owned by the organization its value, IT is charged with storing the data and providing access to it. And in the high-velocity, high-volume environment of big data, not understanding how the architecture works can lead to the data scientist creating valid solutions that cannot be applied in the actual day-to-day working environment. That is where this book comes in. The book has associated virtual machines in a software repository, so that the data scientist who does not know anything about the infrastructure and software stack that the data and the analysis ride on can see how everything fits together.

The book title is misleading. This is not a book about data analytics. This is a book for data analysts so they know how their analytical application is deployed and applied to day-to-day use in enterprise environments. For that reason it is useful.

Disclaimer: I received a free electronic copy of this book as part of the O'Reilly Press Blogger program.