
Doing Data Science: Straight Talk from the Frontline (English) Paperback – 18 October 2013




Product information

  • Paperback: 405 pages
  • Publisher: O'Reilly & Associates; 1st edition (18 October 2013)
  • Language: English
  • ISBN-10: 1449358659
  • ISBN-13: 978-1449358655
  • Dimensions: 15.2 x 2 x 22.9 cm
  • Average customer rating: 3.0 out of 5 stars (4 customer reviews)
  • Amazon bestseller rank: No. 32,938 in Foreign-Language Books


Product descriptions

Press reviews

"I enjoyed Rachel and Cathy's book, it's readable, informative, and like no other book I've read on the topic of statistics or data science." --Andrew Gelman, Professor of statistics and political science, and director of the Applied Statistics Center at Columbia University

"I got a lot out of Doing Data Science, finding the chapter organization on business problem specification, analytics formulation, data access/wrangling, and computer code to be very helpful in understanding DS solutions." --Steve Miller, Co-founder, OpenBI, LLC, a Chicago-based business intelligence services firm

About the author and other contributors

Cathy O'Neil earned a Ph.D. in math from Harvard, was a postdoc in the MIT math department, and was a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry. She then chucked it and switched over to the private sector. She worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis, and then for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. She is currently a data scientist on the New York start-up scene, writes a blog at mathbabe.org, and is involved with Occupy Wall Street.

Rachel Schutt is a Senior Statistician at Google Research in the New York office and an adjunct assistant professor at Columbia University. She earned a PhD in statistics from Columbia University, and master's degrees in mathematics and operations research from the Courant Institute and Stanford University, respectively. Her statistical research interests include modeling and analyzing social networks, epidemiology, hierarchical modeling and Bayesian statistics. Her education-related research interests include curriculum design.


Customer reviews

3.0 out of 5 stars

Most helpful customer reviews

2 of 2 customers found the following review helpful. By Rodrigo Rivera on 30 March 2014
Format: Paperback
Data science is still a very fuzzy term: some claim it is just a nice name for statistics, others say it is the new business intelligence but for big data, and still others believe that data science is a completely new subject. There is neither consensus nor an official definition. Still, it is a very sexy topic at the moment, and almost every publisher now has a book on it in its catalog; O'Reilly is no different.

Doing Data Science: Straight Talk from the Frontline tries to be an introduction to the subject without heavy mathematical theory behind it. Instead, it aims to describe the everyday work of data scientists and to build a basic understanding of data science.

The authors are well known in the community: Cathy O'Neil is a well-known blogger (mathbabe) with a very strong mathematical background, and Rachel Schutt teaches at Columbia University in New York. These are all the right conditions for a good book on the subject: experienced experts who write very well and work in data science.

This, however, is exactly where the problem lies: the book is neither a textbook nor a novel. It feels just like a collection of blog posts or a long magazine article, because some chapters are guest contributions by other experts or by students of the course mentioned. On top of that, the book itself is a compilation of presentations and talks from the data science lecture course at Columbia University. As a result, the style differs somewhat from chapter to chapter.
Format: Paperback
First off - this book is bloated with "historical context" of Data Science and the personal experiences of the authors. Not only do these sections contain no relevant information, they also bog down the reading process. There is no "straight" talk from the frontline in this book. Many keywords and phrases specific to Data Science are also not explained, further hindering the process of understanding this topic.
The first chapters would be expected to explain what Data Science is - and fail. This is typical of the entire book. It talks about methods, but not how these methods are used to do Data Science. Statistical Inference, Exploratory Data Analysis and so on are touched upon (not thoroughly explained), but not how a Data Scientist would use them, how she would interpret the results, what part of the results would be particularly interesting and what decisions she would make based upon these results. A clear example of how a Data Scientist would tackle a particular problem, describing in detail what steps she would take and, most importantly, WHY, would have been necessary to make heads or tails of the information presented in this book.

A particularly glaring example of all these problems is the "exercises" in this book: for example, the second chapter contains an exercise about a real estate house buying company and asks the reader to formulate a "data strategy" for this company, based on its website data, and to analyze this data for anything unusual...
If you have no prior knowledge about real-estate house buying, you will have trouble even understanding what this company is actually doing, what separates it from its competitors and why it earns money this way. It is really that badly explained. And analyzing the company's website? How? What?

Most helpful customer reviews on Amazon.com (beta)

Amazon.com: 38 reviews
79 of 79 customers found the following review helpful
More breadth than depth 28 December 2013
By Carsten Jørgensen - Published on Amazon.com
Format: Paperback
Book review - Doing Data Science by O'Neil and Schutt, O'Reilly Media.

More breadth than depth

What is data science? The book Doing Data Science not only explains what data science is but also provides a broad overview of methods and techniques that one must master in order to call oneself a data scientist. The book is based on a course about data science given at Columbia University. However, it should not be considered a textbook about data science but rather a broad introduction to a number of topics in data science.

In the spring of 2013 I followed two Coursera courses. One about the statistical programming language R and one on Data Analysis. I had for some time been looking for a book that could be used as a follow-up reading on topics in data science. This was the reason I picked up "Doing Data Science".

The book begins with a chapter on what data science is all about, followed by four chapters on topics like statistical inference, exploratory data analysis, various machine learning algorithms, linear and logistic regression, and Naive Bayes. I have a background in both mathematics and statistics and I was able to understand these chapters, but the material is covered in such broad terms that I find it hard to believe that a newcomer to these topics will understand or gain much knowledge from reading these chapters. Basic math is presented about the models, but without some kind of detailed explanation one cannot develop any deeper intuition for the approach explained.

The best parts of the book are definitely chapters 6 to 8 and 10. Here we find interesting discussions of data science applied to financial modeling, extracting information from data, and social networks. I really enjoyed the examination of time-stamped data, the Kaggle Model, feature selection, and case-attribute data versus social network data. The math behind these topics was, however, once again explained quite superficially. Centrality measures are central to social network analysis, but it is very hard to develop intuition for these measures without a more detailed explanation of the underlying math. These chapters contain lots of useful resources for finding additional information about the discussed topics.

Data visualization is an integral part of data science for communicating results. Beginners in the field of data science need concrete and easy-to-follow instructions on how to get started with visualization. Unfortunately the book focuses more on the use of data visualization in modern art projects. The content is simply too abstract for beginners to learn about the usage of visualization in data science.

When I was browsing the book before actually buying it I was kind of thrilled to see that it covered topics like causality and epidemiology, topics that I did not find covered in any other book about data science. However, the chapter about epidemiology is not about using data science in epidemiology but 'just' about using data science to evaluate the methods used in epidemiology. Likewise there seems to be no link between data science and causality. I later discovered that the authors used an entire blog post ([...]) to explain why causality was part of the university course underlying the book. This material or parts of it should have made it into the book. I am still not convinced that causality is a topic in data science.

There are several places in which the book assumes the reader has knowledge of US government structure and organizations. Examples include page 292, which discusses US health care databases, and page 298, where the FDA is mentioned without further introduction or explanation of what the FDA is.

A book that contains programming examples should always make the code available for download. Typing in the code yourself is simply a waste of time. It is possible to download some of the datasets used in the book through GitHub, but the code does not seem to be available. I also own the electronic version of the book and I tried to copy-paste some of the examples from the e-book, but there are several examples of code that has not been proofread or tested prior to publication. The sample code misses references to required R libraries or refers to computer folder structures on some local Columbia University computer. The companion datasets that can be downloaded from GitHub consist of a number of Excel files. The R sample code uses the gdata package to load these Excel files into R for further analysis. It took me quite some time to figure out why this process didn't work on a Windows computer. The gdata package requires Perl to be installed on the computer and this is not default software on Windows. In my opinion one should always publish data in a simple format, e.g. csv files, and definitely not in proprietary formats like xls for Excel files.

Data Science is both science and a lot of practical experience. I guess the title of the book Doing Data Science tries to capture that. You need to do data science in order to learn it. The covered topics are interesting but the material is more breadth than depth. Luckily there are lots of useful links and resources to additional materials. Personally I would prefer more details about the actual data science topics, such as extracting meaning from data and social network analysis, and less focus on math. The book already requires some knowledge of math, statistics and programming, so why not presume that the reader has the background knowledge and dive straight into the data science discussions?

I really like the idea of having a lot of different people present various topics in data science, and the book is well written and contains lots of useful resources for further studies of data science. I will recommend the book to people new to the subject, but be aware that the source code is not available, which is a major drawback.

Disclosure: I review for the O'Reilly Reader Review Program and I want to be transparent about my reviews so you should know that I received a free copy of this ebook in exchange for my review.
38 of 40 customers found the following review helpful
Doing Data Science Worth a Look 19 November 2013
By Dan D. Gutierrez - Published on Amazon.com
Format: Paperback
I found this book to be a very odd bird indeed. It is one book you can read from back cover to front cover and not be at a disadvantage. This is because the book is really just a collection of presentations made by various people to a class taught by the primary author Rachel Schutt at Columbia University in the Fall of 2012 – Introduction to Data Science. It wasn’t entirely clear what content Schutt was directly responsible for since only some of the chapters indicate who the contributors were (one of the chapters was contributed by a group of her students!). The co-author, Cathy O’Neil, I’ve encountered before as an outspoken blogger going by the name “mathbabe” but it wasn’t specifically stated how she became part of the book project, other than to say she was one of the students in Schutt’s class. Chapter 6 was partly written by O’Neil.

Both Schutt and O’Neil hold Ph.D.s in fields appropriate to data science, but the book was not “written” by the two; rather, they seem to have performed some kind of editing function with the materials submitted by each contributor and added commentaries of their own. As a result, the book is a hodgepodge of anecdotes, factoids, R code snippets, plots, and mathematics, all from the in-class presentations. I enjoy seeing math in data science books, but the equations in this book were sort of just floating there, requiring the reader to explore further at another time.

Although I have issues with the book as it is not any sort of text for the field, I did enjoy reading it with a number of “Ah, I didn’t know that!” moments. Schutt’s credentials in data science are considerable, having worked at Google for a few years around the same time that “data science” was growing up in Silicon Valley. As a result the book has many memorable anecdotes about the early days of the data science industry, and observations about what makes big data tick. I enjoyed the story about the Google software engineer who accidentally deleted 10 petabytes of data, and I think my favorite quote from the book is from the student’s chapter 15:

Kaggle competitions could be described as the dick-measuring contests of data science.

With contributors’ chapters on statistical inference, machine learning algorithms, logistic regression, financial modeling, recommendation engines, data visualization, Hadoop, MapReduce, and more, I’d say the book is worth a read, not necessarily as a source for learning data science but more as a high-level guide and short historical account of this young industry. You get to learn about the people, companies, and technologies that have collectively built the data science arena, and you’ll be better for it, especially if you are working to become a data scientist yourself.
64 of 77 customers found the following review helpful
A spoonful of sugar... 29 October 2013
By Dimitri Shvorob - Published on Amazon.com
Format: Kindle Edition
... helps the medicine go down, as Mary Poppins used to say. An IT-focused publisher, O'Reilly has twice before used the "book as collection of chapters by different contributors" formula in its foray into the attractive "data" niche, with such titles as "Beautiful data" and "Bad data". "Doing data science" - by the way, I prefer Hastie and Tibshirani's "statistical learning" to the fuzzy and grandiose "data science" - follows the same approach, but, with its subject matter being closer to the academe, the company enlisted two young PhDs to steer the collaborative effort. Rachel Schutt took the lead as author and editor, and, assisted by Cathy O'Neil, produced an engaging, informal - you don't often see "science" in the title and "huge-ass" in the text - yet sufficiently technical to be hands-on, sequence-of-vignettes-styled book. Imagine a mash-up of a magazine article and a textbook. Neither part may be best-in-class, but their combination makes for a "unique selling proposition".

Well, maybe not a textbook. Most textbooks are carefully written and carefully checked. In contrast, when I see "Doing data science" introduce the ROC curve in three places, one of which translates the "O" as "operator", I can guess that this is a copy-paste of papers by three contributors. When Dr. O'Neil casually redefines an English word ("causal") to avoid rewriting a couple of sentences, or pronounces, on page 159, that "priors reduce degrees of freedom" - this is painfully meaningless, and neither term is defined, only name-checked - I suspect that she knows better, but just did not feel like spending more time on her half-chapter. Neither author speaks of their own projects - if this is the "frontline", then it's other soldiers' "trenches" that we are visiting. The occasional code listings are borrowed as well, thrown in without editing or comments. In this last regard, "Doing data science" lags far behind the book that seems to have informed its choice of topics, Peter Harrington's "Machine learning in action". (That's one suggestion - and if you want a good, accessible textbook, "Introduction to statistical learning" by James et al. is another).

None of it is going to matter to the book's target audience. "Doing data science" is aimed at beginners - and is bound to be interesting and useful to thousands of keen undergrads and adult learners.
10 of 12 customers found the following review helpful
Not so much about Doing Data Science as about what it covers 25 March 2014
By Marc Zucker - Published on Amazon.com
Format: Kindle Edition
"Doing Data Science: Straight Talk from the Frontline" by Cathy O’Neil and Rachel Schutt; O'Reilly Media

With so many books being published on Data, from Big Data to Machine Learning to Data Analysis, we must ask what yet another book is going to offer us. And the answer is not that much. O’Neil and Schutt have seemingly written this to give us a view of how the ins and outs of the actual world of Data Science are practiced. But we get little more than an overview of what Data Analysis is in general.

Many topics a reader might find interesting, whether for background knowledge or otherwise, are given little emphasis, and even then, without much depth. As an example, a discussion on the Exponential Distribution merely states that “because we are familiar with the fact that ‘waiting time’ is a common enough real-world phenomenon that a distribution called the exponential distribution has been invented to describe it.” Any mathematician realizes that inventing probability distributions is a little more involved than is implied.

The main part of the book starts with a look at algorithms. The authors use R as their primary language. There is little in terms of explanation of the underlying processes and examples are pretty direct. Naïve Bayes, for example, is explained in a page and a half, so we can clearly say that this is an overview of many of the ideas going into Data Science. In fact it is a very wide overview. Financial Modelling, Spam Filtering, Epidemiology, the list goes on. This is definitely a plus; we see the wide applications that Data Science has.

The book ends with a discussion on Competitions, but we come back to the question of what we have gained more than we might get by a simple perusal of the web? Maybe I was hoping for more out of the book than I got; expectations can be a killer. But in the end we must make a choice. If we would like to know about Data Science – how it’s done – then we would probably like to look elsewhere. If we would like to have some book that we can show someone what areas exist within this field and what topics are touched upon in it (perhaps for an advisor), then this book should do fine.

(FTC disclosure (16 CFR Part 255): The reviewer has accepted a reviewer's copy of this book which is his to keep. He intends to provide an honest, independent, and fair evaluation of the book in all circumstances.)

[...]
2 of 2 customers found the following review helpful
Getting a Flavor of What Data Science is All About 16 April 2014
By Dr. Bojan Tunguz - Published on Amazon.com
Format: Paperback Verified Purchase
“Data Science” has become one of the most trendy research fields in recent years, as well as a catchall rubric for various job descriptions and work functions. The cynics and skeptics, and there are many of those, contend that “Data Science” is nothing more than repackaged Statistics, with a bit of coding and hacking thrown in. Its proponents, however, point out that most practicing data scientists use a variety of skills and techniques in their daily work, and come from a vast spectrum of career paths and backgrounds. I tend to side with the latter group, but I too am an outsider to this field and am still trying to get a better understanding of what it really entails.

“Doing Data Science: Straight Talk from the Frontline” is a compendium of chapters that deal with data science as it is practiced in the real world. Each chapter is written by a different author, all of whom have significant practical experience and are acknowledged authorities on data science. Most of the contributors work in industry, but data science is still so fresh and new that there is a lot of crossing over between academia and the corporate world.

A few of the chapters include exercises, but these tend to be too advanced and assume too much background material for an introductory book. The exercises still give you a good idea of what kinds of problems data scientists tend to grapple with. However, this book is definitely not a textbook and cannot be effectively used as such. The book doesn’t provide any background on R, statistics, data scrubbing, machine learning, and various other techniques used by data scientists. It is highly unlikely that any single textbook would be able to do justice to all of that material anyway, but a book of that sort could still have a lot of potential use.

There are two groups of people who would benefit from this book. The first are people who have absolutely no background in data science or any of its related fields, but would like to get a flavor of what data science is all about and are interested in exploring it for career purposes. The second group are people with significant technical background in one of the fields related to data science (programming, statistics, machine learning, etc.) who are interested in broadening their skills and would like to see how their particular strengths would fit within the broader data science field.