Doing Data Science: Straight Talk from the Frontline [English] [Paperback]

Cathy O'Neil , Rachel Schutt
3.7 out of 5 stars (3 customer reviews)
Price: EUR 27.95, free delivery.
All prices include VAT.

Other editions (Amazon price)

Kindle Edition: EUR 17.30
Paperback: EUR 27.95

Short description

October 18, 2013
Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that's so clouded in hype? This insightful book, based on Columbia University's Introduction to Data Science class, tells you what you need to know. In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you're familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science. Topics include:
* Statistical inference, exploratory data analysis, and the data science process
* Algorithms
* Spam filters, Naive Bayes, and data wrangling
* Logistic regression
* Financial modeling
* Recommendation engines and causality
* Data visualization
* Social networks and data journalism
* Data engineering, MapReduce, Pregel, and Hadoop

Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O'Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Frequently bought together

Doing Data Science: Straight Talk from the Frontline + Data Science for Business: What you need to know about data mining and data-analytic thinking
Price for both: EUR 52.90


Product descriptions

Editorial reviews

"I enjoyed Rachel and Cathy's book, it's readable, informative, and like no other book I've read on the topic of statistics or data science." --Andrew Gelman, Professor of statistics and political science, and director of the Applied Statistics Center at Columbia University

"I got a lot out of Doing Data Science, finding the chapter organization on business problem specification, analytics formulation, data access/wrangling, and computer code to be very helpful in understanding DS solutions." --Steve Miller, Co-founder, OpenBI, LLC, a Chicago-based business intelligence services firm

About the authors and other contributors

Cathy O'Neil earned a Ph.D. in math from Harvard, was a postdoc in the MIT math department, and was a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry. She then chucked it and switched over to the private sector. She worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis, and then for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. She is currently a data scientist on the New York start-up scene, writes a blog at mathbabe.org, and is involved with Occupy Wall Street.

Rachel Schutt is a Senior Statistician at Google Research in the New York office and an adjunct assistant professor at Columbia University. She earned a PhD in statistics from Columbia University, and master's degrees in mathematics and operations research from the Courant Institute and Stanford University, respectively. Her statistical research interests include modeling and analyzing social networks, epidemiology, hierarchical modeling and Bayesian statistics. Her education-related research interests include curriculum design.



Customer reviews

3.7 out of 5 stars
Most helpful customer reviews
2 of 2 people found the following review helpful
Format: Paperback
Data science is still a very fuzzy term. Some claim it is just a fancy name for statistics, others say it is the new business intelligence, only for big data, and still others believe that data science is an entirely new field. There is neither a consensus nor an official definition. Nevertheless, it is a very sexy topic at the moment, and almost every publisher now has a book on it in its catalog; O'Reilly is no exception.

Doing Data Science: Straight Talk from the Frontline aims to be an introduction to the subject without a lot of mathematical theory behind it. Instead, it sets out to describe the day-to-day work of data scientists and to build a basic understanding of data science.

The authors are well known in the community: Cathy O'Neil is a well-known blogger (mathbabe) with a very strong mathematical background, and Rachel Schutt teaches at Columbia University in New York. These are all the right ingredients for a good book on the subject: experienced experts who write very well and who actually work in data science.

However, this is exactly where the problem lies: the book is neither a technical book nor a novel. It reads exactly like a collection of blog posts or a long magazine article, since some chapters are guest contributions by other experts or by students of the course mentioned above. On top of that, the book itself is a compilation of presentations and lectures from the data science course at Columbia University, so the style varies from chapter to chapter.
4.0 out of 5 stars: Getting a Flavor of What Data Science is All About (April 16, 2014)
Format: Paperback
“Data Science” has become one of the most trendy research fields in recent years, as well as a catchall rubric for various job descriptions and work functions. The cynics and skeptics, and there are many of those, contend that “Data Science” is nothing more than repackaged Statistics, with a bit of coding and hacking thrown in. Its proponents, however, point out that most practicing data scientists use a variety of skills and techniques in their daily work, and come from a vast spectrum of career paths and backgrounds. I tend to side with the latter group, but I too am an outsider to this field and am still trying to get a better understanding of what it really entails.

“Doing Data Science: Straight Talk from the Frontline” is a compendium of chapters that deal with data science as it is practiced in the real world. Each chapter is written by a different author, all of whom have significant practical experience and are acknowledged authorities on data science. Most of the contributors work in industry, but data science is still so fresh and new that there is a lot of crossing over between academia and the corporate world.

A few of the chapters include exercises, but these tend to be too advanced and assume too much background material for an introductory book. The exercises still give you a good idea of what kinds of problems data scientists tend to grapple with. However, this book is definitely not a textbook and cannot be effectively used as such. The book doesn’t provide any background on R, statistics, data scrubbing, machine learning, and various other techniques used by data scientists. It is highly unlikely that any single textbook would be able to do justice to all of that material anyway, but a book of that sort could still have a lot of potential use.
0 of 1 people found the following review helpful
5.0 out of 5 stars: Do read it. It is even fun... (February 19, 2014)
Format: Kindle Edition | Verified Purchase
If you enjoy being a wizard who can foretell the future :)
The book is a superb start to data science. It has it all and gives you clues as to how to continue building up your knowledge. It gives you some basic R code samples that are priceless for quick, practical learning. Beware, however, that this book just gets you started, and then you have to follow the threads.
If you know nothing about statistics, perhaps you should take an introductory book or course on statistics and probability.
I did study math 20 years ago and never used stats or probability for work until recently. This book got me going, so you need nothing more than rudimentary knowledge.
Most helpful customer reviews on Amazon.com
Amazon.com: 4.3 out of 5 stars, 32 reviews
49 of 49 people found the following review helpful
3.0 out of 5 stars: More breadth than depth (December 28, 2013)
By Carsten Jørgensen - Published on Amazon.com
Format: Paperback
Book review - Doing Data Science by O'Neil and Schutt, O'Reilly Media.

More breadth than depth

What is data science? The book Doing Data Science not only explains what data science is but also provides a broad overview of methods and techniques that one must master in order to call oneself a data scientist. The book is based on a course on data science given at Columbia University. However, it should not be considered a textbook on data science but rather a broad introduction to a number of topics in data science.

In the spring of 2013 I took two Coursera courses: one on the statistical programming language R and one on Data Analysis. I had been looking for some time for a book that could serve as follow-up reading on topics in data science. This was the reason I picked up "Doing Data Science".

The book begins with a chapter on what data science is all about, followed by four chapters on topics like statistical inference, exploratory data analysis, various machine learning algorithms, linear and logistic regression, and Naive Bayes. I have a background in both mathematics and statistics and I was able to understand these chapters, but the material is covered in such broad terms that I find it hard to believe that a newcomer to these topics will understand or gain much knowledge from reading them. The basic math behind the models is presented, but without a more detailed explanation one cannot develop any deeper intuition for the approaches explained.

The best parts of the book are definitely chapters 6 to 8 and 10. Here we find interesting coverage of data science applied to financial modeling, extracting information from data, and social networks. I really enjoyed the examination of time-stamped data, the Kaggle Model, feature selection, and case-attribute data versus social network data. However, the math behind these topics was once again explained quite superficially. Centrality measures are central to social network analysis, but it is very hard to develop intuition for these measures without a more detailed explanation of the underlying math. These chapters contain lots of useful resources for finding additional information about the discussed topics.

Data visualization is an integral part of data science for communicating results. Beginners in data science need concrete, easy-to-follow instructions on how to get started with visualization. Unfortunately, the book focuses more on the use of data visualization in modern art projects. The content is simply too abstract for beginners to learn how visualization is used in data science.

When I was browsing the book before actually buying it, I was kind of thrilled to see that it covered topics like causality and epidemiology, topics that I had not found covered in any other book about data science. However, the chapter on epidemiology is not about using data science in epidemiology but 'just' about using data science to evaluate the methods used in epidemiology. Likewise, there seems to be no link between data science and causality. I later discovered that the authors used an entire blog post ([...]) to explain why causality was part of the university course underlying the book. This material, or parts of it, should have made it into the book. I am still not convinced that causality is a topic in data science.

There are several examples in which the book assumes the reader has knowledge of US government structure and organizations. Examples include page 292, when discussing US health care databases, and page 298, where the FDA is mentioned without any introduction or explanation of what the FDA is.

A book that contains programming examples should always make the code available for download. Typing in the code yourself is simply a waste of time. It is possible to download some of the datasets used in the book through GitHub, but the code does not seem to be available. I also own the electronic version of the book, and when I tried to copy and paste some of the examples from the e-book, I found several examples of code that hadn't been proofread or tested prior to publication. The sample code is missing references to required R libraries or refers to folder structures on some local Columbia University computer. The companion datasets that can be downloaded from GitHub consist of a number of Excel files. The R sample code uses the gdata package to load these Excel files into R for further analysis. It took me quite some time to figure out why this didn't work on a Windows computer: the gdata package requires Perl to be installed, and Perl is not installed by default on Windows. In my opinion, data should always be published in a simple format, e.g. CSV files, and definitely not in proprietary formats like Excel's xls.

Data Science is both science and a lot of practical experience. I guess the title of the book, Doing Data Science, tries to capture that: you need to do data science in order to learn it. The covered topics are interesting, but the material has more breadth than depth. Luckily, there are lots of useful links and resources to additional material. Personally, I would prefer more detail on the actual data science topics, e.g. extracting meaning from data and social network analysis, and less focus on math. The book already requires some knowledge of math, statistics and programming, so why not presume that the reader has the background knowledge and dive straight into the data science discussions?

I really like the idea of having a lot of different people present various topics in data science, and the book is well written and contains lots of useful resources for further study of data science. I would recommend the book to people new to the subject, but be aware that the source code is not available, which is a major drawback.

Disclosure: I review for the O'Reilly Reader Review Program and I want to be transparent about my reviews, so you should know that I received a free copy of this ebook in exchange for my review.
25 of 26 people found the following review helpful
4.0 out of 5 stars: Doing Data Science Worth a Look (November 19, 2013)
By Dan D. Gutierrez - Published on Amazon.com
Format: Paperback
I found this book to be a very odd bird indeed. It is one book you can read from back cover to front cover and not be at a disadvantage. This is because the book is really just a collection of presentations made by various people to a class taught by the primary author Rachel Schutt at Columbia University in the Fall of 2012 – Introduction to Data Science. It wasn’t entirely clear what content Schutt was directly responsible for since only some of the chapters indicate who the contributors were (one of the chapters was contributed by a group of her students!). The co-author, Cathy O’Neil, I’ve encountered before as an outspoken blogger going by the name “mathbabe” but it wasn’t specifically stated how she became part of the book project, other than to say she was one of the students in Schutt’s class. Chapter 6 was partly written by O’Neil.

Both Schutt and O’Neil hold Ph.D.s in fields appropriate to data science, but the book was not “written” by the two; rather, they seem to have performed some kind of editing function on the materials submitted by each contributor and added commentary of their own. As a result, the book is a hodgepodge of anecdotes, factoids, R code snippets, plots, and mathematics, all from the in-class presentations. I enjoy seeing math in data science books, but the equations in this book were sort of just floating there, requiring the reader to explore further at another time.

Although I have issues with the book, as it is not any sort of text for the field, I did enjoy reading it, with a number of “Ah, I didn’t know that!” moments. Schutt’s credentials in data science are considerable, having worked at Google for a few years around the same time that “data science” was growing up in Silicon Valley. As a result, the book has many memorable anecdotes about the early days of the data science industry, and observations about what makes big data tick. I enjoyed the story about the Google software engineer who accidentally deleted 10 petabytes of data, and I think my favorite quote from the book is from the students’ chapter 15:

Kaggle competitions could be described as the dick-measuring contests of data science.

With contributors’ chapters on statistical inference, machine learning algorithms, logistic regression, financial modeling, recommendation engines, data visualization, Hadoop, MapReduce, and more, I’d say the book is worth a read, though not necessarily as a source for learning data science but rather as a high-level guide and short historical account of this young industry. You get to learn about the people, companies, and technologies that have collectively built the data science arena, and you’ll be better for it, especially if you are working to become a data scientist yourself.
57 of 69 people found the following review helpful
4.0 out of 5 stars: A spoonful of sugar... (October 29, 2013)
By Dimitri Shvorob - Published on Amazon.com
Format: Kindle Edition
... helps the medicine go down, as Mary Poppins used to say. An IT-focused publisher, O'Reilly has twice before used the "book as collection of chapters by different contributors" formula in its foray into the attractive "data" niche, with such titles as "Beautiful data" and "Bad data". "Doing data science" - by the way, I prefer Hastie and Tibshirani's "statistical learning" to the fuzzy and grandiose "data science" - follows the same approach, but, with its subject matter being closer to the academe, the company enlisted two young PhDs to steer the collaborative effort. Rachel Schutt took the lead as author and editor, and, assisted by Cathy O'Neil, produced an engaging, informal - you don't often see "science" in the title and "huge-ass" in the text - yet sufficiently technical to be hands-on, sequence-of-vignettes-styled book. Imagine a mash-up of a magazine article and a textbook. Neither part may be best-in-class, but their combination makes for a "unique selling proposition".

Well, maybe not a textbook. Most textbooks are carefully written and carefully checked. In contrast, when I see "Doing data science" introduce the ROC curve in three places, one of which translates the "O" as "operator", I can guess that this is a copy-paste of papers by three contributors. When Dr. O'Neil casually redefines an English word ("causal") to avoid rewriting a couple of sentences, or pronounces, on page 159, that "priors reduce degrees of freedom" - this is painfully meaningless, and neither term is defined, only name-checked - I suspect that she knows better, but just did not feel like spending more time on her half-chapter. Neither author speaks of their own projects - if this is the "frontline", then it's other soldiers' "trenches" that we are visiting. The occasional code listings are borrowed as well, thrown in without editing or comments. In this last regard, "Doing data science" lags far behind the book that seems to have informed its choice of topics, Peter Harrington's "Machine learning in action". (That's one suggestion - and if you want a good, accessible textbook, "Introduction to statistical learning" by James et al. is another).

None of it is going to matter to the book's target audience. "Doing data science" is aimed at beginners - and is bound to be interesting and useful to thousands of keen undergrads and adult learners.
8 of 9 people found the following review helpful
2.0 out of 5 stars: Very poor rendition of maths symbols in the Kindle edition (February 18, 2014)
By Peter Alspach - Published on Amazon.com
Format: Kindle Edition | Verified Purchase
The Kindle version has very poor rendition of the maths formulae and symbols: some are huge, some are so small they can't be read, and some are just plain wrong. If this is typical of Kindle versions of such texts, then don't buy Kindle versions. I would not have bought this had the sample contained any formulae.

The authors use a chatty style which some people like, but I found somewhat condescending.
7 of 8 people found the following review helpful
3.0 out of 5 stars: Not so much about Doing Data Science as about what it covers (March 25, 2014)
By Marc Zucker - Published on Amazon.com
Format: Kindle Edition
"Doing Data Science: Straight Talk from the Frontline" by Cathy O’Neil and Rachel Schutt; O'Reilly Media

With so many books being published on Data, from Big Data to Machine Learning to Data Analysis, we must ask what yet another book is going to offer us. And the answer is: not that much. O’Neil and Schutt have seemingly written this to give us a view of the ins and outs of how Data Science is practiced in the real world. But we get little more than a general overview of Data Analysis.

Many topics a reader might find interesting, whether for background knowledge or otherwise, are given little emphasis and, even then, little depth. As an example, a discussion on the Exponential Distribution merely states that “because we are familiar with the fact that ‘waiting time’ is a common enough real-world phenomenon that a distribution called the exponential distribution has been invented to describe it.” Any mathematician realizes that inventing probability distributions is a little more involved than that implies.

The main part of the book starts with a look at algorithms. The authors use R as their primary language. There is little in terms of explanation of the underlying processes and examples are pretty direct. Naïve Bayes, for example, is explained in a page and a half, so we can clearly say that this is an overview of many of the ideas going into Data Science. In fact it is a very wide overview. Financial Modelling, Spam Filtering, Epidemiology, the list goes on. This is definitely a plus; we see the wide applications that Data Science has.

The book ends with a discussion of Competitions, but we come back to the question of what we have gained beyond what a simple perusal of the web would give us. Maybe I was hoping for more out of the book than I got; expectations can be a killer. But in the end we must make a choice. If we would like to know about Data Science, how it’s done, then we would probably want to look elsewhere. If we would like a book we can use to show someone what areas exist within this field and what topics it touches on (perhaps for an advisor), then this book should do fine.

(FTC disclosure (16 CFR Part 255): The reviewer has accepted a reviewer's copy of this book which is his to keep. He intends to provide an honest, independent, and fair evaluation of the book in all circumstances.)

[...]