- Paperback: 463 pages
- Publisher: O'Reilly and Associates; Edition: 1 (October 26, 2012)
- Language: English
- ISBN-10: 1449319793
- ISBN-13: 978-1449319793
- Size and/or weight: 17.8 x 2.3 x 23.3 cm
- Average customer rating: 4 customer reviews
- Amazon Bestseller Rank: No. 995 in Foreign-Language Books
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (English) Paperback – October 26, 2012
About the author and other contributors
Wes McKinney is the main author of pandas, the popular open source Python library for data analysis. Wes is an active speaker and participant in the Python and open source communities. He worked as a quantitative analyst at AQR Capital Management before founding an enterprise data analysis company, Lambda Foundry, in 2012. He graduated from MIT with an S.B. in Mathematics.
I can really only recommend this book, also as a reference work.
Most helpful customer reviews on Amazon.com (beta)
To steal from the book, Wes states, "This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is NOT (author's emphasis) an exposition on analytical methods using Python as the implementation language."
This is a book for any level of professional, researcher, or academic working with data. You could be a beginner who wants to get started, a professional coming from a discipline rooted in another language like Matlab, or even someone seasoned in data manipulation with Python who wants to get more work done in less time with greater ease.
While pandas is the main focus of the book, the sections dedicated to IPython (a shell for interactive execution) and NumPy (Matlab-like vectorized arrays) mean there is something for everyone. For example, you might already use IPython, but not to its fullest potential. Wes shows how to be more efficient using the interactive debugger.
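For readers coming from Matlab, here is a minimal sketch of what "vectorized" means in NumPy. The array names and values are made up for illustration and are not taken from the book.

```python
import numpy as np

# Hypothetical sample data, invented for this illustration.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Arithmetic applies elementwise to whole arrays, with no explicit loop.
totals = prices * quantities

# Reductions such as sum() also operate on the whole array at once.
grand_total = totals.sum()
```

The same expressions work unchanged on arrays of any length, which is much of the appeal when moving from loop-heavy code.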
Amazon limits ratings to 5 stars, but if I gave a star for every time I learned something new that made my analysis easier, this book would be off the charts!
First, two warnings:
1. **This book is not an introduction to Python.** While McKinney does not assume that you know *any* Python, he isn't exactly going to hold your hand on the language here. There is an appendix ("Python Language Essentials") that beginners will want to read before getting too far, but otherwise you're on your own. ("Lucky for you Python is executable pseudocode"?)
2. **This book is not about theories of data analysis.** What I mean by that is: if you're looking for a book that is going to tell you the *types* of analyses to do, this is not that book. McKinney assumes that you already know, through your "actual" training, what kinds of analyses you need to perform on your data, and how to go about the computations necessary for those analyses.
That being said: McKinney is the principal author of pandas, a Python package for doing data transformation and statistical analysis. The book is largely about pandas (and NumPy), offering overviews of the utilities in these packages and concrete examples of how to employ them to great effect. In examining these libraries, McKinney also delves into general methodologies for munging data and performing analytical operations on it (e.g., normalizing messy data and turning it into graphs and tables). He also covers some (semi) esoteric information about how Python works at very low levels and ways to optimize data structures so that you can get maximum performance from your programs. McKinney is clearly knowledgeable about these libraries, about Python, and about using those tools effectively in analytical software.
So where do I land on "Python for Data Analysis"? If you're looking for a book that discusses data analysis in a broad sense, or one that pays special attention to the theory, this isn't that book. If you're looking for a generalist's book on Python, this is also not that book. However, if you've already selected Python as your analytical tool (and it sounds like it's more or less the de facto analytical tool in many circles), then this just might be the perfect book for you.
DISCLOSURE: I received an electronic copy of this book from the publisher in exchange for writing a review.
If you are trying to decide whether to learn to use the pandas library, this book is for you. It starts with an example of how Python and the pandas library can make it easy to do some basic analyses of data, and then develops more specialized chapters: summary statistics, data storage, data transformation (merging and joining), plotting, aggregation, time series, special considerations for financial or economic data, and advanced special topics.
Once I decided to use the pandas library, the book suddenly became less useful. The author has a verbose pedagogical style, and the book never departs from its tutorial perspective. Functions are introduced with examples but no definitions, and it's hard to find the rare summaries of functions, function arguments, or discussion suggesting when to use one method instead of another.
If you want to do something very close to what's done in an example, it's easy to follow along. Once you want to do something not emphasized or covered by an example, there is no guidance, and no reference or dictionary section to give any hint about where you might search next. Google will probably direct you to stackoverflow.com or the official pandas documentation site.
For example, suppose you have loaded your data into a DataFrame, and you want to use another column as the index. The book has several pages on the useful reindex() method, but that method is for resampling the data. Instead, you want set_index() --- but the book only mentions set_index() in passing, without saying what it does, far from the section where the DataFrame index is covered.
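To illustrate the distinction, here is a minimal sketch that is not taken from the book. The column names and values are hypothetical, and it assumes a reasonably recent pandas: `set_index()` promotes an existing column to the index, while `reindex()` conforms the data to a new set of labels.

```python
import pandas as pd

# Hypothetical data, made up for illustration (not from the book).
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "price": [10.0, 20.0, 30.0],
})

# set_index: use an existing column as the row index.
by_ticker = df.set_index("ticker")

# reindex: conform to a new list of labels. Rows are reordered to match,
# and labels missing from the current index are filled with NaN.
reindexed = by_ticker.reindex(["CCC", "AAA", "ZZZ"])
```

Here `by_ticker.loc["BBB", "price"]` looks up a row by ticker, while `reindexed` gains an all-NaN row for the unknown label `"ZZZ"`.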
There have been some attempts to remedy this, with "quick reference cards" for pandas --- but they are in general also not comprehensive.
Finally, there is little guidance on the kinds of problems where you would be better served using NumPy or some other tool instead of pandas. (There are a few paragraphs on areas where you might not want to use Python.)
[Update: by mid 2013, the API reference at the official pandas documentation has the comprehensive listings that I was looking for --- see http://pandas.pydata.org/pandas-docs/stable/api.html . By version 0.12.0, all of the various function arguments seem to have been described with examples of acceptable settings. Also, the data analytical work (as opposed to cleaning and organization) has moved to the related statsmodels project, which requires pandas. So, to use that, it's important to be familiar with pandas.]
To the editor:
On many pages, there is some comment, phrasing, or trivial fact that I would have eliminated. Examples:
"In some cases, a table might not have a fixed delimiter, using whitespace or some other pattern to separate fields. In these cases, ..."
"In part for legacy reasons (much earlier versions of pandas), DataFrame's join method ..."
"In my experience, having to align data by hand (and worse, having to verify that data is aligned) is a far too rigid and tedious way to work. It is also rife with potential for bugs due to combining misaligned data."
This is a technical publication, not a narrative!
Many of the code examples break across physical and PDF pages, which creates small interruptions when reading. This may be hard to avoid when about half the text space is occupied by worked examples.
last line on page 129: a b c d a b c d e
first line on page 130: 0 0 1 2 3 0 0 1 2 3 4
So I am not a Python, NumPy, or pandas expert.
I took the $5 upgrade at O'Reilly so I have downloaded a pdf for backup viewing and also get future enhancements to the book.
The material appears good and the coverage thorough. I've been working through the Language Essentials as well, and it has clarified a couple of things I misunderstood after earlier Python books, so at this point I'll give it 5 stars. I'll re-review later if I come to a different conclusion.