- Paperback: 328 pages
- Publisher: No Starch Press; 1st edition (March 1, 2007)
- Language: English
- ISBN-10: 1593271204
- ISBN-13: 978-1593271206
- Dimensions and/or weight: 17.8 x 2.6 x 23.5 cm
- Average customer rating: 2 customer reviews
- Amazon Bestsellers Rank: No. 428,232 in Foreign-Language Books
Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL (English) Paperback – March 1, 2007
A newer edition of this book is available.
The Internet is bigger and better than what a mere browser allows. Webbots, Spiders, and Screen Scrapers is for programmers and businesspeople who want to take full advantage of the vast resources available on the Web. There's no reason to let browsers limit your online experience, especially when you can easily automate online tasks to suit your individual needs. Learn how to write webbots and spiders that do all this and more:

- Programmatically download entire websites
- Effectively parse data from web pages
- Manage cookies
- Decode encrypted files
- Automate form submissions
- Send and receive email
- Send SMS alerts to your cell phone
- Unlock password-protected websites
- Automatically bid in online auctions
- Exchange data with FTP and NNTP servers

Sample projects using standard code libraries reinforce these new skills. You'll learn how to create your own webbots and spiders that track online prices, aggregate different data sources into a single web page, and archive the online data you just can't live without. You'll learn inside information from an experienced webbot developer on how and when to write stealthy webbots that mimic human behavior, tips for developing fault-tolerant designs, and various methods for launching and scheduling webbots. You'll also get advice on how to write webbots and spiders that respect website owners' property rights, plus techniques for shielding websites from unwanted robots. As a bonus, visit the author's website to test your webbots on sample target pages and to download the scripts and code libraries used in the book. Some tasks are just too tedious, or too important, to leave to humans. Once you've automated your online life, you'll never let a browser limit the way you use the Internet again.
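Every task the blurb lists starts from the same primitive: fetching a page over HTTP. A minimal sketch in the book's own PHP/cURL idiom (the user-agent string and option choices here are illustrative, not the book's exact library code):

```php
<?php
// Minimal sketch: download a single page with PHP's cURL extension.

function fetch_page(string $url): string|false {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_TIMEOUT        => 30,     // give up after 30 seconds
        CURLOPT_USERAGENT      => 'MyWebbot/1.0',  // illustrative identifier
    ]);
    $body = curl_exec($ch);               // false on failure
    curl_close($ch);
    return $body;
}
```

The book's own download routines wrap roughly this pattern, adding cookie handling and header management on top.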
About the Author and Other Contributors
Michael Schrenk has developed webbots for over 15 years, working just about everywhere from Silicon Valley to Moscow, for clients like the BBC, foreign governments, and many Fortune 500 companies. He's a frequent Defcon speaker and lives in Las Vegas, Nevada.
The book is written in language that I find easy to understand, and it is quite illustrative.
The most helpful customer reviews on Amazon.com
It would be great to see an update to the book that specifically speaks to forms on .asp pages and how the parameters are passed from the client to the server. I have not been able to get a scraper to work with those types of pages.
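On the reviewer's question: the server-side technology does not change what the client sends. Whether the page is .php or .asp, a form submission is just URL-encoded name/value pairs in the POST body. A sketch under that assumption (the URL and field names are illustrative):

```php
<?php
// Sketch: submit a form with cURL. The client sends the same thing to a
// .php or .asp page: URL-encoded name/value pairs in the POST body.

// Encode fields the way a browser's application/x-www-form-urlencoded
// submission would.
function encode_form(array $fields): string {
    return http_build_query($fields);
}

function submit_form(string $url, array $fields): string|false {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => encode_form($fields),
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

One likely cause of the reviewer's trouble: ASP.NET forms carry hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION`, which must first be scraped out of the form's HTML and echoed back in the POST, or the server rejects the submission.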
I was very pleased with how this book covered concepts. The book uses PHP and the cURL library as a teaching tool instead of trying to give a lesson in how to use PHP as a crawler language. The way the code is explained makes it very easy to translate into whatever language you are most comfortable coding in. The book uses fundamental functional programming concepts which make it easy to pick up the general idea without actually knowing PHP.
My boss bought this book to help my group with a project we were working on, and even my co-workers who had no background in PHP were able to use this book to write a web bot in C# (using the cURL library) very easily. The concepts from this book transferred over to object-oriented code without trouble.
I did download some of the material to check it out and tried a few things. If you do not know PHP and want to get started webscraping as your primary goal, this book would be for you.
However, if you're like me, the picture is different. I've been programming a rather complex database-driven personal site for over 8 months while learning PHP. I had previously used Perl very successfully for web scraping, became interested in the web-scraping possibilities of PHP, and decided to check this book out. It did not add much to what I'd already learned in the past 8+ months, although it does have a decent jumpstart guide to cURL.
Technically, the book and its examples are very basic, beginner-level material. All the code is procedural, with no reference to object-oriented programming at all. This is fine for a simple project, but building anything larger than a targeted webbot or two is beyond the scope of this book.
I was very dismayed at Mr. Schrenk's opinion of regular expressions:
"The use of regular expressions is a parsing language in itself, and most modern programming languages support aspects of regular expressions. In the right hands, regular expressions are also useful for parsing and substituting text; however, they are famous for thier sharp learning curve and cryptic syntax. I avoid regular expressions whenever possible."
This disregard for regular expressions effectively wipes out a powerful toolset for budding developers. Regular expressions are no harder to learn than PHP. The reasoning behind his disdain is also flawed:
"The regular expression engine used by PHP is not as efficient as engines used in other languages, and is certainly less efficient than PHP's built-in functions for parsing HTML."
Through the preg_* functions, PHP uses the same regular expression engine that Perl employs very effectively. Many studies have shown that preg_*-style expressions outperform basic text matching in PHP. On this point the author is simply wrong.
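For illustration, here is the kind of preg_* one-liner the reviewer has in mind, pulling link targets out of HTML; the snippet and pattern are made up for this example, and real-world HTML (single-quoted or unquoted attributes) would need a looser pattern:

```php
<?php
// Sketch: extract every href value from an HTML snippet with preg_match_all().

function extract_hrefs(string $html): array {
    // Capture whatever sits between href=" and the next double quote;
    // the /i flag also matches HREF= written in uppercase.
    preg_match_all('/href="([^"]*)"/i', $html, $matches);
    return $matches[1];
}

$html = '<a href="/page1">One</a> <p>filler</p> <a href="/page2">Two</a>';
print_r(extract_hrefs($html));  // the two captured hrefs
```

The equivalent logic with the book's preferred built-in string functions (repeated strpos()/substr() calls) runs to a dozen lines and is easier to get wrong.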
The book does a great job of explaining how to make single-use scripts for scraping, but never shows how to create a larger infrastructure. There is no coverage of building multi-process engines with pcntl_fork() or proc_open(), which are critical for scaling web-scraping applications. A single script scraping a few hundred websites on one process would take ages compared to a multi-process engine.
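A minimal sketch of the multi-process fan-out the reviewer describes, using proc_open(). Everything here is illustrative: the worker command merely uppercases its argument, standing in for a real fetch-and-parse job, and the function name and worker limit are made up:

```php
<?php
// Sketch: run jobs in parallel worker processes with proc_open(), capped at
// $maxWorkers concurrent children. Each worker's stdout is collected as its
// result, keyed by the input it was given.

function run_parallel(array $jobs, int $maxWorkers = 4): array {
    $results = [];
    $running = [];
    while ($jobs || $running) {
        // Launch workers up to the concurrency limit.
        while ($jobs && count($running) < $maxWorkers) {
            $job = array_shift($jobs);
            // Placeholder worker: a real one would fetch and parse a URL.
            $cmd = escapeshellarg(PHP_BINARY) . ' -r '
                 . escapeshellarg('echo strtoupper($argv[1]);') . ' '
                 . escapeshellarg($job);
            $pipes = [];
            $proc = proc_open($cmd, [1 => ['pipe', 'w']], $pipes);
            $running[] = ['proc' => $proc, 'out' => $pipes[1], 'job' => $job];
        }
        // Reap any workers that have finished.
        foreach ($running as $i => $w) {
            if (!proc_get_status($w['proc'])['running']) {
                $results[$w['job']] = stream_get_contents($w['out']);
                fclose($w['out']);
                proc_close($w['proc']);
                unset($running[$i]);
            }
        }
        usleep(10000);  // avoid a busy-wait loop
    }
    return $results;
}
```

With pcntl_fork() the shape is similar but the children share the parent's code instead of launching a new command; either way, the scheduling-and-reaping loop above is the part the book never covers.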
If you are looking to break into web scraping and are not sure where to start, this is likely the best (and possibly only) book on the market. If you are intermediate or advanced, you will quickly question the author's logic and see that scaling will become the number one issue you have to overcome.