Taming Text: How to Find, Organize, and Manipulate It [English] [Paperback]

Grant S. Ingersoll, Thomas S. Morton, Andrew L. Farris

January 31, 2013

DESCRIPTION
It is no secret that the world is drowning in text and data. This causes real problems for everyday users who need to make sense of all the information available, and for software engineers who want to make their text-based applications more useful and user-friendly. Whether building a search engine for a corporate website, automatically organizing email, or extracting important nuggets of information from the news, dealing with unstructured text can be daunting. Taming Text is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. It explores how to automatically organize text, using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. This book gives examples illustrating each of these topics, as well as the foundations upon which they are built.

KEY POINTS
- One-stop shop for learning how to process text
- Clear, concise, and practical advice
- Builds on high-quality open source libraries

About the Author and Other Contributors

Grant Ingersoll is an independent consultant developing search and natural language processing tools. He has worked on a number of text processing applications involving information retrieval, question answering, clustering, summarization, and categorization. Grant is a committer, as well as a speaker and trainer, on the Apache Lucene Java project and a co-founder of the Apache Mahout machine-learning project.

Thomas Morton writes software and performs research in the area of text processing and machine learning. He has been the primary developer and maintainer of the OpenNLP text processing project and the Maximum Entropy machine learning project for the last 5 years. Currently, he works as a software architect for Comcast Interactive Media in Philadelphia.

Drew Farris is a professional software developer and technology consultant whose interests focus on large-scale analytics, distributed computing, and machine learning. He has contributed to a number of open source projects, including Apache Mahout, Lucene, and Solr, and holds a master's degree in Information Resource Management from Syracuse University's iSchool and a B.F.A. in Computer Graphics.

11 of 11 customers found the following review helpful
5.0 out of 5 stars Excellent Introduction to Text Processing... March 5, 2013
By JKC
Text processing is a complex topic, and this book makes it more palatable. What I found uniquely valuable about this book is that it brings together a slew of innovative technologies like Solr, Mahout, and Tika. The book starts small, builds up various related facets, and eventually culminates in an application that is both complex and applicable to real-world situations. I feel much more at ease dealing with the complexities of text processing after reading this book and doing related research afterward. I would like to thank the authors for providing such a great resource.
I used the companion code to follow along using a Linux Mint VM in Windows 8.
8 of 9 customers found the following review helpful
5.0 out of 5 stars Great Text Mining Book for (Lucene/Solr) Search Engineers February 8, 2013
By Sujit Pal
Solr and Lucene have made text search a commodity today. Web applications can plug in fairly feature-rich search functionality using Solr in a matter of days, and most applications today consider search functionality a routine must-have. As long as your needs are met by Solr's extensive feature set, you need look no further. However, search is constantly evolving, and the focus today is on making search results even more useful to the user. This book addresses this trend, exploring various strategies and software tools that can be used in conjunction with Solr and Lucene to mine the text in your indexes for that purpose. The tools covered are OpenNLP, Carrot2, and Apache Mahout. The book builds up toward a simple question-answering system that relies on search. OpenNLP is used for named entity extraction (identifying people, places, things, etc.) from the corpus and for categorization. Carrot2's Lingo algorithm is used for online clustering of search results, and Apache Mahout is used to demonstrate clustering and classification solutions.

You will find the book most useful if you work with Solr, Lucene, and Java and are looking to get beyond basic search, since all the tools described belong to the Java search ecosystem. The authors cover Lucene analysis and tokenization, along with the basic NLP and machine learning theory that is needed, so knowledge of these is not a prerequisite (although it can help).
4 of 4 customers found the following review helpful
5.0 out of 5 stars Pragmatic and Concise Entry into NLP May 3, 2013
By M. Faulhaber
Format: Paperback | Verified Purchase
This book is down-to-earth, with a perfect balance between depth of subject and practical applications. I have intermediate-level NLP experience, but this book still delivers at that level, with an easy-to-read style despite the very technical basis of the topics.
3 of 3 customers found the following review helpful
5.0 out of 5 stars Great Intro for Getting Serious about NLP October 8, 2013
By Peter Harrington
I love this book. I also love Python. My first intro to Natural Language Processing (NLP) was with NLTK for Python. After getting my hands dirty and starting to get some real work done, I realized that I had to move to Java. This book is a great intro to that world. The chapters are organized well into tasks that you will probably need to do at some point when working with text. Chapter 4 alone is worth the price of this book.

The examples are easy to follow and the theory is clearly explained.

I was already familiar with OpenNLP when I read this book so I cannot comment on how it feels as a beginner.
5.0 out of 5 stars Excellent job April 15, 2014
By Emily Nedell
Format: Paperback | Verified Purchase
The author does a fantastic job writing about what might otherwise be a dry subject, providing just enough technical information to be useful and allow you to sink your teeth into it.
