I am a big fan of the authors' 1999 book on Statistical Natural Language Processing, and I was thrilled when I found this new book online (just search for "Information Retrieval" on Google).
In these two books, the authors describe the theory behind a vast toolbox that can be used to build new tools and products for the Internet. Now I can go back to them whenever the need arises.
For starters, I appreciate the detailed theoretical explanations of topics that I could not find in other texts, and the references to related work are especially helpful. One of the other books I read was Information Retrieval by Grossman, which is older and written in a more condensed style than this one. Grossman's discussion of clustering was more high-level and referenced a few more papers that I found useful. That increased my interest in reading through the corresponding chapters here, which offer greater detail.
Before I felt I could place each topic in its appropriate context, I had to spend six months reading both books, experimenting with code, finding software packages, searching the research literature, reading papers and other books, and then cycling back to these books. Here are some suggestions for things I'd like to see:
1. A set of recommended programming tools: some books on Perl, such as the chapter "Natural Language Tools" on pages 149-171 of "Advanced Perl Programming" by Simon Cozens (O'Reilly), give you a very "quick & dirty" introduction to perhaps 20-30% of the concepts in these two books, along with ways to implement and play around with them. Although Perl has many natural language processing tools, the Cozens book cuts to the chase, explains which tools are best, and shows you how to use them. I think knowing such shortcuts helps in learning how to apply the techniques and improve on them. The more complex and sophisticated a topic is, the more it depends on being easy to play with if it is going to make it out into the real world.
2. More data and examples on what does and doesn't work with end-users: numbers, graphs, and charts are all good stuff. I always appreciate it when the authors reference quantitative comparisons, real-world products, and the history of the Internet. One reason I had to consult the research literature was to broaden my understanding of quantitative comparisons of different techniques with end-users. These comparisons were typically made in studies of complete systems that users could try out.