I am writing this review in response to some confusion and unfairness I see in other reviews. Cover and Thomas have written a unique and ambitious introduction to a fascinating and complex subject; their book must be judged fairly and not compared to other books that have entirely different goals.
Claude Shannon provided a working definition of "information" in his seminal 1948 paper, A Mathematical Theory of Communication. Shannon's interest in that and subsequent papers was the attainment of reliable communication in noisy channels. The definition of information that Shannon gave was perfectly fitted to this task; indeed, it is easily shown that in the context studied by Shannon, the only meaningful measure of information content that will apply to random variables with known distribution must be (up to a multiplicative constant) of the now-familiar form h(p) = log(1/p).
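For readers who would like to see the measure at work, here is a minimal sketch in Python. It assumes base-2 logarithms (so the unit is the bit, which fixes the multiplicative constant), and the function names self_information and entropy are illustrative choices, not taken from Shannon's paper or from any of the books discussed here. The property it checks, that the information of two independent events adds, is the key requirement behind the log(1/p) form.

```python
import math

def self_information(p):
    """Return h(p) = log2(1/p), in bits, for an outcome of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.log2(1.0 / p)

def entropy(dist):
    """Expected self-information of a discrete distribution (list of probabilities)."""
    # Zero-probability outcomes contribute nothing to the expectation.
    return sum(p * self_information(p) for p in dist if p > 0.0)

# Independent events with probabilities p and q occur together with
# probability p * q, and their information content adds.
p, q = 0.5, 0.25
joint = self_information(p * q)
separate = self_information(p) + self_information(q)
assert math.isclose(joint, separate)

# A fair coin carries the most uncertainty of any two-outcome source;
# a biased coin carries less.
fair = entropy([0.5, 0.5])
biased = entropy([0.9, 0.1])
assert biased < fair
```

Choosing a different logarithm base changes only the unit (natural logarithms give nats, for example), which is the "multiplicative constant" mentioned above.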
However, Shannon freely admitted that his definition of information was limited in scope and was never envisioned as being universal. Shannon deliberately avoided the "murkier" aspects of human communication in framing his definitions; problematic themes such as knowledge, semantics, motivations and intentions of the sender and/or receiver, etc., were avoided altogether.
For several decades, Information Theory continued to exist as a subset of the theory of reliable communication. Some classical and highly regarded texts on the subject are Gallager, Ash, Viterbi and Omura, and McEliece. For those whose interest in Information Theory is motivated largely by questions from the field of digital communications, these texts remain unrivalled standards; Gallager, in particular, is so highly regarded by those who learned from it that it is still described as superior to many of its more recent, up-to-date successors.
In recent decades, Information Theory has been applied to problems from across a wide array of academic disciplines. Physicists have been forced to clarify the extent to which information is conserved in order to completely understand black hole dynamics; biologists have found extensive use of Information Theoretic concepts in understanding the human genome; computer scientists have applied Information Theory to complex issues in computational vs. descriptive complexity (the Mandelbrot set, which has been called the most complex set in all of mathematics, is actually extremely simple from the point of view of Kolmogorov complexity); and John von Neumann's brilliant creation, game theory, which has been called "a universal language for the unification of the behavioral sciences," is intimately coupled to Information Theory, perhaps in ways that have not yet been fully appreciated or explored.
Cover and Thomas' book "Elements of Information Theory" is written for the reader who is interested in these eclectic and exciting applications of Information Theory. This book does NOT treat Information Theory as a subset of reliable communication theory; therefore, the book is NOT written as a competitor for Gallager's classic text. Critics who ask for a more thorough treatment of rate distortion theory or convolutional codes are criticizing the authors for failing to include topics that are not even central to their goals for the text!
A very selective list of some of the more interesting topics that Cover and Thomas study includes: (1) the Asymptotic Equipartition Property and its consequences for data compression; (2) Information Theory and gambling; (3) Kolmogorov complexity and Chaitin's Omega; (4) Information Theory and statistics; and (5) Information Theory and the stock market. Item (4) on this list is only briefly introduced in Cover and Thomas' book, and appropriately so; however, readers who wish to pursue the fascinating subject of Fisher Information further should consider B. Roy Frieden's book Physics from Fisher Information: A Unification. Frieden identifies a principle of "extreme physical information" as a unifying theme across all of physics, deriving such classic equations as the Klein-Gordon equation, Maxwell's equations, and Einstein's field equations for general relativity from this information-theoretic principle.
This last point is quite typical of Cover and Thomas' book. I participated in a faculty seminar on Information Theory at my university a few years ago, in which we studied Cover and Thomas as our primary source. We were a diverse group, drawn from five different academic disciplines, and we all found that Cover and Thomas repeatedly introduced us to exciting and unexpected applications of Information Theory, always sending us to the journals for further, more in-depth study.
Cover and Thomas' book has become an established favorite in university courses on information theory. In truth, the book has few competitors. Interested readers looking for additional references might also consider David MacKay's book Information Theory, Inference, and Learning Algorithms, which has as a primary goal the use of information theory in the study of neural networks and learning algorithms. George Klir's book Uncertainty and Information considers many alternative measures of information/uncertainty, moving far beyond the classical log(1/p) measure of Shannon and the context in which it arose. Jan Kahre's iconoclastic book The Mathematical Theory of Information is an intriguing alternative in which the so-called Law of Diminishing Information is elevated to primary axiomatic status in deriving measures of information content. I alluded to some of the "murkier" issues of human communication earlier; readers who wish to study some of those issues will find Yehoshua Bar-Hillel's book Language and Information a useful source.
In conclusion, I highly recommend Cover and Thomas' book on Information Theory. It is currently unrivalled as a rigorous introduction to applications of Information Theory across the curriculum. As a person who used to work in the general area of signals analysis, I resist all comparisons of Cover and Thomas' book with the classic text of Gallager; the books have vastly different goals and very little overlap.