Table of Contents

* Chapter 1 Meet Hadoop
  * Data!
  * Data Storage and Analysis
  * Comparison with Other Systems
  * A Brief History of Hadoop
  * The Apache Hadoop Project
* Chapter 2 MapReduce
  * A Weather Dataset
  * Analyzing the Data with Unix Tools
  * Analyzing the Data with Hadoop
  * Scaling Out
  * Hadoop Streaming
  * Hadoop Pipes
* Chapter 3 The Hadoop Distributed Filesystem
  * The Design of HDFS
  * HDFS Concepts
  * The Command-Line Interface
  * Hadoop Filesystems
  * The Java Interface
  * Data Flow
  * Parallel Copying with distcp
  * Hadoop Archives
* Chapter 4 Hadoop I/O
  * Data Integrity
  * Compression
  * Serialization
  * File-Based Data Structures
* Chapter 5 Developing a MapReduce Application
  * The Configuration API
  * Configuring the Development Environment
  * Writing a Unit Test
  * Running Locally on Test Data
  * Running on a Cluster
  * Tuning a Job
  * MapReduce Workflows
* Chapter 6 How MapReduce Works
  * Anatomy of a MapReduce Job Run
  * Failures
  * Job Scheduling
  * Shuffle and Sort
  * Task Execution
* Chapter 7 MapReduce Types and Formats
  * MapReduce Types
  * Input Formats
  * Output Formats
* Chapter 8 MapReduce Features
  * Counters
  * Sorting
  * Joins
  * Side Data Distribution
  * MapReduce Library Classes
* Chapter 9 Setting Up a Hadoop Cluster
  * Cluster Specification
  * Cluster Setup and Installation
  * SSH Configuration
  * Hadoop Configuration
  * Post Install
  * Benchmarking a Hadoop Cluster
  * Hadoop in the Cloud
* Chapter 10 Administering Hadoop
  * HDFS
  * Monitoring
  * Maintenance
* Chapter 11 Pig
  * Installing and Running Pig
  * An Example
  * Comparison with Databases
  * Pig Latin
  * User-Defined Functions
  * Data Processing Operators
  * Pig in Practice
* Chapter 12 HBase
  * HBasics
  * Concepts
  * Installation
  * Clients
  * Example
  * HBase Versus RDBMS
  * Praxis
* Chapter 13 ZooKeeper
  * Installing and Running ZooKeeper
  * An Example
  * The ZooKeeper Service
  * Building Applications with ZooKeeper
  * ZooKeeper in Production
* Chapter 14 Case Studies
  * Hadoop Usage at Last.fm
  * Hadoop and Hive at Facebook
  * Nutch Search Engine
  * Log Processing at Rackspace
  * Cascading
  * TeraByte Sort on Apache Hadoop
* Appendix Installing Apache Hadoop
  * Prerequisites
  * Installation
  * Configuration
* Appendix Cloudera’s Distribution for Hadoop
  * Prerequisites
  * Standalone Mode
  * Pseudo-Distributed Mode
  * Fully Distributed Mode
  * Hadoop-Related Packages
* Appendix Preparing the NCDC Weather Data
* Colophon