- Paperback: 216 pages
- Publisher: Manning Pubn; Edition: Pap/Psc (June 22, 2017)
- Language: English
- ISBN-10: 1617292281
- ISBN-13: 978-1617292286
- Size and/or weight: 18.5 x 1 x 23.4 cm
- Amazon Bestseller Rank: No. 156,721 in Foreign-Language Books
Streaming Data: Understanding the Real-Time Pipeline (English), Paperback, June 22, 2017
About the author and other contributors
Andrew Psaltis is a software engineer and architect focused full time on building massively scalable real-time analytics systems using Spark, Kafka, Storm, Hadoop, and WebSockets.
Most helpful customer reviews on Amazon.com
Another great feature of this book, published by Manning, is that it comes with the e-book digital version at no extra cost. I wish all physical books followed this model. There is a small page inside the front cover that you cut open neatly with scissors. Inside is a matrix of codes. You must create an account on the Manning site and then enter a few of these codes, and the download is made available. Several formats are available for download, including PDF and MOBI (for Kindle).
I downloaded the MOBI file and then used the "send to Kindle" document feature to deliver it to my Kindle via email. It worked great: the front cover is used as the thumbnail and all hyperlinks, such as those in the table of contents, are active. The one thing I couldn't figure out was how to add this e-book to a collection on the Kindle or how to have it show up as a book instead of a document. I googled for answers, tried moving the file into a /book directory, and ultimately could not figure it out. But the e-book version is much appreciated nonetheless.
If you have an interest, and particularly if you work in this industry, you will benefit from absorbing this information. From introduction, data ingestion, decoupling the pipeline, analysis, algorithms, storage, availability, and device limitations - this book has it all in a very concise but complete format. There are many visual diagrams and charts that help explain the concepts throughout. Highly recommended.
This is a very good starter book on the topic. It is only 216 pages long. It offers different perspectives on what we encounter in real life when designing an application that ingests and analyzes data streams. It presents the architecture of a streaming pipeline based on an example. It describes what can happen over the whole process, both when everything works and when we are in trouble because one of our components went down, and how to prepare our architecture for failure. The author briefly compares different solutions, e.g. Spark, Storm, Kafka, and Flink, and shows their pros and cons and what is missing before a given tool can be used. The same goes for different databases and in-memory caches. This helps to distinguish between the technologies. The book also explains algorithms that can be used when data needs to be analyzed: Bloom filter, HyperLogLog, and Count-Min Sketch.
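For readers unfamiliar with the first of those algorithms, here is a minimal Python sketch of a Bloom filter, which answers "have I seen this item?" in fixed memory. It is not taken from the book: the class name, the bit-array size, the number of hashes, and the use of salted SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hashing scheme are assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not seen"; True means "possibly seen".
        return all(self.bits[pos] for pos in self._positions(item))
```

The trade-off is that the filter never gives a false negative but can give a false positive, and the false-positive rate rises as more items are added to a filter of fixed size.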
All in all, the book should be valuable for people who are interested in architecture, or who want to improve their understanding or perhaps an existing approach.
A big plus is the many references to external sources, either books or articles, with links. I found the book helpful.
The disadvantage is that I would have liked to find much more about reactive systems, and the content could sometimes be more clearly written.
However, I can recommend the book, and my overall impression is very positive.
The following tiers are identified and described:
Data is entered into the system via a Collection tier. Various models are presented that can be used and considerations are given on scaling and fault-tolerance issues.
The importance of a Message Queuing tier to decouple data collection from data analysis is explained together with delivery semantics offered by message brokers and the trade-offs involved in implementing stronger delivery guarantees.
The Analysis tier is the core of the system and is where data processing takes place. The concepts of in-flight data and the continuous query model are explained.
A general architecture of a distributed stream processor is presented, and message delivery semantics are discussed once again (this time in the context of a stream processor). Techniques like replicating idempotent computations and checkpointing are suggested to obtain fault tolerance.
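To make the fault-tolerance idea concrete, here is a small Python sketch of idempotent processing combined with checkpointing. It is my own illustration, not code from the book: the class `IdempotentCounter`, the use of message IDs for deduplication, and the in-memory snapshot are assumptions. A real stream processor would persist checkpoints durably and bound the set of remembered IDs.

```python
class IdempotentCounter:
    """Counts messages per key and ignores redelivered message IDs."""

    def __init__(self):
        self.counts = {}
        self.seen_ids = set()

    def process(self, message_id, key):
        # A replayed message is detected by its ID and leaves state unchanged.
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.counts[key] = self.counts.get(key, 0) + 1
        return True

    def checkpoint(self):
        # Copy the state; a real system would write this to durable storage.
        return {"counts": dict(self.counts), "seen_ids": set(self.seen_ids)}

    @classmethod
    def restore(cls, snapshot):
        # Rebuild an operator from the last checkpoint after a failure.
        counter = cls()
        counter.counts = dict(snapshot["counts"])
        counter.seen_ids = set(snapshot["seen_ids"])
        return counter
```

After a crash, the operator is restored from the last checkpoint and the upstream queue replays every message that has not been acknowledged. Because processing is idempotent, messages that were already reflected in the checkpoint are skipped, so at-least-once delivery still yields each message being counted once.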
Also discussed is how the constraints on analysis algorithms imposed by the streaming nature of data can be overcome with windowing techniques to group together a series of stream data elements for processing/extracting information and summarization techniques to approximate information from the flow of stream data (counting, membership, frequency, sampling).
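As an illustration of windowing, here is a minimal Python sketch of a tumbling (fixed, non-overlapping) window that counts events per key. The function name and the `(timestamp, key)` event shape are assumptions of mine, not the book's API. The sketch also keeps every window in memory, whereas a streaming system would emit and discard each window once it closes.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs (assumed shape).
    Returns a dict mapping window start time to a Counter of keys.
    """
    windows = {}
    for timestamp, key in events:
        # Align each event to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


events = [(0, "a"), (3, "b"), (9, "a"), (10, "a"), (19, "b"), (20, "a")]
result = tumbling_window_counts(events, window_seconds=10)
# result[0] counts the events at t=0, 3, 9; result[10] those at t=10, 19.
```

A sliding window differs only in that windows overlap, so a single event can contribute to more than one of them.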
A Data Storage tier is needed to store results computed by the Analysis tier and make them available to the Data Access tier.
Available solutions are presented, focusing on alternatives for in-memory storage.
Finally, the Data Access tier discussion explores communication patterns and protocols that can be used by clients to connect to the stream system and obtain produced data.
The final chapter aims to put theory into practice by presenting a simple full-fledged streaming system that uses open source technologies like Netty, RocksDB, Apache Kafka and Apache Storm to implement the various tiers of the model.
The source code can be downloaded from GitHub.
An experienced Java developer should be able to follow the code, but fully understanding it requires knowledge of the above technologies.
The content is generic and taxonomic, independent of any specific existing streaming technology but also lacks concrete details.
Special attention is given for each tier to reliability and recovery and how they can be achieved.
The book doesn't teach specific technologies to build a streaming system but rather tries to describe the components that make up such a system in general and abstract terms, pinpointing important aspects to consider when choosing one of the available alternatives.
I suggest reading this book together with or maybe after having read more specific books on Apache Storm or Kafka, in order to fully appreciate the generality and abstractions provided by the book.