Pentaho Data Integration Beginner's Guide, Second Edition

Pentaho Data Integration Beginner's Guide, Second Edition [Kindle Edition]

María Carina Roldán

Kindle price: EUR 17.30, including VAT and free wireless delivery via Amazon Whispernet

Other editions

Kindle Edition: EUR 17.30
Paperback: EUR 41.30


In Detail

Capturing, manipulating, cleansing, transferring, and loading data effectively are prime requirements in every IT organization. Achieving these tasks requires either people devoted to developing extensive software programs or an investment in ETL or data integration tools that can simplify this work.

Pentaho Data Integration is a full-featured open source ETL solution that allows you to meet these requirements. Pentaho Data Integration has an intuitive, graphical, drag-and-drop design environment and its ETL capabilities are powerful. However, getting started with Pentaho Data Integration can be difficult or confusing.

"Pentaho Data Integration Beginner's Guide, Second Edition" provides the guidance needed to overcome that difficulty, covering all the possible key features of Pentaho Data Integration.

"Pentaho Data Integration Beginner's Guide, Second Edition" starts with the installation of Pentaho Data Integration software and then moves on to cover all the key Pentaho Data Integration concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to do all kinds of data manipulation and work with plain files. Then, the book gives you a primer on databases and teaches you how to work with databases inside Pentaho Data Integration. Moreover, you will be introduced to data warehouse concepts and you will learn how to load data in a data warehouse. After that, you will learn to implement simple and complex processes. Finally, you will have the opportunity of applying and reinforcing all the learned concepts through the implementation of a simple datamart.

With "Pentaho Data Integration Beginner's Guide, Second Edition", you will learn everything you need to know in order to meet your data manipulation requirements.


This book focuses on teaching you by example. It walks you through every aspect of Pentaho Data Integration, giving systematic instructions in a friendly style and allowing you to learn in front of your computer while playing with the tool. The extensive use of drawings and screenshots makes the process of learning Pentaho Data Integration easy. Throughout the book, numerous tips and helpful hints are provided that you will not find anywhere else.

Who this book is for

This book is a must-have for software developers, database administrators, IT students, and everyone involved or interested in developing ETL solutions or, more generally, doing any kind of data manipulation. Those who have never used Pentaho Data Integration will benefit most from the book, but those who have will also find it useful.

This book is also a good starting point for database administrators, data warehouse designers, architects, or anyone who is responsible for data warehouse projects and needs to load data into them.

About the Author and Other Contributors

María Carina Roldán

María Carina Roldán was born in Esquel, Argentina, and earned her Bachelor's degree in Computer Science at the Universidad Nacional de La Plata (UNLP). She then moved to Buenos Aires, where she has lived since 1994.

She has worked as a BI consultant for almost fifteen years. She started working with Pentaho technology back in 2006. Over the last three and a half years, she has been devoted to working full time for Webdetails—a company acquired by Pentaho in 2013—as an ETL specialist.

Carina is the author of Pentaho 3.2 Data Integration Beginner's Book, Packt Publishing, April 2009, and the co-author of Pentaho Data Integration 4 Cookbook, Packt Publishing, June 2011.



Most helpful customer reviews (beta): 5.0 out of 5 stars, 3 reviews
2 of 3 customers found the following review helpful
5.0 out of 5 stars: Pentaho Data Integration Beginner's Guide - Second Edition Review, January 12, 2014
By David Fombella Pombal
Pentaho Data Integration Beginner's Guide - Second Edition Review:

First of all, I would like to congratulate María Carina, a great contributor to the Pentaho community, whom I met in person at the last Pentaho Community Meeting (#PCM13) in Sintra.

Book review by: David Fombella Pombal (twitter: @pentaho_fan)

Book Title: Pentaho Data Integration Beginner's Guide - Second Edition

Authors: María Carina Roldán

Paperback: 502 pages

I recommend this book because, if you are new to Pentaho Data Integration, you will gain a lot of knowledge about this cool tool, and if you are already an advanced PDI user, you can use it as a reference guide.

Target Audience
This book is an excellent starting point for database administrators, data warehouse developers, or anyone who is responsible for ETL and data warehouse projects and needs to load data into them.

Rating: 9 out of 10

Although this book targets PDI 4.4.0 CE, some new features of PDI 5.0.1 CE are listed in an appendix.

Chapter List

Chapter 1 - Getting Started with Pentaho Data Integration
In this chapter you learn what Pentaho Data Integration is and install the software required to start using the PDI graphical designer. As an additional task, a MySQL DBMS server is installed.

Chapter 2 - Getting Started with Transformations
This chapter introduces the basic terminology of PDI and gives an introduction to handling runtime errors. We also learn the simplest ways of transforming data.

Chapter 3 - Manipulating Real-World Data
Here we learn how to get data from different sorts of files (csv, txt, xml ...) using PDI. We also send data from Kettle to plain files.

Chapter 4 - Filtering, Searching, and Performing Other Useful Operations with Data
This chapter explains how to sort and filter data, group data by different criteria, and look up data outside the main stream of data. Some data cleansing tasks are also performed in this chapter.
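
As a rough illustration of what these four operations mean, here is a minimal sketch in plain Python. It is not PDI code and is not taken from the book; the sample rows, field names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; in PDI, rows like these flow from step to step.
rows = [
    {"product": "puzzle", "country_code": "AR", "amount": 30},
    {"product": "game", "country_code": "PT", "amount": 50},
    {"product": "puzzle", "country_code": "PT", "amount": 20},
]

# Hypothetical lookup table that lives outside the main stream of data.
countries = {"AR": "Argentina", "PT": "Portugal"}


def filter_rows(rows, min_amount):
    """Keep only the rows whose amount reaches the threshold."""
    return [r for r in rows if r["amount"] >= min_amount]


def sort_rows(rows, key):
    """Return the rows ordered by the given field."""
    return sorted(rows, key=lambda r: r[key])


def group_sum(rows, key, value):
    """Group rows by one field and sum another field per group."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r[value]
    return dict(totals)


def lookup(rows, table, key, new_field, default=None):
    """Add a field to each row by looking its key up in an external table."""
    return [{**r, new_field: table.get(r[key], default)} for r in rows]
```

For example, `group_sum(rows, "product", "amount")` totals the amounts per product, and `lookup(rows, countries, "country_code", "country")` adds a country name to every row.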

Chapter 5 - Controlling the Flow of Data
In this chapter, which is very important for ETL developers, we learn how to control the flow of data. In particular, it covers the following topics: copying and distributing rows, splitting streams based on conditions, and merging streams of data.
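
The difference between copying and distributing rows can be sketched in a few lines of plain Python. This is only a conceptual illustration, not PDI code; the function names are hypothetical, and the round-robin order used for distribution is an assumption made for the sketch.

```python
def copy_rows(rows, n_targets):
    """Copying: every target receives every row."""
    return [list(rows) for _ in range(n_targets)]


def distribute_rows(rows, n_targets):
    """Distributing: rows are dealt out in turn, so each target gets a share."""
    targets = [[] for _ in range(n_targets)]
    for i, row in enumerate(rows):
        targets[i % n_targets].append(row)
    return targets


def split_stream(rows, condition):
    """Splitting: send each row to one of two streams based on a condition."""
    matched, unmatched = [], []
    for row in rows:
        (matched if condition(row) else unmatched).append(row)
    return matched, unmatched
```

With two targets, copying five rows yields two streams of five rows each, while distributing them yields one stream of three rows and one of two.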

Chapter 6 - Transforming Your Data by Coding
This chapter explains how to insert code in your transformations. Specifically, you will learn how to insert and test JavaScript and Java code in your transformations, and how to distinguish situations where coding is the best option from those where there are better alternatives. PDI uses the Rhino JavaScript engine from Mozilla [...]. To allow Java programming inside PDI, the tool uses the Janino project libraries. Janino is a super-small and fast embedded compiler that compiles Java code at runtime [...]. In summary, always remember that code in the JavaScript step is interpreted, whereas the code in User Java Class is compiled. This means that a transformation that uses the UDJC step will generally perform much better.

Chapter 7 - Transforming the Rowset
This chapter is dedicated to learning how to convert rows to columns (denormalizing) and columns to rows (normalizing). Furthermore, you are introduced to a very important topic in data warehousing: time dimensions.
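
To make the two directions concrete, here is a minimal sketch in plain Python. It only illustrates the idea and is not how PDI implements it; the function and field names are hypothetical.

```python
def denormalize(rows, group_key, name_field, value_field):
    """Rows to columns: one output row per group, one column per name."""
    out = {}
    for r in rows:
        record = out.setdefault(r[group_key], {group_key: r[group_key]})
        record[r[name_field]] = r[value_field]
    return list(out.values())


def normalize(rows, group_key, name_field, value_field):
    """Columns to rows: one output row per (group, column) pair."""
    return [
        {group_key: r[group_key], name_field: k, value_field: v}
        for r in rows
        for k, v in r.items()
        if k != group_key
    ]


# Hypothetical data: one attribute per row.
long_rows = [
    {"id": 1, "attr": "color", "val": "red"},
    {"id": 1, "attr": "size", "val": "M"},
    {"id": 2, "attr": "color", "val": "blue"},
]
wide_rows = denormalize(long_rows, "id", "attr", "val")
```

Here `wide_rows` holds one row per `id` with `color` and `size` as columns, and `normalize(wide_rows, "id", "attr", "val")` turns it back into one row per attribute.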

Chapter 8 - Working with Databases
This is the first of two chapters fully dedicated to working with databases. We learn how to connect to a database, preview and get data from a database, and insert, update, and delete data in a database.

Chapter 9 - Performing Advanced Operations with Databases
This chapter explains different advanced operations with databases, such as doing simple and complex lookups in a database. It also includes an introduction to dimensional modeling and loading dimensions.

Chapter 10 - Creating Basic Task Flows
So far, we have been working with data (running transformations). A PDI transformation does not run in isolation; it is usually embedded in a bigger process. Such processes, like generating a daily report and transferring it to a shared repository, or updating a data warehouse and sending a notification by email, can be implemented with PDI jobs. In this chapter we are introduced to jobs, executing tasks upon conditions, and working with arguments and named parameters.

Chapter 11 - Creating Advanced Transformations and Jobs
This chapter is about learning techniques for creating complex transformations and jobs (creating subtransformations, implementing process flows, nesting jobs, iterating the execution of jobs and transformations ...).

Chapter 12 - Developing and Implementing a Simple Datamart
This chapter covers the following: an introduction to a sales datamart based on a provided database, loading the dimensions and fact table of the sales datamart, and automating what has been done.

Appendix A - Working with Repositories
PDI allows us to store our transformations and jobs under two different configurations: file-based and database repository. Throughout this book we have used the file-based option; however, the database repository is convenient in some situations.

Appendix B - Pan and Kitchen - Launching Transformations and Jobs from the Command Line
Although we have used Spoon as the tool for running jobs and transformations, you may also run them from a terminal window. Pan is a command-line program that lets you launch the transformations designed in Spoon, both from .ktr files and from a repository. The counterpart to Pan is Kitchen, which allows you to run jobs from .kjb files and from a repository.
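
As a hedged sketch of what launching from a terminal can look like, the Python helpers below only build the argument lists; they do not run anything. The helper names and file paths are hypothetical, and the `-file` and `-level` options and the `pan.sh` and `kitchen.sh` script names should be checked against the documentation for your PDI version and platform.

```python
def pan_command(ktr_path, level="Basic", script="pan.sh"):
    """Build the argument list for running a transformation file with Pan."""
    return [script, f"-file={ktr_path}", f"-level={level}"]


def kitchen_command(kjb_path, level="Basic", script="kitchen.sh"):
    """Build the argument list for running a job file with Kitchen."""
    return [script, f"-file={kjb_path}", f"-level={level}"]


# Hypothetical file locations, used only for illustration.
print(" ".join(pan_command("/tmp/hello.ktr")))
print(" ".join(kitchen_command("/tmp/daily_load.kjb", level="Minimal")))
```

To actually launch a transformation or job, such a list could be passed to `subprocess.run` from the PDI installation directory.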

Appendix C - Quick Reference - Steps and Job Entries
This appendix summarizes the purpose of the steps and job entries used in the labs throughout the book.

Appendix D - Spoon Shortcuts
This very useful appendix includes tables summarizing the main Spoon shortcuts.

Appendix E - Introducing PDI 5 Features
New PDI 5 features (PDI 5 is now available).
5.0 out of 5 stars: Thorough and clear, February 19, 2014
By Amazon Customer
Format: Kindle Edition | Verified Purchase
Pentaho is an incredibly powerful package, but it is sometimes difficult to make it do what you want. This book guides you through a set of basic but relevant techniques, and explains the inner workings of the software along the way in a clear language. This makes it a useful book for intermediate users, too. Highly recommended.
5.0 out of 5 stars: Second amazing edition, February 10, 2014
By Benaglia Nicola
This second edition of Pentaho Data Integration covers, with very satisfying examples, all aspects of this extraordinary ETL tool: transformations, jobs, data manipulation, filtering, sorting, searching, basic and advanced flows of data, and an interesting final appendix about best practices.
In my opinion, knowledge of an ETL tool is mandatory for every IT consultant and manager, and with a book like this it is very easy and stimulating to learn one by experimenting with real-world examples.
The book consists of 502 pages divided into 12 chapters and 6 appendixes.