Saturday, July 21, 2012

Extract, Transform, and Load Video



As a continuation of our discussion of ETL, consider this YouTube video, which is also available at http://www.youtube.com/brianciampa.  Designing the star or snowflake schema is a great start.  However,  the ETL populates that star, thus transforming your data into something that is truly valuable.  In a future post we will look at an example of a basic ETL job in Oracle.

Friday, July 13, 2012

Extract, Transform, and Load (ETL)

In prior posts, we’ve looked a bit at the basic structure of a star schema but we have not looked at populating a star schema.  Populating a data warehouse is accomplished using an extract, transform, and load (ETL) job.  This is a job that pulls data from the source system, transforms it into a structure appropriate for the data warehouse, and then loads that data into the data warehouse.  The concept of ETL is not unlike any other craft or trade.  Consider the following…

1.)    A math student reads and understands the math problem from the textbook (extract), uses a piece of scrap paper, if calculators are not allowed, to find the solution (transform), and places the answer onto the final answer sheet (load).
2.)    A carpenter purchases wood from a supplier (extract), uses his workshop to design, measure, sand, cut, etc. (transform), and then delivers the final product to the customer (load).
Usually, the transformation piece is accomplished inside of a separate schema in the database, called a staging area.  The data from the source system is placed into this area and then restructured so that it is appropriate for the star schema, much like the math student uses the scrap paper and the carpenter uses the workshop.  It is generally accepted that this area is for the ETL developer’s eyes only.  End users do not have access to data in the staging area, much like a customer is not involved in the carpenter’s workshop and the math student does not place his chicken scratch on the sheet with the final answer.  Once that data has been transformed, it is placed into the star schema and is ready for analysis.
It may be worth noting that some variations of ETL, such as ELT (extract, load, and transform), ETLT (extract, transform, load, and transform), and others, are used to describe methods by which jobs will populate a data warehouse using a different order than the standard extract then transform then load order.  In order to encompass all of these terms, the term data integration (or something similar) is often used to describe data warehouse population in a general sense.
For more information on data warehousing concepts, remember to visit http://www.brianciampa.com/