We've been looking recently at the concept of extract, transform, and load (ETL) jobs. This post will begin to examine some of the mechanics. When loading a dimension table the ETL job should accomplish the following things...
1.) Extract all of the dimension attributes from the source.
2.) Transform the data according to the requirements (i.e., concatenate first name and last name to create a full name column, etc.)
3.) Identify those records that are new to the table as opposed to those records that already exist in the table and may need to be updated due to updates in the source system
4.) Generate new surrogate key values and add them to the new records
5.) Load the records into the dimension table
Each ETL job that populates a dimension table will need to accomplish these five things, although no two developers will develop a job exactly the same way. To get a feel for this type of job, consider this very simple example. A source system contains these two tables among others (click the picture to enlarge)...
...and this dimension needs to be populated...
If you have an Oracle environment available to you, run this sql file to create the two source system tables, populate them with data, and create the DIM_PRODUCT table. Each product, as you can see, is grouped under a single product group. The PRODUCT_EFFECTIVE_FLAG indicates whether or not a product is currently effective (current at the time of the ETL job). If you wish, try to write an ETL job that will populate the DIM_PRODUCT table in PL/SQL or another language of your choosing. We will examine a possible solution in a future post.
No comments:
Post a Comment