Friday, September 7, 2012

ETL - Practice Loading A Fact - Solution

In a prior post we looked at the process by which a fact table is loaded.  I need to apologize, since the ERD of the FACT_TRANSACTION table that I posted in that prior post was incorrect.  It has since been updated.  The four things that an ETL job needs to accomplish are...

1.) Select all necessary data from the source
 a. Measures
 b. Natural Keys of the dimension objects
 c. Degenerate Dimension objects
2.) Transform the data as needed
 a. Aggregate measures and group by the appropriate objects
 b. Add foreign keys to dimension tables
3.) Delete any existing rows from the fact table that will be replaced by the new data
4.) Load the fact table

Consider this PL/SQL procedure as a procedure that will accomplish these things.


The STAGE_TRANSACTION_DATA table pulls all of the necessary elements from the source system into the staging area.  I am treating one schema as both the staging area and the presentation area.  This is probably unrealistic in a production environment but it keeps things simple for these purposes.


1.) The STAGE_TRANS_DATA_PROD_KEY table is used to grab the KEY_PRODUCT from the DIM_PRODUCT dimension.  Notice that this is done by joining the product's natural key, which is the PRODUCT_NO (product number), to the DIM_PRODUCT table.  An outerjoin is used so that if a record comes into the staging area for a product that does not exist in the DIM_PRODUCT table, that record will not be excluded from the fact table.  It will exist with a value of -1 and will need to be addressed after the ETL has run.

2.) The STAGE_TRANS_DATA_DATE_KEY table is used to transform the TRANSACTION_DATE into a key that will join to a date dimension.  I created the date dimension on my own, so I'll let you try and tackle that one.  Those dimensions typically have no source and are created as needed by the ETL developer.  Again, an outer join is used so that if a date does not exist in that dimension a value of -1 is placed into the fact table.


1.) This fact table, because it is so small, is truncated and then repopulated each time that the ETL runs.  Date logic can be written so that only a certain time period will be refreshed with each run, but for simplicity's sake I did it this way.

2.) Finally, the data is loaded into the FACT_TRANSACTION table.

Prior posts have talked about turning data into valuable data.  This post shows you how to accomplish that in a hands-on way.

Remember to take a look at (specifically, click on Career In Data Warehousing and then Grow) for larger datasets that can be used for this kind of practice. 


No comments:

Post a Comment