Friday, June 15, 2012

Junk Dimensions

When the time comes to move out of a house or apartment, one of the more tedious tasks involves packing your belongings into boxes.  While these boxes are typically organized based on their destination in the new house or their contents, there always seems to be a few items that just don't fit into one of these categories.  So, that last box contains these miscellaneous items with no good description.

This challenge is sometimes encountered in the area of dimensional modeling as well.  Remember from this post that there are instances in which a dimension value may not fit into any of the other dimension tables.  In this case, it is appropriate to add the value to the fact table as a degenerate dimension. 

Now, consider the scenario in which there may be a number of low cardinality dimension values that do not fit into an existing dimension table.  While adding each of these values to the fact table is not incorrect, another option exists.  Suppose that the source system that feeds the transaction star that we've typically used contains the following contextual elements (dimensions) for each transaction...

1.) Was this item paid using cash, debit, or credit?
2.) Was this sale originated via a phone call, an online order, or a walk-in?
3.) Was this a promotional item (Y/N)?

Because these are all low cardinality values, consider this approach.  Each of the possible combinations are placed into a dimension table, assigned a surrogate key, and then joined to the fact table for analysis.  This is referred to as a junk dimension.  The contents of this junk dimension will be something like...

KEY_TRANSACTION_
JUNK
PAYMENT_
METHOD
ORIGINATION PROMOTIONAL_
ITEM
1 Cash Phone Call Yes
2 Cash Online Yes
3 Cash Walk-In Yes
4 Cash Phone Call No
5 Cash Online No
6 Cash Walk-In No
7 Credit Phone Call Yes
8 Credit Online Yes
9 Credit Walk-In Yes
10 Credit Phone Call No
11 Credit Online No
12 Credit Walk-In No
13 Debit Phone Call Yes
14 Debit Online Yes
15 Debit Walk-In Yes
16 Debit Phone Call No
17 Debit Online No
18 Debit Walk-In No

Creating a junk dimension with items that are not low cardinality is probably not a good design technique.  If you were to place the transaction line number, something that is unique to each fact row, into this dimension, the need to create all possible combinations of the junk dimension values will cause this table to contain more rows than the fact table.  In this case, the high cardinality items would best be placed directly into the fact table as degenerate dimensions, leaving the junk dimension to contain the low cardinality items.

If you wish, you may use one of the following methods to take a look at this data in Oracle...

1.) If you have already created the transaction star from the post on conformed dimensions, run this script to add a junk dimension
2.) If you have not created the above mentioned transaction star, then run this script to create it first, and then run this one to add a junk dimension.

This ERD displays this star schema with the new junk dimension in the lower left-hand corner (click the image to enlarge).


As I mentioned earlier, placing these low cardinality items into the fact table as degenerate dimensions is not a bad approach.  However, in that case, each fact's value will have to be stored on the fact row which will require more space than the junk dimension.  Comparing that cost with the cost of having the ETL lookup the surrogate key in the junk dimension for placement into fact table will have to be considered.  You have some options in this case.

Remember that your job is to enable excellent decision making for the leaders of your organization.  Consider the options and select the one that will best lead you down that path.

Update: In this post I mentioned that I made a slight error in my explanation of junk dimensions.  The examples of junk dimensions that I have provided included the word "junk" in the name.  Margy Ross suggests not naming it as such, which makes a lot of sense.  Encountering a table with junk in the name may cause some confusion (perhaps even concern) for an analyst who is not well-versed in dimensional modeling.

Image courtesy of FreeDigitalPhotos.net

No comments:

Post a Comment