This challenge is sometimes encountered in the area of dimensional modeling as well. Remember from this post that there are instances in which a dimension value may not fit into any of the other dimension tables. In this case, it is appropriate to add the value to the fact table as a degenerate dimension.
Now, consider the scenario in which there may be a number of low cardinality dimension values that do not fit into an existing dimension table. While adding each of these values to the fact table is not incorrect, another option exists. Suppose that the source system that feeds the transaction star that we've typically used contains the following contextual elements (dimensions) for each transaction...
1.) Was this item paid using cash, debit, or credit?
2.) Was this sale originated via a phone call, an online order, or a walk-in?
3.) Was this a promotional item (Y/N)?
Because these are all low cardinality values, consider this approach. Each of the possible combinations are placed into a dimension table, assigned a surrogate key, and then joined to the fact table for analysis. This is referred to as a junk dimension. The contents of this junk dimension will be something like...
KEY_TRANSACTION_ JUNK |
PAYMENT_ METHOD |
ORIGINATION | PROMOTIONAL_ ITEM |
1 | Cash | Phone Call | Yes |
2 | Cash | Online | Yes |
3 | Cash | Walk-In | Yes |
4 | Cash | Phone Call | No |
5 | Cash | Online | No |
6 | Cash | Walk-In | No |
7 | Credit | Phone Call | Yes |
8 | Credit | Online | Yes |
9 | Credit | Walk-In | Yes |
10 | Credit | Phone Call | No |
11 | Credit | Online | No |
12 | Credit | Walk-In | No |
13 | Debit | Phone Call | Yes |
14 | Debit | Online | Yes |
15 | Debit | Walk-In | Yes |
16 | Debit | Phone Call | No |
17 | Debit | Online | No |
18 | Debit | Walk-In | No |
Creating a junk dimension with items that are not low cardinality is probably not a good design technique. If you were to place the transaction line number, something that is unique to each fact row, into this dimension, the need to create all possible combinations of the junk dimension values will cause this table to contain more rows than the fact table. In this case, the high cardinality items would best be placed directly into the fact table as degenerate dimensions, leaving the junk dimension to contain the low cardinality items.
If you wish, you may use one of the following methods to take a look at this data in Oracle...
1.) If you have already created the transaction star from the post on conformed dimensions, run this script to add a junk dimension.
2.) If you have not created the above mentioned transaction star, then run this script to create it first, and then run this one to add a junk dimension.
This ERD displays this star schema with the new junk dimension in the lower left-hand corner (click the image to enlarge).
As I mentioned earlier, placing these low cardinality items into the fact table as degenerate dimensions is not a bad approach. However, in that case, each fact's value will have to be stored on the fact row which will require more space than the junk dimension. Comparing that cost with the cost of having the ETL lookup the surrogate key in the junk dimension for placement into fact table will have to be considered. You have some options in this case.
Remember that your job is to enable excellent decision making for the leaders of your organization. Consider the options and select the one that will best lead you down that path.
Update: In this post I mentioned that I made a slight error in my explanation of junk dimensions. The examples of junk dimensions that I have provided included the word "junk" in the name. Margy Ross suggests not naming it as such, which makes a lot of sense. Encountering a table with junk in the name may cause some confusion (perhaps even concern) for an analyst who is not well-versed in dimensional modeling.
Image courtesy of FreeDigitalPhotos.net
No comments:
Post a Comment