Friday, June 8, 2012

Slowly Changing Dimensions Explained...Using Twitter

I recently changed my twitter profile picture and after seeing the results I realized that this would be a great way to explain slowly changing dimensions.  When I uploaded the new picture and then viewed my timeline of tweets (my tweets) there was no trace of the old picture.  I could not tell that it had ever been used.  Even the tweets that had been associated with the old picture at one point now displayed the new one.  This is the behavior of a type 1 slowly changing dimension.  History is not kept.  Only the dimension values that are current are displayed for ALL facts.

Now, imagine that the old picture remained for those tweets that were sent when it was active and that the tweets sent AFTER the new picture was uploaded contained the new picture.  Viewing my timeline of my tweets will show me when I uploaded new pictures.  Twitter does not work this way, but if it did this would mimic the behavior of a type 2 slowly changing dimension.  As history changes, new rows are added to the dimension table.  Any fact rows that occur AFTER that change will point to the new row.  The fact rows that occurred before that change will continue to point to the older row.

Now, imagine that the timeline on twitter showed two pictures for each tweet.  One picture represented the picture that was current at the time of the tweet.  The other picture represented the picture that is the most current.  The latest tweets would probably show the same picture (most current one) twice.  If this were the case, we could easily see that when I tweeted about something two months ago, my profile used one picture but now it uses another.  This would mimic the behavior of a type 3 slowly changing dimension, which is a hybrid between type 1 and type 2.  A type 3 slowly changing dimension will add a new row as dimension objects change (as in a type 2).  Each dimension row, however, in that history will contain a column (or columns) that represent the current value.  So, if a new row is added, the current value columns for the historical rows will be updated with the new current information.  An analyst can easily see that at the time of a certain fact, the dimension values were one thing but now, they are something else.

Of course, writing ETL so that it populates a type 2 or type 3 slowly changing dimension is more complex than a type 1.  Creating a dimension to include history when the source system itself does not include history may cause difficulty if a dimension needs to be reloaded (although there are ways around this as well).  Consider all of the available options and make a decision that will turn your user's data into something that is truly valuable.

For some hands-on examples of slowly changing dimensions take a look at this post.  Also, remember that you can read this blog and access other data warehousing information by going to http://www.brianciampa.com/.

No comments:

Post a Comment