Sunday, April 8, 2012

Analysis Using Conformed Dimensions

In the last post we practiced examining data from different fact tables via conformed dimensions.  While the syntax may be different from solution to solution, something like the select statement below will allow you to compare salary dollars from the fact_salary table to sales dollars from the fact_transaction table per month for each person.





SELECT                    decode(paid.calendar_year, null, sales.calendar_year, paid.calendar_year) Year,
                                decode (paid.calendar_month_number, null, sales.calendar_month_number, paid.calendar_month_number) Month,
                                decode(paid.last_name, null, sales.last_name, paid.last_name) LastName,
                                decode(paid.first_name, null, sales.first_name, paid.first_name) FirstName,
                                paid.amtpaid,
                                sales.salesamt
FROM
(SELECT                   d.calendar_year,
                                d.calendar_month_number,
                                c.last_name,
                                c.first_name,
                                sum(b.actual_wages_paid) amtpaid
FROM                      fact_salary b,
                                dim_employee c,
                                dim_day d
WHERE                    b.key_date_paid = d.key_day
                                AND b.key_employee = c.key_employee
GROUP BY d.calendar_year, d.calendar_month_number, c.last_name, c.first_name) paid
FULL OUTER JOIN
(SELECT                   d.calendar_year,
                                d.calendar_month_number,
                                c.last_name,
                                c.first_name,
                                sum(a.amount) salesamt
FROM                      fact_transaction a,
                                dim_employee c,
                                dim_day d
WHERE                    a.key_day = d.key_day
                                AND a.key_sales_person = c.key_employee
GROUP BY d.calendar_year, d.calendar_month_number, c.last_name, c.first_name) sales
                ON                           paid.calendar_year = sales.calendar_year
                                                AND paid.calendar_month_number = sales.calendar_month_number
                                                AND paid.last_name = sales.last_name
                                                AND paid.first_name = sales.first_name

In order to take advantage of this approach, an analyst must view the data from each fact table at the same grain.  In the select statement above, notice that each of the inner selects group by the same dimension attributes (calendar_year, calendar_month_number, last_name, and first_name) and then the outer select statement joins on those attributes.  Since there may be some instances in which an individual has sales dollars but no salary dollars and vice versa, an Oracle full outer join is used.  If you are confident that one of the select statements will return all of the possible individuals for all possible time periods for that person, then a left or right outer join may be used.  In reality a reporting tool will probably take care of this on the back-end, so that a data analyst will not need to be familiar with this exact syntax (although it may be beneficial to conceptually understand how the data is being pulled).

From a business perspective, this kind of analysis will allow a decision maker to view productivity vs. cost for each instance of a business entity (i.e., individual, department, region, line of business, etc.).  This is yet another way in which data warehousing can help data become truly valuable.

Image: vichie81 / FreeDigitalPhotos.net

No comments:

Post a Comment