In the last post we looked at a hands-on example of analyzing data using a star schema. The questions were…
1.) Who was the highest performer in terms of dollars in the first quarter?
2.) Who was the highest performer in terms of volume in the first quarter?
3.) The business experts speculate that a product may have been applied to a line item with the incorrect dollar amount. Using this star, see if you can identify where that may have happened.
Keeping in mind that there are several ways to analyze the data so that these questions can be answered, here are a few ways that will work.
1.) The highest performer in terms of dollars was James Couch with $3,529.98. This can be answered using this SQL statement…
select c.first_name, c.last_name, sum(a.amount), count(a.transaction_line)
from fact_transaction a,
dim_day b,
dim_employee c,
dim_product d
where a.key_day = b.key_day
and a.key_sales_person = c.key_employee
and a.key_product = d.key_product
and b.calendar_quarter = 1
group by c.first_name, c.last_name
order by 3 desc
2.) The highest performer in terms of volume turned out to be several people. Gabriel Palmer, Norma Manlow, and Arnold Tab all fit that bill with 7. This can also be answered using the SQL statement above by changing the last line to “order by 4 desc”.
3.) This request from the business offered an interesting challenge. It would be possible to export all of the transaction data to a spreadsheet and then ask an analyst to examine it line-by-line to find the item that was misapplied. That option would be very tedious and time consuming. Consider this SQL statement…
select d.product_name, b.calendar_quarter, sum(a.amount)/count(a.transaction_line)
from fact_transaction a,
dim_day b,
dim_employee c,
dim_product d
where a.key_day = b.key_day
and a.key_sales_person = c.key_employee
and a.key_product = d.key_product
group by d.product_name, b.calendar_quarter
order by 1,2
This is examining the average price per product per quarter. Scrolling down the list will show an anomaly with the Men’s Athletic Shoes product in calendar quarter 2. Writing the select statement so that it shows the distinct quarter, product, and amount will essentially lead you to the same conclusion. As I mentioned above, there may be multiple ways to answer each question. Anyway, at this point, independent of which method you choose, more detail can be added to the query so that more information can be gathered. Now that we have identified the portion of the “forest” that appears to be problematic (Men’s Athletic Shoes in quarter 2), we can limit our analysis to the “trees” that make up that portion (the individual transactions). Consider the following SQL statement…
select d.product_name, a.transaction_number, a.transaction_line, sum(a.amount)
from fact_transaction a,
dim_day b,
dim_employee c,
dim_product d
where a.key_day = b.key_day
and a.key_sales_person = c.key_employee
and a.key_product = d.key_product
and b.calendar_quarter = 2
and d.product_name = 'Men''s Athletic Shoes'
group by d.product_name, a.transaction_number, a.transaction_line
Scrolling down shows us that line one of transaction 1345641 has an odd amount associated with it. The source system owners can be alerted that this transaction may need to be investigated (assuming that there is nothing wrong with the ETL job).
While this is a simplistic example (the highest performer only sold 7 items) I hope the point is obvious. Warehousing data allows users to easily find answers to business questions like these. If this is the first time that you have done something like this, welcome to Business Intelligence. If you enjoy this kind of thing, continue to play with the data to see what other conclusions you can draw.