Showing posts with label Analysis. Show all posts
Showing posts with label Analysis. Show all posts

Saturday, March 10, 2012

Gaining Business Intelligence

 In the last post we looked at a hands-on example of analyzing data using a star schema.  The questions were…
     1.)    Who was the highest performer in terms of dollars in the first quarter? 
     2.)    Who was the highest performer in terms of volume in the first quarter? 
     3.)    The business experts speculate that a product may have been applied to a line item with the incorrect dollar amount.  Using this star, see if you can identify where that may have happened.
Keeping in mind that there are several ways to analyze the data so that these questions can be answered, here are a few ways that will work.
1.)    The highest performer in terms of dollars was James Couch with $3,529.98.  This can be answered using this SQL statement…

select c.first_name, c.last_name, sum(a.amount), count(a.transaction_line)
from fact_transaction a,
dim_day b,
dim_employee c,
dim_product d
where a.key_day = b.key_day
and a.key_sales_person = c.key_employee
and a.key_product = d.key_product
and b.calendar_quarter = 1
group by c.first_name, c.last_name
order by 3 desc

2.)    The highest performer in terms of volume turned out to be several people.  Gabriel Palmer, Norma Manlow, and Arnold Tab all fit that bill with 7.  This can also be answered using the SQL statement above by changing the last line to “order by 4 desc”.
3.)    This request from the business offered an interesting challenge.  It would be possible to export all of the transaction data to a spreadsheet and then ask an analyst to examine it line-by-line to find the item that was misapplied.  That option would be very tedious and time consuming.  Consider this SQL statement…

select d.product_name, b.calendar_quarter, sum(a.amount)/count(a.transaction_line)
from fact_transaction a,
dim_day b,
dim_employee c,
dim_product d
where a.key_day = b.key_day
and a.key_sales_person = c.key_employee
and a.key_product = d.key_product
group by d.product_name, b.calendar_quarter
order by 1,2

This is examining the average price per product per quarter.  Scrolling down the list will show an anomaly with the Men’s Athletic Shoes product in calendar quarter 2.  Writing the select statement so that it shows the distinct quarter, product, and amount will essentially lead you to the same conclusion.  As I mentioned above, there may be multiple ways to answer each question.  Anyway, at this point, independent of which method you choose, more detail can be added to the query so that more information can be gathered.  Now that we have identified the portion of the “forest” that appears to be problematic (Men’s Athletic Shoes in quarter 2), we can limit our analysis to the “trees” that make up that portion (the individual transactions).  Consider the following SQL statement…

select d.product_name, a.transaction_number, a.transaction_line, sum(a.amount)
from fact_transaction a,
dim_day b,
dim_employee c,
dim_product d
where a.key_day = b.key_day
and a.key_sales_person = c.key_employee
and a.key_product = d.key_product
and b.calendar_quarter = 2
and d.product_name = 'Men''s Athletic Shoes'
group by d.product_name, a.transaction_number, a.transaction_line

Scrolling down shows us that line one of transaction 1345641 has an odd amount associated with it.  The source system owners can be alerted that this transaction may need to be investigated (assuming that there is nothing wrong with the ETL job).

While this is a simplistic example (the highest performer only sold 7 items) I hope the point is obvious.  Warehousing data allows users to easily find answers to business questions like these.  If this is the first time that you have done something like this, welcome to Business Intelligence.  If you enjoy this kind of thing, continue to play with the data to see what other conclusions you can draw. 

Sunday, March 4, 2012

Seeking Business Intelligence

In a prior post, we looked at a hands-on example of examining data in a star schema.  Suppose an analyst on the business side does some manual analysis each quarter to determine who deserves the top performer award.  You are interested in showing her how the data warehouse can be used to easily answer this question without the need for a lot of manual analysis.  Using this star, see if you can answer these questions…
    1.)    Who was the highest performer in terms of  dollars in the first quarter? 
    2.)    Who was the highest performer in terms of volume in the first quarter? 
The value of the data warehouse is rising in the eyes of the business community to the point that it is being used more and more for information.  The business experts, after looking at some high-level data, speculate that a product may have been applied to a line item with the incorrect dollar amount.  Using this star, see if you can identify where that may have happened.  By asking these questions (or others), you are using the data warehouse to seek intelligence about your business or organization. 
Spend some time answering these questions and in a future post we will examine the solutions.  If you find yourself enjoying this exercise that’s probably a tell-tale sign that you enjoy business intelligence.

Image: digitalart / FreeDigitalPhotos.net

Saturday, February 25, 2012

Practice Makes Perfect

As with any skill, only so much can be learned from a book (or a blog post).  In this post, I’m hoping to offer some hands-on training that will allow you to play with some data a bit.  If you have an Oracle environment available to you (check the previous post if you do not), you can run the script found here.  If you are unfamiliar with running a script like this, some suggestions are in the previous post as well.  The following tables with data will be created…

This is a very basic star that holds data related to sales transactions.  Here is a quick breakdown of the tables…
Dim_Product (22 rows):  This is a dimension table that contains a product’s overall group (larger grouping) and then the individual product numbers and names within that group (smaller grouping).
Dim_Employee (19 rows):  This is a dimension table that contains information regarding the employees that sold the products. 
Dim_Day (365 rows):  This is a date dimension table that contains one row for each day as well as higher-level date groupings.  In this example, this table only holds data in calendar year 2011 (1/1/2011 – 12/31/2011).
Fact_Transaction (300 rows):  This is a fact table that contains one row for each transaction line item.  So, if an individual purchased three items on the same transaction (i.e., brought three items to the cash register) that transaction will produce three rows in this fact table. 
Data Warehousing is ideal for top-down analysis.  Once the script has been run in Oracle, and the tables have been created, consider the following example…
select b.calendar_year, sum(a.amount)
from fact_transaction a,
dim_day b,
dim_employee c,
dim_product d
where a.key_day = b.key_day
and a.key_sales_person = c.key_employee
and a.key_product = d.key_product
group by b.calendar_year

This select statement shows the dollars sold per fiscal year, although there is only one year in this case.  If you want to throw in the product_group to see the dollars sold per year, per product_group, that will give you an additional level of detail…
select b.calendar_year, d.product_group, sum(a.amount)
from fact_transaction a,
dim_day b,
dim_employee c,
dim_product d
where a.key_day = b.key_day
and a.key_sales_person = c.key_employee
and a.key_product = d.key_product
group by b.calendar_year, d.product_group

Do you see what is happening?  You can continue drilling down until you get to a specific transaction_line if you wish. 
An analyst may examine the data at a high-level and notice that one quarter had an enormous number of sales dollars.  She can keep adding descriptors to the select clause (and group by them) to eventually find where those dollars lie.  Did one sales rep have a stellar quarter?  Did one product group do exceptionally well?  Did sales go through the roof on one particular day out of the quarter?  Did one product alone do exceptionally well?  Once you have the information you can turn that into action.  Does somebody deserve a raise for their extraordinary work?   Did one marketing strategy work remarkably well, meaning that we need to consider implementing it across the entire organization? 
This type of structure can turn data into valuable data.  Valuable data can enable great decisions.  Have fun playing with the data.  See how many ways you can slice and dice it.  We will do some more with this star in future posts.

Sunday, February 19, 2012

Hands-On Training


As with anything else, the way to become familiar with data management is to constantly work with data.  I would like to offer some of that through this blog, in addition to continuing to examine some of the concepts of data warehousing.  I plan to provide scripts that will create data in your own Oracle instance so that you can practice working with it.  If you do not have access to (and permission to use) an Oracle instance through your employer, your school, or some other entity, one option is to use a couple of free products explained below.
Oracle offers a free product called Oracle Database Express Edition that, among other things, allows users to become familiar with data management.  This is a smaller version of the Oracle database that can be loaded onto a local machine running Windows or Linux.  Oracle Database Express Edition can be downloaded here. 


Oracle also offers a free integrated development environment (IDE) called Oracle SQL Developer that allows you to easily query data and build database objects.  Oracle SQL Developer can be downloaded here.
As a test, download the very simple sql script hereThis will create one basic table and insert three records.  Here are two ways of running it…
From an IDE - Open the file from an IDE (such as Oracle SQL Developer although others may be used as well) and run the script.  In Oracle SQL Developer, this involves clicking the Run Script button on the tool bar or pressing F5.
Using SQLPlus (in Windows)
1.)    On the machine on which Oracle is installed go to the windows command line (from the Run box type CMD).
2.)    Navigate to the folder that contains the script using the cd syntax.
3.)    Type the following: sqlplus {schema name}/{password} @{filename}  So, if you are loading this into a schema called “myschema” with a password being “mypassword”, the command would look like this:  sqlplus myschema/mypassword @test_tbl  Including the “.sql” extension with the filename is not necessary.
4.)    Press Enter.
You should see that one table was created, three rows were created, and the transactions were committed.   You can simply drop the test_tbl table at this point.  We will still be looking at some of the theory behind data warehousing, but along with that I’ll provide some paths for hands-on training.  I don’t want to get lost in the mechanics of getting the data into the database, so if you have a better way of loading a script like this, by all means use it (and share with us if you would like).
Find an Oracle environment that you works well for you, or create one using the products above (or something else if you like).  In the next post or two, I’ll provide a star schema that we can look at.
As an aside, I’m not trying to endorse Oracle as being a better product for building a data warehouse than something else.  My experience is with this product so it is the easiest and quickest way for me to share data in this way.  If you would like to take my scripts and update them so that they are appropriate for another product, feel absolutely free to do that. 

Image: Ambro / FreeDigitalPhotos.net