Friday, September 14, 2012

Bridge Tables

Each of the examples that we've used thus far in this blog have involved situations in which each fact row was associated with only on dimension row, per dimension table.  In other words there has always been a many-to-one relationship between the fact table and its associated dimensions.  In the real world, this relationship will not always be the case. 

Suppose that a fact table contains employee salary data and each employee can be associated with multiple departments at once.  Conceptually, the design would be something like this...



One possible way of resolving this is to use what some people call a bridge table.  This is essentially a crosswalk between the fact and dimension designed to resolve the many-to-many problem.  The design for such a table is below.



The EMPLOYEE_DEPT_BRIDGE table will not contain any data that will be seen in query results.  It will only be used as a link between the FACT_SALARY and the DIM_DEPARTMENT tables.  However, there is a rule that must be well understood by any and all analysts that select from this table: You must always group by the department in your query results if your select statement joins to the DIM_DEPARTMENT table (via the bridge, of course) in any way.  For example, suppose that John Smith (KEY_EMPLOYEE value is 123) has $1,000 in wages and is associated with both accounting and human resources (part time in each, perhaps).  Consider this SQL statement...

select a.key_employee_name,
sum(a.wage_amount)
from fact_salary a,
employee_dept_bridge b,
dim_department c
where a.key_employee = b.key_employee
and b.key_department = c.key_department
and a.key_employee = 123
group by a.key_employee_name

Because the EMPLOYEE_DEPT_BRIDGE table will contain two rows for John Smith (one for accounting and one for human resources), these results will show a salary of $2,000 which will appear incorrect to the user. 

KEY_EMPLOYEESUM(A.WAGE_AMT)
1232,000


Now, consider this SQL statement...

select a.key_employee_name,
department_name,
sum(a.wage_amount)
from fact_salary a,
employee_dept_bridge b,
dim_department c
where a.key_employee = b.key_employee
and b.key_department = c.key_department
and a.key_employee = 123
group by a.key_employee_name,
department_name

The results will show his $1,000 salary associated with each department.  This will appear to be more correct to the user, assuming they understand that he is associated with both departments. 

KEY_EMPLOYEEDEPARTMENT_NAMESUM(A.WAGE_AMT)
123ACCOUNTING1,000
123HUMAN RESOURCES1,000


In this case, educating your analysts and report writers to use this table correctly is a key component to the success of this design.

No comments:

Post a Comment