Each morning I wake up and do some quick analysis. I’m not referring to analysis of data in a relational database. I’m referring to analysis of myself. I stand in front of the mirror to see how the night treated me. My eyes are groggy, my hair is a mess, and I have stubble on my cheeks. As a result, I need to fix a few things.
This process is not unlike the analysis done by executives of any organization. An executive seeks information and then acts on it. The mirror in this illustration represents the role of the business intelligence experts. It provides the data to be analyzed.
This illustration is interesting when placed into the context of data warehousing/business intelligence. While standing in front of the mirror, I represent the source system. The mirror is strictly a reflection of the source system (me in this case). When I notice something that needs to be fixed (i.e., my hair needs to be combed) it would be odd and incorrect to “fix” the mirror in some way. What if I spent the next few minutes going around the house and somehow manipulating all of the mirrors so that my messy hair is no longer shown in the reflection? In that case, a few problems exist…
1.) The ultimate problem has not been solved; I’m just not looking at it anymore.
2.) The mirror is no longer a true reflection of my appearance; it is a reflection of how I want to appear.
In the same way, a data warehouse should be a reflection of its source systems. Data is not created specifically for the data warehouse; it is created for a source system and then restructured by the data warehouse to make analysis easier. If a source system contains some incorrect data, and that data flows to the data warehouse, the tendency is to want to “fix it on the report” (leaving the incorrect data in the source system). This would be equivalent to “fixing” the mirror in the example above.
Taking the “let’s fix it in the warehouse and worry about it later” approach can be a bit dangerous. If a quick SQL statement is run to change the data now, what happens when the ETL job runs again? It may replace the updated data with the original (incorrect) data from the source system, resulting in some reports with the updated data floating around and other reports (run after the ETL) with the source system’s data. Also, anybody who goes back to the source system to verify something will see the discrepancy. Questions may be asked, and the perception of the warehouse’s data integrity may begin to be compromised. In other words, this is just not a good plan of action.
As a side note, I do want to address the possibility that an ETL job was written incorrectly. If the source system shows A and the data warehouse shows B as a result of the ETL, then there is an issue with the warehouse that must be addressed by that team immediately.
When the data warehouse is used as a true reflection of the source system, it truly becomes a stimulus for action. When something looks odd in the data, it may refer to a data problem in the source system or a reflection of a business problem (i.e., sales really dropped 10% last month?). Each of these should move the organization to some sort of action (correcting data in the source system or re-evaluating a business strategy). In this case, the data warehouse has not only helped data to become truly valuable but it has also helped to increase the intelligence of the organization.