And how we handle it in the DWH Deck
One of the key characteristics of the Data Vault modeling approach and methodology is that a large part of the ETL work shifts downstream in the data flow: the data vault stores the data largely as-is, and the heavier transformation happens on the way to the data marts. In pictures:
Fig 1. Simple ETL schema in a traditional data warehouse
Fig 2. Simple ETL schema in a data warehouse with the data vault architecture
This shift comes with several advantages, such as:
- auditability/traceability: we can recreate the source as it was on a given day and time
- flexibility: support for (possibly) multiple versions of the truth (e.g. when business rules change)
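The auditability point can be made concrete with a minimal sketch (the table and column names here are illustrative, not anything the DWH Deck generates): because a satellite keeps every version of a record together with its load timestamp, the source state at any earlier moment can be reconstructed with a simple query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE sat_product (
        hub_product_key INTEGER,
        load_dts TEXT,   -- when this version arrived in the vault
        price REAL
    )""")
# Three loads of the same product: the price changes over time.
con.executemany(
    "INSERT INTO sat_product VALUES (?, ?, ?)",
    [(1, "2023-01-01", 9.99),
     (1, "2023-02-01", 10.49),
     (1, "2023-03-01", 10.99)])

# Recreate the source as it looked on 2023-02-15: take the most
# recent satellite row loaded on or before that date.
row = con.execute("""
    SELECT price FROM sat_product
    WHERE hub_product_key = 1 AND load_dts <= '2023-02-15'
    ORDER BY load_dts DESC LIMIT 1""").fetchone()
print(row[0])  # → 10.49, the price that was current mid-February
```

Nothing is ever updated in place; every load only appends rows, which is what makes the vault auditable.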
Dan Linstedt, in his article Data Vault series 5 - Loading practices, even argues that you shouldn't touch or change the data on the way into the Data Vault, but manipulate it according to business rules on the way out, towards the Data Marts. However, there is still some ETL from the sources to the data vault:
- The typical conversion into the data vault structure with hubs, links and satellites.
- Some acceptable minor changes and default values (sections 2.1.3 and 2.1.4 of the Data Vault loading specification)
- Integration based on business keys.
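As a sketch of the first bullet, a single source row is split over the data vault structures: the business key goes into a hub, the descriptive attributes into a satellite hanging off that hub (links would connect hubs, and are omitted here). All names are hypothetical, chosen only to illustrate the shape of the model.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hub: one row per business key, plus load metadata, nothing else.
    CREATE TABLE hub_product (
        hub_product_key INTEGER PRIMARY KEY,
        product_bk TEXT UNIQUE,      -- the business key
        load_dts TEXT,
        record_source TEXT);
    -- Satellite: descriptive attributes, versioned by load date.
    CREATE TABLE sat_product (
        hub_product_key INTEGER,
        load_dts TEXT,
        description TEXT,
        PRIMARY KEY (hub_product_key, load_dts));
""")
# A source row ('P-100', 'Blue widget') becomes one hub row + one sat row.
con.execute("INSERT INTO hub_product (product_bk, load_dts, record_source) "
            "VALUES ('P-100', '2023-01-01', 'ERP')")
key = con.execute("SELECT hub_product_key FROM hub_product "
                  "WHERE product_bk = 'P-100'").fetchone()[0]
con.execute("INSERT INTO sat_product VALUES (?, '2023-01-01', 'Blue widget')",
            (key,))
```

The hub stays stable while new satellite versions accumulate around it; that separation is the core of the conversion step.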
This integration based on business keys is a powerful and sometimes overlooked aspect. Consider the example of a large wholesaler with warehouses around the globe. By using a uniform business key for 'product', much of the integration already happens smoothly in the data vault. Creating the data marts and answering questions like "what's the total stock for product xyz?" becomes a lot easier this way.
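The wholesaler example can be sketched as follows (assumed table names, not DWH Deck output): two warehouse systems each load stock into their own satellite, but both satellites hang off the same hub via the shared business key, so totalling the stock for one product is a straightforward query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hub_product (hub_product_key INTEGER PRIMARY KEY,
                              product_bk TEXT UNIQUE);
    -- One satellite per source system, both keyed to the same hub.
    CREATE TABLE sat_stock_eu (hub_product_key INTEGER, qty INTEGER);
    CREATE TABLE sat_stock_us (hub_product_key INTEGER, qty INTEGER);
""")
con.execute("INSERT INTO hub_product (product_bk) VALUES ('xyz')")
key = con.execute("SELECT hub_product_key FROM hub_product "
                  "WHERE product_bk = 'xyz'").fetchone()[0]
con.execute("INSERT INTO sat_stock_eu VALUES (?, 40)", (key,))  # EU warehouse
con.execute("INSERT INTO sat_stock_us VALUES (?, 60)", (key,))  # US warehouse

# Total stock for product 'xyz' across all warehouses:
total = con.execute("""
    SELECT COALESCE((SELECT SUM(qty) FROM sat_stock_eu
                     WHERE hub_product_key = h.hub_product_key), 0)
         + COALESCE((SELECT SUM(qty) FROM sat_stock_us
                     WHERE hub_product_key = h.hub_product_key), 0)
    FROM hub_product h WHERE h.product_bk = 'xyz'""").fetchone()[0]
print(total)  # → 100
```

Because the integration point is the business key in the hub, no cross-system matching logic is needed at mart-building time.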
For more information on this subject, see the blog post by Ronald Damhof and the discussion in the comments.
So how do we handle this integration in the DWH Deck?
- You can identify the business keys as described in Part 6 of the series describing the DWH Deck's functionality.
- And make sure you use the same DWHName in the table section of the main application screen for all tables that describe the same object (e.g. Product).
When you do this, the SQL statement to create the data vault table is generated only once, while the SQL statements to create the view and the ETL stored procedure to import the data are generated for each source table.
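The generation rule described above amounts to deduplicating on DWHName. The sketch below is an illustration of that behaviour with made-up names, not the DWH Deck's actual code: one CREATE TABLE per distinct DWHName, one view and load routine per source table.

```python
# Source tables mapped to a shared DWHName (illustrative data).
mappings = [
    ("erp_eu.products", "Product"),
    ("erp_us.products", "Product"),   # same object → same DWHName
    ("crm.customers",   "Customer"),
]

create_statements = []   # one per distinct DWHName
load_statements = []     # one per source table
seen = set()
for source_table, dwh_name in mappings:
    if dwh_name not in seen:         # table DDL generated only once
        seen.add(dwh_name)
        create_statements.append(f"CREATE TABLE hub_{dwh_name} (...)")
    # View + ETL procedure generated for every source table.
    load_statements.append(
        f"CREATE VIEW vw_{source_table.replace('.', '_')} AS ...")

print(len(create_statements), len(load_statements))  # → 2 3
```

Two source tables sharing DWHName "Product" thus load into one shared data vault table, each through its own view and stored procedure.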