Pages

Tuesday, November 5, 2013

P4.1: Data Modeling in PowerPivot: Relationships

When we wanted to merge two table together, an option would be to use the VLOOKUP function. To analyze the many tables in a data model, VLOOKUP is inefficient and is replaced with relationships, an 'automatic VLOOKUP', which allows us to link entire tables at a time instead of referencing single columns.

As usual, we will be using the Adventure Works companion material (Chapter 4) from the Ferrari book.


Creating the Data Model


Instead of using VLOOKUP multiple times to add columns into the DimProduct table, we will use relationships to merge all three tables together. Opening the PowerPivot window in Diagram View from the workbook we see that the relationships between the tables have already been formed, because PowerPivot detected existing relationships (will be elaborated in further posts.)


Fig. 1: PowerPivot Diagram View
The arrow lines are representations of relationships in Figure 1. The arrow starts at DimProduct, the source table and points to DimProductSubcategory, the target table, and the connection is finished as DimProductSubcategory is linked to the DimProductCategory table. In the measure field towards the bottom half in grid view of the DimProduct table, we can add a calculated field for the number of products, NumOfProducts


Fig. 2: Creating NumOfProducts Calculated Field
Now we can create a PivotTable tabulating the number of products for each ProductCategory from the DimProduct table. The ProductCategory is actually the EnglishProductCategoryName field in the DimProductCategory table, two relationships away, connected by the ProductSubcategoryKey and the ProductCategoryKey.


Fig. 3: ProductCategory PivotTable
Data models are much more efficient in performance compared to VLOOKUP, especially when dealing with large data sets for computational power, and in multiple tables for a chain of relationships. Also, the NumOfProduct field calculation is available for any PivotTable using the data model, so we do not have to create a new definition every time.

So a data model is a set a tables connected by relationships, sometimes with calculated fields or columns. Simple yet powerful.


Notes on Relationships


Though these relationships will not help much in your love life, the following details on PowerPivot relationships will make your life easier when creating and understanding data models.

Looking at the Diagram View of the tables and relationships, we focus on the relationship arrow from DimProduct to DimProductSubcategory.

  • DimProduct is the source table (the beginning of the relationship)
  • DimProductSubcategory is the target table (where the values relate to source table)
  • ProductSubcategoryKey column is the Foreign Key in source table (contains values which is searched in target table for related row)
  • ProductSubcategoryKey column is the Primary Key in the target table (needs to have unique values in each row)

These are shown highlighted in blue below:


Fig. 4: DimProduct and DimProductSubcategory Relationship
A specific product can only have one subcategory, whereas a subcategory may have many products. That is why the target table (DimProductSubcategory) must have an unique Primary Key (ProductSubcategory) for the products in the DimProduct table. This type of relationship is called "one-to-many", with the target table being "one side", and the source table being the "many side" of the relationship.

We can test the relationship by recreating it. Click the arrow and press Delete. Then in the Design tab, click Create Relationship.


Fig. 5: Creating a Relationship
In the drop-down boxes, we would want to specify the DimProduct and DimProductSubcategory as tables, with the ProductSubcategoryKey as columns from both. However, if we accidentally flip the tables, and confuse the "one-to-many" to "many-to-one" tables, PowerPivot automatically detects the columns in the tables and lets us know it will create the relationship in the opposite direction.

So now you know the inner workings of relationships in data models. Some DAX functions work only when invoked on the correct side of a relationship, so sides do matter. Stay tuned for more on data modeling!


Thanks for reading,

Wayne

No comments:

Post a Comment