Pages

Thursday, November 14, 2013

P4.3: Data Modeling in PowerPivot: SQL Query Denormalization

Hello Readers,

This post is a continuation from the last post of understanding normalization in data modeling using PowerPivot. As usual, we will be using the AdventureWorks companion material from the Ferrari book. Let us get started.



SQL Querying in PowerPivot


In the previous post I demonstrated how to denormalize a table in PowerPivot by creating calculated columns using the RELATED command. The RELATED command is an easy tool to understand denormalization. However, with SQL queries we can normalize with ease in a large database. Here we will query in SQL the necessary columns and tables to denormalize a data model. The denormalization process will be shifted outside of PowerPivot versus the previous method of denormalizing within PowerPivot using the RELATED command.


A SQL query is a statement written in SQL language used in many modern relational databases. PowerPivot has a built-in SQL query designer, of which we will discuss further.


Begin by adding AdventureWorks data from SQL Server. In the Home tab of PowerPivot, click from Database and From SQL Server option.



Fig. 1: Add External Data from SQL Server in PowerPivot
The Table Import Wizard will appear. Type in the Server name and select the Database name from the drop-down menu. In this case, the AdventureWorks tables are located in the POWER Database.


Fig. 2: Connecting to SQL Server Database
After a connection is established, select Write a query because we will demonstrate how to use SQL queries to denormalize a data model.


Fig. 3: Use Query Method
Next we are given the option to enter the SQL query. Knowing SQL, we can type in:

SELECT
  DimProduct.ProductKey
  ,DimProductCategory.EnglishProductCategoryName
  ,DimProductSubcategory.EnglishProductSubcategoryName
FROM
  DimProductSubcategory
  INNER JOIN DimProductCategory
  ON DimProductSubcategory.ProductCategoryKey = DimProductCategory.ProductCategoryKey
  INNER JOIN DimProduct
  ON DimProductSubcategory.ProductSubcategoryKey = DimProduct.ProductSubcategoryKey

But for now, click the Design button as show below in Figure 4 to use the SQL query designer.


Fig. 4: Select PowerPivot Query Designer
With the SQL query Designer open, select the ProductKey from DimProductEnglishProductCategoryName from DimProductCategory, and EnglishProductSubcategoryName from DimProductSubcategory.


Fig. 5:  SQL Query Designer
In the Relationships section, the DimProductSubcategory has the Primary Keys to the categories and subcategories to the DimProduct and DimProductCategory tables. This illustrates the one-to-many relationship, indicated by the left and right tables. The Join Fields are shown in Figure 5 as well. Now click OK, and we get SQL query statement designed by the point and click SQL query designer (Figure 6.)


Fig. 6: SQL Query Statement from SQL Query Designer
Hitting Finish, the Table Import Wizard implements the query statement and imports the specified table into the PowerPivot database.


Fig. 7: Query Table
Now we have denormalized a data model by using SQL queries to retrieve columns from different tables into one table. Note that 397 rows were imported. This is less than the 606 rows in the DimProduct. Where did the other 209 rows go? The 397 rows from the query is a result of the type of relationship between the tables, in this case is an INNER JOIN, which only returns rows with matching values in both tables. The next post will discuss the different types of JOINS.


Thanks for reading,



Wayne

No comments:

Post a Comment