Saturday, August 17, 2013

P2.3: Computing Complex Aggregations in PowerPivot: Distinct Count & Refreshing the Data Model

This post will describe how to calculate a complex aggregation such as a distinct count of products in PowerPivot. We will be using the data model created in the previous post, from the companion material of the Ferrari PowerPivot book.

Computing Distinct Count

Calculated fields have an useful feature that enables us to compute complex aggregations. In the last post we calculated the gross margin and also the gross percentage of sales using calculated column and field. Today we would like to determine the number of products in each Color.

Let us begin by creating a simple PivotTable showing the SalesAmount by Color of products.

Fig. 1: Sum of SalesAmount by Color of Product
We want to count the number of products, not the number sold under each Color; whether a particular product sold 8,000 units or only 1 unit, the count would only be 1 for that same product (hence distinct count.)

If we add the ProductKey under the DimProduct table to the PivotTable, we can see that the result is not what we were looking for.

Fig. 2: PivotTable with Sum of ProductKey
From the numbers and the column header, the column is displaying the sum of the product keys. It is logical that Excel took the sum because the values are integers. However we want the sum of the number of unique product keys. We can change the method Excel aggregates the data, by changing the settings of ProductKey in the Values area. Click the drop-down arrow and select the Value Field Settings button.

Fig. 3: Value Field Settings Button
In the Value Field Settings window, we can select an attractive option, which also happens to be called Distinct Count. Note that the Custom Name of the column changes to "Distinct Count of ProductKey".

Fig. 4: Value Field Settings- Distinct Count Option
Now let us observe the new PivotTable showing the proper distinct count. The two red arrows point to rows with counted products but no sales. If we want to count products responsible for sales, we should only count those products that were actually sold. 

Fig. 5: PivotTable with Corrected Distinct Count
The correct field to display in the PivotTable is the ProductKey under the FactInternetSales table, because the table has only product keys of products that were sold. This highlights how easily we could choose the wrong field to aggregate, especially for the computation of distinct count. The PivotTable with correct fields is shown below.

Fig. 6: PivotTable with Proper Distinct Count

Note on Refreshing the Data Model:

We have seen in P2.1 that we can add data to the data model in PowerPivot without going through Excel. Data models are dynamic, and thus, they change constantly as columns or fields are calculated. Refresh the data model by clicking the Refresh button in the Home tab of PowerPivot. It will give the option of refreshing the current table or all of the tables in PowerPivot, as highlighted in Figure 7.

Fig. 7: Refresh Button
When PowerPivot refreshes tables, it will gather the data again from the databases and update the tables and the calculated columns accordingly. Calculated fields are not stored in the data model so they are recalculated as required. Because we work with data sets not numbering in the millions of rows, the refresh process is quick. However, it can take some amount of time when many millions of rows are read again from the database. This is code we will explore later on, which will update the tables upon opening, or clicking a certain button.

Thanks for reading,

No comments:

Post a Comment