TOWARDSDATASCIENCE.COM
How to Write Queries for Tabular Models with DAX
Introduction EVALUATE is the statement to query tabular models. Unfortunately, knowing SQL or any other query language doesn’t help as EVALUATE follows a different concept. EVALUATE has only two “Parameters”: A table to show A sort order (ORDER BY) You can pass a third parameter (START AT), but this one is rarely used. However, a DAX query can have additional components. Those are defined in the DEFINE section of the query.In the DEFINE section, you can define Variables and local Measures.You can use the COLUMN and TABLE keywords in EVALUATE, which I have never used until now. Let’s start with some simple Queries and add some additional logic step by step. However, first, let’s discuss the Tools. Querying tools There are two possibilities for querying a tabular model: Using the DAX query view in Power BI Desktop. Using DAX Studio. Of course, the syntax is the same. I prefer DAX Studio over DAX query view. It offers advanced features not available in Power BI Desktop, such as performance statistics with Server Timing and displaying the model’s metrics. On the other hand, the DAX query view in Power BI Desktop provides the option to apply changes in a Measure back to the model directly after I have modified them in the query. I will discuss this later when I explain more about the possibility of defining local measures. You can read the MS documentation on modifying Measures directly from the DAX query view. You can find a link to the documentation in the References section below. In this article, I will use DAX Studio only. Simple queries The simplest query is to get all columns and all rows from a table: EVALUATE     Customer This query returns the entire Customer table: Figure 1 – Simple query on the Customer table. The number of returned rows can be found in the bottom right corner of DAX Studio, as well as the position of the cursor in the Query (Figure by the Author) If I want to query the result of a single value, for example, a Measure, I must define a table, as EVALUATE requires a table as input. Curly brackets do this. Therefore, the query for a Measure looks like this: EVALUATE<br>     { [Online Customer Count]} The result is one single value: Figure 2 – Querying a Measure with Curly brackets to define a table (Figure by the Author) Get only the first 10 rows It’s not unusual to have tables with thousands or even millions of rows. So, what if I want to see the first 10 rows to glimpse the data inside the table? For this, TOPN() does the trick. TOPN() accepts a sorting order. However, it doesn’t sort the data; it only looks at the values and gets the first or last rows according to the sorting criteria. For example, let’s get the ten customers with the latest birthdate (Descending order): EVALUATE<br>    TOPN(10<br>        ,Customer<br>        ,Customer[BirthDate]<br>        ,DESC) This is the result: Figure 3 – Here, the result of TOPN() is used to get the top 10 rows by birthdate. See, that 11 rows are returned, as there are customers with the same birthdate (Figure by the Author) The DAX.guide article on TOPN() states the following about ties in the resulting data: If there is a tie in OrderBy_Expression values at the N-th row of the table, then all tied rows are returned. Then, when there are ties at the N-th row, the function might return more than n rows. This explains why we get 11 rows from the query. When sorting the output, we will see the tie for the last value, November 26, 1980. To have the result sorted by the Birthdate, you must add an ORDER BY: EVALUATE<br>    TOPN(10<br>        ,Customer<br>        ,Customer[BirthDate]<br>        ,DESC)<br>    ORDER BY Customer[BirthDate] DESC And here, the result: Figure 4 – Result of the same TOPN() query as before, but with an ORDER BY to sort the output of the query by the Birthday descending (Figure by the Author) Now, the ties at the last two rows are clearly visible. Adding columns Usually, I want to select only a subset of all columns in a table. If I query multiple columns, I will only get the distinct values of the existing combination of values in both columns. This differs from other query languages, like SQL, where I must explicitly define that I want to remove duplicates, for example with DISTINCT. DAX has multiple functions to get a subset of columns from a table: ADDCOLUMNS() SELECTCOLUMNS() SUMMARIZE() SUMMARIZECOLUMNS() Of these four, SUMMARIZECOLUMNS() is the most useful for general purposes. When trying these four functions, be cautious when using ADDCOLUMNS(), as this function can result in unexpected results. Read this SQLBI article for more details. OK, how can we use SUMMARIZECOLUMNS() in a query: EVALUATE<br>    SUMMARIZECOLUMNS('Customer'[CustomerType]) This is the result: Figure 5 – Getting the Distinct values of CustomerType with SUMMARIZECOLUMNS() (Figure by the Author) As described above, we get only the distinct values of the CustomerType column. When querying multiple columns, the result is the distinct combinations of the existing data: Figure 6 – Getting multiple columns (Figure by the Author) Now, I can add a Measure to the Query, to get the number of Customers per combination: EVALUATE<br>    SUMMARIZECOLUMNS('Customer'[CustomerType]<br>                        ,Customer[Gender]<br>                        ,"Number of Customers", [Online Customer Count]) As you can see, a label must be added for the Measure. This applies to all calculated columns added to a query. This is the result of the query above: Figure 7 – Result of the query with multiple columns and a Measure (Figure by the Author) You can add as many columns and measures as you need. Adding filters The function CALCULATE() is well-known for adding filters to a Measure. For queries, we can use the CALCULATETABLE() function, which works like CALCULATE(); only the first argument must be a table. Here, the same query as before, only that the Customer-Type is filtered to include only “Persons”: EVALUATE<br>CALCULATETABLE(<br>    SUMMARIZECOLUMNS('Customer'[CustomerType]<br>                        ,Customer[Gender]<br>                        ,"Number of Customers", [Online Customer Count])<br>                ,'Customer'[CustomerType] = "Person"<br>                ) Here, the result: Figure 8 – Query and result to filter the Customer-Type to “Person” (Figure by the Author) It is possible to add filters directly to SUMMARIZECOLUMNS(). The queries generated by Power BI use this approach. But it is much more complicated than using CALCULATETABLE(). You can find examples for this approach on the DAX.guide page for SUMMARIZECOLUMNS(). Power BI uses this approach when building queries from the visualisations. You can get the queries from the Performance Analyzer in Power BI Desktop. You can read my piece about collecting performance data to learn how to use Performance Analyzer to get a query from a Visual. You can also read the Microsoft documentation linked below, which explains this. Defining Local Measures From my point of view, this is one of the most powerful features of DAX queries: Adding Measures local to the query. The DEFINE statement exists for this purpose. For example, we have the Online Customer Count Measure. Now, I want to add a filter to count only customers of the type “Person”. I can modify the code in the data model or test the logic in a DAX query. The first step is to get the current code from the data model in the existing query. For this, I must place the cursor on the first line of the query. Ideally, I will add an empty line to the query. Now, I can use DAX Studio to extract the code of the Measure and add it to the Query by right-clicking on the Measure and clicking on “Define Measure”: Figure 9 – Use the “Define Measure” feature of DAX Studio to extract the DAX code for a Measure (Figure by the Author) The same feature is also available in Power BI Desktop. Next, I can change the DAX code of the Measure by adding the Filter: DEFINE <br>---- MODEL MEASURES BEGIN ----<br>MEASURE 'All Measures'[Online Customer Count] =<br>    CALCULATE(DISTINCTCOUNT('Online Sales'[CustomerKey])<br>                ,'Customer'[CustomerType] = "Person"<br>                )<br>---- MODEL MEASURES END ---- When executing the query, the local definition of the Measure is used, instead of the DAX code stored in the data model: Figure 10 – Query and results with the modified DAX code for the Measure (Figure by the Author) Once the DAX code works as expected, you can take it and modify the Measure in Power BI Desktop. The DAX query view in Power BI Desktop is advantageous because you can directly right-click the modified code and add it back to the data model. Refer to the link in the References section below for instructions on how to do this. DAX Studio doesn’t support this feature. Putting the pieces together OK, now let’s put the pieces together and write the following query: I want to get the top 5 products ordered by customers. I take the query from above, change the query to list the Product names, and add a TOPN(): DEFINE  ---- MODEL MEASURES BEGIN ---- MEASURE 'All Measures'[Online Customer Count] =     CALCULATE(DISTINCTCOUNT('Online Sales'[CustomerKey])                 ,'Customer'[CustomerType] = "Person"                 ) ---- MODEL MEASURES END ---- EVALUATE     TOPN(5         ,SUMMARIZECOLUMNS('Product'[ProductName]                         ,"Number of Customers", [Online Customer Count]                         )         ,[Number of Customers]         ,DESC)     ORDER BY [Number of Customers] Notice that I pass the measure’s label, “Number of Customers”, instead of its name. I must do it this way, as DAX replaces the measure’s name with the label. Therefore, DAX has no information about the Measure and only knows the label. This is the result of the query: Figure 11 – The query result using TOPN() combined with a Measure. Notice that the label is used instead of the Measures name (Figure by the Author) Conclusion I often use queries in DAX Studio, as it is much easier for Data Validation. DAX Studio allows me to directly copy the result into the Clipboard or write it in an Excel file without explicitly exporting the data. This is extremely useful when creating a result set and sending it to my client for validation. Moreover, I can modify a Measure without changing it in Power Bi Desktop and quickly validate the result in a table. I can use a Measure from the data model, temporarily create a modified version, and validate the results side-by-side. DAX queries have endless use cases and should be part of every Power BI developer’s toolkit. I hope that I was able to show you something new and explain why knowing how to write DAX queries is important for a Data model developer’s daily life. References Microsoft’s documentation about applying changes from the DAX Query view on the model: Update model with changes – DAX query view – Power BI | Microsoft Learn Like in my previous articles, I use the Contoso sample dataset. You can download the ContosoRetailDW Dataset for free from Microsoft here. The Contoso Data can be freely used under the MIT License, as described in this document. I changed the dataset to shift the data to contemporary dates. The post How to Write Queries for Tabular Models with DAX appeared first on Towards Data Science.
0 Kommentare 0 Anteile 40 Ansichten