14 May 2024

🗒️Graphical Representation: Sparklines [Notes]

Example Sparklines within Groups

Sparkline

  • {definition} "small, intense, simple, word-sized graphic with typographic resolution" [1]
  • {timeline}
    • [1993] initially considered as 'intense continuous time-series' by Tufte [2]
    • [2006] the term is introduced by Tufte [1]
    • [2009] introduced in Microsoft Excel 2010
    • ⇐ see also [3] for a broader timeline
  • {characteristic} small
    • considered as tiny charts 
  • {characteristic} word-sized graphic (aka word-like)
    • comparable to words and letters
    • their distributions on a page are like sentences and paragraphs [1]
  • {characteristic} inline
    • can be everywhere a word or number can be
    • e.g. embedded in a sentence, table, headline, map, spreadsheet, graphic [1]
  • {characteristic} minimalistic
    • no grid lines 
    • other visual elements are used occasionally, though kept to a minimum 
  • {characteristic} approximate
    • it isn't meant to give precise values, though is precise enough for its scope
  • {characteristic} compact
    • vastly increase the amount of data within readers' eye-span
    • aggregates pattern along with plenty of local detail [1]
    • allows for speed and convenience [1]
  • {characteristic} provides context
    • enables us to put numbers in context 
  • {characteristic} typographic resolution
    • "work at intense resolutions, at the level of good typography and cartography" [1]
  • forms
    • line graph
    • bar chart 
    • win/loss
  • scope
    • show trends in a series of values
    • highlight maximum and minimum values
    • show recent change in relation to past changes
    • make comparisons across the lines and/or within groups
  • supports
    • time series
      • e.g.  seasonal increases or decreases, economic cycles
    • binary data
      • e.g. presence/absence, occurrence/non-occurrence, win/loss [1]
    • multivariate data
      • can simultaneously accommodate several variables
  • {recommendation} position a sparkline near its data for greatest impact

References:
[1] Edward R Tufte (2006) "Beautiful Evidence"
[2] Edward R Tufte (1983) "The Visual Display of Quantitative Information"
[3] Wikipedia (2023) Sparklines (link)

07 May 2024

🏭🗒️Microsoft Fabric: The Metrics Layer [Notes] [new feature]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 07-May-2024

The Metrics Layer in Microsoft Fabric (adapted diagram)
The Metrics Layer in Microsoft Fabric (adapted diagram)

[new feature] Metrics Layer (Metrics Store)

  • {definition}an abstraction layer available between the data store(s) and end users which allows organizations to create standardized business metrics, that are rooted in measures and are discoverable and intended for reuse
    • ⇐ {important} feature still in private preview 
  • {goal} extend existing infrastructure 
    • {benefit} leverages and extends existing features
  • {goal} provide consistent definitions and descriptions [1]
    • consistent definitions that include besides business logic additional dimensions and filters [1]
    • ⇒ {benefit} allows to standardize the metrics across the organization
    • ⇒ {benefit} enforce to enforce a SSoT
  • {goal} easy management 
    • via management views 
    • [feature] lineage 
    • [feature] source control
    • [feature] duplicate identification
    • [feature] push updates to downstream uses of the metrics 
  • {goal}searchable and discoverable metrics 
    • {feature} integration
      • based on Sempy fabric package
        • ⇐ a dataframe for storage and propagation of Power BI metadata which is part of the python-based semantic Link in Fabric
  • {goal}trust
    • [feature] trust indicators
    • {benefit} facilitates report's adoption
  • {feature} metric set 
    • {definition} a Fabric item that groups together a set of metrics into a mini-model
    • {benefit} allows to reduce the overall complexity of semantic models, while being easy to evolve and consume
    • associated with a single domain
      • ⇒ supports the data mesh architecture
    • shareable 
      • can be shared with other users
    • {action} create metric set
      • creates the actual artifact, to which metrics can be added 
  • {feature} metric
    •  {definition} a way to elevate the measures from the various semantic models existing in the organization
    • tied to the original semantic model
      • ⇒ {benefit} allows to see how a metric is used across the solutions 
    • reusable
      • can be reused in other fabric artifacts
        • new reports on the Power BI service
        • notebooks 
          • by copying the code
      • can be reused in Power BI
        • via OneLake data hub menu element
      • can be chained 
        • changes are propagated downstream 
    • materializable 
      • its output can be persisted to OneLake by saving it a delta table into a lakehouse
      • {misuse} data is persisted unnecessarily
    • {action} elevate metric
      • copies measure's definition and description
      • ⇒  implies restructuring, refactoring, moving, and testing a lot of code in the process
      • {misuse} data professionals build everything as metrics
    • {action} update metric
    • {action} add filters to metric 
    • {action} add dimensions to metric
    • {action} materialize metric 

Acronyms:
SSoT - single source of truth ()

References:
[1] Power BI Tips (2024) Explicit Measures Ep. 236: Metrics Hub, Hot New Feature with Carly Newsome (link)
[2] Power BI Tips (2024) Introducing Fabric Metrics Layer / Power Metrics Hub [with Carly Newsome] (link)

06 May 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part III: The Metrics Layer [new feature])

Introduction

One of the announcements of this year's Microsoft Fabric Community first conference was the introduction of a metrics layer in Fabric which "allows organizations to create standardized business metrics, that are rooted in measures and are discoverable and intended for reuse" [1]. As it seems, the information content provided at the conference was kept to a minimum given that the feature is still in private preview, though several webcasts start to catch up on the topic (see [2], [4]). Moreover, as part of their show, the Explicit Measures (@PowerBITips) hosts had Carly Newsome as invitee, the manager of the project, who unveiled more details about the project and the feature, details which became the main source for the information below. 

The idea of a metric layer or metric store is not new, data professionals occasionally refer to their structure(s) of metrics as such. The terms gained weight in their modern conception relatively recently in 2021-2022 (see [5], [6], [7], [8], [10]). Within the modern data stack, a metrics layer or metric store is an abstraction layer available between the data store(s) and end users. It allows to centrally define, store, and manage business metrics. Thus, it allows us to standardize and enforce a single source of truth (SSoT), respectively solve several issues existing in the data stacks. As Benn Stancil earlier remarked, the metrics layer is one of the missing pieces from the modern data stack (see [10]).

Microsoft's Solution

Microsoft's business case for metrics layer's implementation is based on three main ideas (1) duplicate measures contribute to poor data quality, (2) complex data models hinder self-service, (3) reduce data silos in Power BI. In Microsoft's conception the metric layer provides several benefits: consistent definitions and descriptions, easy management via management views, searchable and discoverable metrics, respectively assure trust through indicators. 

For this feature's implementation Microsoft introduces a new Fabric Item called a metric set that allows to group several (business) metrics together as part of a mini-model that can be tailored to the needs of a subset of end-users and accessed by them via the standard tools already available. The metric set becomes thus a mini-model. Such mini-models allow to break down and reduce the overall complexity of semantic models, while being easy to evolve and consume. The challenge will become then on how to break down existing and future semantic models into nonoverlapping mini-models, creating in extremis a partition (see the Lego metaphor for data products). The idea of mini-models is not new, [12] advocating the idea of using a Master Model, a technique for creating derivative tabular models based on a single tabular solution.

A (business) metric is a way to elevate the measures from the various semantic models existing in the organization within the mini-model defined by the metric set. A metric can be reused in other fabric artifacts - currently in new reports on the Power BI service, respectively in notebooks by copying the code. Reusing metrics in other measures can mean that one can chain metrics and the changes made will be further propagated downstream. 

The Metrics Layer in Microsoft Fabric (adapted diagram)
The Metrics Layer in Microsoft Fabric (adapted diagram)

Every metric is tied to the original semantic model which allows thus to track how a metric is used across the solutions and, looking forward to Purview, to identify data's lineage. A measure is related to a "table", the source from which the measure came from.

Users' Perspective

The Metrics Layer feature is available in Microsoft Fabric service for Power BI within the Metrics menu element next to Scorecards. One starts by creating a metric set in an existing workspace, an operation which creates the actual artifact, to which the individual metrics are added. To create a metric, a user with build permissions can navigate through the semantic models across different workspaces he/she has access to, pick a measure from one of them and elevate it to a metric, copying in the process its measure's definition and description. In this way the metric will always point back to the measure from the semantic model, while the metrics thus created are considered as a related collection and can be shared around accordingly. 

Once a metric is added to the metric set, one can add in edit mode dimensions to it (e.g. Date, Category, Product Id, etc.). One can then further explore a metric's output and add filters (e.g. concentrate on only one product or category) point from which one can slice-and-dice the data as needed.

There is a panel where one can see where the metric has been used (e.g. in reports, scorecards, and other integrations), when was last time refreshed, respectively how many times was used. Thus, one has the most important information in one place, which is great for developers as well as for the users. Probably, other metadata will be added, such as whether an increase in the metric would be favorable or unfavorable (like in Tableau Pulse, see [13]) or maybe levels of criticality, an unit of measure, or maybe its type - simple metric, performance indicator (PI), result indicator (RI), KPI, KRI etc.

Metrics can be persisted to the OneLake by saving their output to a delta table into the lakehouse. As demonstrated in the presentation(s), with just a copy-paste and a small piece of code one can materialize the data into a lakehouse delta table, from where the data can be reused as needed. Hopefully, the process will be further automated. 

One can consume metrics and metrics sets also in Power BI Desktop, where a new menu element called Metric sets was added under the OneLake data hub, which can be used to connect to a metric set from a Semantic model and select the metrics needed for the project. 

Tapping into the available Power BI solutions is done via an integration feature based on Sempy fabric package, a dataframe for storage and propagation of Power BI metadata which is part of the python-based semantic Link in Fabric [11].

Further Thoughts

When dealing with a new feature, a natural idea comes to mind: what challenges does the feature involve, respectively how can it be misused? Given that the metrics layer can be built within a workspace and that it can tap into the existing measures, this means that one can built on the existing infrastructure. However, this can imply restructuring, refactoring, moving, and testing a lot of code in the process, hopefully with minimal implications for the solutions already available. Whether the process is as simple as imagined is another story. As misusage, in extremis, data professionals might start building everything as metrics, though the danger might come when the data is persisted unnecessarily. 

From a data mesh's perspective, a metric set is associated with a domain, though there will be metrics and data common to multiple domains. Moreover, a mini-model has the potential of becoming a data product. Distributing the logic across multiple workspaces and domains can add further challenges, especially in what concerns the synchronization and implemented of requirements in a way that doesn't lead to bottlenecks. But this is a general challenge for the development team(s). 

The feature will probably suffer further changes until is released in public review (probably by September or the end of the year). I subscribe to other data professionals' opinion that the feature was for long needed and that can have an important impact on the solutions built. 

Previous Post <<||>> Next Post

Resources:
[1] Microsoft Fabric Blog (2024) Announcements from the Microsoft Fabric Community Conference (link)
[2] Power BI Tips (2024) Explicit Measures Ep. 236: Metrics Hub, Hot New Feature with Carly Newsome (link)
[3] Power BI Tips (2024) Introducing Fabric Metrics Layer / Power Metrics Hub [with Carly Newsome] (link)
[4] KratosBI (2024) Fabric Fridays: Metrics Layer Conspiracy Theories #40 (link)
[5] Chris Webb's BI Blog (2022) Is Power BI A Semantic Layer? (link)
[6] The Data Stack Show (2022) TDSS 95: How the Metrics Layer Bridges the Gap Between Data & Business with Nick Handel of Transform (link)
[7] Sundeep Teki (2022) The Metric Layer & how it fits into the Modern Data Stack (link)
[8] Nick Handel (2021) A brief history of the metrics store (link)
[9] Aurimas (2022) The Jungle of Metrics Layers and its Invisible Elephant (link)
[10] Benn Stancil (2021) The missing piece of the modern data stack (link)
[11] Microsoft Learn (2024) Sempy fabric Package (link)
[12] Michael Kovalsky (2019) Master Model: Creating Derivative Tabular Models (link)
[13] Christina Obry (2023) The Power of a Metrics Layer - and How Your Organization Can Benefit From It (link
[14] KratosBI (2024) Introducing the Metrics Layer in #MicrosoftFabric with Carly Newsome [link]

29 April 2024

⚡️Power BI: Working with Visual Calculations (Part III: Matrix Tables with Square Numbers as Example)

Introduction

In the previous post I exemplified various operations that can be performed with visual calculations on simple tables based on square numbers. Changing the simple table to a matrix table doesn't bring any benefit. The real benefit comes when one restructures the table to store only a cell per row in a table. 

Data Modelling

For this the Magic5 table can be transformed via the following code, which creates a second table (e.g. M5):

M5 = UNION (
    SUMMARIZECOLUMNS(
     Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C1]
     , "Col", "C1"
    )
    , SUMMARIZECOLUMNS(
     Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C2]
     , "Col", "C2"
    )
    , SUMMARIZECOLUMNS(
     Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C3]
     , "Col", "C3"
    )
    ,  SUMMARIZECOLUMNS(
     Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C4]
     , "Col", "C4"
    )
    , SUMMARIZECOLUMNS(
      Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C5]
     , "Col", "C5"
    )
)

Once this done, one can add the column [Col] as values for the matrix in a new visual. From now on, all the calculations can be done on copies of this visual. 

Simple Operations

The behavior of the RUNNINGSUM and other functions is different when applied on a matrix table because the formula is applied to every cell of the N*N table, a column with the result being added for each existing column of the matrix.

Moreover, there are four different ways of applying the formula based on the Axis used. ROW calculates the formula by the row within a column:

Run SumByRow(C) = RUNNINGSUM([C], ROWS)
Output:
R C Run Sum(C) C Run Sum(C) C Run Sum(C) C Run Sum(C) C Run Sum(C)
R1 18 18 25 25 2 2 9 9 11 11
R2 4 22 6 31 13 15 20 29 22 33
R3 15 37 17 48 24 39 1 30 8 41
R4 21 58 3 51 10 49 12 42 19 60
R5 7 65 14 65 16 65 23 65 5 65

By providing COLUMNS as parameter for the Axis makes the calculation run by the column within a row: 

Run SumByCol(C) = RUNNINGSUM([C], COLUMNS)
Output:
R C Run Sum(C) C Run Sum(C) C Run Sum(C) C Run Sum(C) C Run Sum(C)
R1 18 18 25 43 2 45 9 54 11 65
R2 4 4 6 10 13 23 20 43 22 65
R3 15 15 17 32 24 56 1 57 8 65
R4 21 21 3 24 10 34 12 46 19 65
R5 7 7 14 21 16 37 23 60 5 65

By providing ROW COLUMNS as parameter for the Axis makes the calculation run by the column and then continuing the next column (without resetting the value at the end of the column):
Run SumByRow-Col(C) = RUNNINGSUM([C],ROWS COLUMNS)
Output:
R C Run Sum(C) C Run Sum(C) C Run Sum(C) C Run Sum(C) C Run Sum(C)
R1 18 18 25 90 2 132 9 204 11 271
R2 4 22 6 96 13 145 20 224 22 293
R3 15 37 17 113 24 169 1 225 8 301
R4 21 58 3 116 10 179 12 237 19 320
R5 7 65 14 130 16 195 23 260 5 325

By providing COLUMNS ROWS as parameter for the Axis makes the calculation run by the row and then continuing the next row (without resetting the value at the end of the column):
Run SumByCol-Row = RUNNINGSUM([C],COLUMNS ROWS)
Output:
R C Run Sum(C) C Run Sum(C) C Run Sum(C) C Run Sum(C) C Run Sum(C)
R1 18 18 25 43 2 45 9 54 11 65
R2 4 69 6 75 13 88 20 108 22 130
R3 15 145 17 162 24 186 1 187 8 195
R4 21 216 3 219 10 229 12 241 19 260
R5 7 267 14 281 16 297 23 320 5 325

Ranking

RANK can be applied independent of the values, or considering the value with ASC or DESC sorting:
RankByRow = RANK(DENSE,ROWS) -- ranking by row independent of values
RankByRow ASC = RANK(DENSE,ROWS, ORDERBY([C],ASC)) -- ranking by row ascending
RankByRow DESC = RANK(DENSE,ROWS, ORDERBY([C], DESC)) -- ranking by row descending
RankByRow-Col ASC = RANK(DENSE,ROWS COLUMNS, ORDERBY([C],ASC)) -- ranking by row columns ascending
RankByRow-Col DESC = RANK(DENSE,ROWS COLUMNS, ORDERBY([C], DESC)) -- ranking by row columns ascending

[RankByRow-Col ASC] matches the actual numbers from the matrix and is thus useful when sorting any numbers accordingly. 

Differences

Differences can be calculated between any of the cells of the matrix:
DiffToPrevByRow = [C] - PREVIOUS([C])  -- difference to previous record
DiffToPrevByRow* = IF(NOT(IsBlank(PREVIOUS([C]))), [C] - PREVIOUS([C])) -- extended difference to previous record
DiffToPrevByRow-Col = [C] - PREVIOUS([C],, ROWS COLUMNS) -- difference to previous record by ROWS COLUMNS
DiffToFirstByRow = [C] - FIRST([C]) -- difference to first record
DiffToPrevByCol = [C] - FIRST([C], COLUMNS) -- difference to previous record COLUMNS

Ranking = RANK(DENSE, ROWS COLUMNS, ORDERBY([C], ASC)) -- ranking of values by ROWS COLUMNS
OffsetDiffToPrevByRow = [C] - calculate([C], OFFSET(1, ROWS, ORDERBY([Ranking],DESC))) -- difference to the previous record by ROW
OffsetDiffToPrevByRow-Col = [C] - calculate([C], OFFSET(1, ROWS COLUMNS, ORDERBY([Ranking],DESC))) -- difference to the previous record by ROW

Ranking has been introduced to facilitate the calculations based on OFFSET.

The other functions [1] can be applied similarly.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Power BI: Using visual calculations [preview] (link)

⚡️Power BI: Working with Visual Calculations (Part II: Simple Tables with Square Numbers as Example)

Introduction

The records behind a visual can be mentally represented as a matrix, the visual calculations allowing to tap into this structure intuitively and simplify many of the visualizations used. After a general test drive of the functionality, it makes sense to dive deeper into the topic to understand more about the limitations, functions behavior and what it takes to fill the gaps. This post focuses on simple tables, following in a next post to focus on matrices and a few other topics. 

For exemplification, it makes sense to use a simple set of small numbers that are easy to work with, and magic squares seem to match this profile. A magic square is a matrix of positive sequential numbers in which each row, each column, and both main diagonals are the same [1]. Thus, a square of order N has N*N numbers from 1 to N*N, the non-trivial case being order 3. However, from the case of non-trivial squares, the one of order 5 provides a low order and allows hopefully the minimum needed for exemplification:

18252911
46132022
15172418
213101219
71416235
17131925

Data Modeling

One magic square should be enough to exemplify the various operations, though for testing purposes it makes sense to have a few more squares readily available. Each square has an [Id], [C1] to [C5] corresponds to matrix's columns, while [R] stores a row identifier which allows to sort the values the way they are stored in the matrix:

let
    Source = #table({"Id","C1","C2","C3","C4","C5","R"}
, {
{1,18,25,2,9,11,"R1"},
{1,4,6,13,20,22,"R2"},
{1,15,17,24,1,8,"R3"},
{1,21,3,10,12,19,"R4"},
{1,7,14,16,23,5,"R5"},
{2,1,7,13,19,25,"R1"},
{2,14,20,21,2,5,"R2"},
{2,22,3,9,15,16,"R3"},
{2,10,11,17,23,4,"R4"},
{2,18,24,5,6,12,"R5"},
{3,1,2,22,25,15,"R1"},
{3,9,10,16,11,19,"R2"},
{3,17,23,13,5,7,"R3"},
{3,24,12,6,20,3,"R4"},
{3,14,18,8,4,21,"R5"},
{4,22,6,3,18,16,"R1"},
{4,4,14,11,15,21,"R2"},
{4,5,8,12,23,17,"R3"},
{4,25,13,19,7,1,"R4"},
{4,9,24,20,2,10,"R5"},
{5,5,9,20,25,6,"R1"},
{5,13,15,2,11,24,"R2"},
{5,21,1,23,3,17,"R3"},
{5,19,18,4,14,10,"R4"},
{5,7,22,16,12,8,"R5"}
}
),
    #"Changed Type to Number" = Table.TransformColumnTypes(Source,{{"C1", Int64.Type}, {"C2", Int64.Type}, {"C3", Int64.Type}, {"C4", Int64.Type}, {"C5", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type to Number",{{"Id", Order.Ascending}, {"R", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1, Int64.Type)
in
    #"Added Index"

The column names and the row identifier could have been numeric values from 1 to 5, though it could have been confounded with the actual numeric values.

In addition, the columns [C1] to [C5] were formatted as integers and an index was added after sorting the values after [Id] and [R]. Copy the above code as a Blank Query in Power BI and change the name to Magic5. 

Prerequisites

For the further steps you'll need to enable visual calculations in Power BI Developer via:
File >> Options and settings >> Options >> Preview features >> Visual calculations >> (check)

Into a Table visual drag and drop [R], [C1] to [C5] as column and make sure that the records are sorted ascending by [R]. To select only a square, add a filter based on the [Id] and select the first square. Use further copies of this visual for further tests. 

Some basic notions of Algebra are recommended but not a must. If you worked with formulas in Excel, then you are set to go. 

In Mathematics a matrix starts from the top left side and one moves on the rows (e.g. 18, 25, 2, ...) and then on the columns. With a few exceptions in which the reference is based on the latest value from a series (see Exchange rates), this is the direction that will be followed. 

Basic Operations

Same as in Excel [C1] + [C2] creates a third column in the matrix that stores the sum of the two. The sum can be further applies to all the columns:

Sum(C) = [C1] + [C2] + [C3] + [C4] + [C5] -- sum of all columns (should amount to 65)

The column can be called "Sum", "Sum(C)" or any other allowed unique name, though the names should be meaningful, useful, and succinct, when possible.

Similarly, one can work with constants, linear or nonlinear transformations (each formula is a distinct calculation):

constant = 1 -- constant value
linear = 2*[C1] + 1 -- linear translation: 2*x+1
linear2 = 2*[C1] + [constant] -- linear translation: 2*x+1
quadratic = Power([C1],2) + 2*[C1] + 1 -- quadratic translation: x^2+2*x+1 quadratic2 = Power([C1],2) + [linear] -- quadratic translation: x^2+2*x+1
Output:
R C1 constant linear linear2 quadratic quadratic2
R1 18 1 37 37 361 361
R2 4 1 9 9 25 25
R3 15 1 31 31 256 256
R4 21 1 43 43 484 484
R5 7 1 15 15 64 64
Please note that the output was duplicated in Excel (instead of making screenshots).

Similarly, can be build any type of formulas based on one or more columns.

With a simple trick, one can use DAX functions like SUMX, PRODUCTX, MINX or MAXX as well:

Sum2(C) = SUMX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- sum of all columns
Prod(C) = PRODUCTX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- product of all columns
Avg(C) = AVERAGEX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- average of all columns
Min(C) = MINX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- minimum value of all columns
Max(C) = MAXX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- maximum value of all columns
Count(C) = COUNTX({[C1], [C2], [C3], [C4], [C5]},[Value]) -- counts the number of columns
Output:
C1 C2 C3 C4 C5 Sum(C) Avg(C) Prod(C) Min(C) Max(C) Count(C)
18 25 2 9 11 65 13 89100 2 25 5
4 6 13 20 22 65 13 137280 4 22 5
15 17 24 1 8 65 13 48960 1 24 5
21 3 10 12 19 65 13 143640 3 21 5
7 14 16 23 5 65 13 180320 5 23 5

Unfortunately, currently there seems to be no way available for applying such calculations without referencing the individual columns. 

Working across Rows

ROWNUMBER and RANK allow to rank a cell within a column independently, respectively dependently of its value:

Ranking = ROWNUMBER() -- returns the rank in the column (independently of the value)
RankA(C) = RANK(DENSE, ORDERBY([C1], ASC)) -- ranking of the value (ascending) 
RankD(C) = RANK(DENSE, ORDERBY([C1], DESC)) -- ranking of the value (descending) 
Output:
R C1 Ranking RankA(C) RankD(C)
R1 18 1 4 2
R2 4 2 1 5
R3 15 3 3 3
R4 21 4 5 1
R5 7 5 2 4

PREVIOUS, NEXT, LAST and FIRST allow to refer to the values of other cells within the same column:

Prev(C) = PREVIOUS([C1]) -- previous cell
Next(C) = NEXT([C1])  -- next cell
First(C) = FIRST([C1]) -- first cell
Last(C) = LAST([C1]) -- last cell
Output:
R C1 Prev(C) NextC) First(C) Last(C)
R1 18 4 18 7
R2 4 18 15 18 7
R3 15 4 21 18 7
R4 21 15 7 18 7
R5 7 21 18 7

OFFSET is a generalization of these functions

offset(2) = calculate([C1], offset(2)) -- 
offset(-2) = calculate([C1], offset(-2))
Ind = ROWNUMBER() -- index
inverse = calculate([C1], offset(6-2*[Ind])) -- inversing the values based on index
Output:
R C1 offset(2) offset(-2) ind inverse
R1 18 15 1 7
R2 4 21 2 21
R3 15 7 18 3 15
R4 21 4 4 4
R5 7 15 5 18

The same functions allow to calculate the differences for consecutive values:

DiffToPrev(C) = [C1] - PREVIOUS([C1]) -- difference to previous 
DiffToNext(C) = [C1] - PREVIOUS([C1]) -- difference to next 
DiffTtoFirst(C) = [C1] - FIRST([C1]) -- difference to first
DiffToLast(C) = [C1] - LAST([C1]) -- difference to last
Output:
R C1 DiffToPrev(C) DiffToNextC) DiffToFirst(C) DiffToLast(C)
R1 18 18 14 0 11
R2 4 -14 -11 -14 -3
R3 15 11 -6 -3 8
R4 21 6 14 3 14
R5 7 -14 7 -11 0

DAX makes available several functions for working across the rows of the same column. Two of the useful functions are RUNNINGSUM and MOVINGAVERAGE:

Run Sum(C) = RUNNINGSUM([C1]) -- running sum
Moving Avg3(C) = MOVINGAVERAGE([C1], 3) -- moving average for the past 3 values
Moving Avg2(C) = MOVINGAVERAGE([C1], 2) -- moving average for the past 2 values

Unfortunately, one can use only the default sorting of the table with the functions that don't support the ORDERBY parameter. Therefore, when the table needs to be sorted descending and the RUNNINGSUM calculated ascending, for the moment there's no solution to achieve this behavior. However, it appears that Microsoft is planning to implement a solution for this issue.

RUNNINGSUM together with ROWNUMBER can be used to calculate a running average:

Run Avg(C) = DIVIDE(RUNNINGSUM([C1]), ROWNUMBER()) -- running average
Output:
R C1 Run Sum(C) Moving Avg3(C) Moving Avg2(C) Run Avg(C)
R1 18 18 18 18 18
R2 4 22 11 11 11
R3 15 37 12.33 9.5 12.33
R4 21 58 13.33 18 14.5
R5 7 65 14.33 14 13

With a mathematical trick that allows to transform a product into a sum of elements by applying the Exp (exponential) and Log (logarithm) functions (see the solution in SQL), one can run the PRODUCT across rows, though the values must be small enough to allow their multiplication without running into issues:

Ln(C) = IFERROR(LN([C1]), Blank()) -- applying the natural logarithm
Sum(Ln(C)) = RUNNINGSUM([Ln(C)]) -- running sum
Run Prod(C) = IF(NOT(ISBLANK([Sum(Ln(C))])), Exp([Sum(Ln(C))])) -- product across rows
Output:
R C1 Ln(C) Sum(Ln(C)) Run Prod(C)
R1 18 2.89 2.89 18
R2 4 1.39 4.28 72
R3 15 2.71 6.98 1080
R4 21 3.04 10.03 22680
R5 7 1.95 11.98 158760

These three calculations could be brought into a single formula, though the result could be more difficult to troubleshoot. The test via IsBlank is necessary because otherwise the exponential for the total raises an error. 

Considering that when traversing a column it's enough to remember the previous value, one can build MIN and MAX functionality across a column: 

Run Min = IF(OR(Previous([C1]) > [C1], IsBlank(Previous([C1]))), [C1], Previous([C1])) -- minimum value across rows
Run Max = IF(OR(Previous([C1]) < [C1], IsBlank(Previous([C1]))), [C1], Previous([C1])) -- maximum across rows

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Wikipedia (2024) Magic Squares (online)
[2] Microsoft Learn (2024) Power BI: Using visual calculations [preview] (link)

25 April 2024

📊Graphical Representation: Graphics We Live By (Part III: Exchange Rates in Power BI)

Graphical Representation Series
Graphical Representation Series

An exchange rate (XR) is the rate at which one currency will be exchanged for another currency, and thus XRs are used in everything related to trades, several processes in Finance relying on them. There are various sources for the XR like the European Central Bank (ECB) that provide the row data and various analyses including graphical representations varying in complexity. Conversely, XRs' processing offers some opportunities for learning techniques for data visualization. 

On ECB there are monthlyyearly, daily and biannually XRs from EUR to the various currencies which by triangulation allow to create XRs for any of the currencies involved. If N currencies are involved for one time unit in the process (e.g. N-1 XRs) , the triangulation generates NxN values for only one time division, the result being tedious to navigate. A matrix like the one below facilitates identifying the value between any of the currencies:


The table needs to be multiplied by 12, the number of months, respectively by the number of years, and filter allowing to navigate the data as needed. For many operations is just needed to look use the EX for a given time division. There are however operations in which is needed to have a deeper understanding of one or more XR's evolution over time (e.g. GBP to NOK). 

Moreover, for some operations is enough to work with two decimals, while for others one needs to use up to 6 or even more decimals for each XR. Occasionally, one can compromise and use 3 decimals, which should be enough for most of the scenarios. Making sense of such numbers is not easy for most of us, especially when is needed to compare at first sight values across multiple columns. Summary tables can help:

Statistics like Min. (minimum), Max. (maximum), Max. - Min. (range), Avg. (average) or even StdDev. (standard deviation) can provide some basis for further analysis, while sparklines are ideal for showing trends over a time interval (e.g. months).

Usually, a heatmap helps to some degree to navigate the data, especially when there's a plot associated with it:

In this case filtering by column in the heatmap allows to see how an XR changed for the same month over the years, while the trendline allows to identify the overall tendency (which is sensitive to the number of years considered). Showing tendencies or patterns for the same month over several years complements the yearly perspective shown via sparklines.

Fortunately, there are techniques to reduce the representational complexity of such numbers. For example, one can use as basis the XRs for January (see Base Jan), and represent the other XRs only as differences from the respective XR. Thus, in the below table for February is shown the XR difference between February and January (13.32-13.22=0.10). The column for January is zero and could be omitted, though it can still be useful in further calculations (e.g. in the calculation of averages) based on the respective data..

This technique works when the variations are relatively small (e.g. the values vary around 0). The above plots show the respective differences for the whole year, respectively only for four months. Given a bigger sequence (e.g. 24, 28 months) one can attempt to use the same technique, though there's a point beyond which it becomes difficult to make sense of the results. One can also use the year end XR or even the yearly average for the same, though it adds unnecessary complexity to the calculations when the values for the whole year aren't available. 

Usually, it's recommended to show only 3-5 series in a plot, as one can better distinguish the trends. However, plotting all series allows to grasp the overall pattern, if any. Thus, in the first plot is not important to identify the individual series but to see their tendencies. The two perspectives can be aggregated into one plot obtained by applying different filtering. 

Of course, a similar perspective can be obtained by looking at the whole XRs:

The Max.-Min. and StdDev (standard deviation for population) between the last and previous tables must match. 

Certain operations require comparing the trends of two currencies. The first plot shows the evolution NOK and SEK in respect to EUR, while the second shows only the differences between the two XRs:


The first plot will show different values when performed against other currency (e.g. USD), however the second plot will look similarly, even if the points deviate slightly:

Another important difference is the one between monthly and yearly XRs, difference depicted by the below plot:

The value differences between the two XR types can have considerable impact on reporting. Therefore, one must reflect in analyses the rate type used in the actual process. 

Attempting to project data into the future can require complex techniques, however, sometimes is enough to highlight a probable area, which depends also on the confidence interval (e.g. 85%) and the forecast length (e.g. 10 months):

Every perspective into the data tends to provide something new that helps in sense-making. For some users the first table with flexible filtering (e.g. time unit, currency type, currency from/to) is enough, while for others multiple perspectives are needed. When possible, one should  allow users to explore the various perspectives and use the feedback to remove or even add more perspectives. Including a feedback loop in graphical representation is important not only for tailoring the visuals to users' needs but also for managing their expectations,  respectively of learning what works and what doesn't.

Comments:
1) I used GBP to NOK XRs to provide an example based on  triangulation.
2) Some experts advise against using borders or grid lines. Borders, as the name indicates allow to delimitate between various areas, while grid lines allow to make comparisons within a section without needing to sway between broader areas, adding thus precision to our senses-making. Choosing grey as color for the elements from the background minimizes the overhead for coping with more information while allowing to better use the available space.
3) Trend lines are recommended where the number of points is relatively small and only one series is involved, though, as always, there are exceptions too. 
4) In heatmaps one can use a gradient between two colors to show the tendencies of moving toward an extreme or another. One should avoid colors like red or green.
5) Ideally, a color should be used for only one encoding (e.g. one color for the same month across all graphics), though the more elements need to be encoded, the more difficult it becomes to respect this rule. The above graphics might slightly deviate from this as the purpose is to show a representation technique. 
6) In some graphics the XRs are displayed only with two decimals because currently the technique used (visual calculations) doesn't support formatting.
7) All the above graphical elements are based on a Power BI solution. Unfortunately, the tool has its representational limitations, especially when one wants to add additional information into the plots. 
8) Unfortunately, the daily XR values are not easily available from the same source. There are special scenarios for which a daily, hourly or even minute-based analysis is needed.
9) It's a good idea to validate the results against the similar results available on the web (see the ECB website).

Previous Post <<||>> Next Post

20 April 2024

⚡️🗒️Power BI: Visual Calculations [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources.
Last updated: 20-Apr-2024

[feature] Visual Calculations (aka Visual Calcs)

  • {definition} a type of DAX calculation that's defined and executed directly on a visual [1]
  • {benefit} make it easier to create calculations that were previously hard to create
    • leads to simpler DAX, easier maintenance, and better performance [1]
    • reuse the results from its components [2]
  • still in public preview
    • ⇒ must be enabled manually
  • are stored on the visual
    • ⇒ can refer to any data in the visual 
      • incl. columns, measures, or other visual calculations [1]
    • ⇒ anything in the model must be added to the visual before the visual calculation can refer to it [1]
    •  ⇒ they can refer to the visual structure
      • ⇒ leads to more flexibility
  • combine the simplicity of context from calculated columns with the on-demand calculation flexibility from measures [1]
    • ⇐ the context is "visible"
  • operate on aggregated data instead of the detail level [1]
    • ⇒ leads to performance benefits
  • introduce a set of functions specific to visual calculations [1]
    • {category} medium-level functions
      • {function} COLLAPSE
        • the calculation is evaluated at a higher level of the axis [1]
      • {function} COLLAPSEALL
        • the calculation is evaluated at the total level of the axis [1]
      • {function} EXPAND
        • the calculation is evaluated at a lower level of the axis [1]
      • {function}EXPANDALL
        • the calculation is evaluated at the leaf level of the axis [1]
      • {function} FIRST
        • refers to the first row of an axis [1]
      • {function} ISATLEVEL
        • reports whether a specified column is present at the current level.
      • {function} LAST
        • refers to the last row of an axis [1]
      • {function} NEXT
        • refers to a next row of an axis [1]
      • {function} PREVIOUS
        • refers to a previous row of an axis [1]
      • {function} RANGE
        • refers to a slice of rows of an axis [1]
    • {category} high-level functions
      • {function} MOVINGAVERAGE
        • adds a moving average on an axis [1]
      • {function} RUNNINGSUM
        • adds a running sum on an axis  [1]
    • {category} low-level functions
      • ⇐ {exception} are available in standard DAX
      • {function} INDEX
      • {function} OFFSET
      • {function} RANK
      • {function} ROWNUMBER
      • {function}WINDOW
      • {function} ORDERBY
      • {function} PARTITIONBY
      • {function} MATCHBY
  • {default} most of them are evaluated row-by-row [1]
    • ⇐ like a calculated column
    • there's no need to add an aggregation function [1]
      •  it's better not to add such aggregates when they're not necessary [1]
  • {operation} create calculation
    • adds the visual calculation to the visual
  • {operation} hide calculation
    • calculations that aren't needed in the visual can be hidden [2]
  • {operation} copy calculation
    • copies the calculation between visuals and if intermediary steps are not there, they will be copied as well [planned] [2]
  • {feature} templates 
    • ready available calculation constructs 
    • {benefit} make it easier to write common calculations [1]
  • {feature|planned} support for Scanner API [2]
  • {parameter} axis 
    • influences how the visual calculation traverses the visual matrix [1]
    • {default} set to the first axis in the visual
    • {value} ROWS
      • the visual calculation is evaluated row-by-row in the visual matrix, from top to bottom. [1]
    • {value} COLUMNS
      • the visual calculation is evaluated row-by-row in the visual matrix, from left to right [1]
    • {value} ROWS COLUMNS
      • calculates vertically across rows from top to bottom, continuing column by column from left to right [1]
    • {value} COLUMNS ROWS
      • calculates horizontally across columns from left to right, continuing row by row from top to bottom [1]
    • {warning} not all visuals provide all axes, and some visuals provide no axes [1]
  • {parameter| reset 
    • influences if and when the function resets its value to 0 or switches to a different scope while traversing the visual matrix [1]
    • expects there to be multiple levels on the axis [1]
      • ⇐ use PARTITIONBY if there's only one level on the axis [1]
    • {value|default} NONE
      • means the visual calculation is never restarted [1]
    • {value} HIGHESTPARENT 
      • resets the calculation when the value of the highest parent on the axis changes [1]
    • {value} LOWESTPARENT 
      • resets the calculations when the value of the lowest parent on the axis changes [1]
    • {value} numerical value
      • refers to the fields on the axis, with the highest field being one [1]
  • {limitation} functions that rely on model relationships  aren't available
    • e.g. USERELATIONSHIP, RELATED or RELATEDTABLE
  • {limitation} not all visual types are supported [1]
    • ⇐ for the full list of limitations see [1]
  • {limitation} one can't filter on visual calculations [1]
  • {limitation} underlying data can't be exported [1]
  • {limitation} don't support conditional formatting

References
[1] Microsoft Learn (2024) Power BI: Using visual calculations [preview] (link)
[2] SSBI Central (2024) Visual Calculations - Making DAX easier, with Jeroen ter Heerdt(link)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.