SQL Troubles: matrices

Showing posts with label matrices. Show all posts

29 April 2024

⚡️Power BI: Working with Visual Calculations (Part III: Matrix Tables with Square Numbers as Example)

Introduction

In the previous post I exemplified various operations that can be performed with visual calculations on simple tables based on square numbers. Changing the simple table to a matrix table doesn't bring any benefit. The real benefit comes when one restructures the table to store only a cell per row in a table.

Data Modelling

For this the Magic5 table can be transformed via the following code, which creates a second table (e.g. M5):

M5 = UNION (
    SUMMARIZECOLUMNS(
     Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C1]
     , "Col", "C1"
    )
    , SUMMARIZECOLUMNS(
     Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C2]
     , "Col", "C2"
    )
    , SUMMARIZECOLUMNS(
     Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C3]
     , "Col", "C3"
    )
    ,  SUMMARIZECOLUMNS(
     Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C4]
     , "Col", "C4"
    )
    , SUMMARIZECOLUMNS(
      Magic5[Id]
     , Magic5[R]
     , Magic5[Index]
     , Magic5[C5]
     , "Col", "C5"
    )
)

Once this done, one can add the column [Col] as values for the matrix in a new visual. From now on, all the calculations can be done on copies of this visual.

Simple Operations

The behavior of the RUNNINGSUM and other functions is different when applied on a matrix table because the formula is applied to every cell of the N*N table, a column with the result being added for each existing column of the matrix.

Moreover, there are four different ways of applying the formula based on the Axis used. ROW calculates the formula by the row within a column:

Run SumByRow(C) = RUNNINGSUM([C], ROWS)

Output:

R	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)
R1	18	18	25	25	2	2	9	9	11	11
R2	4	22	6	31	13	15	20	29	22	33
R3	15	37	17	48	24	39	1	30	8	41
R4	21	58	3	51	10	49	12	42	19	60
R5	7	65	14	65	16	65	23	65	5	65

By providing COLUMNS as parameter for the Axis makes the calculation run by the column within a row:

Run SumByCol(C) = RUNNINGSUM([C], COLUMNS)

Output:

R	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)
R1	18	18	25	43	2	45	9	54	11	65
R2	4	4	6	10	13	23	20	43	22	65
R3	15	15	17	32	24	56	1	57	8	65
R4	21	21	3	24	10	34	12	46	19	65
R5	7	7	14	21	16	37	23	60	5	65

By providing ROW COLUMNS as parameter for the Axis makes the calculation run by the column and then continuing the next column (without resetting the value at the end of the column):

Run SumByRow-Col(C) = RUNNINGSUM([C],ROWS COLUMNS)

Output:

R	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)
R1	18	18	25	90	2	132	9	204	11	271
R2	4	22	6	96	13	145	20	224	22	293
R3	15	37	17	113	24	169	1	225	8	301
R4	21	58	3	116	10	179	12	237	19	320
R5	7	65	14	130	16	195	23	260	5	325

By providing COLUMNS ROWS as parameter for the Axis makes the calculation run by the row and then continuing the next row (without resetting the value at the end of the column):

Run SumByCol-Row = RUNNINGSUM([C],COLUMNS ROWS)

Output:

R	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)	C	Run Sum(C)
R1	18	18	25	43	2	45	9	54	11	65
R2	4	69	6	75	13	88	20	108	22	130
R3	15	145	17	162	24	186	1	187	8	195
R4	21	216	3	219	10	229	12	241	19	260
R5	7	267	14	281	16	297	23	320	5	325

Ranking

RANK can be applied independent of the values, or considering the value with ASC or DESC sorting:

RankByRow = RANK(DENSE,ROWS) -- ranking by row independent of values
RankByRow ASC = RANK(DENSE,ROWS, ORDERBY([C],ASC)) -- ranking by row ascending
RankByRow DESC = RANK(DENSE,ROWS, ORDERBY([C], DESC)) -- ranking by row descending
RankByRow-Col ASC = RANK(DENSE,ROWS COLUMNS, ORDERBY([C],ASC)) -- ranking by row columns ascending
RankByRow-Col DESC = RANK(DENSE,ROWS COLUMNS, ORDERBY([C], DESC)) -- ranking by row columns ascending

[RankByRow-Col ASC] matches the actual numbers from the matrix and is thus useful when sorting any numbers accordingly.

Differences

Differences can be calculated between any of the cells of the matrix:

DiffToPrevByRow = [C] - PREVIOUS([C])  -- difference to previous record
DiffToPrevByRow* = IF(NOT(IsBlank(PREVIOUS([C]))), [C] - PREVIOUS([C])) -- extended difference to previous record
DiffToPrevByRow-Col = [C] - PREVIOUS([C],, ROWS COLUMNS) -- difference to previous record by ROWS COLUMNS
DiffToFirstByRow = [C] - FIRST([C]) -- difference to first record
DiffToPrevByCol = [C] - FIRST([C], COLUMNS) -- difference to previous record COLUMNS

Ranking = RANK(DENSE, ROWS COLUMNS, ORDERBY([C], ASC)) -- ranking of values by ROWS COLUMNS
OffsetDiffToPrevByRow = [C] - calculate([C], OFFSET(1, ROWS, ORDERBY([Ranking],DESC))) -- difference to the previous record by ROW
OffsetDiffToPrevByRow-Col = [C] - calculate([C], OFFSET(1, ROWS COLUMNS, ORDERBY([Ranking],DESC))) -- difference to the previous record by ROW

Ranking has been introduced to facilitate the calculations based on OFFSET.

The other functions [1] can be applied similarly.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Power BI: Using visual calculations [preview] (link)

⚡️Power BI: Working with Visual Calculations (Part II: Simple Tables with Square Numbers as Example) 🆕

Introduction

The records behind a visual can be mentally represented as a matrix, the visual calculations allowing to tap into this structure intuitively and simplify many of the visualizations used. After a general test drive of the functionality, it makes sense to dive deeper into the topic to understand more about the limitations, functions behavior and what it takes to fill the gaps. This post focuses on simple tables, following in a next post to focus on matrices and a few other topics.

For exemplification, it makes sense to use a simple set of small numbers that are easy to work with, and magic squares seem to match this profile. A magic square is a matrix of positive sequential numbers in which each row, each column, and both main diagonals are the same [1]. Thus, a square of order N has N*N numbers from 1 to N*N, the non-trivial case being order 3. However, from the case of non-trivial squares, the one of order 5 provides a low order and allows hopefully the minimum needed for exemplification:

18	25	2	9	11
4	6	13	20	22
15	17	24	1	8
21	3	10	12	19
7	14	16	23	5
1	7	13	19	25

Data Modeling

One magic square should be enough to exemplify the various operations, though for testing purposes it makes sense to have a few more squares readily available. Each square has an [Id], [C1] to [C5] corresponds to matrix's columns, while [R] stores a row identifier which allows to sort the values the way they are stored in the matrix:

let
    Source = #table({"Id","C1","C2","C3","C4","C5","R"}
, {
{1,18,25,2,9,11,"R1"},
{1,4,6,13,20,22,"R2"},
{1,15,17,24,1,8,"R3"},
{1,21,3,10,12,19,"R4"},
{1,7,14,16,23,5,"R5"},
{2,1,7,13,19,25,"R1"},
{2,14,20,21,2,5,"R2"},
{2,22,3,9,15,16,"R3"},
{2,10,11,17,23,4,"R4"},
{2,18,24,5,6,12,"R5"},
{3,1,2,22,25,15,"R1"},
{3,9,10,16,11,19,"R2"},
{3,17,23,13,5,7,"R3"},
{3,24,12,6,20,3,"R4"},
{3,14,18,8,4,21,"R5"},
{4,22,6,3,18,16,"R1"},
{4,4,14,11,15,21,"R2"},
{4,5,8,12,23,17,"R3"},
{4,25,13,19,7,1,"R4"},
{4,9,24,20,2,10,"R5"},
{5,5,9,20,25,6,"R1"},
{5,13,15,2,11,24,"R2"},
{5,21,1,23,3,17,"R3"},
{5,19,18,4,14,10,"R4"},
{5,7,22,16,12,8,"R5"}
}
),
    #"Changed Type to Number" = Table.TransformColumnTypes(Source,{{"C1", Int64.Type}, {"C2", Int64.Type}, {"C3", Int64.Type}, {"C4", Int64.Type}, {"C5", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type to Number",{{"Id", Order.Ascending}, {"R", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1, Int64.Type)
in
    #"Added Index"

The column names and the row identifier could have been numeric values from 1 to 5, though it could have been confounded with the actual numeric values.

In addition, the columns [C1] to [C5] were formatted as integers and an index was added after sorting the values after [Id] and [R]. Copy the above code as a Blank Query in Power BI and change the name to Magic5.

Prerequisites

For the further steps you'll need to enable visual calculations in Power BI Developer via:
File >> Options and settings >> Options >> Preview features >> Visual calculations >> (check)

Into a Table visual drag and drop [R], [C1] to [C5] as column and make sure that the records are sorted ascending by [R]. To select only a square, add a filter based on the [Id] and select the first square. Use further copies of this visual for further tests.

Some basic notions of Algebra are recommended but not a must. If you worked with formulas in Excel, then you are set to go.

In Mathematics a matrix starts from the top left side and one moves on the rows (e.g. 18, 25, 2, ...) and then on the columns. With a few exceptions in which the reference is based on the latest value from a series (see Exchange rates), this is the direction that will be followed.

Basic Operations

Same as in Excel [C1] + [C2] creates a third column in the matrix that stores the sum of the two. The sum can be further applies to all the columns:

Sum(C) = [C1] + [C2] + [C3] + [C4] + [C5] -- sum of all columns (should amount to 65)

The column can be called "Sum", "Sum(C)" or any other allowed unique name, though the names should be meaningful, useful, and succinct, when possible.

Similarly, one can work with constants, linear or nonlinear transformations (each formula is a distinct calculation):

constant = 1 -- constant value
linear = 2*[C1] + 1 -- linear translation: 2*x+1

linear2 = 2*[C1] + [constant] -- linear translation: 2*x+1
quadratic = Power([C1],2) + 2*[C1] + 1 -- quadratic translation: x^2+2*x+1
quadratic2 = Power([C1],2) + [linear] -- quadratic translation: x^2+2*x+1

Output:

R	C1	constant	linear	linear2	quadratic	quadratic2
R1	18	1	37	37	361	361
R2	4	1	9	9	25	25
R3	15	1	31	31	256	256
R4	21	1	43	43	484	484
R5	7	1	15	15	64	64

Please note that the output was duplicated in Excel (instead of making screenshots).

Similarly, can be build any type of formulas based on one or more columns.

With a simple trick, one can use DAX functions like SUMX, PRODUCTX, MINX or MAXX as well:

Sum2(C) = SUMX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- sum of all columns

Prod(C) = PRODUCTX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- product of all columns

Avg(C) = AVERAGEX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- average of all columns

Min(C) = MINX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- minimum value of all columns

Max(C) = MAXX({[C1], [C2], [C3], [C4], [C5]}, [Value]) -- maximum value of all columns
Count(C) = COUNTX({[C1], [C2], [C3], [C4], [C5]},[Value]) -- counts the number of columns

Output:

C1	C2	C3	C4	C5	Sum(C)	Avg(C)	Prod(C)	Min(C)	Max(C)	Count(C)
18	25	2	9	11	65	13	89100	2	25	5
4	6	13	20	22	65	13	137280	4	22	5
15	17	24	1	8	65	13	48960	1	24	5
21	3	10	12	19	65	13	143640	3	21	5
7	14	16	23	5	65	13	180320	5	23	5

Unfortunately, currently there seems to be no way available for applying such calculations without referencing the individual columns.

Working across Rows

ROWNUMBER and RANK allow to rank a cell within a column independently, respectively dependently of its value:

Ranking = ROWNUMBER() -- returns the rank in the column (independently of the value)
RankA(C) = RANK(DENSE, ORDERBY([C1], ASC)) -- ranking of the value (ascending)

RankD(C) = RANK(DENSE, ORDERBY([C1], DESC)) -- ranking of the value (descending)

Output:

R	C1	Ranking	RankA(C)	RankD(C)
R1	18	1	4	2
R2	4	2	1	5
R3	15	3	3	3
R4	21	4	5	1
R5	7	5	2	4

PREVIOUS, NEXT, LAST and FIRST allow to refer to the values of other cells within the same column:

Prev(C) = PREVIOUS([C1]) -- previous cell
Next(C) = NEXT([C1])  -- next cell

First(C) = FIRST([C1]) -- first cell
Last(C) = LAST([C1]) -- last cell

Output:

R	C1	Prev(C)	NextC)	First(C)	Last(C)
R1	18		4	18	7
R2	4	18	15	18	7
R3	15	4	21	18	7
R4	21	15	7	18	7
R5	7	21		18	7

OFFSET is a generalization of these functions

offset(2) = calculate([C1], offset(2)) -- 
offset(-2) = calculate([C1], offset(-2))
Ind = ROWNUMBER() -- index
inverse = calculate([C1], offset(6-2*[Ind])) -- inversing the values based on index

Output:

R	C1	offset(2)	offset(-2)	ind	inverse
R1	18	15		1	7
R2	4	21		2	21
R3	15	7	18	3	15
R4	21		4	4	4
R5	7		15	5	18

The same functions allow to calculate the differences for consecutive values:

DiffToPrev(C) = [C1] - PREVIOUS([C1]) -- difference to previous 
DiffToNext(C) = [C1] - PREVIOUS([C1]) -- difference to next 
DiffTtoFirst(C) = [C1] - FIRST([C1]) -- difference to first
DiffToLast(C) = [C1] - LAST([C1]) -- difference to last

Output:

R	C1	DiffToPrev(C)	DiffToNextC)	DiffToFirst(C)	DiffToLast(C)
R1	18	18	14	0	11
R2	4	-14	-11	-14	-3
R3	15	11	-6	-3	8
R4	21	6	14	3	14
R5	7	-14	7	-11	0

DAX makes available several functions for working across the rows of the same column. Two of the useful functions are RUNNINGSUM and MOVINGAVERAGE:

Run Sum(C) = RUNNINGSUM([C1]) -- running sum

Moving Avg3(C) = MOVINGAVERAGE([C1], 3) -- moving average for the past 3 values

Moving Avg2(C) = MOVINGAVERAGE([C1], 2) -- moving average for the past 2 values

Unfortunately, one can use only the default sorting of the table with the functions that don't support the ORDERBY parameter. Therefore, when the table needs to be sorted descending and the RUNNINGSUM calculated ascending, for the moment there's no solution to achieve this behavior. However, it appears that Microsoft is planning to implement a solution for this issue.

RUNNINGSUM together with ROWNUMBER can be used to calculate a running average:

Run Avg(C) = DIVIDE(RUNNINGSUM([C1]), ROWNUMBER()) -- running average

Output:

R	C1	Run Sum(C)	Moving Avg3(C)	Moving Avg2(C)	Run Avg(C)
R1	18	18	18	18	18
R2	4	22	11	11	11
R3	15	37	12.33	9.5	12.33
R4	21	58	13.33	18	14.5
R5	7	65	14.33	14	13

With a mathematical trick that allows to transform a product into a sum of elements by applying the Exp (exponential) and Log (logarithm) functions (see the solution in SQL), one can run the PRODUCT across rows, though the values must be small enough to allow their multiplication without running into issues:

Ln(C) = IFERROR(LN([C1]), Blank()) -- applying the natural logarithm
Sum(Ln(C)) = RUNNINGSUM([Ln(C)]) -- running sum
Run Prod(C) = IF(NOT(ISBLANK([Sum(Ln(C))])), Exp([Sum(Ln(C))])) -- product across rows

Output:

R	C1	Ln(C)	Sum(Ln(C))	Run Prod(C)
R1	18	2.89	2.89	18
R2	4	1.39	4.28	72
R3	15	2.71	6.98	1080
R4	21	3.04	10.03	22680
R5	7	1.95	11.98	158760

These three calculations could be brought into a single formula, though the result could be more difficult to troubleshoot. The test via IsBlank is necessary because otherwise the exponential for the total raises an error.

Considering that when traversing a column it's enough to remember the previous value, one can build MIN and MAX functionality across a column:

Run Min = IF(OR(Previous([C1]) > [C1], IsBlank(Previous([C1]))), [C1], Previous([C1])) -- minimum value across rows
Run Max = IF(OR(Previous([C1]) < [C1], IsBlank(Previous([C1]))), [C1], Previous([C1])) -- maximum across rows

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Wikipedia (2024) Magic Squares (online)
[2] Microsoft Learn (2024) Power BI: Using visual calculations [preview] (link)

25 April 2024

📊Graphical Representation: Graphics We Live By (Part III: Exchange Rates in Power BI)

Graphical Representation Series

An exchange rate (XR) is the rate at which one currency will be exchanged for another currency, and thus XRs are used in everything related to trades, several processes in Finance relying on them. There are various sources for the XR like the European Central Bank (ECB) that provide the row data and various analyses including graphical representations varying in complexity. Conversely, XRs' processing offers some opportunities for learning techniques for data visualization.

On ECB there are monthly, yearly, daily and biannually XRs from EUR to the various currencies which by triangulation allow to create XRs for any of the currencies involved. If N currencies are involved for one time unit in the process (e.g. N-1 XRs) , the triangulation generates NxN values for only one time division, the result being tedious to navigate. A matrix like the one below facilitates identifying the value between any of the currencies:

The table needs to be multiplied by 12, the number of months, respectively by the number of years, and filter allowing to navigate the data as needed. For many operations is just needed to look use the EX for a given time division. There are however operations in which is needed to have a deeper understanding of one or more XR's evolution over time (e.g. GBP to NOK).

Moreover, for some operations is enough to work with two decimals, while for others one needs to use up to 6 or even more decimals for each XR. Occasionally, one can compromise and use 3 decimals, which should be enough for most of the scenarios. Making sense of such numbers is not easy for most of us, especially when is needed to compare at first sight values across multiple columns. Summary tables can help:

Statistics like Min. (minimum), Max. (maximum), Max. - Min. (range), Avg. (average) or even StdDev. (standard deviation) can provide some basis for further analysis, while sparklines are ideal for showing trends over a time interval (e.g. months).

Usually, a heatmap helps to some degree to navigate the data, especially when there's a plot associated with it:

In this case filtering by column in the heatmap allows to see how an XR changed for the same month over the years, while the trendline allows to identify the overall tendency (which is sensitive to the number of years considered). Showing tendencies or patterns for the same month over several years complements the yearly perspective shown via sparklines.

Fortunately, there are techniques to reduce the representational complexity of such numbers. For example, one can use as basis the XRs for January (see Base Jan), and represent the other XRs only as differences from the respective XR. Thus, in the below table for February is shown the XR difference between February and January (13.32-13.22=0.10). The column for January is zero and could be omitted, though it can still be useful in further calculations (e.g. in the calculation of averages) based on the respective data..

This technique works when the variations are relatively small (e.g. the values vary around 0). The above plots show the respective differences for the whole year, respectively only for four months. Given a bigger sequence (e.g. 24, 28 months) one can attempt to use the same technique, though there's a point beyond which it becomes difficult to make sense of the results. One can also use the year end XR or even the yearly average for the same, though it adds unnecessary complexity to the calculations when the values for the whole year aren't available.

Usually, it's recommended to show only 3-5 series in a plot, as one can better distinguish the trends. However, plotting all series allows to grasp the overall pattern, if any. Thus, in the first plot is not important to identify the individual series but to see their tendencies. The two perspectives can be aggregated into one plot obtained by applying different filtering.

Of course, a similar perspective can be obtained by looking at the whole XRs:

The Max.-Min. and StdDev (standard deviation for population) between the last and previous tables must match.

Certain operations require comparing the trends of two currencies. The first plot shows the evolution NOK and SEK in respect to EUR, while the second shows only the differences between the two XRs:

The first plot will show different values when performed against other currency (e.g. USD), however the second plot will look similarly, even if the points deviate slightly:

Another important difference is the one between monthly and yearly XRs, difference depicted by the below plot:

The value differences between the two XR types can have considerable impact on reporting. Therefore, one must reflect in analyses the rate type used in the actual process.

Attempting to project data into the future can require complex techniques, however, sometimes is enough to highlight a probable area, which depends also on the confidence interval (e.g. 85%) and the forecast length (e.g. 10 months):

Every perspective into the data tends to provide something new that helps in sense-making. For some users the first table with flexible filtering (e.g. time unit, currency type, currency from/to) is enough, while for others multiple perspectives are needed. When possible, one should allow users to explore the various perspectives and use the feedback to remove or even add more perspectives. Including a feedback loop in graphical representation is important not only for tailoring the visuals to users' needs but also for managing their expectations, respectively of learning what works and what doesn't.

Comments:
1) I used GBP to NOK XRs to provide an example based on triangulation.
2) Some experts advise against using borders or grid lines. Borders, as the name indicates allow to delimitate between various areas, while grid lines allow to make comparisons within a section without needing to sway between broader areas, adding thus precision to our senses-making. Choosing grey as color for the elements from the background minimizes the overhead for coping with more information while allowing to better use the available space.
3) Trend lines are recommended where the number of points is relatively small and only one series is involved, though, as always, there are exceptions too.
4) In heatmaps one can use a gradient between two colors to show the tendencies of moving toward an extreme or another. One should avoid colors like red or green.
5) Ideally, a color should be used for only one encoding (e.g. one color for the same month across all graphics), though the more elements need to be encoded, the more difficult it becomes to respect this rule. The above graphics might slightly deviate from this as the purpose is to show a representation technique.
6) In some graphics the XRs are displayed only with two decimals because currently the technique used (visual calculations) doesn't support formatting.
7) All the above graphical elements are based on a Power BI solution. Unfortunately, the tool has its representational limitations, especially when one wants to add additional information into the plots.
8) Unfortunately, the daily XR values are not easily available from the same source. There are special scenarios for which a daily, hourly or even minute-based analysis is needed.
9) It's a good idea to validate the results against the similar results available on the web (see the ECB website).

Previous Post <<||>> Next Post

19 April 2024

⚡️🗒️Power BI: Working with Visual Calculations (Part I: Test Drive) 🆕

Introduction

I recently watched a webcast with Jeroen (Jay) ter Heerdt (see [2]) in which he introduces visual calculations, a type of DAX calculation that's defined and executed directly on a visual [1]. Visual calculations provide an approach of treating a set of data much like an Excel table, allowing to refer to any field available on a visual and write formulas, which simplifies considerably the solutions used currently for ranking records, running averages and other windowing functions.

The records behind a visual can be mentally represented as a matrix, while the visual calculations can refer to any column from the matrix, allowing to add new columns and include the respective columns in further calculations. Moreover, if a column is used in a formula, it's not recalculated as is the case of measures, which should improve the performance of DAX formulas considerably.

Currently, one can copy a formula between visuals and if the formula contains fields not available in the targeted visual, they are added as well. Conversely, it's possible to build such a visual, copy it and then replace the dimension on which the analysis is made (e.g. Customer with Product), without being needed to make further changes. Unfortunately, there are also downsides: (1) the calculations are visible only within the visual in which were defined; (2) currently, the visual's data can't be exported if a visual calculation is added; (3) no formatting is supported, etc.

Ranking and Differences

I started to build a solution based on publicly available sales data, which offers a good basis for testing the use of visual calculations. Based on a Power BI visual table made of [Customer Name], [Sales Amount], [Revenue] and [Total Discount], I've added several calculations:

-- percentages
Sales % = 100*DIVIDE([Sales Amount], COLLAPSE([Sales Amount], ROWS))
Revenue % = 100*DIVIDE([Revenue],[Sales Amount])
Discount % = 100*DIVIDE([Total  Discount], [Total  Discount]+[Sales Amount])

-- rankings 
Rank Sales = Rank(DENSE, ORDERBY([Sales Amount], DESC))
Rank Revenue = Rank(DENSE, ORDERBY([Revenue], DESC))

-- differences between consecutive values
Diff. to Prev. Sales = IF([Rank Sales]>1, INDEX([Rank Sales]-1, , ORDERBY([Sales Amount], DESC)) - [Sales Amount] , BLANK())
Diff. to Prev. Rev. = IF([Rank Revenue]>1, INDEX([Rank Revenue]-1, , ORDERBY([Revenue], DESC)) - [Revenue] , BLANK())

Here's the output considered only for the first 10 records sorted by [Sales Amount]:

Customer Name	Sales Amount	Sales %	Revenue	Revenue %	Total Discount	Discount %	Rank Sales	Diff. to Prev. Sales.	Rank Rev.	Diff. to Prev. Rev.
Medline	1058923.78	3.76	307761.99	3.75	126601.02	10.68	1		1
Ei	707663.21	2.51	229866.98	2.8	95124.09	11.85	2	351260.57	2	77895.01
Elorac, Corp	702911.91	2.49	209078.76	2.55	83192.39	10.58	3	4751.3	6	20788.22
Sundial	694918.98	2.47	213362.1	2.6	78401.72	10.14	4	7992.93	4	-4283.34
OUR Ltd	691687.4	2.45	196396.26	2.4	78732.2	10.22	5	3231.58	10	16965.84
Eminence Corp	681612.78	2.42	213002.78	2.6	86904.03	11.31	6	10074.62	5	-16606.52
Apotheca, Ltd	667283.99	2.37	157435.56	1.92	101453.91	13.2	7	14328.79	31	55567.22
Rochester Ltd	662943.9	2.35	224918.2	2.74	81158.11	10.91	8	4340.09	3	-67482.64
ETUDE Ltd	658370.48	2.34	205432.79	2.51	89322.72	11.95	9	4573.42	9	19485.41
Llorens Ltd	646779.31	2.29	206567.4	2.52	82897.59	11.36	10	11591.17	8	-1134.61

Comments:
1) One could use [Total Amount] = [Total Discount]+[Sales Amount] as a separate column.
2) The [Rank Sales] is different from the [Rank Rev.] because of the discount applied.
3) In the last two formulas a blank was considered for the first item from the ranking.
4) It's not possible to control when the totals should be displayed, however one can change the color for the not needed total to match the background.

Visualizing Differences

Once the formulas are added, one can hide the basis columns and visualize the data as needed. To obtain the below chart I copied the visual and changed the column as follows:

Diff. to Prev. Rev. = IF([Rank Revenue]>1, [Revenue]- INDEX([Rank Revenue]-1, , ORDERBY([Revenue], DESC)) , [Revenue]) -- modified column

Comments:

1) Instead of showing the full revenue, the chart shows only the differences from the highest revenue, where the column in green is the highest revenue, while the columns in red are the differences of the current customer's revenue to the previous customer, as the data are sorted by the highest revenue. At least in this case it results in a lower data-ink ratio (see Tufte).

2) The values are sorted by the [Revenue] descending.

3) Unfortunately, it's not possible to change the names from the legend.

Simple Moving Averages (SMAs)

Based on the [Sales Amount], [Revenue] and [Month] one can add the following DAX formulas to the table for calculating the SMA:

Sales Amount (SMA) = MOVINGAVERAGE([Sales Amount],6)
Revenue (SMA) = MOVINGAVERAGE([Revenue],6)

The chart becomes:

Comments:

1) Unfortunately, the formula can't project the values into the feature, at least not without the proper dates.
2) "Show items with not data" feature seems to be disabled when visual calculations are used.

3) The SMA was created via a template formula. Similarly, calculating a running sum is reduced to applying a formula:

Running Sales Amount = RUNNINGSUM([Sales Amount])

Wrap Up

It's easier to start with a table for the visual, construct the needed formulas and then use the proper visual while eliminating the not needed fields.

The feature is still in public preview and changes can still occur. Unfortunately, there's still no information available on the general availability date. From the first tests, it provides considerable power with a minimum of effort, which is great! I don't want to think how long I would have needed to obtain the same results without it!

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Power BI: Using visual calculations [preview] (link)
[2] SSBI Central (2024) Visual Calculations - Making DAX easier, with Jeroen ter Heerdt (link)

04 March 2016

♜Strategic Management: Risk Matrix (Definitions)

"A graph that compares the likelihood and severity of risks from highest to lowest." (Annetta Cortez & Bob Yehling, "The Complete Idiot's Guide® To Risk Management", 2010)

"A common way to determine whether a risk is considered low, moderate, or high by combining the two dimensions of a risk: its probability of occurrence and its impact on objectives if it occurs." (Cynthia Stackpole, "PMP Certification All-in-One For Dummies", 2011)

"A grid for mapping the probability of each risk occurrence and its impact on project objectives if that risk occurs. " (Project Management Institute, "The Standard for Portfolio Management" 3rd Ed., 2012)

"A graphical representation of impact versus likelihood used to assist in the prioritisation of risks" (David Sutton, "Information Risk Management: A practitioner’s guide", 2014)

[impact matrix:] "A method for assigning values to expected pressures from the macro-environment in order for an organisation to assess the future nature of its context for which it must design an effective strategy." (Duncan Angwin & Stephen Cummings, "The Strategy Pathfinder" 3rd Ed., 2017)

04 April 2012

🚧Project Management: RACI Matrix

"A two-dimensional table that lists tasks or deliverables as the row headings, and roles (or people’s names) as the column headings. The cell data contains the responsibility by task and by role (or person): R = Responsible; A = Accountable; C = Consulted; I = Informed." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"A matrix used to describe classes of involved parties and their impact on a situation by role." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A table used to document tasks and the personnel responsible for the assignments. RACI stands for responsible, accountable, consulted, and informed." (Weiss, "Auditing IT Infrastructures for Compliance" 2nd Ed., 2015)

"A matrix describing the participation by various roles in completing tasks or deliverables for a project or process. It is especially useful in clarifying roles and responsibilities. RACI is an acronym derived from the four key responsibilities most typically used: Responsible, Accountable, Consulted, and Informed." (ISTQB)

11 November 2011

📉Graphical Representation: Matrices (Just the Quotes)

"The problem that still remains to be solved is that of the orderable matrix, that needs the use of imagination […] When the two components of a data table are orderable, the normal construction is the orderable matrix. Its permutations show the analogy and the complementary nature that exist between the algorithmic treatments and the graphical treatments." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"The square has always had a no-nonsense sort of image. Stable, solid, and - well - square. Perhaps that's why it is the shape used in business visuals in those rare cases where a visual is even bothered with. Flip through most business books and you'll find precious few places for your eye to stop and your visual brain to engage. But when you do, the shape of the graphic, chart, matrix, table, or diagram is certainly square. It's a comfortable shape, which makes it a valuable implement in your kit of visual communication tools." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"Characterizing a two-dimensional scatterplot is relatively easy, particularly with the full range of recently developed graphical enhancements at hand. However, standard patterns to watch for in three-dimensional plots are not as well understood as they are in many two-dimensional plots. We can certainly look for very general characteristics like curvature in three-dimensional plots, but it may not be clear how or if the curvature itself should be characterized. It is also possible to obtain useful insights into higher-dimensional scatterplots, but for the most part their interpretation must rely on lower-dimensional constructions. Similar statements apply to scatterplot matrices and various linked plots." (R Dennis Cook, "Regression Graphics: Ideas for Studying Regressions through Graphics", 1998)

"The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary." (Michael Friendly, "Extending Mosaic Displays: Marginal, Conditional, and Partial Views of Categorical Data", 199)

"Two types of graphic organizers are commonly used for comparison: the Venn diagram and the comparison matrix [...] the Venn diagram provides students with a visual display of the similarities and differences between two items. The similarities between elements are listed in the intersection between the two circles. The differences are listed in the parts of each circle that do not intersect. Ideally, a new Venn diagram should be completed for each characteristic so that students can easily see how similar and different the elements are for each characteristic used in the comparison." (Robert J. Marzano et al, "Classroom Instruction that Works: Research-based strategies for increasing student achievement, 2001)

"Largeness comes in different forms and has many different effects. Whereas some tasks remain easy, others become obstinately difficult. Largeness is not just an increase in dataset size. [...] Largeness may mean more complexity - more variables, more detail (additional categories, special cases), and more structure (temporal or spatial components, combinations of relational data tables). Again this is not so much of a problem with small datasets, where the complexity will be by definition limited, but becomes a major problem with large datasets. They will often have special features that do not fit the standard case by variable matrix structure well-known to statisticians." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"One big advantage of parallel coordinate plots over scatterplot matrices. (i.e., the matrix of scatterplots of all variable pairs) is that parallel coordinate plots need less space to plot the same amount of data. On the other hand, parallel coordinate plots with p variables show only p - 1 adjacencies. However, adjacent variables reveal most of the information in a parallel coordinate plot. Reordering variables in a parallel coordinate plot is therefore essential." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Whereas charts generally focus on a trend or comparison, tables organize data for the reader to scan. Tables present data in an easy-read-format, or matrix. Tables arrange data in columns or rows so readers can make side-by-side comparisons. Tables work for many situations because they convey large amounts of data and have several variables for each item. Tables allow the reader to focus quickly on a specific item by scanning the matrix or to compare multiple items by scanning the rows or columns." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"With further similarities to small multiples, heatmaps enable us to perform rapid pattern matching to detect the order and hierarchy of different quantitative values across a matrix of categorical combinations. The use of a color scheme with decreasing saturation or increasing lightness helps create the sense of data magnitude ranking." (Andy Kirk, "Data Visualization: A successful design process", 2012)

"One problem for visualizing multiple views is that of laying out the plots. Indeed, there are some plots, such as scatterplot matrixes and trellis displays, that are formed just by arranging simpler plots according to certain rules. Scatterplot matrices, for example, arrange scatterplots side by side so that each variable in a dataset is graphed against the other variables, with the graphs being displayed as a row or a column of the matrix. This lets the user rapidly inspect all of the bivariate relationships among the variables, permitting the detection of outliers, nonlinearities, and other features of the data." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"A useful way to think about tables and graphics is to visualize layers. Just as photographic files may be manipulated in photo editing software using layers, data presentations are constructed by imagining that layers of an image are placed one on top of another. There are three general layers that apply to visual data presentations: (a) a frame that is typically a rectangle or matrix, (b) axes and coordinate systems (for graphics), and (c) data presented as numbers or geometric objects." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"A heatmap is a visualization where values contained in a matrix are represented as colors or color saturation. Heatmaps are great for visualizing multivariate data" (data in which analysis is based on more than two variables per observation), where categorical variables are placed in the rows and columns and a numerical or categorical variable is represented as colors or color saturation." (Mario Döbler & Tim Großmann, "The Data Visualization Workshop", 2nd Ed., 2020)

SQL Troubles

Pages

29 April 2024

⚡️Power BI: Working with Visual Calculations (Part III: Matrix Tables with Square Numbers as Example)

⚡️Power BI: Working with Visual Calculations (Part II: Simple Tables with Square Numbers as Example) 🆕

25 April 2024

📊Graphical Representation: Graphics We Live By (Part III: Exchange Rates in Power BI)

19 April 2024

⚡️🗒️Power BI: Working with Visual Calculations (Part I: Test Drive) 🆕

04 March 2016

♜Strategic Management: Risk Matrix (Definitions)

04 April 2012

🚧Project Management: RACI Matrix

11 November 2011

📉Graphical Representation: Matrices (Just the Quotes)

About Me