SQL Troubles: stacked chart

07 February 2025

🌌🏭KQL Reloaded: First Steps (Part VII: Basic Data Visualizations)

One of the greatest aspects of KQL and its environment is that creating a chart is just one instruction away from the dataset generated in the process. Of course, the data still need to be in an appropriate form to be used as the source for a visual, though the effort is minimal. Let's consider the example used in the previous post, based on the ContosoSales data, where the visualization part is everything that comes after "| render":

// visualizations by Country: various charts
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize count_customers = count_distinct(CustomerKey) by RegionCountryName
| order by count_customers desc
//| render table
//| render linechart
//| render areachart
//| render stackedareachart
//| render columnchart
| render piechart
    with (xtitle="Country", ytitle="# Customers",
    title="# Customers by Country (pie chart)", legend=hidden)

Output:

# Customers by Country (various charts)

It's enough to use "render" with the chart type, without specifying the additional information provided under "with", though the legend can make the data easier to understand. Unfortunately, the available properties are relatively limited, at least for now.

Adding one more dimension is quite simple, even if the display may be sometimes confusing as there's no clear delimitation between the entities represented while the legend grows linearly with the number of points. It might be a good idea to use additional charts for the further dimensions in scope.

// visualizations by Region & Country: various charts
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize count_customers = count_distinct(CustomerKey) by ContinentName, RegionCountryName
| order by count_customers desc   
//| render stackedareachart 
//| render linechart 
//| render table 
//| render areachart 
//| render piechart
| render columnchart 
    with (xtitle="Region/Country", ytitle="# Customers",
    title="# Customers by Continent & Country", legend=hidden)

Output:

# Customers by Continent & Country (column chart)
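When a second dimension is added, one way to make the delimitation between entities explicit is to tell "render" which column is the axis and which one defines the series. The following is only a sketch: it assumes that the client tool honors the kind, xcolumn, series and ycolumns properties of the render operator, and support for these properties differs between tools.

```kusto
// sketch: same dataset as above, with the countries stacked within each continent
// assumption: the client tool supports kind, xcolumn, series and ycolumns
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >= datetime(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize count_customers = count_distinct(CustomerKey) by ContinentName, RegionCountryName
| render columnchart
    with (kind=stacked, xcolumn=ContinentName, series=RegionCountryName, ycolumns=count_customers,
    xtitle="Continent", ytitle="# Customers",
    title="# Customers by Continent & Country (stacked)")
```

If the tool ignores these properties, the chart falls back to its default layout, so the result should be checked visually.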

Sometimes, it makes sense to reduce the number of values, a recommendation that applies mainly to pie charts:

// visualizations by Zone: pie chart
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize count_customers = count_distinct(CustomerKey) by iif(RegionCountryName in ('United States', 'Canada'), RegionCountryName, 'Others')
| render piechart
    with (xtitle="Country", ytitle="# Customers",
    title="# Customers by Zone")

Output:

# Customers by Zone (pie chart)

Adding a second set of values (e.g. Total cost) makes it easy to create a scatter chart:

// visualization by Occupation: scatter chart
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize count_customers = count_distinct(CustomerKey) 
    , TotalCost = sum(TotalCost) by Occupation
| order by count_customers desc
| render scatterchart 
    with (xtitle="# Customers", ytitle="Sales volume",
    title="# Customers vs Sales volume by Occupation", legend=visible )

Output:

# Customers vs Sales volume by Occupation (scatter chart)

The visualizations are pretty simple to build, though one shouldn't expect to be able to build a visualization on top of any dataset, at least not without further formatting and possibly code changes. For example, considering the query from the previous post, with a small change one can use the data with a column chart, though this approach might have some limitations (e.g. it doesn't work with pie charts):

// calculating percentages from totals: column chart
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
//| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize count_customers = count_distinct(CustomerKey)
    , count_customers_US = count_distinctif(CustomerKey, RegionCountryName == 'United States')
    , count_customers_CA = count_distinctif(CustomerKey, RegionCountryName == 'Canada')
    , count_customers_other = count_distinctif(CustomerKey, not(RegionCountryName in ('United States', 'Canada')))
| project Charting = "Country"
    , US = count_customers_US
    , CA = count_customers_CA
    , other = count_customers_other
| render columnchart
    with (xtitle="Region", ytitle="# Customers",
    title="# Customers by Region")

Output:

# Customers by Region (column chart)

There are a few more visuals that will be considered in a future post. Despite the relatively limited set of visuals and properties, the visualizations are useful for getting a sense of the data's shape with a minimum of changes. Ad-hoc visualizations can also help in data modeling, in validating the logic and/or in identifying issues in the data when creating the queries, which makes it a great feature.

Happy coding!

15 June 2024

🗒️Graphical Representation: Bar & Column Charts [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!
Last updated: 15-Jun-2024

Bar & Column Charts with Variations

Bar & Column Charts (Graphs)

{definition} graphical representation of categorical data with rectangular figures (aka boxes) whose heights (column chart) or lengths (bar chart) are proportional to the values that they represent
{benefit} allow visually encoding/decoding quantitative information (size as magnitude and area) based on the relative position of the end of the box along the common scale

if the width of the box is the same, it's enough to compare the length

⇒ the basis of comparison is one-dimensional [1]
⇐ orient the reader to the relative magnitudes of the boxes

area is typically encoded when the width varies

⇐ encoding by area is a poor encoding method as it can mislead

can represent negative and positive values
one of the most useful, simple, and adaptable techniques in graphic presentation [1]

easily understood by readers
sometimes avoided because they are so common
almost everything could be a bar chart

the length of each bar is proportional to the quantity or amount of each category represented [1]

⇒ the zero line must be shown [1]
⇒ the scale must not be broken [1]

{exception} an excessively long bar in a series of bars may be broken off at the end, and the amount involved shown directly beyond it [1]

{benefit} allow representing categorical data visually

⇒ occasionally represented without scales, grid lines or tick marks
the more data elements are presented, the more difficult it becomes to navigate and/or display the data

{benefit} allow us to easily compare magnitudes

sometimes without looking at the actual values

{type} bar chart

the box is shown horizontally
represents magnitude by length
allows comparing different items as of a specific time

{type} column chart

the box is shown vertically
represents magnitude by height
allows comparing different items over time

⇐ it still displays discrete points

recommended for comparing similar items for different time periods [2]
effective way to show most types of comparisons [2]

{subtype} stacked chart

variation of bar/column charts in which the boxes of a dimension's components are stacked over each other

{exception} spaces can be used between boxes if the values aren't cumulative [3]

{benefit} allows encoding a further dimension where the values are stacked within the same box
{drawback} do not show data structure well

⇒ make it challenging to compare values across boxes

{subtype} 100-percent chart

variation of stacked chart in which the magnitude totals to 100%
{benefit} allows displaying part-to-whole relationships

⇐ preferable to circle chart's angle and area comparison [1]
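A 100-percent chart can be sketched in KQL by computing each component's share of its box total and stacking the shares. The data below are hypothetical and serve only as illustration, and the sketch assumes that the client tool honors the kind, xcolumn, series and ycolumns properties of the render operator.

```kusto
// hypothetical data, for illustration only
let Sales = datatable(Quarter:string, Product:string, Units:long)
[
    "Q1", "A", 40,
    "Q1", "B", 60,
    "Q2", "A", 30,
    "Q2", "B", 90
];
Sales
| join kind=inner (Sales | summarize Total = sum(Units) by Quarter) on Quarter
| extend Pct = round(100.0 * Units / Total, 1)   // share of the quarter's total
| project Quarter, Product, Pct
// Pct: Q1/A 40, Q1/B 60, Q2/A 25, Q2/B 75 (each quarter sums to 100)
| render columnchart
    with (kind=stacked, xcolumn=Quarter, series=Product, ycolumns=Pct,
    ytitle="% of units", title="Share of units by quarter")
```

Computing the percentages explicitly keeps the values available in the result set; some tools also accept kind=stacked100 on the raw values, which would make the join unnecessary.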

{subtype} clustered chart (aka grouped chart)

variation of bar/column charts that allows encoding further quantitative information in distinct boxes tacked together which occasionally overlap

⇐ if there's space, it is usually kept to a minimum
e.g. can be used to display multiple data series

can be used with a secondary axis
{benefit} allows comparisons within the cluster/group as well as between clusters/groups
{drawback} more challenging to make comparisons across points

{subtype} area chart (variable-width/variwide chart/graph)

variation of bar/column charts in which both the height and the width are significant, each being proportional to some measure or characteristic of the data elements represented [3]
{benefit} allows encoding a further dimension as part of the area

{subtype} deviation chart

variation of bar/column charts that display positive and negative values

{subtype} joined chart

variation of bar/column charts in which the boxes are tacked together
{benefit} allows better use of the available space

{subtype} paired chart

variation of bar/column charts in which the boxes are paired in mirror based on an axis

e.g. the values of one data series are displayed to the left, while the values for a second data series are displayed to the right

{benefit} allows studying the correlation and/or other relationships between the values of two data series
the hidden axes can have different scales

{subtype} circular chart (aka radial chart)

variation of bar/column charts in which the boxes are wrapped into a circle, the various categories being uniformly spaced along the radial or category axis [3]
the value scale can have any upper or lower value and can progress in either direction [3]
{benefit} useful to represent data that have a circular dimension in an aesthetic form

e.g. months, hours

{subtype} waterfall chart (aka progressing chart)

variation of bar/column charts in which the boxes are displayed progressively, the start of a box corresponding to the end of the previous box
time and activity charts can be considered as variations of this subtype [3]
{benefit} allows determining the cumulative values as well as the increase/decrease between consecutive boxes
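The start and end of each box in a waterfall chart can be derived from a running total. A minimal KQL sketch, based on hypothetical data and on column names chosen only for illustration (Step, Label, Delta, BoxStart, BoxEnd):

```kusto
// hypothetical changes per step, for illustration only
datatable(Step:int, Label:string, Delta:long)
[
    1, "Opening", 100,
    2, "Jan", 30,
    3, "Feb", -20,
    4, "Mar", 50
]
| order by Step asc                   // sorting serializes the row set, which row_cumsum() requires
| extend BoxEnd = row_cumsum(Delta)   // cumulative value after each step
| extend BoxStart = BoxEnd - Delta    // each box starts where the previous one ended
| project Label, BoxStart, BoxEnd
// BoxStart/BoxEnd: Opening 0/100, Jan 100/130, Feb 130/110, Mar 110/160
```

The query only prepares the data; drawing the floating boxes is left to a tool that supports this chart type.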

{subtype} composite chart (aka mixed chart, combination chart, overlay chart)

variation of bar/column charts in which other graphical encodings (line, area) are used besides boxes

⇐ the different data graphics are overlaid on one another [3]

{benefit} allows improving clarity or highlighting the relationships between several data series [3]
{drawback} overlaying can result in clutter

used to

display totals, averages or frequencies
display time series
display the relationship between two or more items
make a comparison among several items
make a comparison between parts and the whole

can be confounded with

[histograms]

show distribution through the frequency of quantitative values against defined intervals of quantitative values
used for continuous numerical data or data that can be effectively modelled as continuous
it doesn't have spaces between bars

⇐ older bar/column charts don't use spaces either
if this aspect is ignored, histograms can be considered as a special type of area chart
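The frequencies behind a histogram can be obtained in KQL by grouping the values into equal-width intervals with bin(). The values below are hypothetical and serve only as illustration:

```kusto
// hypothetical continuous values, for illustration only
datatable(OrderValue:real)
[
    12.5, 18.0, 23.1, 27.9, 31.4, 35.0, 36.2, 44.8, 52.3
]
| summarize Frequency = count() by Interval = bin(OrderValue, 10)   // Interval holds the lower bound of each bin
| order by Interval asc
// Frequency per Interval: 10 -> 2, 20 -> 2, 30 -> 3, 40 -> 1, 50 -> 1
| render columnchart
    with (xtitle="Order value (interval start)", ytitle="Frequency",
    title="Distribution of order values")
```

The result is rendered as a regular column chart, so the columns are typically shown with spaces between them, and intervals without values are missing from the result set rather than shown as empty.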

[vertical line chart] (aka price chart, bar chart)

vertical line charts are sometimes referred to as bar charts (see [3])

things to consider

distance between bars

the more distant the bars, the more difficult it becomes to make comparisons and the accuracy of judgment decreases

sorting

sorting the bars/columns by their size facilitates comparisons, though it can impede items' search, especially when there are many categories involved

{exception} not recommended for time series

clutter

displaying too many items in a cluster and/or too many labels can lead to clutter
{recommendation} display at maximum 3-4 clustered boxes

color

one should follow the general recommendations

trend lines

can be used especially with time series, e.g. to represent the linear regression line
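In KQL, a linear regression line can be sketched by turning the data into a series with make-series and fitting a line with series_fit_line(). The sketch below reuses the NewSales table from the KQL post on this page and assumes that DateKey is a datetime column and that the client tool can render series as a time chart; the names TotalSales and LineFit are chosen only for illustration.

```kusto
// sketch: daily sales amount with a fitted regression line
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| make-series TotalSales = sum(SalesAmount) default = 0
    on DateKey from datetime(2023-02-01) to datetime(2023-03-01) step 1d
| extend (RSquare, Slope, Variance, RVariance, Interception, LineFit) = series_fit_line(TotalSales)
| project DateKey, TotalSales, LineFit   // keep the original series and the fitted line
| render timechart
    with (ytitle="Sales amount", title="Daily sales amount with trend line")
```

Days without sales are filled with 0 via "default = 0", which pulls the fitted line down; whether that is appropriate depends on the data.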

dual axis

{benefit} allows comparing the magnitudes of two data series by employing a secondary axis

overlapping

overlapping boxes can make charts easier to read

symbols

can be used to designate reference points of comparison for each of the bars [3]

{alternative} pie chart

can be used to dramatize comparisons in relation to the whole [2]
one should consider the drawbacks

{alternative} choropleth maps

more adequate for geographical dimensions
provide minimal encoding

{alternative} line charts

can be much more informative
provide an optimal data-ink ratio
reduce the chart junk feeling

{alternative} dot plots

are closer to the original data

References:
[1] Anna C Rogers (1961) "Graphic Charts Handbook"
[2] Robert Lefferts (1981) "Elements of Graphics: How to prepare charts and graphs for effective reports"
[3] Robert L Harris (1996) "Information Graphics: A Comprehensive Illustrated Reference"
