SQL Troubles: filters

Showing posts with label filters. Show all posts

21 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 1: An Introduction)

Graphical Representation Series

Introduction

Creating simple charts or more complex data visualizations may appear trivial for many, though their authors shouldn't forget that readers have different backgrounds, degrees of literacy, many of them not being maybe able to make sense of graphical displays, at least not without some help.

Beginners start with a limited experience and build upon it, then, on the road to mastery, they get acquainted with the many possibilities, a deeper sense is achieved and the choices become a few. Independently of one's experience, there are seldom 'yes' and 'no' answers for the various choices, but everything is a matter of degree that varies with one's experience, available time, audience's expectations, and many more aspects might be considered in time.

The following questions are intended to expand, respectively narrow down our choices when dealing with data visualizations from a data professional's perspective. The questions are based mainly on [1] though they were extended to include a broader perspective.

General Questions

Where does the data come from? Is the source reliable, representative (for the whole population in scope)? Is the data source certified? Are yhe data actual?

Are there better (usable) sources? What's the effort to consider them? Does the data overlap? To what degree? Are there any benefits in merging the data? How much this changes the overall picture? Are the changes (in trends) explainable?

Was the data collected? How, from where, and using what method? [1] What methodology/approach was used?

What's the dataset about? Can one recognize the data, the (data) entities, respectively the structures behind? How big is the fact table (in terms of rows and columns)? How many dimensions are in scope?

What transformations, calculations or modifications have been applied? What was left out and what's the overall impact?

Any significant assumptions were made? [1] Were the assumptions clearly stated? Are they entitled? Is it more to them?

Were any transformation applied? Do the transformations change any data characteristics? Were they adequately documented/explained? Do they make sense? Was it something important left out? What's the overall impact?

What criteria were used to include/exclude data from the display? [1] Are the criteria adequately explained/documented? Do they make sense?

Are similar data publicly available? Is it (freely) accessible/usable? To what degree? How much do the datasets overlap? Is there any benefit to analyze/use the respective data? Are the characteristics comparable? To what degree?

Dataviz Questions

What's the title/subtitle of the chart? Is it meaningful for the readers? Does the title reflect the data, respectively the findings adequately? Can it be better formulated? Is it an eye-catcher? Does it meet the expectations?

What data is shown? Of what type? At what level is the data aggregated?

What chart (type) is being used? [1] Are the readers familiar with the chart type? Does it needs further introduction/clarifications? Are there better means to represent the data? Does the chart offer the appropriate perspective? Does it make sense to offer different (complementary) perspective(s)? To what degree other perspectives help?

What items of data do the marks represent? What value associations do the attributes represent? [1] Are the marks visible? Are the marks adequately presented (e.g. due to missing data)?

What range of values are displayed? [1] What approximation the values support? To what degree can the values be rounded without losing meaning?

Is the data categorical, ordinal or continuous?

Are the axes property chosen/displayed/labeled? Is the scale properly chosen (linear, semilogarithmic, logarithmic), respectively displayed? Do they emphasize, diminish, distort, simplify, or clutter the information?

What features (shapes, patterns, differences or connections) are observable, interesting or vital for understanding the chart? [1]

Where are the largest, mid-sized and smallest values? (aka ‘stepped magnitude’ judgements). [1]

Where lie the most/least values? Where is the average or normal? (aka ‘global comparison’ judgements)” [1] How are the values distributed? Are there any outliers present? Are they explainable?

What features are expected or unexpected? [1] To what degree are they unexpected?

What features are important given the subject? [1]

What shapes and patterns strike readers as being semantically aligned with the subject? [1]

What is the overall feeling when looking at the final result? Is the chart overcrowded? Can anything be left out/included?

What colors were used? [1] Are the colors adequately chosen, respectively meaningful? Do they follow the general recommendations?

What colors, patterns, forms do readers see first? What impressions come next, respectively last longer?

Are the various elements adequately/intuitively positioned/distinguishable? What's the degree of overlapping/proximity? Do the elements respect an intuitive hierarchy? Do they match readers' expectations, respectively the best practices in scope? Are the deviations entitled?

Is the space properly used? To what degree? Are there major gaps?

Know Your Audience

What audience targets the visualization? Which are its characteristics (level of experience with data visualizations; authors, experts or casual attendees)? Are there any accidental attendees? How likely is the audience to pay attention?

What is audience’s relationship with the subject matter? What knowledge do they have or, conversely, lack about the subject? What assistance might they need to interpret the meaning of the subject? Do they have the capacity to comprehend what it means to them? [1]

Why do the audience wants/needs to understand the topic? Are they familiar, respectively actively interested or more passive? Is it able to grasp the intended meaning? [1] To what degree? What kind of challenges might be involved, of what nature?

What is their motivation? Do they have a direct, expressed need or are they more passive and indifferent? Is it needed a way to persuade them or even seduce them to engage? [1] Can this be done without distorting the data and its meaning(s)?

What are their visualization literacy skill set? Do they require assistance perceiving the chart(s)? Are they sufficiently comfortable with operating features of interactivity? Do they have any visual accessibility issues (e.g. red–green color blindness)? Do they need to be (re)factored into the design? [1]

Reflections

What has been learnt? Has it reinforced or challenged existing knowledge? [1] Was new knowledge gained? How valuable is this knowledge? Can it be reused? In which contexts?

Do the findings meet one's expectations? To what degree? Were the expectations entitled? On what basis? What's missing? What's gaps' relevance?

What feelings have been stirred? Has the experience had an impact emotionally? [1] To what degree? Is the impact positive/negative? Is the reaction entitled/explainable? Are there any factors that distorted the reactions? Are they explainable? Do they make sense?

What does one do with this understanding? Is it just knowledge acquired or something to inspire action (e.g. making a decision or motivating a change in behavior)? [1] How relevant/valuable is the information for us? Can it be used/misused? To what degree?

Are the data and its representation trustworthy? [1] To what degree?

Previous Post <<||>> Next Post

References:
[1] Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019

08 February 2025

🌌🏭KQL Reloaded: First Steps (Part IX: Translating SQL to KQL - More Joins)

The last post exemplified the use of "explain" to translate queries from SQL to KQL. The current post attempts to test the feature based on the various join constructs available in SQL by using the NewSales and Products tables. The post presumes that the reader as a basic understanding of the join types from SQL-based environments.

Inner Join

// full join in SQL
--
explain
SELECT NewSales.CustomerKey
, NewSales.ProductKey
, Products.ProductName
FROM NewSales 
     JOIN Products 
      ON NewSales.ProductKey = Products.ProductKey 

// full join in KQL
NewSales
| join kind=inner (
    Products
    | project ProductKey, ProductName 
    )
    on ($left.ProductKey == $right.ProductKey)
| project CustomerKey, ProductKey, ProductName
| limit 10

A full join is probably the most used type of join given that fact tables presume the existence of dimensions, even if poor data warehousing design can lead also to exception. The join retrieves all the data matching from both tables, including the eventual duplicates from both sides of the join.

Left Join

// left join in SQL
--
explain
SELECT NewSales.CustomerKey
, NewSales.ProductKey
, Products.ProductName
FROM NewSales 
     LEFT JOIN Products 
      ON NewSales.ProductKey = Products.ProductKey 

// left join in KQL
NewSales
| join kind=leftouter (
    Products
    | project ProductKey

        , Product = ProductName 
    )
    on ($left.ProductKey == $right.ProductKey)
| where isnull(Product) 
| project CustomerKey
    , ProductKey
    , ProductName
| limit 10

A left join retrieves all the records from the left table, typically the fact table, independently whether records were found in the dimension table. One can check whether mismatches exist by retrieving the records where no match was found.

Right Join

// right join in SQL
--
explain
SELECT NewSales.CustomerKey
, Products.ProductKey
, Products.ProductName
FROM NewSales 
     RIGHT JOIN Products 
      ON NewSales.ProductKey = Products.ProductKey 

// right join in KQL
NewSales
| join kind=rightouter (
    Products
    | project DimProductKey = ProductKey
    , DimProductName = ProductName 
    )
    on ($left.ProductKey == $right.DimProductKey)
| where isnull(ProductKey) 
| project CustomerKey
    , DimProductKey
    , DimProductName
| limit 10

A right join retrieves the records from the dimension together with the matches from the fact table, independently whether a match was found in the fact table.

Full Outer Join

// full outer join in SQL
--
explain
SELECT NewSales.CustomerKey
, Coalesce(NewSales.ProductKey, Products.ProductKey) ProductKey
, Coalesce(NewSales.ProductName, Products.ProductName) ProductName
FROM NewSales 
     FULL OUTER JOIN Products 
      ON NewSales.ProductKey = Products.ProductKey 


// full outer join in KQL
NewSales
| join kind=fullouter (
    Products
    | project DimProductKey = ProductKey
    , DimProductName = ProductName 
    )
    on ($left.ProductKey == $right.DimProductKey)
//| where isnull(ProductKey) 
| project CustomerKey
    , ProductKey = coalesce(ProductKey, DimProductKey)
    , ProductName = coalesce(ProductName, DimProductName)
| limit 10

A full outer join retrieves all the data from both sides of the join independently on whether a match is found. In RDBMS this type of join performs poorly especially when further joins are considered, respectively when many records are involved on both sides of the join. Therefore it should be avoided when possible, though in many cases it might be the only feasible solution. There are also alternatives that involve a UNION between a LEFT JOIN and a RIGHT JOIN, the letter retrieving only the records which is not found in the fact table (see last query from a previous post). This can be a feasible solution when data sharding is involved.

Notes:
1) If one ignores the unnecessary logic introduced by the translation via explain, the tool is excellent for learning KQL. It would be interesting to understand why the tool used a certain complex translation over another, especially when there's a performance benefit in the use of a certain piece of code.
2) Also in SQL-based queries it's recommended to start with the fact table, respectively with the table having the highest cardinality and/or the lowest level of detail, though the database engine might find an optimal plan independently of which table was written first.

Happy coding!

Previous Post <<||>> Next Post

🌌🏭KQL Reloaded: First Steps (Part VIII: Translating SQL to KQL - Full Joins)

One of the great features of KQL is the possibility of translating SQL code to KQL via the "explain" keyword, allowing thus to port SQL code to KQL, respectively help translate knowledge from one programming language to another.

Let's start with a basic example:

// transform SQL to KQL code (to be run only the first part from --)
--
explain
SELECT top(10) CustomerKey, FirstName, LastName, CityName, CompanyName 
FROM Customers 
ORDER BY CityName DESC

// output: translated KQL code 
Customers
| project CustomerKey, FirstName, LastName, CityName, CompanyName
| sort by CityName desc nulls first
| take int(10)

The most interesting part of the translation is how "explain" translate joins from SQL to KQL. Let's start with a FULL JOIN from the set of patterns considered in a previous post on SQL joins:

--
explain
SELECT CST.CustomerKey
, CST.FirstName + ' ' + CST.LastName CustomerName
, Cast(SAL.DateKey as Date) DateKey
, SAL.TotalCost
FROM NewSales SAL
    JOIN Customers CST
      ON SAL.CustomerKey = CST.CustomerKey 
WHERE SAL.DateKey > '20240101' AND SAL.DateKey < '20240201'
ORDER BY CustomerName, DateKey, TotalCost DESC

And, here's the translation:

// translated code
NewSales
| project-rename ['SAL.DateKey']=DateKey
| join kind=inner (Customers
| project-rename ['CST.CustomerKey']=CustomerKey
    , ['CST.CityName']=CityName
    , ['CST.CompanyName']=CompanyName
    , ['CST.ContinentName']=ContinentName
    , ['CST.Education']=Education
    , ['CST.FirstName']=FirstName
    , ['CST.Gender']=Gender
    , ['CST.LastName']=LastName
    , ['CST.MaritalStatus']=MaritalStatus
    , ['CST.Occupation']=Occupation
    , ['CST.RegionCountryName']=RegionCountryName
    , ['CST.StateProvinceName']=StateProvinceName) 
    on ($left.CustomerKey == $right.['CST.CustomerKey'])
| where ((['SAL.DateKey'] > todatetime("20240101")) 
    and (['SAL.DateKey'] < todatetime("20240201")))
| project ['CST.CustomerKey']

    , CustomerName=__sql_add(__sql_add(['CST.FirstName']
    , " "), ['CST.LastName'])
    , DateKey=['SAL.DateKey']
    , TotalCost
| sort by CustomerName asc nulls first
    , DateKey asc nulls first
    , TotalCost desc nulls first
| project-rename CustomerKey=['CST.CustomerKey']

The code was slightly formatted to facilitated its reading. Unfortunately, the tool doesn't work well with table aliases, introduces also all the fields available from the dimension table, which can become a nightmare for the big dimension tables, the concatenation seems strange, and if one looks deeper, further issues can be identified. So, the challenge is how to write a query in SQL so it can minimize the further changed in QKL.

Probably, one approach is to write the backbone of the query in SQL and add the further logic after translation.

--
explain
SELECT NewSales.CustomerKey
, NewSales.DateKey 
, NewSales.TotalCost
FROM NewSales 
    INNER JOIN Customers 
      ON NewSales.CustomerKey = Customers.CustomerKey 
WHERE DateKey > '20240101' AND DateKey < '20240201'
ORDER BY NewSales.CustomerKey
, NewSales.DateKey

And the translation looks simpler:

// transformed query
NewSales
| join kind=inner 
(Customers
| project-rename ['Customers.CustomerKey']=CustomerKey
    , ['Customers.CityName']=CityName
    , ['Customers.CompanyName']=CompanyName
    , ['Customers.ContinentName']=ContinentName
    , ['Customers.Education']=Education
    , ['Customers.FirstName']=FirstName
    , ['Customers.Gender']=Gender
    , ['Customers.LastName']=LastName
    , ['Customers.MaritalStatus']=MaritalStatus
    , ['Customers.Occupation']=Occupation
    , ['Customers.RegionCountryName']=RegionCountryName
    , ['Customers.StateProvinceName']=StateProvinceName) 
    on ($left.CustomerKey == $right.['Customers.CustomerKey'])
| where ((DateKey > todatetime("20240101")) 
    and (DateKey < todatetime("20240201")))
| project CustomerKey, DateKey, TotalCost
| sort by CustomerKey asc nulls first
, DateKey asc nulls first

I would have written the query as follows:

// transformed final query
NewSales
| where (DateKey > todatetime("20240101")) 
    and (DateKey < todatetime("20240201"))
| join kind=inner (
    Customers
    | project CustomerKey
        , FirstName
        , LastName 
    ) on $left.CustomerKey == $right.CustomerKey
| project CustomerKey
    , CustomerName = strcat(FirstName, ' ', LastName)
    , DateKey
    , TotalCost
| sort by CustomerName asc nulls first
    , DateKey asc nulls first

So, it makes sense to create the backbone of a query, translate it to KQL via explain, remove the unnecessary columns and formatting, respectively add what's missing. Once the patterns were mastered, there's probably no need to use the translation tool, but could prove to be also some exceptions. Anyway, the translation tool helps considerably in learning. Big kudos for the development team!

Notes:
1) The above queries ignore the fact that Customer information is available also in the NewSales table, making thus the joins obsolete. The joins were considered only for exemplification purposes. Similar joins might be still needed for checking the "quality" of the data (e.g. for dimensions that change over time). Even if such "surprises" shouldn't appear by design, real life designs continue to surprise...
2) Queries should be less verbose by design (aka simplicity by design)! The more unnecessary code is added, the higher the chances for errors to be overseen, respectively the more time is needed to understand and validated the queries!

Happy coding!

Previous Post <<||>> Next Post

References
[1] Microsoft Lear n (2024) Azure: Query data using T-SQL [link]

19 April 2024

⚡️Power BI: Preparatory Steps for Creating a Power BI Report

When creating a Power BI report consider the following steps when starting the actual work. The first five steps can be saved to a "template" that can be reused as starting point for each report.

Step 0: Check Power BI Desktop's version

Check whether you have the latest version, otherwise you can download it from the Microsoft website.
Given that most of the documentation, books and other resources are in English, it might be a good idea to install the English version.

Step 1: Enable the recommended options

File >> Options and settings >> Options >> Global >> Data Load:
.>> Time intelligence >> Auto date/time for new files >> (uncheck)
.>> Regional settings >> Application language >> set to English (United States)
.>> Regional settings >> Model language >> set to English (United States)

You can consider upon case also the following options (e.g. when the relationships are more complex than the feature can handle):
File >> Options and settings >> Options >> Current >> Data load:
.>> Relationship >> Import relationships from data sources on first load >> (uncheck)
.>> Relationship >> Autodetect new relationships after data is loaded >> (uncheck)

Step 2: Enable the options needed by the report

For example, you can enable visual calculations:
File >> Options and settings >> Options >> Preview features >> Visual calculations >> (check)

Comment:
Given that not all preview features are stable enough, instead of activating several features at once, it might be a good idea to do it individually and test first whether they work as expected.

Step 3: Add a table for managing the measures

Add a new table (e.g. "dummy" with one column "OK"):

Results = ROW("dummy", "OK")

Add a dummy measure that could be deleted later when there's at least one other measure:
Test = ""

Hide the "OK" column and with this the table is moved to the top. The measures can be further organized within folders for easier maintenance.

Step 4: Add the Calendar if time analysis is needed

Add a new table (e.g. "Calendar" with a "Date" column):

Calendar = Calendar(Date(Year(Today()-3*365),1,1),Date(Year(Today()+1*365),12,31))

Add the columns:

Year = Year('Calendar'[Date])
YearQuarter = 'Calendar'[Year] & "-Q" & 'Calendar'[Quarter]
Quarter = Quarter('Calendar'[Date])
QuarterName = "Q" & Quarter('Calendar'[Date])
Month = Month('Calendar'[Date])
MonthName = FORMAT('Calendar'[Date], "mmm")

Even if errors appear (as the columns aren't listed in the order of their dependencies), create first all the columns. Format the Date in a standard format (e.g. dd-mmm-yy) including for Date/Time for which the Time is not needed.

To get the values in the visual sorted by the MonthName:
Table view >> (select MonthName) >> Column tools >> Sort by column >> (select Month)

To get the values in the visual sorted by the QuarterName:
Table view >> (select QuarterName) >> Column tools >> Sort by column >> (select Quarter)

With these changes the filter could look like this:

Step 5: Add the corporate/personal theme

Consider using a corporate/personal theme at this stage. Without this the volume of work that needs to be done later can increase considerably.

There are also themes generators, e.g. see powerbitips.com, a tool that simplifies the process of creating complex theme files. The tool is free however, users can save their theme files via a subscription service.

Set canvas settings (e.g. 1080 x 1920 pixels).

Step 6: Get the data

Consider the appropriate connectors for getting the data into the report.

Step 7: Set/Validate the relationships

Check whether the relationships between tables set by default are correct, respectively set the relationships accordingly.

Step 8: Optimize the data model

Look for ways to optimize the data model.

Step 9: Apply the formatting

Format numeric values to represent their precision accordingly.
Format the dates in a standard format (e.g. "dd-mmm-yy") including for Date/Time for which the Time is not needed.

The formatting needs to be considered for the fields, measures and metrics added later as well.

Step 10: Define the filters

Identify the filters that will be used more likely in pages and use the Sync slicers to synchronize the filters between pages, when appropriate:
View >> Sync slicers >> (select Page name) >> (check Synch) >> (check Visible)

Step 11: Add the visuals

At least for report's validation, consider using a visual that holds the detail data as represented in the other visuals on the page. Besides the fact that it allows users to validate the report, it also provides transparence, which facilitates report's adoption.

25 March 2011

🧭Business Intelligence: Troubleshooting (Part II: Approaching a Query)

Business Intelligence Series

Introduction

You received a (long) query for troubleshooting, reviewing, conversion or any similar tasks. In addition, you don’t know much about the underlying table structure or business logic. So, what do you do then? For sure two things are intuitively clear: you don’t need to panic and, understanding the query may help you in your task. Understanding the query, it seems such a simple statement, though there is more to it. Here are some points on how to approach a query.

State your problem

“A problem well stated is a problem half solved” (Charles F. Kettering). Before performing any work, check what’s requested from you, whether you are having the information required for the task(s) ahead, for example documentation, valid examples, all code, etc. If something is missing, don’t hesitate to request all the information you need. While waiting for information, you can continue with next steps. As we don’t live in a perfect world, there will be also cases in which you’ll have to fill the gaps by yourself by performing additional research/work. When troubleshooting is important to understand what’s wrong and, when possible, have data against which to compare query’s output.

Save the work

Even if you are having a copy of the query somewhere on the server, save the previous version of the query and, when possible, use versioning. It might seem a redundant task, however the fact is that you never know when you need to refer to it and, as you’ll see next, it can/should be used as a baseline for validating the changes. In case you haven’t saved the query, check whether your RDBMS is tracking metadata about the queries run and, if the metadata were not reset in the meantime, you might be lucky enough to find a copy of your query.

I found that is important to save the daily work, the various analysis performed in order to understand a query, the various versions and even the data used for testing. All this work could help you letter to review what you made, the steps you missed, you can reuse one of the queries for further work, etc.

Break down

When the query is too complex, it could be useful to break the query into chunks that could be run and understood in isolation. Typically such chunks derive from query’s structure (e.g. inline queries, subqueries derived from unions). I found that often, focusing only a chunk of a query help isolating issues.

Restructure

Many programmers still write queries using the old non-ANSI joining syntax in which the join constraints appear in the WHERE clause, making the understanding and troubleshooting of a query more difficult. Often I found myself in the position of transforming first a query to ANSI SQL syntax, before performing further work on it. It’s actually a good occasion to gain a first understanding of query’s structure, but I’d prefer not to do it so often. In addition, during restructure phase it makes sense to differentiate between the join and filter constraints, this helping isolating the issue(s).

Check cardinalities

Wrong join constraints lead to duplicates or fewer records than expected, such differences being difficult to track when the variances in the numbers of records are quite small. Even if RDBMS come in developers’ help by providing metadata about the join relations, the columns and predicates participating in a join are not always so easy to identify. Therefore, in order to address this issue, it’s needed to check the constraints between any two tables between participating in a join. Sometimes, when the query is based on the table with the lowest level of detail, it can be enough to check the variations of the number of records.

Check filter constraints

Filter constraints are maybe more difficult to identify, especially when is needed to reengineer the logic built in applications. Many of the filter constraints are logical, though when you have no documentation about the schemas, is like rambling in the dark, having to check real examples and identify the various values and the impact they have on the behavior of your report.

Validate changes

So, you made the changes, everything looks perfect. Is it so? Often your intuition might tell you that the logic of a query is correct, though as software is not based on magic, at least not all the time, check some of the records to assure that the data are rendered as expected, check totals, compare the current with previous version, identify variations, etc. You don’t need to use all the technique you know, but to choose the best and minimal set of tools that allows you to validate the query.

Perform refactoring

Refactoring, the way to (continuous) code improvement, should become part of each developer’s philosophy about programming. A query, as any other piece of code, is rarely perfect as technical and factual knowledge is relative, features get deprecated and new techniques are introduced. On the other side, there is an old saying in IT – don’t change something that’s already working, so, there should be kept a balance between the two – the apparent and needed for change.

Document

I hope it’s not the case to stress the importance of documentation. From versioning to logic description, it’s a good practice to document the important parts of your work, especially the parts that will facilitate later work.

Previous Post <<||>> Next Post

01 February 2010

🧭Business Intelligence: Enterprise Reporting (Part VI: Filtering)

Business Intelligence Series

Generalities

Reports are often based on data sets ranging from a few thousands to millions and even billions of records, though the Users are interested most of the times in only smaller chunks of data that could be retrieved with the help of well-designed filters – controls that allow the input/selection of values applied for retrieving the needed data from the initial data set. In a reporting solution the filters are usually available in a screen of their own, called Filter, Input, Parameter or Prompt screen, the provided values being fetched into the query/queries on which the report is based on, or the queries are constructed dynamically.

Most of the values used in query constraints can be considered as parameters whose values are provided by filters; it’s even recommended to use filters instead of hard-coding values, improving thus reports’ flexibility. There are optional and mandatory parameters, in the later case a value needing to be provided in the corresponding filter, therefore such filter controls should be marked correspondingly. The actual values, especially in case of date values, should be validated before they are fetched as early as possible and should be avoided the not necessary roundtrips to the server.

In case of reports requested by Users the filters are typically specified beforehand, though in time other Users arrive to use the same report for other purposes, requesting additional filters in order to select the needed chunk of data. Adding a new filter of modifying an existing one requires a change to be done to report too, requiring in the end the availability of technical resources to develop, test and document the changes. Such gradual changes arrive to be quite expensive, therefore it should be targeted to capture and address the filtering requirements early in the design phase.

A first problem related to filters is to identify the attributes more likely to be used by Users for filtering, though that’s not so easy as it seems because this requires a good understanding of the data and business as a whole. On the other side could be provided a filter for each attribute from the report, much like Excel lets Users do, though also this approach could be expensive as performance and costs, many attributes arriving to be never used, and overloads unnecessarily the filter screen, in some cases having a direct impact on report’s performance.

Talking from experience the focus for filtering is typically on Master Data (e.g. Customers, Vendors, Items, etc.), Document Numbers (e.g. PO Number, Invoice Number), Document Dates (e.g. Order Date, Invoice Date) or any other Dates important for the business, list of values (e.g. Statuses), and occasionally amounts or quantities, though maybe 50-70% of the attributes from a report match these criteria, the average report having between 20 to 40 attributes. The list of attributes used for filtering could be different for reports based on the same data, attribute’s selection depending on report’s type and level of detail. For example in case of a PO report it would be useful to filter for the PO Number, Vendor, Buyer, Item Number, Open/Closed Pos, etc.; in exchange in summary reports it might be needed to select the top x vendors based on the amounts spent or to select the POs with the price variation in a certain interval, etc.

A second problem is how to make use of the existing controls (text boxes, dropdowns, check boxes, radio buttons, mask controls, etc.) in order to provide Users easy-to-use and flexible filtering. The fact is that an inappropriate use of controls could decrease considerably reports’ performance, making a report even unusable. There is actually no big philosophy about how to choose the controls, this being depending mainly or attributes’ data type, of weather they are numeric, alphanumeric, date or boolean, and whether one or multi selection of values is required. List boxes, dropdowns, checkboxes and even textboxes, when the values are comma delimited, allow filtering on multiple of values, another technique for the same purpose being the use of wildcards.

Alphanumeric Attributes

As a general guideline for alphanumeric (text) values (e.g. PO Number, Serial Number) that have high value variability, it’s preferable to use a free entry text box, which could allow in addition the use of wildcards. For alphanumeric value with relatively small variability it’s preferable to use a (multi-selection) dropdown control or controls with similar features (e.g. list box control, text box + list box); such controls are populated with values from database each time the screen is loaded, fact that doesn’t recommend their use for huge list of values. Typically if the list of values have more than 100-200 values then it should be definitely considered the use of text boxes, eventually with the use of search capabilities or, given the fact that the rate of change of such lists can be ignored, the list could be cached.

Date Attributes

For date attributes it’s advisable to use a From Date/To Date combination of controls, which allow Users to limit report’s scope for a certain time interval. If one of the values is missing, then there will be a limit imposed only for the specified value. Date’s format could be enforced with the help of a mask control, which allow the selection of valid dates, or at least specify the expected format.

Numeric Attributes

For numeric values the normal text boxes would do when masked controls can’t be used, the former allowing using wildcards for the attributes that typically could be also alphanumeric (e.g. document numbers). For numeric intervals should be used a pair of controls for the upper and lower bound, having similar behavior as the date intervals. It is needed to differentiate between the various types of decimal delimiters, therefore it’s recommended to specify the delimiter used or at least transform in the background the numeric value to the format expected by database/tool. The use of controls that allow multi-selection of numeric values, excepting delimiter separated values provided in text boxes, doesn’t really makes sense unless the list of possible it’s quite small.

Boolean Attributes

Boolean data types are used to store the value of attributes that have false or true state, and eventually Null values, therefore checkboxes and radio buttons could be used for this purpose. The bi-state could be easily implemented only with a checkbox when is interested only one of the aspects, while two or three checkboxes can be used for selecting more than one value. On the other side radio buttons allow the selection of only one value, thus select either the false, true or missing values. Such filters could be also used to activate hardcoded complex logic based on multiple attributes and reduced to Boolean values.

Previous Post <<||>> Next Post

SQL Troubles

Pages

21 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 1: An Introduction)

08 February 2025

🌌🏭KQL Reloaded: First Steps (Part IX: Translating SQL to KQL - More Joins)

🌌🏭KQL Reloaded: First Steps (Part VIII: Translating SQL to KQL - Full Joins)

19 April 2024

⚡️Power BI: Preparatory Steps for Creating a Power BI Report

25 March 2011

🧭Business Intelligence: Troubleshooting (Part II: Approaching a Query)

01 February 2010

🧭Business Intelligence: Enterprise Reporting (Part VI: Filtering)

About Me