31 January 2010

Reports’ Filtering Internal Kitchen

    Reports are often based on data sets ranging from a few thousand to millions or even billions of records, though Users are most of the time interested only in smaller chunks of data, which can be retrieved with the help of well-designed filters: controls that allow the input or selection of the values used to retrieve the needed data from the initial data set. In a reporting solution the filters are usually available in a screen of their own, called the Filter, Input, Parameter or Prompt screen; the provided values are either passed to the query or queries on which the report is based, or the queries are constructed dynamically.

    Most of the values used in query constraints can be considered parameters whose values are provided by filters; it’s even recommended to use filters instead of hard-coding values, thus improving reports’ flexibility. Parameters are either optional or mandatory; in the latter case a value must be provided in the corresponding filter, therefore such filter controls should be marked accordingly. The actual values, especially date values, should be validated as early as possible, before they are passed to the query, so that unnecessary roundtrips to the server are avoided.
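
    As a minimal sketch of the early validation described above, the following checks mandatory filters and date formats before anything is sent to the server. The names FilterSpec and validate_filters, as well as the date format, are hypothetical and chosen only for illustration; a real reporting tool will have its own mechanism.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FilterSpec:
    """Hypothetical description of one filter on the filter screen."""
    name: str
    mandatory: bool = False
    date_format: Optional[str] = None  # e.g. "%Y-%m-%d" for date filters


def validate_filters(specs, values):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    for spec in specs:
        raw = (values.get(spec.name) or "").strip()
        if not raw:
            # Only mandatory filters complain about a missing value.
            if spec.mandatory:
                errors.append(f"{spec.name}: a value is required")
            continue
        if spec.date_format:
            try:
                datetime.strptime(raw, spec.date_format)
            except ValueError:
                errors.append(
                    f"{spec.name}: expected a date in format {spec.date_format}"
                )
    return errors
```

    Running the check on the client side, or at least before the query is built, lets the User correct the input without a roundtrip to the database.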

    When reports are requested by Users, the filters are typically specified beforehand, though in time other Users come to use the same report for other purposes and request additional filters in order to select the chunk of data they need. Adding a new filter or modifying an existing one requires a change to the report too, which in the end requires technical resources to develop, test and document the change. Such gradual changes become quite expensive, therefore the filtering requirements should be captured and addressed early in the design phase. A first problem related to filters is to identify the attributes most likely to be used by Users for filtering, which is not as easy as it seems because it requires a good understanding of the data and of the business as a whole. On the other hand, a filter could be provided for each attribute of the report, much like Excel lets Users do, though this approach could also be expensive in terms of performance and costs: many attributes end up never being used, and the filter screen is overloaded unnecessarily, in some cases with a direct impact on the report’s performance.

    Speaking from experience, the focus for filtering is typically on Master Data (e.g. Customers, Vendors, Items), Document Numbers (e.g. PO Number, Invoice Number), Document Dates (e.g. Order Date, Invoice Date) or any other Dates important for the business, lists of values (e.g. Statuses), and occasionally amounts or quantities. Still, maybe 50-70% of the attributes of a report match these criteria, the average report having between 20 and 40 attributes. The list of attributes used for filtering can differ between reports based on the same data, the selection depending on the report’s type and level of detail. For example, in a PO report it would be useful to filter on the PO Number, Vendor, Buyer, Item Number, Open/Closed POs, etc.; in summary reports, by contrast, it might be necessary to select the top x vendors based on the amounts spent, or the POs with a price variation in a certain interval.

    A second problem is how to make use of the existing controls (text boxes, dropdowns, check boxes, radio buttons, mask controls, etc.) in order to provide Users with easy-to-use and flexible filtering. An inappropriate use of controls can considerably decrease a report’s performance, even making the report unusable. There is no big philosophy behind choosing the controls; the choice depends mainly on the attributes’ data type (numeric, alphanumeric, date or boolean) and on whether single or multiple selection of values is required. List boxes, dropdowns, checkboxes and even text boxes, when the values are comma delimited, allow filtering on multiple values, another technique for the same purpose being the use of wildcards.
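
    The comma-delimited text box mentioned above could be turned into a query constraint roughly as follows. This is only a sketch: build_in_clause is a hypothetical helper, the "?" placeholder style is an assumption that depends on the database driver, and the column name is assumed to come from the report definition, never from the User.

```python
def build_in_clause(column, raw_text):
    """Turn 'Open, Closed' into ('Status IN (?, ?)', ['Open', 'Closed']).

    Returns an empty condition when the text box holds no usable value,
    so an optional filter simply adds no constraint.
    """
    # Split on commas and drop empty entries such as those from "a,,b".
    values = [v.strip() for v in raw_text.split(",") if v.strip()]
    if not values:
        return "", []
    # One placeholder per value; the values travel as bind parameters.
    placeholders = ", ".join("?" for _ in values)
    return f"{column} IN ({placeholders})", values
```

    Passing the values as bind parameters rather than concatenating them into the query text avoids quoting problems when the queries are constructed dynamically.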

Alphanumeric Attributes

    As a general guideline, for alphanumeric (text) values with high variability (e.g. PO Number, Serial Number) it’s preferable to use a free-entry text box, which could in addition allow the use of wildcards. For alphanumeric values with relatively small variability it’s preferable to use a (multi-selection) dropdown control or controls with similar features (e.g. list box control, text box + list box); such controls are populated with values from the database each time the screen is loaded, which makes them unsuitable for huge lists of values. Typically, if the list has more than 100-200 values, the use of text boxes should definitely be considered, possibly together with search capabilities; alternatively, when the rate of change of such lists is negligible, the list could be cached.
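
    One possible way to support wildcards in a free-entry text box is to translate the User’s pattern into a SQL LIKE pattern. The sketch below assumes that Users type * for any sequence of characters and ? for a single character, and that the database accepts an ESCAPE clause; to_like_pattern is a hypothetical name.

```python
def to_like_pattern(user_text, escape="\\"):
    """Translate '*' and '?' into SQL LIKE wildcards ('%' and '_').

    Characters that are special to LIKE are escaped so that they are
    matched literally when typed by the User.
    """
    out = []
    for ch in user_text:
        if ch == "*":
            out.append("%")          # any sequence of characters
        elif ch == "?":
            out.append("_")          # exactly one character
        elif ch in ("%", "_", escape):
            out.append(escape + ch)  # literal character, escaped
        else:
            out.append(ch)
    return "".join(out)
```

    The result would be bound as a parameter in a constraint such as PONumber LIKE ? ESCAPE '\', so that PO-10* retrieves all the PO Numbers starting with PO-10.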

Date Attributes

    For date attributes it’s advisable to use a From Date/To Date combination of controls, which allows Users to limit the report’s scope to a certain time interval. If one of the two values is missing, only the bound that was specified is applied. The date format can be enforced with the help of a mask control, which allows only the selection of valid dates, or the expected format should at least be specified next to the control.
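
    The From Date/To Date behavior, including the case in which only one bound is given, might look like the sketch below. build_date_range is a hypothetical helper and the "?" placeholders are an assumption about the database driver.

```python
from datetime import date


def build_date_range(column, from_date=None, to_date=None):
    """Build a constraint for an optional From Date/To Date pair.

    A missing bound adds no condition, so only the specified limit applies.
    The upper bound is inclusive, which assumes the column holds dates
    without a time part.
    """
    conditions, params = [], []
    if from_date is not None:
        conditions.append(f"{column} >= ?")
        params.append(from_date.isoformat())
    if to_date is not None:
        conditions.append(f"{column} <= ?")
        params.append(to_date.isoformat())
    return " AND ".join(conditions), params
```

    For example, build_date_range("OrderDate", date(2010, 1, 1)) limits the report only from below, leaving the interval open towards the present.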

Numeric Attributes

    For numeric values normal text boxes will do when masked controls can’t be used; text boxes also allow wildcards for attributes that could typically be alphanumeric too (e.g. document numbers). For numeric intervals a pair of controls should be used for the lower and upper bound, behaving similarly to the date intervals. The various types of decimal delimiters need to be differentiated, therefore it’s recommended to specify the delimiter used, or at least to transform the numeric value in the background to the format expected by the database or tool. The use of controls that allow multi-selection of numeric values, except for delimiter-separated values provided in text boxes, doesn’t really make sense unless the list of possible values is quite small.
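
    Transforming the entered number in the background to a neutral format could be sketched as below. parse_amount is a hypothetical helper; it assumes the decimal delimiter is known (for example from the User’s locale settings) and that the other of the two characters "." and "," acts as thousands separator.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text, decimal_sep=","):
    """Convert text such as '1.234,56' into Decimal('1234.56').

    decimal_sep is the delimiter the User is expected to type; the other
    one of '.' and ',' is treated as a thousands separator and removed.
    """
    thousands_sep = "." if decimal_sep == "," else ","
    cleaned = text.strip().replace(" ", "").replace(thousands_sep, "")
    cleaned = cleaned.replace(decimal_sep, ".")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"not a valid number: {text!r}") from None
    if not value.is_finite():
        # Decimal also accepts 'NaN' and 'Infinity'; reject them here.
        raise ValueError(f"not a valid number: {text!r}")
    return value
```

    The same function can serve both controls of a numeric interval, each bound being parsed separately before it is passed to the query.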

Boolean Attributes

    Boolean data types are used to store the value of attributes that have a false or true state, and possibly Null values; checkboxes and radio buttons can therefore be used for filtering on them. The two-state case can be implemented with a single checkbox when only one of the states is of interest, while two or three checkboxes can be used for selecting more than one value. Radio buttons, on the other hand, allow the selection of only one value: either the false, the true or the missing values. Such filters could also be used to activate hardcoded complex logic that is based on multiple attributes and reduced to a Boolean value.
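
    A three-checkbox filter (true, false, missing) could be mapped to a constraint as in the sketch below. TriState and build_boolean_filter are hypothetical names, the Boolean is assumed to be stored as 1/0 with Null for missing values, and an empty selection is assumed to mean that no restriction applies.

```python
from enum import Enum


class TriState(Enum):
    TRUE = "true"
    FALSE = "false"
    MISSING = "missing"


def build_boolean_filter(column, selected):
    """Build a constraint from the set of checked states.

    Checking nothing or checking everything adds no constraint.
    """
    parts, params = [], []
    if TriState.TRUE in selected:
        parts.append(f"{column} = ?")
        params.append(1)
    if TriState.FALSE in selected:
        parts.append(f"{column} = ?")
        params.append(0)
    if TriState.MISSING in selected:
        # Null never matches '=', so it needs its own test.
        parts.append(f"{column} IS NULL")
    if not parts or len(parts) == 3:
        return "", []
    return "(" + " OR ".join(parts) + ")", params
```

    With radio buttons the selected set would always contain exactly one state, so the same function covers both kinds of controls.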
