A Software Engineer and data professional's blog on SQL, data, databases, data architectures, data management, programming, Software Engineering, Project Management, ERP implementation and other IT related topics.
Pages
- 🏠Home
- 🗃️Definitions
- 🔢SQL Server
- 🎞️SQL Server: VoD
- 🏭Fabric
- 🎞️Fabric: VoD
- ⚡Power BI
- 🎞️Power BI: VoD
- 📚Data
- 📚Engineering
- 📚Management
- 📚SQL Server
- 🎞️D365: VoD
- 📚Systems Thinking
- ✂...Quotes
- 🧾D365: GL
- 💸D365: AP
- 💰D365: AR
- 🏠D365: FA
- 👥D365: HR
- ⛓️D365: SCM
- 🔤Acronyms
- 🪢Experts
- 🔠Dataviz & BI
- 🔠D365
- 🔠Fabric
- 🔠Engineering
- 🔠Management
- 🔡Glossary
- 🌐Resources
- 🏺Dataviz
- 🗺️Social
- 📅Events
- ℹ️ About
03 February 2010
💎SQL Reloaded: Ways of Looking at Data III
🕋Data Warehousing: Data Mart [DM] (Definitions)
💎SQL Reloaded: Just In CASE I (Introduction)
CASE combined with other operators and functions enable to create powerful transformations, many of the transformations done during ETL processing could be solved with a CASE. The following SQL script based on Production.Product table from AdventureWorks database relies heavily on the use of CASE function in order to transform the data to a form intelligible for the Users or any other consumers.
02 February 2010
🕋Data Warehousing: Data Warehouse [DWH] (Definitions)
💎SQL Reloaded: Ways of Looking at Data II
Following PO Header’s example and logical partitioning , could be created a view based only on PO Details and the child tables, though as in this case there are no child tables – does it makes sense to create a view? Yes, why not?! The logic encapsulated by a view doesn’t resume only to joining tables, but also to provide a restrained list of attributes, computed columns or limit the scope by using constraints.
The alternative for partitioned view is to create an extended view that include PO Header and PO Details, such type of view is really easy to use by most of the users. I prefer to use both approaches, having thus more flexibility.
The * (ALL) has been used only in order to minimize query’s length, as a good practice avoid using the “*” and provide instead only the list of attributes needed!
Other types of reports are showing the data at different aggregations, for example calculating the Total Amount spent for each Vendor/Product, the Outstanding Amount, etc.
01 February 2010
🕋Data Warehousing: Cube (Definitions)
"A subset of data, usually constructed from a data warehouse, that is organized and summarized into a multidimensional structure defined by a set of dimensions and measures. A cube's data is stored in one or more partitions." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)
"Name for a dimensional structure on a multidimensional or online analytical processing (OLAP) database platform, originally referring to the simple three-dimension case of product, market, and time." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed, 2002)
"Proprietary data structure used to store data for an online analytical processing (OLAP) end user data access and analysis tool." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)
"A multidimensional data structure that represents the intersections of each unique combination of dimensions. At each intersection there is a cell that contains a data value." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)
"Used with online analytical processing (OLAP), data cubes are multidimensional structures built from one or more tables in a relational database(s)." (Sara Morganand & Tobias Thernstrom , "MCITP Self-Paced Training Kit : Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)
"A multidimensional structure that contains dimensions and measures." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)
"A multidimensional structure that contains dimensions and measures. Cubes are a denormalized version of either the entire database or part of the database and are used within SQL Server Analysis Services (SSAS)." (Robert D. Schneider and Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)
"A set of data that is organized and summarized into a multidimensional structure defined by a set of dimensions and measures." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)
"A database object that organizes data for accessibility in an OLAP database." (Ken Withee, "Microsoft® Business Intelligence For Dummies®", 2010)
"A multi-dimensional data structure that contains an aggregate value at each point, i.e., the result of applying an aggregate function to an underlying relation. Data cubes are used to implement OLAP." (DAMA International, "The DAMA Dictionary of Data Management", 2011)
"Refers to the multidimensional data structure used to store and manipulate data in a multidimensional DBMS. The location of each data value in the data cube is based on the x-, y-, and z-axes of the cube. Data cubes are static (must be created before they are used), so they cannot be created by an ad hoc query." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed, 2011)
"A set of data that is organized and summarized into a multidimensional structure that is defined by a set of dimensions and measures." (Microsoft, "SQL Server 2012 Glossary", 2012)
"A multidimensional representation of data needed for online analytical processing, multidimensional reporting, or multidimensional planning applications." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)
"Cubes, also known as OLAP cubes, are preprocessed and presummarized collections of data that drastically improve query time. [...] OLAP cubes are logical structures as defined by the metadata." (Piethein Strengholt, "Data Management at Scale", 2020)
🕋Data Warehousing: Data Warehousing (Definitions)
🧭Business Intelligence: Enterprise Reporting (Part VI: Filtering)
|  | 
| Business Intelligence Series | 
Most of the values used in query constraints can be considered as parameters whose values are provided by filters; it’s even recommended to use filters instead of hard-coding values, improving thus reports’ flexibility. There are optional and mandatory parameters, in the later case a value needing to be provided in the corresponding filter, therefore such filter controls should be marked correspondingly. The actual values, especially in case of date values, should be validated before they are fetched as early as possible and should be avoided the not necessary roundtrips to the server.
In case of reports requested by Users the filters are typically specified beforehand, though in time other Users arrive to use the same report for other purposes, requesting additional filters in order to select the needed chunk of data. Adding a new filter of modifying an existing one requires a change to be done to report too, requiring in the end the availability of technical resources to develop, test and document the changes. Such gradual changes arrive to be quite expensive, therefore it should be targeted to capture and address the filtering requirements early in the design phase.
Talking from experience the focus for filtering is typically on Master Data (e.g. Customers, Vendors, Items, etc.), Document Numbers (e.g. PO Number, Invoice Number), Document Dates (e.g. Order Date, Invoice Date) or any other Dates important for the business, list of values (e.g. Statuses), and occasionally amounts or quantities, though maybe 50-70% of the attributes from a report match these criteria, the average report having between 20 to 40 attributes. The list of attributes used for filtering could be different for reports based on the same data, attribute’s selection depending on report’s type and level of detail. For example in case of a PO report it would be useful to filter for the PO Number, Vendor, Buyer, Item Number, Open/Closed Pos, etc.; in exchange in summary reports it might be needed to select the top x vendors based on the amounts spent or to select the POs with the price variation in a certain interval, etc.
A second problem is how to make use of the existing controls (text boxes, dropdowns, check boxes, radio buttons, mask controls, etc.) in order to provide Users easy-to-use and flexible filtering. The fact is that an inappropriate use of controls could decrease considerably reports’ performance, making a report even unusable. There is actually no big philosophy about how to choose the controls, this being depending mainly or attributes’ data type, of weather they are numeric, alphanumeric, date or boolean, and whether one or multi selection of values is required. List boxes, dropdowns, checkboxes and even textboxes, when the values are comma delimited, allow filtering on multiple of values, another technique for the same purpose being the use of wildcards.
Alphanumeric Attributes
As a general guideline for alphanumeric (text) values (e.g. PO Number, Serial Number) that have high value variability, it’s preferable to use a free entry text box, which could allow in addition the use of wildcards. For alphanumeric value with relatively small variability it’s preferable to use a (multi-selection) dropdown control or controls with similar features (e.g. list box control, text box + list box); such controls are populated with values from database each time the screen is loaded, fact that doesn’t recommend their use for huge list of values. Typically if the list of values have more than 100-200 values then it should be definitely considered the use of text boxes, eventually with the use of search capabilities or, given the fact that the rate of change of such lists can be ignored, the list could be cached.
Date Attributes
For date attributes it’s advisable to use a From Date/To Date combination of controls, which allow Users to limit report’s scope for a certain time interval. If one of the values is missing, then there will be a limit imposed only for the specified value. Date’s format could be enforced with the help of a mask control, which allow the selection of valid dates, or at least specify the expected format.
Numeric Attributes
For numeric values the normal text boxes would do when masked controls can’t be used, the former allowing using wildcards for the attributes that typically could be also alphanumeric (e.g. document numbers). For numeric intervals should be used a pair of controls for the upper and lower bound, having similar behavior as the date intervals. It is needed to differentiate between the various types of decimal delimiters, therefore it’s recommended to specify the delimiter used or at least transform in the background the numeric value to the format expected by database/tool. The use of controls that allow multi-selection of numeric values, excepting delimiter separated values provided in text boxes, doesn’t really makes sense unless the list of possible it’s quite small.
Boolean Attributes
Boolean data types are used to store the value of attributes that have false or true state, and eventually Null values, therefore checkboxes and radio buttons could be used for this purpose. The bi-state could be easily implemented only with a checkbox when is interested only one of the aspects, while two or three checkboxes can be used for selecting more than one value. On the other side radio buttons allow the selection of only one value, thus select either the false, true or missing values. Such filters could be also used to activate hardcoded complex logic based on multiple attributes and reduced to Boolean values.
31 January 2010
🧭Business Intelligence: Enterprise Reporting (Part V: Choosing Report’s Attributes)
|  | 
| Business Intelligence Series | 
When choosing the attributes for a report, there are several important sets of attributes which needs to be considered:
Unique identifiers
Together with the various Names (e.g. Vendor Name, Customer Name) associated with entities, a report should include also the “unique identifier” (UID) for each entity, even if formed from one or more attributes. The UID allows identifying for example if duplicate records appear in report or it could be used to match/join the data from the reports with other data sets in order to pull details or for further analysis of data. For example in a PO report over PO Shipments a unique (natural) key could be identified by using the PO Number, Line Number and Shipment Number; for a Vendor could be used the Vendor Name or the GSL (Global Supplier Location) Number, though the later it’s more adequate because it’s more general and accurate, making easier Vendor’s identification. In theory, for the same scope could be used also the database (surrogate) unique identifier from PO Shipments table, the elements dictating report’s level of detail, respectively the Vendor ID, though even if surrogate UID are easier to use in joins, they could create confusion and overload the reports, given that surrogate UIDs need to be provided also for the other elements.
Documents like Invoices include an external and internal unique identifier, the Invoice Number together with the Vendor, typically unique in a system, form the external UID, while the Document or Voucher Number is used as internal UID. The external UID it’s easier to use for external-based considerations, while the internal UID it’s easier to use for internal needs, so it makes sense to include both types of unique identifiers.
Quantities & Related Attributes
In Item-related reports, most of the times it makes sense to include also the quantities (e.g. Transaction, Ordered, Delivered, Invoiced, On-Hand Quantities) together with the Unit of Measure (UOM) in which they are represented. It has to be made distinction between the Primary UOM, the UOM in which the item is stored, and the Transactional UOM, the UOM in which the Item is transacted; for example the Purchasing UOM, Sales UOM or Transaction UOM could be different than the Primary UOM in which the item is stored in Warehouse. In such cases together with the Transactional UOM should be provided also the Primary UOM and eventually the UOM Conversion Rate, when applicable.
Prices/Amounts & Related Attributes
For Item-related reports and not only, include the various Prices (e.g. Sales, Purchase, Standard Price) together with the Currency Code used even if only one Currency is used, same rule applying also for the amounts stored (e.g. Invoice, Sales Order, Purchase amounts). For financial reports it’s advisable to show both functional amounts, the amounts in the Currency used by GL (General Ledger), and transactional amounts, the Currency used in the transaction. When the level of details allows it, show also the Quantity, Price Unit used to calculate the amounts, the eventual Exchange Rate or UOM Conversion Rate used. When available, include also the Period when the Amount was booked in the system.
Dates
Typically should be included the Document Date (e.g. Invoice Date, Order Date) and Document Creation Date, together with the other Dates important for the business or data analysis (e.g. Need By Date, GL Date, Value Date). In general the Document Date or Document Creation Date, and GL Date for financial reports, should be mandatory attributes because they could be used to segment (partition) a data set in time units (e.g. days, weeks, months, periods, years, etc.).
Statuses
The various record statuses and document statuses should be again mandatory attributes in reports. Record statuses show whether a record is active, was cancelled or marked as deleted, while document statuses show documents’ processing status, often being associated with a workflow (e.g. approval or processing workflows). The record statuses could be synchronized and even merged with the document statuses.
Either expressed as flags or list of values, statuses are essential in delimiting the data set that needs to be considered for further calculations, because often not approved documents or cancelled records have low or no relevance for the business. Not approved documents are typically not considered for the various calculations until they were not approved, while cancelled records are associated with mistakes or the lack of need. Not being able to identify the active records can mess things pretty badly, because for example there are reports that show only active, while others show all the data available in a system. Therefore showing of statuses in reports can be important in the mitigation of differences between reports, especially when dealing with calculations.
It’s advisable to have the possibility to see also the cancelled records, for example in order to analyze the amount of waste expressed as overwork or for identifying the records that were cancelled by mistake.
In reports with multiple levels of details, it can be useful to show the statuses from all levels, as statuses might not be in sync or because they have different meaning. In theory, when the statuses are in synch and especially when considering cancellations, it should be enough to consider the status from the lowest level of detail from each logical entity (e.g. PO Shipment Status when considering PO, Invoice Line status when considering Invoices, both mentioned statuses when considering POs together with Invoices), though reality can prove to be a tough world for statuses, as programming errors and other business scenarios need to be considered.
Action Owners
Include Requestors, Document Preparers, Buyers, Managers or any other type of action owners, so a user can track the direct or indirect issues back to them.
Note:
Such attributes can be used as base to calculate/reflect action owner’s performance, fact that can infringe country or organization regulations so you need to check if there are any constraints in this direction and which set of attributes might be impacted. For example might be no problem to show the Buyer, though might be a problem to show information about who created/modified the record. Eventually if needed to calculate the performance at action owner level, substitute any attribute that can be used to identify a person with a random value, however if the mapping between the action owner and value used as substitute is known (in case unique identifiers are used) or easy to get (by checking records in the system), the data might be misused.
🧭Business Intelligence: Enterprise Reporting (Part IV: Design Best Practices)
|  | 
| Business Intelligence Series | 
Report’s Design pest practices:
- Reduce unnecessary network traffic.
- Group together the attributes coming from the same table/element/module.
- Use uniform coding style, formatting and naming conventions.
- Handle missing values (NULLs).
- Avoid hard-coding values.
- Include filter information, the date when the report was run and eventually the name of the Use who run the report.
- Design the report to fit screen’s resolution (e.g. 1280x1024) [1].
- Use existing reports as templates [1].
- Consider local date/time [1], currencies, decimal formatting.
- Highlight important data accordingly (e.g. styles & conditional formatting).
- Separating labels text from expressions [1].
- Use pagination in order to improve performance for large reports.
- Avoid blank pages.
- Keep objects together.
- Use flexible easy-to-use filters.
- Validate input filter data in early phases
- Provide a report for each level of detail.
- Backup aggregated reports with detailed reports - Link reports by using drill-down, drill-though, sub-reports techniques.
- Include unique identifiers for each important data element.
- Include a running number.
- Don’t overload the reports.
Report’s Management best practices:
- Provide an integrated reporting solution and integrate/align it with Data/Information/Knowledge Management strategy/vision.
- Align reports with organization’ strategic, financial and operational plans.
- Align reports with decisions making and performance management .
- Assign an Owner for each report.
- Provide metadata (e.g. Owner, Scope, Business Case, Attributes’ meaning, when the data were last time updated etc).
- Stabilize the requirements before creating the report.
- Train the Users how to use efficiently the reports/reporting tools. - Create processes for report’s creation, modification, testing, issues reporting and mitigation.
- Track report’s usage and performance
- Write unit tests & test the reports.
- Use versioning & keep older versions.
- Document reports (queries and dependent objects, filtering parameters, scope, business case, etc).
- Minimize reporting server’s workload.
- Archive/Backup important reports.
Report’s Security best practices:
- Use role-based security.
- Protect sensitive data.
- Enforce password best practices.
- Educate users about security concerns related to reporting tool’s security and security of data usage.
- Export data to formats that don’t allow data tempering.
References:
[1] Microsoft TechNet. (2010). Best Practices for Reporting. [Online] Available from: http://technet.microsoft.com/en-us/library/cc180385.aspx (Accessed: 31 January 2010).
30 January 2010
💎SQL Reloaded: Ways of Looking at Data I (Some Basics)
Using the logical partitioning of views, an alternative view on PO Headers would be created without showing the Vendor end Employee information, following to use if necessary the corresponding views when needed. AdventureWorks includes a view on Vendors called Purchasing.vVendor though the level of detail is at Contact level, and a Vendor could have more than one contacts, therefore the view can’t be used for this purpose, following to create a second view called Purchasing.vVendors (if the name creates confusion you could rename it or rename the Purchasing.vVendor to reflect the fact that the level is Vendor Contacts, for example Purchasing.vVendorContacts).
Most of the data analysis focuses only for a certain list of vendors, countries or regions, for a certain time interval, or for any other special requirements, for example pending Pos , in order to approximate the future expenses, or for the rejected Pos, in order to see the volume of waste/rework or to quantify the lost opportunities. The above queries were created for general cases and they could be modified for specific requirements if needed.
About Me
 
- Adrian
- Koeln, NRW, Germany
- IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.
