Showing posts with label design. Show all posts

04 February 2025

🧭Business Intelligence: Perspectives (Part XXVI: Monitoring - A Cockpit View)

Business Intelligence Series

The monitoring of business imperatives is sometimes compared metaphorically with piloting an airplane, where pilots look at the cockpit instruments to verify whether everything is under control and the flight proceeds according to expectations. The use of a cockpit is supported by the fact that an airplane is an almost "closed" system in which the components were developed under strict requirements and tested thoroughly under specific technical conditions. Many instruments were engineered and evolved over decades to operate as such. The processes are standardized, and inputs and outputs are under strict control; otherwise the whole edifice would crumble under its own complexity.

In organizational setups, a similar approach is attempted for monitoring the most important aspects of a business. A few dashboards and reports are thus built to monitor and control what’s happening in the areas which were identified as critical for the organization. The various gauges and other visuals were designed to provide perspectives similar to the ones provided by an airplane’s cockpit. At first sight the cockpit metaphor makes sense, though on closer analysis there are major differences.

Probably, the main difference is that businesses don’t necessarily have standardized processes that were brought under control (and thus have variation). Secondly, the data used doesn’t necessarily have the needed quality and occasionally isn’t fit for use in the business processes, including supporting processes like reporting or decision making. Thirdly, chances are high that the monitoring within the BI infrastructures doesn’t address the critical aspects of the business, at least not at the needed level of focus, detail or frequency. The interplay between these three main aspects can lead to complex issues and a muddy ground for a business to build a stable edifice upon.

The comparison with an airplane’s cockpit was chosen because the number of instruments available for monitoring is somewhat comparable with the number of visuals existing in an organization. In contrast, cars have a smaller number of controls, simple enough to help the one(s) sitting in the driver's seat. A car’s monitoring capabilities can probably reflect the needs of single departments or teams, though each unit needs its own gauges with specific business focus. The parallel is however limited because the areas of focus in organizations can change and shift in other directions; some topics may have a periodic character while others can regain momentum after a long time.

There are further important aspects. At high level, the expectation is for software products and processes, including the ones related to BI topics, to have the same stability and quality as the mass production of automobiles, airplanes or other artifacts that have similar complexity and manufacturing characteristics. Even if the design processes of software and manufacturing may share many characteristics, the two diverge as soon as the production processes start, respectively progress, and these are the areas where most of the differences lie. Starting from the requirements and ending with the overall goals, everything resembles quickly shifting sands on which it is challenging to build any stable edifice.

At micro level in manufacturing each piece was carefully designed and produced according to a set of characteristics that were proved to work. Everything must fit perfectly in the grand design and there are many tests and steps to make sure that happens. To some degree the same is attempted when building software products, though the processes break along the way under the many changes attempted and the many cost, time and quality constraints. At some point the overall complexity kicks back; it might still be manageable, though the overall effort is higher than what organizations bargained for.

24 January 2025

🧭Business Intelligence: Perspectives (Part XXIV: Building Castles in the Air)

Business Intelligence Series

Business users have mainly three means of visualizing data – reports, dashboards and more recently notebooks, the latter being a mix between reports and dashboards. Given that all three types of display can be a mix of tabular representations and visuals/visualizations, the difference between them is often negligible to the degree that the terms are used interchangeably.

For example, in Power BI a report is a "multi-perspective view into a single semantic model, with visualizations that represent different findings and insights from that semantic model" [1], while a dashboard is "a single page, often called a canvas, that uses visualizations to tell a story" [1], a dashboard’s visuals coming from one or more reports [2]. Despite this clear delimitation, the two concepts continue to be mixed and misused in conversations even by data-related professionals. This also happens because in other tools the vendors designate as dashboard what is called report in Power BI.

Given the limited terminology, it’s easy to generalize that dashboards are useless, poorly designed, bad for business users, and so on. As Stephen Few recognized almost two decades ago, "most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations" [3]. Therefore, when people say that "dashboards are bad" they refer to the result of poor implementations, some of which they were part of, which frankly is a different topic! Unfortunately, BI implementations reflect probably more than any other area how easy it is to fail!

Frankly, here it is not necessarily the poor implementation of a project management methodology at fault, which quite often happens, but the way requirements are defined, understood, documented and implemented. Even if these last aspects are part of the methodologies, they are merely a reflection of how people understand the business. The outcomes of BI implementations are rooted in other areas, and it starts with how the strategic goals and objectives are defined, and how the elements that need oversight are considered in the broader perspectives. The dashboards thus become the end-result of a chain of failures, of failing to build the business-related foundation on which the reporting infrastructure should be based. It’s so much easier to shift the blame onto what’s perceptible than onto what’s missing!

Many dashboards are built because people need a sense of what’s happening in the business. It starts with some ideas based on the problems identified in organizations, one or more dashboards are built, and sometimes a lot of time is invested in the process. Then, some important progress is made, and everything comes to a standstill if the numbers don’t reveal something new, important, or whatever users expect to see. Some might regard this as failure, though as long as the initial objectives were met, something was learned in the process and a difference was made, one can’t equate this with failure!

It’s more important to recognize the temporary character of dashboards, respectively of the requirements that lead to them, and to build around them. Of course, this occasionally requires a different approach to the whole topic. It starts with how KPIs and other business metrics are defined and organized, respectively with how data repositories are built, and it ends with how data are visualized and reported.

As practice has often revealed, it’s possible to build castles in the air, without a solid foundation, though the expectation for such edifices to sustain the weight of businesses is unrealistic. Such edifices break with the first strong storm and unfortunately it's easier to blame a set of tools, some people or a whole department instead of looking critically at the whole organization!

[1] Microsoft Learn (2024) Power BI: Glossary [link]
[2] Microsoft Learn (2024) Power BI: Dashboards for business users of the Power BI service [link]
[3] Stephen Few, "Information Dashboard Design", 2006

12 December 2024

🧭💹Business Intelligence: Perspectives (Part XIX: Data Visualization between Art, Pragmatism and Kitsch)

Business Intelligence Series

The data visualizations (aka dataviz) presented in the media, especially the ones coming from graphical artists, have the power to help us develop what is called graphical intelligence, graphical culture, graphical sense, etc. Without a tutor-like experience, however, the process is suboptimal because it depends on our ability to identify what is important and which steps are needed for decoding and interpreting such work, respectively for integrating its messages into our overall understanding of the world.

When such a skillset is lacking, without explicit annotations or other forms of support, the reader might misinterpret or fail to observe important visual cues even for simple visualizations, with all the implications deriving from this – a false understanding, and further aspects deriving from it, this being probably the most important aspect to consider. Unfortunately, even the most elaborate work can fail if the reader doesn’t have a basic understanding of all that’s implied in the process.

The books of Willard Brinton, Anna Rogers, Jacques Bertin, William Cleveland, Leland Wilkinson, Stephen Few, Alberto Cairo, Scott Berinato and many others can help the readers build a general understanding of the dataviz process and how data visualizations or simple graphics can be used/misused effectively, though each reader must follow his/her own journey. It’s also true that the basics can be easily learned, though the deeper one dives, the more interesting and nontrivial the journey becomes. Fortunately, the average reader can stick to the basics and many visualizations are simple enough to be understood.

To grasp the full extent of the implications, one can make comparisons with the domain of poetry where the author uses basic constructs like metaphor, comparisons, rhythm and epithets to create, communicate and imprint in the reader’s mind old and new meanings, images and feelings altogether. Artistic data visualizations tend to offer a similar charge as poetry does, even if the impact might not appeal so much to our artistic sensibility. From this perspective, dataviz is, or at least resembles, an art form.

Many people can write verses, though only a fraction can write good meaningful poetry, of which a smaller fraction get poems published, and even fewer get books published. Conversely, not everything can be expressed in verses unless one finds good metaphors and other aspects that can be leveraged in the process. The same can be said about good dataviz.

One can argue that in dataviz the author can explore and learn especially by failing fast (seeing what works and what doesn’t). One can also innovate, though the creator has probably a limited set of tools and rules for communication. Enabling readers to see the obvious or the hidden in complex visualizations or contexts requires skill and some kind of mastery of the visual form.

Therefore, dataviz must be more pragmatic and show the facts. In art one has the freedom to distort or move things around to create new meanings, while in dataviz it’s important for the meaning to be rooted in 'truth', at least by definition. The more the creator of a dataviz innovates, the higher the chances of being misunderstood. Moreover, readers need to be educated in interpreting the new meanings and get used to their continuous use.

Kitsch is a term applied to art and design that is perceived as naïve imitation to the degree that it becomes a waste of resources even if somebody pays the price tag. There’s a trend in dataviz to add elements to visualizations that don’t bring any intrinsic value – images, colors and other elements can be misused to the degree that the result resembles kitsch, and the overall value of the visualization is diminished considerably.

01 September 2024

🗄️Data Management: Data Governance (Part I: No Guild of Heroes)

Data Management Series

Data governance appeared as a topic around the 1980s, though it gained popularity in the early 2000s [1]. Twenty years later, organizations still miss the mark, respectively fail to understand and implement it in a consistent manner. As usual, the reasons for failure are multiple and they vary from misunderstanding what governance is all about to poor implementation of methodologies and inadequate management or leadership.

Moreover, methodologies tend to idealize the various aspects, which is not what organizations need; they need pragmatism. For example, data governance is not about heroes and heroism [2]; such framing can give the impression that heroic actions are involved, which is not the case! Actions for the sake of action don’t necessarily lead to change by themselves. Organizations are in general good at creating meaningless action without results, especially when people keep themselves busy while missing or ignoring the mark. Big organizations are very good at generating actions without effects.

People do talk to each other, though they try to solve their own problems and optimize their own areas without necessarily thinking about the bigger picture. The problem is not necessarily communication or the lack of depth into business issues; people do communicate and know the issues even without a business impact assessment. The challenge is usually in convincing the upper management that the effort needs to be consolidated and supported, respectively that the needed resources need to be made available.

Probably, one of the issues with data governance is the attempt to create another structure in the organization focused on quality, which has a good chance of failing, and unfortunately does fail. Many issues appear when the structure gains weight and becomes a separate entity instead of being the backbone of the organization.

As soon as organizations separate data governance from the key users, management and the other important decision makers in the organization, it takes on a life of its own that is likely to diverge from the initial construct. Then, organizations need "alignment" and probably other big words to coordinate the effort. Such constructs can also work, but they are suboptimal because the forces will always pull in different directions.

Making each manager and the upper management responsible for governance is probably the way to go, though they’ll need the time for it. In theory, this can be achieved when many of the issues are solved at the lower level, when automation and further aspects allow them to supervise things, rather than hiding behind every issue. 

When too much micromanagement is involved, people tend to busy themselves with topics rather than solve the issues they are confronted with. The actual actors need to be empowered to take decisions and optimize their work when needed. Kaizen, the philosophy of continuous improvement, has proved that it works when applied correctly. The actors will need the knowledge, skills, time and support to do it though. One of the dangers is however that this becomes a full-time responsibility, which tends to create a separate entity again.

The challenge for organizations lies probably in the friction between where they are and what they must do to move forward toward the various objectives. Moving in small rapid steps is probably the way to go, though each person must be aware when something doesn’t work as expected and react. That’s probably the most important aspect. 

So, the more functions are created that diverge from the actual organization, the higher the chances of failure. Unfortunately, failure becomes visible only in the later phases, and thus self-awareness, self-control and other similar “qualities” are needed, like small actors that keep the system in check and react whenever needed. Ideally, the employees are the best resources to react whenever something doesn’t work as per design.


[1] Wikipedia (2023) Data Management [link]
[2] Tiankai Feng (2023) How to Turn Your Data Team Into Governance Heroes [link]

06 April 2024

🧭Business Intelligence: Why Data Projects Fail to Deliver Real-Life Impact (Part I: First Thoughts)

Business Intelligence
Business Intelligence Series

A data project has a set of assumptions and requirements that must be met, otherwise the project has a high chance of failing. It starts with a clear idea of the goals and objectives, and they need to be achievable and feasible, with the involvement of key stakeholders and the executive without which it’s impossible to change the organization’s data culture. Ideally, there should also be a business strategy, respectively a data strategy available to understand the driving forces and the broader requirements. 

An organization’s readiness is important not only in what concerns the data but also the things revolving around the data - processes, systems, decision-making, requirements management, project management, etc. One of the challenges is that the systems and processes available can’t be used as they are for answering important business questions. Many such questions are quite basic, though unavailability or poor quality of data makes answering them challenging if not impossible.

Thus, when starting a data project an organization must be ready to change some of its processes to address a project’s needs, and thus the project can become more expensive as changes need to be made to the systems. For many organizations the best time to have done this was when they implemented the system, respectively the integration(s) between systems. Any changes made after that come in theory with higher costs derived from systems and processes’ redesign.

Many projects start big and data projects are no exception to this. Some of them build a costly infrastructure without first analyzing the feasibility of the investment, or at least whether the data can form a basis for answering the targeted questions. On one side one can torture any dataset and some knowledge will be obtained from it (aka data will confess), though few datasets can produce valuable insights, and this is where probably many data projects oversell their potential. Conversely, some initiatives are worth pursuing even only for the sake of the exposure and experience the employees get. However, trying to build something big only through the perspective of one project can easily become a disaster. 

When building a data infrastructure, the project needs to be an initiative given the transformative potential such an endeavor can have for the organization, and the different aspects must be managed accordingly. It starts with the management of stakeholders’ expectations, with building a data strategy, respectively with addressing the opportunities and risks associated with the broader context.

Organizations recognize that they aren’t capable of planning and executing such a project or initiative, and they search for a partner to lead the way. Becoming such a partner overnight is more than a challenge as a good understanding of the industry and the business is needed. Some service providers have such knowledge, at least in theory, though the leap from knowledge to results can prove to be a challenge even for experienced service providers.

Many projects follow the pattern: the service provider comes, analyzes the requirements, builds something wonderful, the solution is used for some time and then the business realizes that the result is not what was intended. The causes are multiple and usually form a complex network of causality, though probably the most important aspect is that customers don’t have the in-house technical resources to evaluate the feasibility of requirements and solutions, respectively of the results. Even if organizations involve the best key users, good data professionals or similar resources are also needed, who can become the bond between the business and the service provider. Without such an intermediary the disconnect between the business and the service provider can grow with all the implications.


22 March 2024

🧭Business Intelligence: Perspectives (Part IX: Dashboards Are Dead & Other Crap)

Business Intelligence
Business Intelligence Series

I find annoying the posts that declare that a technology is dead, as they seem to seek the sensational and, in the end, don't offer enough arguments for the positions taken; all is just surfing through a few random ideas. Almost each time I click on such a link I find myself disappointed. Maybe it's just me - having too great expectations from ad-hoc experts who haven't understood the role of technologies and their lifecycle.

At least until now dashboards are the only visual tool that allows displaying related metrics in a consistent manner, reflecting business objectives, health, or other important perspectives into an organization's performance. More recently notebooks seem to be getting closer given their capabilities of presenting data visualizations and some intermediary steps used to obtain the data, though they are still far away from offering similar capabilities. So, where could any justification against dashboards' utility come from? Even if I heard one or two expert voices saying that they don't need KPIs for managing an organization, organizations still need metrics to understand how the organization is doing as a whole and in its parts.

Many argue that the design of dashboards is poor, that they don't reflect data visualization best practices, or that they are too difficult to navigate. There are so many books on dashboard and/or graphic design that it is almost impossible not to find such a book in any big library if one wants to learn more about design. There are many resources online as well, though it's tough to fight a mind's stubbornness in showing no interest in the topic. Conversely, there's also a lot of crap on the social networks that passes in the mainstream as best practices.

Frankly, design is important, though as long as the dashboards show the right data and the organization can guide itself on the respective numbers, the perfectionists can say whatever they want, even if they are right! Unfortunately, the numbers shown in dashboards raise legitimate questions and the reasons are multiple. Do dashboards show the right numbers? Do they focus on the objectives or important issues? Can the numbers be trusted? Do they reflect reality? Can we use them in decision-making?

There are so many things that can go wrong when building a dashboard - there are so many transformations that need to be performed, that the chances of failure are high. It's enough to have several blunders in the code or data visualizations for people to stop trusting the data shown.

Trust and quality are complex concepts and there’s no standard path to address them because they are a matter of perception, which can vary and change dynamically based on the situation. There are, however, approaches that help minimize the issues. One can start for example by providing transparency. For each dashboard, also provide detailed reports that through drilldown (or by running the reports separately if that’s not possible) allow users to validate the numbers from the report. If users don’t trust the data or the report, then they should pinpoint what’s wrong. Of course, the two sources must be in sync, otherwise the validation will become more complex.
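The validation idea can be illustrated with a minimal sketch, assuming the dashboard figures and the detail rows can both be exported (the function name `reconcile`, the sample regions and the tolerance are hypothetical, chosen only for illustration). Each dashboard value is compared with the sum of the corresponding detail rows, and only the keys that differ are reported:

```python
from collections import defaultdict

def reconcile(dashboard_totals, detail_rows, tolerance=0.01):
    """Compare each dashboard figure with the sum of its detail rows.

    dashboard_totals: dict mapping a key (e.g. a region) to the value shown.
    detail_rows: iterable of (key, amount) pairs from the detailed report.
    Returns a dict {key: (shown, detailed)} for values that differ by more
    than `tolerance` (an assumed threshold, adjust to the data at hand).
    """
    detail_sums = defaultdict(float)
    for key, amount in detail_rows:
        detail_sums[key] += amount

    mismatches = {}
    # Check keys from both sides so that missing entries are reported too.
    for key in set(dashboard_totals) | set(detail_sums):
        shown = dashboard_totals.get(key, 0.0)
        detailed = detail_sums.get(key, 0.0)
        if abs(shown - detailed) > tolerance:
            mismatches[key] = (shown, detailed)
    return mismatches

# Hypothetical sample data: the "South" figure does not match its details.
dashboard = {"North": 150.0, "South": 90.0}
details = [("North", 100.0), ("North", 50.0), ("South", 80.0)]
print(reconcile(dashboard, details))  # {'South': (90.0, 80.0)}
```

Such a check helps users pinpoint what is wrong only when both sources are refreshed at the same time; otherwise differences in timing show up as false mismatches.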

There are also issues related to the approach - the way a reporting tool was introduced, the way dashboards flooded the space, how people reacted, etc. Introducing a reporting tool for dashboards is also a matter of strategy, tactics and operations and the various aspects related to them must be addressed. Few organizations address this properly. Many organizations work on the principle "build it and they will come" even if they build the wrong thing!


20 March 2021

🧭Business Intelligence: New Technologies, Old Challenges (Part II - ETL vs. ELT)


Business Intelligence

Data lakes and similar cloud-based repositories drove the requirement of loading the raw data before performing any transformations on the data. At least that’s the approach the new wave of ELT (Extract, Load, Transform) technologies use to handle analytical and data integration workloads, which is probably recommendable for the mentioned cloud-based contexts. However, ELT technologies are especially relevant when there is a need to handle data with high velocity, variance, validity or different value of truth (aka big data). This is because they allow processing the workloads over architectures that can be scaled with the workloads’ demands.

This is probably the most important aspect, even if there can be further advantages, like using built-in connectors to a wide range of sources or implementing complex data flow controls. The ETL (Extract, Transform, Load) tools have the same capabilities, maybe reduced to certain data sources, though their newer versions seem to bridge the gap.

One of the most stressed advantages of ELT is the possibility of having all the (business) data in the repository, though this is not a technological advantage. The same can be obtained via ETL tools, even if this might involve, depending on the case, a bigger effort, the effort depending on the functionality existing in each tool. It’s true that ETL solutions have a narrower scope by loading a subset of the available data, or that transformations are made before loading the data, though this depends on the scope considered while building the data warehouse or data mart, respectively on the design of ETL packages. Both are a matter of choice, choices that can be traced back to business requirements or technical best practices.

Some of the advantages seen are context-dependent – they depend on the context in which the technologies are used, respectively in which the problems are solved. ETL solutions are often faulted for the fact that the available data are already prepared (aggregated, converted) and new requirements will drive additional effort. On the other side, in ELT-based solutions all the data are made available and eventually further transformed, but here too the level of transformations made depends on specific requirements. Independently of the approach used, the data are still available if needed, though further processing involves a certain effort.
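The difference in ordering can be sketched in a few lines, assuming an in-memory SQLite database as the target repository (the table names `raw_sales` and `sales_by_product`, the functions and the sample rows are hypothetical and serve only as illustration). In the ETL variant the aggregation happens in the pipeline and only the result is loaded, while in the ELT variant the raw rows are loaded first and the same aggregation is done inside the target:

```python
import sqlite3

# Hypothetical source extract: (day, product, amount)
source_rows = [
    ("2021-01-05", "A", 10.0),
    ("2021-01-05", "A", 5.0),
    ("2021-01-06", "B", 7.5),
]

def etl(conn, rows):
    """Transform before loading: aggregate in the pipeline, load the result."""
    aggregated = {}
    for _day, product, amount in rows:
        aggregated[product] = aggregated.get(product, 0.0) + amount
    conn.execute("CREATE TABLE sales_by_product (product TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO sales_by_product VALUES (?, ?)", sorted(aggregated.items())
    )

def elt(conn, rows):
    """Load the raw data first, then transform inside the target repository."""
    conn.execute("CREATE TABLE raw_sales (day TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales_by_product AS "
        "SELECT product, SUM(amount) AS total FROM raw_sales GROUP BY product"
    )

def read_totals(conn):
    query = "SELECT product, total FROM sales_by_product ORDER BY product"
    return conn.execute(query).fetchall()

etl_db = sqlite3.connect(":memory:")
elt_db = sqlite3.connect(":memory:")
etl(etl_db, source_rows)
elt(elt_db, source_rows)

# Both approaches yield the same aggregated model ...
print(read_totals(etl_db) == read_totals(elt_db))
# ... but only the ELT target still holds the raw rows for new requirements.
print(elt_db.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0])
```

In this sketch a new requirement (e.g. totals per day) needs another pass over the source in the ETL variant, while in the ELT variant it needs another transformation over `raw_sales`. In both cases further processing effort is involved, which is in line with the argument that the choice is a matter of design rather than of technology.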

Building usable and reliable data models is dependent on good design, and in the design process reside the most important challenges. In theory, some think that in ETL scenarios the design is done beforehand though that’s not necessarily true. One can pull the raw data from the source and build the data models in the target repositories.

Data conversion and cleaning are needed under both approaches. In some scenarios it is ideal to do this upfront, minimizing the effect these processes have on the data’s usage, while in other scenarios it’s helpful to address them later in the process, with the risk that each project will address them differently. This can become an issue and should be ideally addressed by design (e.g. by building an intermediate layer) or at least organizationally (e.g. enforcing best practices).

Advancing that ELT is better just because the data are true (being in raw form) can be taken only as a marketing slogan. The degree of truth the data have depends on the way the data reflect the business’ processes and the way the data are maintained, while their quality is judged entirely on their intended use. Even if raw data allow more flexibility in handling the various requests, the challenges involved in processing can be neglected only at the cost of the consequences that follow from this.

Looking at the cloud-based analytics and data integration technologies, they seem to allow both approaches; building optimal solutions thus relies on professionals’ wisdom in making appropriate choices.


01 February 2021

📦Data Migrations (DM): Quality Assurance (Part I: Quality Acceptance Criteria I)

Data Migration
Data Migrations Series


When designing a Data Migration (DM), respectively any software solution, it’s important to take inventory of the project’s requirements, and to evaluate, document, communicate and monitor them accordingly. Each of them can have an important impact on the solution, as a solution’s success will be validated and judged upon them. Therefore, the identified requirements must be considered as baseline for conceptualization, design, implementation and sign-off, and should go through the same procedures and rigor as other project requirements. The existence of a standardized Requirements Management process can facilitate their management through the project’s lifecycle.

The requirements are usually driven by the source and target systems (e.g. data import/export features, data models and their constraints), the environments they are hosted on (e.g. cloud vs. on-premise), respectively the layers in between (e.g. network, firewalls), and the project and business aspects that need to be considered (e.g. freeze window for the Go-Live, data availability dates, data quality, external dependencies, etc.). They relate to the solution itself as well as to the data and processes involved, and are reflected in, but not limited to, the following important aspects, which, depending on the case, can also be considered as quality acceptance criteria:


Accessibility is the degree to which the data are available for a solution so it can be processed when needed, in the form, by resources, or means intended for processing. It’s critical for a DM solution to access or have available the master, transaction, parameter and further data when needed. The team must make sure that the data become easily accessible. 

Unavailability of data can impact the DM and can easily lead to delays in the project. This also means that the various project activities (parametrization, cleansing, enrichment, development) need to be synchronized with the migration activities. 

Depending on the case, accessibility can also involve the solution itself, expressed as the degree to which it’s available to the resources supposed to use it. Certain architectural decisions can have an impact on the activities carried out. As the solution is usually deployed on a server, it can happen that only a limited number of people are able to access it concurrently. Moreover, a DM’s complexity makes the involvement of multiple developers challenging.


Accountability is the degree to which accountability is enforced for the various resources involved in DM processes and related activities. As multiple resources are involved in parametrization, cleaning, processing, validation and software development, each resource needs to be aware of the extent to which they are accountable. Without accountability made explicit, there’s the danger that the activities are neglected, with all the implications deriving from it – quality deviations, delays, data unavailability, etc.


Adaptability is the degree to which a solution can be adapted to environment or requirement changes. Even if the environments typically don’t change, it doesn’t mean that this will not happen, as the IT infrastructure goes through continuous changes that can affect a migration directly or indirectly. The same can be said about requirements, which however have a higher probability to change even late in the process as new knowledge is acquired and needs to be integrated in the solution.


Atomicity is the degree to which data entities can be processed at the required level of abstraction in an atomic manner. Even if transformations occur during the various stages, the data belonging to an entity need to be kept and processed together (e.g. Customers and their Addresses). This can involve processing attributes in advance even if the data might be required later. There can be situations in which the data belonging to the same entity need to be processed on different paths, though in the end it’s important to keep the data together, when the processing logic allows it.
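A minimal sketch of the Customers and Addresses example follows, assuming records held as dictionaries (the function `migrate_customers`, the field names and the validation rule are hypothetical). It shows one possible policy, not the only one: a customer is migrated together with its addresses, and if any address fails validation the whole entity is held back instead of being migrated partially:

```python
def migrate_customers(customers, addresses, validate_address):
    """Process each customer together with its addresses as one unit.

    Returns (migrated, rejected): the migrated entities with their addresses
    attached, and the ids of customers held back as a whole.
    """
    # Group the addresses by the customer they belong to.
    by_customer = {}
    for address in addresses:
        by_customer.setdefault(address["customer_id"], []).append(address)

    migrated, rejected = [], []
    for customer in customers:
        own = by_customer.get(customer["id"], [])
        if all(validate_address(a) for a in own):
            # Customer and addresses travel together into the target.
            migrated.append({**customer, "addresses": own})
        else:
            # Assumed policy: no partial migration of an entity.
            rejected.append(customer["id"])
    return migrated, rejected

# Hypothetical sample data: customer 2 has an address without a city.
customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
addresses = [
    {"customer_id": 1, "city": "Berlin"},
    {"customer_id": 2, "city": ""},
]
migrated, rejected = migrate_customers(
    customers, addresses, lambda a: bool(a["city"])
)
print([c["name"] for c in migrated], rejected)  # ['Acme'] [2]
```

Whether a failing address should block the whole customer depends on the processing logic and the target system's constraints; the sketch only illustrates keeping the data of an entity together.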


05 January 2021

🧮ERP: Planning (Part II: It’s all about Scope - Nonfunctional Requirements & MVP)

ERP Implementation

Nonfunctional Requirements

In contrast to functional requirements (FRs), nonfunctional requirements (NFRs) have no direct impact on the system’s behavior, affecting instead end-users’ experience with the system. They come down to topics like performance, usability, reliability, compatibility, security, monitoring, maintainability and testability, as well as other constraints and quality attributes. Even if these requirements are in general addressed by design, the changes made to the system have the potential of impacting users’ experience negatively.

Moreover, the NFRs are usually difficult to quantify, and probably that’s why they are seldom made explicit in a formal document or are considered eventually only at high level. However, one can still find a basis for comparison against compliance requirements, general guidelines, standards, best practices or the legacy system(s) (e.g. the performance should not be worse than in the legacy system, the volume of effort for carrying the various activities should not increase). Even if they can’t be adequately described, it’s recommended to list the NFRs in general terms in a formal document (e.g. implementation contract). Failing to do so can open or widen the risk exposure one has, especially when the system lacks important support in the respective areas. In addition, these requirements need to be considered during testing and sign-off as well. 

Minimum Viable Product (MVP)

Besides considering the gaps with respect to FRs, it’s sometimes important to consider whether the whole functionality is mandatory, especially given the various activities that need to be carried out (parametrization, Data Migration).

For example, one can target to implement a minimum viable product (MVP) - a version of the product which has just enough features to cover the mandatory or the most important FRs. The MVP is based on the idea that implementing about 80% of the needed functionality has in theory the potential of providing earlier a usable product with a minimum of effort (quick wins), assure that project’s goals and objectives were met, respectively assure a basis for further development. In case of cost overruns, the MVP assures that the business has a workable product and has the opportunity of deciding whether it’s worth of investing more into the project now or later. 

The MVP also allows getting early user feedback and integrating it into further enhancements and developments. Often the users understand the capabilities of a system, respectively of an implementation, only when they are able to use the system. As this is a learning process, the learning period can take up to a few months until adequate feedback is available. Therefore, postponing the continuation of the implementation by a few months can in theory have a positive impact, though it can also come with drawbacks (e.g. the resources are not available anymore).

A sketch of the MVP usually results from the prioritization of requirements, however the requirements then need to be regarded holistically, as there can be different levels of dependencies between them. In addition, different costs can be incurred if the requirements are handled later, and other constraints may apply as well. Considering an MVP approach can be a double-edged sword. In the worst-case scenario, the business will get only the MVP, with its good and bad characteristics. The business will then be forced to fill the gaps by working outside the system, which can lead to further effort and, in extremis, to poor acceptance of the system. In general, users expect to have their processes fully implemented in the system, an expectation which is not always economically grounded.
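The point about dependencies can be illustrated with a minimal sketch. It assumes that each requirement is flagged as mandatory or not and lists the requirements it depends on; the requirement names and the `mvp_scope` helper are hypothetical and serve only as illustration, not as a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str
    mandatory: bool
    depends_on: tuple = ()


def mvp_scope(requirements):
    """Return the mandatory requirements plus everything they depend on."""
    by_name = {r.name: r for r in requirements}
    scope = set()
    # Start from the mandatory requirements and follow the dependencies.
    stack = [r.name for r in requirements if r.mandatory]
    while stack:
        name = stack.pop()
        if name in scope:
            continue
        scope.add(name)
        stack.extend(by_name[name].depends_on)
    return scope


# Hypothetical requirements, for illustration only.
requirements = [
    Requirement("sales orders", True, ("customer master", "item master")),
    Requirement("customer master", False),
    Requirement("item master", False),
    Requirement("customer portal", False, ("customer master",)),
]

scope = mvp_scope(requirements)
print(sorted(scope))
```

Even though only one requirement is flagged as mandatory, its dependencies are pulled into the scope as well, which is why prioritizing requirements in isolation tends to underestimate the MVP.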

After establishing an MVP one can consider the further requirements (including improvement suggestions) on a cost-benefit basis and implement them accordingly as part of a continuous improvement initiative, even if more time may be required to implement them.


27 December 2020

🧊☯Data Warehousing: Data Vault 2.0 (The Good, the Bad and the Ugly)

Data Warehousing
Data Warehousing Series

One of the interesting concepts that seems to gain adepts in Data Warehousing is the Data Vault – a methodology, architecture and implementation for Data Warehouses (DWH) developed by Dan Linstedt between 1990 and 2000, and evolved into an open standard with the 2.0 version.

According to its creator, the Data Vault is a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more business functional areas [2]. To hold data at the lowest grain of detail from the source system(s) and track the changes occurred in the data, it splits the fact and dimension tables into hubs (business keys), links (the relationships between business keys), satellites (descriptions of the business keys), and reference (dropdown values) tables [3], while adopting a hybrid approach between 3rd normal form and star schemas. In addition, it provides a two- or three-layered data integration architecture, a series of standards, methods and best practices supposed to facilitate its use.
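As a minimal sketch of this decomposition, the snippet below splits one flat source row into two hubs, a link and a satellite. The table layout, the column names and the use of SHA-256 hash keys are assumptions made for illustration; they are not taken from the Data Vault specification.

```python
import hashlib


def hash_key(*business_keys):
    """Derive a deterministic surrogate key from one or more business keys."""
    normalized = "|".join(str(key).strip().upper() for key in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def decompose(row, record_source, load_date):
    """Split a flat row into hub, link and satellite records."""
    audit = {"load_date": load_date, "record_source": record_source}
    customer_hk = hash_key(row["customer_no"])
    order_hk = hash_key(row["order_no"])
    # Hubs hold only the business keys.
    hub_customer = {"customer_hk": customer_hk,
                    "customer_no": row["customer_no"], **audit}
    hub_order = {"order_hk": order_hk, "order_no": row["order_no"], **audit}
    # The link holds the relationship between the business keys.
    link = {"link_hk": hash_key(row["customer_no"], row["order_no"]),
            "customer_hk": customer_hk, "order_hk": order_hk, **audit}
    # The satellite holds the descriptive attributes of a business key.
    sat_customer = {"customer_hk": customer_hk, "name": row["customer_name"],
                    "city": row["city"], **audit}
    return hub_customer, hub_order, link, sat_customer


# Hypothetical source row, for illustration only.
source_row = {"customer_no": "C-100", "customer_name": "Contoso",
              "city": "Berlin", "order_no": "SO-1"}
hub_customer, hub_order, link_customer_order, sat_customer = decompose(
    source_row, "ERP", "2020-12-27")
```

Each record carries the load date and the record source, which is what makes the data traceable back to their origin; one source row becomes four records in four tables.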

It integrates several other methodologies that allow bridging the gap between the technical, logistic and execution parts of the DWH life-cycle – the PMI methodology is used for the various levels of planning and execution, while the Scrum methodology is used for coordinating the day-to-day project tasks. Six Sigma is used together with Total Quality Management for the design and continuous improvement of DWH and data-related processes. In addition, it follows the CMMI maturity model for providing a clear baseline for benchmarking an organization’s DWH capabilities in development, acquisition and service areas.

The Good: The decomposition of the source data models into hub, link and satellite tables provides traceability and auditability at raw data level, thus allowing the compliance requirements of Sarbanes-Oxley, HIPAA and Basel II to be addressed by design.

The considered standards, methods, principles and best practices are leveraged from Software Engineering [1], establishing common ground and a standardized approach to DWH design, implementation and testing. It also narrows down the learning and implementation paths, while allowing an incremental approach to the various phases.

Data Vault 2.0 offers support for real-time, near-real-time and unstructured data, while new technologies like MapReduce or NoSQL can be integrated within its architecture, though the same can be said about other approaches as long as there’s compatibility between the considered technologies. In fact, except for the decomposition of business entities, many of the notions used are common to DWH design.

The Bad: Further decomposing the fact and dimension tables can impact the performance of the queries run against the tables, as more joins are required to gather the data from the various tables. The further decomposition of tables can lead to higher data storage needs, though this can be negligible compared with the volume of additional objects that need to be created in the DWH. For an ERP system with a few hundred meaningful tables the complexity can become overwhelming.

Unless one uses a COTS tool which automates some part of the design and creation process, building everything from scratch can be time-consuming, increasing thus the time-to-market for solutions. However, the COTS tools can introduce restrictions of their own, which can negatively impact the overall experience with the methodology.

The incorporation of non-technical methodologies can have positive impact, though unless one has experience with the respective methodologies, the disadvantages can easily overshadow the (theoretical) advantages.

The Ugly: As usual, the dangers of using Data Vault come from a poor understanding of the methodology, an inadequate skillset, or the attempt to implement the methodology without allowing some flexibility when required. Unless one knows what one is doing, bringing more complexity into a field which is already complex can easily have a negative impact on projects’ outcomes.


[1] Dan Linstedt & Michael Olschimke (2015) Building a Scalable Data Warehouse with Data Vault 2.0
[2] Dan Linstedt (?) Data Vault Basics [source]
[3] Dan Linstedt (2018) Data Vault: Data Modeling Specification v 2.0.2 [source]

27 November 2020

🧊Data Warehousing: ETL (Part II: An Introduction)


ETL (Extract, Transform, Load) processes, technologies or tools are about extracting data from one or more data sources via a set of queries, performing changes on the data via conversions, aggregations, mappings or other types of transformations, respectively loading the data into target tables or other type of repositories. Thus, an ETL process allows moving and transforming data between predefined data structures on an ad-hoc basis or as part of stable repetitive processes, which makes ETL ideal for data warehousing, data integrations, data migrations or similar scenarios. 

ETL Data Flow

Extract: The extraction of data is typically done based on SQL queries from relational databases or any OLEDB or ODBC-based data repositories, including flat or MS Office files, though modern ETL tools can support other types of queries (CAML, XQuery, DAX) or even NoSQL architectures (Hadoop). This allows addressing a wide range of requirements, the complexity of the logic depending on the functionality provided by the query languages, respectively the extraction functionality available.

Transform: The transformation logic can be implemented based on the functionality provided by the ETL tool, and can involve, depending on the case, any combination of aggregates, conditional splits, merges, lookups, multicasts, pivoting/unpivoting, cleansing, data conversions, sampling, mapping or any other transformations that can be performed on an in-transit dataset. On the other hand, quite often the same can be achieved with the help of SQL-based manipulations directly in the extraction logic or later in the process. SQL can occasionally prove to be faster and more flexible than the transformations provided by the ETL tool, however despite the overlaps, the two approaches can complement each other when used adequately.

Load: The load is usually just a dump of the data into one or more final or intermediary tables with predefined structures. As long as the data match the data types, formats and further defined constraints, the load seldom involves further challenges, provided that the solution was designed adequately.
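The three steps can be sketched in a few lines, here with Python and an in-memory SQLite database instead of a dedicated ETL tool. The table names, the sample rows and the conversion rates are hypothetical and chosen only to keep the example self-contained.

```python
import sqlite3

# Hypothetical conversion rates, for illustration only.
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.5}


def extract(conn):
    """Extract: read the source rows via a SQL query."""
    query = "SELECT customer, amount, currency FROM src_orders"
    return conn.execute(query).fetchall()


def transform(rows):
    """Transform: convert the amounts to EUR and aggregate per customer."""
    totals = {}
    for customer, amount, currency in rows:
        converted = amount * RATES_TO_EUR[currency]
        totals[customer] = totals.get(customer, 0.0) + converted
    return sorted(totals.items())


def load(conn, rows):
    """Load: dump the rows into a target table with a predefined structure."""
    conn.executemany(
        "INSERT INTO tgt_customer_totals (customer, total_eur) VALUES (?, ?)",
        rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_orders (customer TEXT, amount REAL, currency TEXT)")
conn.execute(
    "CREATE TABLE tgt_customer_totals "
    "(customer TEXT PRIMARY KEY, total_eur REAL NOT NULL)")
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [("A", 100.0, "EUR"), ("A", 50.0, "USD"), ("B", 20.0, "USD")])

load(conn, transform(extract(conn)))
loaded = conn.execute(
    "SELECT customer, total_eur FROM tgt_customer_totals ORDER BY customer"
).fetchall()
print(loaded)
```

The same aggregation could be written directly in the extraction query with a GROUP BY, which reflects the overlap between SQL-based and tool-based transformations mentioned above.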

Within the logical model, extract, transform and load can be considered as processes by themselves. Within the object model provided by the ETL tool, they are considered in the mentioned sequence within a data flow, which within a set of workflow constraints defines how the data move through the pipeline – the sequence of processing steps considered. The basic unit of work is the data flow and the workflow it belongs to, a unit that can be encapsulated in one container for easier management or simply for convenience. Several containers can be linked within a workflow to create more complex behavior.

The data flows and workflow constraints, together with the supporting connections and containers form an ETL package, the main unit of work for encapsulating and running ETL logic. ETL packages are scheduled and run as fit for the purpose.
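A rough sketch of how workflow constraints order the data flows of a package is given below. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the flow names and the `run_package` helper are hypothetical and do not mirror the object model of any particular ETL tool.

```python
from graphlib import TopologicalSorter


def run_package(data_flows, precedence):
    """Run the data flows in an order that respects the constraints.

    precedence maps each data flow to the set of flows that must finish first.
    """
    log = []
    for name in TopologicalSorter(precedence).static_order():
        data_flows[name](log)
    return log


def make_flow(name):
    """Create a placeholder data flow that only records its execution."""
    def flow(log):
        log.append(name)
    return flow


# Hypothetical package: two staging flows must precede the final load.
data_flows = {name: make_flow(name)
              for name in ("stage_customers", "stage_orders", "load_dwh")}
precedence = {
    "stage_customers": set(),
    "stage_orders": set(),
    "load_dwh": {"stage_customers", "stage_orders"},
}

execution_log = run_package(data_flows, precedence)
print(execution_log)
```

The two staging flows have no constraint between them, so their relative order is left open, while the final load always runs last.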

With the right design, these building blocks allow enough flexibility in handling ad-hoc requests or in building complex solutions. This involves decisions on how to partition the ETL packages, respectively the data flows, in which order they should be run, where and in which sequence the data should be transformed, how to handle exceptions, how to build intermediary data repositories if needed, how to handle audit requirements, and so on. Each of these choices can prove to be important.

The knowledge of the ETL architecture and functionality is essential in providing the right solution for the problem considered, however once the basics are understood, the challenges typically reside in understanding the source and/or target structures and the logical and physical entities available, in identifying the way the data can be partitioned horizontally or vertically, respectively in determining what types of transformations are required for moving the data, as required by the solution.


07 November 2020

⛁DBMS: Event Streaming Databases (More of a Kafka’s Story)

Database Management

Event streaming architectures are architectures in which data are generated by different sources, and then processed, stored, analyzed, and acted upon in real-time by the different applications tapped into the data streams. An event streaming database is then a database that assures that its data are continuously up-to-date, providing specific functionality like management of connectors, materialized views and running queries on data-in-motion (rather than on static data). 
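The idea of a view that is kept continuously up-to-date can be sketched as follows. This is a toy in-memory illustration with hypothetical event fields; it is not the API of any event streaming database.

```python
from collections import defaultdict


class RunningTotals:
    """A toy materialized view: totals per account, updated with each event."""

    def __init__(self):
        self._totals = defaultdict(float)

    def apply(self, event):
        # The view is updated incrementally instead of being recomputed.
        self._totals[event["account"]] += event["amount"]

    def query(self, account):
        # Queries read the current state of the view, not the raw events.
        return self._totals.get(account, 0.0)


# Hypothetical stream of events, for illustration only.
events = [
    {"account": "A", "amount": 10.0},
    {"account": "B", "amount": 5.0},
    {"account": "A", "amount": -2.5},
]

view = RunningTotals()
for event in events:
    view.apply(event)

print(view.query("A"), view.query("B"))
```

In contrast to a query over static data, the result is available at any moment and reflects all the events processed so far.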

Reading about this type of technologies one can easily start fantasizing about the Web as a database in which intelligent agents can process streams of data in real-time, in which knowledge is derived and propagated over the networks in an infinite and ever-growing flow in which the limits are hardly perceptible, in which the agents act as a mind disconnected in the end from the human intent. One is struck by the fusion of realistic and fantastic elements, much like in a Kafka story, in which the metamorphosis of the technologies and social aspects can easily lead to absurd implications.

The link to Kafka was somehow suggested by Apache Kafka, an open-source distributed event streaming platform, which seems to lead the trends within this newly developing market. Kafka provides database functionality and guarantees the ACID (atomicity, consistency, isolation, durability) properties of transactions while tapping into data streams.

Data streaming is an appealing concept though it has some important challenges like data overload or over-flooding, the complexity derived from building specific (business) and integrity rules for processing the data, respectively for keeping data consistency and truth within the ever-growing and ever-changing flows. 

Data overload or over-flooding occurs when applications are not able to keep the pace with the volume of data or events fired with each change. Imagine the raindrops falling on a wide surface in which each millimeter or micrometer has its own rules for handling the raindrops and this at large scale. The raindrops must infiltrate into the surface to be processed and find their way to the beneath water flows, aggregating up to streams that could nurture seas or even oceans. Same metaphor can be applied to the data events, in which the data pervade applications accumulating in streams processed by engines to derive value. However heavy rains can easily lead to floods in which water aggregates at the surface. 
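The flooding effect can be imitated with a bounded buffer, as in the sketch below. The buffer size, the consumption rate and the `simulate` helper are arbitrary assumptions chosen to make the overflow visible; real streaming platforms handle overload in more elaborate ways.

```python
import queue


def simulate(events, buffer_size, consume_every):
    """Feed events into a bounded buffer that is drained at a slower pace.

    One event is consumed after every `consume_every` produced events;
    events arriving while the buffer is full are dropped.
    """
    buffer = queue.Queue(maxsize=buffer_size)
    processed, dropped = [], []
    for position, event in enumerate(events, start=1):
        try:
            buffer.put_nowait(event)
        except queue.Full:
            dropped.append(event)  # the 'flood': no room for the event
        if position % consume_every == 0:
            processed.append(buffer.get_nowait())
    return processed, dropped, buffer.qsize()


processed, dropped, waiting = simulate(
    range(10), buffer_size=3, consume_every=2)
print(processed, dropped, waiting)
```

As soon as the producer is faster than the consumer for long enough, the buffer fills up and events are lost, no matter how large the buffer is chosen.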

Business applications rely on predefined paths in which the flow of data is tidily linked to specific actions found themselves in processual sequences that reflect the material or cash flow. Any variation in the data flow from expectations will lead to inefficiencies and ultimately to chaos. Some benefit might be derived from data integrations between the business applications, however applications must be designed for this purpose and handle extreme behaviors like over-flooding. 

Data streams are maybe ideal for social media networks in which one broadcasts data through the networks and any consumer that can tap into the network can further use the respective data. We can see however the problems of today’s social media – data, better said information, flow through the networks being changed as fit for purposes that can easily diverge from the initial intent. Moreover, information gets denatured, misused, overused to the degree that it overloads the networks, it being more and more difficult to distinguish between reliable and non-reliable information. If common sense helps in the process of handling such information, the same can’t be said about machines or applications.

It will be challenging to deal with the vastness, vagueness, uncertainty, inconsistency, and deceit of the networks of the future, however data streaming will most likely have a future as long as it can address such issues by design.

06 November 2020

🧭Business Intelligence: Perspectives (Part VI: Data Soup - Reports vs. Data Visualizations)

Business Intelligence Series

Considering visualizations, John Tukey remarked that ‘the greatest value of a picture is when it forces us to notice what we never expected to see’, which is not always the case for many of the graphics and visualizations available in organizations, typically in the form of simple charts and dashboards, quite often with no esthetics or meaning behind.

In general, reports are needed as source for operational activities, in which the details in form of raw or aggregate data are important. As one moves further to the tactical or strategic aspects of a business, visualizations gain in importance especially when they allow encoding data and information, respectively variations, trends or relations in smaller places with minimal loss of information.

There are also different aspects of visualizations that need to be considered. Modern tools allow rapid visualization and interactive navigation of data across different variables, which is great as long as one knows what one is searching for, which is not always the case.

There are junk charts in which the data drowns in graphical elements that bring no value to the reader, in extremis even distorting the message/meaning.

There are graphics/visualizations that attempt to bring together and encode multiple variables with respect to a theme, and for which a ‘project’ is typically needed, as the data are not available ad-hoc, don’t have the desired quality or need further transformations to be ready for consumption. Good quality graphics/visualizations require time and a good understanding of the business, which are not necessarily available in the BI/Analytics teams, and unfortunately few organizations do something in that direction, typically ignoring such needs. In this type of environment, the stress falls on the rapid availability of data for decision-making or action-relevant insight, which typically depends on the consumer.

The story-telling capabilities of graphics/visualizations are often exaggerated. Yes, they can tell a story though stories need to be framed into a context/problem, some background and further references need to be provided, while without detailed data the graphics/visualizations are just nice representations in which each consumer understands what he can.

In an ideal world the consumer and the ‘designer’ would work together to identify the important data for the theme considered, to find the appropriate level of detail, respectively the best form of encoding. Such attempts can stop at table-based representations (aka reports), respectively at basic or richer forms of graphical representations. One can consider reports as an early stage of the visualization process, with the potential to derive more value when the data allow meaningful graphical representations. Unfortunately, the time, data and knowledge available seldom make this achievable.

In addition, a well-designed report can be used as basis for multiple purposes, while a graphic/visualization can enforce more limitations. Ideal would be when multiple forms of representation (including reports) are combined to harness the value of data. Navigations from visualizations to detailed data can be useful to understand what happens; learning and understanding the various aspects being an iterative process.

It’s also difficult to demonstrate the value of insight derived from visualizations, especially when graphical literacy lags behind numeracy and statistical literacy, many consumers lacking the skills needed to evaluate numbers and statistics adequately. If for a good artistic movie you need some assistance to enjoy the show and understand the message(s) behind it, the same can also be said about good graphics/visualizations. Moreover, this requires creativity, abstraction-based thinking, and other capabilities to harness the value of representations.

Given the considerable volume of requirements related to the need for basic data, reports will continue to be in high demand in organizations. In exchange, visualizations can complement them by providing insights otherwise not available.

Initially published on Medium as answer to a post on Reporting and Visualizations. 

31 October 2020

🧊Data Warehousing: Architecture (Part III: Data Lakes & other Puddles)

Data Warehousing

One can consider a data lake as a repository of all of an organization’s data found in raw form, however this constraint might be too harsh as the data found at different levels of processing can be imported as well, for example the results of data mining or other Data Science techniques/methods can be considered as raw data for further processing.

In the initial definition provided by James Dixon, the difference between a data lake and a data mart/warehouse was expressed metaphorically as the transition from bottled water to lakes streamed (artificially) from various sources. It thus contrasts the objective-oriented, limited and single-purposed role of the data mart/warehouse with the flow of data in nature that could be tapped and harnessed as desired. These are though metaphors intended to sensitize the buyer. Personally, I like to think of the data lake as an extension of the data infrastructure, of which the data mart or warehouse is an integral part. Imposing further constraints seems to have no benefit.

Probably the most important characteristic of a data lake is that it makes the data of an organization discoverable and consumable, though from there to insight and other benefits is a long road, which requires specific knowledge about the techniques used, as well as about the organization’s processes and data. Without this, data lake-based solutions can lead to erroneous results, the same as mixing several ingredients without knowledge about their usage can lead to cooking experiments far removed from the art of cooking.

A characteristic of data is that they go through continuous change and have different timeliness, respectively degrees of quality with respect to the data quality dimensions implied and the sources considered. Data need to reflect the reality at the appropriate level of detail and quality required by the processing application(s), this applying to data warehouses/marts as well as to data lake-based solutions.

Data found in raw form don’t necessarily represent the truth and don’t necessarily acquire a good quality no matter how much they are processed. Solutions need to be resilient with respect to the data they handle through their layers, independently of the data quality and transmission problems. Whether one talks about ETL, data migration or other types of data processing, keeping the data integrity at the various levels and layers is maybe the most important demand upon solutions.

Snapshots, as moment-in-time recordings of tables, entities, sets of entities, datasets or whole databases, often prove to be the best mechanisms for keeping data integrity when this aspect is essential to their processing (e.g. data migrations, high-accuracy measurements). Unfortunately, the more systems are involved in the process and the broader the span of the solutions over the sources, the more difficult it becomes to take such snapshots.

A SQL query’s output represents a snapshot of the data, therefore SQL-based solutions are usually appropriate for most of the business scenarios in which the characteristics of data (typically volume, velocity and/or variety) make their processing manageable. However, when the data are extracted by other means integrity is harder to obtain, especially when there’s no timestamp to allow data partitioning on a time scale, the handling of data integrity becoming thus in extremis a programmer’s task. In addition, getting snapshots of the data as they are changed can be a costly and futile task.
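The role of the timestamp can be sketched with an in-memory SQLite database, as below. The tables, the `modified_at` column and the sample rows are hypothetical, and the sketch assumes insert-only data; records updated after the cutoff would require a history of changes to reconstruct their earlier state.

```python
import sqlite3


def extract_as_of(conn, cutoff):
    """Extract related tables against the same point in time."""
    headers = conn.execute(
        "SELECT order_no FROM orders WHERE modified_at <= ? ORDER BY order_no",
        (cutoff,)).fetchall()
    lines = conn.execute(
        "SELECT order_no, item FROM order_lines "
        "WHERE modified_at <= ? ORDER BY order_no, item",
        (cutoff,)).fetchall()
    return headers, lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_no INTEGER, modified_at TEXT)")
conn.execute(
    "CREATE TABLE order_lines (order_no INTEGER, item TEXT, modified_at TEXT)")
# ISO-formatted dates compare correctly as text.
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2020-10-01"), (2, "2020-10-03")])
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?)",
                 [(1, "A", "2020-10-01"), (2, "B", "2020-10-03")])

headers, lines = extract_as_of(conn, "2020-10-02")
print(headers, lines)
```

Because both queries use the same cutoff, no order line is extracted without its header, even if the two extractions run at different moments.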

Further on, maintaining data integrity can prove to be a matter of design in respect not only to the processing of data, but also in respect to the source applications and the business processes they implement. The mastery of the underlying principles, techniques, patterns and methodologies, helps in the process of designing the right solutions.

Written as answer to a Medium post on data lakes and batch processing in data warehouses. 

27 September 2020

𖣯Strategic Management: Strategy Design (Part IV: Designing for Simplicity)

More than a century and a half ago, in his course on the importance of Style in Literature, George Lewes wisely remarked that 'the first obligation of Simplicity is that of using the simplest means to secure the fullest effect' [1]. This is probably the most important aspect the adopters of the KISS mantra seem to ignore – solutions need to be simple while covering all or the most important aspects to assure the maximum benefit. The challenge for many resides in defining what the maximum benefit is about. The maximum benefit is typically poorly understood, especially when people don’t understand what’s possible, respectively what’s necessary to make things work smoothly.

To make the simplicity principle work, one must envision the desired state of a product or solution and trace back what’s needed to achieve that vision. One can aim for the maximum or for the minimum possible, respectively for anything in between. That’s at least true in theory; in practice there are constraints that limit the range of achievement, constraints ranging from the availability of resources, their maturity or the available time, to the limits for growth - the learning capacity of individuals and of the organization as a whole.

On the other side following the 80/20 principle, one could achieve in theory 80% of a working solution with 20% of the effort needed in achieving the full 100%. This principle comes with a trick too because one needs to focus on the important components or aspects of the solution for this to work. Otherwise, one is forced to do exploratory work in which the learning is gradually assimilated into the solution. This implies continuous feedback, respectively changing the targets as one progresses in multiple iterations. The approach is typically common to ERP implementations, BI and Data Management initiatives, or similar transformative projects which attempt changing an organization’s data, information, or knowledge flows - the backbones organizations are built upon.     

These two principles can be used together to shape an organization. While simplicity sets a target or compass for quality, the 80/20 principle provides the means of splitting the roadmap and effort into manageable targets while allowing the critical components to be identified and prioritized, components which are seldom limited to technology. While technologies provide a potential for transformation, in the end it is an organization’s setup that has the transformative role.

For transformational synergies to happen, each person involved in the process must have a minimum of necessary skillset, knowledge and awareness of what’s required and how a solution can be harnessed. This minimum can be initially addressed through training and self-learning, however without certain mechanisms in place, the magic will not happen by itself. Change needs to be managed from within as part of an organization’s culture, by the people close to the flow, and when necessary, also from the outside, by the ones who can provide guiding direction. Ideally, a strategic approach is needed, in which the vision, mission, goals, objectives, and roadmap are sketched, the intermediary targets are adequately mapped and pursued, and the progress is adequately tracked.

Thus, besides the technological components, one needs to consider the organizational components required to support and manage change. These components form a structure which needs to adhere by design to the same principle of simplicity. According to Lewes, the 'simplicity of structure means organic unity' [1], which can imply harmony, robustness, variety, balance, economy or proportion. Without these qualities the structure of the resulting edifice can break under its own weight. Moreover, paraphrasing Eric Hoffer, simplicity marks the end of a continuous process of designing, building, and refining, while complexity marks a primitive stage.


Written: Sep-2020, Last Reviewed: Mar-2024

[1] George H Lewes (1865) "The Principles of Success in Literature"

Considered quotes:
"Simplicity of structure means organic unity, whether the organism be simple or complex; and hence in all times the emphasis which critics have laid upon Simplicity, though they have not unfrequently confounded it with narrowness of range." (George H Lewes, "The Principles of Success in Literature", 1865)
"The first obligation of Simplicity is that of using the simplest means to secure the fullest effect. But although the mind instinctively rejects all needless complexity, we shall greatly err if we fail to recognise the fact, that what the mind recoils from is not the complexity, but the needlessness." (George H Lewes, "The Principles of Success in Literature", 1865)
"In products of the human mind, simplicity marks the end of a process of refining, while complexity marks a primitive stage." (Eric Hoffer, 1954)

28 June 2020

𖣯Strategic Management: Strategy Design (Part II: A System's View)

Strategic Management

Each time one discusses in IT about software and hardware components interacting with each other, one talks about a composite referred to as a system. Even if the term Information System (IS) is related to it, a system is defined as a set of interrelated and interconnected components that can be considered together for specific purposes or simple convenience.

A component can be a piece of software or hardware, as well as persons or groups if we extend the definition. The consideration of people becomes relevant especially in the context of ecologies, in which systems are placed in a broader context that considers people’s interaction with them, as this gives rise to important behavior that impacts the system’s functioning.

Within a system each part has a role or function determined with respect to the whole as well as to the other parts. The role or function of the component is typically fixed, predefined, though there are also exceptions, especially when the scope of a component is enlarged, respectively reduced to the degree that the component can be removed or ignored. What one considers or doesn’t consider as part of a system defines the system’s boundaries; it’s what distinguishes it from other systems within the environment(s) considered.

The interaction between the components comes down to the exchange, transmission and processing of data found in different aggregations ranging from signals to complex data structures. If in non-IT-based systems the changes are determined by the inflow, respectively outflow of energy, in IT the flow is considered in terms of data in its various aggregations (information, knowledge). The data flow (also information flow) represents the ‘fluid’ that nourishes a system’s ‘organism’.

One can grasp the complexity the moment one attempts to describe a system in terms of components, respectively of the dependencies existing between them in terms of data and processes. If in nature the processes are extrapolated, in IT they are predefined (even if the knowledge about them is not available). In addition, the less knowledge one has about the infrastructure, the higher the apparent complexity. Even if the system is not necessarily complex, the lack of knowledge and certainty about it makes it appear complex. The more one needs to dig for information and knowledge to get an acceptable level of knowledge and logical depth, the more time is needed for designing a solution.

Saint Exupéry’s definition of simplicity applies from a system’s functional point of view, though it doesn’t address the relative knowledge about the system, which often is implicit (in people’s heads). People have only fragmented knowledge about the system, which makes it difficult to create the whole picture. It’s typically the role of system or process operational manuals, respectively of data descriptions, to make that knowledge explicit, also establishing a foundation for common knowledge and further communication and understanding.

Between the apparent (perceived) and real complexity of a system there’s an important gap that needs to be addressed if one wants to manage the systems adequately, respectively to simplify the systems. Often simplification happens when components or whole systems are replaced, consolidated, or migrated, a mix between these approaches existing as well. Simplifications at data level (aka data harmonization) or process level (aka process optimization and redesign) can have an important impact, being inherent to the good (optimal) functioning of systems.

Whether these changes occur in big-bang or gradual iterations is a question of available resources, organizational capabilities, including the ability to handle such projects, respectively the impact, opportunities and risks associated with such endeavors. Beyond this, it’s important to regard the problems from a systemic and systematic point of view, in which ecology’s role is important.


Written: Jun-2020, Last Reviewed: Mar-2024

24 June 2020

𖣯Strategic Management: Strategy Design (Part I: Simple, but not that Simple)

Strategic Management Series

Simplicity of design has been for centuries the holy grail of architects, while software designers seem to situate themselves in opposition to the trend, as they aim to use a mix of technologies that usually increases the architecture’s complexity (sometimes the more, the newer and fancier, the better). Unfortunately, despite the implied but not necessarily reachable potential, each component added to an information system or infrastructure has the potential of increasing the overall complexity by a factor proportional to the degree of interactions it creates, respectively by the number of issues it creates or allows to propagate through these interactions.

Conversely, one talks about simplicity in IT without stating what is intended by it, and it can mean many things. Quite often the aim is packed within the ‘keep it simple, stupid’ (aka KISS) mantra, a modern and pejorative alternative to Occam’s razor. KISS became a principle in software architecture design, and it can mean that a simple solution works better than a complex one, or that pursuing something in the simplest manner possible is usually better. The nuances are wide enough to cover a broad spectrum of solutions, arriving at statements that the simplest choice is the most appropriate one, which is not necessarily true in IT, where complexity is at home.

Starting with the important number of technologies coexisting in integrations and ending with the exceptions existing in processes or the quality of data, things are almost never as simple as one may wish. An IT infrastructure’s complexity is dependent on the number of existing components, on whether they come from different generations or from different vendors, on whether they are deployed on different operating systems or are supported by different service providers, on the number of customizations made, on the degree of overlapping of the data and integrations needed to keep the data in sync, respectively on the differences existing in data models, quality and use. In general, the more variance, randomness, and challenges one has, the higher the overall complexity.

Paraphrasing Saint Exupéry, in IT simplicity is reached when there is no longer anything to add or anything to take away, or, in Hans Hofmann’s words, ‘the ability to simplify means to eliminate the unnecessary so that the necessary may speak’. This refers to the features, what a piece of software can do, respectively the functionality, how a certain outcome is reached, which end up packed in various logical aggregations (function point, functional requirement, story, epic, model, product, etc.) or physical aggregations (classes, components, packages, services, models, etc.). These are the levels at which one needs to address simplicity adequately.

To make something simple one must be able either to design a solution up to the detail that there’s nothing to add or remove, or to start with something and add or remove things to reach simplicity. Both approaches involve considerable effort, time, and multiple iterations; however, the first approach can easily become utopian, as some architectures are so complex that sooner or later the second approach comes into play. Therefore, one needs in general to focus on what seems an optimal solution and optimize it continuously in further iterations. Aiming for perfection from the beginning or also later in the improvement process is a foolhardy wish.

Even if simplicity is hard to achieve, one can still talk about the elegance of a solution, scenarios in which the various components fit together like the pieces of a puzzle, or about robustness, reliability, correctness, maintainability, (re)usability, or learnability. These latter characteristics are known in Software Engineering as (software) quality attributes.

21 June 2020

🪄SSRS (& Paginated Reports): Design (Part I: Poor Design of Parameters)


The handling of parameters in SSRS (and Paginated Reports) can become a problem for reports' usability when certain best practices are not considered for populating the controls behind the parameters. This post attempts to address some common cases.

Dropdowns Constrained on the Source Query 

Typically the values needed in dropdown parameters are stored in tables or are predefined like in the previous post though it's not always the case. Supposing that there's no table to store the countries, I have seen reports in which the developers identified the values by using queries like the following:

-- Countries
SELECT DISTINCT SIC.CountryRegionName Name 
FROM Sales.vIndividualCustomer SIC 

Just imagine that the above query runs against hundreds of millions of records. Even if the query was optimized, it might take minutes to run. 

It is true that the query limits the displayed values only to the ones used, however as soon as the volume of data increases this can slow down the report considerably, to the degree that the report becomes unusable. Unfortunately, some developers favor this approach also when there is a table available. Just imagine that, to run a report with complex logic, the logic will be run a few times to populate the selection controls for parameters before actually being able to run the report. This slowly puts a burden on the source system and, when similar reports are run, this can bring the source system to its knees. In extreme situations the users might not be able to run the report at all (e.g., when the report times out). 

Building the queries for the parameters' population in this way also adds synchronization issues between the parameters, which are not easy to track as the missing values are not available for selection.

Therefore, to fix this, if no tables are available in the source system for the dropdowns then it's a good idea to create some tables independently of the source system and populate them periodically.  One can do this with a script that runs with data refresh or even maintain the values manually. This works best for data warehouses, but also for OLTP solutions when reports are built on top of the databases behind.

If the users indeed need only the used values, one can introduce a flag in each table which shows whether the value is used or not. 
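The approach can be sketched in T-SQL as below. This is a minimal sketch, not a definitive implementation: the table dbo.ReportCountries and its IsUsed flag are hypothetical names introduced for illustration, and the refresh is assumed to be scheduled together with the data load.

```sql
-- Hypothetical lookup table, maintained independently of the source query
CREATE TABLE dbo.ReportCountries (
    Name nvarchar(50) NOT NULL PRIMARY KEY
  , IsUsed bit NOT NULL DEFAULT (0)
);

-- Periodic refresh, step 1: add the values not yet available in the lookup table
INSERT INTO dbo.ReportCountries (Name, IsUsed)
SELECT DISTINCT SIC.CountryRegionName
, 1
FROM Sales.vIndividualCustomer SIC
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.ReportCountries RC
    WHERE RC.Name = SIC.CountryRegionName
);

-- Periodic refresh, step 2: recompute the flag for all the values
UPDATE RC
SET RC.IsUsed = CASE
        WHEN EXISTS (
            SELECT 1
            FROM Sales.vIndividualCustomer SIC
            WHERE SIC.CountryRegionName = RC.Name
        ) THEN 1
        ELSE 0
    END
FROM dbo.ReportCountries RC;

-- Dataset behind the dropdown: a cheap read from the small table
SELECT RC.Name
FROM dbo.ReportCountries RC
WHERE RC.IsUsed = 1
ORDER BY RC.Name;
```

The expensive scan of the source happens only during the refresh, while each report run reads just the few records of the lookup table.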

Too Many Values

If a dropdown has more than a few hundred values then consider using a free-text parameter instead of a dropdown. It's much easier to type some values than to search for them in a list. It's true that SSRS lacks a search functionality, which would make such searches easier. One can use in theory a passthrough report, however this has limited functionality. 

Wildcards Everywhere

Implementing wildcards in each text attribute is in general not a good idea. Please note that a wildcard used in front of a search value typically prevents the query from seeking on the available indexes. 
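The difference can be illustrated with the two queries below, a sketch that assumes the Person.Person table from the AdventureWorks sample database and an index that starts with LastName; the @LastName variable stands in for a hypothetical report parameter.

```sql
-- Hypothetical report parameter (in SSRS it is supplied by the report)
DECLARE @LastName nvarchar(50) = N'Smi';

-- Trailing wildcard only: an index on LastName can be used for a seek
SELECT P.BusinessEntityID
, P.FirstName
, P.LastName
FROM Person.Person P
WHERE P.LastName LIKE @LastName + N'%';

-- Leading wildcard: the predicate can't be used for a seek,
-- so the index (or the table) is typically scanned
SELECT P.BusinessEntityID
, P.FirstName
, P.LastName
FROM Person.Person P
WHERE P.LastName LIKE N'%' + @LastName + N'%';
```

Comparing the execution plans of the two queries on a large table shows whether the optimizer chose a seek or a scan.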

Too Many Dependent Dropdowns

There are structures which involve more than two dependencies of dropdowns on one another (e.g., the user needs to select the Country before selecting a Zip code). Each dropdown implies a roundtrip to the data source and back, and depending on the case the user has to wait a considerable time until the parameters are filled. This in combination with the first mentioned issues can increase the damage. 

Building such dependencies might still be a good idea when compared with the alternative, showing in the dependent control all the values available. 
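A dependent dropdown is typically fed by a dataset whose query is filtered on the parent parameter. The following is a minimal sketch that assumes the Person.CountryRegion and Person.StateProvince tables from the AdventureWorks sample database; the parameter name @CountryRegionCode is an assumption made for illustration.

```sql
-- Dataset for the parent parameter (Country)
SELECT CR.CountryRegionCode Id
, CR.Name
FROM Person.CountryRegion CR
ORDER BY CR.Name;

-- In SSRS the value is supplied by the parent parameter;
-- it is declared here only to make the script runnable on its own
DECLARE @CountryRegionCode nvarchar(3) = N'US';

-- Dataset for the dependent parameter (State/Province):
-- only the values of the selected country are retrieved
SELECT SP.StateProvinceID Id
, SP.Name
FROM Person.StateProvince SP
WHERE SP.CountryRegionCode = @CountryRegionCode
ORDER BY SP.Name;
```

Each additional level adds one more such dataset and one more roundtrip, which is why both queries should read from small, indexed tables.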

Use one Table with Dropdown Values

Oracle e-Business Suite and probably other applications store the values for dropdowns in a single table across the system or per business area. That's useful for maintaining the parameters though it can affect the performance when the number of records in such tables is high. Usually it shouldn't be the case, but it happens (e.g., financial dimensions, having values for each supported language).

Populating the dropdowns should happen with minimal wait, otherwise the users will complain. The retrieval from the database might be fast, though one needs to consider also how long it takes to render the data, especially when searching for a specific value. In many cases, the table(s) behind should be optimized for the purpose. On the other hand, having individual tables for the dropdowns with many values could be a better approach. 

Frankly, this way of storing the dropdown values has a bigger impact when the tables are involved in joins.

Hardcoding Values for Dropdowns

Instead of hardcoding the same values used in dropdowns across multiple reports one should consider using a table, misusing a view or even a stored procedure for storing the respective values. The overhead of an additional roundtrip to the database might be negligible when considering the overhead of maintaining the values across multiple reports. 

Populating the dropdowns through stored procedures could, depending on the case, offer better performance to the detriment of usability (e.g., being able to select a subset of the data without using parameters). Besides the Ids and Names needed to populate the parameter(s), one can maintain further attributes that can be used in filtering. 
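A possible shape for such a shared source is sketched below. The object names (dbo.vReportStatuses, dbo.pGetReportStatuses) and the status values are hypothetical placeholders; the view keeps the values in one place, while the stored procedure wraps it for the reports that prefer a procedure call.

```sql
-- Hypothetical view holding the values once, instead of in each report
CREATE VIEW dbo.vReportStatuses
AS
SELECT DAT.Id
, DAT.Name
FROM (VALUES (1, N'Open')
           , (2, N'Closed')
           , (3, N'Cancelled')) DAT(Id, Name);
GO

-- Hypothetical stored procedure wrapping the view
CREATE PROCEDURE dbo.pGetReportStatuses
AS
BEGIN
    SET NOCOUNT ON;

    SELECT DAT.Id
    , DAT.Name
    FROM dbo.vReportStatuses DAT
    ORDER BY DAT.Name;
END
GO

-- Dataset behind the dropdown, either directly from the view ...
SELECT DAT.Id
, DAT.Name
FROM dbo.vReportStatuses DAT
ORDER BY DAT.Name;

-- ... or via the stored procedure
EXEC dbo.pGetReportStatuses;
```

A change in the list of values then requires touching only the view, and not each report that uses the dropdown.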

Post reviewed on 29-Sep-2023


About Me

Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.