Showing posts with label 600W. Show all posts
Showing posts with label 600W. Show all posts

27 March 2025

#️⃣Software Engineering: Programming (Part XVII: More Thoughts on AI)

Software Engineering Series
Software Engineering Series

I've been playing with AI-based prompting in Microsoft 365 and Edge Copilot for SQL programming tasks and even for simple requests I got wrong or suboptimal solutions. Some of the solutions weren’t wrong by far, though it was enough for the solution to not work at all or give curious answers. Some solutions were even more complex than needed, which made their troubleshooting more challenging, to the degree that was easier to rewrite the code by myself. Imagine when such wrong solutions and lines of reasoning propagate uncontrolled within broader chains of reasoning! 

Some of the answers we get from AI can be validated step by step, and the logic can be changed accordingly, though this provides no guarantee that the answers won't change as new data, information, knowledge is included in the models, or the model changes, directly or indirectly. In Software Development, there’s a minimum set of tests that can and should be performed to assure that the input generated matches the expectations, however in AI-based solutions there’s no guarantee that what worked before will continue to work.

Moreover, small errors can propagate in a chain-effect creating curious wrong solutions. AI acts and probably will continue to act as a Pandora's box. So, how much can we rely on AI, especially when the complexity of the problems and their ever-changing nature is confronted with a model highly sensitive to the initial or intermediary conditions? 

Some of the answers may make sense, and probably also the answers can be better to some degree than the decisions made by experts, though how far do we want to go? Who is ready to let his own life blindly driven by the answers provided by an AI machine just because it can handle certain facts better than us? Moreover, the human brain is wired to cope with uncertainty, morality and other important aspects that can enhance the quality of the decisions, even if the decisions aren't by far perfect

It’s important to understand the sensitivity of AI models and outputs to the initial and even intermediate conditions on which such models are based, respectively what is used in their reasoning and how slight changes can result in unexpected effects. Networks, independently whether they are or not AI-based, lead to behavior that can be explainable to some degree as long full transparency of the model and outcomes of the many iterations is provided. When AI models behave like black boxes there’s no guarantee of the outcomes, respectively transparence on the jumps made from one state of the network to the other, and surprises can appear more often than we expect or are prepared to accept. 

Some of the successes rooted in AI-based reasoning might happen just because in similar contexts people are not ready to trust their reasoning or take a leap of faith. AI tends to replace all these aspects that are part of human psychology, logic and whatever is part of the overall process. The eventual successes are thus not an immediate effect of the AI capabilities, but just that we took a shortcut. Unfortunately, this can act like a sharp blade with two edges. 

I want to believe that AI is the solution to humanity's problems, and probably there are many areas of applicability, though letting AI control our lives and the over-dependence on AI can on long term cause more problems than AI and out society can solve. The idea of AI acting as a Copilot that can be used to extrapolate beyond our capabilities is probably not wrong, though one should keep the risks and various outcomes in sight!

Previous Post <<||>> Next Post

🧭Business Intelligence: Perspectives (Part XXIX: Navigating into the Unknown)

Business Intelligence Series
Business Intelligence Series

One of the important challenges in Business Intelligence and the other related knowledge domains is that people try to oversell ideas, overstretching, shifting, mixing and bending the definition of concepts and their use to suit the sales pitch or other related purposes. Even if there are several methodologies built around data that attempt to provide a solid foundation on which organizations can build upon, terms like actionable, value, insight, quality or importance continue to be a matter of perception, interpretation, and quite often be misused. 

It's often challenging to define precisely such businesses concepts especially there are degrees of fuzziness that may apply to the different contexts that are associated with them. What makes a piece of signal, data, information or knowledge valuable, respectively actionable? What is the value, respectively values we associate with a piece or aggregation of information, insight or degree of quality? When do values, changes, variations and other aspects become important, respectively can be ignored? How much can one generalize or particularize certain aspects? And, many more such questions can be added to this line of inquiry. 

Just because an important value changed, no matter in what direction, it might mean nothing as long as the value moves in certain ranges, respectively other direct or indirect conditions are met or not. Sometimes, there are simple rules and models that can be used to identify the various areas that should trigger different responses, respectively actions, though even small variations can increase the overall complexity multifold. There seems to be certain comfort in numbers, even if the same numbers can mean different things to different people, at different points in time, respectively contexts.

In the pursuit to bridge the multitude of gaps and challenges, organization attempt to arrive at common definitions and understanding in what concerns the business terms, goals, objectives, metrics, rules, procedures, processes and other points of focus associated with the respective terms. Unfortunately, many such foundations barely support the edifices built as long as there’s no common mental models established!

Even if the use of shared models is not new, few organizations try to make the knowledge associated with them explicit, respectively agree on and evolve a set of mental models that reflect how the business works, what is important, respectively can be ignored, which are the dependent and independent aspects, etc. This effort can prove to be a challenge for many organizations, especially when several leaps of faith must be made in the process.

Independently on whether organizations use shared mental models, some kind of common ground must be achieved. It starts with open dialog, identifying the gaps, respectively the minimum volume of knowledge required for making progress in the right direction(s). The broader the gaps and the misalignment, the more iterations are needed to make progress! And, of course, one must know which are the destinations, what paths to follow, what to ignore, etc. 

It's important how we look at the business, and people tend to use different filters (aka glasses or hats) for this purpose. Simple relationships between the various facts are ideal, though uncommon. There’s a chain of causality that may trigger a certain change, though more likely one deals with a networked structure of cause-effect relationships. The world is more complex than we (can} imagine. We try to focus on the aspects we are aware of, respectively consider as important. However, in a complex world also small variations in certain areas can shift the overall weight to aspects outside of our focus, influence or area of responsibility. Quite often, what we don’t know is more important than what we know!

Previous Post <<||>> Next Post

10 March 2025

🧭Business Intelligence: Perspectives (Part XXVIII: Cutting through Complexity)

Business Intelligence Series
Business Intelligence Series

Independently of the complexity of the problems, one should start by framing the problem(s) correctly and this might take several steps and iterations until a foundation is achieved, upon which further steps can be based. Ideally, the framed problem should reflect reality and should provide a basis on which one can build something stable, durable and sustainable. Conversely, people want quick low-cost fixes and probably the easiest way to achieve this is by focusing on appearances, which are often confused with more.

In many data-related contexts, there’s the tendency to start with the "solution" in mind, typically one or more reports or visualizations which should serve as basis for further exploration. Often, the information exchange between the parties involved (requestor(s), respectively developer(s)) is kept to a minimum, though the formalized requirements barely qualify for the minimum required. The whole process starts with a gap that can create further changes as the development process progresses, with all the consequences deriving from this: the report doesn’t satisfy the needs, more iterations are needed - requirements’ reevaluation, redesign, redevelopment, retesting, etc.

The poor results are understandable, all parties start with a perspective based on facts which provide suboptimal views when compared with the minimum basis for making the right steps in the right direction. That’s not only valid for reports’ development but also for more complex endeavors – data models, data marts and warehouses, and other software products. Data professionals attempt to bridge the gaps by formalizing and validating the requirements, building mock-ups and prototypes, testing, though that’s more than many organizations can handle!

There are simple reports or data visualizations for which not having prior knowledge of the needed data sources, processes and the business rules has a minimal impact on the further steps of the processes involved in building the final product(s). However, "all generalizations are false" to some degree, and there’s a critical point after which minimal practices tend to create more waste than companies can afford. Consequently, applying the full extent of the processes can lead to waste when the steps aren’t imperative for the final product.

Even if one is aware of all the implications, one’s experience and the application of best practices doesn’t guarantee the quality of the results as long as some kind of novelty, unknown, fuzziness or complexity is involved. Novelty can appear in different ways – process, business rules, data or problem formulations, particularities that aren’t easily perceived or correctly understood. Each important minor piece of information can have an exponential impact under the wrong circumstances.

The unknown can encompass novelty, though can be also associated with the multitude of facts not explicitly and/or directly considered. "The devil is in details" and it’s so easy for important or minor facts to remain hidden under the veil of suppositions, expectations, respectively under the complex and fuzzy texture of logical aspects. Many processes operate under strict rules, though there are various consequences, facts or unnecessary information that tend to increase the overall complexity and fuzziness.

Predefined processes, procedures and practices can help cut and illuminate through this complex structure associated with the various requirements and aspects of problems. Plunging headfirst can be occasionally associated with the need to evaluate what is known and unknown from facts and data’s perspective, to identify the gaps and various factors that can weigh in the final solution. Unfortunately, too often it’s nothing of this!  

Besides the multitude of good/best practices and problem-solving approaches, all one has is his experience and intuition to cut through the overall complexity. False steps are inevitable for finding the approachable path(s) from the requirements to the solution.

08 March 2025

#️⃣Software Engineering: Programming (Part XVI: The Software Quality Perspective and AI)

Software Engineering Series
Software Engineering Series

Organizations tend to complain about poor software quality developed in-house, by consultancy companies or third parties, without doing much in this direction. Unfortunately, this agrees with the bigger picture reflected by the quality standards adopted by organizations - people talk and complain about them, though they aren’t that eager to include them in the various strategies, or even if they are considered, they are seldom enforced adequately!

Moreover, even if quality standards are adopted, and a lot of effort may be spent in this direction (as everybody has strong opinions and there are many exceptions), as projects progress, all the good intentions come to an end, the rules fading on the way either because are too strict, too general, aren’t adequately prioritized or communicated, or there’s no time to implement (all of) them. This applies in general to programming and to the domains that revolve around data – Business Intelligence, Data Analytics or Data Science.

The volume of good quality code and deliverables is not only a reflection of an organization’s maturity in dealing with best practices but also of its maturity in handling technical debt, Project Management, software and data quality challenges. All these aspects are strongly related to each other and therefore require a systemic approach rather than focusing on the issues locally. The systemic approach allows organizations to bridge the gaps between business areas, teams, projects and any other areas of focus.

There are many questionable studies on the effect of methodologies on software quality and data issues, proclaiming that one methodology is better than the other in addressing the multifold aspects of software quality. Besides methodologies, some studies attempt to correlate quality with organizations’ size, management or programmers’ experience, the size of software, or whatever characteristic might seem to affect quality.

Bad code is written independently of companies’ size or programmer's experience, management or organization’s maturity. Bad code doesn’t necessarily happen all at once, but it can depend on circumstances, repetitive team, requirements and code changes. There are decisions and actions that sooner or later can affect the overall outcome negatively.

Rewriting the code from scratch might look like an approachable measure though it’s seldom the cost-effective solution. Allocating resources for refactoring is usually a better approach, though this tends to increase considerably the cost of projects, and organizations might be tempted to face the risks, whatever they might be. Independently of the approaches used, sooner or later the complexity of projects, requirements or code tends to kick back.

There are many voices arguing that AI will help in addressing the problems of software development, quality assurance and probably other areas. It’s questionable how much AI will help to address the gaps, non-concordances and other mistakes in requirements, and how it will develop quality code when it has basic "understanding" issues. Even if step by step all current issues revolving around AI will be fixed, it will take time and multiple iterations until meaningful progress will be made.

At least for now, AI tools like Copilot or ChatGPT can be used for learning a programming language or framework through predefined or ad-hoc prompts. Probably, it can be used also to identify deviations from best practices or other norms in scope. This doesn’t mean that AI will replace for now code reviews, testing and other practices used in assuring the quality of software, but it can be used as an additional method to check for what was eventually missed in the other methods.

AI may also have hidden gems that when discovered, polished and sized, may have a qualitative impact on software development and software. Only time will tell what’s possible and achievable.

15 February 2025

🧭Business Intelligence: Perspectives (Part XXVII: A Tale of Two Cities II)

Business Intelligence Series
Business Intelligence Series
There’s a saying that applies to many contexts ranging from software engineering to data analysis and visualization related solutions: "fools rush in where angels fear to tread" [1]. Much earlier, an adage attributed to Confucius provides a similar perspective: "do not try to rush things; ignore matters of minor advantage". Ignoring these advices, there's the drive in rapid prototyping to jump in with both feet forward without checking first how solid the ground is, often even without having adequate experience in the field. That’s understandable to some degree – people want to see progress and value fast, without building a foundation or getting an understanding of what’s happening, respectively possible, often ignoring the full extent of the problems.

A prototype helps to bring the requirements closer to what’s intended to achieve, though, as the practice often shows, the gap between the initial steps and the final solutions require many iterations, sometimes even too many for making a solution cost-effective. There’s almost always a tradeoff between costs and quality, respectively time and scope. Sooner or later, one must compromise somewhere in between even if the solution is not optimal. The fuzzier the requirements and what’s achievable with a set of data, the harder it gets to find the sweet spot.

Even if people understand the steps, constraints and further aspects of a process relatively easily, making sense of the data generated by it, respectively using the respective data to optimize the process can take a considerable effort. There’s a chain of tradeoffs and constraints that apply to a certain situation in each context, that makes it challenging to always find optimal solutions. Moreover, optimal local solutions don’t necessarily provide the optimum effect when one looks at the broader context of the problems. Further on, even if one brought a process under control, it doesn’t necessarily mean that the process works efficiently.

This is the broader context in which data analysis and visualization topics need to be placed to build useful solutions, to make a sensible difference in one’s job. Especially when the data and processes look numb, one needs to find the perspectives that lead to useful information, respectively knowledge. It’s not realistic to expect to find new insight in any set of data. As experience often proves, insight is rarer than finding gold nuggets. Probably, the most important aspect in gold mining is to know where to look, though it also requires luck, research, the proper use of tools, effort, and probably much more.

One of the problems in working with data is that usually data is analyzed and visualized in aggregates at different levels, often without identifying and depicting the factors that determine why data take certain shapes. Even if a well-suited set of dimensions is defined for data analysis, data are usually still considered in aggregate. Having the possibility to change between aggregates and details is quintessential for data’s understanding, or at least for getting an understanding of what's happening in the various processes. 

There is one aspect of data modeling, respectively analysis and visualization that’s typically ignored in BI initiatives – process-wise there is usually data which is not available and approximating the respective values to some degree is often far from the optimal solution. Of course, there’s often a tradeoff between effort and value, though the actual value can be quantified only when gathering enough data for a thorough first analysis. It may also happen that the only benefit is getting a deeper understanding of certain aspects of the processes, respectively business. Occasionally, this price may look high, though searching for cost-effective solutions is part of the job!

Previous Post  <<||>>  Next Post

References:
[1] Alexander Pope (cca. 1711) An Essay on Criticism

04 February 2025

🧭Business Intelligence: Perspectives (Part XXVI: Monitoring - A Cockpit View)

Business Intelligence Series
Business Intelligence Series

The monitoring of business imperatives is sometimes compared metaphorically with piloting an airplane, where pilots look at the cockpit instruments to verify whether everything is under control and the flight ensues according to the expectations. The use of a cockpit is supported by the fact that an airplane is an almost "closed" system in which the components were developed under strict requirements and tested thoroughly under specific technical conditions. Many instruments were engineered and evolved over decades to operate as such. The processes are standardized, inputs and outputs are under strict control, otherwise the whole edifice would crumble under its own complexity. 

In organizational setups, a similar approach is attempted for monitoring the most important aspects of a business. A few dashboards and reports are thus built to monitor and control what’s happening in the areas which were identified as critical for the organization. The various gauges and other visuals were designed to provide similar perspectives as the ones provided by an airplane’s cockpit. At first sight the cockpit metaphor makes sense, though at careful analysis, there are major differences. 

Probably, the main difference is that businesses don’t necessarily have standardized processes that were brought under control (and thus have variation). Secondly, the data used doesn’t necessarily have the needed quality and occasionally isn’t fit for use in the business processes, including supporting processes like reporting or decision making. Thirdly, are high the chances that the monitoring within the BI infrastructures doesn’t address the critical aspects of the business, at least not at the needed level of focus, detail or frequency. The interplay between these three main aspects can lead to complex issues and a muddy ground for a business to build a stable edifice upon. 

The comparison with airplanes’ cockpit was chosen because the number of instruments available for monitoring is somewhat comparable with the number of visuals existing in an organization. In contrast, autos have a smaller number of controls simple enough to help the one(s) sitting in the cockpit. A car’s monitoring capabilities can probably reflect the needs of single departments or teams, though each unit needs its own gauges with specific business focus. The parallel is however limited because the areas of focus in organizations can change and shift in other directions, some topics may have a periodic character while others can regain momentum after a long time. 

There are further important aspects. At high level, the expectation is for software products and processes, including the ones related to BI topics, to have the same stability and quality as the mass production of automobiles, airplanes or other artifacts that have similar complexity and manufacturing characteristics. Even if the design process of software and manufacturing may share many characteristics, the similar aspects diverge as soon as the production processes start, respectively progress, and these are the areas where the most differences lie. Starting from the requirements and ending with the overall goals, everything resembles the characteristics of quick shifting sands on which is challenging to build any stabile edifice.

At micro level in manufacturing each piece was carefully designed and produced according to a set of characteristics that were proved to work. Everything must fit perfectly in the grand design and there are many tests and steps to make sure that happens. To some degree the same is attempted when building software products, though the processes break along the way with the many changes attempted, with the many cost, time and quality constraints. At some point the overall complexity kicks back; it might be still manageable though the overall effort is higher than what organizations bargained for. 

02 February 2025

🏭 💠Data Warehousing: Microsoft Fabric (Part VIII: More on SQL Databases)

Business Intelligence Series
Business Intelligence Series

Last week Microsoft had a great session [1] on the present and future of SQL databases, a “light” version of Azure SQL databases designed for Microsoft Fabric environments, respectively workloads. SQL databases are currently available for testing, and after the first tests the product looks promising. Even if there are several feature gaps, it’s expected that Microsoft will bridge the gaps over time. Conversely, there might be features that don’t make sense in Fabric, respectively new features that need to be considered for facilitating the work in OneLake and its ecosystem.

During the session several Microsoft professionals answered the audience’s questions, and they did a great job. Even if the answers and questions barely scratched the surface, they offered some insight into what Microsoft wants to do. Probably the expectation is that SQL databases won’t need any administration - indexes being maintained automatically, infrastructure scaling as needed, however everything sounds too nice to be true if one considers in general the experience with RDBMS – the devil hides usually in details.

Even if the solutions built follow the best practices in the field, which frankly seldom happens, transferring the existing knowledge to Fabric may encounter some important challenges revolving around performance, flexibility, accessibility and probably costs. Even if SQL databases are expected to fill some minor gaps, considering the lessons of the past, such solutions can easily grow. Even if a lot of processing power is thrown at the SQL queries and the various functionality, customers still need to write quality code and refactor otherwise the costs will explode sooner or later. 

As the practice has proven so many times while troubleshooting performance issues, sometimes one needs to use all the arsenal available – DBCC, DMVs and sometimes even undocumented features - to get a better understanding of what’s happening. Even if there are some voices stating that developers don’t need to know how the SQL engine works, just applying solutions blindly after a recipe can accidentally increase the value of code, though most likely it doesn’t exploit the full potential available. Unfortunately, this is a subjective topic without hard numbers to support it, and the stories told by developers and third-parties usually don’t tell the whole story. 

It’s also true that diving deep into a database’s internal working requires time, that’s quite often not available, and the value for such an effort doesn’t necessarily pay off. Above this, there’s a software engineer’s aim of understanding of how things work. Otherwise, one should drop the engineering word and just call it coding. Conversely, the data citizen just needs a high-level knowledge of how things work, though the past 20-30 years proved that that’s often not enough. The more people don’t have the required knowledge, the higher the chances that code needs refactoring. Just remember the past issues organizations had with MS Access and Excel when people started to create their own solutions, the whole infrastructure being invaded by poorly designed solutions that continue to haunt some organizations even today.

Even if lot of technical knowledge can be transported to Microsoft Fabric, the new environments may still require also adequate tools that can be used for monitoring and troubleshooting. Microsoft seems to work in this direction, though from the information available the tools don’t and can’t offer the whole perspective. It will be interesting to see how much the current, respectively the future dashboards and various reports can help; respectively what important gaps will surface. Until the gaps are addressed, probably the SQL professional must rely on SQL scripts and the DMVs available. All this can be summarized in a few words: it will not be boring!

Previous Post <<||>> Next Post

References:
[1] Microsoft Reactor (2025) Ask The Expert - Fabric Edition - Fabric Databases [link]

26 January 2025

🧭Business Intelligence: Perspectives (Part XXV: Grounding the Roots)

Business Intelligence Series
Business Intelligence Series

When building something that is supposed to last, one needs a solid foundation on which the artifact can be built upon. That’s valid for castles, houses, IT architectures, and probably most important, for BI infrastructures. There are so many tools out there that allow building a dashboard, report or other types of BI artifacts with a few drag-and-drops, moving things around, adding formatting and shiny things. In many cases all these steps are followed to create a prototype for a set of ideas or more formalized requirements keeping the overall process to a minimum. 

Rapid prototyping, the process of building a proof-of-concept by focusing at high level on the most important design and functional aspects, is helpful and sometimes a mandatory step in eliciting and addressing the requirements properly. It provides a fast road from an idea to the actual concept, however the prototype, still in its early stages, can rapidly become the actual solution that unfortunately continues to haunt the dreams of its creator(s). 

Especially in the BI area, there are many solutions that started as a prototype and gained mass until they start to disturb many things around them with implications for security, performance, data quality, and many other aspects. Moreover, the mass becomes in time critical, to the degree that it pulled more attention and effort than intended, with positive and negative impact altogether. It’s like building an artificial sun that suddenly becomes a danger for the nearby planet(s) and other celestial bodies. 

When building such artifacts, it’s important to define what goals the end-result must or would be nice to have, differentiating clearly between them, respectively when is the time to stop and properly address the aspects mandatory in transitioning from the prototype to an actual solution that addresses the best practices in scope. It’s also the point when one should decide upon solution’s feasibility, needed quality acceptance criteria, and broader aspects like supporting processes, human resources, data, and the various aspects that have impact. Unfortunately, many solutions gain inertia without the proper foundation and in extremis succumb under the various forces.

Developing software artifacts of any type is a balancing act between all these aspects, often under suboptimal circumstances. Therefore, one must be able to set priorities right, react and change direction (and gear) according to the changing context. Many wish all this to be a straight sequential road, when in reality it looks more like mountain climbing, with many peaks, valleys and change of scenery. The more exploration is needed, the slower the progress.

All these aspects require additional time, effort, resources and planning, which can easily increase the overall complexity of projects to the degree that it leads to (exponential) effort and more important - waste. Moreover, the complexity pushes back, leading to more effort, and with it to higher costs. On top of this one has the iteration character of BI topics, multiple iterations being needed from the initial concept to the final solution(s), sometimes many steps being discarded in the process, corners are cut, with all the further implications following from this. 

Somewhere in the middle, between minimum and the broad overextending complexity, is the sweet spot that drives the most impact with a minimum of effort. For some organizations, respectively professionals, reaching and remaining in the zone will be quite a challenge, though that’s not impossible. It’s important to be aware of all the aspects that drive and sustain the quality of artefacts, data and processes. There’s a lot to learn from successful as well from failed endeavors, and the various aspects should be reflected in the lessons learned. 

24 January 2025

🧭Business Intelligence: Perspectives (Part XXIV: Building Castles in the Air)

Business Intelligence Series
Business Intelligence Series

Business users have mainly three means of visualizing data – reports, dashboards and more recently notebooks, the latter being a mix between reports and dashboards. Given that all three types of display can be a mix of tabular representations and visuals/visualizations, the difference between them is often neglectable to the degree that the terms are used interchangeably. 

For example, in Power BI a report is a "multi-perspective view into a single semantic model, with visualizations that represent different findings and insights from that semantic model" [1], while a dashboard is "a single page, often called a canvas, that uses visualizations to tell a story" [1], a dashboards’ visuals coming from one or more reports [2]. Despite this clear delimitation, the two concepts continue to be mixed and misused in conversations even by data-related professionals. This happens also because in other tools the vendors designate as dashboard what is called report in Power BI. 

Given the limited terminology, it’s easy to generalize that dashboards are useless, poorly designed, bad for business users, and so on. As Stephen Few recognized almost two decades ago, "most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations" [3]. Therefore, when people say that "dashboards are bad" refer to the result of poorly implementations, of what some of them were part of, which frankly is a different topic! Unfortunately, BI implementations reflect probably more than any other areas how easy is to fail!

Frankly, here it is not necessarily the poor implementation of a project management methodology at fault, which quite often happens, but the way requirements are defined, understood, documented and implemented. Even if these last aspects are part of the methodologies, they are merely a reflection of how people understand the business. The outcomes of BI implementations are rooted in other areas, and it starts with how the strategic goals and objectives are defined, how the elements that need oversight are considered in the broader perspectives. The dashboards become thus the end-result of a chain of failures, failing to build the business-related fundament on which the reporting infrastructure should be based upon. It’s so easy to shift the blame on what’s perceptible than on what’s missing!

Many dashboards are built because people need a sense of what’s happening in the business. It starts with some ideas based on the problems identified in organizations, one or more dashboards are built, and sometimes a lot of time is invested in the process. Then, some important progress is made, and all comes to a stale if the numbers don’t reveal something new, important, or whatever users’ perception is. Some might regard this as failure, though as long as the initial objectives were met, something was learned in the process and a difference was made, one can’t equate this with failure!

It’s more important to recognize the temporary character of dashboards, respectively of the requirements that lead to them and build around them. Of course, this requires occasionally a different approach to the whole topic. It starts with how KPIs and other business are defined and organized, respectively on how data repositories are built, and it ends with how data are visualized and reported.

As the practice often revealed, it’s possible to build castles in the air, without a solid foundation, though the expectation for such edifices to sustain the weight of businesses is unrealistic. Such edifices break with the first strong storm and unfortunately it's easier to blame a set of tools, some people or a whole department instead at looking critically at the whole organization!


References:
[1] Microsoft Learn (2024) Power BI: Glossary [link]
[2] Microsoft Learn (2024) Power BI: Dashboards for business users of the Power BI service [link
[3] Stephen Few, "Information Dashboard Design", 2006

15 January 2025

🧭Business Intelligence: Perspectives (Part XXIII: In between the Many Destinations)

Business Intelligence Series
Business Intelligence Series

In too many cases the development of queries, respectively reports or data visualizations (aka artifacts) becomes a succession of drag & drops, formatting, (re)ordering things around, a bit of makeup, configuring a set of parameters, and the desired product is good to go! There seems nothing wrong with this approach as long as the outcomes meet users’ requirements, though it also gives the impression that’s all what the process is about. 

Given a set of data entities, usually there are at least as many perspectives into the data as entities’ number. Further perspectives can be found in exceptions and gaps in data, process variations, and the further aspects that can influence an artifact’s logic. All these aspects increase the overall complexity of the artifact, respectively of the development process. One guideline in handling all this is to keep the process in focus, and this starts with requirements’ elicitation and ends with the quality assurance and actual use.

Sometimes, the two words, the processes and their projection into the data and (data) models don’t reflect the reality adequately and one needs to compromise, at least until the gaps are better addressed. Process redesign, data harmonization and further steps need to be upon case considered in multiple iterations that should converge to optimal solutions, at least in theory. 

Therefore, in the development process there should be a continuous exploration of the various aspects until an optimum solution is reached. Often, there can be a couple of competing forces that can pull the solution in two or more directions  and then compromising is necessary. Especially as part of continuous improvement initiatives there’s the tendency of optimizing locally processes in the detriment of the overall process, with all the consequences resulting from this. 

Unfortunately, many of the problems existing in organizations are ill-posed and misunderstood to the degree that in extremis more effort is wasted than the actual benefits. Optimization is a process of putting in balance all the important aspects, respectively of answering with agility to the changing nature of the business and environment. Ignoring the changing nature of the problems and their contexts is a recipe for failure on the long term. 

This implies that people in particular and organizations in general need to become and  remain aware of the micro and macro changes occurring in organizations. Continuous learning is the key to cope with change. Organizations must learn to compromise and focus on what’s important, achievable and/or probable. Identifying, defining and following the value should be in an organization’s ADN. It also requires pragmatism (as opposed to idealism). Upon case, it may even require to say “no”, at least until the changes in the landscape offer a reevaluation of the various aspects.

One requires a lot from organizations when addressing optimization topics, especially when misalignment or important constraints or challenges may exist. Unfortunately, process related problems don’t always admit linear solutions. The nonlinear aspects are reflected especially when changing the scale, perspective or translating the issues or solutions from one are area to another.

There are probably answers available in the afferent literature or in the approaches followed by other organizations. Reinventing the wheel is part of the game, though invention may require explorations outside of the optimal paths. Conversely, an organization that knows itself has more chances to cope with the challenges and opportunities altogether. 

A lot from what organizations do in a consistent manner looks occasionally like inertia, self-occupation, suboptimal or random behavior, in opposition to being self-driven, self-aware, or in self-control. It’s also true that these are ideal qualities or aspects of what organizations should become in time. 

14 January 2025

🧭Business Intelligence: Perspectives (Part XXII: Breaking Queries' Complexity)

Business Intelligence Series
Business Intelligence Series

Independently whether standalone or encapsulated in database objects, the queries written can become complex in time, respectively difficult to comprehend and maintain. One can reduce the cognitive load by identifying the aspects that enable one’s intuition - order, similarity and origin being probably the most important aspects that help coping with the inherent complexity. 

One should start with the table having the lowest level of detail, usually a transaction table that forms the backbone of a certain feature. For example, for Purchase Orders this could be upon case the distribution or line level. If Invoices are added to the logic, and there could be multiple invoice line for a record from the former logic, then this becomes the new level of detail. Further on, if General Ledger lines are added, more likely this becomes the lowest level of detail, and so on.

This approach allows to keep a consistent way of building the queries while enabling to validate the record count, making sure that no duplicates are added to the logic. Conversely, one can start also from the table with the lowest level of details, and add tables successively based on their level of detail, though the database engine may generate upon case a suboptimal plan. In addition, checking the cardinality may involve writing a second query. 

One should try to keep the tables belonging to the same entity together, when possible (e.g. Purchase Order vs. Vendor information). This approach allows to reduce the volume of work required to manage, review, test and understand the query later. If the blocks are too big, then occasionally it makes sense to bring pieces of logic into CTEs (Common Table Expressions), or much better into views that allow to better encapsulate and partition the logic.

CTEs allow to split the logic into logical steps, allowing occasionally to troubleshoot the logic on pieces though one should keep a balance between maintainability and detail. In extremis, people may create unnecessarily an additional CTE for each join. The longer and the more fragmented a query, the more difficult it becomes to troubleshoot and even understand. Readability can be better achieved though indentation, by keeping things together that belong together, respectively partitioning the logic in logical blocks that derive from the model. 

Keeping things together should be applied also to the other elements. For example, the join constraints should be limited only to the fields participating in the join (and, if possible, all the other constraints should be brought in the WHERE clause). Bringing the join constraints in the WHERE clause, an approach used in the past, decreases query readability no matter how well the WHERE clause is structured, and occasionally can lead to constraints’ duplication or worse, to missing pieces of logic. 

The order of the attributes should follow a logic that makes sense. One approach is to start from the tables with lowest cardinality that identify entities uniquely and move to the attributes that have a higher level of detail. However, not all attributes are equally important, and thus one might have to compromise and create multiple groups of data coming from different levels. 

One should keep in mind that the more random the order of the attributes is, the more difficult it becomes to validate the data as one needs to jump multiple times either in the query or in the mask. Ideally one should find a balance between the two perspectives. Having an intuitive logic of how the attributes are listed increases queries’ readability, maintainability and troubleshooting. The more random attributes’ listing, the higher the effort for the same. 

Previous Post  <<||>>   Next Post

22 December 2024

#️⃣Software Engineering: Mea Culpa (Part VI: A Look Back)

Software Engineering Series
Software Engineering Series

Looking back at my university years, I'd say that there are three teachers, respectively courses that made a considerable impact on students' lives. In the second year I learned Category Algebra, which despite the fact that it reflected past knowledge and the topics were too complex for most of us, it provided us with a unprecedented layer of abstraction that showed us that Mathematics is not what we thought it to be!

The second course was related to the Complex plane theory, a course in which, the decan of the university at those times, challenged our way of thinking about relatively basic concepts. It was a big gap between what we thought about Mathematics, and what the subject proved to be. The course was thought in a post-university year together with a course on Relativity Theory, in which even we haven't understood much about the concepts and theories, it was the first time (except the Graph theory), we saw applied Mathematics to a broader context. Please don't misunderstand me! There were many other valuable teachers and courses, though these were the three courses that made the most important impact for me!

During those times, we attended also courses on Fortran, Pascal, C++, HTML and even dBase, and, even if each programming language brought something new in the landscape, I can't say they changed how we thought about the world (some of us had similar courses during the lyceum years) and problem solving. That's what for example SQL or more generally a database-related course brought, even if I had to wait for the first MooC courses to appear. Equally important was also Scott E Page's course on Model Theory, which introduced the model-thinking approach, a structured way of thinking about models, with applicability to the theoretical and practical aspects of life.

These are the courses that anybody interested in programming and/or IT should attend! Of course, there are also courses on algorithms, optimization, linear and non-linear programming, and they bring an arsenal of concepts and techniques to think about, though, even if they might have a wide impact, I can't compare them with the courses mentioned above. A course should (ideally) change the way we think about the world to make a sensible difference! Same goes for programming and theoretical concepts too!...

Long after I graduated, I found many books and authors that I wished I had met earlier! Quotable Math reflects some of the writings I found useful, though now it seems already too late for those books to make a considerable impact! Conversely, it's never too late to find new ways to look at life, and this is what some books achieve! This is also a way of evaluating critically what we want to read or what is worth reading!

Of course, there are many courses, books or ideas out there, though if they haven't changed the way you think about life, directly or indirectly, are they worth attending, respectively reading? Conversely, if one hasn't found a new perspective brought by a topic, probably one barely scratched the surface of the subject, independently if we talk here about students or teachers. For some topics, it's probably too much to ask, though pragmatically talking, that's the intrinsic value of what we learn! 

That's a way to think about life and select the books worth reading! I know, many love reading for the sake of reading, though the value of a book, theory, story or other similar artifacts should be judged especially by the impact they have on our way of thinking, respectively on our lives. Just a few ideas that's maybe worth reflective upon... 

Previous Post <<||>> Next Post

14 December 2024

🧭💹Business Intelligence: Perspectives (Part XXI: Data Visualization Revised)

Data Visualization Series
Data Visualization Series

Creating data visualizations nowadays became so easy that anybody can do it with a minimum of effort and knowledge, which on one side is great for the creators but can be easily become a nightmare for the readers, respectively users. Just dumping data in visuals can be barely called data visualization, even if the result is considered as such. The problems of visualization are multiple – the lack of data culture, the lack of understanding processes, data and their characteristics, the lack of being able to define and model problems, the lack of educating the users, the lack of managing the expectations, etc.

There are many books on data visualization though they seem an expensive commodity for the ones who want rapid enlightenment, and often the illusion of knowing proves maybe to be a barrier. It's also true that many sets of data are so dull, that the lack of information and meaning is compensated by adding elements that give a kitsch look-and-feel (aka chartjunk), shifting the attention from the valuable elements to decorations. So, how do we overcome the various challenges? 

Probably, the most important step when visualizing data is to define the primary purpose of the end product. Is it to inform, to summarize or to navigate the data, to provide different perspectives at macro and micro level, to help discovery, to explore, to sharpen the questions, to make people think, respectively understand, to carry a message, to be artistic or represent truthfully the reality, or maybe is just a filler or point of attraction in a textual content?

Clarifying the initial purpose is important because it makes upfront the motives and expectations explicit, allowing to determine the further requirements, characteristics, and set maybe some limits in what concern the time spent and the qualitative and/or qualitative criteria upon which the end result should be eventually evaluated. Narrowing down such aspects helps in planning and the further steps performed. 

Many of the steps are repetitive and past experience can help reduce the overall effort. Therefore, professionals in the field, driven by intuition and experience probably don't always need to go through the full extent of the process. Conversely, what is learned and done poorly, has high chances of delivering poor quality. 

A visualization can be considered as effective when it serves the intended purpose(s), when it reveals with minimal effort the patterns, issues or facts hidden in the data, when it allows people to explore the data, ask questions and find answers altogether. One can talk also about efficiency, especially when readers can see at a glance the many aspects encoded in the visualization. However, the more the discovery process is dependent on data navigation via filters or other techniques, the more difficult it becomes to talk about efficiency.

Better criteria to judge visualizations is whether they are meaningful and useful for the readers, whether the readers understood the authors' intent, the further intrinsic implication, though multiple characteristics can be associated with these criteria: clarity, specificity, correctedness, truthfulness, appropriateness, simplicity, etc. All these are important in lower or higher degree depending on the broader context of the visualization.

All these must be weighted in the bigger picture when creating visualizations, though there are probably also exceptions, especially on the artistic side, where artists can cut corners for creating an artistic effect, though also in here the authors need to be truthful to the data and make sure that their work don't distort excessively the facts. Failing to do so might not have an important impact on the short term considerably, though in time the effects can ripple with unexpected effects.


13 December 2024

🧭💹Business Intelligence: Perspectives (Part XX: From BI to AI)

Business Intelligence Series

No matter how good data visualizations, reports or other forms of BI artifacts are, they only serve a set of purposes for a limited amount of time, limited audience or any other factors that influence their lifespan. Sooner or later the artifacts become thus obsolete, being eventually disabled, archived and/or removed from the infrastructure. 

Many artifacts require a considerable number of resources for their creation and maintenance over time. Sometimes the costs can be considerably higher than the benefits brought, especially when the data or the infrastructure are used for a narrow scope, though there can be other components that need to be considered in the bigger picture. Having a report or visualization one can use when needed can have an important impact on the business in correcting issues, sizing opportunities or filling the knowledge gaps. 

Even if it’s challenging to quantify the costs associated with the loss of opportunities rooted in the lack of data, respectively information, the amounts can be considerable high, greater even than building a whole BI infrastructure. Organization’s agility in addressing the important gaps can make a considerable difference, at least in theory. Having the resources that can be pulled on demand can give organizations the needed competitive boost. Internal or external resources can be used altogether, though, pragmatically speaking, there will be always a gap between demand and supply of knowledgeable resources.

The gap in BI artefacts can be addressed nowadays by AI-driven tools, which have the theoretical potential of shortening the gap between needs and the availability of solutions, respectively a set of answers that can be used in the process. Of course, the processes of sense-making and discovery are not that simple as we’d like, though it’s a considerable step forward. 

Having the possibility of asking questions in natural language and guiding the exploration process to create visualizations and other artifacts using prompt engineering and other AI-enabled methods offers new possibilities and opportunities that at least some organizations started exploring already. This however presumes the existence of an infrastructure on which the needed foundation can be built upon, the knowledge required to bridge the gap, respectively the resources required in the process. 

It must be stressed out that the exploration processes may bring no sensible benefits, at least no immediately, and the whole process depends on organizations’ capabilities of identifying and sizing the respective opportunities. Therefore, even if there are recipes for success, each organization must identify what matters and how to use technologies and the available infrastructure to bridge the gap.

Ideally to make progress organizations need besides the financial resources the required skillset, a set of projects that support learning and value creation, respectively the design and execution of a business strategy that addresses the steps ahead. Each of these aspects implies risks and opportunities altogether. It will be a test of maturity for many organizations. It will be interesting to see how many organizations can handle the challenge, respectively how much past successes or failures will weigh in the balance. 

AI offers a set of capabilities and opportunities, however the chance of exploring and failing fast is of great importance. AI is an enabler and not a magic wand, no matter what is preached in technical workshops! Even if progress follows an exponential trajectory, it took us more than half of century from the first steps until now and probably many challenges must be still overcome. 

The future looks interesting enough to be pursued, though are organizations capable to size the opportunities, respectively to overcome the challenges ahead? Are organizations capable of supporting the effort without neglecting the other priorities? 

12 December 2024

🧭💹Business Intelligence: Perspectives (Part XIX: Data Visualization between Art, Pragmatism and Kitsch)

Business Intelligence Series

The data visualizations (aka dataviz) presented in the media, especially the ones coming from graphical artists, have the power to help us develop what is called graphical intelligence, graphical culture, graphical sense, etc., though without a tutor-like experience the process is suboptimal because it depends on our ability of identifying what is important and which are the steps needed for decoding and interpreting such work, respectively for integrating their messages in our overall understanding about the world.

When such skillset is lacking, without explicit annotations or other form of support, the reader might misinterpret or fail to observe important visual cues even for simple visualizations, with all the implications deriving from this – a false understanding, and further aspects deriving from it, this being probably the most important aspect to consider. Unfortunately, even the most elaborate work can fail if the reader doesn’t have a basic understanding of all that’s implied in the process.

The books of Willard Brinton, Ana Rogers, Jacques Bertin, William Cleveland, Leland Wilkinson, Stephen Few, Albert Cairo, Soctt Berinato and many others can help the readers build a general understanding of the dataviz process and how data visualizations or simple graphics can be used/misused effectively, though each reader must follow his/her own journey. It’s also true that the basics can be easily learned, though the deeper one dives, the more interesting and nontrivial the journey becomes. Fortunately, the average reader can stick to the basics and many visualizations are simple enough to be understood.

To grasp the full extent of the implications, one can make comparisons with the domain of poetry where the author uses basic constructs like metaphor, comparisons, rhythm and epithets to create, communicate and imprint in reader’s mind old and new meanings, images and feelings altogether. Artistic data visualizations tend to offer similar charge as poetry does, even if the impact might not appeal so much to our artistic sensibility. Though dataviz from this perspective is or at least resembles an art form.

Many people can write verses, though only a fraction can write good meaningful poetry, from which a smaller fraction get poems, respectively even fewer get books published. Conversely, not everything can be expressed in verses unless one finds good metaphors and other aspects that can be leveraged in the process. Same can be said about good dataviz.

One can argue that in dataviz the author can explore and learn especially by failing fast (seeing what works and what doesn’t). One can also innovate, though the creator has probably a limited set of tools and rules for communication. Enabling readers to see the obvious or the hidden in complex visualizations or contexts requires skill and some kind of mastery of the visual form.

Therefore, dataviz must be more pragmatic and show the facts. In art one has the freedom to distort or move things around to create new meanings, while in dataviz it’s important for the meaning to be rooted in 'truth', at least by definition. The more the creator of a dataviz innovates, the higher the chances of being misunderstood. Moreover, readers need to be educated in interpreting the new meanings and get used to their continuous use.

Kitsch is a term applied to art and design that is perceived as naïve imitation to the degree that it becomes a waste of resources even if somebody pays the tag price. There’s a trend in dataviz to add elements to visualizations that don’t bring any intrinsic value – images, colors and other elements can be misused to the degree that the result resembles kitsch, and the overall value of the visualization is diminished considerably.

07 December 2024

🏭 💠Data Warehousing: Microsoft Fabric (Part VI: SQL Databases for OLTP scenarios) [new feature]

Data Warehousing Series
Data Warehousing Series

One interesting announcements at Ignite is the availability in public preview of SQL databases in Microsoft Fabric, "a versatile and developer-friendly transactional database built on the foundation of Azure SQL database". With this Fabric can address besides OLAP also OLTP scenarios, evolving thus from analytics to a data platform [1]. According to the announcement, besides the AI-optimized architectural aspects, the feature makes the SQL Azure simple, autonomous and secure by design [1], and these latest aspects are considered in this post. 

Simplicity revolves around the deployment and configuration of databases, the creation of a new database requiring giving a name and the database is created in seconds [1]. It’s a considerable improvement compared with the relatively complex setup needed for on-premise configurations, though sometimes more flexibility in configuration is needed upfront or over database’s lifetime. To get a database ready for testing one can import a sample database or get specific data via data flows and/or pipelines [1]. As development tools one can use Visual Studio Code or SSMS [1], and probably more tools will be available in time.

The integration with both GitHub and Azure DevOps allows to configure each database under source control, which is needed for many scenarios especially when multiple resources make changes to the database objects [1]. Frankly, that’s mainly important during the development phase, respectively in scenarios in which multiple people make in parallel changes to the logic. It will be interesting to see how much overhead or challenges the feature adds to development and how smoothly everything works together!

The most important aspect for many solutions is the replication of data in near-real time to the (open-source) delta parquet format in OneLake and thus making the data available for analytics almost immediately [1]. Probably, from this aspect many cloud-based applications can benefit, even if the performance might not be as good as in other well-established architectures. However, there are many other scenarios in which one needs to maintain and use data for OLTP/OLAP purposes. This invites adequate testing and a good weighting of the advantages and disadvantages involved. 

A SQL database is a native item in Fabric, and therefore it utilizes Fabric capacity units like other Fabric workloads [1]. One can use the Fabric SKU estimator (still in private preview) to estimate the costs [2], though it will be interesting to see how cost-effective the solutions are. Probably, especially when the infrastructure is already available outside of Fabric, it will be easier and cost-effective to use the mirroring functionality. One should test and have a better estimator before moving blindly from the existing infrastructure to Fabric. 

SQL databases in Fabric are autonomous by design, while allowing to get the best performance and availability by default [1]. High availability is reached through zone redundancy, while performance is achieved by scaling automatically the storage and compute to accommodate the workloads [1]. The auto-optimization capability is achieved with the help of the latest Intelligent Query Processing (IQP) enhancements, respectively the creation of missing indexes to improve query performance [1]. It will be interesting to see how the whole process works, given that the maintenance of indexes usually involves some challenges (e.g. identifying covering indexes, indexes needed only for temporary workloads, duplicated indexes).

SQL databases in Fabric are automatically configured for high availability with zone redundancy, while storage and compute scale automatically to accommodate the user workload [1]. The database is auto-optimized through the latest IQP enhancements while the system creates any missing indexes to improve query performance. All data is replicated to OneLake by default [1]. Finally, the database always receives the latest security updates with auto-patching, while automatic backups help in disaster recovery scenarios  [1], which can be of real help for database administrators. 

References:
[1] Microsoft Fabric Updates Blog (2024) Announcing SQL database in Microsoft Fabric Public Preview [link]
[2] Microsoft Fabric Updates Blog (2024) Announcing New Recruitment for the Private Preview of Microsoft Fabric SKU Estimator [link]

16 October 2024

🧭💹Business Intelligence: Perspectives (Part XVIII: There’s More to Noise)

Business Intelligence Series
Business Intelligence Series

Visualizations should be built with an audience's characteristics in mind! Upon case, it might be sufficient to show only values or labels of importance (minima, maxima, inflexion points, exceptions, trends), while other times it might be needed to show all or most of the values to provide an accurate extended perspective. It even might be useful to allow users switching between the different perspectives to reduce the clutter when navigating the data or look at the patterns revealed by the clutter. 

In data-based storytelling are typically shown the points, labels and further elements that support the story, the aspects the readers should focus on, though this approach limits the navigability and users’ overall experience. The audience should be able to compare magnitudes and make inferences based on what is shown, and the accurate decoding shouldn’t be taken as given, especially when the audience can associate different meanings to what’s available and what’s missing. 

In decision-making, selecting only some well-chosen values or perspectives to show might increase the chances for a decision to be made, though is this equitable? Cherry-picking may be justified by the purpose, though is in general not a recommended practice! What is not shown can be as important as what is shown, and people should be aware of the implications!

One person’s noise can be another person’s signal. Patterns in the noise can provide more insight compared with the trends revealed in the "unnoisy" data shown! Probably such scenarios are rare, though it’s worth investigating what hides behind the noise. The choice of scale, the use of special types of visualizations or the building of models can reveal more. If it’s not possible to identify automatically such scenarios using the standard software, the users should have the possibility of changing the scale and perspective as seems fit. 

Identifying patterns in what seems random can prove to be a challenge no matter the context and the experience in the field. Occasionally, one might need to go beyond the general methods available and statistical packages can help when used intelligently. However, a presenter’s challenge is to find a plausible narrative around the findings and communicate it further adequately. Additional capabilities must be available to confirm the hypotheses framed and other aspects related to this approach.

It's ideal to build data models and a set of visualizations around them. Most probable some noise may be removed in the process, while other noise will be further investigated. However, this should be done through adjustable visual filters because what is removed can be important as well. Rare events do occur, probably more often than we are aware and they may remain hidden until we find the right perspective that takes them into consideration. 

Probably, some of the noise can be explained by special events that don’t need to be that rare. The challenge is to identify those parameters, associations, models and perspectives that reveal such insights. One’s gut feeling and experience can help in this direction, though novel scenarios can surprise us as well.

Not in every set of data one can find patterns, respectively a story trying to come out. Whether we can identify something worth revealing depends also on the data available at our disposal, respectively on whether the chosen data allow identifying significant patterns. Occasionally, the focus might be too narrow, too wide or too shallow. It’s important to look behind the obvious, to look at data from different perspectives, even if the data seems dull. It’s ideal to have the tools and knowledge needed to explore such cases and here the exposure to other real-life similar scenarios is probably critical!

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.