10 November 2024

🏭🗒️Microsoft Fabric: Data Mesh (Notes)

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!
Last updated: 23-May-2024

Data Mesh [Notes]
  • {definition} a type of decentralized data architecture that organizes data based on different business domains [2]
    •   a centrally managed network of decentralized data products
  • {concept} landing zone
    • typically a subscription that needs to be governed by a common policy [7]
      • {downside} creating one landing zone for every project can lead to too many landing zones to manage
        • {alternative} landing zones based on a business domain [7] 
    • resources must be managed so that each team has access only to its own resources [7]
      • shared resources might be needed, with separate management and common access for all [7]
    • need to be linked together into a mesh
      • via peer-to-peer networks
  • {concept} connectivity hub
  • {feature} resource group
    • {definition} a container that holds related resources for an Azure solution 
    • can be associated with a data product
      • when the data product becomes obsolete, the resource group can be deleted [7]
  • {feature} subscription
    • {definition} a logical unit of Azure services that are linked to an Azure account
    • can be associated as a landing zone governed by a policy [7]
  • {feature} tenant (aka Microsoft Fabric tenant, MF tenant)
    • a single instance of Fabric for an organization that is aligned with a Microsoft Entra ID
    • can contain any number of workspaces
  • {feature} workspaces
    • {definition} a collection of items that brings together different functionality in a single environment designed for collaboration
    • associated with a domain [3]
  • {feature} domains
    • {definition} a way of logically grouping together data in an organization that is relevant to a particular area or field [1]
    • some tenant-level settings for managing and governing data can be delegated to the domain level [2]
  • {feature} subdomains
    • a way of fine-tuning the logical grouping of data under a domain [1]
    • subdivisions of a domain
  • {concept} deployment template
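
The containment relations in the notes above (a tenant holds workspaces, a workspace is associated with a domain, a subdomain subdivides a domain) can be sketched as a small data model. This is only a conceptual illustration, not the Fabric API: the class names, fields and the `workspaces_in` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Domain:
    name: str
    parent: Optional["Domain"] = None  # set only for a subdomain


@dataclass
class Workspace:
    name: str
    domain: Optional[Domain] = None  # a workspace may be associated with a domain


@dataclass
class Tenant:
    name: str
    workspaces: List[Workspace] = field(default_factory=list)

    def workspaces_in(self, domain: Domain) -> List[Workspace]:
        """Workspaces assigned to `domain` or to any of its subdomains."""
        result = []
        for ws in self.workspaces:
            current = ws.domain
            while current is not None:  # walk up the subdomain chain
                if current is domain:
                    result.append(ws)
                    break
                current = current.parent
        return result


# Hypothetical example: one domain with one subdomain
sales = Domain("Sales")
sales_emea = Domain("Sales EMEA", parent=sales)
finance = Domain("Finance")

tenant = Tenant("Contoso", [
    Workspace("Sales Reporting", sales),
    Workspace("EMEA Forecasts", sales_emea),
    Workspace("Ledger", finance),
])

sales_workspaces = [ws.name for ws in tenant.workspaces_in(sales)]
```

Walking up the `parent` chain is what lets a query on a domain include the workspaces of its subdomains, which mirrors the idea of subdomains as subdivisions of a domain.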

References
[1] Microsoft Learn: Fabric (2023) Fabric domains (link)
[2] Establishing Data Mesh architectural pattern with Domains and OneLake on Microsoft Fabric, by Maheswaran Arunachalam (link)
[3] Data mesh: A perspective on using Azure Synapse Analytics to build data products, by Amanjeet Singh (link)
[4] Zhamak Dehghani (2021) Data Mesh: Delivering Data-Driven Value at Scale
[5] Marthe Mengen (2024) How do you set up your Data Mesh in Microsoft Fabric? (link)
[6] Administering Microsoft Fabric - Considering Data Products vs Domains vs Workspaces, by Paul Andrew (link)
[7] Aniruddha Deswandikar (2024) Engineering Data Mesh in Azure Cloud

🏭🗒️Microsoft Fabric: Data Warehouse (Notes)

Disclaimer: This is work in progress intended to consolidate information from various sources.
Last updated: 11-Mar-2024

Warehouse vs SQL analytics endpoint in Microsoft Fabric [3]

Data Warehouse

  • highly available relational data warehouse that can be used to store and query data in the Lakehouse
    • supports the full transactional T-SQL capabilities 
    • modernized version of the traditional data warehouse
  • unifies capabilities from Synapse Dedicated and Serverless SQL Pools
  • modernized with key improvements
  • resources are managed elastically to provide the best possible performance
    • ⇒ no need to think about indexing or distribution
    • a new parser improves CSV file ingestion time
    • metadata is now cached in addition to data
    • compute resources are now assigned within milliseconds
    • multi-TB result sets are streamed to the client
  • leverages a distributed query processing engine
    • provides workloads with a natural isolation boundary [3]
      • true isolation is achieved by separating workloads with different characteristics, ensuring that ETL jobs never interfere with ad hoc analytics and reporting workloads [3]
  • {operation} data ingestion
    • involves moving data from source systems into the data warehouse [2]
      • the data becomes available for analysis [1]
    • via Pipelines, Dataflows, cross-database querying, COPY INTO command
    • no need to copy data from the lakehouse to the data warehouse [1]
      • one can query data in the lakehouse directly from the data warehouse using cross-database querying [1]
  • {operation} data storage
    • involves storing the data in a format that is optimized for analytics [2]
  • {operation} data processing
    • involves transforming the data into a format that is ready for consumption by analytical tools [1]
  • {operation} data analysis and delivery
    • involves analyzing the data to gain insights and delivering those insights to the business [1]
  • {operation} designing a warehouse (aka warehouse design)
    • standard warehouse design can be used
  • {operation} sharing a warehouse (aka warehouse sharing)
    • a way to provide users read access to the warehouse for downstream consumption
      • via SQL, Spark, or Power BI
    • the level of permissions can be customized to provide the appropriate level of access
  • {feature} mirroring 
    • provides a modern way of accessing and ingesting data continuously and seamlessly from any database or data warehouse into the Data Warehousing experience in Fabric
      • any database can be accessed and managed centrally from within Fabric without having to switch database clients
      • data is replicated in a reliable way in real-time and lands as Delta tables for consumption in any Fabric workload
  • {concept} SQL analytics endpoint
    • a warehouse that is automatically generated from a Lakehouse in Microsoft Fabric [3]
  • {concept} virtual warehouse
    • can contain data from virtually any source by using shortcuts [3]
  • {concept} cross-database querying
    • enables one to quickly and seamlessly leverage multiple data sources for fast insights, with zero data duplication [3]
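
Two of the ingestion and querying options mentioned above, the COPY INTO command and cross-database querying, can be sketched as T-SQL text assembled in Python. This is a rough sketch under stated assumptions: the table, column, warehouse and lakehouse names and the storage URL are hypothetical placeholders, and the cross-database query assumes both items sit in the same workspace so that three-part names resolve.

```python
def copy_into_sql(table: str, source_url: str, first_row: int = 2) -> str:
    """Build a COPY INTO statement that loads a CSV file into a warehouse table.

    No input sanitization is done; this is for illustration only.
    """
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        f"WITH (FILE_TYPE = 'CSV', FIRSTROW = {first_row});"
    )


def cross_database_sql(warehouse: str, lakehouse: str) -> str:
    """Build a query joining a warehouse table with a lakehouse table
    through three-part names (item.schema.table), without copying data."""
    return (
        "SELECT c.CustomerName, SUM(s.Amount) AS TotalAmount\n"
        f"FROM {warehouse}.dbo.Sales AS s\n"
        f"JOIN {lakehouse}.dbo.Customers AS c\n"
        "  ON c.CustomerId = s.CustomerId\n"
        "GROUP BY c.CustomerName;"
    )


# Hypothetical names; the URL placeholders must be filled in by the reader
copy_stmt = copy_into_sql(
    "dbo.Sales",
    "https://<account>.blob.core.windows.net/<container>/sales.csv",
)
query = cross_database_sql("SalesWarehouse", "SalesLakehouse")
print(copy_stmt)
print(query)
```

The generated statements would then be run against the warehouse with whatever SQL client is at hand; the second one reads the lakehouse table in place, which is the "no need to copy data" point from the notes.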

References:
[1] Microsoft Learn: Fabric (2023) Get started with data warehouses in Microsoft Fabric (link)
[2] Microsoft Learn: Fabric (2023) Microsoft Fabric decision guide: choose a data store (link)
[3] Microsoft Learn: Fabric (2024) What is data warehousing in Microsoft Fabric? (link)
[4] Microsoft Learn: Fabric (2023) Better together: the lakehouse and warehouse (link)

Resources:
[1] Microsoft Learn: Fabric (2023) Data warehousing documentation in Microsoft Fabric (link)


03 November 2024

📉Graphical Representation: Mosaic Plots (Just the Quotes)

"We have so consistently inveighed against the use of areas to illustrate quantities that the reader will indeed be surprised at some coming retractions. [...] But the fact is that we now propose to turn to advantage the very feature of areas which has previously been their greatest fault. [...] We now come to data in which we wish to show simultaneously three ratios or sets of ratios, one of which is always the product of the other two. In other words, we wish to show two factors or sets of factors and their product." (Karl Karsten, "Charts and Graphs", 1925)

"A contingency table specifies the joint distribution of a number of discrete variables. The numbers in a contingency table are represented by rectangles of areas proportional to the numbers, with shape and position chosen to expose deviations from independence models. The collection of rectangles for the contingency table is called a mosaic." (John A Hartigan & B Kleiner, "Mosaics for Contingency Tables", 1981)

"Mosaic displays represent the counts in a contingency table by tiles whose size is proportional to the cell count. This graphical display for categorical data generalizes readily to multiway tables."  (Michael Friendly, "Mosaic Displays for Loglinear Models", Proceedings of the Statistical Graphics, 1992)

"Although the basic mosaic display shows the data in any contingency table, it does not in general provide a visual representation of the fit of the data to a specified model. In the two-way case independence is shown when the tiles in each row align vertically, but visual assessment of other models is more difficult." (Michael Friendly, "Mosaic Displays for Loglinear Models", Proceedings of the Statistical Graphics, 1992)

"Categorical data are most often modeled using loglinear models. For certain loglinear models, mosaic plots have unique shapes that do not depend on the actual data being modeled. These shapes reflect the structure of a model, defined by the presence and absence of particular model coefficients. Displaying the expected values of a loglinear model allows one to incorporate the residuals of the model graphically and to visually judge the adequacy of the loglinear fit. This procedure leads to stepwise interactive graphical modeling of loglinear models. We show that it often results in a deeper understanding of the structure of the data. Linking mosaic plots to other interactive displays offers additional power that allows the investigation of more complex dependence models than provided by static displays." (Martin Theus & Stephan R W Lauer, "Visualizing Loglinear Models", Journal of Computational and Graphical Statistics Vol. 8 (3), 1999)

"The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary." (Michael Friendly, "Extending Mosaic Displays: Marginal, Conditional, and Partial Views of Categorical Data", 199)

"A graphical display of a p-dimensional contingency table, the empirical distribution of p categorical variables, is a mosaic plot. Each tile (or bin) corresponds to one cell of the contingency table, its size to the number of the cell's entries. The shape of a tile is calculated during the (strictly hierarchical) construction." (Heike Hoffmann, "Generalized Odds Ratios for Visual Modeling", Journal of Computational and Graphical Statistics Vol. 10 (4), 2001)

"Mosaics are space-filling designs composed of contiguous shapes ('tiles')." (Michael Friendly, "A Brief History of the Mosaic Display", Journal of Computational and Graphical Statistics, Vol. 11 (1), 2002)

"The principal graphical ideas [of mosaic plots] are: (*) using area = height x width, to represent a quantity which depends on a product of two other variables, each of interest; (*) using recursive subdivision to show any number of variables; (*) using shading to display some other attribute of the data; (*) purely multiplicative relations (e.g., Pij = Pi+P+j) produce equal subdivisions; (*) for two or more variables, the levels of subdivision are spaced with larger gaps at the earlier levels, to allow easier perception of the groupings at various levels, and to provide for empty cells." (Michael Friendly, "A Brief History of the Mosaic Display", Journal of Computational and Graphical Statistics, Vol. 11 (1), 2002)
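
The "area = height x width" idea from the quote above can be illustrated with a minimal sketch for a two-way table. The function name and the sample counts are hypothetical, and the gaps between tiles that real mosaic displays use are left out for simplicity. Each row category gets a width equal to its marginal share, and each cell a height equal to its conditional share within the row, so a purely multiplicative (independent) table yields equal subdivisions.

```python
def mosaic_tiles(table):
    """Return {(i, j): (x, y, width, height)} for a two-way table of counts,
    laid out in the unit square without gaps."""
    total = sum(sum(row) for row in table)
    tiles = {}
    x = 0.0
    for i, row in enumerate(table):
        row_total = sum(row)
        width = row_total / total  # marginal share of row category i
        y = 0.0
        for j, count in enumerate(row):
            # conditional share of column category j within row category i
            height = count / row_total if row_total else 0.0
            tiles[(i, j)] = (x, y, width, height)
            y += height
        x += width
    return tiles


# Hypothetical table in which both rows show the same 1:3 split (independence)
independent = [[10, 30], [20, 60]]
tiles = mosaic_tiles(independent)
for cell, (x, y, w, h) in sorted(tiles.items()):
    print(cell, f"x={x:.3f} y={y:.3f} width={w:.3f} height={h:.3f} area={w * h:.3f}")
```

In the independent table the heights are identical across the two columns of tiles, so the horizontal cuts line up; any misalignment in a real table is a visual cue for association.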

"Due to their recursive definition, switching the order of variables in a mosaic plot has a strong impact on what can be read from the plot. For instance, exchanging the two variables in a two-dimensional mosaic plot results in a completely new plot rather than in a mere graphically transposed version of the original plot." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)  

"Mosaic plots are defined recursively, i.e., each variable that is introduced in a mosaic plot is plotted conditioned on the groups already established in the plot. As with barcharts, the area of bars or tiles is proportional to the number of observations (or the sum of the observation weights of a class). The direction along which bars are divided by a newly introduced variable is usually alternating, starting with the x-direction." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009) 
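
The recursive definition and the effect of variable order described in the two quotes above can be sketched as follows. This is a simplified illustration, not any package's implementation: the function name and the sample counts are hypothetical, category levels are simply sorted, and tile gaps are omitted. Each variable splits the tiles established so far, alternating between the x- and y-direction.

```python
from collections import defaultdict


def mosaic(counts, rect=(0.0, 0.0, 1.0, 1.0), depth=0):
    """Recursively split `rect` by the variable at position `depth` of each key.

    `counts` maps tuples of category levels to observation counts (non-empty).
    Returns {key: (x, y, width, height)}.
    """
    n_vars = len(next(iter(counts)))
    if depth == n_vars:
        (key,) = counts  # all variables used, exactly one cell is left
        return {key: rect}
    groups = defaultdict(dict)
    for key, n in counts.items():
        groups[key[depth]][key] = n
    total = sum(counts.values())
    x, y, w, h = rect
    tiles = {}
    offset = 0.0
    for level in sorted(groups):
        share = sum(groups[level].values()) / total if total else 0.0
        if depth % 2 == 0:  # even depth: divide along the x-direction
            sub = (x + offset * w, y, share * w, h)
        else:               # odd depth: divide along the y-direction
            sub = (x, y + offset * h, w, share * h)
        tiles.update(mosaic(groups[level], sub, depth + 1))
        offset += share
    return tiles


# Hypothetical counts for two variables (group, answer)
counts = {("m", "yes"): 10, ("m", "no"): 30, ("f", "yes"): 20, ("f", "no"): 20}
by_group = mosaic(counts)
by_answer = mosaic({(a, g): n for (g, a), n in counts.items()})

print(by_group[("m", "yes")])   # tile when the group is introduced first
print(by_answer[("yes", "m")])  # tile for the same cell with the order swapped
```

The same cell keeps its area under both orders, but its width and height change, so the swapped plot is not a transposed copy of the original one.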

"Mosaic plots become more difficult to read for variables with more than two or three categories. One way out is to assign a constant space for all possible crossings of categories. This way, the data from the r×c table are plotted in a table-like layout. Whereas this regular layout makes it much easier to compare values across rows and columns, the plot space is used less efficiently than in a mosaic plot." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Conceptually, mosaic plots for s + 1 factors in strength s designs can be used for any s; in practice, the idea is limited by space constraints, especially for accommodating labels for the factor levels. All four margins are used for four-factor projections; with the next dimension, one margin has to be used for two factors. In practice, one will rarely consider mosaic plots for more factors than four at a time." (Ulrike Grömping, "Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs", The American Statistician Vol. 68 (2), 2014)

"Mosaic plots are particularly useful for design and analysis of orthogonal main effect plans. [...] mosaic plots do not reflect geometric properties relevant for designs in quantitative factors. Nevertheless, mosaic plots can also be used to visualize confounding severity for designs with quantitative factors [...]" (Ulrike Grömping, "Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs", The American Statistician Vol. 68 (2), 2014)

"Mosaic plots can get quite messy when increasing the number of variables, which is presumably the reason many commercial software products offer them for two variables only." (Ulrike Grömping, "Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs", The American Statistician Vol. 68 (2), 2014)

"The way that the model differs from the data gives us clues about how we can improve our model. We can use mosaic displays to find the specific ways in which the model is different from the data, since mosaics show the residuals (or differences) of the cells with respect to the model. Looking at these differences, we can observe patterns in the deviation that will help us in our search." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
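
The residuals mentioned in the quote above can be made concrete with a small sketch. It assumes the simplest model, independence in a two-way table, and uses Pearson residuals, (observed - expected) / sqrt(expected); both choices are assumptions made here for illustration, as the quote does not fix a model or a residual type. The function name and sample counts are hypothetical.

```python
import math


def pearson_residuals(table):
    """Pearson residuals of a two-way table of counts under independence.

    Assumes all row and column totals are positive.
    """
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        residuals.append([])
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            residuals[i].append((observed - expected) / math.sqrt(expected))
    return residuals


# Hypothetical table with a strong association on the diagonal
associated = pearson_residuals([[30, 10], [10, 30]])
for row in associated:
    print([round(r, 2) for r in row])
```

Positive residuals mark cells with more observations than the model expects and negative ones cells with fewer; shading the tiles by sign and size is one way to show where the model and the data part ways.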

16 October 2024

🧭Business Intelligence: Perspectives (Part VIII: There’s More to Noise)

Business Intelligence Series

Visualizations should be built with an audience's characteristics in mind! Depending on the case, it might be sufficient to show only values or labels of importance (minima, maxima, inflexion points, exceptions, trends), while other times it might be necessary to show all or most of the values to provide an accurate extended perspective. It might even be useful to allow users to switch between the different perspectives, to reduce the clutter when navigating the data or to look at the patterns revealed by the clutter. 

Data-based storytelling typically shows only the points, labels and further elements that support the story, the aspects the readers should focus on, though this approach limits the navigability and users’ overall experience. The audience should be able to compare magnitudes and make inferences based on what is shown, and accurate decoding shouldn’t be taken for granted, especially when the audience can associate different meanings with what’s available and what’s missing. 

In decision-making, selecting only some well-chosen values or perspectives to show might increase the chances for a decision to be made, though is this equitable? Cherry-picking may be justified by the purpose, though it is in general not a recommended practice! What is not shown can be as important as what is shown, and people should be aware of the implications!

One person’s noise can be another person’s signal. Patterns in the noise can provide more insight than the trends revealed in the "unnoisy" data shown! Such scenarios are probably rare, though it’s worth investigating what hides behind the noise. The choice of scale, the use of special types of visualizations or the building of models can reveal more. If it’s not possible to identify such scenarios automatically using standard software, the users should be able to change the scale and perspective as they see fit. 
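
As a small illustration of how building a model can reveal a pattern hidden in what looks like noise, the sketch below fits a straight line to a made-up series and looks at what is left over. The series, the threshold and the function names are all hypothetical; a least-squares line is just one of many possible models.

```python
def linear_fit(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(ys):
    """What remains of the series after removing the fitted trend."""
    slope, intercept = linear_fit(ys)
    return [y - (intercept + slope * x) for x, y in enumerate(ys)]


# Hypothetical series: steady growth plus a small bump on every 7th point
series = [100 + 2 * t + (5 if t % 7 == 0 else 0) for t in range(28)]
res = residuals(series)
spikes = [t for t, r in enumerate(res) if r > 2]  # threshold chosen by eye
print(spikes)
```

On the original scale the bumps are dwarfed by the trend; once the trend is removed, the residuals make the recurring points stand out, which is the kind of change of perspective the paragraph above argues for.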

Identifying patterns in what seems random can prove to be a challenge no matter the context and the experience in the field. Occasionally, one might need to go beyond the general methods available and statistical packages can help when used intelligently. However, a presenter’s challenge is to find a plausible narrative around the findings and communicate it further adequately. Additional capabilities must be available to confirm the hypotheses framed and other aspects related to this approach.

Ideally, one builds data models and a set of visualizations around them. Most probably, some noise will be removed in the process, while other noise will be investigated further. However, this should be done through adjustable visual filters because what is removed can be important as well. Rare events do occur, probably more often than we are aware, and they may remain hidden until we find the right perspective that takes them into consideration. 

Probably, some of the noise can be explained by special events that don’t need to be that rare. The challenge is to identify those parameters, associations, models and perspectives that reveal such insights. One’s gut feeling and experience can help in this direction, though novel scenarios can surprise us as well.

Not in every set of data can one find patterns or a story trying to come out. Whether we can identify something worth revealing depends also on the data at our disposal and on whether the chosen data allow identifying significant patterns. Occasionally, the focus might be too narrow, too wide or too shallow. It’s important to look behind the obvious, to look at data from different perspectives, even if the data seems dull. It’s ideal to have the tools and knowledge needed to explore such cases, and here the exposure to similar real-life scenarios is probably critical!


𖣯Strategic Management: Strategic Perspectives (Part II: The Elephant in the Room)

Strategic Management Perspectives

There’s an ancient parable about several blind people who touch a shape they had never met before, an elephant, and try to identify what it is. The elephant is big, more than each person can sense through direct experience, and people’s experiences diverge to the degree that they stop trusting each other, the situation occasionally escalating. The moral of the parable is that we tend to claim (absolute) truths based on limited, subjective experience [1], and this can easily happen in business scenarios in which each of us has a limited view of the challenges we are facing individually and as a collective. 

The situation from the parable can be met in business scenarios, when we try to make sense of the challenges we are faced with, and we get only a limited perspective of the whole picture. Only open dialog and working together can get us closer to the solution! Even then, the accurate depiction might not be in sight, and we need to extrapolate the unknown further.  

A third-party consultant with experience might be the right answer, at least in theory, though experience and solutions are relative. The consultant might lead us in a direction, though from this to finding the answer can be a long way that requires experimentation, a mix of tactics and strategies that change over time, more sense-making and more challenges lying ahead. 

We would like a clear answer and a set of steps that lead us to the solution, though the answer is as usual, it depends! It depends on the various forces/drivers that have the biggest impact on the organization, on the context, on the organization’s goals, on the resources available directly or indirectly, on people’s capabilities, the occurrences of external factors, etc. 

In many situations the smartest thing to do is to gather information, respectively perspectives from all the parties. Tools like brainstorming, SWOT/PESTLE analysis or scenario planning can help in sense-making to identify the overall picture and where the gravity point lies. For some organizations the solution will be probably a new ERP system, or the redesign of some processes, introduction of additional systems to track quality, flow of material, etc. 

A new ERP system will not necessarily solve all the issues (even if that’s the expectation), and some organizations just try to fit the old processes into a new context. Depending on the case, process redesign in some areas can be a better approach, at least as a primary measure. Otherwise, general initiatives focused on quality, data/information management, customer/vendor management, integrations, and the list remains open, can provide the binder/vehicle an organization needs to overcome the current challenges.

Conversely, if the ERP or other strategic systems are 10-20 years old, then there’s indeed an elephant in the room! Moreover, the elephant might be bigger than we can chew, and other challenges might lurk in its shadow(s). Everything is a matter of perspective with no apparent unique answer. Thus, an acceptable solution might lurk in the shadow of the broader perspective, in the cumulated knowledge of the people experiencing the issues, respectively in some external guidance. Unfortunately, the guides can be as blind as we are, making limited or no important impact. 

Sometimes, all that’s needed is a leap of faith corroborated with a set of tactics or strategies kept continuously in check and redirected as seems fit based on the knowledge accumulated and the challenges ahead. It helps to be aware of how others approached the same issues. Unfortunately, there’s no answer that works for all! In this lies the challenge, in identifying what works and makes sense for us!


Resources:
[1] Wikipedia (2024) Blind men and an elephant [link]


15 October 2024

🗄️Data Management: Data Governance (Part III: Taming the Complexity)

Data Management Series

The Chief Data Officer (CDO) or the “Head of the Data Team” has one of the most challenging jobs because it is more of a "political" than a technical role. It requires the ideal candidate to be able to throw and catch curveballs almost all the time, and one must be able to play ball with all the parties having an interest in data (aka stakeholders). It’s a full-time job that requires a combination of management and technical skillsets, and both are important! The focus will change occasionally in one direction more than in the other, with important fluctuations. 

Moreover, even if one masters the technical and managerial aspects, the combination of the two gives birth to situations that require further expertise, applied systems thinking being probably the most important. This is also because there are so many points of failure that it's challenging to address all the important causes. Therefore, it’s critical to be a systems thinker, to have an experienced team and to make adequate use of its experience! 

In a complex world, even the smallest constraint or opportunity can have an important impact, especially when it occurs in the early stages of the processes taking place in organizations. Much relies on the manager’s and team’s skillset, their inspiration, the way the business reacts to the tasks involved and probably many other aspects that make things work. It takes considerable effort until the whole mechanism works, and even more time to make things work efficiently. The best metaphor is probably the one of a small combat team in which everybody has their place and skillset in the mechanism, regardless of whether one talks about strategy, tactics or operations. 

Unfortunately, building such teams takes time, and the more people are involved, the more complex this endeavor becomes. The manager and the team must meet somewhere in the middle in what concerns the philosophy, the execution of the various endeavors, the way of working together to achieve the same goals. There are multiple forces pulling in all directions and it takes time until one can align the goals, respectively the effort. 

The most challenging forces are the ones between the business and the data team, respectively between the business and data requirements, forces that don’t necessarily converge. In small organizations, the two parties have in theory more chances to overcome the challenges, and a team’s experience can weigh a lot in the process, though as soon as the scale changes, the number of challenges to be overcome grows exponentially (there are, however, different exponential functions, and the base and exponent determine how rapid the growth is). 

In big organizations, other parties can appear that have the same force to pull the weight in one direction or another. Thus, the political aspects become more complex to the degree that the technologies must follow the political decisions, with all the positive and negative implications deriving from this. As a comparison, think about the challenges of moving from two to three or more bodies orbiting each other, which results in a chaotic dynamical system for most initial conditions. 

Of course, a business’ context doesn’t have to create such complexity, though when things are unchecked, when delays in decision-making as well as other typical events occur, when there’s no structure, strategy, coordinated effort, or any other important components, the chances for chaotic behavior are quite high with the passing of time. This is just a model to explain real-life situations that seem similar on the surface but prove to be quite complex when diving deeper. That’s probably why a CDO’s role as tamer of complexity is important and challenging!


11 October 2024

🧭Business Intelligence: Perspectives (Part VII: Creating Value for Organizations)

Business Intelligence Series

How does one create value for an organization in the BI area? This should be one of the questions BI professionals ask themselves, and eventually their colleagues, on a periodic basis, because the mere act of providing reports and good-looking visualizations doesn’t provide value per se. Therefore, it’s important to identify the critical success factors and value drivers within each area!

One can start with the data, BI or IT strategies, when organizations invest the time in their direction, respectively with the KPIs and/or OKRs defined, and hopefully the organizations already have something similar in place! However, these are just topics that can be used to get a bird’s-eye view of the overall landscape and challenges. It’s advisable to dig deeper, especially when the strategic, tactical and operational plans aren’t in sync, and let’s be realistic, this happens probably in many organizations, more often than one wants to admit!

Ideally, the BI professional should be able to talk with the colleagues who could benefit from having a set of reports or dashboards that offer a deeper perspective into their challenges. Talking with each of them can be time consuming and not necessarily value driven. However, giving each team or department the chance to speak their mind, and brainstorm what can be done, could in theory bring more value. Even if their issues and challenges should be reflected in the strategy, there’s always an important gap between the actual business needs and those reflected in formal documents, especially when the latter are not revised periodically. Ideally, such issues should be tracked back to a business goal, though it’s questionable how much such an alignment is possible in practice. Exceptions will always exist, no matter how well structured and thought a strategy is!

Unfortunately, this approach also involves some risks. Despite their local importance, the topics raised might not be aligned with what the organization wants, and there can be a strong case against and even a set of negative aspects related to this. However, talking about the costs involved by losing an opportunity can hopefully change the balance favorably. In general, transposing the perspective of issues into the area of their associated cost for the organization has (hopefully) the power to change people’s minds.

Organizations tend to bring forward the major issues, addressing the minor ones only after that, this having the effect that occasionally some of the small issues increase in impact when not addressed. It makes sense to prioritize with the risks, costs and quick wins in mind while looking at the broader perspective! Quick wins are usually addressed at strategic level, but apparently seldom at tactical and operational level, and at these levels one can create the most important impact, paving the way for other strategic measures and activities.

The question from the title is not limited only to BI professionals - it should be in each manager’s and every employee’s mind. The user is the closest to the problems and opportunities, while the manager is the one who has a broader view and the authority to push the topic up the waiting list. Unfortunately, the waiting lists in some organizations are quite long, while not having a good set of requests on the list might indicate that issues exist in other areas!  

BI professionals and organizations probably know the theory well but prove to have difficulties in combining it with practice. It’s challenging to obtain the needed impact (eventually the maximum effect) with a minimum of effort while addressing the different topics. Sooner or later the complexity of the topic kicks in, messing things up!

17 September 2024

#️⃣Software Engineering: Mea Culpa (Part V: All-Knowing Developers are Back in Demand?)

Software Engineering Series

I’ve been reading many job descriptions lately related to my experience and curiously or not I observed that many organizations look for developers with Microsoft Dynamics experience in the CRM, respectively Finance and Operations (F&O) and Business Central (BC) areas. It’s a good sign that the adoption of Microsoft solutions for CRM and ERP increases, especially when one considers the progress made in the BI and AI areas with the introduction of Microsoft Fabric, which gives Microsoft a considerable boost. Conversely, it seems that the "developers are good for everything" syntagma is back, at least from what one reads in job descriptions. 

Of course, it’s useful to have an in-house developer who can address all the aspects of an implementation, though that’s a lot to ask considering the different non-programming areas that need to be addressed. It’s true that a developer with experience can handle Requirements, Data and Process Management, respectively Data Migrations and Business Intelligence topics, though one should consider that each of the topics can easily become a full-time job before, during and after project implementations. I’ve been there and I (hopefully) know what the jobs imply. Even if an experienced programmer can easily handle the different aspects, there will also be times when all the topics combined will be too much for one person!

It's not a novelty that job descriptions are treated like Christmas lists, but it’s difficult to differentiate between essential and nonessential skills. I read many job descriptions lately in which, among a huge list of demands, one of the requirements is to program in the F&O framework, a sign that D365 programmers are in high demand. I worked for many years as a programmer and Software Engineer, respectively in the BI area, where SQL and non-SQL code is needed. Even if I can understand the code in F&O, does it make sense to learn now to program in X++ and the whole framework?

It's never too late to learn new tricks, respectively another programming language and/or framework. It even helps to provide better solutions in other areas, though frankly I would invest my time in other areas, and AI-related topics like AI prompting or Data Science seem to be more interesting in the long term, especially when they are already in demand!

There seems to be a tendency for Data Science professionals to do everything, building their own solutions and ignoring the accumulated experience, respectively the data models built in the BI and Data Analytics areas, as if the topics and data models were unrelated! It’s also true that AI-modeling comes with its own requirements in what concerns data modeling (e.g. translating non-numeric to numeric values), though I believe that common ground can be found!

Similarly, notebook-based programming seems to replicate logic in each solution, which occasionally makes sense, though personally I wouldn’t recommend it as a practice! The other day, I was looking at code developed in Python to mimic the joining of tables, when a view with the same logic could be more easily (re)used, maintained and read, and would probably be more efficient, even if different engines are used. It will be interesting to see how the mix of spaghetti solutions will evolve over time. There are developers already complaining about the number of objects used in the process when building logic for each layer of the medallion architecture! Even if it makes sense from architectural considerations, it will become a nightmare in time.
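
To make the comparison concrete, here is a minimal sketch of the two approaches, assuming a hypothetical pair of tables (customers, orders) and a hypothetical view name (v_customer_orders). It uses Python's built-in sqlite3 module only for illustration, not the engines one would meet in a Fabric notebook. The hand-written join has to be repeated in every notebook that needs it, while the view is defined once and only queried.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables and one view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Contoso'), (2, 'Fabrikam');
        INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

        -- The join logic lives in one place and can be reused by any consumer.
        CREATE VIEW v_customer_orders AS
        SELECT c.customer_id, c.name, o.order_id, o.amount
        FROM customers c
        JOIN orders o ON o.customer_id = c.customer_id;
        """
    )
    return conn


def join_in_python(conn):
    """Mimic the join by hand, as notebook code often does."""
    # Build a lookup of customer_id -> name from the (id, name) rows.
    customers = dict(conn.execute("SELECT customer_id, name FROM customers"))
    orders = conn.execute(
        "SELECT order_id, customer_id, amount FROM orders ORDER BY order_id"
    )
    # Inner-join semantics: keep only the orders with a matching customer.
    return [
        (cid, customers[cid], oid, amount)
        for oid, cid, amount in orders
        if cid in customers
    ]


def join_via_view(conn):
    """Reuse the join defined once in the view."""
    return conn.execute(
        "SELECT customer_id, name, order_id, amount "
        "FROM v_customer_orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(join_in_python(conn) == join_via_view(conn))
```

Both functions return the same rows; the difference is where the logic lives. A change in the join condition requires touching one view definition in the second case, respectively every notebook that copied the loop in the first.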

One can also wonder about the nomenclature used: Data Engineering for the simple manipulation of data between structures in data transformations, respectively Prompt Engineering for structuring the prompts for AI. I believe that engineering involves more than this, no matter the context!

16 September 2024

🧭Business Intelligence: Mea Culpa (Part IV: Generalist or Specialist in an AI Era?)

Business Intelligence Series

Except for the early professional years, when I did mainly programming for web or desktop applications in the context of n-tier architectures, over the past 20 years my professional life was a mix of BI, Data Analytics, Data Warehousing, Data Migrations and other topics (ERP implementations and support, Project Management, IT Service Management, IT, Data and Applications Management), though the BI topics probably covered on average at least 60% of my time, either as internal or external consultant.

I can consider myself thus a generalist who had the chance to cover most of the important aspects of a business from an IT perspective, and it was thus a great experience, at least until now! It’s a great opportunity to have the chance to look at problems, solutions, processes and the various challenges and opportunities from different perspectives. Technical people should have this opportunity directly in their jobs through the communication occurring in projects or IT services, though that’s more of a wish! Unfortunately, the dialogue between IT and business occurs almost only over the tickets and documents, which might be transparent but isn’t necessarily effective or efficient! 

Does working only part time in an area make one person less experienced or knowledgeable than other people? In theory, a full-time employee should get more exposure in depth and/or breadth, but that’s relative! It depends on the challenges one faces, the variety of the tasks and problems faced, the implemented solutions, their depth and other technical and nontechnical factors like training, one’s experience in working with the various tools, professionalism, etc. A richer exposure can, but doesn’t necessarily, involve more technical and nontechnical knowledge, and this shouldn’t be taken as a given! There’s no right or wrong answer even if people tend to take sides and argue over details.

Independently of the time effectively spent on the job, one is forced to use one’s own time to keep current with technologies or extend one’s horizon. In IT, a professional can seldom rely only on what is learned on the job. Fortunately, nowadays one has more and more ways of learning, while the challenge shifts toward what to ignore, respectively toward better management of one’s time while learning. The topics increase in complexity and with this blogging becomes even more difficult, especially when one competes with AI content!

Talking about IT, it will be interesting to see how much AI can help or replace some of the professions or professionals. Anyway, some jobs will become obsolete or shift their focus to prompt engineering and technical reviews. AI still needs explicit descriptions of how to address tasks, at least until it learns to create and use better recipes for problem definition and solving. The bottom line is that AI and its use can’t be ignored, and it can and should also be used in learning new things. It’s amazing what one can do nowadays with prompt engineering!

Another aspect with which AI can help is tailoring the content to one’s needs. A high percentage of the learning process is spent fishing in a sea of information for content that is worth knowing, respectively for a solution to one’s needs. AI must also be able to address some of the context without prompters being forced to give information explicitly!

AI opens many doors but can close many others. How much of one’s experience will remain relevant over the next years? Will AI have more success in addressing some of the challenges existing in people’s understanding, or will people just trust AI blindly? Anyway, somebody must be smarter than AI, and here people’s collective intelligence can probably prove to be a real match.

14 September 2024

🗄️Data Management: Data Governance (Part II: Heroes Die Young)

Data Management Series

In the call for action there are tendencies in some organizations to idealize and overstate the main actors' purpose and image when talking about data governance by calling them heroes. Heroes are those people who fight for a goal they believe in with all their being, and occasionally they pay the ultimate price. Of course, the image of heroes is idealized and many other aspects are ignored, though such images sell ideas and ideals. Organizations might need heroes and heroic deeds to change the status quo, but the heroism doesn't necessarily pay off for the "heroes"!

Sometimes, organizations need a considerable effort to change the status quo. It can be people's resistance to the new, to the demands, to the ideas propagated, especially when they are not clearly explained and executed. It can be the seemingly immeasurable distance between the "AS IS" and the "TO BE" perspectives, especially when clear paths aren't in sight. It can be the lack of resources (e.g., time, money, people, tools), knowledge, understanding or skillset that makes the effort difficult.

Unfortunately, such initiatives favor action over adequate strategies, planning and understanding of the overall context. The call to do something creates waves of actions and reactions which in the organizational context can lead to storms and even extreme behavior that ranges from resistance to the new to heroic deeds. Finding a few messages that support the call for action can help, though they can't replace the various critical success factors.

Leading organizations on a new path requires a well-defined realistic strategy, respectively adequate tactical and operational planning that reflects organizations' specific needs, knowledge and capabilities. Just demanding that people do their best is not enough, and heroism is likely to appear especially in this context. Unfortunately, the whole weight falls on the shoulders of the people chosen as actors in the fight. Ideally, it should be possible to spread the whole weight on a broader basis, which should be considered the foundation for the new.

The "heroes" metaphor is idealized and the negative outcome probably exaggerated, though extreme situations do occur in organizations when decisions, planning, execution and expectations are far from ideal. Ideal situations are met only in books and less in practice!

The management demands and the people execute, much like in the army, though by contrast people need to understand the reasoning behind what they are doing. Proper execution requires skillset, understanding, training, support, tools and the right resources for the right job. Just relying on people's professionalism and effort is not enough and is suboptimal, but this is what many organizations seem to do!

Organizations tend to respond to the various barriers or challenges with more resources or pressure instead of analyzing and depicting the situation adequately, and eventually changing the strategy, tactics or operations accordingly. It's also difficult to do this as long as an organization doesn't have the capabilities and practices of self-check, introspection, self-reflection, etc. Even if it sounds a bit exaggerated, an organization must know itself to overcome the various challenges. Regular meetings, KPIs and other metrics give the illusion of control when self-control is needed.

Things don't have to be that complex even if managing data governance is a complex endeavor. Small or midsized organizations are, at least in theory, more capable of handling complexity because they can be more agile, have a robust structure and the flow of information and knowledge has fewer barriers, respectively a shorter distance to overcome. One can probably appeal to the laws and characteristics of networks to understand more about the deeper implications of how solutions can be implemented in more complex setups.

🗄️Data Management: Data Culture (Part V: Quid nunc? [What now?])

Data Management Series

Despite the detailed planning, the concentrated and well-directed effort with which the various aspects of data culture are addressed, things don't necessarily turn into what we want them to be. There's seldom only one cause but a mix of various factors that create a network of cause and effect relationships that tend to diminish or increase the effect of certain events or decisions, and it can be just a butterfly's flutter that stirs a set of chained reactions. The butterfly effect is usually an exaggeration until the proper conditions for the chaotic behavior appear!

The butterfly effect is made possible by the exponential divergence of two paths. Conversely, success probably needs multiple trajectories to converge toward a final point or intermediary points or areas from which things move on the "right" path. Success doesn't necessarily mean reaching a point but reaching a favorable zone for future behavior to follow a positive trend. For example, a sink or a cone-like structure allows water to accumulate and flow toward an area. A similar structure is needed for success to converge, and the structure results from what is built in the process.

Data culture needs a similar structure for the various points of interest to converge. Things don't happen by themselves unless the force of the overall structure is so strong that it allows things to move toward the intended path(s). Even then the paths can be far from optimal, but they can be favorable. Probably, that's what the general effort must do - bring the various aspects into the zone that allows things to unfold. It might still be a long road, though the basis is there!

A consequence of this metaphor is that one must identify the important aspects, respectively factors that influence an organization's culture and drive them in the right direction(s) – the paths that converge toward the defined goal(s). (Depending on the area of focus one can consider that there are successions of more refined goals.)

The structure that allows things to converge is based on the alignment of the various paths and, implicitly, forces. Misalignment can make a force move in another direction with all the consequences deriving from this behavior. If the force is weak, it will probably not have an impact on the overall structure, though that's relative and can change in time.

One may ask what this whole construct is needed for, given that it doesn’t reflect reality. Sometimes, even a not entirely correct model can allow us to navigate the unknown. The model's intent is to depict what's needed for an initiative to be successful. Moreover, success doesn’t mean hitting the bull's-eye but first being in the zone until one's skillset enables performance.

Conversely, it's important to understand that things don't happen by themselves, even if this seems to be the impression some initiatives leave. One needs to build and pull the whole structure in the right direction, and the alignment of the various forces can reduce the overall effort and increase the chances for success. Attempting to build something just because it’s written in documentation without understanding the whole picture (or something close to it) can easily lead to failure.

This doesn’t mean that all attempts that don’t follow a set of patterns are doomed to failure, but that the road will be more challenging and will probably take longer. Conversely, maybe these deviations from the optimal paths are what an organization needs to grow, to solidify the foundation on which something else can be built. The whole path is an exploration that doesn’t necessarily match what is written in books, respectively the expectations!

11 September 2024

🗄️Data Management: Data Culture (Part IV: Quo vadis? [Where are you going?])

Data Management Series

The people working for many years in the fields of BI/Data Analytics, Data and Process Management have probably met many reactions that at first sight seem funny, though they reflect bigger issues existing in organizations: people don’t always understand the data they work with, how data are brought together as part of the processes they support, respectively how data can be used to manage and optimize the respective processes. Moreover, occasionally people torture the data until it confesses something that doesn’t necessarily reflect reality. It’s even more deplorable when the conclusions are used for decision-making, managing or optimizing the process. In extremis, the result is an iterative process that creates more and bigger issues than those it was supposed to solve!

Behind each blunder there are probably bigger understanding issues that need to be addressed. Many of the issues revolve around understanding how data are created, how they are brought together, how the processes work and what data they need, use and generate. Moreover, few business and IT people look at the full lifecycle of data and try to optimize it, or they optimize it in the wrong direction. Data Management is supposed to help, and it does this occasionally, though a methodology, its processes and practices are only as good as people’s understanding of data and its use! No matter how good a data methodology is, it’s as weak as the weakest link in its use, and typically the issues revolving around data and data understanding are the weakest link.

Besides technical people, few businesspeople understand the full extent of managing data and its lifecycle. Unfortunately, even if some of the topics are treated in books, they are too dry, and they need hands-on experience and some thought in corroborating practices with theories. Without this, people will do things mechanically, processes being only as good as the people using them, their value becoming suboptimal and hindering the business. That’s why training on Data Management is not enough without some hands-on experience!

The most important impact is however in BI/Data Analytics areas - how the various artifacts are created and used as support in decision-making, process optimization and other activities rooted in data. Ideally, some KPIs and other metrics should be enough for managing and directing a business. However, when decisions are based just on a set of KPIs without understanding the bigger picture and without having a feeling for the data and their quality, the whole architecture, no matter how splendid, can break down like a sandcastle on a shore meeting the first powerful wave!

Sometimes it feels like organizations do things from inertia, driven by the forces of the moment, initiatives and business issues for which temporary and later permanent solutions are needed. The best chance for solving many of the issues would have been a long time ago, when the issues were still too small to create any powerful waves within the organizations. Therefore, a lot of effort is sometimes spent in solving the consequences of decisions not made at the right time, and that can be painful and costly!

For building a good business one also needs a solid foundation. In the past it was enough to have a good set of profitable products. However, during the past decade(s) the rules of the game changed, driven by fierce competition across geographies, with inefficiencies, especially in the data and process areas, costing organizations in the short and long term. Data Management in general and Data Quality in particular, even if they’re challenging to quantify, have the power to address by design many of the issues existing in organizations, if given the right chance!

02 September 2024

🗄️Data Management: Data Culture (Part III: A Tale of Two Cities)


One of the curious things is that as part of their change of culture organizations try to adopt a new language, to give new names to things, and to make a distinction between the "AS IS" and "TO BE" states, insisting on how the new image will replace the previous one. Occasionally, they even stress how bad things were in the past and how great they will be in the future, trying to depict the future in vivid images.

Even if this might work occasionally, it tends to confuse people, not necessarily because of the language and the metaphors used, or the fact that the same people are in the same positions, but because of the lack of belief or conviction, respectively the half-hearted enthusiasm shown by the parties involved. To "convert" people to new philosophies one needs to believe in them or at least mimic that belief convincingly. The lack of conviction can easily have a negative effect that spreads within the organization.

Dissociation from the past, from what an organization was, tends to increase the resistance against the new because two different images are involved. On one side there’s the attachment to the past, and even if there were mistakes made, or things didn’t go optimally, the experiences and decisions made are part of the organization, of the people who made them. People as individuals and as an organization should embrace their mistakes and good deeds altogether, learn from them, improve what can be improved and move forward. On the other side, there’s the resistance to the new, to the change, to words they don’t believe in yet while the bigger picture is still fuzzy in their minds, and there can be many other reasons that don’t agree with one’s understanding.

There are images, memories, views, decisions and objectives of the past, and people need to recognize the road from what was to what should be. One can hypothesize that embracing one’s mistakes and understanding the chain of reasoning from then and from now will help an organization transition towards the new. Awareness of one’s situation will most probably help in the transition process. Unfortunately, leaders and technology gurus tend to depict the past as negative, thus creating more negative emotions, respectively reactions in the process. The past is still part of the people, of the organization, and will continue to be.

Conversely, the dissociation from the past can create more resistance to the new, and probably more unnecessary barriers. Probably, it would be easier for the gurus to build the new if the past weren’t there! Forgetting the past would be an error because there are many lessons that can still be useful. All the experience needs to be redirected in new directions. It’s more important to help people see the vision of the future, understand their missions, the paths to be followed and the challenges ahead.

This sounds more like rambling from a psychology course, though organizations do have an image they want to change, to bring forth to cope with the various challenges, an image they want to reflect when needed. There are also organizations that want to change but keep their image intact, which leads to deeper conflicts. Unfortunately, changes of image involve conflicts that can become complex through what they bring forth.

A data culture should increase people’s awareness of the present, respectively of the future, of what it takes to bridge the gap, the challenges ahead, how to embrace change, how to keep a realistic perspective, how to do a reality check, etc. Methodologies can increase people’s awareness and provide the theoretical basis, though walking the path will be a different story for everyone. 

01 September 2024

🗄️Data Management: Data Governance (Part I: No Guild of Heroes)

Data Management Series

Data governance appeared as a topic around the 1980s, though it gained popularity in the early 2000s [1]. Twenty years later, organizations still miss the mark, respectively fail to understand and implement it in a consistent manner. As usual, the reasons for failure are multiple and they vary from misunderstanding what governance is all about to poor implementation of methodologies and inadequate management or leadership.

Moreover, methodologies tend to idealize the various aspects, and that is not what organizations need; they need pragmatism. For example, data governance is not about heroes and heroism [2], a framing that can give the impression that heroic actions are involved, which is not the case! Actions for the sake of action don’t necessarily lead to change by themselves. Organizations are in general good at creating meaningless action without results, especially when people preoccupy themselves, miss or ignore the mark. Big organizations are very good at generating actions without effects.

People do talk to each other, though they try to solve their own problems and optimize their own areas without necessarily thinking about the bigger picture. The problem is not necessarily communication or the lack of depth into business issues; people do communicate and know the issues, even without a business impact assessment. The challenge is usually in convincing the upper management that the effort needs to be consolidated and supported, respectively that the needed resources must be made available.

Probably, one of the issues with data governance is the attempt to create another structure in the organization focused on quality, which has a good chance of failing, and unfortunately does fail. Many issues appear when the structure gains weight and it becomes a separate entity instead of being the backbone of the organization.

As soon as organizations separate the data governance from the key users, management and the other important decision-makers in the organization, it takes on a life of its own that is likely to diverge from the initial construct. Then, organizations need "alignment" and probably other big words to coordinate the effort. Such constructs can also work, but they are suboptimal because the forces will always pull in different directions.

Making each manager and the upper management responsible for governance is probably the way to go, though they’ll need the time for it. In theory, this can be achieved when many of the issues are solved at the lower level, when automation and further aspects allow them to supervise things, rather than hiding behind every issue. 

When too much micromanagement is involved, people tend to busy themselves with topics rather than solve the issues they are confronted with. The actual actors need to be empowered to take decisions and optimize their work when needed. Kaizen, the philosophy of continuous improvement, has proved that it works when applied correctly. The actors will need the knowledge, skills, time and support to do it though. One of the dangers is however that this becomes a full-time responsibility, which tends to create a separate entity again.

The challenge for organizations lies probably in the friction between where they are and what they must do to move forward toward the various objectives. Moving in small rapid steps is probably the way to go, though each person must be aware when something doesn’t work as expected and react. That’s probably the most important aspect. 

So, the more functions are created that diverge from the actual organization, the higher the chances of failure. Unfortunately, failure becomes visible only in the later phases, and thus self-awareness, self-control and other similar “qualities” are needed, like small actors that keep the system in check and react whenever needed. Ideally, the employees are the best resources to react whenever something doesn’t work as per design.

Resources:
[1] Wikipedia (2023) Data Management
[2] Tiankai Feng (2023) How to Turn Your Data Team Into Governance Heroes


22 August 2024

🧭Business Intelligence: Perspectives (Part V: From Data to Storytelling III)

Business Intelligence Series

As children we heard or later read many stories, and even if few remained imprinted in memory, we can still recognize some of the metaphors and ideas used. Stories prepared us for life, and one can suppose that the business stories we hear nowadays have similar intent, charge and impact. However, if we dig deeper into each story and dissect it, we may be disappointed by its simplicity, the resemblance to other stories, to what we've heard over time. Moreover, stories can also bring negative connotations that can impact any other story we hear.

From the scores or hundreds of distinct stories that have been told, few reach a magnitude that can become more than the stories themselves, few become a catalyst for the audience, and even then they tend to manipulate. Conversely, well-written transformative stories can move mountains when they resonate with the audience. In a leader’s motivational speech such stories can become a catalyst that moves people in the intended direction.

Children's stories are quite simple and apparently don’t need special constructs, even if the choice of words, structure and messages is important. Moving further into organizations, storytelling becomes more complex and, depending on the case, structures and messages need to follow certain conventions within some politically correct scripts. Facts become important to the degree they serve the story, though the purposes they serve change with time, becoming secondary to the story. Storytelling thus becomes just a way of changing the facts as the storyteller sees fit.

Storytelling has its role in organizations for channeling the multitude of messages across various structures. However, the more one hears the word storytelling, the more likely one is closer to fiction than to business decision-making. It's also true that the word in itself carries a power we all tasted during childhood and, why not, much later. The word has a magic power that appeals to our memories, to our feelings, to our expectations. However, as soon as one's expectations are not met, the fight with the chimeras turns into a battle of our own. Yes, storytelling has great power when used right, when there's a story to tell, when the business narratives are worth telling.

The problem with stories is that no matter how much they are based on real facts or happenings, they become fictitious in time, to the degree that they lose some of the most important facts they were based on. That’s valid especially when there’s no written record of the story, though even then various versions of the story can multiply outside of the standard channels and boundaries.

Even if the author tried to keep the story as close to the facts as possible, the way stories are understood, remembered and retold depends on too many factors - the words used, the degree to which metaphors and similar elements are understood, remembered and transmitted correctly, the language used, the mental structures existing in the audience, the association of words, ideas or metaphors, etc.

Unfortunately, the effect of stories can be negative too, especially when stories are designed to manipulate the audience beyond any ethical norms. When they don’t resonate with the crowd or are repeated unnecessarily, the narratives may have adverse effects and the messages can get lost in the crowd or create resistance. Moreover, stories may have multiple and even opposite effects within different segments of the audience.

Storytelling can make hearts and minds resonate with the carried messages, though misdirected, improper or poorly conceived stories also have the power to destroy all that has been built over the years. Between the two extremes there’s a small space to send the messages across!

About Me

Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.