Showing posts with label reporting framework. Show all posts
Showing posts with label reporting framework. Show all posts

21 October 2023

🧊Data Warehousing: Architecture V (Dynamics 365, the Data Lakehouse and the Medallion Architecture)

Data Warehousing
Data Warehousing Series

An IT architecture is built and functions under a set of constraints that derive from architecture’s components. Usually, if we want flexibility or to change something in one area, this might have an impact in another area. This rule applies to the usage of the medallion architecture as well! 

In Data Warehousing the medallion architecture considers a multilayered approach in building a single source of truth, each layer denoting the quality of data stored in the lakehouse [1]. For the moment are defined 3 layers - bronze for raw data, silver for validated data, and gold for enriched data. The concept seems sound considering that a Data Lake contains all types of raw data of different quality that needs to be validated and prepared for reporting or other purposes.

On the other side there are systems like Dynamics 365 that synchronize the data in near-real-time to the Data Lake through various mechanisms at table and/or data entity level (think of data entities as views on top of other tables or views). The databases behind are relational and in theory the data should be of proper quality as needed by business.

The greatest benefit of serverless SQL pool is that it can be used to build near-real-time data analytics solutions on top of the files existing in the Data Lake and the mechanism is quite simple. On top of such files are built external tables in serverless SQL pool, tables that reflect the data model from the source systems. The external tables can be called as any other tables from the various database objects (views, stored procedures and table-valued functions). Thus, can be built an enterprise data model with dimensions, fact-like and mart-like entities on top of the synchronized filed from the Data Lake. The Data Lakehouse (= Data Warehouse + Data Lake) thus created can be used for (enterprise) reporting and other purposes.

As long as there are no special requirements for data processing (e.g. flattening hierarchies, complex data processing, high-performance, data cleaning) this approach allows to report the data from the data sources in near-real time (10-30 minutes), which can prove to be useful for operational and tactical reporting. Tapping into this model via standard Power BI and paginated reports is quite easy. 

Now, if it's to use the data medallion approach and rely on pipelines to process the data, unless one is able to process the data in near-real-time or something compared with it, a considerable delay will be introduced, delay that can span from a couple of hours to one day. It's also true that having the data prepared as needed by the reports can increase the performance considerably as compared to processing the logic at runtime. There are advantages and disadvantages to both approaches. 

Probably, the most important scenario that needs to be handled is that of integrating the data from different sources. If unique mappings between values exist, unique references are available in one system to the records from the other system, respectively when a unique logic can be identified, the data integration can be handled in serverless SQL pool.

Unfortunately, when compared to on-premise or Azure SQL functionality, the serverless SQL pool has important constraints - it's not possible to use scalar UDFs, tables, recursive CTEs, etc. So, one needs to work around these limitations and in some cases use the Spark pool or pipelines. So, at least for exceptions and maybe for strategic reporting a medallion architecture can make sense and be used in parallel. However, imposing it on all the data can reduce flexibility!

Bottom line: consider the architecture against your requirements!

Previous Post <<||>>> Next Post

[1] What is the medallion lakehouse architecture?
https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion

27 February 2016

🧭Business Intelligence: Perspectives (Part II: The Complexity Myth)

Business Intelligence

Introduction

While looking over “Business Intelligence Concepts and Platform Capabilities” Coursera MOOC resources for Module 2 I run into two similar articles from Solutions Review, respectively Information Age. What caught my attention was the easiness with which the complexity of BI “myth” is approached in both columns.

According to the two sources the capabilities of nowadays BI tools “enabled business users to easily identify and present trends in an impactful way” [1], and “do not require an expert at the helm” [2]. It became thus simpler for users to independently query data and create interactive reports and presentations [2]. In both columns one can read between the lines that the simplicity of using BI tools is equivalent with negating the complexity of BI, which from my point of view is false. In fact here are regarded especially the self-service BI tools, in trend nowadays, that allow users to easily perform ad-hoc analysis with a minimal involvement from IT. Self-service BI is only a subset of what BI for organizations means, and just a capability from the many BI capabilities an organization needs in theory, even if some organizations might use it extensively.

Beyond the Surface

A BI tool is not a BI solution per se, even if many generic BI solutions for different systems are available out of the box. This is one of the biggest confusion managers, users and unfortunately also BI professionals make. A BI tool offers the technological basis for creating a BI infrastructure, though it comes with no guarantees. It takes a well-defined IT and business strategy, one or more successful projects, skillful developers and users in order to harness the BI investment.

On the other side it’s also true that organizations can obtain results also from less, though BI doesn’t equates with any ad-hoc analysis performed by users, even if they use BI tools for this purpose. BI is not only about tools, reporting and revealing trends in the data. BI often implies a holistic knowledge about the business and certain data awareness, without which users will start aggregating and comparing apples with pears and wonder why they taste and look different.

If everything were so simple then why so many BI projects fail to deliver what’s expected? Why so many managers complain that they don’t have the data they need, when they need them? Sure maybe the problem lies in over-complexifying the whole BI landscape and treating everything from a high-level, though that’s more likely not it.

It’s a Teamwork Knowledge Game

BI is or needs to be monitoring and problem solving oriented. This requires a deep understanding about processes and business. There are business users and also BI professionals who don’t have the knowledge one needs in order to approach a business problem. One can see that from the premises they have, the questions they raise, the data they consider, the models they build, and the results.

From a BI professional’s perspective, even if one has a broad knowledge about various businesses, one often lacks the insight in a given business. BI professionals can seldom provide adequate BI solutions without input and feedback from the business. Some BI professionals rely too much on their knowledge, same as the business sometimes expects a maximum output from BI professionals by providing a minimum of input.

Considering the business users, quite often their focus and knowledge cover only the data boundaries of their department, while many problems extend over those boundaries. They know facts that are not necessarily reflected in the data. Even if they are closer to the data than other parties, they still lack some data-awareness (including statistical awareness) in order to approach problems.

Somebody was saying ironically when talking about users’ data and problem solving skills - “not everybody is a Bill Gates or Steve Jobs”. Continuing the idea, one can’t expect users to act as such. For sure there are many business users who are better problem solvers than BI consultants, though on the other side one can’t expect that the average business user will have the same skillset as an experienced BI consultant. This is in fact one of the problems of self-service BI. Probably with time and effort organization will develop such resources, though some help from BI professionals will be still needed. Without a good cooperation between the business and BI professionals an organization might not have the hoped results when investing in BI

More on Complexity

The complexity arises when one tries to make more with the data, especially the data found in raw form. Usually the complexity of raw data can be addressed by building a logical or physical model that allows easier consumption of data. Here is the point where the users find themselves overwhelmed, because for this is required a good knowledge of the physical data model and its semantics, the technical knowledge to build models and the skills to reengineer the logic available in the source systems. These are the themes BI professionals are supposed to excel in. Talking about models, they are the most difficult to build because they reflect various segments of the business, they reflect a breakdown of the complexity. It’s also the point where many BI projects fail as the built models don’t reflect the reality or aren’t capable to answer to business questions.

Coming back to the two columns, I have to point out that the complexity of a subject or domain can’t be judged based on how easy is to approach basic tasks. The complexity lies typically when one goes beyond the basics, when one dives into details. In case of BI its complexity starts when one attempts mixing various technologies and knowledge domains to model and solve daily business problems in an integrated, holistic, aligned, consistent and cost-effective manner. The more the technologies, the knowledge domains and constraints one has to consider, the more complex the BI landscape and solutions become.

On the other side this doesn’t mean that the BI infrastructure can’t be simplified, that BI can’t rely heavily or exclusively on self-service BI solutions. However for each strategy there are advantages and disadvantages and one more likely has to consider both sides of the coin in the process. And self-service BI has its own trade-offs, weaknesses that can be transformed in strengths with time.

Conclusion

When one considers nowadays BI tools capabilities, ad-hoc analyses are relatively easy to perform and can lead to results, though such analyses don’t equate with BI and the simplicity with which they are performed don’t necessarily imply that BI is simple as a whole. When one considers the complexity of nowadays businesses, the more one dives in various problems a business has, the more complex the BI landscape seems. In the end it’s in each organization powers to simplify and harmonize its BI infrastructure to a degree that its business goals aren’t affected negatively.


Previous Post <<||>> Next Post

Resources
[1] Information Age (2015) 5 Myths about Intelligence, by Ben Rossi, [Online] Available from: http://www.information-age.com/technology/information-management/123460271/5-myths-about-business-intelligence 
[2] SolutionsReview (2015) Top 5 Business Intelligence Myths Revealed, by Timothy King, [Online] Available from: http://solutionsreview.com/business-intelligence/top-5-business-intelligence-myths-revealed
[3] Gartner (2016) Magic Quadrant for Business Intelligence and Analytics Platforms, by Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel, Thomas W. Oestreich [Online] Available from: https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204&st=sb 
[4] Coursera (2016) Business Intelligence Concepts, Tools, and Applications MOOC, led by Jahangir Karimi, University of Colorado, [Online] Available from: https://www.coursera.org/learn/business-intelligence-tools

02 December 2015

🪙Business Intelligence: Reporting (Just the Quotes)

"A man's judgment cannot be better than the information on which he has based it. Give him no news, or present him only with distorted and incomplete data, with ignorant, sloppy, or biased reporting, with propaganda and deliberate falsehoods, and you destroy his whole reasoning process and make him somewhat less than a man." (Arthur H Sulzberger, [speech] 1948)

"The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, 'opinion' polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense." (Darell Huff, "How to Lie with Statistics", 1954)

"To be worth much, a report based on sampling must use a representative sample, which is one from which every source of bias has been removed." (Darell Huff, "How to Lie with Statistics", 1954)

"It is probable that one day we shall begin to draw organization charts as a series of linked groups rather than as a hierarchical structure of individual 'reporting' relationships." (Douglas McGregor, "The Human Side of Enterprise", 1960)

"[...] as the planning process proceeds to a specific financial or marketing state, it is usually discovered that a considerable body of 'numbers' is missing, but needed numbers for which there has been no regular system of collection and reporting; numbers that must be collected outside the firm in some cases. This serendipity usually pays off in a much better management information system in the form of reports which will be collected and reviewed routinely." (William H. Franklin Jr., Financial Strategies, 1987)

"Intangible assets [...] surpass physical assets in most business enterprises, both in value and contribution to growth, yet they are routinely expensed in the financial reports and hence remain absent from corporate balance sheets. This asymmetric treatment of capitalizing (considering as assets) physical and financial investment while expensing intangibles leads to biased and deficient reporting of firms’ performance and value." (Baruch Lev, "Intangibles: Management, Measurement, and Reporting", 2000)

"Project planning is the key to effective project management. Detailed and accurate planning of a project produces the managerial information that is the basis of project justification (costs, benefits, strategic impact, etc.) and the defining of the business drivers (scope, objectives) that form the context for the technical solution. In addition, project planning also produces the project schedules and resource allocations that are the framework for the other project management processes: tracking, reporting, and review." (Rob Thomsett, "Radical Project Management", 2002)

"Many management reports are not a management tool; they are merely memorandums of information. As a management tool, management reports should encourage timely action in the right direction, by reporting on those activities the Board, management, and staff need to focus on. The old adage 'what gets measured gets done' still holds true." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Reporting to the Board is a classic 'catch-22' situation. Boards complain about getting too much information too late, and management complains that up to 20% of their time is tied up in the Board reporting process. Boards obviously need to ascertain whether management is steering the ship correctly and the state of the crew and customers before they can relax and 'strategize' about future initiatives. The process of assessing the current status of the organization from the most recent Board report is where the principal problem lies. Board reporting needs to occur more efficiently and effectively for both the Board and management." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Readability in visualization helps people interpret data and make conclusions about what the data has to say. Embed charts in reports or surround them with text, and you can explain results in detail. However, take a visualization out of a report or disconnect it from text that provides context (as is common when people share graphics online), and the data might lose its meaning; or worse, others might misinterpret what you tried to show." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"These practices - selective reporting and data pillaging - are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher’s endurance. We cannot tell whether a data grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding. But more often than not, the tests stop there. After all, you won’t become a star by confirming other people’s research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged." (Gary Smith, "Standard Deviations", 2014)

"A dashboard is like the executive summary of a report. We read executive summaries and skip the body of the report if the summary is more or less in line with our expectations. Trouble is, measurement is never exhaustive. It is only when we dive in that we realize what areas may have been missed." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"'Getting it right the first time' is a rare achievement, and ascertaining the organization’s winning KPIs and associated reports is no exception. The performance measure framework and associated reporting is just like a piece of sculpture: you can be criticized on taste and content, but you can’t be wrong. The senior management team and KPI project team need to ensure that the project has a just-do-it culture, not one in which every step and measure is debated as part of an intellectual exercise." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"In order to get measures to drive performance, a reporting framework needs to be developed at all levels within the organization." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Statistics, because they are numbers, appear to us to be cold, hard facts. It seems that they represent facts given to us by nature and it’s just a matter of finding them. But it’s important to remember that people gather statistics. People choose what to count, how to go about counting, which of the resulting numbers they will share with us, and which words they will use to describe and interpret those numbers. Statistics are not facts. They are interpretations. And your interpretation may be just as good as, or better than, that of the person reporting them to you." (Daniel J Levitin, "Weaponized Lies", 2017)

30 November 2010

🧭Business Intelligence: Enterprise Reporting (Part IX: How Many Reporting Systems Do You Need?)

Business Intelligence
Business Intelligence Series

I hope nobody’s questioning the fact that an organization needs at least a reporting system in order to take advantage of the huge amount of data it has collected over time, in order to have an overview on what’s happening in the organization, take better decision supported by data, etc. In one way or another many organizations arrive to have in place two or more reporting systems and all the consequences deriving from it. From experience, I would expect that many professionals have put themselves at least once the above question, often they having to live with their decisions and they are not always happy about them. 

On one side we have the demand, the needs of the various departments and groups existing in an organization, while on the other side of the balance we have the various types of data and the technologies existing on the market (could be regarded as the supply), and the various constraints an organization might have: resources (financial, human, processing power, time), politics and philosophies, geography, complexity, etc. If they don’t seem so many you have to consider that the respective constraints could be further split into a long chain of causality, the books on Project Management, Architecture, Methodologies and other related topics treating them in detail, so no need to go there.

From practical reasons, I’d like to reformulate the above question as follows:
1. Is it enough only one reporting system or do we need more than one?
2. What’s the optimum number of reporting systems existing in an organization?
 
A blunt answer for the first question would be that an organization might need as many reporting systems as it’s needed in order to satisfy the existing reporting needs, in other words the demand. Straightforward answer but not really acceptable for a Manager, in particular, and an IT Professional, in general. Another blunt answer, this time for the second question, would be that an organization needs only “one and good” reporting system, though that’s contradicting the various constraints existing in IT world. So, what should it be? 

Typically there are two views, the bottom up view, in this case starting with the data and building up to the reporting demands, and top down view, starting from the reporting requirements to data. In this play-set the technologies, in general, the reporting systems, in particular, are playing the role of middle point, the two views applying in respect to data vs. reporting technologies, reporting technologies vs. reporting requirements, increasing the complexity of the view. Sometimes is enough to focus only on the reporting requirements, why we consider then also the data in this scenario?! Remember, reporting it’s all about harnessing the value of your data, not only in finding the right reporting solution.

Often, in order to understand the requirements of an organization and understanding the value of data it’s easier to look at the various types of data it stores – the bottom up approach. Typically we could make distinction between master vs. transactional data, raw vs. aggregated data, structured vs. semi-structured vs. non-structured data, cleaned vs. not cleaned data, historical vs. live data, mined, distributed or heterogenous data, to mention just a few of the important ways of categorizing the data.

The respective types of data are important because they require special types of techniques or tools to process, report and harness them. The master and transactional data are at the heart of business applications, the “fluid” that flows and nourishes them, with an incredible potential when harnessed at its real value. Such unaltered data are typically structured at macro level, though they could contain also semi-structured or non-structured chunks of data, could have different levels of quality, could be distributed or heterogenous, etc. The range of data repositories range from text or document systems to specialized databases, any mixture being in theory possible.

So we have a huge volume of data collected over time, how could we harness them in order to reveal their real value? Excepting the operational use, the usefulness of raw data, data found in their unaltered form, stops usually at the point where their complexity falls beyond people’s limits of understanding them. Thus we arrived to aggregate, recombine or extract chunks from the data, bringing the data to a more comprehensible form. We even arrived to extract more information out of the data using specific techniques known as data mining methods, algorithms that allow us to identify associations, cluster or interpolate the data, the complexity of such algorithms evolving in considerable in the past two decades.
 
There are few software applications which provide the whole range of types of data processing, most of them resuming in recombining and presenting the data. Also data presentation offers a considerable range of formats (e.g. tabular, text, XML, Excel, charts and other diagrams) with complex navigational functionality (slice-and-dice, drill-down, drill-through, click-through), different ways of making the data available for consumption (direct-access, cached, web services), different security levels, etc.

When it comes to managers and users they want the data at the level of detail and form that allows the easiest/lowest proximate understanding level, and, “yesterday”, to use a word that reflects the urgency of requirements. This could be done in theory by coming with a whole set of reports covering all possible requirements, though that’s not so efficient as investment and not always possible with a button click, as probably all we’d like to. 

For the ones familiarized to Manufacturing, it’s actually a disguised push vs. pull scenario, pushing the reports to the user without expecting their demands vs. waiting for the user to forward the reporting requirements, from which to derive eventually new reports or make changes to existing ones. In Manufacturing it’s all about finding the right balance between push and demand, and even if the respective field has been found some (best) practices that leads to the middle way, in IT we still have to dig for it. The road seems to lead nowhere… but it helps getting a rough understanding of what a report involves, at least in what concerns the important non-programming stuff.

In the top down approach, as the data are remaining relatively constant, it’s easier to focus on the set of requirements and the reporting tool(s) used. It makes sense to choose the reporting tool that covers at least the most important requirements while for the other you have the choice to use workarounds or address them using two or more reporting solutions, attempting again to cover the most important requirements, and so on. The problem with having more than one reporting solution is that often data, logic and reports arrive to be replicated, the whole reporting infrastructure becoming more difficult to manage. On the other side as long you are having a clear (partitioned) delimitation between the reporting frameworks, logic and data are replicated at minimum, having two or more reporting solutions seems to be an acceptable trade. Examples of such delimitations are for example the OLTP vs. OLAP solutions, to which adds the data mining reporting solution(s).

There are also attempts to extend an existing reporting framework above the initial functionality, though often people arrived to reinvent the wheel or create little monsters they arrive to live with. Also this is an acceptable approach, as long the reporting framework allows that. Some attempts might succeed to provide what they were designed for, others not. There are also some good news from this perspective, the appearance of mash-ups and semantic technologies could in the future allow the integration of reporting systems, making possible things that nowadays are quite challenging. The future is open of unlimited possibilities, but better stick to the present! For that stay tuned for a second post!

Previous Post <<||>> Next Post
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.