
03 May 2025

🧭Business Intelligence: Perspectives (Part XXXI: More on Data Visualization)

Business Intelligence Series

There are many reasons why the data visualizations available in the different mediums can be considered as being of poor quality, and unfortunately there's often more than one issue contributing to this: the complexity of the data or of the models behind them, the failure to identify the right data, respectively the aspects that should be visualized, poor data visualization software or the lack of skills to use its capabilities, the improper choice of visual displays, misleading choices of scales, axes and other elements, the lack of a clear outline for telling a story, respectively pushing a story too far, and not adapting visualizations to changing requirements or different perspectives, to name just the most important causes.

The complexity of the data increases with the dimensions typically associated with what we currently call big data - velocity, volume, value, variety, veracity, variability and whatever other V might be in scope. If it's relatively easy to work with a small dataset, understanding its shape and challenges, our power of understanding decreases with each V added to the picture. Of course, we can always treat the data alike, though the broader the timeframe, the higher the chances that the data have important changing characteristics that can impact the outcomes. It can be simple definition changes or, more importantly, changes in the model itself. Data, processes and perspectives change fluidly with the many requirements, and quite often the further implications for reporting, visualizations and other aspects are not considered.

Quite often there's a gap between what one wants to achieve with a data visualization and the data or knowledge available. It might be a matter of missing values or whole attributes that would help to clearly delimit the different perspectives, or of adequately modelling the processes behind them. It can be intrinsic data quality issues that are challenging to correct after the fact. It can also be our understanding of the processes themselves as reflected in the data or, more importantly, of what's missing to provide better perspectives. Therefore, many are forced to work with what they have or what they know.

Many of the data visualizations inadvertently reflect their creators' understanding of the data, procedures, processes, and any other aspects related to them. Unfortunately, business users and other participants also have only limited views, and thus their knowledge must be elicited accordingly. Even then, there might be pieces of data that are not reflected in any of the knowledge available.

If one tortures the data enough, one or more stories worthy of telling can probably be identified. However, much of the data is dull to the degree that some creators feel forced to add elements. Earlier, one could have blamed the software for it, though modern software provides nice graphics and plenty of features that can help graphics creators in the process. Even high-quality data can reveal challenges that are difficult to overcome. One needs to compromise, and there can be compromises in so many places that one can but wonder whether the end result still reflects reality. Unfortunately, it's difficult to evaluate the impact of such gaps, though progress can be made occasionally by continuously evaluating the gaps and finding the appropriate methods to address them.

Not all stories must have complex visualizations in which multiple variables are used to provide the many perspectives. Some simple visualizations can be enough for establishing the common ground on which something more complex (or simple) can be built. Data visualization is a continuous process of exploration, extrapolation, evaluation, and testing of assumptions and ideas, in which one's experience can be a useful mediator between the various forces.
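
To make the point more concrete, below is a minimal sketch (in Python with matplotlib, using made-up numbers) of the kind of simple visualization that can serve as such common ground - a plain trend line stripped of non-essential elements, from which more complex perspectives can later grow.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly order counts, used only to illustrate the idea
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
orders = [120, 135, 128, 160, 152, 170]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, orders, marker="o")
ax.set_title("Orders per month")
ax.set_ylabel("Orders")
for side in ("top", "right"):          # drop non-essential chart elements
    ax.spines[side].set_visible(False)
plt.tight_layout()
plt.show()
```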

Previous Post <<||>> Next Post

24 April 2025

🧭Business Intelligence: Perspectives (Part XXX: The Data Science Connection)

Business Intelligence Series

Data Science is a collection of quantitative and qualitative methods, respectively techniques, algorithms, principles, processes and technologies used to analyze and process amounts of raw and aggregated data in order to extract the information or knowledge they contain. Its theoretical basis is rooted in mathematics, mainly statistics, computer science and domain expertise, though it can include further aspects related to communication, management, sociology, ecology, cybernetics, and probably many other fields, as there's enough space for experimentation and for translating knowledge from one field to another.

The aim of Data Science is to extract valuable insights from data to support decision-making and problem-solving and to drive innovation, and probably it can achieve more in time. Reading between the lines, Data Science sounds like a superhero that can solve all the problems existing out there, which frankly is too beautiful to be true! In theory everything is possible, while in practice there are many hard limitations! Given any amount of data, the knowledge that can be obtained from it is limited by many factors - the degree to which the data, processes and models built reflect reality (and there can be many levels of approximation), respectively the degree to which such data can be collected consistently.

Moreover, even if the theoretical basis seems sound, the data, information or knowledge that is not available can be the important missing link in making any sensible progress toward the goals set in Data Science projects. In some cases, one might be aware of what's missing, though for a data scientist who doesn't have the required domain knowledge this can be a hard limit! The gap can probably be bridged with sensemaking, exploration and experimentation approaches, especially by applying models from other domains, though there are no guarantees ahead!

AI can help in this direction through its capacity to explore ideas or models fast. However, it's questionable how much the models built with AI can be used further if one can't build mechanistic mental models of the processes reflected in the data. It's like devising an algorithm for winning small amounts at the lottery: investing more money in the algorithm doesn't automatically imply greater wins. Even if the performance is occasionally improved, it's questionable how much of it can be leveraged in each use. Statistics has its utility when one studies data in aggregate and can predict average behavior. It can't be used to predict the occurrence of individual events with high precision. Think how hard the prediction of earthquakes or extreme weather is by just looking at a pile of data reflecting what's happening only in a certain zone!
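
A small simulation can illustrate the aggregate-versus-individual point above. The sketch below (Python with NumPy, synthetic data and an assumed 1% event rate) shows that the average behavior is easy to estimate from a sample, while predicting any single occurrence is no better than betting on the base rate.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic daily outcomes: a rare "event" occurs with 1% probability
p_event = 0.01
outcomes = rng.random(10_000) < p_event

# Aggregate behavior: the event rate is easy to estimate from a sample
sample = outcomes[:1_000]
print(f"Estimated event rate: {sample.mean():.3f} (true rate: {p_event})")

# Individual events: the best constant prediction ("no event") is right ~99%
# of the time, yet it anticipates none of the occurrences that actually matter
naive_accuracy = 1 - outcomes.mean()
print(f"Accuracy of always predicting 'no event': {naive_accuracy:.3f}")
print(f"Events anticipated by that prediction: 0 of {outcomes.sum()}")
```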

In theory, the more data one has from different geographical areas or organizations, the more robust the models can become. However, no two geographies, respectively no two organizations are alike: the business models, the people, the events and other aspects make global models less applicable to local contexts. Frankly, one has more chances of making progress if a model is obtained with a local scope and only then leveraged for a broader scope. Even then, there can be differences between the behavior or phenomena at the micro, respectively the macro level (see the laws of physics).

This doesn’t mean that Data Science or AI-related knowledge is useless. The knowledge accumulated by applying various techniques, models and programming languages in problem-solving can be more valuable than the results obtained! Experimentation is a must for organizations to innovate and to extend their knowledge base. It’s also questionable how much of the respective knowledge can be retained and put to good use. In the end, each organization must determine this by itself!

27 March 2025

🧭Business Intelligence: Perspectives (Part XXIX: Navigating into the Unknown)

Business Intelligence Series

One of the important challenges in Business Intelligence and the other related knowledge domains is that people try to oversell ideas, overstretching, shifting, mixing and bending the definition of concepts and their use to suit the sales pitch or other related purposes. Even if several methodologies built around data attempt to provide a solid foundation on which organizations can build, terms like actionable, value, insight, quality or importance continue to be a matter of perception and interpretation, and quite often they are misused.

It's often challenging to define such business concepts precisely, especially as there are degrees of fuzziness that may apply to the different contexts associated with them. What makes a piece of signal, data, information or knowledge valuable, respectively actionable? What is the value, respectively the values we associate with a piece or aggregation of information, an insight or a degree of quality? When do values, changes, variations and other aspects become important, respectively when can they be ignored? How much can one generalize or particularize certain aspects? And many more such questions can be added to this line of inquiry.

Just because an important value changed, no matter in what direction, it might mean nothing as long as the value stays within certain ranges, respectively as long as other direct or indirect conditions are or aren't met. Sometimes there are simple rules and models that can be used to identify the various areas that should trigger different responses, respectively actions, though even small variations can increase the overall complexity multifold. There seems to be a certain comfort in numbers, even if the same numbers can mean different things to different people, at different points in time, respectively in different contexts.
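
As a minimal sketch of such a simple rule (in Python, with hypothetical tolerance thresholds), a change is mapped to a response area only when it leaves an agreed range around a target:

```python
def classify_change(value: float, target: float,
                    warn_pct: float = 0.05, critical_pct: float = 0.10) -> str:
    """Map a metric value to a response area based on its deviation from target."""
    deviation = abs(value - target) / target
    if deviation <= warn_pct:
        return "ignore"      # within the expected range, no action needed
    if deviation <= critical_pct:
        return "monitor"     # worth watching, not yet actionable
    return "act"             # outside the tolerated range, trigger a response

# Example: monthly revenue checked against a target of 100,000
for revenue in (98_000, 93_000, 85_000):
    print(revenue, "->", classify_change(revenue, 100_000))
```

Even such a toy rule already hints at how quickly complexity creeps in: thresholds differ per metric, per period and per audience, and each exception multiplies the number of cases to maintain.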

In the pursuit of bridging the multitude of gaps and challenges, organizations attempt to arrive at common definitions and a common understanding of the business terms, goals, objectives, metrics, rules, procedures, processes and other points of focus associated with the respective terms. Unfortunately, many such foundations barely support the edifices built on them as long as no common mental models are established!

Even if the use of shared models is not new, few organizations try to make the knowledge associated with them explicit, respectively to agree on and evolve a set of mental models that reflect how the business works, what is important, respectively what can be ignored, which are the dependent and independent aspects, etc. This effort can prove to be a challenge for many organizations, especially when several leaps of faith must be made in the process.

Independently of whether organizations use shared mental models, some kind of common ground must be achieved. It starts with open dialog, identifying the gaps, respectively the minimum volume of knowledge required for making progress in the right direction(s). The broader the gaps and the misalignment, the more iterations are needed to make progress! And, of course, one must know what the destinations are, which paths to follow, what to ignore, etc.

It's important how we look at the business, and people tend to use different filters (aka glasses or hats) for this purpose. Simple relationships between the various facts are ideal, though uncommon. There may be a chain of causality that triggers a certain change, though more likely one deals with a networked structure of cause-effect relationships. The world is more complex than we (can) imagine. We try to focus on the aspects we are aware of, respectively consider important. However, in a complex world even small variations in certain areas can shift the overall weight to aspects outside of our focus, influence or area of responsibility. Quite often, what we don't know is more important than what we know!

10 March 2025

🧭Business Intelligence: Perspectives (Part XXVIII: Cutting through Complexity)

Business Intelligence Series

Independently of the complexity of the problems, one should start by framing the problem(s) correctly, and this might take several steps and iterations until a foundation is achieved upon which further steps can be based. Ideally, the framed problem should reflect reality and should provide a basis on which one can build something stable, durable and sustainable. Conversely, people want quick, low-cost fixes, and probably the easiest way to achieve this is by focusing on appearances, which are often mistaken for more than they are.

In many data-related contexts, there’s the tendency to start with the "solution" in mind, typically one or more reports or visualizations which should serve as a basis for further exploration. Often, the information exchange between the parties involved (requestor(s), respectively developer(s)) is kept to a minimum, and the formalized requirements barely qualify as the minimum required. The whole process starts with a gap that can propagate further changes as the development process progresses, with all the consequences deriving from this: the report doesn’t satisfy the needs, more iterations are needed - requirements’ reevaluation, redesign, redevelopment, retesting, etc.

The poor results are understandable: all parties start with a perspective based on facts that provide suboptimal views when compared with the minimum basis for making the right steps in the right direction. That’s valid not only for reports’ development but also for more complex endeavors – data models, data marts and warehouses, and other software products. Data professionals attempt to bridge the gaps by formalizing and validating the requirements, building mock-ups and prototypes, and testing, though that’s more than many organizations can handle!

There are simple reports or data visualizations for which not having prior knowledge of the needed data sources, processes and business rules has a minimal impact on the further steps involved in building the final product(s). However, "all generalizations are false" to some degree, and there’s a critical point after which minimal practices tend to create more waste than companies can afford. Conversely, applying the full extent of the processes can lead to waste when the steps aren’t imperative for the final product.

Even if one is aware of all the implications, one’s experience and the application of best practices don’t guarantee the quality of the results as long as some kind of novelty, unknown, fuzziness or complexity is involved. Novelty can appear in different ways – processes, business rules, data or problem formulations, particularities that aren’t easily perceived or correctly understood. Each seemingly minor piece of information can have an exponential impact under the wrong circumstances.

The unknown can encompass novelty, though it can also be associated with the multitude of facts not explicitly and/or directly considered. "The devil is in the details", and it’s so easy for important or minor facts to remain hidden under the veil of suppositions and expectations, respectively under the complex and fuzzy texture of logical aspects. Many processes operate under strict rules, though there are various consequences, facts or unnecessary pieces of information that tend to increase the overall complexity and fuzziness.

Predefined processes, procedures and practices can help cut through and illuminate this complex structure associated with the various requirements and aspects of problems. Before plunging in headfirst, one occasionally needs to evaluate what is known and unknown from the facts’ and data’s perspective, to identify the gaps and the various factors that can weigh in the final solution. Unfortunately, too often nothing of this happens!

Besides the multitude of good/best practices and problem-solving approaches, all one has is one's experience and intuition to cut through the overall complexity. False steps are inevitable in finding the approachable path(s) from the requirements to the solution.

21 February 2025

🧩IT: Idioms, Sayings, Proverbs and Other Words of Wisdom

In IT setups one can hear many idioms, sayings and other kinds of words of wisdom that make the audience smile, even if some of the words seem to rub salt into wounds. These are some of the idioms met in IT meetings or in the literature. Frankly, each of them is worth writing more about, and this is the purpose of the "project".

"A bad excuse is better than none"

"A bird in the hand is worth two in the bush": a working solution is worth more than hypothetically better solutions. 

"A drowning man will clutch at a straw": a drowning organization will clutch to the latest hope

"A friend in need (is a friend indeed)": 

"A journey of a thousand miles begins with a single step"

"A little learning is a dangerous thing"

"A nail keeps a shoe, a shoe a horse, a horse a man, a man a castle" (cca 1610): A nail keeps the shoe

"A picture is worth a thousand words"

"A stitch in time (saves nine)"

"Actions speak louder than words"

"All good things must come to an end"

"All generalizations are false" [attributed to Mark Twain, Alexandre Dumas (Père)]: Cutting though Complexity

"All the world's a stage, And all [...] merely players": A look forward

"All roads lead to Rome"

"All is well that ends well"

"An ounce of prevention is worth a pound of cure"

"Another day, another dollar"

"As you sow so shall you reap"

"Beauty is in the eye of the beholder"

"Better late than never": SQL Server and Excel Data

"Better safe than sorry": Deleting obsolete companies

"Big fish eat little fish"

"Better the Devil you know (than the Devil you do not)": 

"Calm seas never made a good sailor"

"Count your blessings"

"Dead men tell no tales"

"Do not bite the hand that feeds you"

"Do not change horses in midstream"

"Do not count your chickens before they are hatched"

"Do not cross the bridge till you come to it"

"Do not judge a book by its cover"

"Do not meet troubles half-way"

"Do not put all your eggs in one basket"

"Do not put the cart before the horse"

"Do not try to rush things; ignore matters of minor advantage" (Confucius): A tale of two cities II

"Do not try to walk before you can crawl"

"Doubt is the beginning, not the end, of wisdom"

"Easier said than done"

"Every cloud has a silver lining"

"Every little bit helps"

"Every picture tells a story"

"Failing to plan is planning to fail"Planning correctly misunderstood...

"Faith will move mountains"

"Fake it till you make it"

"Fight fire with fire"

"First impressions are the most lasting"

"First things first": Ways of looking at data

"Fish always rots from the head downwards"

"Fools rush in (where angels fear to tread)" (Alexander Pope, "An Essay on Criticism", cca. 1711): A tale of two cities II

"Half a loaf is better than no bread"

"Haste makes waste"

"History repeats itself"

"Hope for the best, and prepare for the worst"

"If anything can go wrong, it will" (Murphy's law)

"If it ain't broke, don't fix it.": Approaching a query

"If you play with fire, you will get burned"

"If you want a thing done well, do it yourself"

"Ignorance is bliss"

"Imitation is the sincerest form of flattery"

"It ain't over till/until it's over"

"It is a small world"

"It is better to light a candle than curse the darkness"

"It is never too late": A look backAll-knowing developers are back...

"It's a bad plan that admits of no modification." (Publilius Syrus)Planning Correctly Misunderstood I

"It’s not an adventure until something goes wrong." (Yvon Chouinard)Documentation - Lessons learned

"It is not enough to learn how to ride, you must also learn how to fall"

"It takes a whole village to raise a child"

"It will come back and haunt you"

"Judge not, that ye be not judged"

"Kill two birds with one stone"

"Knowledge is power, guard it well"

"Learn a language, and you will avoid a war" (Arab proverb)

"Less is more"

"Life is what you make it"

"Many hands make light work"

"Moderation in all things"

"Money talks"

"More haste, less speed"

"Necessity is the mother of invention"

"Never judge a book by its cover"

"Never say never"

"Never too old to learn"

"No man can serve two masters"

"No pain, no gain"

"No plan ever survived contact with the enemy.' (Carl von Clausewitz)Planning Correctly Misunderstood I

"Oil and water do not mix"

"One-man show": series

"One man's trash is another man's treasure"

"One swallow does not make a summer"

"Only time will tell": The Software Quality Perspective and AI, Microsoft FabricIt’s all about Partnership IIAccess vs. LightSwitch

"Patience is a virtue"

"Poke the bear": Mea Culpa - A Look Forward

"Practice makes perfect"

"Practice what you preach"

"Prevention is better than cure"

"Rules were made to be broken"

"Seek and ye shall find"

"Some are more equal than others" (George Orwell, "Animal Farm")

"Spoken words fly away, written words remain." ["Verba volant, scripta manent"]: Documentation - Lessons learned

"Strike while the iron is hot"

"Technology is dead": Dashboards Are Dead & Other Crapprogramming is dead

"The best defense is a good offense"

"The bets are off":  A look forward

"The bigger they are, the harder they fall"

"The devil is in the detail": Copilot Stories Part IV, Cutting through ComplexityMore on SQL DatabasesThe Analytics MarathonThe Choice of Tools in PM, Who Messed with My Data?

"The die is cast"

"The exception which proves the rule"

"The longest journey starts with a single step"

"The pursuit of perfection is a fool's errand"

"There are two sides to every question"

"There is no smoke without fire"

"There's more than one way to skin a cat" (cca. 1600s)

"There is no I in team"

"There is safety in numbers"

"Those who do not learn from history are doomed to repeat it" (George Santayana)

"Time is money"

"To learn a language is to have one more window from which to look at the world" (Chinese proverb)[5

"Too little, too late"

"Too much of a good thing"

"Truth is stranger than fiction"

"Two birds with one stone": Deleting sequential data...

"Two heads are better than one": Pair programming

"Two wrongs (do not) make a right"

"United we stand, divided we fall"

"Use it or lose it"

"Unity is strength"

"Variety is the spice of life." (William Cowper)

"Virtue is its own reward"

"Well begun is half done"

"What does not kill me makes me stronger"

"Well done is better than well said"

"What cannot be cured must be endured"

"What goes around, comes around"

"When life gives you lemons, make lemonade"

"When the cat is away, the mice will play"

"When the going gets tough, the tough get going"

"Where there is a will there is a way"

"With great power comes great responsibility"

"Work expands so as to fill the time available"

"You are never too old to learn": All-Knowing Developers are Back in Demand?

"You can lead a horse to water, but you cannot make it drink"

"You cannot make an omelet without breaking eggs"

"(You cannot) teach an old dog new tricks"

"You must believe and not doubt at all": Believe and not doubt

"Zeal without knowledge is fire without light"

Previous Post <<||>> Next Post

References:
[1] Wikipedia (2024) List of proverbial phrases [link]

18 December 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part VII: Data Stores Comparison)

Business Intelligence Series

Microsoft made available a reference guide for the data stores supported for Microsoft Fabric workloads [1], including the new Fabric SQL database (see previous post). Here's the consolidated table followed by a few aspects to consider: 

Area | Lakehouse | Warehouse | Eventhouse | Fabric SQL database | Power BI Datamart
Data volume | Unlimited | Unlimited | Unlimited | 4 TB | Up to 100 GB
Type of data | Unstructured, semi-structured, structured | Structured, semi-structured (JSON) | Unstructured, semi-structured, structured | Structured, semi-structured, unstructured | Structured
Primary developer persona | Data engineer, data scientist | Data warehouse developer, data architect, data engineer, database developer | App developer, data scientist, data engineer | AI developer, App developer, database developer, DB admin | Data scientist, data analyst
Primary dev skill | Spark (Scala, PySpark, Spark SQL, R) | SQL | No code, KQL, SQL | SQL | No code, SQL
Data organized by | Folders and files, databases, and tables | Databases, schemas, and tables | Databases, schemas, and tables | Databases, schemas, tables | Database, tables, queries
Read operations | Spark, T-SQL | T-SQL, Spark* | KQL, T-SQL, Spark | T-SQL | Spark, T-SQL
Write operations | Spark (Scala, PySpark, Spark SQL, R) | T-SQL | KQL, Spark, connector ecosystem | T-SQL | Dataflows, T-SQL
Multi-table transactions | No | Yes | Yes, for multi-table ingestion | Yes, full ACID compliance | No
Primary development interface | Spark notebooks, Spark job definitions | SQL scripts | KQL Queryset, KQL Database | SQL scripts | Power BI
Security | RLS, CLS**, table level (T-SQL), none for Spark | Object level, RLS, CLS, DDL/DML, dynamic data masking | RLS | Object level, RLS, CLS, DDL/DML, dynamic data masking | Built-in RLS editor
Access data via shortcuts | Yes | Yes | Yes | Yes | No
Can be a source for shortcuts | Yes (files and tables) | Yes (tables) | Yes | Yes (tables) | No
Query across items | Yes | Yes | Yes | Yes | No
Advanced analytics | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Time Series native elements, full geo-spatial and query capabilities | T-SQL analytical capabilities, data replicated to delta parquet in OneLake for analytics | Interface for data processing with automated performance tuning
Advanced formatting support | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Full indexing for free text and semi-structured data like JSON | Table support for OLTP, JSON, vector, graph, XML, spatial, key-value | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format
Ingestion latency | Available instantly for querying | Available instantly for querying | Queued ingestion, streaming ingestion has a couple of seconds latency | Available instantly for querying | Available instantly for querying

The table can be used as a map of what one needs to know in order to use each feature, respectively to identify how one's previous experience can be leveraged - and here I'm referring to the many SQL developers. One must also consider the capabilities and limitations of each storage repository.
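
As an illustration of how that previous experience carries over, the following sketch assumes a Microsoft Fabric notebook with an attached lakehouse; the item and table names are hypothetical, and `spark` and `display` are the objects Fabric notebooks expose by default.

```python
# Spark is the primary skill for lakehouses: read a Delta table directly
orders = spark.read.table("SalesLakehouse.orders")

# Spark SQL should feel familiar to SQL developers
monthly = spark.sql("""
    SELECT  date_trunc('month', OrderDate) AS OrderMonth,
            SUM(Amount)                    AS Revenue
    FROM    SalesLakehouse.orders
    GROUP BY date_trunc('month', OrderDate)
    ORDER BY OrderMonth
""")

display(monthly)  # notebook helper for rendering the result

# The same tables are also exposed through the lakehouse's SQL analytics
# endpoint, where they can be queried read-only with plain T-SQL from the
# warehouse experience or from familiar SQL client tools.
```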

However, what I'm missing are some references regarding the performance of data access, especially compared with on-premises workloads. Moreover, the devil hides in the details, therefore one must test thoroughly before committing to any of the above choices. For the newest overview please check the referenced documentation!

For lakehouses, the hardest limitation is the lack of multi-table transactions, though that's understandable given their scope. However, probably the most important aspect is whether they can scale with the volume of reads/writes, as currently the SQL endpoint seems to lag.

The warehouse seems to be more versatile, though careful attention needs to be given to its design. 

The Eventhouse opens the door to a wide range of time-based scenarios, though it will be interesting how developers cope with its lack of functionality in some areas. 

Fabric SQL databases are a new addition, and hopefully they'll allow considering a wide range of OLTP scenarios. Starting with 28th of March 2025, SQL databases will be ON by default and tenant admins must manually turn them OFF before the respective date [3].

Power BI datamarts have been in preview for a couple of years.


References:
[1] Microsoft Fabric (2024) Microsoft Fabric decision guide: choose a data store [link]
[2] Reitse's blog (2024) Testing Microsoft Fabric Capacity: Data Warehouse vs Lakehouse Performance [link]
[3] Microsoft Fabric Update Blog (2025) Extending flexibility: default checkbox changes on tenant settings for SQL database in Fabric [link]
[4] Microsoft Fabric Update Blog (2025) Enhancing SQL database in Fabric: share your feedback and shape the future [link]
[5] Microsoft Fabric Update Blog (2025) Why SQL database in Fabric is the best choice for low-code/no-code Developers [link]

14 October 2023

🧭Business Intelligence: Perspectives (Part VIII: Insights - The Complexity Perspective)

Business Intelligence Series

Scientists attempt to discover laws and principles, and for this they conduct experiments and build theories and models rooted in the data they collect. In the business setup, data professionals analyze the data to identify patterns, trends, outliers or anything else that can lead to new information or knowledge. On one side, scientists choose the boundaries of the systems they study, while for data professionals, even if the systems are usually given, they can make similar choices.

In theory, scientists are more flexible in what data they collect, though they might have constraints imposed by the boundaries of their experiments and the tools they use. For data professionals most of the data they need is already there, in the systems the business uses, though the constraints reside in the intrinsic and extrinsic quality of the data, in whether the data are fit for purpose. Both parties need to work around limitations, or attempt to improve the experiments, respectively the systems.

Even if the data might have different characteristics, this doesn't mean that the methods applied by data professionals can't be used by scientists and vice-versa. The closer data professionals move from Data Analytics to Data Science, the higher the overlap between the business and scientific setup. 

Conversely, the problems data professionals meet have different characteristics. Scientists' outlook is directed mainly at the phenomena and processes occurring in nature and society, where randomness, emergence and chaos seem to feel at home. Business processes deal more with predefined controlled structures, cyclicity, higher dependency between processes, feedback and delays. Even if the problems may seem different, they can be modeled with systems dynamics.

Returning to data visualization and the problem of insight, there are multiple questions. Can we use simple designs or characterizations to find the answer to complex problems? Which must be the characteristics of a piece of information or knowledge to generate insight? How can a simple visualization generate an insight moment? 

Appealing to complexity theory, there are several general approaches to handling complexity. One approach resides in answering complexity with complexity. This means building complex data visualizations that attempt to model the problem's complexity. For example, this could be done by building a complex model that reflects the problem studied, and then building a set of complex visualizations that reflect the different important facets. Many data professionals advise against this approach as it goes against the simplicity principle. On the other hand, starting with something complex and removing the nonessential can prove to be an approachable strategy, even if it involves more effort.

Another approach resides in reducing the complexity of the problem, either by relaxing the constraints or by breaking the problem into simpler problems and addressing each of them with visualizations. Relaxing the constraints allows studying, depending on the case, a more general problem or a linearization of the initial problem. Breaking down the problem into problems that can be solved more easily can help to better understand the general problem, though we might lose sight of the emergence and other behaviors that characterize complex systems.

Providing simple visualizations for complex problems implies a good understanding of the problem, its solution(s) and the overall context, which frankly is harder to achieve the more complex a problem is. To be understood, a problem requires a minimum of knowledge that needs to be reflected in the visualization(s). Even if some important aspects are assumed as known, they still need to be confirmed by the visualizations, otherwise any deviation from the assumptions can lead to a new problem. Therefore, it's questionable whether simple visualizations can address the complexity of problems in a general manner.

Previous Post <<||>> Next Post 


06 November 2020

🧭Business Intelligence: Perspectives (Part VI: Data Soup - Reports vs. Data Visualizations)

Business Intelligence Series

Considering visualizations, John Tukey remarked that ‘the greatest value of a picture is when it forces us to notice what we never expected to see’, which is not always the case for many of the graphics and visualizations available in organizations, typically in the form of simple charts and dashboards, quite often with no esthetics or meaning behind them.

In general, reports are needed as a source for operational activities, in which the details in the form of raw or aggregated data are important. As one moves further toward the tactical or strategic aspects of a business, visualizations gain in importance, especially when they allow encoding data and information, respectively variations, trends or relations, in smaller spaces with minimal loss of information.

There are also different aspects of visualizations that need to be considered. Modern tools allow rapid visualization and interactive navigation of data across different variables, which is great as long as one knows what one is searching for, which is not always the case.

There are junk charts in which the data drowns in graphical elements that bring no value to the reader, in extremis even distorting the message/meaning.

There are graphics/visualizations that attempt to bring together and encode multiple variables with respect to a theme, and for which a ‘project’ is typically needed, as the data are not available ad-hoc, don’t have the desired quality or need further transformations to be ready for consumption. Good-quality graphics/visualizations require time and a good understanding of the business, which are not necessarily available within the BI/Analytics teams, and unfortunately few organizations do something in that direction, typically ignoring such needs. In this type of environment, the rapid availability of data for decision-making or action-relevant insight is stressed, which typically depends on the consumer.

The storytelling capabilities of graphics/visualizations are often exaggerated. Yes, they can tell a story, though stories need to be framed into a context/problem, some background and further references need to be provided, and without detailed data the graphics/visualizations are just nice representations from which each consumer understands what he can.

In an ideal world the consumer and the ‘designer’ would work together to identify the data important for the theme considered, to find the appropriate level of detail, respectively the best form of encoding. Such attempts can stop at table-based representations (aka reports), respectively at basic or richer forms of graphical representation. One can consider reports as an early stage of the visualization process, with the potential to derive more value when the data allow meaningful graphical representations. Unfortunately, the time, data and knowledge available seldom make this achievable.

In addition, a well-designed report can be used as a basis for multiple purposes, while a graphic/visualization can impose more limitations. Ideal would be when multiple forms of representation (including reports) are combined to harness the value of data. Navigating from visualizations to detailed data can be useful for understanding what happens, learning and understanding the various aspects being an iterative process.

It’s also difficult to demonstrate the value of insights derived from visualizations, especially when graphical literacy lags behind numeracy and statistical literacy - many consumers lacking the skills needed to evaluate numbers and statistics adequately. If for a good artistic movie you need assistance to enjoy the show and understand the message(s) behind it, the same can be said about good graphics/visualizations. Moreover, this requires creativity, abstraction-based thinking, and other capabilities to harness the value of representations.

Given the considerable volume of requirements related to the need for base data, reports will continue to be in high demand in organizations. In exchange, visualizations can complement them by providing insights otherwise not available.

Initially published on Medium as an answer to a post on Reporting and Visualizations.

02 March 2016

🧭Business Intelligence: Perspectives (Part III: Self-Service BI)

Business Intelligence

Introduction


According to Gartner, the world's leading information technology research and advisory company, Self-Service BI (aka self-service analytics, ad-hoc analysis, personal analytics), for short SSBI, is a “form of business intelligence (BI) in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support” [1].

Reading between the lines, SSBI presumes the existence of an infrastructure made of tools to support it (aka self-service BI tools), direct or indirect access to raw data and/or data models for the users, and the skillset needed in order to work with data and answer business problems/questions.

A Little History

The concept of self-service is not new, it just got “rebranded” and transformed into a business opportunity. The need for business users to perform ad-hoc analyses was always there in organizations, especially in the ones not having the right infrastructure for harnessing their data. Ever since the 90s, with the appearance of products like MS Excel or MS Access, users in many organizations were forced by the state of the art to learn how to use such products in order to get the answers they needed from the data. Users started building personal solutions, many of them temporary, intended to fill the reporting gaps their organizations had. With a little effort and relatively small investment users had the possibility of playing with the data, understanding the data, identifying and solving problems in the business. They thus acquired a certain level of business expertise and data awareness, becoming valuable resources in the organization.

With time such solutions grew in scope and data volume, gained broader visibility and reached deeper into organizations, some of them becoming team, departmental or cross-departmental solutions. What grows uncontrolled with time starts to have a negative impact on the environment. First, the management of the tools became a problem because the solutions needed to be backed up and maintained regularly; then other problems started to surface: the security of data, inefficient data processing as increasing volumes of data were processed on local computers and transferred over the network, duplicated data and effort, and different versions of reality as different numbers were reported - numbers reflecting different definitions, different knowledge about the business or different data-analysis skillsets. The management needed a more consolidated and standardized effort in order to address these problems. Organizations were forced to, or embraced, the idea of investing money in modern BI solutions, in more powerful servers capable of handling a larger number of requests, in flexible data models that facilitate data consumption, in data quality initiatives. Thus, through various projects a considerable number of such solutions were converted into more standardized and performant BI solutions, with the IT department being in control of the changes and new requests.

Back to Present

With IT in control of the reporting requirements, the business is forced to rely on the rapidity with which IT is able to address new requirements. Some organizations acquired internal resources in order to build reports and the afferent infrastructure in-house, others created partnerships with vendors, or approached a combination of the two. As the volume of requirements isn’t uniform over time, the business has to wait several days between the time a requirement was addressed to IT and the time a solution was provided. In business terms, a few days of waiting for data can equate with the loss of an opportunity, a decision taken too late, a decision that could have had broader impact.

A few years ago things started to change when the ad-hoc analysis concept was rebranded as self-service and surfaced as a trend. This time vendors like Qlik, Tableau, MicroStrategy or Microsoft, some of the main SSBI vendors, are offering easy-to-use tools with rich functionality for data integration, visualization and discovery, tools that reflect the advances made in graphics, data storage and processing technologies (e.g. in-memory databases, parallel processing). With just a few drag-and-drops users are able to display details, aggregate data, and identify trends and correlations in the data. In addition, the tools can make use of the existing data models available in data warehouses, data marts and other types of data repositories, including the rich set of open data available on the web.

Looking at the Future

Like its predecessors, SSBI seems to address primarily data analysts and data-aware business users (aka data citizens); however, in time it is expected to be adopted by more organizations and to become more mature where it is already adopted. Of course, some of the problems from the early days will more likely resurface, though through governance, better architectures and tools, integration with other BI capabilities, trainings and awareness, most of the problems will be overcome. More likely there will also be organizations in which SSBI will fail. In the end each organization will need to find for itself the value of SSBI.

Previous Post <<||>> Next Post

Resources:
[1] Gartner (2016) Self-Service Analytics [Online] Available from: http://www.gartner.com/it-glossary/self-service-analytics
[2] Gartner (2016) Magic Quadrant for Business Intelligence and Analytics Platforms, by Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel, Thomas W. Oestreich [Online] Available from: https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204&st=sb

27 February 2016

🧭Business Intelligence: Perspectives (Part II: The Complexity Myth)

Business Intelligence

Introduction

While looking over the “Business Intelligence Concepts and Platform Capabilities” Coursera MOOC resources for Module 2, I ran into two similar articles from Solutions Review, respectively Information Age. What caught my attention was the easiness with which the myth of BI’s complexity is approached in both columns.

According to the two sources, the capabilities of nowadays’ BI tools have “enabled business users to easily identify and present trends in an impactful way” [1], and “do not require an expert at the helm” [2]. It has thus become simpler for users to independently query data and create interactive reports and presentations [2]. In both columns one can read between the lines that the simplicity of using BI tools is equivalent to negating the complexity of BI, which from my point of view is false. In fact what is regarded here are especially the self-service BI tools, in trend nowadays, that allow users to easily perform ad-hoc analyses with minimal involvement from IT. Self-service BI is only a subset of what BI means for organizations, and just one capability from the many BI capabilities an organization needs in theory, even if some organizations might use it extensively.

Beyond the Surface

A BI tool is not a BI solution per se, even if many generic BI solutions for different systems are available out of the box. This is one of the biggest confusions managers, users and unfortunately also BI professionals make. A BI tool offers the technological basis for creating a BI infrastructure, though it comes with no guarantees. It takes a well-defined IT and business strategy, one or more successful projects, and skillful developers and users in order to harness the BI investment.

On the other side it’s also true that organizations can obtain results from less, though BI doesn’t equate with any ad-hoc analysis performed by users, even if they use BI tools for this purpose. BI is not only about tools, reporting and revealing trends in the data. BI often implies a holistic knowledge about the business and a certain data awareness, without which users will start aggregating and comparing apples with pears and wonder why they taste and look different.

If everything were so simple, then why do so many BI projects fail to deliver what’s expected? Why do so many managers complain that they don’t have the data they need, when they need them? Sure, maybe the problem lies in over-complexifying the whole BI landscape and treating everything from a high level, though that’s more likely not it.

It’s a Teamwork Knowledge Game

BI is, or needs to be, monitoring and problem-solving oriented. This requires a deep understanding of processes and the business. There are business users and also BI professionals who don’t have the knowledge one needs in order to approach a business problem. One can see that from the premises they have, the questions they raise, the data they consider, the models they build, and the results.

From a BI professional’s perspective, even if one has a broad knowledge about various businesses, one often lacks the insight into a given business. BI professionals can seldom provide adequate BI solutions without input and feedback from the business. Some BI professionals rely too much on their knowledge, just as the business sometimes expects a maximum output from BI professionals while providing a minimum of input.

Considering the business users, quite often their focus and knowledge cover only the data boundaries of their department, while many problems extend over those boundaries. They know facts that are not necessarily reflected in the data. Even if they are closer to the data than other parties, they still lack some of the data awareness (including statistical awareness) needed in order to approach problems.

Somebody said ironically, when talking about users’ data and problem-solving skills, that “not everybody is a Bill Gates or Steve Jobs”. Continuing the idea, one can’t expect users to act as such. For sure there are many business users who are better problem solvers than BI consultants, though on the other side one can’t expect the average business user to have the same skillset as an experienced BI consultant. This is in fact one of the problems of self-service BI. Probably with time and effort organizations will develop such resources, though some help from BI professionals will still be needed. Without a good cooperation between the business and BI professionals, an organization might not have the hoped-for results when investing in BI.

More on Complexity

The complexity arises when one tries to do more with the data, especially the data found in raw form. Usually the complexity of raw data can be addressed by building a logical or physical model that allows easier consumption of the data. Here is the point where users find themselves overwhelmed, because this requires a good knowledge of the physical data model and its semantics, the technical knowledge to build models, and the skills to reengineer the logic available in the source systems. These are the themes BI professionals are supposed to excel in. Talking about models, they are the most difficult to build because they reflect various segments of the business, they reflect a breakdown of the complexity. It’s also the point where many BI projects fail, as the models built don’t reflect reality or aren’t capable of answering business questions.

Coming back to the two columns, I have to point out that the complexity of a subject or domain can’t be judged based on how easy it is to approach basic tasks. The complexity lies typically beyond the basics, when one dives into details. In the case of BI, its complexity starts when one attempts to mix various technologies and knowledge domains to model and solve daily business problems in an integrated, holistic, aligned, consistent and cost-effective manner. The more technologies, knowledge domains and constraints one has to consider, the more complex the BI landscape and solutions become.

On the other side this doesn’t mean that the BI infrastructure can’t be simplified, or that BI can’t rely heavily or exclusively on self-service BI solutions. However, each strategy has advantages and disadvantages, and one more likely has to consider both sides of the coin in the process. Self-service BI has its own trade-offs, weaknesses that can be transformed into strengths with time.

Conclusion

When one considers nowadays’ BI tools capabilities, ad-hoc analyses are relatively easy to perform and can lead to results, though such analyses don’t equate with BI, and the simplicity with which they are performed doesn’t necessarily imply that BI is simple as a whole. When one considers the complexity of nowadays’ businesses, the more one dives into the various problems a business has, the more complex the BI landscape seems. In the end it’s in each organization’s power to simplify and harmonize its BI infrastructure to a degree that its business goals aren’t affected negatively.


Previous Post <<||>> Next Post

Resources
[1] Information Age (2015) 5 Myths about Business Intelligence, by Ben Rossi, [Online] Available from: http://www.information-age.com/technology/information-management/123460271/5-myths-about-business-intelligence
[2] SolutionsReview (2015) Top 5 Business Intelligence Myths Revealed, by Timothy King, [Online] Available from: http://solutionsreview.com/business-intelligence/top-5-business-intelligence-myths-revealed
[3] Gartner (2016) Magic Quadrant for Business Intelligence and Analytics Platforms, by Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel, Thomas W. Oestreich [Online] Available from: https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204&st=sb 
[4] Coursera (2016) Business Intelligence Concepts, Tools, and Applications MOOC, led by Jahangir Karimi, University of Colorado, [Online] Available from: https://www.coursera.org/learn/business-intelligence-tools

25 December 2015

🪙Business Intelligence: Data Mesh (Just the quotes)

"Another myth is that we shall have a single source of truth for each concept or entity. […] This is a wonderful idea, and is placed to prevent multiple copies of out-of-date and untrustworthy data. But in reality it’s proved costly, an impediment to scale and speed, or simply unachievable. Data Mesh does not enforce the idea of one source of truth. However, it places multiple practices in place that reduces the likelihood of multiple copies of out-of-date data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh attempts to strike a balance between team autonomy and inter-term interoperability and collaboration, with a few complementary techniques. It gives domain teams autonomy to have control of their local decision making, such as choosing the best data model for their data products. While it uses the computational governance policies to impose a consistent experience across all data products; for example, standardizing on the data modeling language that all domains utilize." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is a solution for organizations that experience scale and complexity, where existing data warehouse or lake solutions have become blockers in their ability to get value from data at scale and across many functions of their business, in a timely fashion and with less friction." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh must allow for data models to change continuously without fatal impact to downstream data consumers, or slowing down access to data as a result of synchronizing change of a shared global canonical model. Data Mesh achieves this by localizing change to domains by providing autonomy to domains to model their data based on their most intimate understanding of the business without the need for central coordinations of change to a single shared canonical model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh [...] reduces points of centralization that act as coordination bottlenecks. It finds a new way of decomposing the data architecture without slowing the organization down with synchronizations. It removes the gap between where the data originates and where it gets used and removes the accidental complexities - aka pipelines - that happen in between the two planes of data. Data mesh departs from data myths such as a single source of truth, or one tightly controlled canonical data model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh relies on a distributed architecture that consists of domains. Each domain is an independent unit of data and its associated storage and compute components. When an organization contains various product units, each with its own data needs, each product team owns a domain that is operated and governed independently by the product team. […] Data mesh has a unique value proposition, not just offering scale of infrastructure and scenarios but also helping shift the organization’s culture around data," (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"Data has historically been treated as a second-class citizen, as a form of exhaust or by-product emitted by business applications. This application-first thinking remains the major source of problems in today’s computing environments, leading to ad hoc data pipelines, cobbled together data access mechanisms, and inconsistent sources of similar-yet-different truths. Data mesh addresses these shortcomings head-on, by fundamentally altering the relationships we have with our data. Instead of a secondary by-product, data, and the access to it, is promoted to a first-class citizen on par with any other business service." (Adam Bellemare,"Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Data mesh architectures are inherently decentralized, and significant responsibility is delegated to the data product owners. A data mesh also benefits from a degree of centralization in the form of data product compatibility and common self-service tooling. Differing opinions, preferences, business requirements, legal constraints, technologies, and technical debt are just a few of the many factors that influence how we work together." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"The data mesh is an exciting new methodology for managing data at large. The concept foresees an architecture in which data is highly distributed and a future in which scalability is achieved by federating responsibilities. It puts an emphasis on the human factor and addressing the challenges of managing the increasing complexity of data architectures." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"A data mesh splits the boundaries of the exchange of data into multiple data products. This provides a unique opportunity to partially distribute the responsibility of data security. Each data product team can be made responsible for how their data should be accessed and what privacy policies should be applied." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"A data mesh is a decentralized data architecture with four specific characteristics. First, it requires independent teams within designated domains to own their analytical data. Second, in a data mesh, data is treated and served as a product to help the data consumer to discover, trust, and utilize it for whatever purpose they like. Third, it relies on automated infrastructure provisioning. And fourth, it uses governance to ensure that all the independent data products are secure and follow global rules."(James Serra, "Deciphering Data Architectures", 2024)

"At its core, a data fabric is an architectural framework, designed to be employed within one or more domains inside a data mesh. The data mesh, however, is a holistic concept, encompassing technology, strategies, and methodologies." (James Serra, "Deciphering Data Architectures", 2024)

"It is very important to understand that data mesh is a concept, not a technology. It is all about an organizational and cultural shift within companies. The technology used to build a data mesh could follow the modern data warehouse, data fabric, or data lakehouse architecture - or domains could even follow different architectures." (James Serra, "Deciphering Data Architectures", 2024)

"To explain a data mesh in one sentence, a data mesh is a centrally managed network of decentralized data products. The data mesh breaks the central data lake into decentralized islands of data that are owned by the teams that generate the data. The data mesh architecture proposes that data be treated like a product, with each team producing its own data/output using its own choice of tools arranged in an architecture that works for them. This team completely owns the data/output they produce and exposes it for others to consume in a way they deem fit for their data." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"With all the hype, you would think building a data mesh is the answer to all of these 'problems' with data warehousing. The truth is that while data warehouse projects do fail, it is rarely because they can’t scale enough to handle big data or because the architecture or the technology isn’t capable. Failure is almost always because of problems with the people and/or the process, or that the organization chose the completely wrong technology." (James Serra, "Deciphering Data Architectures", 2024)

24 December 2015

🪙Business Intelligence: Data Marts (Just the Quotes)

"There are four levels of data in the architected environment - the operational level, the atomic (or the data warehouse) level, the departmental (or the data mart) level, and the individual level. These different levels of data are the basis of a larger architecture called the corporate information factory (CIF). The operational level of data holds application-oriented primitive data only and primarily serves the high-performance transaction-processing community. The data-warehouse level of data holds integrated, historical primitive data that cannot be updated. In addition, some derived data is found there. The departmental or data mart level of data contains derived data almost exclusively. The departmental or data mart level of data is shaped by end-user requirements into a form specifically suited to the needs of the department. And the individual level of data is where much heuristic analysis is done." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"If you think of a Data Mart as a store of bottled water, cleansed and packaged and structured for easy consumption, the Data Lake is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples." (James Dixon 2010)

"Many organizations need to create data warehouses—massive data stores of timeseries data for decision support. Data are imported from various external and internal resources and are cleansed and organized in a manner consistent with the organization’s needs. After the data are populated in the data warehouse, data marts can be loaded for a specific area or department. Alternatively, data marts can be created first, as needed, and then integrated into an EDW." (Ramesh Sharda et al, "Business Intelligence: A Managerial Perspective on Analytics" 3rd Ed., 2014)

"Whereas a data warehouse combines databases across an entire enterprise, a data mart is usually smaller and focuses on a particular subject or department. A data mart is a subset of a data warehouse, typically consisting of a single subject area (e.g., marketing, operations). A data mart can be either dependent or independent. A dependent data mart is a subset that is created directly from the data warehouse. It has the advantages of using a consistent data model and providing quality data. [...] An independent data mart is a small warehouse designed for a strategic business unit (SBU) or a department, but its source is not an EDW." (Ramesh Sharda et al, "Business Intelligence: A Managerial Perspective on Analytics" 3rd Ed., 2014)

"If you think of a Data Mart as a store of bottled water, cleansed and packaged and structured for easy consumption, the Data Lake is a large body of water in a more natural state. [...] The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples." (James Dixon, "Pentaho, Hadoop, and Data Lakes", 2010) [sorce] [first known usage]

"Data mart: A subset of a data warehouse that’s usually oriented to a business group or process rather than enterprise-wide views. They have value as part of the overall enterprise data architecture, but can cause problems when they sprout uncontrolled as data silos with their own data definitions, creating data shadow systems." (Rick Sherman, "Business Intelligence Guidebook: From Data Integration to Analytics, 2015)

"Data marts promised to be quicker and cheaper to build, and provided many more benefits—including the benefit of actually being able to finish building them! The data mart was primarily a backlash to the big, cumbersome CDW projects, with the key difference being that its scope was limited to a single business group rather than the entire enterprise. Of course, that shortcut did speed things up, but at the expense of obtaining agreement on consistent data definitions, thereby guaranteeing data silos." (Rick Sherman, "Business Intelligence Guidebook: From Data Integration to Analytics, 2015)

"There are, however, many problems with independent data marts. Independent data marts: (1) Do not have data that can be reconciled with other data marts (2) Require their own independent integration of raw data (3) Do not provide a foundation that can be built on whenever there are future analytical needs." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"The data warehouse approach is also referred to as data marts with the usual distinction that a data mart serves a single department in an organization, while a data warehouse serves the larger organization integrating across multiple departments. Regardless of their scope, from the architectural modeling perspective they both have similar characteristics." (Zhamak Dehghani,"Data Mesh: Delivering Data-Driven Value at Scale", 2021) 

"Data marts are subject-oriented databases typically aligned with a particular business unit like sales, finance, or marketing. These are some-times called 'functional data marts' since they support specific business functions. Data marts accelerate business processes by allowing access to relevant information in a more timely nature since they are not aggregating the volume and variety (many data sources) that an EDW does. However, they are more transformed or normalized than an ODS." (Scott Burk et al, It’s All Analytics - Part II: Designing an Integrated AI, Analytics, and Data Science Architecture for Your Organization, 2022)

"Some sources try to distinguish the differences of data marts and EDWs by size. Size is a consequence and not a determinant. While EDWs are normally much larger, they are larger due to the fact they are pulling data from many sources and across business functions." (Scott Burk et al, It’s All Analytics - Part II: Designing an Integrated AI, Analytics, and Data Science Architecture for Your Organization, 2022)

"Traditional data stores used for analytics, such as data marts and data ware-houses, followed an ETL process, extract first by making a copy from the source, then transformations are made upon this copy, and then the data is loaded into the target system. ETL tools require processing engines for running transformations prior to loading data into a destination. Running these engines performing transformations before the load phase results in a more complex data replication process." (Scott Burk et al, It’s All Analytics - Part II: Designing an Integrated AI, Analytics, and Data Science Architecture for Your Organization, 2022)

"Data marts are used in most data warehouse approaches and focus on the data of a specific area, department, or domain of the business. A data warehouse focuses on building an SSOT for all data. A data warehouse can be made up of data marts (bottom-up approach), or data marts can be created from the data warehouse (top-down approach). In either case, data marts are smaller and less complicated than the data warehouse. As they only contain a subset of the data, data marts are often easier and quicker to establish. They are often managed by the departments the data mart focuses on." (Olivier Mertens & Breght Van Baelen, "Azure Data and AI Architect Handbook", 2023)

22 December 2015

🪙Business Intelligence: Lakehouses (Just the Quotes)

"A data lakehouse is an amalgamation of the best components from both data lakes and data warehouses. A data lakehouse implements data structure and data management features from data warehouses into a cost-effective storage like a data lake. It tries to combine the best from both worlds - data lake - based Big Data analytics and a data warehouse." (Bhadresh Shiyal, "Beginning Azure Synapse Analytics: Transition from Data Warehouse to Data Lakehouse", 2021) 

"A defining characteristic of the data lakehouse architecture is allowing direct access to data as files while retaining the valuable properties of a data warehouse. Just do both!" (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Once you combine the data lake along with analytical infrastructure, the entire infrastructure can be called a data lakehouse. [...] The data lake without the analytical infrastructure simply becomes a data swamp. And a data swamp does no one any good." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"The data lakehouse architecture presents an opportunity comparable to the one seen during the early years of the data warehouse market. The unique ability of the lakehouse to manage data in an open environment, blend all varieties of data from all parts of the enterprise, and combine the data science focus of the data lake with the end user analytics of the data warehouse will unlock incredible value for organizations. [...] The lakehouse architecture equally makes it natural to manage and apply models where the data lives." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"With the data lakehouse, it is possible to achieve a level of analytics and machine learning that is not feasible or possible any other way. But like all architectural structures, the data lakehouse requires an understanding of architecture and an ability to plan and create a blueprint." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"The lakehouse provides a key advantage over the modern data warehouse by eliminating the need to have two places to store the same data." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"Delta Lake is one solution for building data lakehouses, an open data architecture combining the best of data warehouses and data lakes." (Bennie Haelen & Dan Davis, "Delta Lake: Up and Running Modern Data Lakehouse Architectures with Delta Lake", 2024)

"It is very important to understand that data mesh is a concept, not a technology. It is all about an organizational and cultural shift within companies. The technology used to build a data mesh could follow the modern data warehouse, data fabric, or data lakehouse architecture - or domains could even follow different architectures." (James Serra, "Deciphering Data Architectures", 2024)

"The term data lakehouse is a portmanteau (blend) of data lake and data warehouse. [...] The concept of a lakehouse is to get rid of the relational data warehouse and use just one repository, a data lake, in your data architecture." (James Serra, "Deciphering Data Architectures", 2024)

"The lakehouse combines the best elements of data lakes and data warehouses for OLAP workloads. It merges the scalability and flexibility of data lakes with the management features and performance optimization of data warehouses. [...] A lakehouse eliminates the need for disjointed systems and provides a single, coherent platform for all forms of data analysis. Lakehouses enhance the performance of data queries and simplify data management, making it easier for organizations to derive insights from their data." (Denny Lee et al, "Delta Lake: The Definitive Guide", 2025)

02 December 2015

🪙Business Intelligence: Measure (Just the Quotes)

"[…] when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of science." (William T Kelvin, "Electrical Units of Measurement", 1883)

"Quantify. If whatever it is you’re explaining has some measure, some numerical quantity attached to it, you’ll be much better able to discriminate among competing hypotheses. What is vague and qualitative is open to many explanations." (Carl Sagan, "The Demon-Haunted World: Science as a Candle in the Dark", 1995)

"Clearly, the mean is greatly influenced by extreme values, but it can be appropriate for many situations where extreme values do not arise. To avoid misuse, it is essential to know which summary measure best reflects the data and to use it carefully. Understanding the situation is necessary for making the right choice. Know the subject!" (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"Changing measures are a particularly common problem with comparisons over time, but measures also can cause problems of their own. [...] We cannot talk about change without making comparisons over time. We cannot avoid such comparisons, nor should we want to. However, there are several basic problems that can affect statistics about change. It is important to consider the problems posed by changing - and sometimes unchanging - measures, and it is also important to recognize the limits of predictions. Claims about change deserve critical inspection; we need to ask ourselves whether apples are being compared to apples - or to very different objects." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Our culture, obsessed with numbers, has given us the idea that what we can measure is more important than what we can't measure. Think about that for a minute. It means that we make quantity more important than quality." (Donella Meadows, "Thinking in Systems: A Primer", 2008)

"What gets measured gets managed - even when it’s pointless to measure and manage it, and even if it harms the purpose of the organisation to do so." (Simon Caulkin, "The rule is simple: be careful what you measure", 2008) [source]

"What gets measured gets managed - so be sure you have the right measures, because the wrong ones kill." (Simon Caulkin, "The rule is simple: be careful what you measure", 2008) [source]

"Once these different measures of performance are consolidated into a single number, that statistic can be used to make comparisons […] The advantage of any index is that it consolidates lots of complex information into a single number. We can then rank things that otherwise defy simple comparison […] Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components. As a result, indices range from useful but imperfect tools to complete charades." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Until a new metric generates a body of data, we cannot test its usefulness. Lots of novel measures hold promise only on paper." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Selecting the right measure and measuring things right are both art and science. And KPIs influence management behavior as well as business culture." (Pearl Zhu, "CIO Master: Unleash the Digital Potential of It", 2016)

"It’d be nice to fondly imagine that high-quality statistics simply appear in a spreadsheet somewhere, divine providence from the numerical heavens. Yet any dataset begins with somebody deciding to collect the numbers. What numbers are and aren’t collected, what is and isn’t measured, and who is included or excluded are the result of all-too-human assumptions, preconceptions, and oversights." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"People do care about how they are measured. What can we do about this? If you are in the position to measure something, think about whether measuring it will change people’s behaviors in ways that undermine the value of your results. If you are looking at quantitative indicators that others have compiled, ask yourself: Are these numbers measuring what they are intended to measure? Or are people gaming the system and rendering this measure useless?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"As long as measurements are abused as a tool of control, measuring will remain the weakest area in a manager’s performance." (Peter Drucker)

"For although it is certainly true that quantitative measurements are of great importance, it is a grave error to suppose that the whole of experimental physics can be brought under this heading. We can start measuring only when we know what to measure: qualitative observation has to precede quantitative measurement, and by making experimental arrangements for quantitative measurements we may even eliminate the possibility of new phenomena appearing." (Heinrich B G Casimir)
