
07 December 2024

🏭 💠Data Warehousing: Microsoft Fabric (Part IV: SQL Databases for OLTP scenarios) [new feature]

Data Warehousing Series

One of the interesting announcements at Ignite is the public preview of SQL databases in Microsoft Fabric, "a versatile and developer-friendly transactional database built on the foundation of Azure SQL database". With this, Fabric can address OLTP scenarios besides OLAP, evolving from an analytics offering into a data platform [1]. According to the announcement, besides the AI-optimized architectural aspects, the feature makes Azure SQL simple, autonomous and secure by design [1], and it is these latter aspects that are considered in this post.

Simplicity revolves around the deployment and configuration of databases: creating a new database requires only a name, and the database is provisioned in seconds [1]. That's a considerable improvement over the relatively complex setup needed for on-premises configurations, though sometimes more configuration flexibility is needed upfront or over a database's lifetime. To get a database ready for testing, one can import a sample database or load specific data via dataflows and/or pipelines [1]. As development tools one can use Visual Studio Code or SSMS [1], and probably more tools will be supported in time.
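
To give a rough idea of the developer experience, connecting to such a database from a local script should look much like connecting to any Azure SQL database. Below is a minimal Python sketch based on that assumption; the server and database names are placeholders standing in for the connection details shown in the Fabric portal, and ODBC Driver 18 with Microsoft Entra interactive authentication is assumed to be available:

# Minimal sketch: connect to a Fabric SQL database and run a simple query.
# The endpoint and database names are placeholders; Entra interactive authentication is assumed.
import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-endpoint>.database.fabric.microsoft.com;"  # copied from the Fabric portal
    "Database=<your-database>;"
    "Authentication=ActiveDirectoryInteractive;"              # Microsoft Entra sign-in prompt
    "Encrypt=yes;"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
cursor.execute("SELECT TOP 5 name, create_date FROM sys.tables ORDER BY create_date DESC;")
for row in cursor.fetchall():
    print(row.name, row.create_date)
conn.close()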

The integration with both GitHub and Azure DevOps allows each database to be placed under source control, which is needed in many scenarios, especially when multiple people make changes to the database objects [1]. Frankly, that's mainly important during the development phase, respectively in scenarios in which multiple people change the logic in parallel. It will be interesting to see how much overhead or how many challenges the feature adds to development, and how smoothly everything works together!

The most important aspect for many solutions is the near-real-time replication of data to the (open-source) delta parquet format in OneLake, which makes the data available for analytics almost immediately [1]. Many cloud-based applications can probably benefit from this, even if the performance might not match that of other well-established architectures. However, there are many other scenarios in which one needs to maintain and use data for both OLTP and OLAP purposes. This invites adequate testing and a careful weighing of the advantages and disadvantages involved.
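
As a small illustration, once the replicated tables have landed in OneLake they should be readable from a Fabric notebook like any other delta table. The sketch below assumes a hypothetical lakehouse (or shortcut) named SalesLakehouse exposing a replicated SalesOrder table, and relies on the Spark session that Fabric notebooks provide as the spark variable:

# Minimal sketch: query the near-real-time copy of an OLTP table from a Fabric notebook.
# SalesLakehouse and SalesOrder are hypothetical names; 'spark' is the notebook's built-in session.
df = spark.sql("""
    SELECT CustomerId, COUNT(*) AS Orders, SUM(TotalAmount) AS Revenue
    FROM SalesLakehouse.SalesOrder
    GROUP BY CustomerId
    ORDER BY Revenue DESC
""")
df.show(10)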

A SQL database is a native item in Fabric, and therefore it consumes Fabric capacity units like other Fabric workloads [1]. One can use the Fabric SKU estimator (still in private preview) to estimate the costs [2], though it will be interesting to see how cost-effective the solutions turn out to be. Probably, especially when the infrastructure is already available outside of Fabric, it will be easier and more cost-effective to use the mirroring functionality. One should test and have a reliable estimate before moving blindly from the existing infrastructure to Fabric.

SQL databases in Fabric are autonomous by design, aiming to provide the best performance and availability by default [1]. High availability is achieved through zone redundancy, while performance is achieved by automatically scaling storage and compute to accommodate the workloads [1]. The auto-optimization capability relies on the latest Intelligent Query Processing (IQP) enhancements, respectively on the creation of missing indexes to improve query performance [1]. It will be interesting to see how the whole process works, given that index maintenance usually involves some challenges (e.g. identifying covering indexes, indexes needed only for temporary workloads, duplicated indexes).
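
For the curious, the raw missing-index suggestions can still be inspected through the standard SQL Server DMVs, presumably the same signals such auto-indexing builds on. The sketch below assumes these DMVs are exposed in Fabric SQL databases as they are in Azure SQL, and reuses the placeholder connection string from the earlier sketch:

# Minimal sketch: list missing-index suggestions via the standard DMVs (assumed to be available).
import pyodbc

conn_str = "..."  # same placeholder connection string as in the earlier sketch
query = """
SELECT d.statement AS table_name,
       d.equality_columns, d.inequality_columns, d.included_columns,
       s.user_seeks, s.avg_user_impact
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats s ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
"""
conn = pyodbc.connect(conn_str)
for row in conn.cursor().execute(query):
    print(row.table_name, row.equality_columns, row.avg_user_impact)
conn.close()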

In addition, all data is replicated to OneLake by default [1]. Finally, the database always receives the latest security updates through auto-patching, while automatic backups help in disaster recovery scenarios [1], which can be of real help for database administrators.

References:

[1] Microsoft Fabric Updates Blog (2024) Announcing SQL database in Microsoft Fabric Public Preview [link]

[2] Microsoft Fabric Updates Blog (2024) Announcing New Recruitment for the Private Preview of Microsoft Fabric SKU Estimator [link]


06 May 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part III: The Metrics Layer [new feature])

Introduction

One of the announcements at this year's first Microsoft Fabric Community Conference was the introduction of a metrics layer in Fabric which "allows organizations to create standardized business metrics, that are rooted in measures and are discoverable and intended for reuse" [1]. As it seems, the information provided at the conference was kept to a minimum given that the feature is still in private preview, though several webcasts have started to catch up on the topic (see [2], [4]). Moreover, the hosts of the Explicit Measures show (@PowerBITips) invited Carly Newsome, the manager of the project, who unveiled more details about the project and the feature, details which became the main source for the information below.

The idea of a metrics layer or metric store is not new; data professionals have occasionally referred to their structure(s) of metrics as such. The terms gained weight in their modern conception relatively recently, in 2021-2022 (see [5], [6], [7], [8], [10]). Within the modern data stack, a metrics layer or metric store is an abstraction layer sitting between the data store(s) and the end users. It allows organizations to centrally define, store, and manage business metrics, and thus to standardize and enforce a single source of truth (SSoT), respectively to solve several issues existing in the data stacks. As Benn Stancil remarked earlier, the metrics layer is one of the missing pieces of the modern data stack (see [10]).

Microsoft's Solution

Microsoft's business case for the metrics layer's implementation is based on three main ideas: (1) duplicate measures contribute to poor data quality, (2) complex data models hinder self-service, (3) data silos proliferate in Power BI and need to be reduced. In Microsoft's conception the metrics layer provides several benefits: consistent definitions and descriptions, easy management via management views, searchable and discoverable metrics, respectively trust assured through indicators.

For this feature's implementation Microsoft introduces a new Fabric item called a metric set that allows grouping several (business) metrics together as part of a mini-model that can be tailored to the needs of a subset of end users and accessed by them via the standard tools already available. Such mini-models allow breaking down and reducing the overall complexity of semantic models, while being easy to evolve and consume. The challenge then becomes how to break down existing and future semantic models into nonoverlapping mini-models, creating in extremis a partition (see the Lego metaphor for data products). The idea of mini-models is not new, with [12] advocating the use of a Master Model, a technique for creating derivative tabular models based on a single tabular solution.

A (business) metric is a way to elevate measures from the various semantic models existing in the organization into the mini-model defined by the metric set. A metric can be reused in other Fabric artifacts - currently in new reports on the Power BI service, respectively in notebooks by copying the code. Reusing metrics in other measures means that one can chain metrics, with the changes made being propagated further downstream.
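
For notebooks, the copied code builds on semantic link (Sempy); the sketch below approximates what such a snippet can look like, with the semantic model, measure, and column names being hypothetical placeholders:

# Minimal sketch: evaluate a measure from a semantic model inside a Fabric notebook via Sempy.
# "Sales Model", "Total Revenue" and Product[Category] are hypothetical names.
import sempy.fabric as fabric

result = fabric.evaluate_measure(
    dataset="Sales Model",                  # semantic model hosting the measure
    measure="Total Revenue",                # the measure elevated to a metric
    groupby_columns=["Product[Category]"],  # dimension to slice the metric by
)
print(result.head())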

The Metrics Layer in Microsoft Fabric (adapted diagram)

Every metric is tied to the original semantic model, which thus allows tracking how a metric is used across solutions and, looking forward to Purview, identifying the data's lineage. A measure is related to a "table", the source from which the measure came.

Users' Perspective

The Metrics Layer feature is available in the Microsoft Fabric service for Power BI within the Metrics menu element next to Scorecards. One starts by creating a metric set in an existing workspace, an operation which creates the actual artifact, to which the individual metrics are added. To create a metric, a user with build permissions can navigate through the semantic models across the different workspaces he/she has access to, pick a measure from one of them and elevate it to a metric, copying in the process the measure's definition and description. In this way the metric will always point back to the measure from the semantic model, while the metrics thus created are considered a related collection and can be shared around accordingly.

Once a metric is added to the metric set, one can add dimensions to it in edit mode (e.g. Date, Category, Product Id, etc.). One can then further explore the metric's output and add filters (e.g. concentrate on only one product or category), a point from which one can slice-and-dice the data as needed.

There is a panel where one can see where the metric has been used (e.g. in reports, scorecards, and other integrations), when it was last refreshed, respectively how many times it was used. Thus, one has the most important information in one place, which is great for developers as well as for users. Probably other metadata will be added, such as whether an increase in the metric would be favorable or unfavorable (like in Tableau Pulse, see [13]), levels of criticality, a unit of measure, or maybe its type - simple metric, performance indicator (PI), result indicator (RI), KPI, KRI, etc.

Metrics can be persisted to OneLake by saving their output to a delta table in the lakehouse. As demonstrated in the presentation(s), with just a copy-paste and a small piece of code one can materialize the data into a lakehouse delta table, from where it can be reused as needed. Hopefully, the process will be further automated.
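
A minimal sketch of such a materialization, assuming the metric resolves to the same hypothetical measure as above, that the notebook has a default lakehouse attached, and that the target table name is just a placeholder:

# Minimal sketch: materialize a metric's output into a lakehouse delta table.
# The names are hypothetical; 'spark' is the notebook's built-in session.
import sempy.fabric as fabric

metric_df = fabric.evaluate_measure(
    dataset="Sales Model",
    measure="Total Revenue",
    groupby_columns=["Date[Month]", "Product[Category]"],
)

# Delta column names can't contain brackets, so normalize the headers defensively.
metric_df.columns = [c.replace("[", "_").replace("]", "") for c in metric_df.columns]

# Convert the pandas result to Spark and save it as a delta table in the attached lakehouse.
spark.createDataFrame(metric_df).write.format("delta").mode("overwrite").saveAsTable("metric_total_revenue")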

One can also consume metrics and metric sets in Power BI Desktop, where a new menu element called Metric sets was added under the OneLake data hub; it can be used to connect to a metric set from a semantic model and select the metrics needed for the project.

Tapping into the available Power BI solutions is done via an integration feature based on the Sempy fabric package, which provides a dataframe for the storage and propagation of Power BI metadata and is part of the Python-based Semantic Link in Fabric [11].
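
For example, the same package can be used to inspect which semantic models and measures are available before elevating anything to a metric; a small sketch, with the model name again a placeholder:

# Minimal sketch: list the semantic models visible in the workspace and the measures of one of them.
import sempy.fabric as fabric

print(fabric.list_datasets())                            # semantic models in the current workspace
measures = fabric.list_measures(dataset="Sales Model")   # "Sales Model" is a hypothetical name
print(measures.head())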

Further Thoughts

When dealing with a new feature, a natural question comes to mind: what challenges does the feature involve, respectively how can it be misused? Given that the metrics layer can be built within a workspace and can tap into the existing measures, one can build on the existing infrastructure. However, this can imply restructuring, refactoring, moving, and testing a lot of code in the process, hopefully with minimal implications for the solutions already available. Whether the process is as simple as imagined is another story. As for misuse, in extremis data professionals might start building everything as metrics, though the real danger may come when the data is persisted unnecessarily.

From a data mesh perspective, a metric set is associated with a domain, though there will be metrics and data common to multiple domains. Moreover, a mini-model has the potential of becoming a data product. Distributing the logic across multiple workspaces and domains can add further challenges, especially in what concerns the synchronization and implementation of requirements in a way that doesn't lead to bottlenecks. But this is a general challenge for the development team(s).

The feature will probably undergo further changes until it is released in public preview (probably by September or the end of the year). I subscribe to other data professionals' opinion that the feature has long been needed and that it can have an important impact on the solutions built.

Previous Post <<||>> Next Post

Resources:
[1] Microsoft Fabric Blog (2024) Announcements from the Microsoft Fabric Community Conference (link)
[2] Power BI Tips (2024) Explicit Measures Ep. 236: Metrics Hub, Hot New Feature with Carly Newsome (link)
[3] Power BI Tips (2024) Introducing Fabric Metrics Layer / Power Metrics Hub [with Carly Newsome] (link)
[4] KratosBI (2024) Fabric Fridays: Metrics Layer Conspiracy Theories #40 (link)
[5] Chris Webb's BI Blog (2022) Is Power BI A Semantic Layer? (link)
[6] The Data Stack Show (2022) TDSS 95: How the Metrics Layer Bridges the Gap Between Data & Business with Nick Handel of Transform (link)
[7] Sundeep Teki (2022) The Metric Layer & how it fits into the Modern Data Stack (link)
[8] Nick Handel (2021) A brief history of the metrics store (link)
[9] Aurimas (2022) The Jungle of Metrics Layers and its Invisible Elephant (link)
[10] Benn Stancil (2021) The missing piece of the modern data stack (link)
[11] Microsoft Learn (2024) Sempy fabric Package (link)
[12] Michael Kovalsky (2019) Master Model: Creating Derivative Tabular Models (link)
[13] Christina Obry (2023) The Power of a Metrics Layer - and How Your Organization Can Benefit From It (link)
[14] KratosBI (2024) Introducing the Metrics Layer in #MicrosoftFabric with Carly Newsome (link)

02 November 2016

♟️Strategic Management: Integration (Just the Quotes)

"By integration we mean the process of achieving unity of effort among the various subsystems in the accomplishment of the organization's tasks." (Paul R Lawrence, "Organization and environment: Managing differentiation and integration", 1967)

"No matter how difficult or unprecedented the problem, a breakthrough to the best possible solution can come only from a combination of rational analysis, based on the real nature of things, and imaginative reintegration of all the different items into a new pattern, using nonlinear brainpower. This is always the most effective approach to devising strategies for dealing successfully with challenges and opportunities, in the market arena as on the battlefield." (Kenichi Ohmae, "The Mind Of The Strategist", 1982)

"Culture [is] a pattern of basic assumptions invented, discovered, or developed by a given group as it learns to cope with its problems of external adaptation and internal integration that has worked well enough to be considered valid and, therefore, to be taught to new members as the correct way to perceive, think, and feel in relation to those problems." (Edgar H Schein, "Organizational Culture and Leadership", 1985)

"To keep the business from disintegrating, the concept of information systems architecture is becoming less of an option and more of a necessity." (John Zachman, "A Framework for Information Systems Architecture", 1987)

"Conventional process structures are fragmented and piecemeal, and they lack the integration necessary to maintain quality and service. They are breeding grounds for tunnel vision, as people tend to substitute the narrow goals of their particular department for the larger goals of the process as a whole. When work is handed off from person to person and unit to unit, delays and errors are inevitable. Accountability blurs, and critical issues fall between the cracks." (Michael M Hammer, "Reengineering Work: Don't Automate, Obliterate", Magazine, 1990) [source]

"But the net effect of increasing scale, centralization of capital, vertical integration and diversification within the corporate form of enterprise has been to replace the 'invisible hand' of the market by the 'visible hand' of the managers." (David Harvey, "The Limits To Capital", 2006)

02 November 2007

🏗️Software Engineering: Integration (Just the Quotes)

"With increasing size and complexity of the implementations of information systems, it is necessary to use some logical construct (or architecture) for defining and controlling the interfaces and the integration of all of the components of the system." (John Zachman, "A Framework for Information Systems Architecture", 1987)

"The longer we wait between integrations and acceptance tests, the worse things get. Wait twice as long and we'll have four or more times the hassle. The reason is that one bug written just yesterday is pretty easy to find, while ten or a hundred written weeks ago can become almost impossible." (Ron Jeffries, "Extreme Programming Installed", 2001)

"The main activity of programming is not the origination of new independent programs, but in the integration, modification, and explanation of existing ones." (Terry Winograd, "Beyond Programming Languages", 1991)

"As the size of software systems increases, the algorithms and data structures of the computation no longer constitute the major design problems. When systems are constructed from many components, the organization of the overall system - the software architecture - presents a new set of design problems. This level of design has been addressed in a number of ways including informal diagrams and descriptive terms, module interconnection languages, templates and frameworks for systems that serve the needs of specific domains, and formal models of component integration mechanisms." (David Garlan & Mary Shaw, "An introduction to software architecture", Advances in software engineering and knowledge engineering Vol 1, 1993)

"Enterprise architecture is the organizing logic for business processes and IT infrastructure reflecting the integration and standardization requirements of a company's operation model. […] The key to effective enterprise architecture is to identify the processes, data, technology, and customer interfaces that take the operating model from vision to reality." (Jeanne W Ross et al, "Enterprise architecture as strategy: creating a foundation for business", 2006)

"Enterprise-architecture is the integration of everything the enterprise is and does. Even the term ‘architecture’ is perhaps a little misleading. It’s on a much larger scale, the scale of the whole rather than of single subsystems: more akin to city-planning than to the architecture of a single building. In something this large, there are no simple states of ‘as-is’ versus ‘to-be’, because its world is dynamic, not static. And it has to find some way to manage the messy confusion of what is, rather than the ideal that we might like it to be." (Tom Graves, "Real Enterprise-Architecture : Beyond IT to the whole enterprise", 2007)

"Acceptance testing relies on the ability to execute automated tests in a productionlike environment. However, a vital property of such a test environment is that it is able to successfully support automated testing. Automated acceptance testing is not the same as user acceptance testing. One of the differences is that automated acceptance tests should not run in an environment that includes integration to all external systems. Instead, your acceptance testing should be focused on providing a controllable environment in which the system under test can be run. 'Controllable' in this context means that you are able to create the correct initial state for our tests. Integrating with real external systems removes our ability to do this." (David Farley & Jez Humble, "Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation", 2010)

"In many applications, integration or functional tests are used by default as the standard way to test almost all aspects of the system. However integration and functional tests are not the best way to detect and identify bugs. Because of the large number of components involved in a typical end-to-end test, it can be very hard to know where something has gone wrong. In addition, with so many moving parts, it is extremely difficult, if not completely unfeasible, to cover all of the possible paths through the application." (John F Smart, "Jenkins: The Definitive Guide", 2011)

