
26 April 2025

🏭🗒️Microsoft Fabric: Power BI Environments [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 26-Apr-2025

Enterprise Content Publishing [2]

[Microsoft Fabric] Power BI Environments

  • {def} structured spaces within Microsoft Fabric that help organizations manage Power BI assets through their entire lifecycle
  • {environment} development 
    • allows to develop the solution
    • accessible only to the development team 
      • via Contributor access
    • {recommendation} use Power BI Desktop as local development environment
      • {benefit} allows to try, explore, and review updates to reports and datasets
        • once the work is done, upload the new version to the development stage
      • {benefit} enables collaborating and changing dashboards
      • {benefit} avoids duplication 
        • making online changes, downloading the .pbix file, and then uploading it again, creates reports and datasets duplication
    • {recommendation} use version control to keep the .pbix files up to date
      • [OneDrive] use Power BI's autosync
        • {alternative} SharePoint Online with folder synchronization
        • {alternative} GitHub and/or VSTS with local repository & folder synchronization
    • [enterprise scale deployments] 
      • {recommendation} separate dataset from reports and dashboards’ development
        • use the deployment pipelines selective deploy option [22]
        • create separate .pbix files for datasets and reports [22]
          • create a dataset .pbix file and upload it to the development stage (see shared datasets) [22]
          • create .pbix only for the report, and connect it to the published dataset using a live connection [22]
        • {benefit} allows different creators to separately work on modeling and visualizations, and deploy them to production independently
      • {recommendation} separate data model from report and dashboard development
        • allows using advanced capabilities 
          • e.g. source control, merging diff changes, automated processes
        • separate the development from test data sources [1]
          • the development database should be relatively small [1]
    • {recommendation} use only a subset of the data [1]
      • ⇐ otherwise the data volume can slow down the development [1]
  • {environment} user acceptance testing (UAT)
    • test environment that within the deployment lifecycle sits between development and production
      • it's not necessary for all Power BI solutions [3]
      • allows to test the solution before deploying it into production
        • all testers must have 
          • View access for testing
          • Contributor access for report authoring
      • involves business users who are SMEs
        • provide approval that the content 
          • is accurate
          • meets requirements
          • can be deployed for wider consumption
    • {recommendation} check the report’s load time and interactions to find out if changes impact performance [1]
    • {recommendation} monitor the load on the capacity to catch extreme loads before they reach production [1]
    • {recommendation} test data refresh in the Power BI service regularly during development [20]
  • {environment} production
    • {concept} staged deployment
      • {goal} help minimize risk, user disruption, or address other concerns [3]
        • the deployment involves a smaller group of pilot users who provide feedback [3]
    • {recommendation} set production deployment rules for data sources and parameters defined in the dataset [1]
      • allows ensuring the data in production is always connected and available to users [1]
    • {recommendation} don’t upload a new .pbix version directly to the production stage
      •  ⇐ without going through testing
  • {feature|preview} deployment pipelines 
    • enable creators to develop and test content in the service before it reaches the users [5]
  • {recommendation} build separate databases for development and testing 
    • helps protect production data [1]
  • {recommendation} make sure that the test and production environment have similar characteristics [1]
    • e.g. data volume, usage volume, similar capacity 
    • {warning} testing in production can make production unstable [1]
    • {recommendation} use Azure A capacities [22]
  • {recommendation} for formal projects, consider creating an environment for each phase
  • {recommendation} enable users to connect to published datasets to create their own reports
  • {recommendation} use parameters to store connection details 
    • e.g. instance names, database names
    • ⇐  deployment pipelines allow configuring parameter rules to set specific values for the development, test, and production stages (see the sketch after this list)
      • alternatively data source rules can be used to specify a connection string for a given dataset
        • {restriction} in deployment pipelines, this isn't supported for all data sources
  • {recommendation} keep the data in blob storage under 50k blobs and 5 GB of data in total to prevent timeouts [29]
  • {recommendation} provide data to self-service authors from a centralized data warehouse [20]
    • allows to minimize the amount of work that self-service authors need to take on [20]
  • {recommendation} minimize the use of Excel, csv, and text files as sources when practical [20]
  • {recommendation} store source files in a central location accessible by all coauthors of the Power BI solution [20]
  • {recommendation} be aware of API connectivity issues and limits [20]
  • {recommendation} know how to support SaaS solutions from AppSource and expect further data integration requests [20]
  • {recommendation} minimize the query load on source systems [20]
    • use incremental refresh in Power BI for the dataset(s)
    • use a Power BI dataflow that extracts the data from the source on a schedule
    • reduce the dataset size by only extracting the needed amount of data 
  • {recommendation} expect data refresh operations to take some time [20]
  • {recommendation} use relational database sources when practical [20]
  • {recommendation} make the data easily accessible [20]
  • [knowledge area] knowledge transfer
    • {recommendation} maintain a list of best practices and review it regularly [24]
    • {recommendation} develop a training plan for the various types of users [24]
      • usability training for read-only report/app users [24]
      • self-service reporting for report authors & data analysts [24]
      • more elaborated training for advanced analysts & developers [24]
  • [knowledge area] lifecycle management
    • consists of the processes and practices used to handle content from its creation to its eventual retirement [6]
    • {recommendation} postfix files with 3-part version number in Development stage [24]
      • remove the version number when publishing files in UAT and production 
    • {recommendation} backup files for archive 
    • {recommendation} track version history 
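
    As an illustration of the parameters recommendation above, the following minimal sketch shows how the connection parameters of a published dataset could be scripted via the Power BI REST API, e.g. as a complement to the pipeline's parameter rules when deploying to test or production. The workspace and dataset IDs, the access token and the parameter names (ServerName, DatabaseName) are placeholders and must match the parameters actually defined in the dataset.

```python
import requests

# Placeholders - replace with the actual workspace/dataset IDs and an Entra ID access token
WORKSPACE_ID = "<workspace-guid>"
DATASET_ID = "<dataset-guid>"
ACCESS_TOKEN = "<bearer-token-with-dataset-readwrite-scope>"

# Update the dataset's connection parameters (e.g. instance and database names),
# for example right after deploying the dataset to the test or production stage
url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
    f"/datasets/{DATASET_ID}/Default.UpdateParameters"
)
payload = {
    "updateDetails": [
        {"name": "ServerName", "newValue": "prod-sql.contoso.com"},  # assumed parameter names
        {"name": "DatabaseName", "newValue": "SalesDW"},
    ]
}
response = requests.post(url, json=payload,
                         headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
response.raise_for_status()  # the dataset still needs a refresh for the new values to apply
```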

    References:
    [1] Microsoft Learn (2021) Fabric: Deployment pipelines best practices [link]
    [2] Microsoft Learn (2024) Power BI: Power BI usage scenarios: Enterprise content publishing [link]
    [3] Microsoft Learn (2024) Deploy to Power BI [link]
    [4] Microsoft Learn (2024) Power BI implementation planning: Content lifecycle management [link]
    [5] Microsoft Learn (2024) Introduction to deployment pipelines [link]
    [6] Microsoft Learn (2024) Power BI implementation planning: Content lifecycle management [link]
    [20] Microsoft (2020) Planning a Power BI  Enterprise Deployment [White paper] [link]
    [22] Power BI Docs (2021) Create Power BI Embedded capacity in the Azure portal [link]
    [24] Paul Turley (2019)  A Best Practice Guide and Checklist for Power BI Projects

    Resources:

    Acronyms:
    API - Application Programming Interface
    CLM - Content Lifecycle Management
    COE - Center of Excellence
    SaaS - Software-as-a-Service
    SME - Subject Matter Expert
    UAT - User Acceptance Testing
    VSTS - Visual Studio Team System

    25 April 2025

    💫🗒️ERP Systems: Microsoft Dynamics 365's Business Process Catalog (BPC) [Notes]

    Disclaimer: This is work in progress intended to consolidate information from the various sources and not to provide a complete overview of all the features. Please refer to the documentation for a complete overview!

    Last updated: 25-Apr-2025

    Business Process Catalog - End-to-End Scenarios

    [Dynamics 365] Business Process Catalog (BPC)

    • {def} list of end-to-end processes that are commonly used to manage or support work within an organization [1]
      • agnostic catalog of business processes contained within the entire D365 solution space [3]
        • {benefit} efficiency and time savings [3]
        • {benefit} best practices [3]
        • {benefit} reduced risk [3]
        • {benefit} technology alignment [3]
        • {benefit} scalability [3]
        • {benefit} cross-industry applicability [3]
      • stored in an Excel workbook
        • used to organize and prioritize the work on the business process documentation [1] (see the sketch after this list)
        • {recommendation} check the latest versions (see [R1])
      • assigns unique IDs to 
        • {concept} end-to-end scenario
          • describe in business terms 
            • not in terms of software technology
          • includes the high-level products and features that map to the process [3]
          • covers two or more business process areas
          • {purpose} map products and features to benefits that can be understood in business contexts [3]
        • {concept} business process areas
          • combination of business language and basic D365 terminology [3]
          • groups business processes for easier searching and navigation [1]
          • separated by major job functions or departments in an organization [1]
          • {purpose} map concepts to benefits that can be understood in business context [3]
          • more than 90 business process areas defined [1]
        • {concept} business processes
          • a series of structured activities and tasks that organizations use to achieve specific goals and objectives [3]
            • efficiency and productivity
            • consistency and quality
            • cost reduction
            • risk management
            • scalability
            • data-driven decision-making
          • a set of tasks in a sequence that is completed to achieve a specific objective [5]
            • define when each step is done in the implementation [5] [3]
            • define how many are needed [5] [3]
          • covers a wide range of structured, often sequenced, activities or tasks to achieve a predetermined organizational goal
          • can refer to the cumulative effects of all steps progressing toward a business goal
          • describes a function or process that D365 supports
            • more than 700 business processes identified
            • {goal} provide a single entry point with links to relevant product-specific content [1]
          • {concept} business process guide
            • provides documentation on the structure and patterns of the process along with guidance on how to use them in a process-oriented implementation [3]
            • based on a catalog of business process supported by D365 [3]
          • {concept} process steps 
            • represented sequentially, top to bottom
              • can include hyperlinks to the product documentation [5] 
              • {recommendation} avoid back and forth in the steps as much as possible [5]
            • can be
              • forms used in D365 [5]
              • steps completed in LCS, PPAC, Azure or other Microsoft products [5]
              • steps that are done outside the system (incl. third-party system) [5]
              • steps that are done manually [5]
            • are not 
              • product documentation [5]
              • a list of each click to perform a task [5]
          • {concept} process states
            • include
              • project phase 
                • e.g. strategize, initialize, develop, prepare, operate
              • configuration 
                • e.g. base, foundation, optional
              • process type
                • e.g. configuration, operational
        • {concept} patterns
          • repeatable configurations that support a specific business process [1]
            • specific way of setting up D365 to achieve an objective [1]
            • address specific challenges in implementations and are based on a specific scenario or best practice [6]
            • the solution is embedded into the application [6]
            • includes high-level process steps [6]
          • include the most common use cases, scenarios, and industries [1]
          • {goal} provide a baseline for implementations
            • more than 2000 patterns, and we expect that number to grow significantly over time [1]
          • {activity} naming a new pattern
            • starts with a verb
            • describes a process
            • includes product names
            • indicates the industry
            • indicates AppSource products
        • {concept} reference architecture 
          • acts as a core architecture with a common solution that applies to many scenarios [6]
          • typically used for integrations to external solutions [6]
          • must include an architecture diagram [6]
      • {concept} process governance
        • {benefit} improved quality
        • {benefit} enhanced decision making
        • {benefit} agility & adaptability
        • {benefit} Sbd alignment
        • {goal} enhance efficiency 
        • {goal} ensure compliance 
        • {goal} facilitate accountability 
        • {concept} policy
        • {concept} procedure
        • {concept} control
      • {concept} scope definition
        • {recommendation} avoid replicating current processes without considering future needs [4]
          • {risk} replicating processes in the new system without re-evaluating and optimizing [4] 
          • {impact} missed opportunities for process improvement [4]
        • {recommendation} align processes with overarching business goals rather than the limitations of the current system [4]
      • {concept} guidance hub
        • a central landing spot for D365 guidance and tools
        • contains cross-application documentation
    • {purpose} provide considerations and best practices for implementation [6]
    • {purpose} provide technical information for implementation [6]
    • {purpose} provide link to product documentation to achieve the tasks in scope [6]
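
    Since the catalog ships as an Excel workbook, a few lines of code are enough to slice it for scoping and prioritization discussions. The sketch below is a minimal illustration using pandas; the file, sheet and column names are assumptions and should be checked against the workbook version actually downloaded (see [R1]).

```python
import pandas as pd

# Minimal sketch: load the Business Process Catalog workbook downloaded from GitHub (see [R1]).
# The file name, sheet and column names are assumptions - check the actual workbook version.
catalog = pd.read_excel("business-process-catalog.xlsx", sheet_name=0)  # requires openpyxl

# e.g. keep only the processes of one (hypothetical) business process area ...
area = catalog[catalog["Business process area"] == "Order to cash"]

# ... and count the distinct business processes per end-to-end scenario to prioritize the work
summary = (
    area.groupby("End-to-end scenario")["Business process"]
        .nunique()
        .sort_values(ascending=False)
)
print(summary)
```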
    Previous Post <<||>> Next Post 

    References:
    [1] Microsoft Learn (2024) Dynamics 365: Overview of end-to-end scenarios and business processes in Dynamics 365 [link]
    [2] Microsoft Dynamics 365 Community (2023) Business Process Guides - Business Process Guides [link]
    [3] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance - Part 2 Introduction to Business Processes [link]
    [4] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance - Part 3: Using the Business Process Catalog to Manage Project Scope and Estimation [link]
    [5] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance - Part 4: Authoring Business Processes [link]
    [6] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance - Part 5:  Authoring Business Processes Patterns and Use Cases [link]
    [7] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance  - Part 6: Conducting Process-Centric Discovery [link]
    [8] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance  - Part 7: Introduction to Process Governance [link]

    Resources:
    [R1] GitHub (2024) Business Process Catalog [link]
    [R2] Microsoft Learn (2024) Dynamics 365 guidance documentation and other resources [link]
    [R3] Dynamics 365 Blog (2025) Process, meet product: The business process catalog for Dynamics 365 [link]

    Acronyms:
    3T - Tools, Techniques, Tips
    ADO - 
    BPC - Business Process Catalog
    D365 - Dynamics 365
    LCS - Lifecycle Services
    PPAC - Power Platform admin center
    RFI - Request for Information
    RFP - Request for Proposal

    18 April 2025

    🧮ERP: Implementations (Part XV: An Ecosystemic View)

    ERP Implementations Series

    In organizations, the network of interconnected systems, technologies, applications, users and all that resides around them, physically or logically, forms through their direct or indirect interactions an ecosystem governed by common goals, principles, values, policies, processes and resources. An ERP system thus becomes part of the ecosystem and, given its importance, probably becomes the centerpiece of the respective structure, having the force to shift towards it all the flows existing in the organization, respectively nourishing its satellite systems and enabling them to create more value.

    Ecology’s components should allow for growth, regeneration, adaptation, flow, interaction and diversity, respectively creating value for the organizations. The more the flow of information, data, products, and money is unrestricted and sustained by the overall architecture, the more beneficial it becomes. At least, this is the theoretical basis on which such architectures are built. Of course, all this doesn’t happen by itself, but must be sustained and nourished adequately!

    The mixture of components can enable system interoperability, scalability, reliability, automation, extensibility or maintainability with impact on collaboration, innovation, sustainability, cost-efficiency and effectiveness. All of these are concepts with weight and further implications for ecology. Some aspects are available by design, while others need to be introduced and enforced in a systematic, respectively systemic manner. It’s an exploration of what’s achievable, of what works, of what creates value. Organizations should review the best practices in the field and explore.

    Ideally, the ERP system should nourish the ecology it belongs to, enabling it to grow and create more value for the organization. Conversely, the ERP system can become a cannibal for the ecology, pulling towards it all the resources, but more importantly, limiting the other systems in terms of functionality, value, usage, etc. In other terms, an ERP system has the power to cannibalize its environment, with all the implications deriving from this. This may seem like a pessimistic view, though it’s met in organizations more often than people think. Just think about the rate of failure in ERP implementations, with the ERP thus cannibalizing all the financial and human resources at the organization's disposal.

    To create more value, the ERPs should be integrated with the other critical and non-critical information systems – email and other communication systems, customer and vendor relationship systems, inventory management systems, regulatory reporting platforms and probably there are many other systems designed to address the various needs. Of course, not every organization needs all these types of information systems, though the functions exist to some degree in many organizations. Usually, components grow in importance with the shift of needs and attention.

    There are also small organizations that use only small subparts of the ERP system (e.g. finance, HR); however, an ERP investment becomes more attractive and cost-effective the more the organization can leverage its (important) features. It’s also true that an ERP system is not always the solution for everybody. It’s a matter of scale, functionality, and business model. Above all, the final solution must be cost-effective and an enabler for the organization!

    Beyond the ecosystemic view, there are design principles and rules that can be used to describe, map and plan the road ahead, though nothing should be considered as fixed. One needs to balance between opportunities and risks, value and costs, respectively between the different priorities. In some scenarios there’s a place for experimentation, while in other scenarios organizations must stick to what they know. Even if there are maybe recipes for success, there’s no guarantee behind them. Each organization must evaluate what works and what doesn’t. It’s a never-ending story in which such topics need to be approached gradually and iteratively.

    Previous Post <<||>> Next Post

    26 January 2025

    🧭Business Intelligence: Perspectives (Part XXV: Grounding the Roots)

    Business Intelligence Series

    When building something that is supposed to last, one needs a solid foundation on which the artifact can be built upon. That’s valid for castles, houses, IT architectures, and probably most important, for BI infrastructures. There are so many tools out there that allow building a dashboard, report or other types of BI artifacts with a few drag-and-drops, moving things around, adding formatting and shiny things. In many cases all these steps are followed to create a prototype for a set of ideas or more formalized requirements keeping the overall process to a minimum. 

    Rapid prototyping, the process of building a proof-of-concept by focusing at high level on the most important design and functional aspects, is helpful and sometimes a mandatory step in eliciting and addressing the requirements properly. It provides a fast road from an idea to the actual concept; however, the prototype, still in its early stages, can rapidly become the actual solution that unfortunately continues to haunt the dreams of its creator(s). 

    Especially in the BI area, there are many solutions that started as a prototype and gained mass until they started to disturb many things around them, with implications for security, performance, data quality, and many other aspects. Moreover, the mass becomes critical in time, to the degree that it pulls more attention and effort than intended, with positive and negative impact altogether. It’s like building an artificial sun that suddenly becomes a danger for the nearby planet(s) and other celestial bodies. 

    When building such artifacts, it’s important to define which goals the end-result must meet and which would be nice to have, differentiating clearly between them, respectively when it is time to stop and properly address the aspects mandatory in transitioning from the prototype to an actual solution that addresses the best practices in scope. It’s also the point when one should decide upon the solution’s feasibility, the needed quality acceptance criteria, and broader aspects like supporting processes, human resources, data, and the various aspects that have an impact. Unfortunately, many solutions gain inertia without the proper foundation and in extremis succumb under the various forces.

    Developing software artifacts of any type is a balancing act between all these aspects, often under suboptimal circumstances. Therefore, one must be able to set priorities right, react and change direction (and gear) according to the changing context. Many wish all this to be a straight sequential road, when in reality it looks more like mountain climbing, with many peaks, valleys and change of scenery. The more exploration is needed, the slower the progress.

    All these aspects require additional time, effort, resources and planning, which can easily increase the overall complexity of projects to the degree that it leads to (exponential) effort and, more importantly, waste. Moreover, the complexity pushes back, leading to more effort, and with it to higher costs. On top of this, one has the iterative character of BI topics, multiple iterations being needed from the initial concept to the final solution(s), sometimes many steps being discarded in the process and corners being cut, with all the further implications following from this. 

    Somewhere in the middle, between minimum and the broad overextending complexity, is the sweet spot that drives the most impact with a minimum of effort. For some organizations, respectively professionals, reaching and remaining in the zone will be quite a challenge, though that’s not impossible. It’s important to be aware of all the aspects that drive and sustain the quality of artefacts, data and processes. There’s a lot to learn from successful as well as from failed endeavors, and the various aspects should be reflected in the lessons learned. 

    11 June 2024

    🧭🏭Business Intelligence: Microsoft Fabric (Part IV: Is Microsoft Fabric Ready?)

    Business Intelligence Series

    When writing a business case, besides the high-level descriptions of the problem and solution(s), it’s important to roughly estimate how much it costs, how long it takes, respectively how many resources are needed and for what activities. A proof-of-concept (PoC) might not need an explicit business case, though the same high-level information is needed at least for the planning of resources and a formal approval.

    Given that there are several analytical experiences in Microsoft Fabric (MF), it’s clear that there can no longer be a single reference architecture that can be recommended to customers. Frankly, that ship has sailed since the introduction of Microsoft Synapse, if not earlier, with the move to the cloud. Also, there’s no one-size-fits-all, as certain building blocks make sense only in certain scenarios (e.g. organization scale, data volume or source type). Moreover, even if MF has been generally available for quite some time, customers and service providers ask themselves whether the available features are enough for building analytics solutions based on it. 

    “Is Fabric Ready?” was the topic of today’s Explicit Measures webcast [1]. Probably the answer is as usual “it depends” and the general recommendation is to do a PoC to check solution's feasibility. Conversely, MF may be the best approach to consider if integration with other systems (e.g. Dynamics 365, Dataverse) is needed. 

    What the customers need are some rough realistic estimates they can base any planning upon (at least for a PoC if not for the whole project) in terms of making the data available into OneLake, building a semantic model, respectively processing and making the data available for consumption. Ideally, one needs a translation of the various steps as done earlier. For example, how long it takes to make the data available in OneLake, how long it takes to move the data physically or logically through the various layers, to build semantic models, etc. 

    Probably, some things can be achieved in a matter of days, at least if one knows what one’s doing. However, we are talking here about a new architecture that may resemble, for some, unknown territory. Even if old and new techniques can be mixed, there are further implications or improvements that can be considered. There are many webcasts, blog posts and other materials on how to do things, on what’s possible, though building a functioning solution from beginning to end, even as a PoC, requires more than putting all this together. 

    Just making the data flow from point A to B or C is not enough - data security, data governance and a few other topics like scalability and availability need to be considered as well. Security and governance are also the areas in which probably more features must be considered. For many customers starting now with MF, the hope is that most of these features will be available during the time the solutions are ready for production.

    From a cost perspective, there’s the cost of data at rest, in transit, the licensing for MF and the other components involved. Ideally, one should start small and increase capacities as needed, though small can vary from case to case, while it’s important to find out the optimum. Starting in the middle could be an alternative approach, even if it may involve higher costs. If one starts small, the costs for a PoC can be negligible, though sooner or later a compromise is needed to provide acceptable performance. 

    In terms of human resources, the topic is more complex (see [2]), and it depends largely on the nature of the project. The pool of skillsets is the most important constraint or enabler such projects can have.

    Previous Post <<||>> Next Post

    References:
    [1] Explicit Measures (2024) Power BI tips Ep.327: Is Fabric Ready? (link)
    [2] Explicit Measures (2024) Power BI tips Ep.321: Building a BI Team (link)

    06 May 2024

    🧭🏭Business Intelligence: Microsoft Fabric (Part III: The Metrics Layer) 🆕

    Introduction

    One of the announcements at this year's first Microsoft Fabric Community Conference was the introduction of a metrics layer in Fabric which "allows organizations to create standardized business metrics, that are rooted in measures and are discoverable and intended for reuse" [1]. As it seems, the information content provided at the conference was kept to a minimum given that the feature is still in private preview, though several webcasts started to catch up on the topic (see [2], [4]). Moreover, as part of their show, the Explicit Measures (@PowerBITips) hosts had as guest Carly Newsome, the manager of the project, who unveiled more details about the project and the feature, details which became the main source for the information below. 

    The idea of a metric layer or metric store is not new, data professionals occasionally refer to their structure(s) of metrics as such. The terms gained weight in their modern conception relatively recently in 2021-2022 (see [5], [6], [7], [8], [10]). Within the modern data stack, a metrics layer or metric store is an abstraction layer available between the data store(s) and end users. It allows to centrally define, store, and manage business metrics. Thus, it allows us to standardize and enforce a single source of truth (SSoT), respectively solve several issues existing in the data stacks. As Benn Stancil earlier remarked, the metrics layer is one of the missing pieces from the modern data stack (see [10]).

    Microsoft's Solution

    Microsoft's business case for metrics layer's implementation is based on three main ideas (1) duplicate measures contribute to poor data quality, (2) complex data models hinder self-service, (3) reduce data silos in Power BI. In Microsoft's conception the metric layer provides several benefits: consistent definitions and descriptions, easy management via management views, searchable and discoverable metrics, respectively assure trust through indicators. 

    For this feature's implementation Microsoft introduces a new Fabric Item called a metric set that allows to group several (business) metrics together as part of a mini-model that can be tailored to the needs of a subset of end-users and accessed by them via the standard tools already available. The metric set becomes thus a mini-model. Such mini-models allow to break down and reduce the overall complexity of semantic models, while being easy to evolve and consume. The challenge will then be how to break down existing and future semantic models into nonoverlapping mini-models, creating in extremis a partition (see the Lego metaphor for data products). The idea of mini-models is not new, with [12] advocating the idea of using a Master Model, a technique for creating derivative tabular models based on a single tabular solution.

    A (business) metric is a way to elevate the measures from the various semantic models existing in the organization within the mini-model defined by the metric set. A metric can be reused in other fabric artifacts - currently in new reports on the Power BI service, respectively in notebooks by copying the code. Reusing metrics in other measures can mean that one can chain metrics and the changes made will be further propagated downstream. 

    The Metrics Layer in Microsoft Fabric (adapted diagram)

    Every metric is tied to the original semantic model, which allows tracking how a metric is used across the solutions and, looking forward to Purview, identifying the data's lineage. A measure is related to a "table", the source from which the measure comes.

    Users' Perspective

    The Metrics Layer feature is available in Microsoft Fabric service for Power BI within the Metrics menu element next to Scorecards. One starts by creating a metric set in an existing workspace, an operation which creates the actual artifact, to which the individual metrics are added. To create a metric, a user with build permissions can navigate through the semantic models across different workspaces he/she has access to, pick a measure from one of them and elevate it to a metric, copying in the process its measure's definition and description. In this way the metric will always point back to the measure from the semantic model, while the metrics thus created are considered as a related collection and can be shared around accordingly. 

    Once a metric is added to the metric set, one can add in edit mode dimensions to it (e.g. Date, Category, Product Id, etc.). One can then further explore a metric's output and add filters (e.g. concentrate on only one product or category), a point from which one can slice-and-dice the data as needed.

    There is a panel where one can see where the metric has been used (e.g. in reports, scorecards, and other integrations), when it was last refreshed, respectively how many times it was used. Thus, one has the most important information in one place, which is great for developers as well as for the users. Probably, other metadata will be added, such as whether an increase in the metric would be favorable or unfavorable (like in Tableau Pulse, see [13]) or maybe levels of criticality, a unit of measure, or maybe its type - simple metric, performance indicator (PI), result indicator (RI), KPI, KRI etc.

    Metrics can be persisted to the OneLake by saving their output to a delta table into the lakehouse. As demonstrated in the presentation(s), with just a copy-paste and a small piece of code one can materialize the data into a lakehouse delta table, from where the data can be reused as needed. Hopefully, the process will be further automated. 

    One can consume metrics and metrics sets also in Power BI Desktop, where a new menu element called Metric sets was added under the OneLake data hub, which can be used to connect to a metric set from a Semantic model and select the metrics needed for the project. 

    Tapping into the available Power BI solutions is done via an integration feature based on Sempy fabric package, a dataframe for storage and propagation of Power BI metadata which is part of the python-based semantic Link in Fabric [11].
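
    The exact code demonstrated for the preview feature isn't public, so the sketch below is only a rough illustration of the same mechanism: it uses the Sempy package mentioned above to evaluate a measure from a semantic model and persist the output as a delta table in a lakehouse, from within a Fabric notebook. The dataset, measure, column and table names are placeholders.

```python
import sempy.fabric as fabric

# Placeholder dataset, measure and column names - replace with ones from your semantic model
result = fabric.evaluate_measure(
    dataset="Sales",
    measure="Total Sales Amount",
    groupby_columns=["'Date'[Year]", "'Product'[Category]"],
)

# Persist the metric's output as a delta table in the attached lakehouse;
# the spark session is preconfigured in Fabric notebooks
spark.createDataFrame(result).write.format("delta") \
    .mode("overwrite").saveAsTable("metric_total_sales_amount")
```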

    Further Thoughts

    When dealing with a new feature, a natural idea comes to mind: what challenges does the feature involve, respectively how can it be misused? Given that the metrics layer can be built within a workspace and that it can tap into the existing measures, this means that one can build on the existing infrastructure. However, this can imply restructuring, refactoring, moving, and testing a lot of code in the process, hopefully with minimal implications for the solutions already available. Whether the process is as simple as imagined is another story. As for misuse, in extremis, data professionals might start building everything as metrics, though the danger might come when the data is persisted unnecessarily. 

    From a data mesh's perspective, a metric set is associated with a domain, though there will be metrics and data common to multiple domains. Moreover, a mini-model has the potential of becoming a data product. Distributing the logic across multiple workspaces and domains can add further challenges, especially in what concerns the synchronization and implementation of requirements in a way that doesn't lead to bottlenecks. But this is a general challenge for the development team(s). 

    The feature will probably suffer further changes until it is released in public preview (probably by September or the end of the year). I subscribe to other data professionals' opinion that the feature was long needed and that it can have an important impact on the solutions built. 

    Previous Post <<||>> Next Post

    References:
    [1] Microsoft Fabric Blog (2024) Announcements from the Microsoft Fabric Community Conference (link)
    [2] Power BI Tips (2024) Explicit Measures Ep. 236: Metrics Hub, Hot New Feature with Carly Newsome (link)
    [3] Power BI Tips (2024) Introducing Fabric Metrics Layer / Power Metrics Hub [with Carly Newsome] (link)
    [4] KratosBI (2024) Fabric Fridays: Metrics Layer Conspiracy Theories #40 (link)
    [5] Chris Webb's BI Blog (2022) Is Power BI A Semantic Layer? (link)
    [6] The Data Stack Show (2022) TDSS 95: How the Metrics Layer Bridges the Gap Between Data & Business with Nick Handel of Transform (link)
    [7] Sundeep Teki (2022) The Metric Layer & how it fits into the Modern Data Stack (link)
    [8] Nick Handel (2021) A brief history of the metrics store (link)
    [9] Aurimas (2022) The Jungle of Metrics Layers and its Invisible Elephant (link)
    [10] Benn Stancil (2021) The missing piece of the modern data stack (link)
    [11] Microsoft Learn (2024) Sempy fabric Package (link)
    [12] Michael Kovalsky (2019) Master Model: Creating Derivative Tabular Models (link)
    [13] Christina Obry (2023) The Power of a Metrics Layer - and How Your Organization Can Benefit From It (link)
    [14] KratosBI (2024) Introducing the Metrics Layer in #MicrosoftFabric with Carly Newsome [link]

    Resources:
    [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    03 April 2024

    🧭Business Intelligence: Perspectives (Part X: The Top 5 Pains of a BI/Analytics Manager)

    Business Intelligence Series

    1) Business Strategy

    A business strategy is supposed to define an organization's mission, vision, values, direction, purpose, goals, objectives, respectively the roadmap, alternatives, capabilities considered to achieve them. All this information is needed by the BI manager to sketch the BI strategy needed to support the business strategy. 

    Without them, the BI manager must extrapolate, and one thing is to base one's decisions on a clearly stated and communicated business strategy, and another thing to work with vague declarations full of uncertainty. In the latter case, it's like attempting to build castles in thin air and expecting them to have a solid foundation. It may work as many BI requirements are common across organizations, but it can also become a disaster. 

    2) BI/Data Strategy

    Organizations usually differentiate between the BI and the data strategy because different driving forces and needs are involved, even if there are common goals, needs and opportunities that must be considered from both perspectives. When there's no data strategy available, the BI manager is forced to either address many data-related topics (e.g. data culture, data quality, metadata management, data governance) or ignore them, with all the consequences deriving from this. 

    A BI strategy is an extension of the business, data and IT strategies into the BI knowledge areas. Unfortunately, few organizations give it the required attention. Besides the fact that the BI strategy breaks down the business strategy from its perspective, it also adds its own goals and objectives which are ideally aligned with the ones from the other strategies. 

    3) Data Culture

    Data culture is "the collective beliefs, values, behaviors, and practices of an organization’s employees in harnessing the value of data for decision-making, operations, or insight". Therefore, data culture is an enabler which, when the many aspects are addressed adequately, can have a multiplier effect for the BI strategy and its execution. Conversely, when basic data culture assumptions and requirements aren't addressed, the interrelated issues resulting from this can prove to be a barrier for the BI projects, operations and strategy. 

    As mentioned before, an organization’s (data) culture is created, managed, nourished, and destroyed through leadership. If the other leaders aren't playing along, each challenge related to data culture and BI will become a concern for the BI manager.

    4) Managing Expectations 

    A business has great expectations from the investment in its BI infrastructure, especially when the vendors promise competitive advantage, real-time access to data and insights, self-service capabilities, etc. Even if these promises are achievable, they represent a potential that needs to be harnessed and there are several premises that need to be addressed continuously. 

    Some BI strategies and/or projects address these expectations from the beginning, though there are many organizations that ignore or don't give them the required importance. Unfortunately, these expectations (re)surface when people start using the infrastructure and this can easily become an acceptance issue. It's the BI manager's responsibility to ensure expectations are managed accordingly.

    5) Building the Right BI Architecture

    For the BI architecture the main driving forces are the shifts in technologies from single servers to distributed environments, from relational tables and data warehouses to delta tables and delta lakes built with the data mesh's principles and product-orientation in mind, which increase the overall complexity considerably. Vendors' and data professionals' vision of how the architectures of the future will look still has major milestones and challenges to surpass. 

    Therefore, organizations are forced to explore the new architectures and the opportunities they bring; however, this involves a considerable effort, skilled resources, and more iterations. Conversely, ignoring these trends might prove to be a lost opportunity and eventually duplicated effort in the long term.

    22 March 2024

    🧭Business Intelligence: Monolithic vs. Distributed Architecture (Part III: Architectural Applications)

     

    Business Intelligence Series

    Now considering the 500 houses and the skyscraper model introduced in the previous post, which do you think will be built first? A skyscraper takes 2-10 years to build, depending on the city in which it is built and the architecture characteristics. A house may take 6-12 months depending on similar factors. But one needs to build 500 houses. For sure the process can be optimized when the houses look the same, though there are many constraints one needs to consider - the number of workers, tools, and the construction material available at a given time, the volume of planning, etc. 

    Within a rough estimate, it can take 2-5 years for each architecture to be built considering that on average the advantages and disadvantages from the various areas can balance each other out. Historical data are in general needed for estimating the actual development time. One can start with a rough estimate and reevaluate the estimates up and down as more information is gathered. This usually happens in Software Engineering as well. 

    Monolith vs. Distributed Architecture - 500 families

    There are multiple ways in which the work can be assigned to the contractors. When the houses are split between domains, each domain can have its own contractor(s) or the contractors can be specialized by knowledge areas, or a combination of the two. Contractors’ performance should be the same, though in practice no two contractors are the same. Conversely, the chances are higher for some contractors to deliver at the expected quality. It would be useful to have worked before with the contractors and have a partnership that spans years back. There are risks on both sides, even if the risks might favor one architecture over the other, and this depends also on the quality of the contractors, designs, and planning. 

    The planning must be good if not perfect to assure smooth development and each day can cost money when contractors are involved. The first planning must be done for the whole project and then split individually for each contractor and/or group of buildings. A back-and-forth check between the various plans is needed. Managing by exception can work, though it can also go terribly wrong. 

    A lot of communication must occur between domains to make sure that everything fits together. Especially at the beginning, all the parties must plan together, must make sure that the rules of the game (best practices, policies, procedures, processes, methodologies) are agreed upon. Oversight (governance) needs to happen at a small scale as well as on aggregate to make sure that the rules of the game are followed. 

    Now, which of the architectures do you think will fit a data warehouse (DWH)? Probably multiple voices will opt for the skyscraper, at least this is how a DWH looks from the outside. However, when one evaluates the architecture behind it, it can resemble a residential complex in which parts are bound together, but there are parts that can be distributed if needed. For example, in a DWH the HR department has its own area that's isolated from the other areas as it has higher security demands. There can be 2-3 other areas that don't share objects, and they can be distributed as well. The reasons why all infrastructure is on one machine are the costs associated with the licenses, respectively the reporting tools point to only one address. 

    In data mart-based DWHs, there are multiple buildings within the architecture, and thus the data marts can be distributed across a wider infrastructure, with each domain responsible for its own data mart(s). The data marts are by definition domain-dependent, and this is one of the downsides imputed to this architecture. 

    Previous Post <<||>> Next Post

    🧭Business Intelligence: Monolithic vs. Distributed Architecture (Part II: Architectural Choices)

    Business Intelligence Series

    One metaphor that can be used to understand the difference between monolith and distributed architectures, respectively between data warehouses and data mesh-based architectures as per Dehghani’s definition [1] - think that you need to accommodate 500 families (the data products to be built). There are several options: (1) build a skyscraper (developing on vertical); (2) build a complex of high buildings, developing horizontally and vertically while finding a balance between the two; (3) split (aka distribute) the second option and create several buildings; (4) build a house for each family, creating a village or a neighborhood. 

    Monolith vs. Distributed Architecture - 500 families

    (1) and (2) fit the definition of monoliths, while (3) and (4) are distributed architectures, though also in (3) one of the buildings can resemble a monolith if one chooses different architectures and heights for the buildings. For houses one can use a single architecture, agree on a set of predefined architectures, or have an architecture for each house, so that houses would look alike only by chance. One can also opt to have the same architecture for the buildings belonging to the same neighborhood (domain or subdomain). Moreover, the development could be split between multiple contractors that adhere to the same standards.

    If the land is expensive, for example in big, overpopulated cities, when the infrastructure and the terrain allow it, one can build entirely on vertical, a skyscraper. If the land is cheap one can build a house for each family. The other architectures can be considered for everything in between.

    A skyscraper is easier for externals to find (mailmen, couriers, milkmen, and other service providers) though will need a doorman to interact with them and probably a few other resources. Everybody will have the same address except the apartment number. There must be many elevators and the infrastructure must allow the flux of utilities up and down the floors, which can be challenging to achieve.

    Within a village every person who needs to deliver or pick up something needs to traverse parts of the village. There are many services that need to be provided for both scenarios, though the difference will be the time needed to move between addresses. In the virtual world this shouldn't matter unless one needs to inspect each house to check and/or retrieve something. The network of streets and the flux of utilities must scale with the population of the area.

    A skyscraper will need materials of high quality that resist the various forces that apply on the building even in the most extreme situations. Not the same can be said about a house, which in theory needs more materials though a less solid foundation and the construction specifications are more relaxed. Moreover, a house needs smaller tools and is easier to build, unless each house has its own design.

    A skyscraper can host the families only when the construction is finished, and the needed certificates have been approved. The same can be said about houses, but the effort and time are considerably smaller, though the utilities must also be available, and they can have their own timeline.

    The model is far from perfect, though it allows us to reason how changing the architecture affects various aspects. It doesn't reflect the reality because there's a big difference between the physical and virtual world. E.g., parts of the monolith can be used productively much earlier (though the core functionality might become available later), one doesn't need construction material but needs tools, the infrastructure must be available first, etc. Conversely, functional prototypes must be available beforehand, the needed skillset and a set of assumptions and other requirements must be met, etc.

    Previous Post <<||>> Next Post

    References:
    [1] Zhamak Dehghani (2021) Data Mesh: Delivering Data-Driven Value at Scale (book review)

    17 March 2024

    🧭Business Intelligence: Data Products (Part II: The Complexity Challenge)

    Business Intelligence Series

    Creating data products within a data mesh comes down to "partitioning" a given set of inputs, outputs and transformations to create something that looks like a Lego structure, in which each Lego piece represents a data product. The word partition is improperly used as there can be overlaps in terms of inputs, outputs and transformations, though in an ideal solution the outcome should be close to a partition.

    If the complexity of inputs and outputs can be neglected, even if their number could amount to a big number, not the same can be said about the transformations that must be performed in the process. Moreover, the transformations involve reengineering the logic built in the source systems, which is not a trivial task and must involve adequate testing. The transformations are a must and there's no way to avoid them. 

    When designing a data warehouse or data mart one of the goals is to keep the redundancy of the transformations and of the intermediary results to a minimum, to minimize the unnecessary duplication of code and data. Code duplication usually becomes an issue when the logic needs to be changed, and in business contexts that can happen often enough to create other challenges. Data duplication becomes an issue when the copies are not in sync, a fact deriving from code that is not synchronized or has different refresh rates.

    Building the transformations as SQL-based database objects has its advantages. There were many attempts for providing non-SQL operators for the same (in SSIS, Power Query) though the solutions built based on them are difficult to troubleshoot and maintain, the overall complexity increasing with the volume of transformations that must be performed. In data meshes, the complexity increases also with the number of data products involved, especially when there are multiple stakeholders and different goals involved (see the challenges for developing data marts supposed to be domain-specific). 

    To growing complexity organizations answer with complexity. On one side, there are the teams of developers, business users and other members of the governance teams who, together with the solution, create an ecosystem. On the other side, there are the inherent coordination and organization meetings, the managing of proposals, the negotiation of scope for data products, their design, testing, etc. The more complex the whole ecosystem becomes, the higher the chances for systemic errors to occur and multiply, respectively to create unwanted behavior of the parties involved. Ecosystems are challenging to monitor and manage. 

    The more complex the architecture, the higher the chances for failure. Even if some organizations might succeed, it doesn't mean that such an endeavor is for everybody - a certain maturity in building data architectures, data-based artefacts and managing projects must exist in the organization. Many organizations fail in addressing basic analytical requirements, why would one think that they are capable of handling an increased complexity? Even if one breaks the complexity of a data warehouse to more manageable units, the complexity is just moved at other levels that are more difficult to manage in ensemble. 

    Being able to audit and test each data product individually has its advantages, though when a data product becomes part of an aggregate it can easily get lost in the bigger picture. Thus, a global observability framework is needed that allows monitoring the performance and health of each data product in aggregate. Besides that, event brokers and other mechanisms are needed to handle failure, availability, security, etc. 

    Data products make sense in certain scenarios, especially when the complexity of architectures is manageable, though attempting to redesign everything from their perspective is like having a hammer in one's hand and treating everything like a nail.

    Previous Post <<||>> Next Post

    14 March 2024

    🧭Business Intelligence: Architecture (Part I: Monolithic vs. Distributed and Zhamak Dehghani's Data Mesh - Debunked)

    Business Intelligence Series

    In [1] the author categorizes data warehouses (DWHs) and lakes as monolithic architectures, as opposed to data mesh's distributed architecture, which makes me circumspect about the term's use. There are two general definitions of what monolithic means: (1) formed of a single large block; (2) large, indivisible, and slow to change.

    In software architecture one can differentiate between monolithic applications where the whole application is one block of code, multi-tier applications where the logic is split over several components with different functions that may reside on the same machine or are split non-redundantly between multiple machines, respectively distributed, where the application or its components run on multiple machines in parallel.

    Distributed multi-tier applications are a natural evolution of the two types of applications, allowing components to be distributed redundantly across multiple machines. Much later came the cloud, where components are mostly entirely distributed within the same or across distinct geo-locations, respectively cloud providers.

    Data Warehouse vs. Data Lake vs. Lakehouse [2]

    For licensing and maintenance convenience, a DWH resides typically on one powerful machine with many chores, though components can be moved to other machines and even distributed, the ETL functionality being probably the best candidate for this. In what concerns the overall schema there can be two or more data stores with different purposes (operational/transactional data stores, data marts), each of them with their own schema. Each such data store could be moved onto its own machine though that's not feasible.

    DWHs tend to be large because they need to accommodate a considerable number of tables where data is extracted, transformed, and maybe dumped for the various needs. With the proper design, also DWHs can be partitioned in domains (e.g. define one schema for each domain) and model domain-based perspectives, at least from a data consumer's perspective. The advantage a DWH offers is that one can create general dimensions and fact tables and build on top of them the domain-based perspectives, minimizing thus code's redundancy and reducing the costs.  

    With this type of design the DWH can be changed when needed; however, there are several aspects to consider. First, it takes time until the development team can process the request, and this depends on the workload and the priorities set. Secondly, implementing the changes should take a fair amount of time regardless of the overall architecture used, given that the transformations that need to be done on the data are largely the same. Therefore, one should not confuse the speed with which a team can start working on a change with the actual implementation of the change. Third, the possibility of reusing existing objects can speed up a change's implementation.

    Data lakes are distributed data repositories in which structured, unstructured and semi-structured data are dumped in raw form in standard file formats from the various sources and further prepared for consumption in other data files via data pipelines, notebooks and similar means. One can use the medallion architecture with a folder structure and adequate permissions for domains and build reports and other data artefacts on top. 

    A data lake's value increases when it is combined with the capabilities of a DWH (see the dedicated SQL pool) and/or an analytics engine (see the serverless SQL pool) that allow building an enterprise semantic model on top of the data lake. The result is a data lakehouse that, from a data consumer's perspective and the other aspects mentioned above, is not much different from the DWH. The resulting architecture is distributed too.

    Especially in the context of cloud computing, referring to today's applications metaphorically (for advocacy purposes) as monolithic or distributed is at most a matter of degree and not of distinction. Therefore, the reader should be careful!

    Previous Post <<||>> Next Post

    References:
    [1] Zhamak Dehghani (2021) Data Mesh: Delivering Data-Driven Value at Scale (book review)
    [2] Databricks (2022) Data Lakehouse (link)

    21 October 2023

    🧊Data Warehousing: Architecture V (Dynamics 365, the Data Lakehouse and the Medallion Architecture)

    Data Warehousing
    Data Warehousing Series

    An IT architecture is built and functions under a set of constraints that derive from the architecture's components. Usually, if we want flexibility or to change something in one area, this might have an impact on another area. This rule applies to the usage of the medallion architecture as well!

    In Data Warehousing the medallion architecture considers a multilayered approach to building a single source of truth, each layer denoting the quality of the data stored in the lakehouse [1]. Currently three layers are defined - bronze for raw data, silver for validated data, and gold for enriched data. The concept seems sound considering that a Data Lake contains all types of raw data of different quality that needs to be validated and prepared for reporting or other purposes.

    On the other side, there are systems like Dynamics 365 that synchronize the data in near-real-time to the Data Lake through various mechanisms at table and/or data entity level (think of data entities as views on top of other tables or views). The databases behind them are relational and, in theory, the data should be of the quality needed by the business.

    The greatest benefit of the serverless SQL pool is that it can be used to build near-real-time data analytics solutions on top of the files existing in the Data Lake, and the mechanism is quite simple. On top of such files, external tables are built in the serverless SQL pool, tables that reflect the data model from the source systems. The external tables can be referenced like any other tables from the various database objects (views, stored procedures and table-valued functions). Thus, an enterprise data model with dimensions, fact-like and mart-like entities can be built on top of the files synchronized to the Data Lake. The Data Lakehouse (= Data Warehouse + Data Lake) thus created can be used for (enterprise) reporting and other purposes.
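
    A minimal sketch of the mechanism in the serverless SQL pool, assuming a hypothetical CustTable synchronized as CSV files to a hypothetical storage account (all names, columns and paths are illustrative, and a database scoped credential may be needed for the storage account):

    CREATE EXTERNAL DATA SOURCE DataLake
    WITH (LOCATION = 'https://contosolake.dfs.core.windows.net/dynamics365');
    GO

    CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT
        , FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"'));
    GO

    -- external table reflecting (a subset of) the source table's model
    CREATE EXTERNAL TABLE dbo.CustTable (
      ACCOUNTNUM nvarchar(20)
    , CUSTGROUP nvarchar(10)
    , DATAAREAID nvarchar(4)
    )
    WITH (LOCATION = '/Tables/CustTable/'
        , DATA_SOURCE = DataLake
        , FILE_FORMAT = CsvFormat);
    GO

    -- the external table can then be consumed like any other table
    CREATE VIEW dbo.Customers
    AS
    SELECT ACCOUNTNUM, CUSTGROUP, DATAAREAID
    FROM dbo.CustTable;

    Depending on the synchronization mechanism, the files can also be in Parquet or other formats, in which case the external file format changes accordingly.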

    As long as there are no special requirements for data processing (e.g. flattening hierarchies, complex data processing, high performance, data cleaning), this approach allows reporting the data from the data sources in near-real time (10-30 minutes), which can prove useful for operational and tactical reporting. Tapping into this model via standard Power BI and paginated reports is quite easy.

    Now, if one uses the medallion approach and relies on pipelines to process the data, unless one is able to process the data in near-real-time or something comparable, a considerable delay will be introduced, a delay that can span from a couple of hours to one day. It's also true that having the data prepared as needed by the reports can increase performance considerably compared to processing the logic at runtime. There are advantages and disadvantages to both approaches.

    Probably the most important scenario that needs to be handled is integrating the data from different sources. If unique mappings between values exist, unique references are available in one system to the records from the other system, respectively when a unique logic can be identified, the data integration can be handled in the serverless SQL pool.
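
    For example - with purely illustrative names - if a mapping between the customer identifiers of two source systems is available as a file in the Data Lake, the integration can be expressed as a simple view in the serverless SQL pool:

    -- dbo.CustTable, dbo.CustomerMapping and dbo.CrmAccounts are assumed to be
    -- external tables built on top of the files from the respective sources
    CREATE VIEW dbo.CustomersIntegrated
    AS
    SELECT d365.ACCOUNTNUM CustomerAccount
    , d365.CUSTGROUP CustomerGroup
    , crm.CreditLimit CreditLimit
    FROM dbo.CustTable d365
         LEFT JOIN dbo.CustomerMapping map
           ON map.D365Account = d365.ACCOUNTNUM
         LEFT JOIN dbo.CrmAccounts crm
           ON crm.AccountId = map.CrmAccountId;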

    Unfortunately, when compared to on-premises or Azure SQL functionality, the serverless SQL pool has important constraints - it's not possible to use scalar UDFs, regular tables, recursive CTEs, etc. So, one needs to work around these limitations and, in some cases, use the Spark pool or pipelines. Thus, at least for such exceptions, and maybe for strategic reporting, a medallion architecture can make sense and be used in parallel. However, imposing it on all the data can reduce flexibility!

    Bottom line: consider the architecture against your requirements!

    Previous Post <<||>> Next Post

    [1] What is the medallion lakehouse architecture?
    https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion

    03 July 2023

    📦🔖💫Data Migrations (DM): Comments on "Planning for Successful Data Migration" I (Architecture Aspects in Dynamics 365 Finance & Operations)

     

    Data Migrations
    Data Migrations Series

    Introduction

    This weekend I read chapter 5 on Data Migration ("Planning for Successful Data Migration") from Brent Dawson's recently released book "Becoming a Dynamics 365 Finance and Supply Chain Solution Architect" (published by Packt Publishing, available on Amazon). The section on best practices makes many good points, however some of the practices require further clarification, while some statements can be questionable, as the context associated with them can make an important difference. Overall, however, the recommendations hold.

    Concerning Data Migrations (DM), besides a few technical recommendations, the author also makes several architectural recommendations that can be summarized as follows:

    (1) put the data into a backup system or database, if possible, and use that system for the data extraction part of the DM tasks;
    (2) use a Tier-2 system for the majority of the development of the data packages;
    (3) once the data packages are validated, they can be used against production environments;
    (4) don't use the OData protocol for data transfer, but use the Batch API instead;
    (5) don't use dual-write for DM (a technology used for data integrations); first complete the DM and only after that enable dual-write;
    (6) have a backup of the environments involved;
    (7) have a good internet connection;
    (8) plan an environment for DM (a Tier-2 environment, distinct from the one used for functional testing);
    (9) for the gold configuration have an environment with limited access.

    General Aspects
     
    In a Data Migration there are at least two systems involved, though in more complex scenarios there can be one or more source systems, respectively one or more target systems. At minimum there is a source and a target system.

    Ideally, a target production environment should not be used for testing the data migration! On the other side, as long as there's a backup with a given state of the system (e.g. only configuration data, without master or transactional data), a system can always be restored to a previous state. This applies to D365 or to any system for which a database backup and restore can be applied. Even so, as a best practice it isn't recommended to use a production environment for testing, as this can increase the complexity of the data migration.

    Moreover, the same constraint applies to the sandbox used for UAT (User Acceptance Testing), given that it is supposed to represent, at different points in time, the same state as the production environment. Thus, at least a third environment will be needed.

    There are no hard constraints on the source systems. Ideally, one should use the production source system(s) or environments that resemble the production environments. A read replica of the respective environment(s) will work as well, given that there are typically only reads involved.

    The downside of accessing a production environment directly for DM is that the data changes frequently, which makes it more difficult to validate the DM logic - the time factor needs to be considered, data being added, deleted or changed. That's why an environment with a recent snapshot from production would facilitate the process and would make sure that the DM workloads don't affect the production environment's performance.

    Often, a better alternative is to have a database in between (aka a DM layer) that contains only the data in scope for the DM. ETL (Extract Transform Load) jobs can extract the data on demand and in a consistent manner, this approach assuring a snapshot. This layer can be used to build, test and troubleshoot the DM logic, before Go-Live and after, as issues will more likely be raised by the business and will need to be mitigated.
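
    A minimal sketch of such an extraction into the DM layer - the linked server, schema and table names are hypothetical, and the staging schema is assumed to exist:

    -- snapshot a source table into the DM layer's staging schema
    IF OBJECT_ID('stg.CustTable') IS NOT NULL
       DROP TABLE stg.CustTable;

    SELECT ACCOUNTNUM
    , CUSTGROUP
    , DATAAREAID
    , GETUTCDATE() ExtractedOn -- document when the snapshot was taken
    INTO stg.CustTable
    FROM SourceServer.SourceDb.dbo.CustTable; -- e.g. via a linked server or read replica

    The DM logic is then built on top of the staging tables, independently of the source system's availability and load.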

    There are also scenarios in which direct access to the source systems is not possible, a push, respectively a push & pull scenario being needed. If possible, it would be great if the data needed for the migration could be exported directly from the source system(s) as needed by the target system(s). In some scenarios this might be achievable, though the bigger the differences in schemas between the systems and the more complex the data, the more transformations are needed, respectively the more difficult it becomes to achieve this. Therefore, moving such logic to an intermediate DM layer would facilitate the DM architecture, allowing to address many of the challenges.
     
    Batch API
     
    Using the Batch API could be a solution when the source environments allow only API access to the data (thus no direct access via SQL scripting) or when the volume of data makes the alternatives unusable. Indeed, OData tends to be slow or unusable when the volume of data exceeds a given threshold, even if the calls can be partitioned.

    Another scenario for the Batch API is when the source and target systems need to operate in parallel for a considerable amount of time, which would make other approaches unusable. Even if a DM typically involves the replacement of one or more systems, there can be exceptions. Such a scenario increases a DM's complexity by several factors and should be avoided. Even if such scenarios seem logical and approachable at first sight, the benefits can easily be outrun by the downsides.

    Backup

    Hopefully, your organization has a backup and restore strategy for the production and other essential environments! The strategy needs to be extended to cover the further environments available during the implementation. It's also true that until Go-Live the target environments don't suffer many changes. Ideally, a backup should be taken at least when important changes are made to the systems. This can involve the configuration as well as the DM. E.g. a backup would be required after the configuration is completed, respectively when the master data and the transactional data were migrated. A backup of all the systems involved should be taken before Go-Live.

    Gold Configuration

    Having a system with the gold configuration (the values used to configure the system) available can indeed facilitate the implementation, and there are two main reasons for this. Primarily, the gold configuration allows building reliable processes around its maintenance and minimizing the risk of having discrepancies between expectations and reality. Secondly, the database with the gold configuration can be used to easily set up a new environment, and this might be needed more often than thought (e.g. for dry runs).

    However, in praxis the technical value is easily overrun by the financial aspect, as such an environment is barely used and can involve significant costs. As an alternative, one can use the DAT legal entity from an available environment for storing the gold configuration common across all the legal entities and easily copy it to the other legal entities. In addition, the deviations need to be documented; it's recommended though to document all configurations and use this as the baseline for post-Go-Live changes.

    Indeed, access to the gold configuration should be restricted as much as possible (e.g. only admins, consultants and/or data owners) and change policies should be enforced. Otherwise, one risks having different configurations between the environments. For Go-Live it is critical that the UAT and Go-Live environments have the same configuration.

    Independently of the approach used to maintain the gold configuration, it's recommended to perform a comparison between the UAT and production environments to make sure that there are no differences. The comparison can also be handled via SQL scripts, the effort being well spent when such comparisons need to be done several times. Even if the data from production isn't directly accessible, a snapshot of the production database can be copied to another environment. However, this approach requires a good understanding of the tables and/or entities involved. There will be cases (e.g. module parameters) in which it's easier to perform a manual comparison.
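
    A minimal sketch of such a comparison via SQL scripts, assuming copies of the UAT and production databases are available side by side (database, table and column names are illustrative):

    -- configuration records present in UAT but not identical in production;
    -- swap the two databases to check the opposite direction as well
    SELECT CUSTGROUP, NAME, PAYMTERMID
    FROM UatCopy.dbo.CustGroup
    EXCEPT
    SELECT CUSTGROUP, NAME, PAYMTERMID
    FROM ProdCopy.dbo.CustGroup;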

    Wrap Up

    Coming back to the recommendations, the only points that require some discussion are (1), (2) and maybe (8), while (9) was discussed above (see the 'Gold Configuration' section).

    The recommendation of putting the data into a backup system or database is too vague. A backup system can mean a backup database that is typically accessible only via an RDBMS, or an instance of the system holding a copy of the data (which usually implies an RDBMS as well). Besides these, a database can refer to a read replica of the source system's database or to a DM layer.

    Besides price and performance, the main differences between a Tier-1 and a Tier-2 environment (see also the Microsoft documentation) lie in the number of VMs (aka boxes) involved, how the various components are distributed between them, respectively the edition of SQL Server used. Otherwise, for the users the system will look the same. The most important constraint is that a Tier-1 environment isn't suitable for UAT or performance testing. In other words, the environment will be slow under concurrent use.

    If the performance is acceptable, and the volume of data and the number of users are small, a Tier-1 environment can be used for building data packages, performing initial DM dry runs and other tasks. However, a Tier-2 environment resembles the production environment more closely, and if the UAT is performed on such a system, it is more likely that performance bottlenecks will be identified and addressed. Unless they accept the costs blindly, customers will need to trade between performance and costs from the perspective of their requirements and business context.
