
07 March 2024

Data Migrations (DM): The SQL Server Perspective (Licensing Costs and Edition Choices)

Data Migration
Data Migration Series

A Data Migration (DM) moves all or a subset of the data available in one or more systems into one or more other systems. For this purpose, especially in ERP implementations, one can use SQL Server as an intermediate layer, with SSIS used for data extraction and export, SSRS for reporting the errors, and the database engine for the heavy processing. Master Data Services and Data Quality Services can be used as well in certain scenarios. Thus, SQL Server allows by design addressing the various challenges related to a DM. At a high level the architecture can be depicted as follows:

Data Migration Architecture

Once the decision to go with SQL Server for the DM layer is made, one needs to define which edition to use. If the DM doesn't have special requirements, one can use an already available SQL Server instance, as long as the cumulated workloads don't create major issues. Accordingly, in the past I used existing licensed versions of SQL Server to build DM solutions in ERP implementations, though in each project I evaluated whether it was possible to reduce the costs while remaining compliant with the license requirements.

Of course, there's always the alternative of using SQL Server Express, which supports databases of up to 10 GB, enough for most DMs, though it comes with further limitations (see [2]). There are also ways of working around the existing limitations, like splitting the logic across multiple databases.
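As a minimal sketch of the latter idea, assuming hypothetical database and table names, the staging and the transformation logic could live in separate databases, referenced via three-part names:

-- hypothetical databases used to split the logic and stay within the size limit
CREATE DATABASE DM_Staging;   -- raw extracts from the source systems
GO
CREATE DATABASE DM_Transform; -- cleaned and mapped entities
GO

-- the transformation logic references the staged data via three-part names,
-- assuming the Customers extract was already loaded into DM_Staging
USE DM_Transform;
GO
SELECT CST.CustomerId
     , LTRIM(RTRIM(CST.CustomerName)) AS CustomerName
  INTO dbo.Customers
  FROM DM_Staging.dbo.Customers CST;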

Then there's the SQL Server Developer edition, which involves no license costs, offers the full SQL Server functionality, and can be used to build and test applications. In a recent post [1], Bob Ward, principal architect at Microsoft, clarified the licensing of the Developer edition, which is "licensed for development, test, and demonstration purposes only" and "may not be used in a production environment". He makes the following points:
(1) "Production environments include any system that is accessed by end-users for anything more than acceptance testing, environments that connects to production systems (such as Linked servers), disaster recovery or backups of production systems, and environments that are 'rotated' into production at any point in time." [1]
(2) One "cannot use Developer edition to build test data and move that same data into production" [1].
(3) One can "restore a production set of data backup for testing purposes" [1].

There are a few impediments to using the Developer edition exclusively for a DM. First, at least during Go-Live and UAT, one needs to work with data coming directly from the various production environments. Second, the data generated by the solution are used primarily for UAT and in a second step for Production, which seems to go against rule (2), or is at least a grey area (which might be overlooked by Microsoft). Third, some data from the production environment might need to be imported back into the DM layer for validation or for enhancing the entities with data generated in the target systems.

Concerning the first issue, the DM solution can always point to the test environments used as source, with the databases being copied from production into the test environments during UAT. This might be necessary anyway for other purposes. Otherwise, the effort might be considerable, and not working with timely data in the last phases might raise further concerns.

The second issue is a matter of interpretation. The UAT phase makes sure that the data generated by the DM solution respect the criteria for Go-Live. If there are no issues, the same data can be used for Go-Live. If a licensed edition is required for this, then an environment can be built only for UAT and Go-Live, project phases which usually span a couple of weeks, unless multiple migrations need to be performed at different time intervals. If the environments are in the cloud, the instances can probably be turned on and off on an as-needed basis.

One can plan for distinct environments for Production and Development, with the environments hosted on the same SQL Server as distinct databases, using the Developer edition for Development and a different, licensed edition for UAT and Production. This approach involves additional overhead in synchronizing the logic between environments. Conversely, in the case of the DM layer, the same environment can be used from beginning to end, while the code should/must be backed up periodically. For multiple migrations based on the same data, one should archive the data after each migration or important phase.

For the scenarios in which the data are copied back into the DM solution after migration, it's enough to perform these steps against the UAT target system(s). This should work as long as there are no differences in configuration between UAT and Production. There are however exceptions, e.g. data generated by the target systems, for which the values differ between Production and UAT. At least in Dynamics 365 one can attempt to generate the values in the DM layer and import them as they are into the target system. This worked in many scenarios, though there can be exceptions here as well.

A more complex scenario is when data from the DM layer need to be exported to data warehouses or similar solutions that can be considered Production systems. Here a licensed edition seems to be mandatory. For the scenarios in which Master Data Services and/or Data Quality Services are needed, the only options are the Enterprise or Developer editions (see [3]).

To summarize, to reduce the overall costs for the DM, consider using an existing licensed SQL Server instance for building the solution. If separate environments need to be built, the Express edition has its limitations, though it can prove to be a viable solution in many cases. Otherwise, consider the above workarounds for using the Developer edition, including the scenario in which distinct environments are used for Production and Development.

Resources:
[1] Microsoft Data Platform (2024) How SQL developers can maximize savings, by Bob Ward (link)
[2] Microsoft Learn (2024) Editions and supported features of SQL Server 2022 (link)
[3] Microsoft Learn (2023) Master Data Services and Data Quality Services Features Support (link)

03 October 2023

ERP Implementations: The Implementation Process Seems to be Broken

 

ERP Implementations

Having participated in several ERP implementations, one would expect things to change for the better when moving from one implementation to another. Things do change positively in certain areas as experience is integrated, though on average the overall performance seems to stay the same. Thus, one may wonder: how can this happen? Of course, there are many explanations of what went wrong and what could have been done better, and the list is usually quite long. However, history repeats itself in the next implementation. Something seems to be broken, or maybe this is the way implementations are supposed to work, though I doubt it!

An ERP implementation starts with a need, and the customer usually has an idea of what the respective need is about. The customer might even have a set of high-level or even low-level requirements, which should be the case when starting on such a journey. Then the customer selects an implementation partner, a step usually followed by a period of discovery in which the partner learns more about the business, including the overall infrastructure, business processes, data and people. Once the requirements are available, the partner can evaluate them to identify the deviations from the standard functionality that translate into customizations, sketch solutions, and make a first estimate of the costs and resources needed.

Of course, there can be multiple iterations of this process in which the requirements are reviewed, reevaluated, justified and prioritized by all parties until a common understanding, respectively an agreement on scope and expectations, is reached. In the process some requirements are dropped, while others are modified or postponed to a later phase or phases. The whole process can take a few months, though it's mandatory for creating a workable estimate used as the basis for the statement of work and the overall contract.

In parallel the parties can also work on a project plan and agree upon a project methodology, so that once the legal paperwork is signed, resources can be allocated to the project. A common practice is then for the functional consultants to generate, based on the requirements, a set of documents - functional design documents (FDDs), process diagrams - that serve as the basis for the setup, for programming the customizations and for User Acceptance Testing (UAT). Of course, the documents need to be reviewed by the business and gaps or misunderstandings mitigated, which takes several iterations until the business can sign off on the respective documents. That's the point where the setup and programming can start, usually half a year, or even a year or more, after the initial steps.

Depending on the scope, in the best-case scenario the setup will take one to two months, at least until a system is ready for UAT with the business data needed for Go-Live. The agreed customizations can translate into further months of effort, not only for programming but also for testing, reviewing and further mitigations. This is typically when many of the key users see a working version of the system for the first time, which frankly might be too late. Of course, they read and reread the FDDs, though until this point everything was very abstract, and no matter how well such documents are written, they can't replace the hands-on experience of working with the system, discovering the functionality and understanding how it works.

In the best-case scenario, the key users are satisfied with the results, and the UAT, respectively the Go-Live, can go on as planned; however, the expectation of getting things right the first time is seldom (if ever) met. Further iterations and delays follow. Overall, the process doesn't seem to be efficient!

Previous <<||>> Next

04 February 2021

Data Migrations (DM): Conceptualization VI (Data Migration Layer)

Data Migration
Data Migrations Series

Besides migrating the master and transactional data from the legacy systems, there are usually three additional important business requirements for a Data Migration (DM): migrate the data within the expected timeline, with minimal disruption for the business, respectively within the expected quality levels. Hence, the DM's timeline must match and synchronize with the main project's timeline in terms of main milestones, though the DM itself typically needs to be executed within a small timeframe of a few days during the Go-Live. Concerning the third requirement, even if the data have high quality as available in the source systems or as provided by the business, there are aspects like integration and consistency that rely primarily on the DM logic.

To address these requirements, the DM logic must reach a certain level of performance and quality that allows importing the data as expected. From the project's beginning until UAT, the DM team will integrate the various pieces of information iteratively, test the changes several times and troubleshoot the deviations from expectations. The volume of effort required for these activities can be overwhelming. It's not only important for the whole solution to be performant; each step must be designed so that, besides fast execution, changes and troubleshooting involve a minimum of overhead.

For a better understanding of why this is important, imagine a quest game in which the character has to go through a labyrinth full of traps. If the player makes a mistake, he needs to restart from a distant checkpoint or even from the beginning. Now imagine that for each mistake he has the possibility of going one step back, trying a new option and moving forward. For some it may look like cheating, though in this way one can finish the game relatively quickly. It would be great if executing a DM could allow the same flexibility.

Unfortunately, unless the data are stored between steps or each step is a different package, an ETL solution doesn't provide the flexibility of changing the code, moving one step back, rerunning the step and troubleshooting, over and over again like in the quest game. To better illustrate the impact of such an approach, let's consider that the DM has about 40 entities and one needs to perform on average 20 changes per entity. If one is able to move forwards and backwards, executing the code for each change will probably take only a few minutes. Otherwise, rerunning a whole package can take 5-10 times longer or even more, depending on the package's size and the data volume. For 800 changes, just one additional minute per change equates to 800 minutes (about 13 hours).

In exchange, storing the data for an entity in a database at the important points of the processing and implementing the logic as a succession of SQL scripts allows this flexibility. The most important downside is that the steps need to be executed manually, though this is a small price to pay for the flexibility and control gained. Moreover, with a few tricks one can load deltas, as in the case of a phased DM.
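As a minimal sketch of this approach, assuming a hypothetical Customers entity, each important point of the processing persists its result in its own table, so that a single step can be changed and rerun without repeating the whole chain:

-- Step 1: snapshot the raw extract (assuming it was loaded into a staging table)
SELECT *
  INTO dbo.Customers_Raw
  FROM dbo.Customers_Staging;

-- Step 2: cleaning and mapping, rerunnable in isolation
DROP TABLE IF EXISTS dbo.Customers_Clean;

SELECT CST.CustomerId
     , UPPER(LTRIM(RTRIM(CST.CustomerName))) AS CustomerName
     , COALESCE(MAP.TargetCountryCode, '??') AS CountryCode
  INTO dbo.Customers_Clean
  FROM dbo.Customers_Raw CST
       LEFT JOIN dbo.CountryMappings MAP
         ON CST.CountryCode = MAP.SourceCountryCode;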

To assure that the consistency of the data is kept, one needs to build for each entity a set of validation queries that check for duplicates, special cases, data integrity, incorrect formats, etc. The queries can be included in the sequence of logic used for the DM. Thus, one can react promptly to each unexpected value. When required, the validation rules can be built into reports and used by the users in the data cleaning process, or even logged periodically per entity for tracking the progress.
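A couple of validation queries of this kind, written against the hypothetical Customers_Clean table from the sketch above:

-- duplicates on the business key
SELECT CustomerName
     , COUNT(*) AS RecordCount
  FROM dbo.Customers_Clean
 GROUP BY CustomerName
HAVING COUNT(*) > 1;

-- records whose country couldn't be mapped
SELECT CustomerId
     , CustomerName
  FROM dbo.Customers_Clean
 WHERE CountryCode = '??';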

Previous Post <<||>> Next Post

01 February 2021

Data Migrations (DM): Quality Acceptance Criteria IV

Data Migration
Data Migrations Series

Reliability

Reliability is the degree to which a solution performs its intended functions under stated conditions without failure. In other words, a DM is reliable if it performs what was intended by design. The data should be migrated only after the migration's reliability was confirmed by the users as part of the sign-off process. The dry-runs, as well as the final iteration for the UAT, have the objective of confirming the solution's reliability.

Reversibility

Reversibility is the degree to which a solution can return to a previous state without starting the process from the beginning. For example, it should be possible to reverse the changes made to a table by returning to its previous state. This can involve keeping a stored copy of the data, respectively deleting and reloading the data when necessary.

Considering that the sequence in which the various activities are performed is fixed, in theory it's possible to address reversibility by design, e.g. by allowing individual steps to be repeated or by creating rollback points. Rollback points are especially important when loading the data into the target system.
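A minimal sketch of a rollback point, assuming a hypothetical Customers_Clean table: a copy of the data is stored before a risky step, so that the previous state can be restored without rerunning the whole process:

-- store a copy of the current state before the risky step
SELECT *
  INTO dbo.Customers_Clean_Bak
  FROM dbo.Customers_Clean;

-- ... the risky transformation runs here ...

-- restore the previous state if the step needs to be reversed
TRUNCATE TABLE dbo.Customers_Clean;

INSERT INTO dbo.Customers_Clean
SELECT *
  FROM dbo.Customers_Clean_Bak;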

Robustness

Robustness is the degree to which the solution can accommodate invalid input or environmental conditions that might affect the data's processing or other requirements (e.g. performance). Even if the logic can be stabilized over the various iterations, the variance in data quality can have an important impact on a solution's robustness. One can accommodate erroneous input by relaxing the schema's rules and adding further quality checks.
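A minimal sketch of this idea, with hypothetical names: the staging table keeps relaxed (string) types so that no record is rejected on load, while a quality check flags the values that can't be converted to the target types:

-- relaxed schema: everything is accepted as text on load
CREATE TABLE dbo.Orders_Raw (
    OrderNumber nvarchar(50)
  , OrderDate   nvarchar(50)
  , Amount      nvarchar(50)
);

-- quality check: flag the values that can't be converted to the target types
SELECT OrderNumber
     , OrderDate
     , Amount
  FROM dbo.Orders_Raw
 WHERE TRY_CONVERT(date, OrderDate) IS NULL
    OR TRY_CONVERT(decimal(18,2), Amount) IS NULL;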

Security 

Security is the degree to which the DM solution protects the data so that only authorized people have access to the respective data to the defined level of authorization as data are moved through the solution. The security provided by a solution needs to be considered against the standards and further requirements defined within the organization. In case no such standards are available, one can in theory consider the industry best practices.

Scalability

Scalability is the degree to which the solution is able to respond to an increased workload. Given that the volume of data considered during the various iterations varies, a solution's scalability needs to be considered with respect to the volume of data to be migrated.

Standardization

Standardization is the degree to which technical standards were implemented for a solution to guarantee a certain level of performance or other aspects considered important. There can be standards for data storage, processing, access, transportation, or other aspects associated with the migration processes. Moreover, especially when multiple DMs are in scope, organizations can define a set of standards and guidelines that should be further considered.

Testability

Testability is the degree to which a solution can be tested with respect to the set of functional and data-related requirements. Even if what matters in the end for the success of a migration are the data in their final form, to achieve that one needs to validate the logic and thoroughly test the transformations performed on the data. As the data go through the data pipelines, they need to be tested at the critical points, i.e. the points where the data suffer important transformations. Moreover, one can consider record counters for the records processed at each such critical point, to assure that no record was lost in the process.
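A minimal sketch of such record counters, assuming hypothetical Customers_Raw and Customers_Clean tables that mark two critical points; comparing the counts shows whether records were lost along the way:

SELECT 'Raw' AS ProcessingStep
     , COUNT(*) AS RecordCount
  FROM dbo.Customers_Raw
UNION ALL
SELECT 'Clean' AS ProcessingStep
     , COUNT(*) AS RecordCount
  FROM dbo.Customers_Clean;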

Traceability

Traceability is the degree to which the changes performed on the data can be traced from the target to the source systems at record, respectively at entity level. In theory, it's enough to document the changes at attribute level, though in some cases it might be needed to document also the changes performed on individual values.

Mappings at attribute level allow tracing the data flow, while mappings at value level allow tracing the changes occurring within the values.
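A minimal sketch of how such mappings could be stored, with hypothetical names: one table documents the attribute-level mappings, another the value-level ones:

-- attribute-level mappings: which source attribute feeds which target attribute
CREATE TABLE dbo.AttributeMappings (
    EntityName      nvarchar(100)  -- e.g. 'Customers'
  , SourceAttribute nvarchar(100)  -- e.g. 'CUST_NAME'
  , TargetAttribute nvarchar(100)  -- e.g. 'CustomerName'
  , TransformRule   nvarchar(500)  -- e.g. 'trim and upper case'
);

-- value-level mappings: how individual values are translated
CREATE TABLE dbo.ValueMappings (
    EntityName      nvarchar(100)
  , SourceAttribute nvarchar(100)
  , SourceValue     nvarchar(250)  -- e.g. 'Germany'
  , TargetValue     nvarchar(250)  -- e.g. 'DE'
);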

Data Migrations (DM): Quality Acceptance Criteria III

Data Migration
Data Migrations Series

Repeatability

Repeatability is the degree to which a DM can be repeated while obtaining consistent results between repetitions. Even if a DM is supposed to be a one-time activity for a project, to guarantee a certain level of quality it's important to consider several iterations in which the data requirements are refined and it is made sure that the data can be imported as needed into the target system(s). Considered as a process, as long as the data and the rules haven't changed, the results should be the same or within the expected level of deviation.

This requirement is especially important for the data migrated during UAT and Go-Live, a period during which the input data and rules need to remain frozen (even if small changes in the data can still occur). In fact, that's the role of UAT: to assure that the data have the expected quality and, when compared to the previous dry-run, that they attain the expected level of consistency.

Reusability

Reusability is the degree to which the whole solution, parts of the logic or data can be reused for multiple purposes. Master data and the logic associated with them have high reusability potential as they tend to be referenced by multiple entities. 

Modularity

Modularity is the degree to which a solution is composed of discrete components such that a change to one component has minimal impact on other components. It applies to the solution itself, but also to the degree to which the logic for the various entities is partitioned so as to assure a minimal impact.

Partitionability

Partitionability is the degree to which data or logic can be partitioned to address the various requirements. Despite the assurance that the data will be migrated only once, in practice this assumption can be easily invalidated. It's enough to extend the system freeze by a few days and/or to have transactional data that suddenly require master data not considered before. Even if the deltas can be migrated into the system manually, it's probably recommended to migrate them using the same logic. Moreover, performing incremental loads can be a project requirement.

Data might need to be partitioned into batches to improve the processing performance. Partitioning the logic based on certain parameters (e.g. business unit, categorical values) allows more flexibility in handling other requirements (e.g. reversibility, performance, testability, reusability).
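A minimal sketch of a delta load partitioned by a parameter, with hypothetical names: only the records of the given business unit that haven't been loaded yet are picked up:

DECLARE @BusinessUnit nvarchar(50) = 'DE01';  -- hypothetical parameter driving the partition

INSERT INTO dbo.Customers_Target (CustomerId, CustomerName, BusinessUnit)
SELECT CST.CustomerId
     , CST.CustomerName
     , CST.BusinessUnit
  FROM dbo.Customers_Staging CST
 WHERE CST.BusinessUnit = @BusinessUnit
   AND NOT EXISTS (SELECT 1
                     FROM dbo.Customers_Target TRG
                    WHERE TRG.CustomerId = CST.CustomerId);  -- delta: skip records already loaded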

Performance

Performance refers to the degree to which a piece of software can process data in an amount of time considered acceptable for the business. It can vary with the architecture and methods used, respectively with the data volume, veracity, variance, variability or quality.

Performance is a critical requirement for a DM, especially when considering the amount of time spent on executing the logic during development, tests and troubleshooting, as well as for other activities. Performance is important during dry-runs, but even more so during Go-Live, as it equates with a period during which the system(s) are not available to the users. Upon case, a few hours of delay can have an important impact on the business. In extremis, the delays can add up to days.

Predictability

Predictability is the degree to which the results and behavior of a solution, respectively of the processes involved, are predictable based on the design, implementation or other factors considered (e.g. best practices, methodology used, experience, procedures and processes). Highly predictable solutions are desirable, though reaching the required level of performance and quality can be challenging.

The results from the dry-runs can offer an indication of whether the data migrated during UAT and Go-Live provide a certain level of assurance that the DM will be a success. Otherwise, an additional dry-run should be planned during UAT, if the schedule allows it.

Previous Post <<||>> Next Post

