SQL Troubles

03 February 2021

📦Data Migrations (DM): Conceptualization (Part IV: Data Access)

Data Migrations Series

Once the data sources for a Data Migration (DM) were identified the first question is how the data can be accessed. The legacy systems relying on ODBC-based databases are in theory relatively easy to access as long they allow the direct access to their data, which would enable thus a pull strategy. Despite this, there are organizations that don’t allow the direct access to the data even for read-only operations, being preferred to push the data directly to the consumers (aka push strategy) or push the data to a given location from where the consumer can use the data as needed (aka hybrid strategy).

The direct access to the data allows in theory the best flexibility as the solution can extract the data when needed and this especially important during the initial phases of the project when the data need to be pulled more frequently until the requirements and logic is stabilized. A push strategy tends to add additional overhead as usually somebody else oversees the data exports, respectively the data need to be prepared in the expected format. On the other side, it would make sense to make an exception for a DM and allow the direct access to the data.

Hybrid strategies tend to be more complex and require additional resources or overhead as the data are stored temporarily at a separate location. Unfortunately, in certain scenarios this is the only approach can be used. Are preferred data files that allow keeping the integrity of the data and facilitate data consumption. Therefore, tabular text files or JSON files are preferred in the detriment of XML or Excel files. It’s preferable to export one data structure individually then storing parent-child solutions even if the latter can prove to be useful in certain scenarios. When there’s no other solution one can use also the standard reports available in the legacy systems.

When storing data outside the legacy systems for further processing it’s recommended to follow organization’s best practices, respectively to address the data security and privacy requirements. ETL tools allows accessing data from password protected areas like FTP, OneDrive or SharePoint. The fewer security layers in between the lower is in theory the overhead. Therefore, given its stability and simplicity FTP might prove to be a better storage solution than OneDrive, SharePoint or other similar technologies.

Ideally the extraction/export mechanisms should use the database objects that encapsulate already the logic in the legacy systems otherwise the team will need to reengineer the logic – for master data this can prove to be easy, though the logic of transactional data like on-hand or open invoices can be relatively complex to reengineer. Otherwise, the logic can be implemented directly in the extraction/export mechanisms or sometimes is more advisable to create database objects (usually on a different schema) on the legacy systems and just call the respective objects.

When connecting directly to the data source it’s advisable using the data provider which allows the best performance and flexibility, however several tests might be needed to determine the best fit. It would be useful to check the limitations of each provider and find a stable driver version. OLEDB and ADO.Net data providers provide in general a good performance, though native drivers of the legacy systems can be a better option upon case.

Some legacy systems allow the access to their data only via service-based technologies like OData. OData tends to have poor performance for large data exports than standard access methods and therefore not indicated in such scenarios. In such cases might be a good idea to export the data directly from the legacy system.

Previous Post <<||>> Next Post

📦Data Migrations (DM): Conceptualization (Part III: Heuristics)

Probably one of the most difficult things to learn as a technical person is using the right technology for a given purpose, this mainly because one’s inclined using the tools one knows best. Moreover, technologies’ overlapping makes the task more and more challenging, the difference between competing technologies often residing in the details. Thus, identifying the gaps resumes in understanding the details of the problem(s) or need(s), respectively the advantages or disadvantages of a technology over the other. This is true especially about competing technologies, including the ones that replace other technologies.

There are simple heuristics, that can allow approaching such challenges. For example, heavy data processing belongs usually in databases, while import/export functionality belongs in an ETL tool. Therefore, one can start looking at the problems from these two perspectives. Would the solution benefit from these two approaches or are there more appropriate technologies (e.g. data streaming, ELT, non-relational databases)? How much effort would involve building the solution?

Commercial Off-The-Shelf (COTS) tools provided by third-party vendors usually offer specialized functionality in each area. Gartner and Forrester provide regular analyses of the main players in the important areas, analyses which can be used in theory as basis for further research. Even if COTS tend to be more expensive and can have some important functionality gaps, as long they are extensible, they can prove a good starting point for developing a solution.

Sometimes it helps researching on the web what other people or organizations did, how they approached the same aspects, what technologies, techniques and best practices they used to overcome the challenges. One doesn’t need to reinvent the wheel even if it’s sometimes fun to do so. Moreover, a few hours of research can give one a basis of useful information and a better understanding over the work ahead.

On the other side sometimes it’s advisable to use the tools one knows best, however this can lead also to unusable and less performant solutions. For example, MS Excel and Access have been for years the tools of choice for building personal solutions that later grew into maintenance nightmares for the IT team. Ideally, they can still be used for data entry or data cleaning, though building solutions exclusively based on (one of) them can prove to be far than optimal.

When one doesn’t know whether a technology or mix of technologies can be used to provide a solution, it’s recommended to start a proof-of-concept (PoC) that would allow addressing most important aspects of the needed solution. One can start small by focusing on the minimal functionality needed to check the main aspects and evolve the PoC during several iterations as needed.

For example, in the case of a Data Migration (DM) this would involve building the data extraction layer for an entity, implement several data transformations based on the defined mappings, consider building a few integrity rules for validation, respectively attempt importing the data into the target system. Once this accomplished, one can start increasing the volume of data to check how the solution behaves under stress. The volume of data can be increased incrementally or by considering all the data available.

As soon the skeleton was built one can consider all the mappings, respectively add several entities to build the dependencies existing between them and other functionality. The prototype might not address all the requirements from the beginning, therefore consider the problems as they arise. For example, if the volume of data seems to cause problems then attempt splitting the data during processing in batches or considering specific optimization techniques like indexing or scaling techniques like increasing computing resources.

Previous Post <<||>> Next Post

📦Data Migrations (DM): Conceptualization (Part II: Plan vs. Concept vs. Strategy)

Data Migrations Series

A concept is a document that describes at high level the set of necessary steps and their implications to achieve a desired result, typically making the object of a project. A concept is usually needed to provide more technical and nontechnical information about the desired solution, the context in which a set of steps are conducted, respectively the changes considered, how the changes will be implemented and the further aspects that need to be considered. It can include a high-level plan and sometimes also information that typically belong in a Business Case – goals,objectives, required resources, estimated effort and costs, risks and opportunities.

A concept is used primarily as basis for sign-off as well for establishing common ground and understanding. When approved, it’s used for the actual implementation and solution’s validation. The concept should be updated as the project progresses, respectively as new information are discovered.

Creating a concept for a DM can be considered as best practice because it allows documenting the context, the technical and organizational requirements and dependencies existing between the DM and other projects, how they will be addressed. The concept can include also a high-level plan of the main activities (following to be detailed in a separate document).

Especially when the concept has an exploratory nature (due to incomplete knowledge or other considerations), it can be validated with the help of a proof-of-concept (PoC), the realization of a high-level-design prototype that focuses on the main characteristics of the solution and allows thus identifying the challenges. Once the PoC implemented, the feedback can be used to round out the concept.

Building a PoC for a DM should be considered as objective even when the project doesn’t seem to meet any major challenges. The PoC should resume in addressing the most important DM requirements, ideally by implementing the whole or most important aspects of functionality (e.g. data extraction, data transformations, integrity validation, respectively the import into the target system) for one or two data entities. Once the PoC built, the team can use it as basis for the evolutive development of the solution during the iterations considered.

A strategy is a set of coordinated and sustainable actions following a set of well-defined goals, actions devised into a plan and designed to create value and overcome further challenges. A strategy has the character of a concept though it has a broader scope being usually considered when multiple projects or initiatives compete for the same resources to provide a broader context and handle the challenges, risks and opportunities. Moreover, the strategy takes an inventory of the current issues and architecture – the 'AS-IS' perspective and sketches the to 'TO-BE' perspective by devising a roadmap that bridges the gap between the two.

In the case of a DM a strategy might be required when multiple DM projects need to be performed in parallel or sequentially, as it can help the organization to better manage the migrations.

A plan is a high-level document that describes the tasks, schedule and resources required to carry on an activity. Even if it typically refers to the work or product breakdown structure, it can cover other information usually available in a Business Case. A project plan is used to guide both project execution and project control, while in the context of Strategic Management the (strategic) plan provides a high-level roadmap on how the defined goals and objectives will be achieved during the period covered by the strategy.

For small DM projects a plan can be in theory enough. As both a strategy and a concept can include a high-level plan, the names are in praxis interchangeable.

Previous Post <<||>> Next Post

02 February 2021

📦Data Migrations (DM): Conceptualization I (Goals, Objectives & Requirements)

Data Migrations Series

One of the nowadays’ challenges is finding the right mix of technologies that allows building a solution for a business need. There are so many choices and the responsible person is easily tempted to use one of the trending technologies just because he wants to learn something new or the technologies seem to fit into the bigger picture, which probably in many cases it would be acceptable. Unfortunately, there’s also the tendency of picking a technology without looking at what functionality it provides, respectively whether the functionality meets intended solutions’ requirements. Moreover, the requirements are sometimes barely defined at the appropriate level of detail, fact that makes from the implementation project a candidate for failure. Sometimes even the goals and objectives aren’t clearly stated, fact that can make a project’s success easily questionable from the beginning.

A goal is a general statement that reflects the desired result toward which an organization’s effort needs to be directed. For example, a Data Migration (DM)’s primary goal can be formulated as 'to make available all the master and transactional data needed by the business from the legacy systems to the target system(s) within expected timeline and quality with a minimal disruption for the business'.

An objective is a break down of the goal into several components that should foster a clear understanding on how the goal will be achieved. Ideally the objectives should be SMART (specific, measurable, attainable, relevant, time-bound), even if measurable objectives are sometimes hard to define properly. One can consider them as the tactics used in achieving the goal. For example, the above formulated goal can be broken down into the following objectives:

Build a DM concept/strategy
Build a flexible and performant infrastructure for DM that can be adapted to further requirements
Provide a basis for further DMs
Align DM and main project’s requirements and activities
Provide an interface and support for the Data Management areas
Foster trust, transparency and awareness
Address internal/external compliance requirements
Document and communicate accountability for the various activities
Cleanse and enrich the data needed by the target system
Archive the DM and project data

One can attempt defining the objectives directly from the goal(s), though unless one is aware of all the implication a DM has, more likely one will be forced to define and evaluate the individual functional and nonfunctional requirements for the DM first, and attempt consolidating the requirements into a set of objectives. In the end it can be a combination of both, in which some objectives are first formulated, the requirements are defined and evaluated, respectively the objectives are refined to accommodate the requirements.

ISO 9126, an international standard for the evaluation of software quality, defines about 45-50 attributes that can be used for addressing the requirements of software solutions, attributes that reflect functionality, reliability, usability, efficiency, and maintainability characteristics. One can start with such a list and identify how important are the respective attributes for the solution. The next step would be to document the requirements into a consolidated list by providing a short argumentation for their use, respectively how they will be addressed as part of the solution. The process can prove to be time-consuming, however it is a useful exercise that usually needs to be done only once and be reviewed occasionally.

The list can be created independently of any other documentation or be included directly into a concept or strategy. The latter will assure in theory that the document provides a unitary view of the migration, considering that each new or obsolete requirement can impact the concept.

Previous Post <<||>> Next Post

01 February 2021

📦Data Migrations (DM): Quality Assurance (Part V: Quality Acceptance Criteria V)

Efficiency

Efficiency is the degree to which a solution uses the hardware (storage, network) and other organizational resources to fulfill a given task. Data characterized by high volume, velocity, variety and veracity can be challenging to process, requiring upon case more processing power. Therefore, the DM solutions need to consider these aspects as well. However, efficiency refers on whether the available resources are used efficiently – the waste in terms of resource utilization is minimal.

On the other side the waste of resources can be acceptable when there are other benefits or requirements that need to be considered, respectively when the ratio between resources utilization and effort to built more efficient processes is acceptable.

A DM solution involves iterative and exploratory processes in which knowledge and feedback is integrated in each iteration, therefore it might look like resources are not used efficiently. However, this is a way to handle complexity and uncertainty by breaking the effort in manageable chunks.

Learnability

Learnability is the degree to which a person can become familiar with a solution’s use, the data and the processes associated with it. A DM can be challenging for many technical and non-technical resources as it requires a certain level of skillset and understanding of the requirements, needs and deliverables. The complexity of the data and requirements can be overwhelming, however with appropriate communication and awareness established, the challenges can be overcome.

Stability

Stability is the degree to which a solution is sensitive to environment changes (e.g. overuse of resource, hardware or software failures, updates), respectively on whether it performs with no performance defects or it does not crash under defined levels of stress. Stability can be monitored during the various phases and countermeasures need to be considered in case the solution is not stable enough (e.g. redesigning the solution, breaking the data in smaller chunks)

Suitability

Suitability is the degree to which a solutions provides functions that meet the stated and implied needs. No matter how performant and technologically advanced a solution is, it brings less value as long it doesn’t perform what it was intended to do.

Transparency

Transparency is the degree to which a solution’s stakeholders have access to the requirements, processes, data, documentation, or other information required by them. In a DM transparency is important especially important in respect to the data, logic and rules used in data processing, respectively the number of records processed.

Trustability

Trustability is the degree to which a solution can be trusted to provide the expected results. Even if the technical team assures that the solution can deliver what was indented, the success of a DM is a matter of perception from stakeholders’ perspective. Providing transparency into the data, rules and processes can improve the level of trust however, special attention need to be given to the issues raised by stakeholders during and after Go-Live, as differences need to be mitigated.

Understandability

Understandability is the degree to which the requirements of a solution were understood by the resources involved in terms of what needs to be performed. For the average project resource it might be challenging to understand the implications of a DM, and this can apply to technical as well non-technical resources. Making people aware of the implications is probably one of the most important criteria for success, as the success of a migration is often a matter of perception.

Usability

Usability is the degree to which a solution can be used by the targeted users within the agreed context of usage. Ideally DM solutions need to be easy to use, though there are always trade-offs. In the end, a DM must fit the purpose it was built for.

Previous Post <

📦Data Migrations (DM): Quality Assurance (Part IV: Quality Acceptance Criteria IV)

Data Migrations Series

Reliability

Reliability is the degree to which a solution performs its intended functions under stated conditions without failure. In other words, a DM is reliable if it performs what was intended by design. The data should be migrated only when migration’s reliability was confirmed by the users as part of the sign-off process. The dry-runs as well the final iteration for the UAT have the objective of confirming solution’s reliability.

Reversibility

Reversibility is the degree to which a solution can return to a previous state without starting the process from the beginning. For example, it should be possible to reverse the changes made to a table by returning to the previous state. This can involve having a copy of the data stored respectively deleting and reloading the data when necessary.

Considering that the sequence in which the various activities is fix, in theory it’s possible to address reversibility by design, e.g. by allowing to repeat individual steps or by creating rollback points. Rollback points are especially important when loading the data into the target system.

Robustness

Robustness is the degree to which the solution can accommodate invalid input or environmental conditions that might affect data’s processing or other requirements (e,g. performance). If the logic can be stabilized over the various iterations, the variance in data quality can have an important impact on a solutions robustness. One can accommodate erroneous input by relaxing schema’s rules and adding further quality checks.

Security

Security is the degree to which the DM solution protects the data so that only authorized people have access to the respective data to the defined level of authorization as data are moved through the solution. The security provided by a solution needs to be considered against the standards and further requirements defined within the organization. In case no such standards are available, one can in theory consider the industry best practices.

Scalability

Scalability is the degree to which the solution is able to respond to an increased workload. Given that the number of data considered during the various iterations vary in volume, a solution’s scalability needs to be considered in respect to the volume of data to be migrated.

Standardization

Standardization is the degree to which technical standards were implemented for a solution to guarantee certain level of performance or other aspects considered as import. There can be standards for data storage, processing, access, transportation, or other aspects associated with the migration processes. Moreover, especially when multiple DMs are in scope, organizations can define a set of standards and guidelines that should be further considered.

Testability

Testability is the degree to which a solution can be tested in the respect to the set of functional and data-related requirements. Even if for the success of a migration are important the data in their final form, to achieve that is needed to validate the logic and test thoroughly the transformations performed on the data. As the data go trough the data pipelines, they need to be tested in the critical points – points where the data suffer important transformations. Moreover, one can consider record counters for the records processed in each such critical point, to assure that no record was lost in the process.

Traceability

Traceability is the degree to which the changes performed on the data can be traced from the target to the source systems as record, respectively at entity level. In theory, it’s enough to document the changes at attribute level, though upon case it might needed to document also the changes performed on individual values.

Mappings at attribute level allow tracing the data flow, while mappings at value level allow tracing the changes occurrent within values.

Previous Post <> Next Post

📦Data Migrations (DM): Quality Assurance (Part III: Quality Acceptance Criteria III)

Data Migrations Series

Repeatability

Repeatability is the degree with which a DM can be repeated and obtain consistent results between repetitions. Even if a DM is supposed to be a one-time activity for a project, to guarantee a certain level of quality it’s important to consider several iterations in which the data requirements are refined and made sure that the data can be imported as needed into the target system(s). Considered as a process, as long the data and the rules haven’t changed, the results should be the same or have the expected level of deviation from expectations.

This requirement is important especially for the data migrated during UAT and Go-Live, time during which the input data and rules need to remain frozen (even if small changes in the data can still occur). In fact, that’s the role of UAT – to assure that the data have the expected quality and when compared to the previous dry-run, that it attains the expected level of consistency.

Reusability

Reusability is the degree to which the whole solution, parts of the logic or data can be reused for multiple purposes. Master data and the logic associated with them have high reusability potential as they tend to be referenced by multiple entities.

Modularity

Modularity is the degree to which a solution is composed of discrete components such that a change to one component has minimal impact on other components. It applies to the solution itself but also to the degree to which the logic for the various entities is partitioned so to assure a minimal impact.

Partitionability

Partitionability is the degree to which data or logic can be partitioned to address the various requirements. Despite the assurance that the data will be migrated only once, in practice this assumption can be easily invalidated. It’s enough to increase the system freeze by a few days and/or to have transaction data that suddenly requires master data not considered. Even if the deltas can be migrated in system manually, it’s probably recommended to migrate them using the same logic. Moreover, the performing of incremental loads can be a project requirement.

Data might need to be partitioned into batches to improve processing’s performance. Partitioning the logic based on certain parameters (e.g. business unit, categorical values) allows more flexibility in handling other requirements (e.g. reversibility, performance, testability, reusability).

Performance

Performance refers to the degree a piece of software can process data into an amount of time considered as acceptable for the business. It can vary with the architecture and methods used, respectively data volume, veracity, variance, variability, or quality.

Performance is a critical requirement for a DM, especially when considering the amount of time spent on executing the logic during development, tests and troubleshooting, as well for other activities. Performance is important during dry-runs but more important during Go-Live, as it equates with a period during which the system(s) are not available for the users. Upon case, a few hours of delays can have an important impact on the business. In extremis, the delays can sum up to days.

Predictability

Predictability is the degree to which the results and behavior of a solution, respectively the processes involve are predictable based on the design, implementation or other factors considered (e.g. best practices, methodology used, experience, procedures and processes). Highly predictable solutions are desirable, though reaching the required level of performance and quality can be challenging.

The results from the dry-runs can offer an indication on whether the data migrated during UAT and Go-Live provide a certain level of assurance that the DM will be a success. Otherwise, an additional dry-run should be planned during UAT, if the schedule allows it.

Previous Post <> Nest Post

📦Data Migrations (DM): Quality Assurance (Part II: Quality Acceptance Criteria II)

Data Migrations Series

Auditability

Auditability is the degree to which the solution allows checking the data for their accuracy, or for their quality in general, respectively the degree to which the DM solution and processes allow to be audited regarding compliance, security and other types of requirements. All these aspects are important in case an external sign-off from an auditor is mandatory.

Automation

Automation is the degree to which the activities within a DM can be automated. Ideally all the processes or activities should be automated, though other requirements might be impacted negatively. Ideally, one needs to find the right balance between the various requirements.

Cohesion

Cohesion is the degree to which the tasks performed by the solution, respectively during the migration, are related to each other. Given the dependencies existing between data, their processing and further project-related activities, DM imply a high degree of cohesion that need to be addressed by design.

Complexity

Complexity is the degree to which a solution is difficult to understand given the various processing layers and dependencies existing within the data. The complexity of DM revolve mainly around the data structures and the transformations needed to translate the data between the various data models.

Compliance

Compliance is the degree to which a solution is compliant with internal or external regulations that apply. There should be differentiated between mandatory requirements, respectively recommendations and other requirements.

Consistency

Consistency is the degree to which data conform to an equivalent set of data, in this case the entities considered for the DM need to be consistent to each other. A record referenced in any entity of the migration need to be considered, respectively made available in the target system(s) either by parametrization or migration.

During each iteration, the data need to remain consistent, so it can facilitate the troubleshooting. The data are usually reimported between iterations or during same iteration, typically to reflect the changes occurred in the source systems or other purposes.

Coupling

Data coupling is the degree to which different processing areas within a DM share the same data, typically a reflection of the dependencies existing between the data. Ideally, the areas should be decoupled as much as possible.

Extensibility

Extensibility is the degree to which the solution or parts of the logic can be extended to accommodate further requirements. Typically, this involves changes that deviate from the standard functionality. Extensibility impacts positively the flexibility.

Flexibility

Flexibility is the degree to which a solution can handle new requirements or ad-hoc changes to the logic. No matter how good everything was planned there’s always something forgotten or new information is identified. Having the flexibility to change code or data on the fly can make an important difference.

Integrity

Integrity is the degree to which a solution prevents the changes to data besides the ones considered by design. Users and processes should not be able modifying the data besides the agreed procedures. This means that the data need to be processed in the sequence agreed. All aspects related to data integrity need to be documented accordingly.

Interoperability

Interoperability is the degree to which a solution’s components can exchange data and use the respective data. The various layers of a DM’s solutions must be able to process the data and this should be possible by design.

Maintainability

Maintainability is the degree to which a solution can be modified to or add minor features, change existing code, corrects issues, refactor code, improve performance or address changes in environment. The data required and the transformation rules are seldom known in advance. The data requirements are definitized during the various iterations, the changes needing to be implemented as the iterations progress. Thus, maintainability is a critical requirement.

Previous Post - Next Post

📦Data Migrations (DM): Quality Assurance (Part I: Quality Acceptance Criteria I)

Data Migrations Series

Introduction

When designing a Data Migration (DM), respectively any software solution, it’s important to take inventory of project’s requirements, evaluate, document, communicate and monitor them accordingly. Each of them can have an important impact on the solution, as a solution’s success will be validated and judged upon them. Therefore, the identified requirements must be considered as baseline for conceptualization, design, implementation and sign-off, and should go through same procedures and rigor as other projects requirements. The existence of a standardized Requirements Management process can facilitate their management through project’s lifecycle.

The requirements are usually driven by the source and target systems (e.g. data import/export features, data models and their constraints), the environments they are hosted on (e.g. cloud vs. on-premise), respectively the layers in between (e.g. network, firewalls), project and business aspects that need to be considered (e.g. freeze window for the Go-Live, data availability dates, data quality, external dependencies, etc.). They resume to the solution itself as well to the data and processes involved, and are reflected but not limited to the following important aspects, that can be considered upon case also as quality acceptance criteria:

Accessibility

Accessibility is the degree to which the data are available for a solution so it can be processed when needed, in the form, by resources, or means intended for processing. It’s critical for a DM solution to access or have available the master, transaction, parameter and further data when needed. The team must make sure that the data become easily accessible.

Unavailability of data can impact the DM and can easily lead to delays in the project. This also means that the various project activities (parametrization, cleansing, enrichment, development) need to be synchronized with the migration activities.

Upon case, accessibility can involve the solution itself expressed as the degree to which it’s available to the resources supposed to use it. Certain architectural decisions can have impact on the carried activities. As the solution is usually deployed on a server, it can happen that only a limited number of people is able to access it concurrently. Moreover, a DM’s complexity makes the involvement of multiple developers challenging.

Accountability

Accountability is the degree to which accountability is enforced for the various resources involved in DM processes and related activities. As multiple resources are involved for parametrization, cleaning, processing, validation, software development, each resource needs to be aware about the extent they are accountable for. Without accountability made explicit, there’s the danger that the activities are neglected, with all the implications deriving from it – quality deviations, delays, data unavailability, etc.

Adaptability

Adaptability is the degree to which a solution can be adapted to environment or requirement changes. Even if typically, the environments don’t change, it doesn’t mean that this will not happen as the IT infrastructure goes through continuous changes that can affect directly or indirectly a migration. Same can be said about requirements, which however have higher probability to change even late in the process as new knowledge is acquired and needs to be integrated in the solution.

Atomicity

Atomicity is the degree to which data entities can be processes at the required level of abstraction in an atomic manner. Even if transformations occur during the various stages, the data belonging to an entity need to be kept and processed together (e.g. Customers and their Addresses). This can involve processing attributes in advance even if the data might be required later. There can be situations in which the data belonging to the same entity need to be processed on different paths, though in the end it’s important to keep the data together, when the processing logic allows it.

Next Post

13 January 2021

💎💫SQL Reloaded: Useful Queries for Dynamics 365 F&O and not only (Part II: Security )

The security architecture of Microsoft Dynamics 365 for Finance and Operations (D365FO) is based on a role-based model in which the access is not granted individually to users but through security roles. A set of roles are assigned to a user, each role having access to a set of privileges. In between duties can be assigned to one or more roles, respectively duties can contain several privileges. The model comes with a default set of security roles, which can be further extended to reflect organization's needs.

Navigating through the model via the D365FO's UI isn't that straightforward as it should be. Probably it's much easier to export to Excel the associations between roles and privileges, respectively between roles and duties, or search punctually for a certain value within the same dataset directly in the database. The following queries can be run into a non-production environment:

-- security role's assigned privileges
SELECT SRO.AOTName [Role AOT Name]
, SRO.Name [Role Name]
, SRO.Description [Role Description]
, SPI.Identifier [Privilege Identifier]
, SPI.Name [Privilege Name]
, SPI.Description [Privilege Description]
FROM dbo.SecurityRolePrivilegeExplodedGraph SRP
     JOIN dbo.SecurityRole SRO
          ON SRP.SecurityRole = SRO.RecId
         JOIN dbo.SecurityPrivilege SPI
           ON SRP.SecurityPrivilege = SPI.RecId
WHERE SPI.NAME IN ('Maintain accounts receivable aging period definitions'
, 'Maintain accounts payable aging period definitions')
ORDER BY SRO.AOTName

-- security role's assigned duties
SELECT  SRO.AOTName [Role AOT Name]
, SRO.Name [Role Name]
, SRO.Description [Role Description]
, SDU.Identifier [Duty Identifier]
, SDU.Name [Duty Name]
, SDU.Description [Duty Description]
FROM dbo.SecurityRoleDutyExplodedGraph SRP
     JOIN dbo.SecurityRole SRO
          ON SRP.SecurityRole = SRO.RecId
        JOIN dbo.SecurityDuty SDU
          ON SRP.SecurityDuty = SDU.RecId
WHERE SRO.Name LIKE 'Accounting Manager%'
ORDER BY SDU.Identifier

Below you can find a short description of the security tables considered above:

Table	Description
SecurityDuty	contains the list of duties defined by the security AOT role node
SecurityPrivilege	contains the list of privileges defined by the security AOT role node
SecurityRole	contains the list of roles defined by the security AOT role node
SecurityRoleDutyExplodedGraph	contains the list of role to duty mappings and role to privilege mappings as defined by the AOT security role
SecurityRoleExplodedGraph	contains all role relationships, direct or indirect, as defined by the AOT sub role nodes of the security role nodes.
SecurityRolePrivilegeExplodedGraph	contains the list of role to privilege mappings and role to privilege mappings as defined by the AOT security role
SecurityUserRole	contains the user to role mappings

Previous Post <<||>> Next Post

Resources:
[1] Microsoft Dynamics 365 (2020) Security architecture [source]
[2] Microsoft Dynamics 365 (2020) Role-based security [source]

💎💫SQL Reloaded: Useful Queries for Dynamics 365 F&O and not only (Part I: Dependencies)

Microsoft Dynamics 365 for Finance and Operations (D365 FO or simply D365) has currently more than 12000 tables, therefore attempting to understand the data model behind might seem at first sight to be a foolhardy effort. Fortunately, only a small percentage from all the tables contain meaningful master or transactional data, fact that makes easier the task.

Moreover, the tables have meaningful prefixes (e.g. 'Invent' for Inventory, 'Purch' for Purchae Orders, 'Sales' for Sales Orders, etc.) and postfixes (e.g. 'Table for header data, 'Line' for position data, 'Trans' for transaction data) while their names are in many cases self-explanatory if one knows the corresponding business terms.

For those coming from earlier versions of D365 (e.g. Microsoft Dynamics AX 2009), with a few exceptions in the inventory transactions, product master data, addresses and finance dimensions, the data structures remained almost the same, if one considers the main functional area of the ERP system. However as new modules and/or functionality were added, the number of new tables increased considerably between the versions.

On the other side in D365 the data are exposed through data entities, abstractions from the physical implementation of the database tables [1], implemented as views typically having the 'Entity' postfix. In many scenarios the data entities more likely will prove to be the best way of accessing the data, as they include an important part of the logic one will need anyway to reengineer, however there are maybe also exceptions in which is needed to work with the base tables. In the end it will be needed to know the main data entities as well the important tables related to them, being needed to check which views reference a given table, respectively which tables are references by a view. This can be easily achieved via the sys.sql_expression_dependencies system catalog view.

The following queries can be run in non-production D365 environments:

-- retrieving the views where an object was used 
SELECT SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ obj.name AS ReferencingObject
, obj.type_desc
, SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ sed.referenced_entity_name AS ReferencedObject
, obr.type_desc ReferencedObjectType
FROM sys.all_objects obj
     JOIN sys.sql_expression_dependencies sed 
       ON obj.OBJECT_ID = sed.REFERENCING_ID 
	      JOIN sys.all_objects obr
		    ON sed.referenced_id = obr.object_id
WHERE sed.referenced_entity_name = 'PurchTable'
  AND obj.Type = 'V'
  AND obj.name LIKE 'Purch%'
ORDER BY ReferencingObject

Output (12 records):

ReferencingObject	type_desc	ReferencedObject	ReferencedObjectType
dbo.PURCHASEORDERCUBE	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHASEORDERRESPONSEHEADERSUMMARY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHLINEBACKORDER	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHLINEBACKORDERWORKSPACE	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHLINEEXPANDED	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHORDERAPPROVEDVIEW	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPREPAYOPENBASE	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERCHARGEENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERLINECDSENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHTABLENOTARCHIVEDVERSIONS	VIEW	dbo.PURCHTABLE	USER_TABLE

The inverse search:

-- retrieving the objects referenced in a view
SELECT SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ obj.name AS ReferencingObject
, obj.type_desc ReferncingObjectType
, SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ sed.referenced_entity_name AS ReferencedObject
, obr.type_desc ReferencedObjectType
FROM sys.all_objects obj
     JOIN sys.sql_expression_dependencies sed 
       ON obj.OBJECT_ID = sed.referencing_id 
	      JOIN sys.all_objects obr
		    ON sed.referenced_id = obr.object_id
WHERE obj.name = 'PURCHPURCHASEORDERHEADERV2ENTITY'
ORDER BY ReferencedObject

Output:

ReferencingObject	ReferncingObjectType	ReferencedObject	ReferencedObjectType
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.ACCOUNTINGDISTRIBUTIONTEMPLATE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.CONFIRMINGPO	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.DIMENSIONSETENTITY	VIEW
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.GetValidFromInContextInfo	SQL_SCALAR_FUNCTION
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.GetValidToInContextInfo	SQL_SCALAR_FUNCTION
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.HCMWORKER	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.LOGISTICSPOSTALADDRESS	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.LOGISTICSPOSTALADDRESSBASEENTITY	VIEW
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.PURCHTABLE_W	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.REASONTABLEREF	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.TMSPURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.TRANSPORTATIONDOCUMENT	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.VENDINVOICEDECLARATION_IS	USER_TABLE

Notes:

1. There will be cases in which is needed to check views' definition. To understand the logic it will be needed to format the queries in a more readable form. This can take time, given that the joins and the further constraints are not always straightforward. Therefore, it might be a good idea to export a formatted copy of the view for later use.

2. Unfortunately, the data model for D365 is not publicly available. One can use however the data model for AX 2012 (see [2] [3]) as basis and fill the gaps when needed.

3. The important changes between the various data models versions are partially documented in several documents available via the Microsoft Dynamics CustomerSource.

4. For non-production environments the data from views, tables or other database objects can be exported directly from the database. Unfortunately, Microsoft blocked the direct access to production environments. The data from tables can be accessed and exported via the table browser, while the data from entities can be exported to Excel typically from the display forms or via the Data Management Framework.

5. The queries can be used to explore any SQL Server database. The logic can be easily encapsulated in a view to simplify the calls.

Previous Post <<||>> Next Post

References:
[1] Microsoft Dynamics 365 (2020) Data entities overview [source]

[2] Microsoft Docs (2014) Application tables [source]

[3] Microsoft Docs (2014) Base Enums AX 2012 [source]

💠🛠️SQL Server: Undocumented (Part IV: DBCC SQLPERF)

Besides the documented LOGSPACE parameter (see previous post), DBCC SQLPERF utility has several undocumented parameters which allow providing statistics about schedulers, threads, spinlocks, IO, network, read-aheads, respectively waits.

Scheduler Statistics

By providing 'umsstats' as parameter, the utility returns as result the visible UMS schedulers on the system:

-- visible UMS schedulers
DBCC SQLPERF(umsstats)

Output:

Statistic	Value
Node Id	0
Avg Sched Load	5
Sched Switches	6903
Sched Pass	6721358
IO Comp Passes	11334
Scheduler ID	0
online	1
num tasks	6
num runnable	0
num workers	9
active workers	6
work queued	0
cntxt switches	7125444
cntxt switches(idle)	9898903
preemptive switches	2304
Scheduler ID	1
online	1
num tasks	6
num runnable	0
num workers	9
active workers	5
work queued	0
cntxt switches	3370432
cntxt switches(idle)	4427991
preemptive switches	22729

The fields have the following meaning:

Statistic	Description
Node Id
Avg Sched Load
Sched Switches	The number of switches between schedulers
Sched Pass
IO Comp Passes
Scheduler ID	The scheduler's zero-based ID number
num tasks	The number of tasks associated with the scheduler
num runnable	The number of workers on the runnable list
num workers	The total number of workers associated with the scheduler
idle workers	The number of idle workers
work queued	The number of items waiting to be processed in the work queue
cntxt switches	The number of switches between workers for the scheduler
cntxt switches(idle)	The number of times the idle loop was switched into

The functionality is useful when one suspects that there are issues related to the visible schedulers.

Detailed information about the schedulers can be found also via the sys.dm_os_schedulers DMV.

Threads Statistics

By providing 'threads' as parameter, the utility returns as result the threads created in the system:

--  thread statistics
DBCC SQLPERF(threads)

Output (just the first records):

Spid	Thread ID	Status	LoginName	IO	CPU	MemUsage
1	11228	background	sa	0	0	0
2	12572	background	sa	0	0	0
3	12576	background	sa	0	0	0
4	11884	background	sa	0	0	0
5	12964	background	sa	0	0	0
6	12960	background	sa	0	0	4
7	12968	background	sa	0	0	0

Detailed information about the schedulers can be found also via the sys.dm_os_threads DMV.

IO Statistics

By providing 'iostats' as parameter, the utility returns as result a count of the outstanding reads, respectively writes:

--  IO statistics
DBCC SQLPERF(iostats)

Output:

Statistic	Value
Reads Outstanding	0
Writes Outstanding	0

Network Statistics

By providing 'netstats' as parameter, the utility returns as result network-related statistics:

--  network statistics
DBCC SQLPERF(netstats)

Output:

Statistic	Value
Network Reads	6976
Network Writes	9036
Network Bytes Read	5318957
Network Bytes Written	2,222512E+07
Command Queue Length	0
Max Command Queue Length	0
Worker Threads	0
Max Worker Threads	0
Network Threads	0
Max Network Threads	0

Read Ahead Statistics

By providing 'rastats' as parameter, the utility returns read-ahead statistics:

--  read ahead statistics
DBCC SQLPERF(rastats)

Output

Statistic	Value
RA Pages Found in Cache	0
RA Pages Placed in Cache	0
RA Physical IO	0
Used Slots	0

Spinlock Statistics

By providing 'spinlockstats' as parameter, the utility returns the spinlock statistics, where a spinlock is a a lightweight synchronization object used to serialize access to data structures which are typically held for a short period of time:

--  spinlock statistics
DBCC SQLPERF(spinlockstats)

Output:

Spinlock Name	Collisions	Spins	Spins/Collision	Sleep Time (ms)	Backoffs
LOCK_RW_TEST	0	0	0	0	0
LOCK_RW_SECURITY_CACHE	711	210500	296,0619	0	54
LOCK_RW_CMED_HASH_SET	5	1750	350	0	1
LOCK_RW_ABTX_HASH_SET	0	0	0	0	0
LOCK_RW_RBIO_REQ	0	0	0	0	0

Detailed information about the spinlock stats can be found also via the sys.dm_os_spinlock_stats DMV.

Wait Statistics

By providing 'waitstats' as parameter, the utility returns the available wait statistics:

-- wait statistics 
DBCC SQLPERF(waitstats)

Output (only a few records):

Wait Type	Requests	Wait Time	Signal Wait Time
LCK_M_SCH_M	18	597	3
LCK_M_S	13	16895	0
PAGEIOLATCH_SH	1789	38568	25
PAGEIOLATCH_UP	118	101	0
PAGEIOLATCH_EX	736	10490	16

Detailed information about the wait stats can be found also via sys.dm_os_wait_stats DMV.

Notes:
As Microsoft warns, the undocumented features shouldn't be used into production environments as they will be deprecated in future versions. Instead should be used the documented DMVs, when available.
All objects mentioned above require VIEW SERVER STATE permissions.
The DBCC SQLPERF utility allows resetting the latch, spinlock, respectively the wait statistics by providing the following parameters (see the SQL Docs for more information):

-- resetting the latch statistics
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR)

-- resetting the spinlock statistics
DBCC SQLPERF ('sys.dm_os_spinlock_stats', CLEAR);  

-- resetting the wait statistics
DBCC SQLPERF('sys.dm_os_latch_stats', CLEAR)

DBCC PERFMON provides in a single call the IO, network, read ahead, spinlocks, respectively the wait statistics in distinct resultsets:

-- performance statistics
DBCC PERFMON

Pages

03 February 2021

02 February 2021

01 February 2021

13 January 2021

About Me