SQL Troubles: lifecycle


26 April 2025

🏭🗒️Microsoft Fabric: Power BI Environments [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 26-Apr-2025

Enterprise Content Publishing [2]

[Microsoft Fabric] Power BI Environments

{def} structured spaces within Microsoft Fabric that help organizations manage Power BI assets through the entire lifecycle
{environment} development

allows developing the solution
accessible only to the development team

via Contributor access

{recommendation} use Power BI Desktop as local development environment

{benefit} allows to try, explore, and review updates to reports and datasets

once the work is done, upload the new version to the development stage

{benefit} enables collaborating and changing dashboards
{benefit} avoids duplication

making changes online, downloading the .pbix file, and then uploading it again creates duplicate reports and datasets

{recommendation} use version control to keep the .pbix files up to date

[OneDrive] use Power BI's autosync

{alternative} SharePoint Online with folder synchronization
{alternative} GitHub and/or VSTS with local repository & folder synchronization

[enterprise scale deployments]

{recommendation} separate dataset from reports and dashboards’ development

use the deployment pipelines selective deploy option [22]
create separate .pbix files for datasets and reports [22]

create a dataset .pbix file and upload it to the development stage (see shared datasets) [22]
create .pbix only for the report, and connect it to the published dataset using a live connection [22]

{benefit} allows different creators to separately work on modeling and visualizations, and deploy them to production independently

{recommendation} separate data model from report and dashboard development

allows using advanced capabilities

e.g. source control, merging diff changes, automated processes

separate the development from test data sources [1]

the development database should be relatively small [1]

{recommendation} use only a subset of the data [1]

⇐ otherwise the data volume can slow down the development [1]

{environment} user acceptance testing (UAT)

test environment that within the deployment lifecycle sits between development and production

it's not necessary for all Power BI solutions [3]
allows to test the solution before deploying it into production

all testers must have

View access for testing
Contributor access for report authoring

involves business users who are SMEs

provide approval that the content

is accurate
meets requirements
can be deployed for wider consumption

{recommendation} check report’s load and the interactions to find out if changes impact performance [1]
{recommendation} monitor the load on the capacity to catch extreme loads before they reach production [1]
{recommendation} test data refresh in the Power BI service regularly during development [20]

{environment} production

{concept} staged deployment

{goal} help minimize risk, user disruption, or address other concerns [3]

the deployment involves a smaller group of pilot users who provide feedback [3]

{recommendation} set production deployment rules for data sources and parameters defined in the dataset [1]

allows ensuring the data in production is always connected and available to users [1]

{recommendation} don’t upload a new .pbix version directly to the production stage

⇐ without going through testing

{feature|preview} deployment pipelines

enable creators to develop and test content in the service before it reaches the users [5]

{recommendation} build separate databases for development and testing

helps protect production data [1]

{recommendation} make sure that the test and production environment have similar characteristics [1]

e.g. data volume, usage volume, similar capacity
{warning} testing in production can make production unstable [1]
{recommendation} use Azure A capacities [22]

{recommendation} for formal projects, consider creating an environment for each phase
{recommendation} enable users to connect to published datasets to create their own reports
{recommendation} use parameters to store connection details

e.g. instance names, database names
⇐ deployment pipelines allow configuring parameter rules to set specific values for the development, test, and production stages

alternatively data source rules can be used to specify a connection string for a given dataset

{restriction} in deployment pipelines, this isn't supported for all data sources
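The idea of storing connection details as parameters and overriding them per stage can be illustrated with a minimal Python sketch. It is not the deployment pipelines API; the stage names, server names and database names below are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionParams:
    """Connection details kept as parameters instead of hard-coded values."""
    server: str
    database: str


# Hypothetical default values, as they would be stored in the dataset (development).
DEFAULTS = ConnectionParams(server="dev-sql.example.local", database="SalesDev")

# Hypothetical per-stage overrides, analogous to parameter rules defined per stage.
STAGE_RULES = {
    "test": {"server": "test-sql.example.local", "database": "SalesTest"},
    "production": {"server": "prod-sql.example.local", "database": "Sales"},
}


def resolve_params(stage: str) -> ConnectionParams:
    """Return the parameters for a stage, falling back to the defaults."""
    overrides = STAGE_RULES.get(stage, {})
    return ConnectionParams(
        server=overrides.get("server", DEFAULTS.server),
        database=overrides.get("database", DEFAULTS.database),
    )


if __name__ == "__main__":
    for stage in ("development", "test", "production"):
        print(stage, resolve_params(stage))
```

The point of the design is that the content itself stays identical across stages; only the parameter values differ.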

{recommendation} keep the data in blob storage under 50k blobs and 5 GB in total to prevent timeouts [29]
{recommendation} provide data to self-service authors from a centralized data warehouse [20]

allows to minimize the amount of work that self-service authors need to take on [20]

{recommendation} minimize the use of Excel, csv, and text files as sources when practical [20]
{recommendation} store source files in a central location accessible by all coauthors of the Power BI solution [20]
{recommendation} be aware of API connectivity issues and limits [20]
{recommendation} know how to support SaaS solutions from AppSource and expect further data integration requests [20]
{recommendation} minimize the query load on source systems [20]

use incremental refresh in Power BI for the dataset(s)
use a Power BI dataflow that extracts the data from the source on a schedule
reduce the dataset size by only extracting the needed amount of data

{recommendation} expect data refresh operations to take some time [20]
{recommendation} use relational database sources when practical [20]
{recommendation} make the data easily accessible [20]
[knowledge area] knowledge transfer

{recommendation} maintain a list of best practices and review it regularly [24]
{recommendation} develop a training plan for the various types of users [24]

usability training for read-only report/app users [24]
self-service reporting for report authors & data analysts [24]
more elaborated training for advanced analysts & developers [24]

[knowledge area] lifecycle management

consists of the processes and practices used to handle content from its creation to its eventual retirement [6]
{recommendation} postfix files with 3-part version number in Development stage [24]

remove the version number when publishing files in UAT and production

{recommendation} backup files for archive
{recommendation} track version history
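The file naming recommendation above can be sketched in Python as follows. The exact naming convention (an underscore followed by `v` and a 3-part number) is an assumption made for illustration, not something prescribed by the sources, and the function names are hypothetical.

```python
import re
from pathlib import PurePath

# Trailing 3-part version such as "_v1.4.0", "-1.4.0" or " v1.4.0" at the end of the stem.
# The separator and the optional "v" prefix are assumptions; adapt them to the team's convention.
_VERSION_SUFFIX = re.compile(r"[ _-]v?\d+\.\d+\.\d+$")


def add_version(filename: str, version: tuple[int, int, int]) -> str:
    """Postfix a file name with a 3-part version number (development stage)."""
    path = PurePath(filename)
    major, minor, patch = version
    return f"{path.stem}_v{major}.{minor}.{patch}{path.suffix}"


def strip_version(filename: str) -> str:
    """Remove the version postfix before publishing to UAT or production."""
    path = PurePath(filename)
    return _VERSION_SUFFIX.sub("", path.stem) + path.suffix


if __name__ == "__main__":
    dev_name = add_version("Sales Report.pbix", (1, 4, 0))
    print(dev_name)
    print(strip_version(dev_name))
```

Stripping the version on publishing keeps the file name stable in UAT and production, so a new upload replaces the existing content instead of creating a second copy.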


References:

[1] Microsoft Learn (2021) Fabric: Deployment pipelines best practices [link]

[2] Microsoft Learn (2024) Power BI: Power BI usage scenarios: Enterprise content publishing [link]
[3] Microsoft Learn (2024) Deploy to Power BI [link]
[4] Microsoft Learn (2024) Power BI implementation planning: Content lifecycle management [link]
[5] Microsoft Learn (2024) Introduction to deployment pipelines [link]

[6] Microsoft Learn (2024) Power BI implementation planning: Content lifecycle management [link]

[20] Microsoft (2020) Planning a Power BI Enterprise Deployment [White paper] [link]

[22] Power BI Docs (2021) Create Power BI Embedded capacity in the Azure portal [link]

[24] Paul Turley (2019) A Best Practice Guide and Checklist for Power BI Projects

Resources:

Acronyms:
API - Application Programming Interface
CLM - Content Lifecycle Management
COE - Center of Excellence
SaaS - Software-as-a-Service
SME - Subject Matter Expert
UAT - User Acceptance Testing
VSTS - Visual Studio Team System

07 March 2024

📦Data Migrations (DM): The SQL Server Perspective (Licensing Costs and Edition Choices)

Data Migration Series

A Data Migration (DM) moves all or a subset of the data available from one or more system(s) into other system(s). For this purpose, especially in ERP implementations, one can use SQL Server as an intermediate layer, where SSIS can be used for data extraction and export, SSRS for reporting the errors, and the database engine for the heavy processing. Master Data and Data Quality Services can be used as well in certain scenarios. Therefore, SQL Server allows by design addressing the various challenges related to a DM. At a high level the architecture can be depicted as follows:

Data Migration Architecture

Once the decision to go with SQL Server for the DM layer is made, one needs to define which edition to use. If the DM doesn't have special requirements, one can use for it an available SQL Server instance, as long as the cumulated workloads don't create major issues. Therefore, in the past I used existing licensed versions of SQL Server to build solutions for DMs in ERP implementations, though I evaluated in each project whether it's possible to reduce the costs and remain compliant with the license requirements.

Of course, there's always the alternative of using SQL Server Express, which supports databases with a maximum of 10 GB. This should be enough for most DMs, though the edition has further limitations (see [2]). There are also ways of working around existing limitations, like splitting the logic across multiple databases.
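Splitting the data across multiple databases can be planned upfront. The following Python sketch assigns entities to databases with a first-fit-decreasing heuristic so that each database stays under the 10 GB limit mentioned above. The entity names, the size estimates and the 80% headroom factor are hypothetical assumptions, and the sketch ignores the other Express limitations.

```python
EXPRESS_MAX_DB_GB = 10.0  # per-database size limit mentioned in the text


def plan_databases(
    entity_sizes_gb: dict[str, float],
    limit_gb: float = EXPRESS_MAX_DB_GB,
    headroom: float = 0.8,  # assumption: plan for at most 80% of the limit
) -> list[dict[str, float]]:
    """Assign entities to databases, largest first, using first fit."""
    budget = limit_gb * headroom
    databases: list[dict[str, float]] = []
    ordered = sorted(entity_sizes_gb.items(), key=lambda item: item[1], reverse=True)
    for name, size in ordered:
        if size > budget:
            raise ValueError(f"{name} ({size} GB) does not fit into a single database")
        for database in databases:
            if sum(database.values()) + size <= budget:
                database[name] = size
                break
        else:
            # No existing database has room, so a new one is opened.
            databases.append({name: size})
    return databases


if __name__ == "__main__":
    # Hypothetical size estimates in GB.
    estimates = {"Customers": 3.0, "Vendors": 1.0, "Items": 4.0, "Transactions": 6.5}
    for number, database in enumerate(plan_databases(estimates), start=1):
        print(f"DM_{number}: {database} -> {sum(database.values())} GB")
```

A single entity larger than the budget cannot be handled this way, which is one of the cases in which a licensed or Developer edition remains the simpler choice.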

Then there's the SQL Server Developer edition, which involves no license costs, has the full SQL Server functionality available, and can be used to build and test applications. In a recent post [1], Bob Ward, principal architect at Microsoft, discussed the licenses for the Developer edition, which is "licensed for development, test, and demonstration purposes only" and "may not be used in a production environment". He makes the following clarifications:
(1) "Production environments include any system that is accessed by end-users for anything more than acceptance testing, environments that connects to production systems (such as Linked servers), disaster recovery or backups of production systems, and environments that are 'rotated' into production at any point in time." [1]
(2) One "cannot use Developer edition to build test data and move that same data into production" [1].
(3) One can "restore a production set of data backup for testing purposes" [1].

There are three impediments to using the Developer edition for the whole of a DM. First, at least during Go Live and UAT, one needs to work with data coming directly from the various production environments. Secondly, the data generated by the solution are used primarily for UAT and in a second step for Production, which seems to go against rule (2), or at least is a grey area (which might be overlooked by Microsoft). Thirdly, some data from the production environment might need to be imported back into the DM layer for validation or for enhancing the entities with data generated in the target systems.

Regarding the first issue, the DM solution can always point to the test environments used as source, with the production databases being copied into the test environments during UAT. This might be necessary anyway for other purposes. Otherwise, the effort might be considerable, and not working with up-to-date data in the last phases might raise other concerns.

The second issue is a matter of interpretation. The UAT phase makes sure that the data generated by the DM solution respects the criteria for Go Live. If there are no issues, the same data can be used for Go Live. If this requires another licensed edition, then an environment can be built only for UAT and Go Live, project phases which usually span a couple of weeks, unless multiple migrations need to be performed at different time intervals. If the environments are in the cloud, the instances can probably be turned on and off on an as-needed basis.

One can plan for different environments for Production and Development. The environments can reside on the same SQL Server instance as distinct databases, or one can use the Developer edition for Development and a different licensed edition for UAT and Production. This approach involves additional overhead in synchronizing the logic between environments. Conversely, in the case of the DM layer, the same environment can be used from beginning to end, while the code should be backed up periodically. For multiple migrations based on the same data, one should archive the data after each migration or important phase.

For the scenarios in which the data are copied back to the DM solution after migration, it's enough to have these steps performed against the UAT target system(s). This should work as long as there are no differences in configuration between UAT and Production. There are however exceptions, e.g. data generated by the target systems, for which the values differ between Prod and UAT. At least in Dynamics 365 one can attempt to generate the values in the DM layer and import them as they are into the target system. It worked for many scenarios, though there can be exceptions here as well.

A more complex scenario is when data from the DM layer needs to be exported to Data Warehouses or similar solutions that can be considered as Production systems. Here a licensed edition seems to be mandatory. For other scenarios in which Master Data and/or Data Quality Services are needed, there's only the option to use the Enterprise or Developer editions.

To summarize, to reduce the overall costs for the DM, consider using an existing licensed SQL Server instance for building the solution. If separate environments need to be built, the Express edition might have some limitations, though it can prove to be a viable solution in many cases. Otherwise, consider the above workarounds for using the Developer edition, including the scenario in which distinct environments are used for Production and Development.

Resources:
[1] Microsoft Data Platform (2024) How SQL developers can maximize savings, by Bob Ward (link)
[2] Microsoft Learn (2024) Editions and supported features of SQL Server 2022 (link)
[3] Microsoft Learn (2023) Master Data Services and Data Quality Services Features Support (link)

11 August 2019

🛡️Information Security: Privacy (Definitions)

"Privacy is concerned with the appropriate use of personal data based on regulation and the explicit consent of the party." (Martin Oberhofer et al, "Enterprise Master Data Management", 2008)

[MDM privacy:] "Privacy is focused on the appropriate use of personal data based on regulation and the explicit consent of the Party. MDM Systems that have Party data (customer or patient) are quite sensitive to privacy concerns and regulations." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"The ability of keeping secret someone’s identity, resources, or actions. It is realized by anonymity and pseudonymity." (Tomasz Ciszkowski & Zbigniew Kotulski, "Secure Routing with Reputation in MANET", 2008)

"Proper handling and use of personal information (PI) throughout its life cycle, consistent with data-protection principles and the preferences of the subject." (Alex Berson & Lawrence Dubov, "Master Data Management and Data Governance", 2010)

"Control of data usage dealing with the rights of individuals and organizations to determine the 'who, what, when, where, and how' of data access." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"Keeping information as a secret, known only to the originators of that information. This contrasts with confidentiality, in which information is shared among a select group of recipients." (Mark Rhodes-Ousley, "Information Security: The Complete Reference" 2nd Ed., 2013)

"The ability of a person to keep personal information to himself or herself." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"The protection of individual rights to nondisclosure." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"The right of individuals to control or influence what information related to them may be collected and stored and by whom, as well as to whom that information may be disclosed." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"The right of individuals to a private life includes a right not to have personal information about themselves made public." (Open Data Handbook)

12 March 2019

🧭Business Intelligence: Enterprise Reporting (Part XII: Reports’ Lifecycle)

Introduction

A report’s lifecycle is the sequence of stages through which a report goes during the timespan of its ownership. The main stages come down to the report’s definition, development, testing and deployment; however, a report’s life occurs within the context of IT processes like Change, Incident/Problem, Access, Availability, Information Security and Knowledge Management. To these can be added Data Management processes like Data Governance, Data Quality and Metadata Management. Therefore, the extended reports’ lifecycle could take the following form:

The processes can be easily tailored to an organization’s needs, even if it may take several attempts until the best mix is found. The activities introduced by the supporting processes don’t necessarily change the way reports are developed as long as the processes integrate smoothly into report authoring.

Definition Phase

The lifecycle of a report starts with a series of steps that lead to report’s definition and the requirements associated with it:

The starting point is the identification of a need for data. It can be a business question that needs to be answered, a decision that needs to be made, data needed to keep an operational, tactical or strategic objective under control, and so on. Such business situations can be referred to simply as (business) problems.

Problem definition

Problem definition (statement) is the process by which a business issue or need is clearly and concisely stated. This step might seem trivial and implied; however, in practice most of the rework can be traced back to it.

The dictum “a problem well stated is a problem half-solved” applies in the BI field as well. Unfortunately, there are cases in which the users want something other than what they stated, or they leave important details out. Sometimes the users aren’t sure what they need/want, and it falls to the developer to help clarify the problem and put it within a context.

There are cases in which the users just request a report without specifying the problem they need to solve. This might do when the user has a good understanding of the data and the problem; however, this approach does not always work. Personally, I find it useful to define for each report also the underlying problem. I see it as a “win-win” situation in which the user invests some knowledge in the developer, who thus will better understand the business and in time can provide better help. A thorough understanding of the business and knowledge of the users and their needs can help minimize the volume of rework involved in reports’ development.

Requirements definition

Requirements definition is the process by which functional and non-functional expectations, targets and specifications are elicited and documented.

Functional requirements specify what the report must do - how the report is structured or formatted, how data need to be visualized or navigated, to which file formats it needs to be exported, whether it needs to be printed, how the data need to be grouped and in which order, in what currency/language they need to be displayed, what data sources need to be used, etc. The functional requirements are typically listed in the use case and test script.

Non-functional requirements refer to requirements related to report’s accessibility, availability, performance, compliance, documentation, quality, maintainability, security or testability.

The degree to which a requirement can be fulfilled depends largely on the reporting platform. One can differentiate between soft and hard constraints. Soft constraints can be overcome by adding more processing power, memory or other types of resources, while hard constraints can be overcome only with difficulty or not at all. Of course, not all requirements are equally important. Important requirements that are not fulfilled can make a report unusable and, in extremis, can lead to choosing one reporting platform over another.

The requirements can be elicited by a developer, an analyst/consultant or defined by the business itself. Organizations can simplify the process by defining a set of guidelines and standards that need to be considered in reports’ definition. Normally, it is enough to reference the document(s) where the guidelines and standards are found. In contrast to other software artifacts, the requirements for reports can be gathered in a simplified version of a document. Quite often a checklist can help identify these requirements upfront with a minimum of overhead.

Report definition

Report definition is the process by which report’s content, logic and layout are explicitly defined - what attributes are needed for output and from what source, what static/dynamic parameters are needed, how the data need to be displayed/formatted, what formulas, aggregations or ordering apply.

A report’s definition can be anything between a simple statement summarizing what the report is about and complex structures (mainly in form of a mapping) reflecting in detail each attribute, constraint, formula, grouping or sorting.

A good definition should allow a developer to create the report as needed by the users, possibly with minimal deviations implied by the user’s understanding. The holy grail in a report’s definition is finding a structure flexible enough to cover all the aspects of a report. Even if some structures allow such flexibility, sometimes it’s almost impossible not to provide additional descriptions in textual form. The less insight the developer has into the business, the more textual descriptions and visuals need to be included to bridge the knowledge gap.

GAP Analysis

GAP Analysis is the iterative process by which the current state of a software artifact or situation is compared with the potential or desired state. It has become such an integral part of professionals’ thinking that its role as a separate process is quite often ignored. In the context of report authoring it can be used when comparing the requirements against the current infrastructure and the data available, as well as when comparing the developed report against the requirements.

It can happen that the technical and data constraints don’t allow building the report as needed by the users. The differences need to be mitigated and eventually the requirements need to be changed to accommodate the reality. In extremis, one must consider whether the report still makes sense in the light of the modified requirements.

Solution formulation

Solution formulation is the process by which a formal (technical) solution is defined for the given requirements. It’s a conceptualization (aka concept) of the requirements, and in many cases it’s just a short description of the means by which the report will be built and of the data sources that will be used. In more complex cases it can include details about the changes needed in the infrastructure to support the report (e.g. creation/extensions of tables and other database objects, ETL jobs, components, etc.), about the data that need to be collected, etc.

Of course, the conceptualization must be considered together with the report’s definition. In fact, the report’s definition can be considered part of the conceptualization. A conceptualization can cover multiple reports, just as two or more different solutions can be provided for different sets of reports. The infrastructure can make a concept superfluous, either when there is a single reporting platform, or when clear rules are in place.

Prototyping

Prototyping is the iterative process of building a simplified version of the report for demonstration and evaluation purposes, so that users can better define the requirements or to prove the concept. The prototype is a preliminary version that can be refined successively until the user’s requirements have a final form. It can take the form of a mock-up query to verify the report’s technical and logical feasibility, and/or an Excel layout to depict what the report will look like. Prototypes can facilitate the communication between the parties involved and can be considered as part of the requirements.

A prototype might be needed in about 1 out of 5 cases, though this number also depends on the number of queries available and on the knowledge of the source and business processes. Because a prototype can involve additional work, it’s important to identify those cases in which a prototype makes sense and keep the effort to a minimum, especially when an approval is involved in the process. Therefore, one should consider the most important characteristics that need to be proved (e.g. if the data can be aggregated, matched, displayed at the requested level of detail, or in the requested format).

With the help of self-service tools, the business has the capabilities to play with the data and find answers by itself, being thus able to create a prototyped version of the report. Once the report meets business needs it can be standardized so it can be used organization-wide. It’s recommended to standardize the reports that are used as part of the organization’s processes, otherwise self-service can become a bottleneck for the organization.

Change Management

Change Management is the process of ensuring that the changes performed to a system, in this case a BI tool or the whole BI infrastructure, are performed with minimal disruption for the business and that risks are kept under control. Changes can be requested via standard requests or change requests. A standard request (SR) is a pre-approved change that involves low risks, is relatively common and follows a predefined procedure. In contrast to SRs, a change request (CR) requires the authorization of a board, e.g. the Change Advisory Board (CAB), it often involves risks, an investment and the approach is not that common.

Both are hard-copy or electronic templates that make it possible to capture information about the changes, document them and track their status. They typically include the problem definition together with the users’ requirements, the report definition and the formulation of the solution. What differentiates them is thus the approval process, which can sometimes be time-consuming, and the volume of formalism needed to manage the requests (e.g. tracking status, writing status reports, handling risks, etc.).

Unless infrastructural changes are necessary, the risks involved with the creation of reports are relatively small, especially when the reports are developed in-house. Reports developed by vendors involve more risks and imply investments that in one form or another need to be approved. Considering the particularities of the two approaches, personally I think that reports that can be developed with internal resources should be done via SRs, while reports developed externally should be done via CRs. Even if this categorization has the potential of creating some confusion, the use of SRs allows reducing the volume of effort necessary to manage the requests. I suppose solutions can be found to request external changes via SRs as well (e.g. by using contingents and a set of well-defined rules).
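The categorization proposed above can be expressed as a simple routing rule. The following Python sketch is only an illustration of that rule: the type and field names are hypothetical, and routing requests that need infrastructural changes to a CR is an assumption derived from the remark about the higher risks they involve.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    STANDARD_REQUEST = "SR"  # pre-approved, low risk, predefined procedure
    CHANGE_REQUEST = "CR"    # requires authorization, e.g. by the CAB


@dataclass(frozen=True)
class ReportRequest:
    title: str
    developed_in_house: bool
    needs_infrastructure_change: bool = False


def route_request(request: ReportRequest) -> RequestType:
    """Reports built in-house go via SR, everything else via CR.

    Assumption: infrastructural changes carry enough risk to warrant a CR.
    """
    if request.developed_in_house and not request.needs_infrastructure_change:
        return RequestType.STANDARD_REQUEST
    return RequestType.CHANGE_REQUEST


if __name__ == "__main__":
    requests = [
        ReportRequest("Open Purchase Orders", developed_in_house=True),
        ReportRequest("Vendor Scorecard", developed_in_house=False),
        ReportRequest("Inventory Aging", developed_in_house=True,
                      needs_infrastructure_change=True),
    ]
    for request in requests:
        print(request.title, "->", route_request(request).value)
```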

28 February 2017

⛏️Data Management: Data Lifecycle (Definitions)

[Data Lifecycle Management (DLM):] "The process by which data is moved to different mass storage devices based on its age." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

[master data lifecycle management:] "Supports the definition, creation, access, and management of master data. Master data must be managed and leveraged effectively throughout its entire lifecycle." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

[Data lifecycle management (DLM):] "Managing data as blocks without underlying knowledge of the content of the blocks, based on limited metadata (e.g., creation date, last accessed)." (David G Hill, "Data Protection: Governance, Risk Management, and Compliance", 2009)

"The data life cycle is the set of processes a dataset goes through from its origin through its use(s) to its retirement. Data that moves through multiple systems and multiple uses has a complex life cycle. Danette McGilvray’s POSMAD formulation identifies the phases of the life cycle as: planning for, obtaining, storing and sharing, maintaining, applying, and disposing of data." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement", 2012)

"The recognition that as data ages, that data takes on different characteristics" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"The development of a record in the company’s IT systems from its creation until its deletion. This process may also be designated as 'CRUD', an acronym for the Create, Read/Retrieve, Update and Delete database operations." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"The series of stages that data moves through from initiation, to creation, to destruction. Example: the data life cycle of customer data has four distinct phases and lasts approximately eight years." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"covers the period of time from data origination to the time when data are no longer considered useful or otherwise disposed of. The data lifecycle includes three phases, the origination phase during which data are first collected, the active phase during which data are accumulating and changing, and the inactive phase during which data are no longer expected to accumulate or change, but during which data are maintained for possible use." (Meredith Zozus, "The Data Book: Collection and Management of Research Data", 2017)

"The complete set of development stages from creation to disposal, each with its own characteristics and management responsibilities, through which organizational data assets pass." (Kevin J Sweeney, "Re-Imagining Data Governance", 2018)

"An illustrative phrase describing the many manifestations of data from its raw, unanalyzed state, such as survey data, to intellectual property, such as blueprints." (Sue Milton, "Data Privacy vs. Data Security", 2021)

"Refers to all the stages in the existence of digital information from creation to destruction. A lifecycle view is used to enable active management of the data objects and resource over time, thus maintaining accessibility and usability." (CODATA)

19 January 2017

🚧Project Management: Product Lifecycle (Definitions)

"The period of time, consisting of phases, that begins when a product is conceived and ends when the product is no longer available for use. Since an organization may be producing multiple products for multiple customers, one description of a product life cycle may not be adequate. Therefore, the organization may define a set of approved product life-cycle models. These models are typically found in published literature and are likely to be tailored for use in an organization. A product life cycle could consist of the following phases: (1) concept/vision, (2) feasibility, (3) design/development, (4) production, and (5) phase out." (Sandy Shrum et al, "CMMI®: Guidelines for Process Integration and Product Improvement", 2003)

"The period of time that begins when a product is conceived and ends when the product is no longer available for use. This cycle typically includes phases for concept definition (verifies feasibility), full-scale development (builds and optionally installs the initial version of the system), production (manufactures copies of the first article), transition (transfers the responsibility for product upkeep to another organization), operation and sustainment (repairs and enhances the product), and retirement (removes the product from service). Full-scale development may be divided into subphases to facilitate planning and management such as requirements analysis, design, implementation, integration and test, installation and checkout." (Richard D Stutzke, "Estimating Software-Intensive Systems: Projects, Products, and Processes", 2005)

"A term to describe a product, from its conception to its discontinuance and ultimate market withdrawal." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"a model of the sales and profits of a product category from its introduction until its decline and disappearance from the market; focuses on the appropriate strategies at each stage." (Gina C O'Connor & V K Narayanan, "Encyclopedia of Technology and Innovation Management", 2010)

"A collection of generally sequential, non-overlapping product phases whose name and number are determined by the manufacturing and control needs of the organization. The last product life cycle phase for a product is generally the product's retirement. Generally, a project life cycle is contained within one or more product life cycles." (Cynthia Stackpole, "PMP® Certification All-in-One For Dummies®", 2011)

"The series of phases that represent the evolution of a product, from concept through delivery, growth, maturity, and to retirement." (For Dummies, "PMP Certification All-in-One For Dummies" 2nd Ed., 2013)

16 January 2017

⛏️Data Management: Data Flow (Definitions)

"The sequence in which data transfer, use, and transformation are performed during the execution of a computer program." (IEEE, "IEEE Standard Glossary of Software Engineering Terminology", 1990)

"A component of a SQL Server Integration Services package that controls the flow of data within the package." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"Activities of a business process may exchange data during the execution of the process. The data flow graph of the process connects activities that exchange data and - in some notations - may also represent which input/output parameters of the activities are involved." (Cesare Pautasso, "Compiling Business Process Models into Executable Code", 2009)

"Data dependency and data movement between process steps to ensure that required data is available to a process step at execution time." (Christoph Bussler, "B2B and EAI with Business Process Management", 2009)

[logical data flow:] "A data flow diagram that describes the flow of information in an enterprise without regard to any mechanisms that might be required to support that flow." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

[physical data flow:] "A data flow diagram that identifies and represents data flows and processes in terms of the mechanisms currently used to carry them out." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"The fact that data, in the form of a virtual entity class, can be sent from a party, position, external entity, or system process to a party, position, external entity, or system process." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"An abstract representation of the sequence and possible changes of the state of data objects, where the state of an object is any of: creation, usage, or destruction [Beizer]." (International Qualifications Board for Business Analysis, "Standard glossary of terms used in Software Engineering", 2011)

"Data flow refers to the movement of data from one purpose to another; also the movement of data through a set of systems, or through a set of transformations within one system; it is a nontechnical description of how data is processed. See also Data Chain." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement", 2012)

"The movement of data through a group of connected elements that extract, transform, and load data." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A path that carries packets of information of known composition; a roadway for data. Every data flow’s composition is recorded in the data dictionary." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"the path, in information systems or otherwise, through which data move during the active phase of a study." (Meredith Zozus, "The Data Book: Collection and Management of Research Data", 2017)

"The lifecycle movement and storage of data assets along business process networks, including creation and collection from external sources, movement within and between internal business units, and departure through disposal, archiving, or as products or other outputs." (Kevin J Sweeney, "Re-Imagining Data Governance", 2018)

"A graphical model that defines activities that extract data from flat files or relational tables, transform the data, and load it into a data warehouse, data mart, or staging table." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"An abstract representation of the sequence and possible changes of the state of data objects, where the state of an object is any of: creation, usage, or destruction." (Software Quality Assurance)

15 January 2012

🚧Project Management: Project Lifecycle (Definitions)

"A collection of generally sequential project phases whose names and numbers are determined by the control needs of the organization." (Timothy J Kloppenborg et al, "Project Leadership", 2003)

"A set of activities organized to produce a product and/or deliver services. A project life cycle partitions the activities of a project into a sequence of phases to assist planning and management. The early phases gather and analyze information about user needs, product requirements, and alternative designs. Later phases elaborate and implement the design. Some life cycles are iterative, performing certain activities multiple times. Same as project life cycle model." (Richard D Stutzke, "Estimating Software-Intensive Systems: Projects, Products, and Processes", 2005)

"A collection of generally sequential project phases whose name and number are determined by the control needs of the organization or organizations involved in the project. A life cycle can be documented with a methodology." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

"Sequence of phases of the project from beginning to end." (Mike Clayton, "Brilliant Project Leader", 2012)

"The series of phases that a project passes through from its initiation to its closure" (For Dummies, "PMP Certification All-in-One For Dummies" 2nd Ed., 2013)

"The period between the start of the Assess stage to the handover of the asset to the user or the operations group." (Paul H Barshop, "Capital Projects", 2016)

"The series of phases that a project passes through from its start to its completion." (Project Management Institute, "A Guide to the Project Management Body of Knowledge (PMBOK Guide)", 2017)

"A collection of generally sequential project phases whose name and number are determined by the control needs of the organization or organizations involved in the project. A life cycle can be documented with a methodology." (Jeffrey K Pinto, "Project Management: Achieving Competitive Advantage" 5th Ed., 2018)

"The series of generally sequential Phases a project passes through from beginning to end. Starting, organizing and preparing, performing project work, closing; cost and staffing levels low at the start and end; risk and uncertainty greatest at the start; ability to influence highest at start; later changes cost more." (H James Harrington & William S Ruggles, "Project Management for Performance Improvement Teams", 2018)

03 June 2010

🧭Business Intelligence: Enterprise Reporting (VIII: Addressable Questions in Reports’ Creation I)

Business Intelligence Series

The creation of a (data) report of any type involves most of the time several iterations until all the requirements are gathered, the query and eventually the final report are created, the report gets tested, the issues related to deliveries and requirements are mitigated, etc. Typically, the steps involved in the creation of a report are straightforward for the developer, though for the user and the other people involved in the process everything might look like a black box, often because of the lack of visibility and communication between the involved parties. Actually, at least in this case, the black box is not the problem; the problem is the input, and the output that’s highly dependent on it, not to mention that any information omitted from the user’s side could have an unexpected impact on the output.

In many cases I found that the information on which a report is based is minimal; for example, the user comes to the developer with a question (e.g. what’s the total revenue for the past x years) and expects the developer to come back with a report whose data answer the respective question. When formulating the requirements, the user might even come with a list of attributes he would like to see in such a report, though some of the attributes might not be possible to show given the report’s needed level of detail, while it could make sense to add other attributes (e.g. Unit of Measure, Currency, Price Unit, Quantity) in order to avoid possible confusion and miscalculations, and to facilitate the understanding, testing or usage of such a report. There could be cases in which the user has no real idea of what he wants, finding it difficult to describe the problem and the exact data he might need, and this could happen also because the user isn’t aware of the data and their structure, the processes, or the various constraints and other business rules that apply.

When, from the whole initial dataset based on the requested attributes, the user needs only a smaller subset, this translates into a set of constraints (e.g. the report is needed only for a single Vendor or a time unit), and it makes sense to consider some of the constraints as parameters, allowing the report to fit a larger set of requirements. In addition, the user might need to filter on additional attributes not already considered in the constraints. To make the report easier to understand, the columns could be positioned in a certain order and the records sorted based on a given list of ordered attributes. For the same reason special formatting could be applied to some of the attributes, for example the values could be right/center/left aligned, the dates written in a predefined format (e.g. DD/MM/YY), the numeric values rounded to a given number of decimals, the negative values highlighted in another color, etc.
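As a rough illustration of the constraints, parameters, sorting and formatting described above, here is a minimal Python sketch. The sample rows, the `build_report` function and its parameter names are all hypothetical, chosen only for the example; in practice the same logic would more likely live in the report’s query and in the reporting tool’s formatting options.

```python
from datetime import date

# Hypothetical purchase-order lines; all names and values are invented for illustration.
ROWS = [
    {"vendor": "Vendor A", "order_date": date(2010, 1, 15), "amount": 1250.456, "currency": "EUR"},
    {"vendor": "Vendor B", "order_date": date(2010, 2, 3), "amount": -80.0, "currency": "EUR"},
    {"vendor": "Vendor A", "order_date": date(2010, 3, 9), "amount": 310.1, "currency": "EUR"},
    {"vendor": "Vendor A", "order_date": date(2009, 12, 30), "amount": -15.5, "currency": "EUR"},
]


def build_report(rows, vendor=None, date_from=None, date_to=None, decimals=2):
    """Filter by the optional parameters, sort, and format the rows for display."""
    # Constraints exposed as parameters: a parameter left as None is not applied.
    selected = [
        r for r in rows
        if (vendor is None or r["vendor"] == vendor)
        and (date_from is None or r["order_date"] >= date_from)
        and (date_to is None or r["order_date"] <= date_to)
    ]
    # Default sorting: by vendor, then by order date.
    selected.sort(key=lambda r: (r["vendor"], r["order_date"]))
    # Column order and formatting: DD/MM/YY dates, rounded amounts,
    # and a flag a renderer could use to highlight negative values.
    return [
        {
            "Vendor": r["vendor"],
            "Order Date": r["order_date"].strftime("%d/%m/%y"),
            "Amount": f"{r['amount']:.{decimals}f}",
            "Currency": r["currency"],
            "Negative": r["amount"] < 0,
        }
        for r in selected
    ]


report = build_report(ROWS, vendor="Vendor A", date_from=date(2010, 1, 1))
for line in report:
    print(line)
```

Calling `build_report(ROWS)` without arguments returns all rows, which is the point of treating the constraints as parameters: the same report definition serves both the single-vendor and the all-vendors request.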

When the requirements don’t match the semantic or relational data model, the adequate level of detail must be found; for this it may be necessary to renounce some of the attributes, eventually split the report into a summary and a detail report, aggregate some of the values, or consider only the last/first record from a given subtable, all these aspects needing to be clarified with the users. Some of the calculations the user needs could be included directly in the report, the exact formulas to apply needing to be given as input by the users, and the special cases needing to be clarified too.
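The split into a detail and a summary report can be sketched in a few lines of Python. The invoice rows, the `line_total` formula (quantity times unit price) and the function names are assumptions made for the example; the actual formulas and grouping level would have to come from the users.

```python
from collections import defaultdict

# Hypothetical invoice lines (the detail level); all values are invented for illustration.
DETAIL = [
    {"vendor": "Vendor A", "invoice": "INV-1", "quantity": 10, "unit_price": 2.5},
    {"vendor": "Vendor A", "invoice": "INV-2", "quantity": 4, "unit_price": 10.0},
    {"vendor": "Vendor B", "invoice": "INV-3", "quantity": 1, "unit_price": 99.0},
]


def line_total(row):
    """Calculated value; the formula used here is an assumed example."""
    return row["quantity"] * row["unit_price"]


def summarize(rows):
    """Aggregate the detail rows into one summary record per vendor."""
    totals = defaultdict(lambda: {"invoices": 0, "total": 0.0})
    for row in rows:
        entry = totals[row["vendor"]]
        entry["invoices"] += 1
        entry["total"] += line_total(row)
    return dict(totals)


summary = summarize(DETAIL)
print(summary)
```

Keeping the calculation in one place (`line_total`) means the detail and the summary report cannot drift apart when the formula changes.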

As the same data could come from different sources, having different timeliness, accuracy, consistency, completeness, availability, in other words different quality, the adequate data source needs again to be clarified with the users, in theory the data with the most appropriate quality needing to be considered. In addition to these constraints come the technical constraints derived from the volume of data, report type, storage type, available time, data transformations needed, reporting platform, etc.

There are several other important facts that should be considered in the requirements gathering phase, namely whether a similar report already exists, and whether it could be used unaltered or needs to be modified by adding attributes, filters, formatting, etc. As all such work needs to have also a ROI argumentation, there are also cases in which the costs of creating such a report are higher than the costs associated with the user preparing the report; on the other side, such costs are not always easy to quantify, though it’s a good idea to have the needed reports automated as much as possible. In such cases some input from the Functional and IT Managers might be requested, as well as the enforcement of policies and processes to deal with such aspects.

The above aspects are gathered in the below set of questions; users and developers should address them together, and even if some apply only to developers, some awareness from the users could be beneficial too:
Q1: What’s the problem statement? (What issues does the report try to address?)
Q2: What kind of data would need to be gathered (in order to solve or better understand the problem)?
Q3: What level of detail is needed?
Q4: What attributes are needed?
Q5: Is there a similar report (that could be extended or used as template)?
Q6: What are the report’s definition, purpose and financial argumentation?
Q7: Does it make sense to invest time, effort and money in creating such a report? What’s the estimated effort to build the report? What’s the halting point?
Q8: What attributes does it make sense to add/remove?
Q9: What’s the (appropriate) source for the attributes?
Q10: What special filter constraints are needed?
Q11: Which of the filter constraints should be considered as parameters?
Q12: Which are the formulas that need to be applied for calculated values?
Q13: What (default) sorting does the user need?
Q14: What’s the needed order of the attributes in the layout?
Q15: Are there any aggregations that need to be added?
Q16: In what form should the report be delivered?
Q17: What formatting should be applied?
Q18: What’s the report’s frequency?
Q19: Who will be the report’s owner?
Q20: How many records will the average report have?
Q21: What other technical or logistic constraints apply?
Q22: When is the report needed?
Q23: What documentation is needed?
Q24: Is there any sensitive information contained?
Q25: Who needs access to the report? Do any externals need access to it?
Q26: Are there further uses for the same data?
Q27: What’s the life expectancy of the report?
Q28: What’s the reporting platform on which the report should be built?
Q29: Does it make sense to split the report into multiple perspectives?

Note:
The above list of questions should not be considered complete; some of the questions could be merged, while some of the questions could lead to other questions, especially in the area of technical and logistic constraints that need to be addressed adequately.


20 May 2007

🌁Software Engineering: DevOps (Definitions)

"An application delivery philosophy that stresses communication, collaboration, and integration between software developers and their information technology (IT) counterparts in operations. DevOps is a response to the interdependence of software development and IT operations. It aims to help an organization rapidly produce software products and services." (Pierre Pureur & Murat Erder, "Continuous Architecture", 2015)

"DevOps is an approach based on lean and agile principles in which business owners and the development, operations, and quality assurance departments collaborate to deliver software in a continuous manner that enables the business to more quickly seize market opportunities and reduce the time to include customer feedback. Indeed, enterprise [...]" (Sanjeev Sharma & Bernie Coyne, "DevOps For Dummies" 2nd Ed, 2015)

"Is a method for software development and management that integrates the development and deployment cycles to achieve a more agile, continuous evolution of software-based products and services" (Diego R López & Pedro A. Aranda, "Network Functions Virtualization: Going beyond the Carrier Cloud", 2015)

"DevOps is a mindset, a culture, and a set of technical practices. It provides communication, integration, automation, and close cooperation among all the people needed to plan, develop, test, deploy, release, and maintain a Solution." (Dean Leffingwell, "SAFe 4.5 Reference Guide: Scaled Agile Framework for Lean Enterprises" 2nd Ed., 2018)

"Short for development operations, an information technology environment in which development and operations are tightly tied together, yielding small incremental releases to gain user feedback." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"The practice of incorporating developers and members of operations and quality assurance (QA) staff into software development projects to align their incentives and enable frequent, efficient, and reliable releases of software products." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"The tighter integration between the developers of applications and the IT department that tests and deploys them. DevOps is said to be the intersection of software engineering, quality assurance, and operations." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"A software engineering practice that aims at unifying software development (Dev) and software operation (Ops)." (Jun Bi et al, "Automatic Address Scheduling and Management for Broadband IP Networks", Emerging Automation Techniques for the Future Internet, 2019)

"Develop operations, or DevOps, is an agile methodology that merges the functions of software development and operations in the enterprise software development domain. This approach has been adopted in the networking world to facilitate a programmable approach to network operations. Often when applied to networking the term is changed to NetOps." (Patrick Moore, "Model-Centric Fulfillment Operations and Maintenance Automation", Emerging Automation Techniques for the Future Internet, 2019)

"Practices and technologies that promote tighter coupling of software development (Dev) and operations (Ops) - typically marked by more automation, continuous monitoring, shorter development cycles and higher deployment frequencies. A key driver for security policy automation. DevSecOps is a related term that refers to practices and technologies that aim to embed security in DevOps practices." (Myo Zarny et al, "Network Security Policy Automation: Enterprise Use Cases and Methodologies", 2019)

"Development and operations is an abbreviation for 'development' and 'operations'; is a software engineering methodology for managing software development (Dev) and technology operations (Ops). The main aim of DevOps is to enable automation and tracing for all phases of software implementation, from integration, testing, releasing to deployment and infrastructure management." (Antoine Trad & Damir Kalpić, "Using Applied Mathematical Models for Business Transformation", 2020)

"Development and operations (DevOps) has been adopted by prominent software and service companies (e.g., IBM) to support enhanced collaboration across the company and its value chain partners. In this way, DevOps facilitates uninterrupted delivery and coexistence between development and operation facilities, enhances the quality and performance of software applications, improving end-user experience, and help to simultaneous deployment of software across different platforms." (Kamalendu Pal & Bill Karakostas, "Software Testing Under Agile, Scrum, and DevOps", 2021)

"DevOps is a sprint-based approach that can catch coding flaws during the development of code due to security reviews, rework on previous sprint cycles, and testing." (David A Bird, "Hacker and Non-Attributed State Actors", Real-Time and Retrospective Analyses of Cyber Security, 2021)

"It is a set of practices emerging to bridge the gaps between operation and developer teams to achieve a better collaboration." (Mirna Muñoz, "Boosting the Competitiveness of Organizations With the Use of Software Engineering", 2021)

"It is a way to work were the software is rapidly developed and immediately deployed for operating in a computational productive environment. It is continuous delivery product development lifecycle. It must automate the development process. DevOps is both a culture and a set of technologies and tools used for automation." (Laura C Rodriguez-Martinez et al, "Service-Oriented Computing Applications (SOCA) Development Methodologies: A Review of Agility-Rigor Balance", 2021)

"People from software development and operations work together to enhance the speed of delivery of new software features. It is a concept for bridging the gap between software development and software operations and integrating the logic of common responsibility for the complete software delivery lifecycle into one cross-functional team." (Anna Wiedemann et al, "Transforming Disciplined IT Functions: Guidelines for DevOps Integration", 2021)

"DevOps is a set of tools and processes that help automate IT operations." (Aniruddha Deswandikar, "Engineering Data Mesh in Azure Cloud", 2024)

"DevOps is a catch‑all term for the blending of roles between developers and operations engineers. As the barriers between roles such as database administrator, systems administrator, and software engineer have eroded, the term DevOps has emerged as a way of describing the intersection of responsibilities from all these camps, and their increasing interrelation in the lifecycle of a product. A crucial enabling aspect of this movement is the increased use of automation in building, deploying, and monitoring large applications." (NGINX) [source]

"DevOps is a collection of best practices and working methods for the software development process whose cumulative goal is to shorten the development life cycle and support practice such as continuous integration, continuous delivery and continuous deployment." (Sumo Logic) [source]

"DevOps is a set of practices that works to automate and integrate the processes between software development and IT teams, so they can build, test, and release software faster and more reliably." (Atlassian) [source]

"DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes." (Amazon) [source]

"DevOps refers to a broad range of practices related to the development and operation of software code in production in cloud data centers. DevOps is centered in Agile project management techniques and microservice support. DevOps approaches the entire software development lifecycle with automation based around version control standards." (VMware) [source]

"The cultural movement that stresses communication, collaboration and integration between software developers and IT operations." (Global Knowledge)
