Showing posts with label parametrization.

01 February 2021

📦Data Migrations (DM): Quality Assurance (Part I: Quality Acceptance Criteria I)

Data Migration
Data Migrations Series

Introduction

When designing a Data Migration (DM), or any software solution for that matter, it's important to take inventory of the project's requirements, and to evaluate, document, communicate and monitor them accordingly. Each of them can have an important impact on the solution, as the solution's success will be validated and judged against them. Therefore, the identified requirements must be considered as a baseline for conceptualization, design, implementation and sign-off, and should go through the same procedures and rigor as the other project requirements. A standardized Requirements Management process can facilitate their management through the project's lifecycle.

The requirements are usually driven by the source and target systems (e.g. data import/export features, data models and their constraints), the environments they are hosted on (e.g. cloud vs. on-premises), the layers in between (e.g. network, firewalls), as well as by project and business aspects that need to be considered (e.g. freeze window for the Go-Live, data availability dates, data quality, external dependencies, etc.). They concern the solution itself as well as the data and processes involved, and are reflected in, but not limited to, the following important aspects, which can also serve, depending on the case, as quality acceptance criteria:

Accessibility

Accessibility is the degree to which data are available to a solution so they can be processed when needed, in the intended form, and by the intended resources or means. It's critical for a DM solution to be able to access, or have available, the master, transaction, parameter and other data when needed. The team must make sure that the data are easily accessible.

The unavailability of data can impact the DM and easily lead to delays in the project. This also means that the various project activities (parametrization, cleansing, enrichment, development) need to be synchronized with the migration activities.

Depending on the case, accessibility can also concern the solution itself, expressed as the degree to which it's available to the resources supposed to use it. Certain architectural decisions can have an impact on the activities carried out. As the solution is usually deployed on a server, it can happen that only a limited number of people are able to access it concurrently. Moreover, a DM's complexity makes the involvement of multiple developers challenging.

Accountability

Accountability is the degree to which accountability is enforced for the various resources involved in the DM processes and related activities. As multiple resources are involved in parametrization, cleansing, processing, validation and software development, each resource needs to be aware of the extent to which they are accountable. Without accountability made explicit, there's the danger that activities are neglected, with all the implications deriving from it: quality deviations, delays, data unavailability, etc.

Adaptability

Adaptability is the degree to which a solution can be adapted to changes in the environment or in the requirements. Even if the environments typically don't change, the IT infrastructure goes through continuous changes that can directly or indirectly affect a migration. The same can be said about requirements, which have an even higher probability of changing, including late in the process, as new knowledge is acquired and needs to be integrated into the solution.

Atomicity 

Atomicity is the degree to which data entities can be processed at the required level of abstraction in an atomic manner. Even if transformations occur during the various stages, the data belonging to an entity need to be kept and processed together (e.g. Customers and their Addresses). This can involve processing attributes in advance even if the data might be required only later. There can be situations in which the data belonging to the same entity need to be processed on different paths, though in the end it's important to keep the data together, when the processing logic allows it.
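As a minimal sketch (the staging tables and columns are hypothetical, chosen only for illustration), the following T-SQL processes a Customer entity together with its Addresses, so that the records belonging to the same entity arrive together in the target staging table:

-- keeping the Customer entity and its Addresses together (illustrative objects)
SELECT CST.CustomerId
, CST.CustomerName
, ADR.AddressType
, ADR.City
, ADR.PostalCode
INTO dbo.StagingCustomerAddresses
FROM dbo.StagingCustomers CST
     LEFT JOIN dbo.StagingAddresses ADR
       ON CST.CustomerId = ADR.CustomerId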


11 February 2018

🔬Data Science: Parametric Estimating (Definitions)

[parametric:] "A statistical procedure that makes assumptions concerning the frequency distributions." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A simplified mathematical description of a system or process, used to assist calculations and predictions. Generally speaking, parametric models calculate the dependent variables of cost and duration on the basis of one or more variables." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

"An estimating technique that uses a statistical relationship between historical data and other variables (e.g., square footage in construction, lines of code in software development) to calculate an estimate for activity parameters, such as scope, cost, budget, and duration. An example for the cost parameter is multiplying the planned quantity of work to be performed by the historical cost per unit to obtain the estimated cost." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

"A branch of statistics that assumes the data being examined comes from a variety of known probability distributions. In general, the tests sacrifice generalizability for speed of computation and precision, providing the requisite assumptions are met." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"An estimating technique in which an algorithm is used to calculate cost or duration based on historical data and project parameters." (For Dummies, "PMP Certification All-in-One For Dummies" 2nd Ed., 2013)

"Inferential statistical procedures that rely on sample statistics to draw inferences about population parameters, such as mean and variance." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

🔬Data Science: Non-Parametric Tests (Definitions)

[nonparametric:] "A statistical procedure that does not require a normal distribution of the data." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A branch of statistics that makes no assumptions on the underlying distributions of the data being examined. In general, the tests are far more generalizable but sacrifice precision and power." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"Inferential statistical procedures that do not rely on estimating population parameters such as the mean and variance." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"A family of methods which makes no assumptions about the population distribution. Non-parametric methods most commonly work by ignoring the actual values, and, instead, analyzing only their ranks. This approach ensures that the test is not affected much by outliers, and does not assume any particular distribution. The clear advantage of non-parametric tests is that they do not require the assumption of sampling from a Gaussian population. When the assumption of Gaussian distribution does not hold, non-parametric tests have more power than parametric tests to detect differences." (Soheila Nasiri & Bijan Raahemi, "Non-Parametric Statistical Analysis of Rare Events in Healthcare", 2017)


12 September 2009

💎SQL Reloaded: Deterministic Functions (GetDate SQL Server 2000/2005)

In an answer on LinkedIn, one of the users pointed out that in SQL Server 2005 and later versions GetDate() and other nondeterministic built-in functions can be used inside functions and views, something that was not possible under SQL Server 2000. This limitation involved considerable overhead when the current date was needed in table-valued functions; the workaround was to send the date as a parameter to the respective objects.

As an example, the following piece of code would run successfully on SQL Server 2005 but fail on SQL Server 2000, returning the error message shown below.

-- creating the test function
CREATE FUNCTION dbo.fGetCurrentDate() 
RETURNS smalldatetime AS 
BEGIN 
    RETURN GetDate()
END

-- testing the function
SELECT dbo.fGetCurrentDate() 

Error message:
Server: Msg 443, Level 16, State 1, Procedure GetCurrentDate, Line 6 Invalid use of 'getdate' within a function.

A similar error message resulted when using the GetDate() function inside a view:

-- creating the test view
CREATE VIEW dbo.vGetCurrentDate
AS
SELECT GetDate() AS CurrentDate

-- testing the view
SELECT * 
FROM dbo.vGetCurrentDate

The alternative was to use a table-valued UDF instead, which functions like a parameterized view, something like:

-- creating the test UDF
CREATE FUNCTION dbo.udfFilterByDate(
 @date datetime)
RETURNS TABLE
AS
RETURN(
 SELECT *
 FROM [AdventureWorks2014].Production.Product
 WHERE [SellStartDate] >= @date)

-- testing the UDF
SELECT *
FROM dbo.udfFilterByDate(GetDate())

-- testing the UDF (alternative)
DECLARE @date as datetime
SET @date = GetDate()
SELECT *
FROM dbo.udfFilterByDate(@date)

Using a UDF can still be necessary when dates or other types of parameters need to be passed and used inside a correlated query or within the logic, as in the sketch below.
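As a sketch of such a scenario (the function name and the logic are only illustrative), the parameter received by the UDF is used inside a correlated subquery, here to return the maximum standard cost recorded up to the given date:

-- illustrative UDF: the @date parameter is used inside a correlated subquery
CREATE FUNCTION dbo.udfMaxCostByDate(
 @date datetime)
RETURNS TABLE
AS
RETURN(
 SELECT ITM.ProductID
 , ITM.Name
 , (SELECT MAX(PCH.StandardCost)
    FROM [AdventureWorks2014].Production.ProductCostHistory PCH
    WHERE PCH.ProductID = ITM.ProductID
      AND PCH.StartDate <= @date) AS MaxStandardCostToDate
 FROM [AdventureWorks2014].Production.Product ITM)

-- testing the UDF
DECLARE @date as datetime
SET @date = GetDate()
SELECT *
FROM dbo.udfMaxCostByDate(@date)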

29 March 2009

🛢SQL Server: Parameterization (Definitions)

"Parameterization is the act of using named or positional markers in place of constant values in a T-SQL query or statement. The actual values are passed to SQL Server independently of the actual query." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

"The act of using parameters or parameter markers rather than constant values." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"Parameterization is the act of using named or positional markers in place of constant values in a T-SQL query or statement. The actual values are passed to SQL Server independently of the actual query." (Jay Natarajan et al, "Pro T-SQL 2012 Programmer's Guide" 3rd Ed., 2012)

"The act of using named or positional markers in place of constant values in a T-SQL query or statement. The actual values are passed to SQL Server independently of the actual query." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide" 4th Ed., 2015)

