20 May 2020

💼Project Management: Project Planning (Part V: Some Thoughts on Planning II)


A project’s dependency on the resources’ (average) utilization time (UT) and on the quality expectations expressed as a quality factor (QF) doesn’t come as a surprise, as hopefully one is acquainted with the project triangle, which reflects the dependency between scope, cost and time with respect to quality. Even if this dependency is intuitive, it’s difficult to express it in numbers and study the way it affects the project. That was the purpose of the model built previously.
From the respective model there are a few things to ponder. First, it’s a utopia to plan with 90% UT, unless one is really sure that the resources have enough work to bring the idle time close to zero. A single person can achieve maybe a 90% UT if they work alone on the project, though even then there are phases in which the input or feedback from other people is necessary. The more people are involved in the project and the higher the dependency between their activities, the higher the chances that the (average) UT will decrease considerably.
When in addition there’s also a geographical or organizational boundary between team members, the UT will decrease even more. In consequence, in big projects like ERP implementations the team members from customer and vendor side are allocated fully to the project; when this is not possible, then on the vendor side the consultants need to be involved in at least two projects to cover the idle time. Of course, with good planning, communication, and awareness of the work ahead one can try minimizing the idle time, though that’s less likely to happen.
Probably, a better idea would be to plan with 75% or even 60% UT, though the values depend on the team's experience in handling similar projects. If the team members are involved also in operational activities or other projects, then a 50% UT is more realistic.
Secondly, the previous post considered with respect to quality the 80%-20% rule, which applies to the various deliverables, though the rule has a punctual character. Taken on average, the rule is somewhat attenuated. Therefore, the model considered a spread of factors from 1 to 2, with a step of 0,25 for each 5% increase in quality (i.e. 80% → 1, 85% → 1,25, 90% → 1,5, 95% → 1,75, 100% → 2). It still needs to be proven whether the values are realistic and how much they depend on the project's characteristics.
On the other side, quality is difficult to quantify, and 100% quality is hypothetical. One discusses in theory about 3 sigma (the equivalent of 93,3% accuracy) or 4 sigma (99,4% accuracy) with respect to the number of errors found in the code, though from there on everything is fuzzy. In software projects each decision has the potential of leading to an error, and there’s a lot of room for interpretation as long as there’s no fixed basis against which to compare the deviations. One needs precise and correct specifications for that.
I think that one should target in a first phase 80% quality (on average) and further build from there, trying to improve the quality iteratively as the project goes on and as lessons are learned. In other words, a project plan, a concept or a design document doesn’t need to be perfect from the beginning, but should be good enough to allow working with it. One can detail them as progress is made in the project, and hopefully their quality will converge to a value that is acceptable for the business.
Thirdly, in case a planning tool was used, one can use the model backwards to roughly check the timeline’s feasibility, dividing the planned effort by the estimated effort and the number of resources involved to identify the implied utilization time.
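A minimal sketch of such a check in T-SQL, under the assumption that the implied UT results from dividing the quality-adjusted effort by the planned capacity (planned duration times the number of resources); the figures are hypothetical, reusing the example values from the previous post:

-- implied utilization time (hypothetical figures)
DECLARE @EE numeric(18,2) = 100 -- estimated effort [days]
DECLARE @QF numeric(18,2) = 1.25 -- quality factor (85% quality)
DECLARE @Resources numeric(18,2) = 4 -- number of resources
DECLARE @Duration numeric(18,2) = 50 -- planned duration [days]

-- implied UT [%] = quality-adjusted effort/planned capacity
SELECT Cast(@EE * @QF/(@Duration * @Resources) * 100 as numeric(18,1)) [Implied UT %]
-- returns 62.5, to be compared against what's realistically achievable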

💎SQL Reloaded: Query Patterns in SQL Server (Part V: Matrix Representation)

In a previous post on Project Management (PM) I used Excel to represent the data in a matrix format. The same output can be obtained by using SQL in a simple query based on a CROSS JOIN between two numeric intervals created via a CTE within a table-valued function. For this purpose I used an old piece of code which displays the values within an interval given a step – the typical loop with a specified step.
I made mainly two changes to the old example: I defined the start, end and step to be numeric, and added an index to the output table. Here’s the modified function:

-- SQL Server 2005+: Numeric Interval 
CREATE FUNCTION dbo.fGetNInterval( 
  @Start numeric(18,2)
, @End numeric(18,2)
, @Step numeric(18,2)) 
RETURNS @Interval TABLE( 
  Ind int PRIMARY KEY
, Val numeric(18,2)) 
AS 
BEGIN 
 -- direction of the iteration: +1 ascending, -1 descending (defaults to 1)
 DECLARE @Sign smallint 
 SET @Sign = IsNull(NullIf(Sign(@End-@Start), 0), 1); 

 -- build the interval recursively, moving from @Start towards @End by @Step
 WITH CTE(Ind, Val) 
 AS( 
   SELECT Cast(1 as int) Ind
   , Cast(COALESCE(@Start, @End, 0) as numeric(18,2)) Val 
   UNION ALL 
   SELECT Cast(Ind+1 as int) Ind
   , Cast(CTE.Val + @Step * @Sign as numeric(18,2)) Val 
   FROM CTE 
   WHERE (@End-CTE.Val)*@Sign-@Step>=0 -- continue only while the next value wouldn't pass @End
 ) 
 INSERT @Interval 
 SELECT Ind
 , Val 
 FROM CTE 
 OPTION(MAXRECURSION 0) 
 RETURN 
END 
As can be seen, the function can be used to step forward as well as backward:

--descending sequence 
SELECT UT.Val, 100/UT.Val
FROM dbo.fGetNInterval(100, 25, 5) UT

-- ascending sequence 
SELECT *
FROM dbo.fGetNInterval(1, 2, 0.25)
Based on these two sequences the matrix query can be written as follows:

-- matrix representation
DECLARE @EE as numeric(18,2) = 100
SELECT UT.Val [UT %]
, Max(CASE WHEN QF.Ind = 1 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [80%]
, Max(CASE WHEN QF.Ind = 2 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [85%]
, Max(CASE WHEN QF.Ind = 3 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [90%]
, Max(CASE WHEN QF.Ind = 4 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [95%]
, Max(CASE WHEN QF.Ind = 5 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [100%]
FROM dbo.fGetNInterval(100, 25, 5) UT
, dbo.fGetNInterval(1, 2, 0.25) QF
GROUP BY UT.Ind
, UT.Val

Here’s the output:
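An excerpt of the values returned (effort in days; the full output has one row for each UT value from 100% down to 25%, in steps of 5%):

UT %   80%     85%     90%     95%     100%
100    100.0   125.0   150.0   175.0   200.0
95     105.3   131.6   157.9   184.2   210.5
90     111.1   138.9   166.7   194.4   222.2
85     117.6   147.1   176.5   205.9   235.3
80     125.0   156.3   187.5   218.8   250.0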

Note:
The indexes were used to display the values in the needed format. The table-valued function could also be created without the respective indexes, however this would have complicated the query.

19 May 2020

💼Project Management: Project Planning (Part IV: Some Thoughts on Planning I)


One of the issues in Project Management (PM) planning is that the planner idealizes a resource and the activities performed by it, much like a machine. Unlike machines, whose uptime can approach 100%, a human resource can work at most 90% of the available time (aka utilization time, UT), the remaining 10% being typically associated with interruptions – internal emails and meetings, casual communication, pauses, etc. For resources split between projects or operations the utilization time can be at most 70%, though a realistic value lies in general between 40% and 60% on average. What does this mean for a project?
So, if a resource has a volume of work W, the amount of time needed to complete the work would be at best W/UT, where UT is the utilization time of the respective resource. “At best” because in each project there is additional idle time resulting from waste-related activities – waiting for sign-off, for information, for another resource to complete their work, etc.
The utilization time is not the only factor to consider. Depending on the case, the delivered work may reach on average maybe 80% of the expected quality. This applies to documentation and concepts as well as to written code, bug testing and other project activities. To reach the range of 100%, one will more likely need 4 times the effort associated with reaching 80% of the expected quality, though this value depends also on people’s professionalism and on the degree to which the requirements were understood and are actually achievable. Therefore, the values given can be regarded as “boundary” values.
Let’s consider a quality factor (QF) which has a value of 1 for 80% quality, with an increase of 0,25 for each 5% increase in quality, so that the actual effort results as the estimated effort multiplied by QF/UT. Thus, with an initial effort estimation of 100 days, this is how the resulting effort changes for various UT and QF values:
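A reconstructed excerpt of the matrix (effort in days, computed as 100 days × QF/UT, with the quality expectation on the columns and the UT on the rows):

UT %   80%     85%     90%     95%     100%
85     117,6   147,1   176,5   205,9   235,3
75     133,3   166,7   200,0   233,3   266,7
70     142,9   178,6   214,3   250,0   285,7
65     153,8   192,3   230,8   269,2   307,7
60     166,7   208,3   250,0   291,7   333,3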

Considering that a project can target between 60% and 95% UT, and between 80% and 95% quality, for an initial estimation of 100 days the actual project effort can range between 117 and 292 days, the values toward the lower rows, respectively the right columns of the matrix (i.e. the higher efforts) being the more realistic ones.
The model is simplistic as it doesn’t reflect the nonlinear aspect of the factors involved and the dependencies existing between them. It also doesn’t reflect the maturity of an organization to handle the projects and the tasks involved. However, it can be used to increase the awareness in how the utilization time and expected quality can affect a project’s timeline, and to check on whether one’s planning is realistic.
For example, at the project’s start one can target a UT of 70% and a quality of 85% (a QF of 1,25), which for 100 days of estimated effort results in about 178 days of actual effort (100 × 1,25/0,7). Now dividing the value by the number of resources involved, e.g. 4, it results that the project could be finished in about 44,5 days. This value can then be compared with the actual plan in which the activities are listed.
During the project it would be useful to look at how the UT changed and by how much, to understand the impact the change has on the project. For example, a decrease of 5% in utilization time can delay the project by 2,5 days, which is not much, though for a project of 1000 days we talk already about one month. Similarly, it will be helpful to check how much the quality deviated from the expectation, because a decrease in quality by 5% can result in an extra effort of 8 days, which for 1000 days would mean almost 4 months of delay.

📦Data Migrations (DM): In-house Built Solutions (Part I - An Introduction)


A Data Migration (DM) is the move of all or a subset of the data available from one or more system(s) into other system(s). For ERP systems this can be achieved by having a database in between, which allows the import and combination of data from various sources, the further processing of the data, respectively the preparation of the data as per the target system(s)’ needs. 

At a deeper level one needs to model the entities from the source and target systems at a level that allows easier processing, which comes down to removing and adding data, making mappings and transformations between the input and output models. In other words, a set of entities is transformed into another set of entities, while at a much lower level attributes from the source systems must be mapped to the target system. 

SQL scripting provides enough flexibility in modelling the source or target system(s), in changing the data and logic on the fly, in building additional functionality, in reusing logic or cleaning the data. It works well with big amounts of structured data, while for unstructured data one can still combine the power of SQL with ad-hoc or even NoSQL processing. It all sounds easy, doesn’t it? Then where does the complexity people talk about come from?

Knowing SQL and the features of a (relational) database (the how) is only a small part of all that is needed. Building a DM solution implies knowledge of the source data models and of the data behind them (the what), knowing the target (the where), how and when to import the data, why the data must be made available, in what way the data impact the target system(s), respectively who needs to be involved. The more of these aspects are known, the higher the chances for the DM to succeed. 

When there’s a gap in the needed knowledge, then the knowledge must be acquired from the business and/or consultants, by investing time in data discovery or similar activities, in experimenting and testing. The more uncertainty, the more iterations are needed to achieve the degree of certainty (aka quality) needed. At a minimum, 3-4 iterations are needed to migrate the data – to build the concept, to fully test the migration, respectively to get the sign-off in a test environment, and to do the migration into production. 

Another type of complexity derives from the interdependencies existing between the DM and vital activities like system parameterization and data cleaning. A system’s parameterization defines how the system behaves, an important number of the parameterization values being reflected also in the data prepared for the DM. Wrong parameterization values can be caught in the process during the various iterations, however poor data quality will be reflected throughout the system after Go-Live and will haunt the business in the long term, unless addressed correctly within the project. 

All these aspects are reflected to some degree also in the DM’s architecture (e.g. building restart points, including data validation and automatic cleaning) and process (e.g. sequencing of steps). From all the effort involved in a DM, maybe 20-30% needs to be invested in building the solution. These 20-30% can in theory be reduced by using specialized DM tools, though the advantage provided by such tools can be illusory, especially when further limitations are involved. On the other side, such tools may provide functionality that addresses the other 70-80%. 

As usual in IT, there’s a trade-off between advantages and disadvantages. An in-house built DM solution may provide the needed flexibility at the cost of a small time investment. Whether it pays off is something one has to prove by walking this path. 

17 May 2020

💎SQL Reloaded: Query Patterns in SQL Server (Part IV: Window Functions)

For a long time aggregate functions were the only tool for statistical purposes available in raw SQL scripting. Their limitations became more evident with the introduction of window functions, which allow applying the aggregates over a defined partition. Besides aggregate window functions, the use of ranking, respectively value window functions opened the door to a new set of techniques. Here are a few examples based on the tables defined in a previous post.

Ranking window functions allow ranking a record within the whole dataset or within a partition by providing a sorting key:

-- Ranking window functions (RANK, DENSE_RANK, ROW_NUMBER)
SELECT A.CourseId 
, C.CourseName 
, A.StudentId 
, S.StudentName 
, A.StartDate 
, A.EndDate 
, RANK() OVER (ORDER BY A.Mark) [Rank over whole set]
, DENSE_RANK() OVER (ORDER BY A.Mark) [Dense Rank over whole set]
, ROW_NUMBER() OVER (ORDER BY A.Mark) [Row Number over whole set]
--, RANK() OVER (PARTITION BY A.CourseId ORDER BY A.StartDate) [Rank over Course]
--, RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) [Rank over Student]
--, DENSE_RANK() OVER (PARTITION BY A.CourseId ORDER BY A.StartDate) [Dense Rank over Course]
--, DENSE_RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) [Dense Rank over Student]
--, ROW_NUMBER() OVER (PARTITION BY A.CourseId ORDER BY A.StartDate) [Row Number over Course]
--, ROW_NUMBER() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) [Row Number over Student]
FROM dbo.T_Allocations A
     JOIN dbo.T_Courses C
       ON A.CourseId = C.CourseId 
     JOIN dbo.T_Students S
       ON A.StudentId = S.StudentId 
ORDER BY C.CourseName 
, S.StudentName 

Being able to rank records makes it easier to select the last, respectively the first records from a dataset or a given partition:

-- first course per student
SELECT *
FROM dbo.T_Students S
     LEFT JOIN (
		SELECT A.CourseId
		, A.StudentId 
		, A.StartDate 
		, RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) Ranking 
		FROM dbo.T_Allocations A
    ) A
   ON S.StudentId = A.StudentId 
  AND A.Ranking = 1 
ORDER BY S.StudentName

The aggregate window functions behave similarly to their simple counterparts, except that they apply over a partition. If the partition is left out, the functions apply over the whole dataset.

-- aggregate window functions 
SELECT A.CourseId 
, C.CourseName 
, A.StudentId 
, S.StudentName 
, A.StartDate 
, A.EndDate 
, SUM(A.Mark) OVER (PARTITION BY A.StudentId) [Sum]
, MAX(A.Mark) OVER (PARTITION BY A.StudentId) [Max]
, MIN(A.Mark) OVER (PARTITION BY A.StudentId) [Min]
, AVG(A.Mark) OVER (PARTITION BY A.StudentId) [Avg]
, COUNT(A.Mark) OVER (PARTITION BY A.StudentId) [Count]
FROM dbo.T_Allocations A
     JOIN dbo.T_Courses C
       ON A.CourseId = C.CourseId 
     JOIN dbo.T_Students S
       ON A.StudentId = S.StudentId 
ORDER BY C.CourseName 
, S.StudentName 

When the ordering attributes are specified, running averages, respectively running sums, are derived:

-- Running averages/sums and ranking via Sum and ORDER BY
SELECT A.StudentId 
, A.CourseId
, A.StartDate
, A.EndDate
, A.Mark 
, AVG(Cast(Mark as decimal(4,2))) OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) AvgGrade
, SUM(Mark) OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) SumGrade
, RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) Ranking
FROM dbo.T_Allocations A
ORDER BY A.StudentId
, A.StartDate
Running aggregates like the above (aggregate functions with an ORDER BY in the OVER clause) were introduced only with SQL Server 2012. Previously, one could still calculate them using common table expressions and a ranking function:
-- Running averages/sums and ranking via common table expressions 
WITH CTE AS (
	SELECT A.StudentId 
	, A.CourseId
	, A.StartDate
	, A.EndDate
	, A.Mark 
	, RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) Ranking
	FROM dbo.T_Allocations A
), 
DAT AS (
	SELECT A.StudentId 
	, A.CourseId
	, A.StartDate
	, A.EndDate
	, A.Mark 
	, A.Ranking
	, A.Mark SumMark
	FROM CTE A
	WHERE A.Ranking = 1
	UNION ALL
	SELECT CTE.StudentId 
	, CTE.CourseId
	, CTE.StartDate
	, CTE.EndDate
	, CTE.Mark 
	, CTE.Ranking
	, CTE.Mark + DAT.SumMark SumMark -- accumulate the running sum
	FROM CTE
		 JOIN DAT
		   ON CTE.StudentID = DAT.StudentId 
		  AND CTE.Ranking-1 = DAT.Ranking 
) 
SELECT DAT.StudentId 
, DAT.CourseId
, DAT.StartDate
, DAT.EndDate
, DAT.Mark 
, Cast(Cast(DAT.SumMark as decimal(8,2))/DAT.Ranking as decimal(4,2)) AvgMark -- cast before dividing to avoid integer truncation
, DAT.SumMark
, DAT.Ranking
FROM DAT
ORDER BY DAT.StudentId
, DAT.StartDate
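
Value window functions weren’t exemplified above. Here’s a minimal sketch based on the same tables: LAG, respectively LEAD return an attribute’s value from the previous, respectively the next record of the partition (both functions were introduced with SQL Server 2012):

-- value window functions (LAG, LEAD) - SQL Server 2012+
SELECT A.StudentId 
, A.CourseId
, A.StartDate
, A.Mark 
, LAG(A.Mark) OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) PrevMark -- mark of the previous course
, LEAD(A.Mark) OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) NextMark -- mark of the next course
, A.Mark - LAG(A.Mark) OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) MarkChange -- evolution against the previous course
FROM dbo.T_Allocations A
ORDER BY A.StudentId
, A.StartDate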
   

💎SQL Reloaded: Query Patterns in SQL Server (Part III: Aggregate Functions)

As their name denotes, aggregate functions are used to summarize information across the whole dataset or based on several attributes. The following queries exemplify the use of aggregates based on the tables created in a previous post:

-- aggregates over the whole table 
SELECT COUNT(*) NoRecords 
, MIN(StartDate) StartDate
, MAX(EndDate) EndDate
, DateDiff(d, MIN(StartDate), MAX(EndDate)) [DiffDays]
FROM dbo.T_Allocations A

-- aggregates based on several attributes 
 SELECT A.CourseId
 , A.StudentId 
 , COUNT(*) NoRecords 
 , MIN(StartDate) StartDate
 , MAX(EndDate) EndDate
 , DateDiff(d, MIN(StartDate), MAX(EndDate)) [DiffDays]
 FROM dbo.T_Allocations A
GROUP BY A.CourseId
 , A.StudentId 

Correlated subqueries can be of help either inside a SELECT or in a CROSS, respectively OUTER APPLY: 

-- correlated subquery within a SELECT 
 SELECT *
 , (SELECT COUNT(*) NoRecords 
     FROM dbo.T_Allocations A
     WHERE C.CourseId = A.CourseId
     GROUP BY A.CourseId) NoRecords 
 FROM dbo.T_Courses C


 -- CROSS APPLY (correlated subquery)
 SELECT *
 , A.NoRecords
 , A.StartDate
 FROM dbo.T_Courses C
      CROSS APPLY (
         SELECT A.CourseId
         , COUNT(*) NoRecords 
         , MIN(StartDate) StartDate
         FROM dbo.T_Allocations A
         WHERE C.CourseId = A.CourseId
         GROUP BY A.CourseId
     ) A
 ORDER BY C.CourseName

Unlike the CROSS APPLY, which behaves like an inner join and drops the courses without allocations, the OUTER APPLY keeps them, the aggregated values being NULL:

-- OUTER APPLY (correlated subquery)
 SELECT *
 , A.NoRecords
 , A.StartDate
 FROM dbo.T_Courses C
      OUTER APPLY (
         SELECT A.CourseId
         , COUNT(*) NoRecords 
         , MIN(StartDate) StartDate
         FROM dbo.T_Allocations A
         WHERE C.CourseId = A.CourseId
         GROUP BY A.CourseId
     ) A
 ORDER BY C.CourseName

Aggregates are useful in a series of techniques like the listing of duplicates:

-- identifying duplicates via an aggregate 
SELECT A.*
FROM dbo.T_Allocations A
     JOIN (
  SELECT A.CourseId
  , A.StudentId 
  , COUNT(*) NoRecords 
  FROM dbo.T_Allocations A
  GROUP BY A.CourseId
  , A.StudentId 
  HAVING COUNT(*)>1
  ) DUP
   ON A.CourseId = DUP.CourseId 
  AND A.StudentId = DUP.Studentid
ORDER BY A.CourseId
, A.StudentId 

A similar technique can be used to remove the duplicates from the base table:

-- removing the duplicates except the last record  
 DELETE dbo.T_Allocations 
 FROM (
 SELECT A.StudentId 
 , A.CourseId
 , A.StartDate
 , A.EndDate
 , Max(A.AllocationId) AllocationId
 FROM dbo.T_Allocations A
 GROUP BY A.StudentId 
 , A.CourseId
 , A.StartDate
 , A.EndDate
 HAVING count(*)>1
  ) A
WHERE dbo.T_Allocations.CourseId = A.CourseId
  AND dbo.T_Allocations.StudentId = A.StudentId
  AND dbo.T_Allocations.StartDate = A.StartDate
  AND dbo.T_Allocations.EndDate = A.EndDate
  AND dbo.T_Allocations.AllocationId <> A.AllocationId

When the number of values is fixed and unique within a grouping, one can use aggregate functions to display the values within a matrix:

-- Matrix display via aggregates 
SELECT S.StudentId 
, S.StudentName 
, CASE WHEN A.Course1 = 1 THEN 'x' ELSE '-' END Course1
, CASE WHEN A.Course2 = 1 THEN 'x' ELSE '-' END Course2
, CASE WHEN A.Course3 = 1 THEN 'x' ELSE '-' END Course3
, CASE WHEN A.Course4 = 1 THEN 'x' ELSE '-' END Course4
, CASE WHEN A.Course5 = 1 THEN 'x' ELSE '-' END Course5
FROM dbo.T_Students S
     LEFT JOIN (
  SELECT A.StudentId 
  , Max(CASE WHEN A.CourseId = 1 THEN 1 ELSE 0 END) Course1
  , Max(CASE WHEN A.CourseId = 2 THEN 1 ELSE 0 END) Course2
  , Max(CASE WHEN A.CourseId = 3 THEN 1 ELSE 0 END) Course3
  , Max(CASE WHEN A.CourseId = 4 THEN 1 ELSE 0 END) Course4
  , Max(CASE WHEN A.CourseId = 5 THEN 1 ELSE 0 END) Course5
  FROM dbo.T_Allocations A
  GROUP BY A.StudentId
    ) A
   ON S.StudentId = A.StudentId 
ORDER BY S.StudentName 
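
The same kind of matrix can alternatively be obtained via the PIVOT operator. Here’s a minimal sketch (kept to the StudentId, without the join to dbo.T_Students), assuming the five CourseId values are known in advance:

-- matrix display via PIVOT
SELECT P.StudentId 
, CASE WHEN P.[1] >= 1 THEN 'x' ELSE '-' END Course1
, CASE WHEN P.[2] >= 1 THEN 'x' ELSE '-' END Course2
, CASE WHEN P.[3] >= 1 THEN 'x' ELSE '-' END Course3
, CASE WHEN P.[4] >= 1 THEN 'x' ELSE '-' END Course4
, CASE WHEN P.[5] >= 1 THEN 'x' ELSE '-' END Course5
FROM (
  SELECT A.StudentId 
  , A.CourseId 
  , A.AllocationId -- counted per pivoted CourseId
  FROM dbo.T_Allocations A
 ) DAT
 PIVOT (COUNT(AllocationId) FOR CourseId IN ([1], [2], [3], [4], [5])) P
ORDER BY P.StudentId 

The aggregate-based variant scales more naturally when further attributes come from a join, while the PIVOT keeps the query more compact.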

16 May 2020

💎SQL Reloaded: Query Patterns in SQL Server (Part II: Joins and Subqueries)

The basis for being able to manipulate data via SQL scripting is a good knowledge of joins and subqueries, used as fit for the purpose and the data model. The following scripts are based on the tables created in a previous post.

-- INNER JOIN
SELECT *
FROM dbo.T_Courses C
     JOIN dbo.T_Students S
        ON C.CourseId = S.CourseId
 ORDER BY C.CourseName
 , S.StudentName 

-- INNER JOIN (old-style syntax, discouraged)
 SELECT *
 FROM dbo.T_Courses C
 , dbo.T_Students S
 WHERE C.CourseId = S.CourseId
 ORDER BY C.CourseName
 , S.StudentName 

-- LEFT JOIN (courses without students)
 SELECT *
 FROM dbo.T_Courses C
      LEFT JOIN dbo.T_Students S
        ON C.CourseId = S.CourseId 
 WHERE S.CourseId IS NULL
 ORDER BY C.CourseName
 , S.StudentName 

-- RIGHT JOIN (students without a matching course)
 SELECT *
 FROM dbo.T_Courses C
      RIGHT JOIN dbo.T_Students S
       ON C.CourseId = S.CourseId 
 WHERE C.CourseId IS NULL
 ORDER BY C.CourseName
 , S.StudentName 

-- FULL OUTER JOIN
 SELECT *
 FROM dbo.T_Courses C
      FULL OUTER JOIN dbo.T_Students S
        ON C.CourseId = S.CourseId 
 --WHERE C.CourseId IS NULL
 --WHERE S.CourseId IS NULL
 --WHERE (C.CourseId IS NULL OR S.StudentId IS NULL)
 --WHERE (C.CourseId IS NULL AND S.StudentId IS NULL)
 ORDER BY C.CourseName
 , S.StudentName 

IN, NOT IN, EXISTS and NOT EXISTS allow using correlated subqueries, their use being indicated when no actual data are needed from the tables involved in the correlated subqueries:

-- EXISTS (correlated subquery)
SELECT *
FROM dbo.T_Courses C
WHERE EXISTS (SELECT StudentId 
        FROM dbo.T_Students S
        WHERE C.CourseId = S.CourseId)
ORDER BY C.CourseName

-- NOT EXISTS (correlated subquery)
 SELECT *
 FROM dbo.T_Courses C
 WHERE NOT EXISTS (SELECT StudentId 
        FROM dbo.T_Students S
        WHERE C.CourseId = S.CourseId)
 ORDER BY C.CourseName

-- IN (subquery)
 SELECT *
 FROM dbo.T_Courses C
 WHERE CourseId IN (SELECT CourseId 
        FROM dbo.T_Students S)
 ORDER BY C.CourseName
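
Its NOT IN counterpart returns the courses without students. A small caution: if the subquery returns a NULL, NOT IN yields no records at all, hence the guard in the sketch below:

-- NOT IN (subquery)
 SELECT *
 FROM dbo.T_Courses C
 WHERE CourseId NOT IN (SELECT CourseId 
        FROM dbo.T_Students S
        WHERE S.CourseId IS NOT NULL)
 ORDER BY C.CourseName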

Joining multiple tables is done using the same principles as above:

-- joins with more tables 
SELECT A.CourseId 
, C.CourseName 
, A.StudentId 
, S.StudentName 
 , A.StartDate 
 , A.EndDate 
FROM dbo.T_Allocations A
      JOIN dbo.T_Courses C
        ON A.CourseId = C.CourseId 
      JOIN dbo.T_Students S
        ON A.StudentId = S.StudentId 
 ORDER BY C.CourseName 
 , S.StudentName 

One can obtain the same result via correlated subqueries (a technique often met among Oracle developers). For readability reasons I avoid writing such queries, unless there’s a special purpose for doing so.

-- correlated subquery for individual values
SELECT A.CourseId 
, (SELECT C.CourseName 
    FROM dbo.T_Courses C
    WHERE A.CourseId = C.CourseId) CourseName
 , A.StudentId 
 , (SELECT S.StudentName 
    FROM dbo.T_Students S
    WHERE A.StudentId = S.StudentId) StudentName
 , A.StartDate 
 , A.EndDate 
FROM dbo.T_Allocations A
 ORDER BY CourseName 
 , StudentName 

When displaying values within a SELECT via correlated subqueries, some developers feel the need to use the MAX or MIN functions to make sure only one value is returned. For data analysis this may be acceptable, however if the data model imposes it, then a redesign of the solution is more likely necessary.

-- correlated subquery for individual values
 SELECT A.CourseId 
 , (SELECT Max(C.CourseName)
    FROM dbo.T_Courses C
    WHERE A.CourseId = C.CourseId) CourseName
 , A.StudentId 
 , (SELECT Max(S.StudentName)
    FROM dbo.T_Students S
    WHERE A.StudentId = S.StudentId) StudentName
 , A.StartDate 
 , A.EndDate 
 FROM dbo.T_Allocations A
 ORDER BY CourseName 
 , StudentName 

Another technique not recommended is displaying one or more attributes from the same table with the same conditions via individual correlated subqueries. The use of aggregate functions in such subqueries is however more appropriate for numerical or date values.
