30 July 2010

Self-Join in Update Query - Update

    In yesterday’s post on the same topic I tried to exemplify how a self-join update query could be written in order to avoid an error raised by the database engine. Even if this type of query is normally written for hierarchical structures consisting of one-level parent-child relations stored in the same table, my example was using a simple join based directly on the primary key. During the day I was thinking that might be straightforward to create a simple hierarchical structure based on a relatively realistic example. For this let’s consider a department in which the performance of the the manager is a function of the performance of its employees, while the number of hours of training allocated to each employee is proportional to the number of hours allocated to its manager. For this let’s consider the following table:

--Employee table's script
CREATE TABLE [dbo].[Employees](
[EmployeeID] [int]  NOT NULL

, [ManagerID] [int]  NULL
, [Name] [nvarchar] (50) NULL
, [Performance] [numeric]  (5, 3) NULL
, [TrainingHours] [numeric]  (5, 2) NULL )
ON [PRIMARY]

-- inserting the test data
INSERT INTO dbo.Employees

VALUES
(1, NULL, 'Joe', NULL, 10
)
,
(2, 1, 'Jack', .65,
NULL)
,
(3, 1, 'Mary', .45,
NULL)
,
(4, 1, 'Ross', .54,
NULL)
,
(5, 1, 'Scott', .48,
NULL)
,
(6, NULL, 'Jane', NULL, 15
)
,
(7, 6, 'Sam', .50,
NULL)
,
(8, 6, 'Ron', .45,
NULL)

    In order to calculate Manager’s performance we could use a query similar with the following, of course the formula could be more complicated and not a simple average:
-- updating Manager's Performance
UPDATE dbo.Employees

SET
Performance = EMP.Performance

FROM
( -- inline view

     SELECT ManagerID
     , AVG(Performance) Performance
     FROM dbo.Employees
     WHERE ManagerID IS NOT NULL
     GROUP BY ManagerID
) EMP

WHERE
dbo.Employees.ManagerID IS NULL

AND dbo.Employees.EmployeeID = EMP.ManagerID

    Let’s check the updates and their correctitude:
-- Checking updated data
SELECT *

FROM
dbo.Employees

WHERE
ManagerID IS NULL

-- Verifying output
SELECT ManagerID

,
AVG(Performance) Performance

FROM
dbo.Employees

WHERE
ManagerID IS NOT NULL

GROUP
BY ManagerID

self-join output 1
Note:
    Please note that the average needs to be calculated in the inline view. In case are retrieved more records for each record from the updated table, the query will still work though the result is “unpredictable”:

-- updating Manager's Performance w/o aggregates
UPDATE dbo.Employees

SET
Performance = EMP.Performance

FROM
( -- inline view

     SELECT ManagerID
     , Performance
     FROM dbo.Employees
     WHERE ManagerID IS NOT NULL
) EMP WHERE dbo.Employees.ManagerID IS NULL
AND dbo.Employees.EmployeeID = EMP.ManagerID

    The reverse update based on the number of hours of training could be written as follows:
-- updating Employees' Training Hours
UPDATE dbo.Employees

SET
TrainingHours = 0.75 * EMP.TrainingHours

FROM
( -- inline view

    SELECT EmployeeID
    , TrainingHours
    FROM dbo.Employees
   WHERE ManagerID IS NULL
) EMP

WHERE
dbo.Employees.ManagerID IS NOT NULL

AND dbo.Employees.ManagerID = EMP.EmployeeID

    Let’s check the changes:
 
-- Checking updated data
SELECT *
FROM
dbo.Employees

WHERE
ManagerID IS NOT NULL

-- Verifying output
SELECT EmployeeID

,
0.75 * TrainingHours TrainingHours

FROM
dbo.Employees

WHERE
ManagerID IS NULL

  
self-join output 2

No comments: