30 July 2010

SQL Reloaded: Self-Join in Update Query II

    In yesterday’s post on the same topic I tried to exemplify how a self-join update query could be written in order to avoid an error raised by the database engine. Even if this type of query is normally written for hierarchical structures consisting of one-level parent-child relations stored in the same table, my example was using a simple join based directly on the primary key. During the day I was thinking that might be straightforward to create a simple hierarchical structure based on a relatively realistic example. For this let’s consider a department in which the performance of the the manager is a function of the performance of its employees, while the number of hours of training allocated to each employee is proportional to the number of hours allocated to its manager. For this let’s consider the following table:  

--Employee table's script 
CREATE TABLE [dbo].[Employees]( 
[EmployeeID] [int]  NOT NULL 
, [ManagerID] [int]  NULL 
, [Name] [nvarchar] (50) NULL 
, [Performance] [numeric]  (5, 3) NULL 
, [TrainingHours] [numeric]  (5, 2) NULL ) 
ON [PRIMARY] 

-- inserting the test data 
INSERT INTO dbo.Employees 
VALUES (1, NULL, 'Joe', NULL, 10) 
, (2, 1, 'Jack', .65, NULL) 
, (3, 1, 'Mary', .45, NULL) 
, (4, 1, 'Ross', .54, NULL) 
, (5, 1, 'Scott', .48, NULL) 
, (6, NULL, 'Jane', NULL, 15) 
, (7, 6, 'Sam', .50, NULL) 
, (8, 6, 'Ron', .45, NULL) 

   In order to calculate Manager’s performance we could use a query similar with the following, of course the formula could be more complicated and not a simple average:  

-- updating Manager's Performance 
UPDATE dbo.Employees 
SET Performance = EMP.Performance 
FROM ( -- inline view 
     SELECT ManagerID 
     , AVG(Performance) Performance 
     FROM dbo.Employees 
     WHERE ManagerID IS NOT NULL 
     GROUP BY ManagerID 
) EMP 
WHERE dbo.Employees.ManagerID IS NULL 
AND dbo.Employees.EmployeeID = EMP.ManagerID 

    Let’s check the updates and their correctitude: 


-- Checking updated data 
SELECT * 
FROM dbo.Employees 
WHERE ManagerID IS NULL -- Verifying output SELECT ManagerID 
, AVG(Performance) Performance 
FROM dbo.Employees 
WHERE ManagerID IS NOT NULL 
GROUP BY ManagerID       

self-join output 1      

Note:     
    The average needs to be calculated in the inline view. In case are retrieved more records for each record from the updated table, the query will still work though the result is “unpredictable”:
 


-- updating Manager's Performance w/o aggregates 
UPDATE dbo.Employees 
SET Performance = EMP.Performance 
FROM ( -- inline view 
     SELECT ManagerID 
     , Performance 
     FROM dbo.Employees 
     WHERE ManagerID IS NOT NULL 
) EMP WHERE dbo.Employees.ManagerID IS NULL 
AND dbo.Employees.EmployeeID = EMP.ManagerID       

      
    The reverse update based on the number of hours of training could be written as follows: 
 

-- updating Employees' Training Hours 
UPDATE dbo.Employees 
SET TrainingHours = 0.75 * EMP.TrainingHours 
FROM ( -- inline view 
    SELECT EmployeeID 
    , TrainingHours 
    FROM dbo.Employees 
   WHERE ManagerID IS NULL 
) EMP 
WHERE dbo.Employees.ManagerID IS NOT NULL 
AND dbo.Employees.ManagerID = EMP.EmployeeID       
      
-- Checking updated data 
SELECT * 
FROM dbo.Employees 
WHERE ManagerID IS NOT NULL -- Verifying output SELECT EmployeeID 
, 0.75 * TrainingHours TrainingHours 
FROM dbo.Employees 
WHERE ManagerID IS NULL   

  
self-join output 2     



No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.