SQL Troubles: merge

Showing posts with label merge. Show all posts

16 May 2020

💎🏭SQL Reloaded: Query Patterns in SQL Server (Part I: Inserts)

Independently on whether one is a database developer, administrator, or any other professional in the data area, as part of the job is important to be able to manipulate data on the fly. From the availability of DTS (Data Transformation Services) replaced since SQL Server 2005 with SSIS (SQL Server Integration Services), the use of packages for data import become one of the simplest and fastest ways of importing data into a database. It still is, however when dealing with small volume of data, it’s easier to import the data over a few SQL scripts, like in the below examples.
The old-fashion way of loading data is by defining first the target table based on the source’s definition:

-- Course data 
--DROP TABLE IF EXISTS dbo.T_Courses 

CREATE TABLE [dbo].[T_Courses](
     [CourseId] [int] NOT NULL,
     [CourseName] [varchar](50) NOT NULL
 ) ON [PRIMARY]

-- old-fashioned invidual inserts(SQL Server 2000)
INSERT INTO dbo.T_Courses
VALUES (1, 'Course 1')

INSERT INTO dbo.T_Courses(CourseId, CourseName)
VALUES (2, 'Course 2')

-- inserting multiple rows to a table at once (SQL Server 2005)
INSERT INTO dbo.T_Courses
--OUTPUT inserted.CourseId, inserted.CourseName -- 
VALUES (3, 'Course 3')
 , (4, 'Course 4')
 , (5, 'Course 5')
 , (6, 'Course 6')
 , (7, 'Course 7')

 SELECT *
 FROM dbo.T_Courses

Sometimes the definition of the table is not important. With SQL Server 2008 it was possible to create the table on the fly, and thus the inserts can be simplified as follows:

-- Course data 
--DROP TABLE IF EXISTS dbo.T_Courses 

SELECT *
INTO dbo.T_Courses
FROM (
  VALUES (1, 'Course 1')
 , (2, 'Course 2')
 , (3, 'Course 3')
 , (4, 'Course 4')
 , (5, 'Course 5')
 , (6, 'Course 6')
 , (7, 'Course 7')
 ) C (CourseId, CourseName)


 SELECT *
 FROM dbo.T_Courses

Now, to demonstrate further techniques it is useful to define besides Courses also a Students, respectively an Allocation table, the later storing the relation between a Course and a Student. As can be seen there is in the Students table a reference to the Courses, showing the course in which the student is enrolled currently The model is only for demonstration purposes:

-- Student data with current courses
-- DROP TABLE IF EXISTS dbo.T_Students

SELECT StudentId
 , StudentName
 , CourseId
 , Cast(StartDate as Date) StartDate
 , Cast(EndDate as Date) EndDate
 INTO dbo.T_Students
 FROM (
  VALUES (1, 'Student 1', 1, '20170201', '20170228')
 , (2, 'Student 2', 1, '20170201', '20170228')
 , (3, 'Student 3', 2, '20170201', '20170228')
 , (4, 'Student 4', 3, '20170201', '20170228')
 , (5, 'Student 5', NULL, NULL, NULL)
 , (6, 'Student 6', NULL, NULL, NULL)
 , (7, 'Student 7', NULL, NULL, NULL)
 ) S (StudentId, StudentName, CourseId, StartDate, EndDate)


 SELECT *
 FROM dbo.T_Students

-- Course allocations 
-- DROP TABLE IF EXISTS dbo.T_Allocations

SELECT CourseId
 , StudentId
 , Cast(StartDate as Date) StartDate
 , Cast(EndDate as Date) EndDate
 , Hours
 , Mark 
 INTO dbo.T_Allocations
 FROM (
  VALUES  (1, 1, '20170101', '20170131', 4, 1)
 , (1, 2, '20170101', '20170131', 5, 2)
 , (1, 3, '20170101', '20170131', 12, 3)
 , (2, 1, '20170201', '20170128', 23, 1)
 , (2, 3, '20170201', '20170128', 12, 3)
 , (2, 4, '20170201', '20170128', 21, 1)
 , (1, 5, '20170201', '20170128', 12, 1)
 , (4, 1, '20170101', '20170131', 5, 2)
 , (4, 1, '20170101', '20170131', 4, 2)
 ) A(CourseId, StudentId, StartDate, EndDate, Hours, Mark)

ALTER TABLE dbo.T_Allocations 
ADD AllocationId int IDENTITY(1,1) 

SELECT *
FROM dbo.T_Allocations

With this the data model is ready for use. By the initial design the courses in which the Students are currents enrolled were not supposed to be into the Allocations table. The following query repairs this:

-- insert data from query
INSERT INTO dbo.T_Allocations (CourseId, StudentId, StartDate, EndDate, Hours)
SELECT C.CourseId
, C.StudentId
, C.StartDate
 , C.EndDate
 , 0 Hours
 FROM dbo.T_Students C
      LEFT JOIN dbo.T_Allocations A
        ON C.CourseId = A.CourseId
       AND C.StudentId = A.Studentid 
 WHERE A.CourseId IS NULL
   AND C.CourseId IS NOT NULL

If is needed to append more data, one can modify the corresponding query from above as follows:

-- insert data from values with check against existing data 
INSERT INTO dbo.T_Allocations
SELECT DAT.CourseId
 , DAT.StudentId
 , Cast(DAT.StartDate as Date) StartDate
 , Cast(DAT.EndDate as Date) EndDate
 , DAT.Hours
 , DAT.Mark 
 FROM (
  VALUES  (5, 5, '20170101', '20170131', 4, 2)
 , (5, 5, '20170101', '20170131', 5, 3)
 , (5, 5, '20170101', '20170131', 12, 4)
 , (4, 1, '20170101', '20170131', 4, 2) --already exists 
 ) DAT(CourseId, StudentId, StartDate, EndDate, Hours, Mark)
   LEFT JOIN dbo.T_Allocations A
     ON DAT.CourseId = A.CourseId
    AND DAT.StudentId = A.Studentid 
 WHERE A.CourseId IS NULL

Starting with the 2008 version, a MERGE statement was introduced in SQL Server, which allows adding further behavior within the same statement for INSERTS, UPDATES or DELETES. The above insert based on a MERGE becomes:

-- insert data via a merge
 MERGE INTO dbo.T_Allocations A USING
 (
     SELECT DAT.CourseId
     , DAT.StudentId
     , Cast(DAT.StartDate as Date) StartDate
     , Cast(DAT.EndDate as Date) EndDate
     , DAT.Hours
     , DAT.Mark 
     FROM (
      VALUES  (5, 5, '20170101', '20170131', 4, 2)
     , (5, 5, '20170101', '20170131', 5, 3)
     , (5, 5, '20170101', '20170131', 12, 4)
     , (4, 1, '20170101', '20170131', 4, 2) --already exists 
     ) DAT(CourseId, StudentId, StartDate, EndDate, Hours, Mark)
 ) DAT
   ON DAT.CourseId = A.CourseId
  AND DAT.StudentId = A.Studentid 
 WHEN NOT MATCHED THEN 
 INSERT (CourseId, StudentId, StartDate, EndDate, Hours, Mark)
 VALUES (DAT.CourseId, DAT.StudentId, DAT.StartDate, DAT.EndDate, DAT.Hours, DAT.Mark);

These are the types of scripts I typically use to insert data into a database, and this independently of tables’ size. For more check Microsoft’s documentation on INSERT. The preparation of the data to be imported can be easily done in Excel with a few macros.

Notes:

The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

Previous Post <<||>> Next Post

17 April 2009

🛢DBMS: Merge Replication (Definitions)

"A type of replication that allows sites to make autonomous changes to replicated data and, at a later time, merge changes made at all sites. Merge replication does not guarantee transactional consistency." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A type of replication that allows sites to make autonomous changes to replicated data, and at a later time, merge changes and resolve conflicts when necessary." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"The process of transferring data from the Publisher to the Subscriber, allowing the Publisher and Subscriber to update data while connected or disconnected and then merging the updates after they both are connected. Merge replication begins with a snapshot. Thereafter, no data is replicated until the Publisher and Subscriber do a "merge." The merge can be scheduled or done via an ad-hoc request. Merge replication's main benefit is that it supports subscribers who are not on the network much of the time. Transactions, which are committed, however, may be rolled back as the result of conflict resolution." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"A replication type that relies on DML operations being captured from both the published database and the subscriber database(s) and automatically synchronized. Typically starts with a snapshot of the publication. Uses triggers to track subsequent changes made at either the publisher or subscriber." (Victor Isakov et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance (70-444) Study Guide", 2007)

"A replication strategy used when multiple subscribers are also acting as publishers. In other words, the data is updated from multiple sources." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"A type of replication that allows sites to make autonomous changes to replicated data, and at a later time, merge changes and resolve conflicts when necessary." (Microsoft Technet)

"Merge replication is a method for copying and distributing data and database objects from one SQL Server database to another followed by synchronizing the databases for consistency." (Idera) [source]

SQL Troubles

Pages

16 May 2020

💎🏭SQL Reloaded: Query Patterns in SQL Server (Part I: Inserts)

17 April 2009

🛢DBMS: Merge Replication (Definitions)

About Me