Independently on whether one is a database developer, administrator, or any other professional in the data area, as part of the job is important to be able to manipulate data on the fly. From the availability of DTS (Data Transformation Services) replaced since SQL Server 2005 with SSIS (SQL Server Integration Services), the use of packages for data import become one of the simplest and fastest ways of importing data into a database. It still is, however when dealing with small volume of data, it’s easier to import the data over a few SQL scripts, like in the below examples.
The old-fashion way of loading data is by defining first the target table based on the source’s definition:
-- Course data
--DROP TABLE IF EXISTS dbo.T_Courses
CREATE TABLE [dbo].[T_Courses](
[CourseId] [int] NOT NULL,
[CourseName] [varchar](50) NOT NULL
) ON [PRIMARY]
-- old-fashioned invidual inserts(SQL Server 2000)
INSERT INTO dbo.T_Courses
VALUES (1, 'Course 1')
INSERT INTO dbo.T_Courses(CourseId, CourseName)
VALUES (2, 'Course 2')
-- inserting multiple rows to a table at once (SQL Server 2005)
INSERT INTO dbo.T_Courses
--OUTPUT inserted.CourseId, inserted.CourseName --
VALUES (3, 'Course 3')
, (4, 'Course 4')
, (5, 'Course 5')
, (6, 'Course 6')
, (7, 'Course 7')
SELECT *
FROM dbo.T_Courses
Sometimes the definition of the table is not important. With SQL Server 2008 it was possible to create the table on the fly, and thus the inserts can be simplified as follows:
-- Course data
--DROP TABLE IF EXISTS dbo.T_Courses
SELECT *
INTO dbo.T_Courses
FROM (
VALUES (1, 'Course 1')
, (2, 'Course 2')
, (3, 'Course 3')
, (4, 'Course 4')
, (5, 'Course 5')
, (6, 'Course 6')
, (7, 'Course 7')
) C (CourseId, CourseName)
SELECT *
FROM dbo.T_Courses
Now, to demonstrate further techniques it is useful to define besides Courses also a Students, respectively an Allocation table, the later storing the relation between a Course and a Student. As can be seen there is in the Students table a reference to the Courses, showing the course in which the student is enrolled currently The model is only for demonstration purposes:
-- Student data with current courses
-- DROP TABLE IF EXISTS dbo.T_Students
SELECT StudentId
, StudentName
, CourseId
, Cast(StartDate as Date) StartDate
, Cast(EndDate as Date) EndDate
INTO dbo.T_Students
FROM (
VALUES (1, 'Student 1', 1, '20170201', '20170228')
, (2, 'Student 2', 1, '20170201', '20170228')
, (3, 'Student 3', 2, '20170201', '20170228')
, (4, 'Student 4', 3, '20170201', '20170228')
, (5, 'Student 5', NULL, NULL, NULL)
, (6, 'Student 6', NULL, NULL, NULL)
, (7, 'Student 7', NULL, NULL, NULL)
) S (StudentId, StudentName, CourseId, StartDate, EndDate)
SELECT *
FROM dbo.T_Students
-- Course allocations
-- DROP TABLE IF EXISTS dbo.T_Allocations
SELECT CourseId
, StudentId
, Cast(StartDate as Date) StartDate
, Cast(EndDate as Date) EndDate
, Hours
, Mark
INTO dbo.T_Allocations
FROM (
VALUES (1, 1, '20170101', '20170131', 4, 1)
, (1, 2, '20170101', '20170131', 5, 2)
, (1, 3, '20170101', '20170131', 12, 3)
, (2, 1, '20170201', '20170128', 23, 1)
, (2, 3, '20170201', '20170128', 12, 3)
, (2, 4, '20170201', '20170128', 21, 1)
, (1, 5, '20170201', '20170128', 12, 1)
, (4, 1, '20170101', '20170131', 5, 2)
, (4, 1, '20170101', '20170131', 4, 2)
) A(CourseId, StudentId, StartDate, EndDate, Hours, Mark)
ALTER TABLE dbo.T_Allocations
ADD AllocationId int IDENTITY(1,1)
SELECT *
FROM dbo.T_Allocations
With this the data model is ready for use. By the initial design the courses in which the Students are currents enrolled were not supposed to be into the Allocations table. The following query repairs this:
-- insert data from query
INSERT INTO dbo.T_Allocations (CourseId, StudentId, StartDate, EndDate, Hours)
SELECT C.CourseId
, C.StudentId
, C.StartDate
, C.EndDate
, 0 Hours
FROM dbo.T_Students C
LEFT JOIN dbo.T_Allocations A
ON C.CourseId = A.CourseId
AND C.StudentId = A.Studentid
WHERE A.CourseId IS NULL
AND C.CourseId IS NOT NULL
If is needed to append more data, one can modify the corresponding query from above as follows:
-- insert data from values with check against existing data
INSERT INTO dbo.T_Allocations
SELECT DAT.CourseId
, DAT.StudentId
, Cast(DAT.StartDate as Date) StartDate
, Cast(DAT.EndDate as Date) EndDate
, DAT.Hours
, DAT.Mark
FROM (
VALUES (5, 5, '20170101', '20170131', 4, 2)
, (5, 5, '20170101', '20170131', 5, 3)
, (5, 5, '20170101', '20170131', 12, 4)
, (4, 1, '20170101', '20170131', 4, 2) --already exists
) DAT(CourseId, StudentId, StartDate, EndDate, Hours, Mark)
LEFT JOIN dbo.T_Allocations A
ON DAT.CourseId = A.CourseId
AND DAT.StudentId = A.Studentid
WHERE A.CourseId IS NULL
Starting with the 2008 version, a
MERGE statement was introduced in SQL Server, which allows adding further behavior within the same statement for INSERTS, UPDATES or DELETES. The above insert based on a MERGE becomes:
-- insert data via a merge
MERGE INTO dbo.T_Allocations A USING
(
SELECT DAT.CourseId
, DAT.StudentId
, Cast(DAT.StartDate as Date) StartDate
, Cast(DAT.EndDate as Date) EndDate
, DAT.Hours
, DAT.Mark
FROM (
VALUES (5, 5, '20170101', '20170131', 4, 2)
, (5, 5, '20170101', '20170131', 5, 3)
, (5, 5, '20170101', '20170131', 12, 4)
, (4, 1, '20170101', '20170131', 4, 2) --already exists
) DAT(CourseId, StudentId, StartDate, EndDate, Hours, Mark)
) DAT
ON DAT.CourseId = A.CourseId
AND DAT.StudentId = A.Studentid
WHEN NOT MATCHED THEN
INSERT (CourseId, StudentId, StartDate, EndDate, Hours, Mark)
VALUES (DAT.CourseId, DAT.StudentId, DAT.StartDate, DAT.EndDate, DAT.Hours, DAT.Mark);
These are the types of scripts I typically use to insert data into a database, and this independently of tables’ size. For more check Microsoft’s documentation on
INSERT. The preparation of the data to be imported can be easily done in Excel with a few macros.