
12 April 2025

🏭🗒️Microsoft Fabric: Copy job in Data Factory [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 11-Apr-2025

[Microsoft Fabric] Copy job in Data Factory 
  • {def} a Data Factory item that elevates the data ingestion experience to a more streamlined and user-friendly process, from any source to any destination [1]
    • {benefit} simplifies data ingestion with built-in patterns for batch and incremental copy, eliminating the need for pipeline creation [1]
      • across cloud data stores [1]
      • from on-premises data stores behind a firewall [1]
      • within a virtual network via a gateway [1]
  • {benefit} provides seamless data integration 
    • through over 100 built-in connectors [3]
    • provides essential tools for data operations [3]
  • {benefit} provides intuitive experience
    • easy configuration and monitoring [1]
  • {benefit} efficiency
    • enables incremental copying effortlessly, reducing manual intervention [1]
  • {benefit} less resource utilization and faster copy durations
    • flexibility to control data movement [1]
      • choose which tables and columns to copy
      • map the data
      • define read/write behavior
      • set schedules that fit the requirements, whether for one-time or recurring jobs [1]
  • {benefit} robust performance
    • the serverless setup enables data transfer with large-scale parallelism
    • maximizes data movement throughput [1]
      • fully utilizes network bandwidth and data store IOPS for optimal performance [3]
  • {feature} monitoring
    • once a job is executed, users can monitor its progress and metrics through either [1] 
      • the Copy job panel
        • shows data from the most recent runs [1]
        • reports several metrics [1]
          • status
          • rows read
          • rows written
          • throughput
      • the Monitoring hub
        • acts as a centralized portal for reviewing runs across various items [4]
  • {mode} full copy
    • copies all data from the source to the destination at once
  • {mode|preview} incremental copy
    • the initial job run copies all data, and subsequent job runs only copy changes since the last run [1]
    • an incremental column must be selected for each table to identify changes [1]
      • used as a watermark
        • allows comparing its value with the value from the last run in order to copy only the new or updated data [1] (see the T-SQL sketch after these notes)
        • the incremental column can be a timestamp or an increasing INT [1]
      • {scenario} copying from a database
        • new or updated rows will be captured and moved to the destination [1]
      • {scenario} copying from a storage store
        • new or updated files identified by their LastModifiedTime are captured and moved to the destination [1]
      • {scenario} copy data to storage store
        • new rows from the tables or files are copied to new files in the destination [1]
          • files with the same name are overwritten [1]
      • {scenario} copy data to database
        • new rows from the tables or files are appended to destination tables [1]
          • the update method can be set to merge or overwrite [1]
  • {default} appends data to the destination [1]
    • the update method can be adjusted to 
      • {operation} merge
        • a key column must be provided
          • {default} the primary key is used, if available [1]
      • {operation} overwrite
  • availability 
    • the same regional availability as the pipeline [1]
  • billing meter
    • Data Movement, with an identical consumption rate [1]
  • {feature} robust Public API
    • {benefit} allows automating and managing Copy jobs efficiently [2]
  • {feature} Git Integration
    • {benefit} allows leveraging Git repositories in Azure DevOps or GitHub [2]
    • {benefit} allows deploying Copy jobs seamlessly with Fabric’s built-in CI/CD workflows [2]
  • {feature|preview} VNET gateway support
    • enables secure connections to data sources within a virtual network or behind firewalls
      • Copy Job can be executed directly on the VNet data gateway, ensuring seamless and secure data movement [2]
  • {feature} upsert to Azure SQL Database
  • {feature} overwrite to Fabric Lakehouse
  • {feature} [Jul-2025] native CDC
    • enables efficient and automated replication of changed data, including inserted, updated and deleted records, from a source to a destination [5]
      • ensures destination data stays up to date without manual effort
        • improves efficiency in data integration while reducing the load on source systems [5]
      • billed via the Data Movement - Incremental Copy meter
        • consumption rate of 3 CU
      • {benefit} zero manual intervention
        • automatically captures incremental changes directly from the source [5]  
      • {benefit} automatic replication
        • keeps destination data continuously synchronized with source changes [5]  
      • {benefit} optimized performance
        • processes only changed data
          • reduces processing time and minimizes the load on the source [5]
      • {benefit} smarter incremental copy
        • automatically detects CDC-enabled source tables and allows selecting either CDC-based or watermark-based incremental copy for each table [5]
    • applies to 
      • CDC-enabled tables
        • CDC automatically captures and replicates the insert, update and delete actions on the data (see the enablement sketch after these notes)
      • non-CDC-enabled tables
        • Copy Job detects changes by comparing an incremental column against the last run [5]
          • then merges or appends the changed data to the destination based on configuration [5]
    • supported connectors
      • ⇐ applies to sources and destinations
      • Azure SQL DB [5]
      • On-premises SQL Server [5]
      • Azure SQL Managed Instance [5]
  • {enhancement} column mapping for simple data modification with storage as the destination store [2]
  • {enhancement} data preview to help select the right incremental column [2]
  • {enhancement} search functionality to quickly find tables or columns [2]
  • {enhancement} real-time monitoring with an in-progress view of running Copy jobs [2]
  • {enhancement} customizable update methods & schedules before job creation [2]
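
To make the watermark and merge mechanics above more concrete, here is a minimal conceptual T-SQL sketch of what an incremental run with a merge update method boils down to. All object names (dbo.SalesOrder, dbo.SalesOrder_Dest, dbo.CopyJobWatermark, OrderID, ModifiedDate) are hypothetical; Copy job performs the equivalent steps internally:

-- conceptual sketch of watermark-based incremental copy with merge (hypothetical names)
DECLARE @LastWatermark datetime2;

-- 1) read the watermark stored by the previous run
SELECT @LastWatermark = WatermarkValue
FROM dbo.CopyJobWatermark
WHERE TableName = 'SalesOrder';

-- 2) pick up only the rows changed since the last run, based on the incremental column
SELECT OrderID, Amount, ModifiedDate
INTO #ChangedRows
FROM dbo.SalesOrder
WHERE ModifiedDate > @LastWatermark;

-- 3) merge the changed rows into the destination on the key column
--    (by default the primary key); unmatched rows are inserted
MERGE dbo.SalesOrder_Dest AS tgt
USING #ChangedRows AS src
	ON tgt.OrderID = src.OrderID
WHEN MATCHED THEN
	UPDATE SET tgt.Amount = src.Amount, tgt.ModifiedDate = src.ModifiedDate
WHEN NOT MATCHED THEN
	INSERT (OrderID, Amount, ModifiedDate)
	VALUES (src.OrderID, src.Amount, src.ModifiedDate);

-- 4) advance the watermark for the next run
UPDATE dbo.CopyJobWatermark
SET WatermarkValue = (SELECT MAX(ModifiedDate) FROM dbo.SalesOrder)
WHERE TableName = 'SalesOrder';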

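For the CDC-based path, the prerequisite is that CDC is enabled on the source. As a hypothetical sketch (schema and table names assumed, run in the context of the source database), this is how CDC could be enabled in SQL Server or Azure SQL Database so that Copy job detects the table as CDC-enabled:

-- hypothetical example: enable CDC on the source database and table
-- (execute in the context of the source database)
EXEC sys.sp_cdc_enable_db;
GO
EXEC sys.sp_cdc_enable_table
	@source_schema = N'dbo',
	@source_name   = N'SalesOrder',  -- hypothetical table
	@role_name     = NULL;           -- no gating role
GO
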
References:
[1] Microsoft Learn (2025) Fabric: What is the Copy job in Data Factory for Microsoft Fabric? [link]
[2] Microsoft Fabric Updates Blog (2025) Recap of Data Factory Announcements at Fabric Conference US 2025 [link]
[3] Microsoft Fabric Updates Blog (2025) Fabric: Announcing Public Preview: Copy Job in Microsoft Fabric [link]
[4] Microsoft Learn (2025) Fabric: Learn how to monitor a Copy job in Data Factory for Microsoft Fabric [link]
[5] Microsoft Fabric Updates Blog (2025) Fabric: Simplifying Data Ingestion with Copy job – Introducing Change Data Capture (CDC) Support (Preview) [link]
[6] Microsoft Learn (2025) Fabric: Change data capture (CDC) in Copy Job (Preview) [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: Learn how to create a Copy job in Data Factory for Microsoft Fabric [link]

Acronyms:
API - Application Programming Interface
CDC - Change Data Capture
CI/CD - Continuous Integration and Continuous Deployment
CU - Capacity Unit
DevOps - Development & Operations
DF - Data Factory
IOPS - Input/Output Operations Per Second
VNet - Virtual Network

09 April 2024

💎🤖SQL Reloaded: Copilot Stories (Part I: Copilot in Azure SQL Database is in Private Preview) 🆕

Following the news about Copilot and its adoption in various areas, you have probably asked yourself when it will become available in SQL Server. The good news is that Copilot in Azure SQL Database is already in Private Preview (see [1], [2]). The bad news is that for the early adoption program Microsoft is looking only for customers who can provide feedback based on real-world use cases, and one needs a valid Tenant ID to sign up for the Private Preview.

The new feature comes with two Azure portal experiences:

  • Natural language to SQL: within the Azure portal query editor, natural language prompts are translated into SQL;
  • Azure Copilot integration: provides customers with self-guided assistance for managing databases and troubleshooting. 
In the Data Exposed webcast with Anna Hoffman [1], the guest, Joe Sack, showcases the use of both experiences. Concerning the first experience, besides simple joins, Joe also showcased the use of a PIVOT based on the prompt:

"Show me a pivot summary table that displays the total number of properties sold in each year from 2020 to 2023".

-- output from Copilot
SELECT *
FROM 
(
	SELECT YEAR(SaleDate) AS Year, COUNT(*) AS TotalPropertiesSold
	FROM [PropertySearch].[dbo].[PropertyHistory]
	WHERE YEAR(SaleDate) BETWEEN 2020 AND 2023
	GROUP BY YEAR(SaleDate)
) AS PivotDate
PIVOT
(
	SUM(TotalPropertiesSold)
	FOR Year IN ([2020], [2021], [2022], [2023])
) AS PivotTable;

Even if this is a modest example, the great thing is that it generates the skeleton on which the logic can be built, which can really help beginners, though it can also be useful for more seasoned data professionals as it saves time. Probably, in a first phase, this will be the main benefit of Copilot: generating skeletons or templates on which further logic can be built. Hopefully, in time, it will do much more than that.

I wonder how complex the prompts can become and how the logic can be structured to create a multistep scenario. The Copilot versions from other areas showed that complex prompts give results; the question is whether Copilot can devise the steps in an optimal manner, much like a seasoned data professional does. 

For the moment, the feature utilizes table and view names, column names, and primary key and foreign key metadata to generate T-SQL code. Probably, it would make sense to also use index and statistics information, query hints and the whole arsenal usually needed by data professionals to optimize a query. Conversely, maybe the second experience could be used for optimizing and troubleshooting the query. 

I'd really love to test this feature, though I'll probably need to wait until it becomes publicly available. In the meantime, one can play with GitHub Copilot [3] or install Copilot in Azure Data Studio [4].


References:
[1] Data Exposed (2024) Introducing Copilot in Azure SQL Database (Private Preview) (link)
[2] Azure SQL Blog (2024) Microsoft Copilot in Azure extends capabilities to Azure SQL Database (Private Preview) by Joe Sack (link)
[3] Azure SQL Blog (2023) GitHub Copilot for SQL Developers: Turbocharge your SQL Development, by Subhojit Basak (link)
[4] Microsoft Developer (2023) Copilot is now in Azure Data Studio and this is how it can help you! (link)
