Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!
Last updated: 11-Apr-2025
[Microsoft Fabric] Copy job in Data Factory- {def}
- {benefit} simplifies data ingestion with built-in patterns for batch and incremental copy, eliminating the need for pipeline creation [1]
- across cloud data stores [1]
- from on-premises data stores behind a firewall [1]
- within a virtual network via a gateway [1]
- elevates the data ingestion experience to a more streamlined and user-friendly process from any source to any destination [1]
- {benefit} provides seamless data integration
- through over 100 built-in connectors [3]
- provides essential tools for data operations [3]
- {benefit} provides intuitive experience
- easy configuration and monitoring [1]
- {benefit} efficiency
- enables incremental copying effortlessly, reducing manual intervention [1]
- {benefit} less resource utilization and faster copy durations
- flexibility to control data movement [1]
- choose which tables and columns to copy
- map the data
- define read/write behavior
- set schedules that fit requirements [1]
- applies to one-time or recurring jobs [1]
- {benefit} robust performance
- the serverless setup enables data transfer with large-scale parallelism
- maximizes data movement throughput [1]
- fully utilizes network bandwidth and data store IOPS for optimal performance [3]
- {feature} monitoring
- once a job has executed, users can monitor its progress and metrics through either [1]
- the Copy job panel
- shows data from the most recent runs [1]
- reports several metrics
- status
- rows read
- rows written
- throughput
- the Monitoring hub
- acts as a centralized portal for reviewing runs across various items [4]
- {mode} full copy
- copies all data from the source to the destination at once
- {mode|preview} incremental copy
- the initial job run copies all data, and subsequent job runs only copy changes since the last run [1]
- an incremental column must be selected for each table to identify changes [1]
- used as a watermark
- its value is compared with the value stored from the last run, so that only new or updated data is copied [1]
- the incremental column can be a timestamp or an increasing INT [1]
- {scenario} copying from a database
- new or updated rows will be captured and moved to the destination [1]
- {scenario} copying from a storage store
- new or updated files identified by their LastModifiedTime are captured and moved to the destination [1]
- {scenario} copying data to a storage store
- new rows from the tables or files are copied to new files in the destination [1]
- files with the same name are overwritten [1]
- {scenario} copying data to a database
- new rows from the tables or files are appended to destination tables [1]
- the update method can be changed to merge or overwrite [1]
- {default} appends data to the destination [1]
- the update method can be adjusted to
- {operation} merge
- a key column must be provided
- {default} the primary key is used, if available [1]
- {operation} overwrite
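The watermark comparison and the three update methods described above can be illustrated with a minimal in-memory sketch. This is not the Copy job implementation or its API: the function names `select_changes` and `apply_changes`, the row layout, and the column names are hypothetical, and overwrite is modeled simply as replacing the destination with the copied rows, which is an assumption made for illustration.

```python
from typing import Any, Optional

Row = dict[str, Any]


def select_changes(source_rows: list[Row], incremental_column: str,
                   last_watermark: Optional[Any]) -> tuple[list[Row], Optional[Any]]:
    """Return the rows to copy and the watermark to store for the next run."""
    if last_watermark is None:
        # Initial run: no watermark yet, so all rows are copied.
        changed = list(source_rows)
    else:
        # Subsequent runs: only rows whose incremental column moved past the watermark.
        changed = [r for r in source_rows if r[incremental_column] > last_watermark]
    # The new watermark is the highest value seen; it stays unchanged if nothing was copied.
    new_watermark = max((r[incremental_column] for r in changed), default=last_watermark)
    return changed, new_watermark


def apply_changes(destination: list[Row], changes: list[Row],
                  method: str = "append", key: Optional[str] = None) -> list[Row]:
    """Apply copied rows to a destination table using the chosen update method."""
    if method == "append":
        # Default behavior: add the copied rows after the existing ones.
        return destination + changes
    if method == "overwrite":
        # Assumption: overwrite replaces the destination with the copied rows.
        return list(changes)
    if method == "merge":
        if key is None:
            raise ValueError("merge requires a key column")
        # Upsert: rows with an existing key are replaced, new keys are added.
        merged = {row[key]: row for row in destination}
        for row in changes:
            merged[row[key]] = row
        return list(merged.values())
    raise ValueError(f"unknown update method: {method}")


# First run copies everything; the second run picks up only the changed row.
source = [{"id": 1, "modified": 10}, {"id": 2, "modified": 20}]
first, watermark = select_changes(source, "modified", None)
destination = apply_changes([], first)

source.append({"id": 2, "modified": 30})
second, watermark = select_changes(source, "modified", watermark)
destination = apply_changes(destination, second, method="merge", key="id")
```

With append, the second run would leave two rows for `id` 2 in the destination, whereas merge keeps one row per key. That difference is why a key column is needed for merge.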
- availability
- the same regional availability as the pipeline [1]
- billing meter
- Data Movement, with an identical consumption rate [1]
- {feature} robust Public API
- {benefit} allows automating and managing Copy jobs efficiently [2]
- {feature} Git Integration
- {benefit} allows leveraging Git repositories in Azure DevOps or GitHub [2]
- {benefit} allows seamlessly deploying Copy jobs with Fabric’s built-in CI/CD workflows [2]
- {feature|preview} VNET gateway support
- enables secure connections to data sources within a virtual network or behind firewalls
- Copy Job can be executed directly on the VNet data gateway, ensuring seamless and secure data movement [2]
- {feature} Upsert to Azure SQL Database
- {feature} overwrite to Fabric Lakehouse
- {enhancement} column mapping for simple data modification when storage is the destination store [2]
- {enhancement} data preview to help select the right incremental column [2]
- {enhancement} search functionality to quickly find tables or columns [2]
- {enhancement} real-time monitoring with an in-progress view of running Copy Jobs [2]
- {enhancement} customizable update methods & schedules before job creation [2]
References:
[1] Microsoft Learn (2025) Fabric: What is the Copy job in Data Factory for Microsoft Fabric? [link]
[2] Microsoft Fabric Updates Blog (2025) Recap of Data Factory Announcements at Fabric Conference US 2025 [link]
[3] Microsoft Fabric Updates Blog (2025) Fabric: Announcing Public Preview: Copy Job in Microsoft Fabric [link]
[4] Microsoft Learn (2025) Fabric: Learn how to monitor a Copy job in Data Factory for Microsoft Fabric [link]
Resources:
[R1] Microsoft Learn (2025) Fabric: Learn how to create a Copy job in Data Factory for Microsoft Fabric [link]
Acronyms:
API - Application Programming Interface
CI/CD - Continuous Integration and Continuous Deployment
DevOps - Development & Operations
DF - Data Factory
IOPS - Input/Output Operations Per Second
VNet - Virtual Network