12 April 2025

🏭🗒️Microsoft Fabric: Copy job in Data Factory [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 11-Apr-2025

[Microsoft Fabric] Copy job in Data Factory 
  • {def} a Data Factory item that elevates the data ingestion experience to a more streamlined and user-friendly process, from any source to any destination [1]
  • {benefit} simplifies data ingestion with built-in patterns for batch and incremental copy, eliminating the need for pipeline creation [1]
    • across cloud data stores [1]
    • from on-premises data stores behind a firewall [1]
    • within a virtual network via a gateway [1]
  • {benefit} provides seamless data integration 
    • through over 100 built-in connectors [3]
    • provides essential tools for data operations [3]
  • {benefit} provides intuitive experience
    • easy configuration and monitoring [1]
  • {benefit} efficiency
    • enables incremental copying effortlessly, reducing manual intervention [1]
    • less resource utilization and faster copy durations [1]
  • {benefit} flexibility to control data movement [1]
    • choose which tables and columns to copy
    • map the data
    • define read/write behavior
    • set schedules that fit the requirements, whether for one-time or recurring jobs [1]
  • {benefit} robust performance
    • the serverless setup enables data transfer with large-scale parallelism
    • maximizes data movement throughput [1]
      • fully utilizes network bandwidth and data store IOPS for optimal performance [3]
  • {feature} monitoring
    • once a job is executed, users can monitor its progress and metrics through either [1]
      • the Copy job panel
        • shows data from the most recent runs [1]
        • reports several metrics
          • status
          • rows read
          • rows written
          • throughput
      • the Monitoring hub
        • acts as a centralized portal for reviewing runs across various items [4]
  • {mode} full copy
    • copies all data from the source to the destination at once
  • {mode|preview} incremental copy
    • the initial job run copies all data, and subsequent job runs only copy changes since the last run [1]
    • an incremental column must be selected for each table to identify changes [1]
      • used as a watermark
        • its value is compared with the value from the last run so that only new or updated data is copied [1] (see the sketch after this list)
        • the incremental column can be a timestamp or an increasing INT [1]
      • {scenario} copying from a database
        • new or updated rows will be captured and moved to the destination [1]
      • {scenario} copying from a storage store
        • new or updated files identified by their LastModifiedTime are captured and moved to the destination [1]
      • {scenario} copying data to a storage store
        • new rows from the tables or files are copied to new files in the destination [1]
          • files with the same name are overwritten [1]
      • {scenario} copying data to a database
        • new rows from the tables or files are appended to destination tables [1]
          • the update method can be set to merge or overwrite [1]
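
A minimal Python sketch of the watermark comparison described above, assuming an in-memory source and destination and a ModifiedAt timestamp as the incremental column; the Copy job performs this bookkeeping internally, so the helpers here are purely illustrative:

    from datetime import datetime

    # illustrative in-memory source and destination; in reality these are the
    # connected data stores and the Copy job tracks the watermark itself
    source_table = [
        {"id": 1, "value": "a", "ModifiedAt": datetime(2025, 4, 1)},
        {"id": 2, "value": "b", "ModifiedAt": datetime(2025, 4, 5)},
    ]
    destination_table: list[dict] = []

    def run_incremental_copy(watermark: datetime | None) -> datetime | None:
        # first run: no watermark, so all rows are copied (full copy behavior);
        # later runs: only rows whose incremental column exceeds the watermark
        new_rows = [r for r in source_table
                    if watermark is None or r["ModifiedAt"] > watermark]
        destination_table.extend(new_rows)   # default update method: append
        if new_rows:
            watermark = max(r["ModifiedAt"] for r in new_rows)
        return watermark

    wm = run_incremental_copy(None)   # initial run copies both rows
    source_table.append({"id": 3, "value": "c", "ModifiedAt": datetime(2025, 4, 10)})
    wm = run_incremental_copy(wm)     # second run copies only the new row (id 3)
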
  • {default} appends data to the destination [1]
    • the update method can be adjusted to either of the following (see the sketch after this list)
      • {operation} merge
        • a key column must be provided
          • {default} the primary key is used, if available [1]
      • {operation} overwrite
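
A short Python sketch contrasting the default append behavior with the merge update method, using in-memory lists and a hypothetical "id" field standing in for the key (by default primary key) column; it only illustrates the semantics, not how the Copy job writes to the destination:

    # illustrative destination rows and incoming batch
    destination = [{"id": 1, "name": "old"}]
    incoming = [{"id": 1, "name": "new"}, {"id": 2, "name": "added"}]

    def apply_append(dest: list[dict], rows: list[dict]) -> None:
        dest.extend(rows)   # default: incoming rows are simply appended

    def apply_merge(dest: list[dict], rows: list[dict], key: str = "id") -> None:
        index = {r[key]: r for r in dest}
        for row in rows:
            index[row[key]] = row   # existing keys are updated, new keys inserted
        dest[:] = list(index.values())

    apply_merge(destination, incoming)   # id 1 is updated, id 2 is inserted
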
  • availability 
    • the same regional availability as the pipeline [1]
  • billing meter
    • Data Movement, with an identical consumption rate [1]
  • {feature} robust Public API
    • {benefit} allows automating and managing Copy jobs efficiently [2] (see the sketch below)
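
The shape of the public API is not detailed in the sources above; the Python sketch below assumes the generic Fabric job scheduler endpoint for running an item job on demand, with placeholder IDs and token and an assumed "CopyJob" job type value that should be verified against the API reference:

    import time
    import requests

    # assumptions: generic "run on demand item job" endpoint, placeholder IDs and
    # token, and an assumed jobType value for Copy job items
    BASE = "https://api.fabric.microsoft.com/v1"
    WORKSPACE_ID = "<workspace-id>"
    ITEM_ID = "<copy-job-item-id>"
    HEADERS = {"Authorization": "Bearer <access-token>"}

    # trigger a run of the Copy job item
    resp = requests.post(
        f"{BASE}/workspaces/{WORKSPACE_ID}/items/{ITEM_ID}/jobs/instances",
        params={"jobType": "CopyJob"},   # assumed job type name
        headers=HEADERS,
    )
    resp.raise_for_status()
    instance_url = resp.headers["Location"]   # 202 Accepted points to the job instance

    # poll the job instance until it reaches a terminal state
    while True:
        status = requests.get(instance_url, headers=HEADERS).json().get("status")
        print("status:", status)
        if status in ("Completed", "Failed", "Cancelled"):
            break
        time.sleep(30)
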
  • {feature} Git Integration
    • {benefit} allows leveraging Git repositories in Azure DevOps or GitHub [2]
    • {benefit} allows deploying Copy jobs seamlessly with Fabric’s built-in CI/CD workflows [2]
  • {feature|preview} VNET gateway support
    • enables secure connections to data sources within a virtual network or behind firewalls
      • Copy Job can be executed directly on the VNet data gateway, ensuring seamless and secure data movement [2]
  • {feature} Upsert to Azure SQL Database
  • {feature} overwrite to Fabric Lakehouse
  • {enhancement} column mapping for simple data modifications when storage is the destination store [2]
  • {enhancement} data preview to help select the right incremental column  [2]
  • {enhancement} search functionality to quickly find tables or columns  [2]
  • {enhancement} real-time monitoring with an in-progress view of running Copy Jobs  [2]
  • {enhancement} customizable update methods & schedules before job creation [2]

References:
[1] Microsoft Learn (2025) Fabric: What is the Copy job in Data Factory for Microsoft Fabric? [link]
[2] Microsoft Fabric Updates Blog (2025) Recap of Data Factory Announcements at Fabric Conference US 2025 [link]
[3] Microsoft Fabric Updates Blog (2025) Fabric: Announcing Public Preview: Copy Job in Microsoft Fabric [link]
[4] Microsoft Learn (2025) Fabric: Learn how to monitor a Copy job in Data Factory for Microsoft Fabric [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: Learn how to create a Copy job in Data Factory for Microsoft Fabric [link]

Acronyms:
API - Application Programming Interface
CI/CD - Continuous Integration and Continuous Deployment
DevOps - Development & Operations
DF - Data Factory
IOPS - Input/Output Operations Per Second
VNet - Virtual Network
