Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!
Last updated: 9-Feb-2024
[Microsoft Fabric] Data pipeline
- {def} a logical sequence of activities that orchestrate a process and together perform a task [1]
- usually by extracting data from one or more sources and loading it into a destination;
- ⇐ often transforming it along the way [1]
- ⇐ allows managing the activities as a set instead of each one individually [2]
- ⇐ used to automate ETL processes that ingest transactional data from operational data stores into an analytical data store (e.g. lakehouse or data warehouse) [1]
- {concept} activity
- {def} an executable task in a pipeline
- a flow of activities can be defined by connecting them in a sequence [1]
- its outcome (success, failure, or completion) can be used to direct the flow to the next activity in the sequence [1]
- {type} data movement activities
- copy data from a source data store to a sink data store [2]
- {type} data transformation activities
- encapsulate data transfer operations
- incl. simple Copy Data activities that extract data from a source and load it to a destination
- incl. complex Data Flow activities that encapsulate dataflows (Gen2) that apply transformations to the data as it is transferred
- incl. notebook activities to run a Spark notebook
- incl. stored procedure activities to run SQL code
- incl. delete data activities to delete existing data
- {type} control flow activities
- used to
- implement loops
- implement conditional branching
- manage variables
- manage parameter values
- enable the implementation of complex pipeline logic to orchestrate data ingestion and transformation flows [1]
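The sketch below (in Python, for readability) illustrates the points above: a pipeline is a set of activities, and the outcome of one activity directs the flow to the next. The structure mirrors the Data Factory-style JSON definition that Fabric data pipelines build on, but the names, activity types, and settings used here are illustrative placeholders rather than an exact Fabric schema.

# Illustrative only: a minimal pipeline sketch expressed as a Python dict in the
# Data Factory-style JSON that Fabric data pipelines build on. Activity names,
# source/sink types, and the stored procedure are hypothetical placeholders.
pipeline_definition = {
    "name": "IngestSalesOrders",            # hypothetical pipeline name
    "properties": {
        "activities": [
            {   # data movement activity: copy from a source store to a sink store
                "name": "CopySalesData",
                "type": "Copy",
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},   # placeholder source
                    "sink": {"type": "LakehouseTableSink"},      # placeholder sink
                },
            },
            {   # data transformation activity: run SQL code after the copy
                "name": "CleanseSalesData",
                "type": "SqlServerStoredProcedure",
                "typeProperties": {"storedProcedureName": "dbo.CleanseSales"},
                # the outcome of the previous activity directs the flow:
                # this activity runs only if the copy succeeded
                "dependsOn": [
                    {"activity": "CopySalesData",
                     "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}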
- can be parameterized
- ⇐ enables providing specific values to be used each time a pipeline is run [1]
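A minimal sketch of parameterization, continuing the dict above: a parameter is declared on the pipeline and referenced from activity settings through the Data Factory expression language, so each run can supply its own value. The parameter name and default value are hypothetical.

# Illustrative only: declare a pipeline parameter (name and default are hypothetical)
pipeline_definition["properties"]["parameters"] = {
    "SourceFolder": {"type": "string", "defaultValue": "landing/sales"}
}
# an activity setting can then reference the parameter instead of hard-coding the value;
# the concrete value is supplied when the pipeline run is initiated
source_folder_expression = "@pipeline().parameters.SourceFolder"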
- when a pipeline is executed, a run is initiated (aka data pipeline run)
- runs can be initiated on-demand or scheduled to start at a specific frequency
- the unique run ID can be used to review run details, confirm the run completed successfully, and investigate the specific settings used for each execution [1]
- {benefit} increases pipelines’ reusability
- {concept} pipeline template
- predefined pipeline that can be used and customized as required
- {concept} data pipeline run
- occurs when a data pipeline is executed
- the activities in the data pipeline are executed to completion [3]
- can be triggered in one of two ways
- on-demand
- on a schedule
- a scheduled pipeline runs based on the time and frequency set [3]
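As a sketch of triggering a run on-demand from code: at the time of writing, the Fabric REST API exposes a job-scheduler endpoint for running items such as data pipelines. The route, payload, and authentication shown below are assumptions to be verified against the current API reference, and all IDs are placeholders.

# A minimal sketch, assuming the Fabric REST API job-scheduler endpoint is used to
# start an on-demand data pipeline run. IDs and the token are placeholders; verify
# the exact route and payload against the current API documentation.
import requests

WORKSPACE_ID = "<workspace-guid>"            # placeholder
PIPELINE_ITEM_ID = "<pipeline-item-guid>"    # placeholder
TOKEN = "<microsoft-entra-access-token>"     # placeholder; acquired via Entra ID auth

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ITEM_ID}/jobs/instances?jobType=Pipeline"
)
response = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
response.raise_for_status()

# the response points at the created job instance; its ID (the run ID) can be used
# afterwards to review the run's status and the settings used for that execution
print("Run accepted:", response.status_code, response.headers.get("Location"))

Scheduled runs, by contrast, are configured on the pipeline itself (time and frequency), so no per-run call is needed.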
References:
[1] Microsoft Learn (2023) Use Data Factory pipelines in Microsoft Fabric [link]
[2] Microsoft Learn (2024) Microsoft Fabric: Activity overview [link]
[3] Microsoft Learn (2024) Microsoft Fabric Concept: Data pipeline Runs [link]
Resources
[R1] Metadata Driven Pipelines for Microsoft Fabric (link)