Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!
Last updated: 9-Feb-2024
[Microsoft Fabric] Data pipeline
- {def} a logical sequence of activities that orchestrate a process and together perform a task [1]
- usually by extracting data from one or more sources and loading it into a destination;
- ⇐ often transforming it along the way [1]
- ⇐ allows managing the activities as a set instead of each one individually [2]
- ⇐ used to automate ETL processes that ingest transactional data from operational data stores into an analytical data store (e.g. lakehouse or data warehouse) [1]
- {concept} activity
- {def} an executable task in a pipeline
- a flow of activities can be defined by connecting them in a sequence [1]
- its outcome (success, failure, or completion) can be used to direct the flow to the next activity in the sequence [1]
- {type} data movement activities
- copy data from a source data store to a sink data store [2]
- {type} data transformation activities
- encapsulate data transfer operations
- incl. simple Copy Data activities that extract data from a source and load it to a destination
- incl. complex Data Flow activities that encapsulate dataflows (Gen2) that apply transformations to the data as it is transferred
- incl. notebook activities to run a Spark notebook
- incl. stored procedure activities to run SQL code
- incl. delete data activities to delete existing data
- {type} control flow activities
- used to
- implement loops
- implement conditional branching
- manage variables
- manage parameter values
- enable the implementation of complex pipeline logic to orchestrate data ingestion and transformation flows [1]
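The sketch below (in Python, for readability) illustrates the points above: a pipeline is a set of activities, and the outcome of one activity directs the flow to the next. The structure mirrors the Data Factory-style JSON definition that Fabric data pipelines build on, but the names, activity types, and settings used here are illustrative placeholders rather than an exact Fabric schema.

# Illustrative only: a minimal pipeline sketch expressed as a Python dict in the
# Data Factory-style JSON that Fabric data pipelines build on. Activity names,
# source/sink types, and the stored procedure are hypothetical placeholders.
pipeline_definition = {
    "name": "IngestSalesOrders",            # hypothetical pipeline name
    "properties": {
        "activities": [
            {   # data movement activity: copy from a source store to a sink store
                "name": "CopySalesData",
                "type": "Copy",
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},   # placeholder source
                    "sink": {"type": "LakehouseTableSink"},      # placeholder sink
                },
            },
            {   # data transformation activity: run SQL code after the copy
                "name": "CleanseSalesData",
                "type": "SqlServerStoredProcedure",
                "typeProperties": {"storedProcedureName": "dbo.CleanseSales"},
                # the outcome of the previous activity directs the flow:
                # this activity runs only if the copy succeeded
                "dependsOn": [
                    {"activity": "CopySalesData",
                     "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}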
- can be parameterized
- ⇐ enables providing specific values to be used each time a pipeline is run [1]
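A minimal sketch of parameterization, continuing the dict above: a parameter is declared on the pipeline and referenced from activity settings through the Data Factory expression language, so each run can supply its own value. The parameter name and default value are hypothetical.

# Illustrative only: declare a pipeline parameter (name and default are hypothetical)
pipeline_definition["properties"]["parameters"] = {
    "SourceFolder": {"type": "string", "defaultValue": "landing/sales"}
}
# an activity setting can then reference the parameter instead of hard-coding the value;
# the concrete value is supplied when the pipeline run is initiated
source_folder_expression = "@pipeline().parameters.SourceFolder"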
- when a pipeline is executed, a run is initiated (aka data pipeline run)
- runs can be initiated on-demand or scheduled to start at a specific frequency
- the unique run ID can be used to review run details, confirm the run completed successfully, and investigate the specific settings used for each execution [1]
- {benefit} increases pipelines’ reusability
- {concept} pipeline template
- predefined pipeline that can be used and customized as required
- {concept} data pipeline run
- occurs when a data pipeline is executed
- the activities in the data pipeline are executed to completion [3]
- can be triggered in one of two ways
- on-demand
- on a schedule
- a scheduled pipeline runs based on the time and frequency set [3]
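As a sketch of triggering a run on-demand from code: at the time of writing, the Fabric REST API exposes a job-scheduler endpoint for running items such as data pipelines. The route, payload, and authentication shown below are assumptions to be verified against the current API reference, and all IDs are placeholders.

# A minimal sketch, assuming the Fabric REST API job-scheduler endpoint is used to
# start an on-demand data pipeline run. IDs and the token are placeholders; verify
# the exact route and payload against the current API documentation.
import requests

WORKSPACE_ID = "<workspace-guid>"            # placeholder
PIPELINE_ITEM_ID = "<pipeline-item-guid>"    # placeholder
TOKEN = "<microsoft-entra-access-token>"     # placeholder; acquired via Entra ID auth

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ITEM_ID}/jobs/instances?jobType=Pipeline"
)
response = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
response.raise_for_status()

# the response points at the created job instance; its ID (the run ID) can be used
# afterwards to review the run's status and the settings used for that execution
print("Run accepted:", response.status_code, response.headers.get("Location"))

Scheduled runs, by contrast, are configured on the pipeline itself (time and frequency), so no per-run call is needed.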
References:
[1] Microsoft Learn (2023) Use Data Factory pipelines in Microsoft Fabric [link]
[2] Microsoft Learn (2024) Microsoft Fabric: Activity overview [link]
[3] Microsoft Learn (2024) Microsoft Fabric Concept: Data pipeline Runs [link]
Resources
[R1] Metadata Driven Pipelines for Microsoft Fabric (link)