Concepts
Dagster provides a variety of abstractions for building and orchestrating data pipelines. These concepts enable a modular, declarative approach to data engineering, making it easier to manage dependencies, monitor execution, and ensure data quality.
Asset
An asset represents a logical unit of data such as a table, dataset, or machine learning model. Assets can have dependencies on other assets, forming the data lineage for your pipelines. As the core abstraction in Dagster, assets can interact with many other Dagster entities to facilitate certain tasks. When you define an asset, either with the @dg.asset decorator or via a component, the definition is automatically added to a top-level Definitions object.
| Concept | Relationship | 
|---|---|
| asset check | asset may use an asset check |
| asset spec | asset is described by an asset spec |
| component | asset may be programmatically built by a component |
| config | asset may use a config |
| definitions | asset is added to a top-level Definitions object to be deployed |
| io manager | asset may use an IO manager |
| partition | asset may use a partition |
| resource | asset may use a resource |
| job | asset may be used in a job |
| schedule | asset may be used in a schedule |
| sensor | asset may be used in a sensor |
Asset check
An asset check is associated with an asset to ensure it meets certain expectations around data quality, freshness, or completeness. Asset checks run when the asset is executed and store metadata about the related run, including whether all the conditions of the check were met.
| Concept | Relationship | 
|---|---|
| asset | asset check may be used by an asset |
| definitions | asset check is added to a top-level Definitions object to be deployed |
Asset spec
Specs are standalone objects that describe the identity and metadata of Dagster entities without defining their behavior. For example, an AssetSpec contains essential information like the asset's key (its unique identifier) and tags (labels for organizing and annotating the asset), but it doesn't include the logic for materializing that asset.
| Concept | Relationship | 
|---|---|
| asset | asset spec may describe the identity and metadata of an asset |
Code location
A code location is a collection of Dagster entity definitions deployed in a specific environment. A code location determines the Python environment (including the version of Dagster being used as well as any other Python dependencies). A Dagster project can have multiple code locations, helping isolate dependencies.
| Concept | Relationship | 
|---|---|
| definitions | code location must contain at least one top-level Definitions object |
Component
Components are objects that programmatically build assets and other Dagster entity definitions, such as asset checks, schedules, resources, and sensors. They accept schematized configuration parameters (specified using YAML or lightweight Python) and use them to build the actual definitions you need. Components are designed to help you quickly bootstrap parts of your Dagster project and serve as templates for repeatable patterns.
| Concept | Relationship | 
|---|---|
| asset | component builds assets and other definitions |
| asset check | component builds asset checks and other definitions |
| definitions | component builds assets and other definitions |
| job | component builds jobs and other definitions |
| schedule | component builds schedules and other definitions |
| sensor | component builds sensors and other definitions |
| resource | component builds resources and other definitions |
Config
A config specifies the configuration schema for assets, jobs, schedules, and sensors. A RunConfig is a container for all the configuration that can be passed to a run. This allows pipelines to be parameterized and reused for multiple purposes.
| Concept | Relationship | 
|---|---|
| asset | config may be used by an asset |
| job | config may be used by a job |
| schedule | config may be used by a schedule |
| sensor | config may be used by a sensor |
Definitions
In Dagster, "definitions" means two things:
- The objects that combine metadata about Dagster entities with Python functions that define how they behave, for example, asset, ScheduleDefinition, and resource definitions.
- The top-level Definitions object that contains references to all the definitions in a Dagster project. Entities included in the Definitions object will be deployed and visible within the Dagster UI.
| Concept | Relationship | 
|---|---|
| asset | Top-level Definitions object may contain one or more asset definitions |
| asset check | Top-level Definitions object may contain one or more asset check definitions |
| io manager | Top-level Definitions object may contain one or more IO manager definitions |
| job | Top-level Definitions object may contain one or more job definitions |
| resource | Top-level Definitions object may contain one or more resource definitions |
| schedule | Top-level Definitions object may contain one or more schedule definitions |
| sensor | Top-level Definitions object may contain one or more sensor definitions |
| component | definition may be the output of a component |
| code location | definitions must be deployed in a code location |
Graph
A GraphDefinition connects multiple ops together to form a DAG. If you are using assets, you will not need to use graphs directly.
| Concept | Relationship | 
|---|---|
| config | graph may use a config |
| op | graph must include one or more ops |
| job | graph must be part of a job to execute |
IO manager
An IOManager defines how data is stored and retrieved between the execution of assets and ops. This allows storage and format to be customized at any step in a pipeline.
| Concept | Relationship | 
|---|---|
| asset | IO manager may be used by an asset |
| definitions | IO manager is added to a top-level Definitions object to be deployed |
Job
A job is a selection of assets or a GraphDefinition of ops. Jobs are the main form of execution in Dagster.
| Concept | Relationship | 
|---|---|
| asset | job may contain a selection of assets |
| config | job may use a config |
| graph | job may contain a graph |
| schedule | job may be used by a schedule |
| sensor | job may be used by a sensor |
| definitions | job is added to a top-level Definitions object to be deployed |
Op
An op is a computational unit of work. Ops are arranged into a GraphDefinition to dictate their order. Ops have largely been replaced by assets.
| Concept | Relationship | 
|---|---|
| type | op may use a type |
| graph | op must be contained in a graph to execute |
Partition
A PartitionsDefinition represents a logical slice of a dataset or computation mapped to certain segments (such as increments of time). Partitions enable incremental processing, making workflows more efficient by only running on relevant subsets of data.
| Concept | Relationship | 
|---|---|
| asset | partition may be used by an asset |
Resource
A ResourceDefinition is a way to make external resources (like database or API connections) available to Dagster entities (like assets, schedules, or sensors) during job execution, and to clean up after execution resolves. A ConfigurableResource is a resource that uses structured configuration. For more information, see Configuring resources.
| Concept | Relationship | 
|---|---|
| asset | resource may be used by an asset |
| schedule | resource may be used by a schedule |
| sensor | resource may be used by a sensor |
| definitions | resource is added to a top-level Definitions object to be deployed |
Type
A type is a way to define and validate the data passed between ops.
| Concept | Relationship | 
|---|---|
| op | type may be used by an op |
Schedule
A ScheduleDefinition is a way to automate jobs or assets to run at a specified interval. When a job or asset is parameterized, the schedule can also be set with a matching run configuration (RunConfig).
| Concept | Relationship | 
|---|---|
| asset | schedule may include a job or selection of assets |
| config | schedule may include a config if the job or assets include a config |
| job | schedule may include a job or selection of assets |
| definitions | schedule is added to a top-level Definitions object to be deployed |
Sensor
A sensor is a way to trigger jobs or assets when an event occurs, such as a file being uploaded or a push notification. When a job or asset is parameterized, the sensor can also be set with a matching run configuration (RunConfig).
| Concept | Relationship | 
|---|---|
| asset | sensor may include a job or selection of assets |
| config | sensor may include a config if the job or assets include a config |
| job | sensor may include a job or selection of assets |
| definitions | sensor is added to a top-level Definitions object to be deployed |