Dagster & AWS S3
The AWS S3 integration lets data engineers read and write objects in AWS S3, giving their pipelines a durable, resilient storage layer.
Installation
- uv: uv add dagster-aws
- pip: pip install dagster-aws
Examples
Here is an example of how to use the S3Resource in a Dagster asset to write a pandas DataFrame to AWS S3 as a CSV file:
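import pandas as pd
from dagster_aws.s3 import S3Resource

import dagster as dg


@dg.asset
def my_s3_asset(s3: S3Resource):
    # Build a small example DataFrame and serialize it to CSV text.
    df = pd.DataFrame({"column1": [1, 2, 3], "column2": ["A", "B", "C"]})
    csv_data = df.to_csv(index=False)

    # get_client() returns a boto3 S3 client configured by the resource.
    s3_client = s3.get_client()

    # Upload the CSV; the body is encoded to bytes explicitly.
    s3_client.put_object(
        Bucket="my-cool-bucket",
        Key="path/to/my_dataframe.csv",
        Body=csv_data.encode("utf-8"),
    )


# Register the asset and bind the "s3" resource key it requests.
defs = dg.Definitions(
    assets=[my_s3_asset],
    resources={"s3": S3Resource(region_name="us-west-2")},
)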
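The example above covers only the write path. As a rough sketch of the read path, the helper below downloads a CSV object and parses it into rows. The name read_csv_from_s3 is hypothetical and not part of dagster-aws. It assumes the client it receives behaves like the boto3 S3 client returned by s3.get_client(), and that the object is UTF-8 encoded CSV. It uses only the standard library, so every value comes back as a string.

```python
import csv
import io


def read_csv_from_s3(s3_client, bucket: str, key: str) -> list[dict]:
    """Download a CSV object and parse it into a list of row dicts.

    Hypothetical helper: `s3_client` is assumed to behave like a boto3 S3
    client, such as the one returned by `S3Resource.get_client()`.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # The response's "Body" is a stream; read() returns the raw bytes.
    text = response["Body"].read().decode("utf-8")
    # DictReader uses the first CSV line as the column names.
    return list(csv.DictReader(io.StringIO(text)))
```

Inside an asset that declares s3: S3Resource, this could be called as read_csv_from_s3(s3.get_client(), "my-cool-bucket", "path/to/my_dataframe.csv"). If typed columns are needed, passing the downloaded bytes to pandas' read_csv through an io.BytesIO wrapper is an alternative.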
About AWS S3
Amazon S3 (AWS S3) is an object storage service built for scalability, data availability, security, and performance. Customers of all sizes and industries can use it to store and protect any amount of data for use cases such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 also provides management features for organizing data and configuring fine-grained access controls to meet specific business, organizational, and compliance requirements.