> ## Documentation Index
> Fetch the complete documentation index at: https://docs.tilebox.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Storage Event Triggers

> Trigger workflow jobs automatically whenever new objects are created or modified in a storage bucket, enabling event-driven data processing pipelines.

<Note>
  This feature is only available in the Python SDK.
</Note>

## Creating a Storage Event Task

Storage Event Tasks are automations triggered when objects are created or modified in a [storage location](#storage-locations).
To create a Storage Event task, use `tilebox.workflows.automations.StorageEventTask` as your tasks base class instead of the regular `tilebox.workflows.Task`.

```python Python theme={"system"}
from tilebox.workflows import ExecutionContext
from tilebox.workflows.automations import StorageEventTask, StorageEventType

class LogObjectCreation(StorageEventTask):
    head_bytes: int

    def execute(self, context: ExecutionContext) -> None:
        # self.trigger is an attribute of the StorageEventTask class,
        # which contains information about the storage event that triggered this task
        if self.trigger.type == StorageEventType.CREATED:
            path = self.trigger.location
            context.logger.info("A new object was created", path=path)

            # trigger.storage is a storage client for interacting with the storage
            # location that triggered the event
            # using it, we can read the object to get its content as a bytes object
            data = self.trigger.storage.read(path)
            context.logger.info("Read object data", path=path, size_bytes=len(data))
            context.logger.info("Object preview", path=path, preview=data[:self.head_bytes].hex())
```

## Storage Locations

Storage Event tasks are triggered when objects are created or modified in a storage location. This location can be a cloud storage bucket or a local file system. Tilebox supports the following storage locations:

<Columns cols={2}>
  <Card title="Google Cloud Storage" icon="google" horizontal />

  <Card title="Amazon S3" icon="aws" horizontal />

  <Card title="Local File System" icon="folder-open" horizontal />
</Columns>

### Registering a Storage Location

To make a storage location available within Tilebox workflows, it must be registered first. This involves specifying the location and setting up a notification system that forwards events to Tilebox, enabling task triggering. The setup varies depending on the storage location type.

For example, a GCP storage bucket is integrated by setting up a [PubSub Notification with a push subscription](https://cloud.google.com/storage/docs/pubsub-notifications). A local file system requires installing a filesystem watcher. To set up a storage location registered with Tilebox, please [get in touch](mailto:engineering@tilebox.com).

### Listing Available Storage Locations

To list all available storage locations, use the `all` method on the storage location client.

```python Python theme={"system"}
from tilebox.workflows import Client

client = Client()
automations_client = client.automations()

storage_locations = automations_client.storage_locations()
print(storage_locations)
```

```plaintext Output theme={"system"}
[
    StorageLocation(
        location="gcp-project:gcs-bucket-fab3fa2",
        type=StorageType.GCS,
    ),
    StorageLocation(
        location="s3-bucket-263af15",
        type=StorageType.S3,
    ),
    StorageLocation(
        location='/path/to/a/local/folder',
        type=StorageType.FS,
    ),
]
```

### Reading Files from a Storage Location

Once a storage location is registered, you can read files from it using the `read` method on the storage client.

```python Python theme={"system"}
gcs_bucket = storage_locations[0]
s3_bucket = storage_locations[1]
local_folder = storage_locations[2]

gcs_object = gcs_bucket.read("my-object.txt")
s3_object = s3_bucket.read("my-object.txt")
local_object = local_folder.read("my-object.txt")
```

<Note>
  The `read` method instantiates a client for the specific storage location. This requires that
  the storage location is accessible by a task runner and may require credentials for cloud storage
  or physical/network access to a locally mounted file system.
</Note>

<Tip>
  To set up authentication and enable access to a GCS storage bucket, check out the [Google Client docs for authentication](https://cloud.google.com/docs/authentication/client-libraries#python).
</Tip>

## Registering a Storage Event Trigger

After implementing a Storage Event task, register it to trigger each time a storage event occurs. This registration submits a new job consisting of a single task instance derived from the registered Storage Event task prototype.

```python Python theme={"system"}
from tilebox.workflows import Client

client = Client()
automations = client.automations()
storage_event_automation = automations.create_storage_event_automation(
    "log-object-creations",  # name of the storage event automation
    LogObjectCreation(head_bytes=20),  # the task (and its input parameters) to run repeatedly
    triggers=[
        # you can specify a glob pattern:
        # run every time a .txt file is created anywhere in the gcs bucket
        (gcs_bucket, "**.txt"),  
    ],
)
```

<Tip>
  The syntax for specifying glob patterns follows [Standard Wildcards](https://tldp.org/LDP/GNU-Linux-Tools-Summary/html/x11655.htm).
  Additionally, you can use `**` as a super-asterisk, a matching operator not sensitive to slash separators.
</Tip>

Here are some examples of valid glob patterns:

| Pattern     | Matches                                                                      |
| ----------- | ---------------------------------------------------------------------------- |
| `*.ext`     | Any file ending in `.ext` in the root directory                              |
| `**/*.ext`  | Any file ending in `.ext` in any subdirectory, but not in the root directory |
| `**.ext`    | Any file ending in `.ext` in any subdirectory, including the root directory  |
| `folder/*`  | Any file directly in a `folder` subdirectory                                 |
| `folder/**` | Any file directly or recursively part of a `folder` subdirectory             |
| `[a-z].txt` | Matches `a.txt`, `b.txt`, etc.                                               |

## Start a Storage Event Task Runner

With the Storage Event automation registered, a job is submitted whenever a storage event occurs. But unless a [task runner](/workflows/concepts/task-runners) is available to execute the Storage Event task the submitted jobs remain in a task queue.
Once an [eligible task runner](/workflows/concepts/task-runners#task-selection) becomes available, all jobs in the queue are executed.

```python Python theme={"system"}
from tilebox.workflows import Client

client = Client()
runner = client.runner(tasks=[LogObjectCreation])
runner.run_forever()
```

### Triggering an Event

Creating an object in the bucket where the task is registered results in a job being submitted:

```bash Creating an object theme={"system"}
echo "Hello World" > my-object.txt
gcloud storage cp my-object.txt gs://gcs-bucket-fab3fa2
```

Inspecting the task runner output reveals that the job was submitted and the task executed:

```plaintext Output theme={"system"}
2024-09-25 16:51:45,621 INFO A new object was created: my-object.txt
2024-09-25 16:51:45,857 INFO The object's file size is 12 bytes
2024-09-25 16:51:45,858 INFO The object's first 20 bytes are: b'Hello World\n'
```

## Inspecting in the Console

The [Tilebox Console](https://console.tilebox.com/workflows/automations) provides an easy way to inspect all registered storage event automations.

<Frame>
  <img src="https://mintcdn.com/tilebox/TYquvc9froFIydg1/assets/console/storage-event-light.png?fit=max&auto=format&n=TYquvc9froFIydg1&q=85&s=046bcf2ef477c366e1b0e92fdfd7c042" alt="Tilebox Workflows automations in the Tilebox Console" className="dark:hidden" width="1536" height="970" data-path="assets/console/storage-event-light.png" />

  <img src="https://mintcdn.com/tilebox/TYquvc9froFIydg1/assets/console/storage-event-dark.png?fit=max&auto=format&n=TYquvc9froFIydg1&q=85&s=dce1752b7e8f8f47d99065e9852272ef" alt="Tilebox Workflows automations in the Tilebox Console" className="hidden dark:block" width="1536" height="970" data-path="assets/console/storage-event-dark.png" />
</Frame>

## Deleting Storage Event automations

To delete a registered storage event automation, use `automations.delete`. After deletion, no new jobs will be submitted by the storage event trigger. Past jobs already triggered will still remain queued.

```python Python theme={"system"}
from tilebox.workflows import Client

client = Client()
automations = client.automations()

# delete the automation as returned by create_storage_event_automation
automations.delete(storage_event_automation)

# or manually by id:
automations.delete("0190bafc-b3b8-88c4-008b-a5db044380d0")
```

## Submitting Storage Event jobs manually

You can submit Storage event tasks as regular tasks for testing purposes or as part of a larger workflow. To do so, instantiate the task with a specific storage location and object name using the `once` method.

```python Python theme={"system"}
job_client = client.jobs()

task = LogObjectCreation(head_bytes=20)

# submitting it directly won't work; raises ValueError:
# job_client.submit("manual-storage-event-job", task)

# instead, we specify a trigger condition, and submit a job manually
job_client.submit(
    "manual-storage-event-job",
    # simulate an event that occurred in the gcs bucket for the object "my-object.txt"
    task.once(gcs_bucket, "my-object.txt"),
)
```
