Storage Event Triggers
Trigger jobs after objects are created or modified in a storage location
Creating a Storage Event Task
Storage Event Tasks are recurrent tasks that are triggered when objects are created or modified in a storage location.
To create a Storage Event task, subclass the `StorageEventTask` class and override its `execute` method, just as you would for any other task.
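As a sketch, a minimal Storage Event task could look like the following. The `StorageEventTask` base class and the shape of the `trigger` attribute are stand-ins defined inline so the example is self-contained; the real class comes from the Tilebox SDK.

```python
from dataclasses import dataclass


# Stand-in for the trigger payload a Storage Event task receives.
# The real attribute shape is defined by the Tilebox SDK; this is an assumption.
@dataclass
class StorageEvent:
    location: str  # path of the created or modified object
    type: str      # kind of event, e.g. "created"


# Stand-in for the SDK's StorageEventTask base class.
class StorageEventTask:
    trigger: StorageEvent


class LogNewObject(StorageEventTask):
    # Override execute just as you would for any other task.
    def execute(self) -> str:
        return f"object event: {self.trigger.location} ({self.trigger.type})"


task = LogNewObject()
task.trigger = StorageEvent(location="data/scene_001.tif", type="created")
message = task.execute()
```

When the real trigger fires, the SDK fills in `self.trigger` before calling `execute`; here it is set by hand only to make the sketch runnable.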
Storage locations
Storage Event tasks are triggered when objects are created or modified in a storage location. This storage location can be a bucket in a cloud storage service or a local file system. Tilebox supports the following storage locations:
Google Cloud Storage
Amazon S3
Local File System
Registering a storage location
In order for a storage location to be available within Tilebox workflows, it must first be registered. Registration consists not only of specifying the location, but also of setting up a notification system that forwards events to Tilebox so that tasks can be triggered. How this notifier is set up depends on the storage location type.
For example, a GCP storage bucket is integrated by setting up a Pub/Sub notification with a push subscription. A local file system requires installing a file system watcher. If you are interested in registering a storage location with Tilebox, please get in touch.
Listing available storage locations
To list all available storage locations, use the `all` method on the storage location client.
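A sketch of what listing could look like, with a stand-in client defined inline so it runs without the SDK (the real storage location client comes from the Tilebox workflows client; the names here are assumptions):

```python
# Stand-in for the storage location client; the real one is obtained from
# the Tilebox workflows client and its all() method queries the service.
class StorageLocationClient:
    def __init__(self, locations: list[str]):
        self._locations = locations

    def all(self) -> list[str]:
        # Return every storage location registered with Tilebox
        return list(self._locations)


client = StorageLocationClient(["GCS bucket: my-bucket", "Local: /mnt/data"])
locations = client.all()
```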
Reading files from a storage location
Once a storage location is registered, you can read files from it using the `read` method on the storage client. `read` instantiates a client for the specific storage location. This requires that the storage location is actually accessible by a task runner, which may mean credentials for a cloud storage service, or physical/network access to a locally mounted file system.
To set up authentication and enable access to a GCS storage bucket, check out the Google Cloud client documentation for authentication.
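For a local file system location, a `read` call can be sketched as below. The class and the `read` signature are assumptions modeled on the description above, not the verified SDK surface; a temporary directory stands in for the registered location so the example is self-contained.

```python
import pathlib
import tempfile


# Sketch of read() for a local file system storage location.
# For a cloud bucket, read() would download the object instead.
class LocalStorageLocation:
    def __init__(self, root: str):
        self.root = pathlib.Path(root)

    def read(self, object_path: str) -> bytes:
        # Resolve the object path relative to the location's root
        return (self.root / object_path).read_bytes()


root = tempfile.mkdtemp()
pathlib.Path(root, "data.txt").write_text("hello")
location = LocalStorageLocation(root)
content = location.read("data.txt")
```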
Registering a Storage Event Trigger
Once a Storage Event task is implemented, it can be registered to be triggered every time a storage event occurs in a storage location. This means that every time a new object is created or modified in the storage location, a new job is submitted, consisting of a single task instance derived from the registered Storage Event task prototype.
Glob patterns use standard wildcard syntax. Additionally, you can use `**` as a super-asterisk: a matching operator that is not sensitive to slash separators.
Below are some examples of valid glob patterns:
| Pattern | Matches |
|---|---|
| `*.ext` | Any file ending in `.ext` in the root directory |
| `**/*.ext` | Any file ending in `.ext` in any subdirectory, but not in the root directory |
| `**.ext` | Any file ending in `.ext` in any subdirectory, including the root directory |
| `folder/*` | Any file directly in a `folder` subdirectory |
| `folder/**` | Any file directly or recursively part of a `folder` subdirectory |
| `[a-z].txt` | `a.txt`, `b.txt`, etc. |
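These semantics can be illustrated with a small glob-to-regex translator. This is a sketch of the matching rules in the table, not the exact matcher Tilebox uses:

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob pattern to a regex. * and ? stay within one path
    segment, while ** is the super-asterisk that also crosses slashes."""
    parts = []
    i = 0
    while i < len(pattern):
        char = pattern[i]
        if pattern[i : i + 2] == "**":
            parts.append(".*")  # ** also matches slash separators
            i += 2
        elif char == "*":
            parts.append("[^/]*")  # * stops at slash separators
            i += 1
        elif char == "?":
            parts.append("[^/]")
            i += 1
        elif char == "[":
            end = pattern.index("]", i)
            parts.append(pattern[i : end + 1])  # character class, e.g. [a-z]
            i = end + 1
        else:
            parts.append(re.escape(char))
            i += 1
    return re.compile("".join(parts) + r"\Z")


# The table rows above, checked against this matcher:
assert glob_to_regex("*.ext").match("file.ext")
assert not glob_to_regex("*.ext").match("sub/file.ext")
assert glob_to_regex("**/*.ext").match("sub/file.ext")
assert not glob_to_regex("**/*.ext").match("file.ext")
assert glob_to_regex("**.ext").match("sub/file.ext")
assert glob_to_regex("folder/*").match("folder/file.ext")
assert glob_to_regex("folder/**").match("folder/a/b.ext")
assert glob_to_regex("[a-z].txt").match("b.txt")
```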
Start a Task Runner capable of executing Storage Event Tasks
With the Storage Event task registered, a job is submitted whenever the storage event occurs. But for the tasks to actually run, a task runner capable of executing the Storage Event task needs to be available. If no task runner is available, the submitted jobs remain in a task queue. Once an eligible task runner becomes available, all jobs in the queue are picked up and executed.
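The queueing behavior can be pictured as follows. This is a toy illustration of the semantics described above, not SDK code:

```python
from collections import deque

# Jobs submitted while no eligible task runner is available simply wait.
pending_jobs = deque(["job-1", "job-2", "job-3"])


def run_available_jobs(queue: deque) -> list[str]:
    """Simulate an eligible task runner coming online and draining
    the queue in submission order."""
    executed = []
    while queue:
        executed.append(queue.popleft())
    return executed


executed = run_available_jobs(pending_jobs)
```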
Triggering an event
Creating an object in the bucket that the task is registered on results in a job being submitted:
Inspecting the task runner output now reveals that the job was submitted and the task executed:
Inspecting the registered Storage Event Task
The Tilebox Console offers a convenient way to inspect all the recurrent tasks that are registered to run on a schedule.
Once you’ve registered a storage event task, you can inspect it in the console. Here is an example of what it looks like:
Deleting Storage Event Triggers
To delete a registered storage event task, use `recurrent_tasks.delete`. After deletion, no new jobs are submitted by the storage event trigger, but jobs submitted in the past that are still in the queue remain there and can still be picked up by a task runner.
Submitting Storage Event jobs manually
To test storage event tasks, they can be submitted as regular tasks. This is useful for testing purposes, or if you want to submit a storage event task as part of a larger workflow. To do so, the task needs to be instantiated with a specific trigger time using the `once` method.
Submitting a job with a storage event task created via `once` immediately schedules the task, and a runner may pick it up and execute it right away. The storage location and object location specified in the `once` method only influence the `self.trigger` attribute that the storage event task receives.
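A sketch of manual submission, with stand-in classes defined inline so it runs without the SDK. The real `once` is provided by the Tilebox SDK, and its exact signature and the trigger shape shown here are assumptions:

```python
from datetime import datetime, timezone


# Stand-in for the SDK's StorageEventTask base class.
class StorageEventTask:
    trigger: dict


class ProcessNewObject(StorageEventTask):
    def execute(self) -> str:
        return f"processing {self.trigger['location']}"


def once(task_cls: type, storage_location: str, location: str) -> StorageEventTask:
    """Instantiate the task with a specific trigger, as if the storage
    event had just occurred, so it can be submitted like a regular task."""
    task = task_cls()
    task.trigger = {
        "storage_location": storage_location,
        "location": location,
        "time": datetime.now(timezone.utc),  # the specific trigger time
    }
    return task


task = once(ProcessNewObject, "my-bucket", "path/to/object.txt")
result = task.execute()
```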