Sharing data between tasks is crucial for workflows, especially in satellite imagery processing, where large datasets are the norm. Tilebox Workflows offers a straightforward API for storing and retrieving data from a shared cache.
Caches are configured on the task runner: pass a cache instance, implementing the `Cache` interface, via the runner's `cache` parameter. To use an in-memory cache, use `tilebox.workflows.cache.InMemoryCache`. This implementation is helpful for local development and quick testing. For alternatives, see the supported cache backends.
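As a sketch, configuring a task runner this way might look as follows, assuming the `Client.runner` API from the Tilebox quickstart (`MyTask` is a placeholder, and exact signatures may vary between SDK versions):

```python
from tilebox.workflows import Client, Task, ExecutionContext
from tilebox.workflows.cache import InMemoryCache

class MyTask(Task):
    def execute(self, context: ExecutionContext) -> None:
        ...  # placeholder task logic

client = Client()  # assumes TILEBOX_API_KEY is set in the environment
runner = client.runner(
    tasks=[MyTask],
    cache=InMemoryCache(),  # in-memory cache, suitable for local testing
)
```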
Once a cache is configured, the `context` object that is passed to each task's execution gains access to a `job_cache` attribute, which can be used to store and retrieve data from the cache.
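For production workloads, a shared cloud storage backend can be used instead of the in-memory cache. Below is a sketch of configuring a Google Cloud Storage cache, assuming a `GoogleStorageCache` backend in `tilebox.workflows.cache` and the `google-cloud-storage` client library (project, bucket, and prefix names are placeholders):

```python
from google.cloud.storage import Client as StorageClient
from tilebox.workflows.cache import GoogleStorageCache

storage_client = StorageClient(project="my-gcp-project")  # placeholder project
bucket = storage_client.bucket("my-workflow-cache")  # placeholder bucket
cache = GoogleStorageCache(bucket, prefix="jobs")
```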
The `prefix` parameter is optional and can be used to set a common prefix for all cache keys, which helps organize objects within a bucket when re-using the same bucket for other purposes.

The Amazon S3 cache backend uses the `boto3` library to communicate with Amazon S3. For the necessary authentication setup, refer to its documentation.
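Configuring it might look like the following sketch, assuming an `AmazonS3Cache` backend in `tilebox.workflows.cache` (the bucket name is a placeholder):

```python
from tilebox.workflows.cache import AmazonS3Cache

# credentials are resolved by boto3's standard configuration chain,
# e.g. environment variables or ~/.aws/credentials
cache = AmazonS3Cache("my-cache-bucket", prefix="jobs")
```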
As with the Google Cloud Storage backend, the `prefix` parameter is optional and can be used to set a common prefix for all cache keys, which helps organize objects within a bucket when re-using the same bucket for other purposes.

Within a task, the job cache is accessed through the `ExecutionContext` passed to the task's `execute` function. Its `job_cache` object provides methods to handle data storage and retrieval from the cache. The specifics of data storage depend on the chosen cache backend.
The cache API is designed to be simple and can handle all types of data: it supports binary data in the form of `bytes`, identified by `str` cache keys. This allows for storing many different data types, such as pickled Python objects, serialized JSON, UTF-8 encoded strings, or raw binary data.
The following snippet illustrates storing and retrieving data from the cache.
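A minimal sketch of such a producer/consumer pair, assuming dict-style access on `job_cache` as described above (task and key names are illustrative):

```python
from tilebox.workflows import Task, ExecutionContext

class ProducerTask(Task):
    def execute(self, context: ExecutionContext) -> None:
        # store bytes under the str key "data"
        context.job_cache["data"] = b"my_binary_data_to_store"
        # spawn a subtask that reads the cached value back
        context.submit_subtask(ConsumerTask())

class ConsumerTask(Task):
    def execute(self, context: ExecutionContext) -> None:
        data = context.job_cache["data"]
        print(f"Read {data!r} from cache")
```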
"data"
can be any size that fits the cache backend constraints. Ensure the key remains unique within the job’s scope to avoid conflicts.
To test the workflow, you can start a local task runner using the `InMemoryCache` backend. Then submit a job to execute the `ProducerTask` and observe the output of the `ConsumerTask`.
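A sketch of such a test, assuming the job client API from the Tilebox quickstart (job names are illustrative, and the exact runner and submit signatures may vary between SDK versions):

```python
from tilebox.workflows import Client
from tilebox.workflows.cache import InMemoryCache

client = Client()
runner = client.runner(tasks=[ProducerTask, ConsumerTask], cache=InMemoryCache())

job_client = client.jobs()
job_client.submit("producer-consumer-demo", ProducerTask())

runner.run_all()  # process tasks of the submitted job, then return
```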
Similarly, submitting a job for the `CacheGroupDemo` workflow and running it with a task runner can be done as follows:
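A sketch along the same lines, assuming `CacheGroupDemo` and its subtasks are defined as in the preceding examples:

```python
# any subtasks spawned by CacheGroupDemo must also be registered in tasks=[...]
runner = client.runner(tasks=[CacheGroupDemo], cache=InMemoryCache())

job_client = client.jobs()
job_client.submit("cache-group-demo", CacheGroupDemo())
runner.run_all()
```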