A job is one execution of a workflow, starting from a root task with concrete input values. As the root task runs, it can submit subtasks, creating the task graph that belongs to the same job.
When you submit a job, its root task is assigned to a cluster. Compatible runners execute tasks as they become eligible, and Tilebox updates job state from submission through completion, failure, cancellation, or retry.
Submission
To execute a task, it must be initialized with concrete inputs and submitted as a job. The task will then run within the context of the job, and if it generates sub-tasks, those will also execute as part of the same job.
After submitting a job, the root task is scheduled for execution, and any eligible runner can pick it up and execute it.
First, instantiate a job client by calling the jobs method on the workflow client.
from tilebox.workflows import Client
client = Client()
job_client = client.jobs()
After obtaining a job client, submit a job using the submit method. You need to provide a name for the job, an instance of the root task, and an optional cluster to execute the root task on.
# import your own workflow
from my_workflow import MyTask
job = job_client.submit('my-job', MyTask("some", "parameters"))
Once a job is submitted, it’s immediately scheduled for execution. The root task will be picked up and executed as soon as an eligible runner is available.
Retry Handling
Tasks support retry handling for failed executions. This applies to the root task of a job as well, where you can specify the number of retries using the max_retries argument of the submit method.
from my_workflow import MyFlakyTask
job = job_client.submit('my-job', MyFlakyTask(), max_retries=5)
In this example, if MyFlakyTask fails, it will be retried up to five times before being marked as failed.
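The retry semantics can be modeled in plain Python. The sketch below doesn't call Tilebox at all. `run_with_retries` and `make_flaky` are hypothetical names, and the model assumes that `max_retries=5` means one initial attempt followed by at most five retries, which is how the description above reads.

```python
from __future__ import annotations

from collections.abc import Callable


def run_with_retries(task: Callable[[], None], max_retries: int) -> tuple[str, int]:
    """Return the final state and the number of attempts made.

    Hypothetical model: one initial attempt plus up to max_retries retries.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            task()
            return "COMPUTED", attempts
        except Exception:
            if attempts > max_retries:  # initial attempt and all retries used up
                return "FAILED", attempts


def make_flaky(failures: int) -> Callable[[], None]:
    """Build a stand-in task that raises on its first `failures` calls, then succeeds."""
    calls = 0

    def flaky() -> None:
        nonlocal calls
        calls += 1
        if calls <= failures:
            raise RuntimeError("transient error")

    return flaky


print(run_with_retries(make_flaky(3), max_retries=5))  # ('COMPUTED', 4)
print(run_with_retries(make_flaky(10), max_retries=5))  # ('FAILED', 6)
```

In Tilebox itself the runners and the orchestrator handle retries for you; the model only illustrates how many attempts a task gets under this reading of `max_retries`.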
Submitting to a specific cluster
Jobs default to running on the default cluster.
You can specify another cluster to run the root task on using the cluster argument of the submit method.
from my_workflow import MyFlakyTask
job = job_client.submit('my-job', MyFlakyTask(), cluster="dev-cluster")
Only runners listening on the specified cluster can pick up the task.
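The routing rule can be sketched as a toy model. This is only an illustration: `QueuedTask`, `Runner`, and the cluster name `"default"` are hypothetical, and in Tilebox the matching of tasks to runners happens on the orchestrator side.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_CLUSTER = "default"  # hypothetical name standing in for the default cluster


@dataclass
class QueuedTask:
    name: str
    cluster: str = DEFAULT_CLUSTER


@dataclass
class Runner:
    cluster: str = DEFAULT_CLUSTER  # the single cluster this runner listens on

    def pick(self, queue: list[QueuedTask]) -> QueuedTask | None:
        """Remove and return the first queued task on this runner's cluster, or None."""
        for i, task in enumerate(queue):
            if task.cluster == self.cluster:
                return queue.pop(i)
        return None


queue = [QueuedTask("my-job", cluster="dev-cluster")]
print(Runner().pick(queue))  # None: a runner on the default cluster ignores the task
print(Runner("dev-cluster").pick(queue))  # QueuedTask(name='my-job', cluster='dev-cluster')
```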
Querying jobs
You can query jobs in a given time range using the query method on the job client.
jobs = job_client.query(("2025-01-01", "2025-02-01"))
print(jobs)
Retrieving a specific job
When you submit a job, it’s assigned a unique identifier that can be used to retrieve it later.
You can use the find method on the job client to get a job by its ID.
job = job_client.submit('my-job', MyTask("some", "parameters"))
print(job.id) # 018dd029-58ca-74e5-8b58-b4f99d610f9a
# Later, in another process or machine, retrieve job info
job = job_client.find("018dd029-58ca-74e5-8b58-b4f99d610f9a")
find is also useful for checking a job's state later on, for example to see whether it's still running or has already completed.
In interactive environments such as Jupyter notebooks, the job object also provides a rich display of the job’s state and progress, if it’s used as the last expression in a cell.
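To wait for a job to finish outside a notebook, you can poll its state in a loop. The helper below is a hypothetical sketch, not part of the Tilebox SDK: `wait_for_state` accepts any zero-argument callable that returns the current state, so it can be exercised without a Tilebox connection.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable, Iterable


def wait_for_state(
    fetch_state: Callable[[], Hashable],
    terminal_states: Iterable[Hashable],
    poll_interval: float = 5.0,
    timeout: float | None = None,
) -> Hashable:
    """Call fetch_state() until it returns one of terminal_states.

    Sleeps poll_interval seconds between calls and raises TimeoutError
    if timeout seconds pass without reaching a terminal state.
    """
    terminal = set(terminal_states)
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in terminal:
            return state
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Simulated job that reports three successive states, polled without delay.
states = iter(["SUBMITTED", "RUNNING", "COMPLETED"])
print(wait_for_state(lambda: next(states), {"COMPLETED", "FAILED", "CANCELED"}, poll_interval=0))  # COMPLETED
```

With Tilebox, `fetch_state` could be `lambda: job_client.find(job.id).state`. Only `JobState.RUNNING` appears in this section, so member names such as `JobState.COMPLETED`, `JobState.FAILED`, or `JobState.CANCELED` are assumptions to verify against the `JobState` enum before using them as terminal states.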
States
Every job is always in exactly one of the following states:

- Submitted: The job hasn't started yet. All its tasks are queued, and the user hasn't canceled it.
- Running: At least one task of the job is currently running.
- Started: The job has started and some tasks are already COMPUTED, but others are still QUEUED, waiting for an eligible runner to pick them up. However, no task is currently RUNNING.
- Completed: The job has completed successfully. Every task of the job succeeded and is COMPUTED.
- Failed: At least one task of the job has failed, which halts the execution of the remaining tasks. You can retry the job to resume execution from the point of failure.
- Canceled: The job was canceled at the user's request. You can retry the job to resume execution from the point of cancellation.
The state of a job is determined by the states of all its tasks. For a list of possible task states, see the task state documentation.
You can programmatically check the state of a job by inspecting its state field.
from tilebox.workflows.data import JobState
job = job_client.find("018dd029-58ca-74e5-8b58-b4f99d610f9a")
print("Job is running:", job.state == JobState.RUNNING)
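The rules listed above can be approximated as a function of the task states. This is an illustrative model, not Tilebox's implementation. The task state strings (`QUEUED`, `RUNNING`, `COMPUTED`, `FAILED`) and the order in which the rules are checked, for example that a failure takes precedence over a cancellation, are assumptions.

```python
from collections.abc import Iterable


def derive_job_state(task_states: Iterable[str], canceled: bool = False) -> str:
    """Approximate a job's state from its task states and the user-cancel flag."""
    states = list(task_states)
    if "FAILED" in states:
        return "Failed"
    if canceled:
        return "Canceled"
    if "RUNNING" in states:
        return "Running"
    if states and all(s == "COMPUTED" for s in states):
        return "Completed"
    if "COMPUTED" in states:
        return "Started"  # some work is done, the rest is still queued
    return "Submitted"


print(derive_job_state(["COMPUTED", "QUEUED", "QUEUED"]))  # Started
print(derive_job_state(["COMPUTED", "RUNNING", "QUEUED"]))  # Running
```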
Visualization
Visualizing the execution of a job can be helpful. The Tilebox workflow orchestrator tracks all tasks in a job, including sub-tasks and dependencies. This enables the visualization of the execution of a job as a graph diagram.
display is designed for use in an interactive environment such as a Jupyter notebook. In non-interactive environments, use visualize, which returns the rendered diagram as an SVG string.
Visualization isn’t supported in Go yet.
job = job_client.find("some-job-id") # or a recently submitted job
# Then visualize it
job_client.display(job)
The following diagram represents the job execution as a graph. Each task is shown as a node, with edges indicating sub-task relationships. The diagram also uses color coding to display the state of each task.
Below is another visualization of a job currently being executed by multiple runners.
From the diagram, the following can be inferred:
- The root task, MyTask, has been executed, is marked as COMPUTED, and has submitted three sub-tasks.
- At least three runners are available, since three tasks are currently being executed simultaneously.
- The SubTask that is still executing hasn't generated any sub-tasks yet, because sub-tasks are queued for execution only after their parent task finishes and becomes COMPUTED.
- The queued DependentTask requires the LeafTask to complete before it can be executed.
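The last two observations follow from two eligibility rules: a sub-task only enters the queue once its parent is COMPUTED, and a task with dependencies waits until all of them are COMPUTED. Here is a toy model of those rules. The `Node` type and the `eligible` function are hypothetical, and the small task graph is made up for illustration rather than copied from the diagram.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    state: str = "QUEUED"  # QUEUED, RUNNING or COMPUTED in this model
    parent: Node | None = None
    depends_on: list[Node] = field(default_factory=list)


def eligible(tasks: list[Node]) -> list[str]:
    """Return the names of queued tasks a runner could pick up right now."""
    return [
        t.name
        for t in tasks
        if t.state == "QUEUED"
        # a sub-task only becomes available once its parent has finished
        and (t.parent is None or t.parent.state == "COMPUTED")
        # every dependency must be computed first
        and all(d.state == "COMPUTED" for d in t.depends_on)
    ]


root = Node("MyTask", "COMPUTED")
leaf = Node("LeafTask", "RUNNING", parent=root)
dependent = Node("DependentTask", parent=root, depends_on=[leaf])
other = Node("OtherTask", parent=leaf)  # sub-task of a task that is still running
print(eligible([root, leaf, dependent, other]))  # []
leaf.state = "COMPUTED"
print(eligible([root, leaf, dependent, other]))  # ['DependentTask', 'OtherTask']
```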
Job visualizations are meant for development and debugging. They are not suitable for large jobs with hundreds of tasks, as the diagrams may become too complex. Currently, visualizations are limited to jobs with a maximum of 200 tasks.
Customizing Task Display Names
The text representing a task in the diagram defaults to the task's class name. You can customize this by modifying the display field of the current_task object in the task's execution context. The maximum length for a display name is 1024 characters, with any overflow truncated. Line breaks using \n are supported as well.
from tilebox.workflows import Task, ExecutionContext

class RootTask(Task):
    num_subtasks: int

    def execute(self, context: ExecutionContext):
        context.current_task.display = f"Root({self.num_subtasks})"
        for i in range(self.num_subtasks):
            context.submit_subtask(SubTask(i))

class SubTask(Task):
    index: int

    def execute(self, context: ExecutionContext):
        context.current_task.display = f"Leaf Nr. {self.index}"
job = job_client.submit('custom-display-names', RootTask(3))
job_client.display(job)
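Because anything beyond 1024 characters is truncated, you may prefer to decide yourself what survives when a display name gets long. `build_display` below is a hypothetical helper, not part of Tilebox. It joins lines with \n and, if the result is too long, cuts it and ends it with a visible "..." marker so it stays within the limit.

```python
MAX_DISPLAY_LENGTH = 1024  # documented maximum length of a task display name


def build_display(*lines: str, limit: int = MAX_DISPLAY_LENGTH) -> str:
    """Join lines with newlines; if too long, cut and end with '...' to stay within limit."""
    text = "\n".join(lines)
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."


print(repr(build_display("Leaf Nr. 3", "ok")))  # 'Leaf Nr. 3\nok'
```

Inside a task's execute method, you could then assign the result, for example `context.current_task.display = build_display(f"Leaf Nr. {self.index}", "second line")`.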
Cancellation
You can cancel a job at any time. Once a job is canceled, runners won't pick up any of its queued tasks, even if they are idle. Tasks that are already being executed aren't interrupted and run to completion, but any sub-tasks they submit after the cancellation won't be picked up by runners either.
Use the cancel method on the job client to cancel a job.
job = job_client.submit('my-job', MyTask("some", "parameters"))
# After a short while, the job gets canceled
job_client.cancel(job)
A canceled job can be resumed at any time by retrying it.
If any task in a job fails, the job is automatically canceled to avoid executing irrelevant tasks. Future releases will allow configuring this behavior for each task to meet specific requirements.
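The cancellation rules above can be summarized in a small state model. `JobModel` is a hypothetical toy class, not the Tilebox scheduler. It only shows that a canceled job hands out no more work, while a task that was already running still completes and the sub-tasks it submits stay queued.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class JobModel:
    """Toy model of job cancellation; task names stand in for real tasks."""

    queued: list[str] = field(default_factory=list)
    running: list[str] = field(default_factory=list)
    computed: list[str] = field(default_factory=list)
    canceled: bool = False

    def cancel(self) -> None:
        self.canceled = True

    def pick_up(self) -> str | None:
        """Hand the next queued task to an idle runner, unless the job is canceled."""
        if self.canceled or not self.queued:
            return None
        task = self.queued.pop(0)
        self.running.append(task)
        return task

    def finish(self, task: str, subtasks: tuple[str, ...] = ()) -> None:
        """Complete a running task; it is never interrupted, even after cancellation."""
        self.running.remove(task)
        self.computed.append(task)
        # New sub-tasks are queued, but pick_up() won't hand them out while canceled.
        self.queued.extend(subtasks)


job = JobModel(queued=["A", "B"])
job.pick_up()  # a runner starts task A
job.cancel()
print(job.pick_up())  # None: B is not handed out, even to an idle runner
job.finish("A", subtasks=("A.1",))  # A still runs to completion
print(job.computed, job.queued)  # ['A'] ['B', 'A.1']
```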
Retries
If a task fails due to a bug or lack of resources, there is no need to resubmit the entire job. You can simply retry the job, and it will resume from the point of failure. This ensures that all the work that was already done up until the point of the failure isn’t lost.
Future releases may introduce automatic retries for certain failure conditions, which can be useful for handling temporary issues.
Below is an example of a failing job due to a bug in the task’s implementation. The following workflow processes a list of movie titles and queries the OMDb API for each movie’s release date.
from urllib.parse import urlencode

import httpx
from tilebox.workflows import Task, ExecutionContext

class MoviesStats(Task):
    titles: list[str]

    def execute(self, context: ExecutionContext) -> None:
        for title in self.titles:
            context.submit_subtask(PrintMovieStats(title))

class PrintMovieStats(Task):
    title: str

    def execute(self, context: ExecutionContext) -> None:
        params = {"t": self.title, "apikey": "<OMDB API Key>"}
        url = "http://www.omdbapi.com/?" + urlencode(params)
        with context.tracer.span("fetch-movie-stats") as span:
            span.set_attribute("movie.title", self.title)
            response = httpx.get(url).json()

        # set the display name of the task to the title of the movie:
        context.current_task.display = response["Title"]
        context.logger.info(
            "Movie release date fetched",
            title=response["Title"],
            released=response["Released"],
        )
Submitting the workflow as a job reveals a bug in the PrintMovieStats task.
job = job_client.submit('movies-stats', MoviesStats([
"The Matrix",
"Shrek 2",
"Tilebox - The Movie",
"The Avengers",
]))
job_client.display(job)
One of the PrintMovieStats tasks fails with a KeyError. This error occurs when a movie title is not found by the OMDb API, leading to a response without the Title and Released fields.
Task logs from the runners confirm this:
Movie release date fetched title="The Matrix" released="31 Mar 1999"
Movie release date fetched title="Shrek 2" released="19 May 2004"
ERROR: Task PrintMovieStats failed with exception: KeyError('Title')
The corrected version of PrintMovieStats is as follows:
class PrintMovieStats(Task):
    title: str

    def execute(self, context: ExecutionContext) -> None:
        params = {"t": self.title, "apikey": "<OMDB API Key>"}
        url = "http://www.omdbapi.com/?" + urlencode(params)
        with context.tracer.span("fetch-movie-stats") as span:
            span.set_attribute("movie.title", self.title)
            response = httpx.get(url).json()

        # the response lacks these fields when the movie isn't found
        if "Title" in response and "Released" in response:
            context.current_task.display = response["Title"]
            context.logger.info(
                "Movie release date fetched",
                title=response["Title"],
                released=response["Released"],
            )
        else:
            context.current_task.display = f"NotFound: {self.title}"
            context.logger.info("Movie release date not found", title=self.title)
With this fix, and after redeploying the runners with the updated PrintMovieStats implementation, you can retry the job:
job_client.retry(job)
job_client.display(job)
Now the task logs show:
Movie release date not found title="Tilebox - The Movie"
Movie release date fetched title="The Avengers" released="04 May 2012"
The logs confirm that only two tasks were executed on retry: the job resumed from the point of failure and completed successfully, and the two tasks that had finished before the failure were not re-executed.
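The retry behavior in this example can be reproduced with a toy model that re-runs only tasks that aren't COMPUTED and stops at the first failure. `run_job`, `buggy`, and `fixed` are hypothetical. The model runs tasks one at a time in a fixed order, whereas real runners execute tasks concurrently, so treat it as a sketch of the semantics only.

```python
from __future__ import annotations

from collections.abc import Callable


def run_job(tasks: dict[str, str], run: Callable[[str], bool]) -> list[str]:
    """Run every task that isn't COMPUTED yet, in order, and stop at the first failure.

    `tasks` maps task names to states and is updated in place; `run(name)` returns
    True on success. Returns the names of the tasks that were executed.
    """
    executed = []
    for name, state in tasks.items():
        if state == "COMPUTED":
            continue  # work finished before a failure is kept, not redone
        executed.append(name)
        if not run(name):
            tasks[name] = "FAILED"
            break  # a failed task halts the rest of the job
        tasks[name] = "COMPUTED"
    return executed


def buggy(title: str) -> bool:
    return title != "Tilebox - The Movie"  # first implementation fails on unknown titles


def fixed(title: str) -> bool:
    return True  # corrected implementation handles every title


movies = {t: "QUEUED" for t in ["The Matrix", "Shrek 2", "Tilebox - The Movie", "The Avengers"]}
print(run_job(movies, buggy))  # ['The Matrix', 'Shrek 2', 'Tilebox - The Movie']
print(run_job(movies, fixed))  # ['Tilebox - The Movie', 'The Avengers']
```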