Jobs
Submission
In order for a task to be executed, it must be instantiated with concrete inputs and then submitted as a job. The task is then executed in the context of the job, and if it spawns sub-tasks, they are executed as part of the same job as well.
Once a job is submitted, its root task is scheduled for execution, and any eligible task runner may pick it up and execute it.
To submit a job, you first need to instantiate a job client. This can be done by calling the jobs method on the workflows client.
Once you have a job client, you can submit a job by calling its submit method. This requires a name for the job, an instance of the root task, and a cluster to submit the job to.
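As a rough sketch, a submission could look like the following. The import path, the MyTask class, the credentials, and the cluster slug are assumptions for illustration; the jobs and submit calls follow the description above.

```python
from tilebox.workflows import Client, Task, ExecutionContext  # assumed import path

class MyTask(Task):  # hypothetical root task
    message: str

    def execute(self, context: ExecutionContext) -> None:
        print(f"Hello {self.message}")

client = Client(token="YOUR_API_KEY")  # hypothetical credentials
job_client = client.jobs()  # instantiate a job client from the workflows client

job = job_client.submit(
    "my-first-job",         # a name for the job
    MyTask("world"),        # an instance of the root task with concrete inputs
    cluster="dev-cluster",  # hypothetical cluster to submit the job to
)
```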
Now that a job has been submitted, it's immediately scheduled for execution. As soon as an eligible task runner is available, the root task of the job is picked up and executed.
Retry Handling
Tasks support retry handling when their execution fails. This also applies to the root task of a job, where the number of retries can be specified using the max_retries argument of the submit method.
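A sketch of such a submission, reusing the hypothetical job client and cluster from above; MyFlakyTask stands in for any task whose execution may fail:

```python
job = job_client.submit(
    "flaky-job",
    MyFlakyTask(),          # hypothetical task that fails intermittently
    cluster="dev-cluster",  # hypothetical cluster
    max_retries=5,          # retry the root task up to 5 times
)
```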
In this example, if MyFlakyTask fails, it gets retried up to 5 times before eventually being marked as failed.
Fetching a specific Job by ID
When a job is submitted, it gets assigned a unique identifier. This identifier can be used to retrieve the job at any time.
To retrieve a job by its identifier, use the find method on the job client.
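A sketch, assuming the job client from above and a placeholder identifier:

```python
job = job_client.find("some-job-id")  # placeholder job identifier
```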
Visualization
Sometimes it’s useful to visualize the execution of a job. Since the workflow orchestrator keeps track of the execution of all tasks in a job, including sub-tasks and dependencies, it’s possible to visualize the execution of a job as a graph diagram.
Assuming you have submitted a job, you can use the display method on the job client to display the execution of the job as a graph diagram. The display method is designed to be used in an interactive environment, such as a Jupyter notebook. In non-interactive environments, use job_client.visualize instead, which returns the rendered diagram as a string in SVG format.
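A sketch of both variants, assuming a job object obtained from submit or find; the output file name is arbitrary:

```python
# Interactive environments (e.g. a Jupyter notebook): render the diagram inline
job_client.display(job)

# Non-interactive environments: get the rendered diagram as an SVG string
svg = job_client.visualize(job)
with open("job_graph.svg", "w") as file:
    file.write(svg)
```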
The following diagram displays the execution of a job as a graph. Each task is represented as a node in the graph, and the edges between nodes represent a sub-task relationship. The diagram also shows the status of each task, by using a color code.
The color codes used to represent the state of a task are:
Task State | Color | Description |
---|---|---|
Queued | Salmon | The task is queued and waiting for a task runner to pick it up. |
Running | Blue | The task is currently being executed by a task runner. |
Succeeded | Green | The task has successfully finished executing. |
Failed | Red | The task has been executed, but resulted in an error. |
Below is a visualization of another job that is currently being executed by some task runners.
From this visualization, the following observations can be made:
- The root task of the job, MyTask, was already executed and spawned three sub-tasks.
- At least three task runners are available, since three tasks are currently being executed at the same time.
- The SubTask that is currently still being executed hasn't spawned any sub-tasks yet. That is because submitted sub-tasks are only queued for execution after the task that spawned them has itself finished executing.
- The DependentTask that is still queued is waiting for the completion of the LeafTask before it can be executed.
Visualization of a job is intended as an aid for development and debugging. It's not intended for large jobs with hundreds or thousands of tasks, as the diagram may become too complex to be useful or readable. That's why the visualization is currently limited to jobs with at most 200 tasks.
Customizing Task Display Names
The text representing a task in the diagram defaults to the class name of the task. If you want to customize this text, you can do so by changing the display field of the current_task object in the execution context of a task execution. The max length of the display name is 1024 characters. Any string larger than that is truncated. The display name may optionally contain line breaks.
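A sketch of setting a custom display name from within a task; the import path and the ProcessMovie task are assumptions, while the current_task.display field follows the description above:

```python
from tilebox.workflows import Task, ExecutionContext  # assumed import path

class ProcessMovie(Task):  # hypothetical task
    title: str

    def execute(self, context: ExecutionContext) -> None:
        # Override the default display name (the class name) shown in the diagram;
        # line breaks are allowed, strings longer than 1024 characters are truncated
        context.current_task.display = f"ProcessMovie\n{self.title}"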
Cancellation
A job can be canceled at any time. When a job is canceled, all tasks of the job that are queued are removed from the queue. Even if there are idle task runners available, the tasks of a canceled job are not executed. But if a task is currently being executed while the job is being canceled, it doesn't get interrupted and continues to execute until it finishes.
To cancel a job, you can use the cancel method on the job client.
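A sketch of canceling a job, assuming the job client from above:

```python
job = job_client.find("some-job-id")  # placeholder identifier of the job to cancel
job_client.cancel(job)                # queued tasks of the job are removed from the queue
```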
A canceled job can be resumed at any time, by retrying it.
If the execution of a task within a job fails, the job is automatically canceled. This prevents the execution of further tasks in the job, which may no longer be relevant due to the failure. In a future release, this behaviour will be configurable for each task individually, since there are use cases where you may want to continue the execution of a job even if some of its tasks fail.
Retries
If a task within a job fails due to a bug in the task's implementation, a lack of resources, or any other reason, it's not necessary to resubmit the job and re-execute all already-computed tasks after a fix is deployed. Instead, you can retry the job, and its execution is resumed from the point of the failure. This means that all the work that was already done up until the point of the failure isn't lost.
In a future release, automatic retries of a job for certain failure conditions will be configurable. This can be useful in cases where an automatic retry after a certain amount of time, or once a certain condition is met, makes sense.
Below is an example of a failing job due to a bug in the task’s implementation. After the initial buggy job submission, the bug is fixed and the job is retried.
The following workflow accepts a list of movie titles, and then queries the OMDb API for the release date of each movie.
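The workflow's code isn't reproduced here, but a sketch of what it could look like follows. The MoviesStats root task, the submit_subtask call, the use of requests, and the OMDb query parameters are assumptions for illustration; only the PrintMovieStats name and the Title and Released fields come from the surrounding text.

```python
import requests  # assumed HTTP client library
from tilebox.workflows import Task, ExecutionContext  # assumed import path

class MoviesStats(Task):  # hypothetical root task
    titles: list[str]

    def execute(self, context: ExecutionContext) -> None:
        # Spawn one sub-task per movie title
        for title in self.titles:
            context.submit_subtask(PrintMovieStats(title))

class PrintMovieStats(Task):
    title: str

    def execute(self, context: ExecutionContext) -> None:
        response = requests.get(
            "https://www.omdbapi.com/",
            params={"t": self.title, "apikey": "YOUR_OMDB_API_KEY"},  # placeholder key
        ).json()
        # Buggy: these keys are missing when the movie title isn't found
        print(f"{response['Title']} was released on {response['Released']}")
```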
Submitting the job below reveals a bug in the PrintMovieStats task.
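A sketch of the submission, with placeholder movie titles and the hypothetical cluster from above:

```python
job = job_client.submit(
    "movies-stats",
    MoviesStats(["Inception", "Seven", "The Matrix", "Non Existing Movie"]),  # placeholder titles
    cluster="dev-cluster",
)
```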
It seems like one of the PrintMovieStats tasks failed with a KeyError. This probably occurs if a movie title is not found by the OMDb API, in which case the response doesn't contain the Title and Released fields.
The console output of the task runners confirms this:
A fixed version of the PrintMovieStats task looks like this:
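A sketch of such a fix, using the same assumed imports as the workflow sketch above and guarding against the missing fields; the exact handling in the original may differ:

```python
class PrintMovieStats(Task):
    title: str

    def execute(self, context: ExecutionContext) -> None:
        response = requests.get(
            "https://www.omdbapi.com/",
            params={"t": self.title, "apikey": "YOUR_OMDB_API_KEY"},  # placeholder key
        ).json()
        # Fixed: fall back gracefully when the movie title isn't found
        if "Title" in response and "Released" in response:
            print(f"{response['Title']} was released on {response['Released']}")
        else:
            print(f"No release date found for {self.title}")
```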
Now with this fix in place, and the task runners redeployed with the updated implementation of the PrintMovieStats task, it's time to retry the job:
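Assuming the job object from the failed submission (or one fetched via find), and assuming the job client exposes a retry method analogous to cancel, this could look like:

```python
job = job_client.find("some-job-id")  # placeholder identifier of the failed job
job_client.retry(job)                 # resume execution from the point of the failure
```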
The console output of the task now looks like this:
The output of the task runner confirms that only two tasks were executed, instead of new tasks for all four movies.
Great, the job was retried and now succeeded. The two tasks that were already executed successfully before the failure were not executed again; instead, execution was resumed from the point of the failure.