Submission

To execute a task, initialize it with concrete inputs and submit it as a job. The task then runs within the context of that job, and any sub-tasks it generates run as part of the same job.

After submitting a job, the root task is scheduled for execution, and any eligible task runner can pick it up and execute it.

First, instantiate a job client by calling the jobs method on the workflow client.

After obtaining a job client, submit a job using the submit method. You need to provide a name for the job, an instance of the root task, and a cluster to execute the root task on.

Once a job is submitted, it’s immediately scheduled for execution. The root task will be picked up and executed as soon as an eligible task runner is available.
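To make the submission flow concrete without a live cluster, here is a toy, in-memory model of it. None of it is the Tilebox API: `Task`, `Job`, `submit`, `run_until_idle`, and the cluster name `"dev-cluster"` are hypothetical stand-ins that only mirror the behavior described above. A job gets a name, a root task, and a cluster; the root task is queued immediately; sub-tasks run in the same job and are queued only after their parent finishes.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field


class Task:
    """Hypothetical stand-in for a workflow task; subclasses implement execute()."""

    def execute(self, job: "Job") -> None:
        raise NotImplementedError


@dataclass
class Job:
    name: str
    cluster: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique job ID
    queue: deque = field(default_factory=deque)
    pending: list = field(default_factory=list)  # sub-tasks spawned by the running task
    executed: list = field(default_factory=list)  # task class names, in execution order

    def submit_subtask(self, task: Task) -> None:
        # Sub-tasks belong to the same job as the task that spawned them.
        self.pending.append(task)


def submit(name: str, root_task: Task, cluster: str) -> Job:
    """Toy equivalent of submitting a job: the root task is queued right away."""
    job = Job(name=name, cluster=cluster)
    job.queue.append(root_task)
    return job


def run_until_idle(job: Job) -> None:
    """A single toy task runner that executes queued tasks one at a time."""
    while job.queue:
        task = job.queue.popleft()
        task.execute(job)
        job.executed.append(type(task).__name__)
        # Sub-tasks are only queued once their parent has finished.
        job.queue.extend(job.pending)
        job.pending.clear()


class SubTask(Task):
    def execute(self, job: Job) -> None:
        pass


class RootTask(Task):
    def execute(self, job: Job) -> None:
        for _ in range(3):
            job.submit_subtask(SubTask())


job = submit("my-job", RootTask(), cluster="dev-cluster")
run_until_idle(job)
print(job.executed)  # ['RootTask', 'SubTask', 'SubTask', 'SubTask']
```

A real deployment has many task runners pulling from the queue concurrently; the single sequential runner here only keeps the ordering easy to follow.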

Retry Handling

Tasks support retry handling for failed executions. This also applies to the root task of a job, whose number of retries you can specify with the max_retries argument of the submit method.

In this example, if MyFlakyTask fails, it will be retried up to five times before being marked as failed.
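The retry semantics can be sketched as a plain loop. This is not how the orchestrator implements retries: `run_with_retries` and `make_flaky` are hypothetical helpers, and the sketch assumes `max_retries` counts retries in addition to the first attempt, so `max_retries=5` allows up to six attempts in total.

```python
from typing import Callable


def run_with_retries(task: Callable[[], None], max_retries: int) -> tuple[str, int]:
    """Run task once, then retry it up to max_retries times; return (state, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            task()
            return "computed", attempts
        except Exception:
            if attempts > max_retries:  # initial attempt plus max_retries retries used up
                return "failed", attempts


def make_flaky(failures: int) -> Callable[[], None]:
    """Hypothetical flaky task that fails on its first `failures` calls."""
    calls = {"n": 0}

    def task() -> None:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise RuntimeError("transient failure")

    return task


print(run_with_retries(make_flaky(3), max_retries=5))   # ('computed', 4)
print(run_with_retries(make_flaky(10), max_retries=5))  # ('failed', 6)
```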

Retrieving a Specific Job

When you submit a job, it’s assigned a unique identifier that can be used to retrieve it later.

You can use the find method on the job client to get a job by its ID.

find is also useful for checking a job's state later, for example to see whether it is still running or has already completed.
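If a script needs to block until a job finishes, repeated calls to find can be wrapped in a polling loop. The sketch below is hypothetical: `wait_for_job` accepts any `find` callable that returns a state string (whereas the find method described here returns a job), and the state names in `TERMINAL_STATES` are assumptions.

```python
import time
from typing import Callable

# Assumed terminal states; the real job states may be named differently.
TERMINAL_STATES = {"completed", "failed", "canceled"}


def wait_for_job(
    find: Callable[[str], str],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll find(job_id) until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = find(job_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} is still {state} after {timeout} seconds")
        time.sleep(interval)


# Stand-in for the job client's find: reports a new state on each call.
states = iter(["queued", "running", "running", "completed"])
print(wait_for_job(lambda job_id: next(states), "job-123", interval=0))  # completed
```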

Visualization

Visualizing a job's execution can be helpful. The Tilebox workflow orchestrator tracks all tasks in a job, including sub-tasks and dependencies, which makes it possible to visualize the job's execution as a graph diagram.

display is designed for use in an interactive environment such as a Jupyter notebook. In non-interactive environments, use visualize, which returns the rendered diagram as an SVG string.
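In a script, the SVG string returned by visualize can be written to a file and opened in a browser. The helper below is a hypothetical sketch; it accepts any SVG string, so the sample string stands in for the output of visualize.

```python
import tempfile
from pathlib import Path


def save_svg(svg: str, path: str) -> Path:
    """Write an SVG document (such as a rendered job diagram) to disk and return its path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    target.write_text(svg, encoding="utf-8")
    return target


# Stand-in for the string a visualize call would return.
sample_svg = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"></svg>'
with tempfile.TemporaryDirectory() as tmp:
    saved = save_svg(sample_svg, f"{tmp}/diagrams/job.svg")
    print(saved.read_text(encoding="utf-8") == sample_svg)  # True
```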

The following diagram represents the job execution as a graph. Each task is shown as a node, with edges indicating sub-task relationships. The diagram also uses color coding to display the status of each task.

The color codes for task states are:

Task State | Color  | Description
---------- | ------ | -----------
Queued     | Salmon | The task is queued and waiting for execution.
Running    | Blue   | The task is currently being executed.
Computed   | Green  | The task has been computed successfully. A computed task whose sub-tasks are all computed as well is considered completed.
Failed     | Red    | The task was executed but encountered an error.
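To illustrate how such a diagram encodes task states, here is a sketch that turns a set of tasks into Graphviz DOT text using the colors from the table. This is not how Tilebox renders its diagrams; the task dictionaries and `to_dot` are hypothetical, and the color values are Graphviz's standard X11 color names.

```python
STATE_COLORS = {
    "queued": "salmon",
    "running": "blue",
    "computed": "green",
    "failed": "red",
}


def to_dot(tasks: dict[str, dict]) -> str:
    """Render tasks as a DOT graph: one filled node per task, one edge per sub-task."""
    lines = ["digraph job {"]
    for task_id, task in tasks.items():
        # Escape double quotes and turn line breaks into DOT's \n label escape.
        label = task["display"].replace('"', '\\"').replace("\n", "\\n")
        color = STATE_COLORS[task["state"]]
        lines.append(f'  "{task_id}" [label="{label}", style=filled, fillcolor={color}];')
    for task_id, task in tasks.items():
        for sub_id in task.get("subtasks", []):
            lines.append(f'  "{task_id}" -> "{sub_id}";')
    lines.append("}")
    return "\n".join(lines)


tasks = {
    "t1": {"display": "MyTask", "state": "computed", "subtasks": ["t2", "t3"]},
    "t2": {"display": "SubTask", "state": "running"},
    "t3": {"display": "SubTask", "state": "queued"},
}
print(to_dot(tasks))
```

The output can be rendered with the Graphviz `dot` command-line tool, for example `dot -Tsvg job.dot -o job.svg`.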

Below is another visualization of a job currently being executed by multiple task runners.

This visualization shows:

  • The root task, MyTask, has executed and spawned three sub-tasks.
  • At least three task runners are available, since three tasks are currently executing simultaneously.
  • The SubTask that is still executing has not generated any sub-tasks yet, as sub-tasks are queued for execution only after the parent task finishes and becomes computed.
  • The queued DependentTask requires the LeafTask to complete before it can be executed.
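Two rules behind this picture can be sketched as small functions: a task counts as completed only once it and all of its sub-tasks are computed, and a queued task becomes eligible only once its dependencies have completed. The task graph below is a hypothetical example loosely modeled on the one described, not the orchestrator's actual data model; `OtherTask` is an invented name.

```python
def is_completed(task_id: str, tasks: dict[str, dict]) -> bool:
    """Completed means computed, with every sub-task (recursively) completed as well."""
    task = tasks[task_id]
    return task["state"] == "computed" and all(
        is_completed(sub_id, tasks) for sub_id in task.get("subtasks", [])
    )


def eligible_tasks(tasks: dict[str, dict]) -> list[str]:
    """Queued tasks whose dependencies have all completed, in insertion order."""
    return [
        task_id
        for task_id, task in tasks.items()
        if task["state"] == "queued"
        and all(is_completed(dep, tasks) for dep in task.get("depends_on", []))
    ]


tasks = {
    "MyTask": {"state": "computed", "subtasks": ["LeafTask", "DependentTask", "OtherTask"]},
    "LeafTask": {"state": "running"},
    "DependentTask": {"state": "queued", "depends_on": ["LeafTask"]},
    "OtherTask": {"state": "queued"},
}
print(eligible_tasks(tasks))  # ['OtherTask']

tasks["LeafTask"]["state"] = "computed"
print(eligible_tasks(tasks))  # ['DependentTask', 'OtherTask']
```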

Job visualizations are meant for development and debugging. They are not suitable for large jobs with hundreds of tasks, as the diagrams may become too complex. Currently, visualizations are limited to jobs with a maximum of 200 tasks.

Customizing Task Display Names

The text representing a task in the diagram defaults to the task's class name. You can customize it by setting the display field of the current_task object in the task's execution context. Display names can be at most 1024 characters long; longer names are truncated. Line breaks (\n) are supported.
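Because overflow is truncated anyway, a small helper can make that limit explicit when building multi-line display names. `make_display_name` and the name `ProcessTile` are hypothetical; the commented line shows where the result would be assigned, per the text above.

```python
MAX_DISPLAY_LENGTH = 1024  # limit stated above; longer names are truncated


def make_display_name(*parts: str) -> str:
    """Join parts with line breaks and truncate to the maximum display length."""
    return "\n".join(parts)[:MAX_DISPLAY_LENGTH]


# Hypothetical usage inside a task's execution:
#     context.current_task.display = make_display_name("ProcessTile", f"tile {tile_id}")
name = make_display_name("ProcessTile", "tile 42")
print(repr(name))  # 'ProcessTile\ntile 42'
print(len(make_display_name("x" * 2000)))  # 1024
```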

Cancellation

You can cancel a job at any time. When a job is canceled, task runners no longer pick up any of its queued tasks, even if they are idle. Tasks that are already executing are not interrupted and run to completion, but any sub-tasks they spawn after the cancellation are not picked up by task runners.

Use the cancel method on the job client to cancel a job.

A canceled job can be resumed at any time by retrying it.
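These cancellation rules can be sketched as a toy job queue. `ToyJob` and its methods are hypothetical and not the Tilebox API; the sketch only encodes the behavior described above: nothing is handed out while the job is canceled, running tasks finish normally, their sub-tasks stay queued, and retrying resumes the job.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyJob:
    """Hypothetical model of a job's task queue under cancellation."""

    queued: list[str] = field(default_factory=list)
    running: list[str] = field(default_factory=list)
    canceled: bool = False

    def cancel(self) -> None:
        self.canceled = True

    def retry(self) -> None:
        # Retrying a canceled job resumes it; queued tasks become available again.
        self.canceled = False

    def next_task(self) -> Optional[str]:
        # Runners get nothing from a canceled job, even when they are idle.
        if self.canceled or not self.queued:
            return None
        task = self.queued.pop(0)
        self.running.append(task)
        return task

    def finish(self, task: str, spawned: tuple[str, ...] = ()) -> None:
        # Running tasks are never interrupted. Their sub-tasks are still queued,
        # but next_task() will not hand them out while the job is canceled.
        self.running.remove(task)
        self.queued.extend(spawned)


job = ToyJob(queued=["A", "B"])
a = job.next_task()         # "A" starts running
job.cancel()
print(job.next_task())      # None: "B" stays queued
job.finish(a, spawned=("A1",))
print(job.queued)           # ['B', 'A1']
job.retry()
print(job.next_task())      # B
```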

If any task in a job fails, the job is automatically canceled to avoid executing irrelevant tasks. Future releases will allow configuring this behavior for each task to meet specific requirements.

Retries

If a task fails due to a bug or a lack of resources, there is no need to resubmit the entire job. Instead, retry the job: it resumes from the point of failure, so the work completed before the failure is not lost.
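Resuming from the point of failure can be sketched with a toy scheduler that skips computed tasks on every run. `run_job`, `retry_job`, and `flaky_execute` are hypothetical; the sequential loop stands in for task runners working in parallel, and the rule that a failing task cancels the job follows the cancellation section above.

```python
from typing import Callable


def run_job(states: dict[str, str], execute: Callable[[str], None], log: list[str]) -> str:
    """Run every queued task in order, recording each execution; stop at the first failure."""
    for name, state in states.items():
        if state != "queued":
            continue  # computed tasks are skipped, so their work is never redone
        log.append(name)
        try:
            execute(name)
            states[name] = "computed"
        except Exception:
            states[name] = "failed"
            return "canceled"  # a failing task cancels the job; later tasks stay queued
    return "completed"


def retry_job(states: dict[str, str]) -> None:
    """Re-queue failed tasks; computed tasks keep their state."""
    for name, state in states.items():
        if state == "failed":
            states[name] = "queued"


states = {"a": "queued", "b": "queued", "c": "queued", "d": "queued"}
attempts = {"c": 0}


def flaky_execute(name: str) -> None:
    # Hypothetical task body: "c" fails on its first run only.
    if name == "c":
        attempts["c"] += 1
        if attempts["c"] == 1:
            raise RuntimeError("out of memory")


first, second = [], []
print(run_job(states, flaky_execute, first), first)    # canceled ['a', 'b', 'c']
retry_job(states)
print(run_job(states, flaky_execute, second), second)  # completed ['c', 'd']
```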

Future releases may introduce automatic retries for certain failure conditions, which can be useful for handling temporary issues.

Below is an example of a job that fails due to a bug in a task's implementation. The following workflow processes a list of movie titles and queries the OMDb API for each movie's release date.

Submitting the workflow as a job reveals a bug in the PrintMovieStats task.

One of the PrintMovieStats tasks fails with a KeyError. This error occurs when a movie title is not found by the OMDb API, leading to a response without the Title and Released fields.
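The cause, and one way to guard against it, can be reproduced without calling the API. The functions below are hypothetical and only isolate the string formatting a task like PrintMovieStats would do; they are not the task's actual code. The `missing` dictionary is an assumed example of a response that lacks the Title and Released fields.

```python
def describe_release_buggy(response: dict) -> str:
    # Indexing assumes both fields exist, so a miss raises KeyError('Title').
    return f"{response['Title']} was released on {response['Released']}"


def describe_release(title: str, response: dict) -> str:
    # Check for the fields first and fall back to a message using the requested title.
    if "Title" not in response or "Released" not in response:
        return f"Could not find the release date for {title}"
    return f"{response['Title']} was released on {response['Released']}"


found = {"Title": "The Matrix", "Released": "31 Mar 1999"}
missing = {"Response": "False", "Error": "Movie not found!"}  # assumed shape of a miss

print(describe_release("The Matrix", found))             # The Matrix was released on 31 Mar 1999
print(describe_release("Tilebox - The Movie", missing))  # Could not find the release date for Tilebox - The Movie
try:
    describe_release_buggy(missing)
except KeyError as err:
    print(repr(err))  # KeyError('Title')
```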

Console output from the task runners confirms this:

Output
The Matrix was released on 31 Mar 1999
Shrek 2 was released on 19 May 2004
ERROR: Task PrintMovieStats failed with exception: KeyError('Title')

The corrected version of PrintMovieStats is as follows:

With this fix, and after redeploying the task runners with the updated PrintMovieStats implementation, you can retry the job:

Now the console output shows:

Output
Could not find the release date for Tilebox - The Movie
The Avengers was released on 04 May 2012

The retried job succeeded. The output confirms that only two tasks were executed: the job resumed from the point of failure, and the two tasks that had completed before the failure were not re-executed.