Jobs
Submission
To execute a task, initialize it with concrete inputs and submit it as a job. The task then runs within the context of that job, and any sub-tasks it generates also execute as part of the same job.
After submitting a job, the root task is scheduled for execution, and any eligible task runner can pick it up and execute it.
First, instantiate a job client by calling the jobs method on the workflow client.
After obtaining a job client, submit a job using the submit method. You need to provide a name for the job, an instance of the root task, and a cluster to execute the root task on.
Once a job is submitted, it’s immediately scheduled for execution. The root task will be picked up and executed as soon as an eligible task runner is available.
Retry Handling
Tasks support retry handling for failed executions. This applies to the root task of a job as well, where you can specify the number of retries using the max_retries argument of the submit method.
In this example, if MyFlakyTask fails, it will be retried up to five times before being marked as failed.
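The retry semantics can be illustrated with a small standalone sketch. It does not use the Tilebox SDK: run_with_retries and make_flaky are hypothetical helpers, and the sketch assumes that max_retries counts re-executions after the first attempt, so a task runs at most 1 + max_retries times.

```python
def run_with_retries(task, max_retries=0):
    """Run task(); on failure, re-run it up to max_retries more times.

    Returns (state, attempts). Toy model only, not the Tilebox scheduler.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            task()
            return "computed", attempts
        except Exception:
            # Give up once the first attempt plus all retries are used up.
            if attempts > max_retries:
                return "failed", attempts


def make_flaky(failures):
    """Hypothetical task that raises the first `failures` times it is called."""
    calls = {"n": 0}

    def task():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise RuntimeError("transient error")

    return task


print(run_with_retries(make_flaky(2), max_retries=5))   # ('computed', 3)
print(run_with_retries(make_flaky(10), max_retries=5))  # ('failed', 6)
```

With max_retries=5, a task that keeps failing is executed six times in total: the initial attempt plus five retries.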
Retrieving a specific job
When you submit a job, it’s assigned a unique identifier that can be used to retrieve it later.
You can use the find method on the job client to get a job by its ID.
find is also useful for fetching a job's state after a while, to check whether it's still running or has already completed.
Visualization
Visualizing the execution of a job can be helpful. The Tilebox workflow orchestrator tracks all tasks in a job, including sub-tasks and dependencies. This enables the visualization of the execution of a job as a graph diagram.
display is designed for use in an interactive environment such as a Jupyter notebook. In non-interactive environments, use visualize, which returns the rendered diagram as an SVG string.
The following diagram represents the job execution as a graph. Each task is shown as a node, with edges indicating sub-task relationships. The diagram also uses color coding to display the status of each task.
The color codes for task states are:
Task State | Color | Description |
---|---|---|
Queued | SalmonYellow | The task is queued and waiting for execution. |
Running | Blue | The task is currently being executed. |
Computed | Green | The task has successfully been computed. If a task is computed, and all its sub-tasks are also computed, the task is considered completed. |
Failed | Red | The task has been executed but encountered an error. |
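
The difference between a computed and a completed task can be expressed as a small recursive check. This standalone sketch is not Tilebox code: TaskNode and is_completed are hypothetical names, and it assumes the rule applies recursively, so that sub-tasks of sub-tasks must be computed as well.

```python
from dataclasses import dataclass, field


@dataclass
class TaskNode:
    name: str
    state: str  # "queued", "running", "computed" or "failed"
    sub_tasks: list = field(default_factory=list)


def is_completed(task):
    """True if the task and, recursively, all of its sub-tasks are computed."""
    return task.state == "computed" and all(is_completed(sub) for sub in task.sub_tasks)


root = TaskNode(
    "MyTask",
    "computed",
    [TaskNode("SubTask", "computed"), TaskNode("SubTask", "running")],
)
print(is_completed(root))               # False: one sub-task is still running
print(is_completed(root.sub_tasks[0]))  # True: computed, with no sub-tasks
```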
Below is another visualization of a job currently being executed by multiple task runners.
This visualization shows:
- The root task, MyTask, has executed and spawned three sub-tasks.
- At least three task runners are available, as three tasks are currently being executed simultaneously.
- The SubTask that is still executing has not generated any sub-tasks yet, as sub-tasks are queued for execution only after the parent task finishes and becomes computed.
- The queued DependentTask requires the LeafTask to complete before it can be executed.
Job visualizations are meant for development and debugging. They are not suitable for large jobs with hundreds of tasks, as the diagrams may become too complex. Currently, visualizations are limited to jobs with a maximum of 200 tasks.
Customizing Task Display Names
The text representing a task in the diagram defaults to the task's class name. You can customize this by modifying the display field of the current_task object in the task's execution context. The maximum length for a display name is 1024 characters, with any overflow truncated. Line breaks using \n are supported as well.
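A minimal sketch of building a multi-line display name that stays within the limit. The context object here is a stand-in built with SimpleNamespace, not the real Tilebox execution context, and clamp_display_name is a hypothetical helper that only makes the 1024-character limit explicit on the caller's side.

```python
from types import SimpleNamespace

MAX_DISPLAY_LENGTH = 1024  # limit stated in the text


def clamp_display_name(name):
    """Keep at most MAX_DISPLAY_LENGTH characters and drop the overflow."""
    return name[:MAX_DISPLAY_LENGTH]


# Stand-in for the execution context a task receives (assumption for illustration).
context = SimpleNamespace(current_task=SimpleNamespace(display="PrintMovieStats"))

# Two-line label: the class name on the first line, a movie title on the second.
context.current_task.display = clamp_display_name("PrintMovieStats\nThe Matrix")
print(context.current_task.display)
```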
Cancellation
You can cancel a job at any time. When a job is canceled, task runners no longer pick up any of its queued tasks, even if they are idle. Tasks that are already being executed finish their execution and are not interrupted. Any sub-tasks spawned by such tasks after the cancellation are not picked up by task runners either.
Use the cancel method on the job client to cancel a job.
A canceled job can be resumed at any time by retrying it.
If any task in a job fails, the job is automatically canceled to avoid executing irrelevant tasks. Future releases will allow configuring this behavior for each task to meet specific requirements.
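The cancellation rules can be modeled in a few lines. This is a toy, in-memory sketch rather than the Tilebox orchestrator: ToyJob and its methods are hypothetical, and resuming a job is modeled simply by clearing the canceled flag.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJob:
    canceled: bool = False
    queue: list = field(default_factory=list)    # names of queued tasks
    running: list = field(default_factory=list)  # names of tasks being executed

    def pick_up_next(self):
        """A task runner asks for work; a canceled job never hands out tasks."""
        if self.canceled or not self.queue:
            return None
        task = self.queue.pop(0)
        self.running.append(task)
        return task

    def finish(self, task, spawned=()):
        """A running task finishes; its sub-tasks are queued even after a cancel."""
        self.running.remove(task)
        self.queue.extend(spawned)


job = ToyJob(queue=["MyTask"])
task = job.pick_up_next()                         # "MyTask" starts running
job.canceled = True                               # cancel while it is running
job.finish(task, spawned=["SubTask", "SubTask"])  # it still finishes normally
print(job.pick_up_next())                         # None: sub-tasks stay queued
job.canceled = False                              # retrying resumes the job
print(job.pick_up_next())                         # SubTask
```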
Retries
If a task fails due to a bug or lack of resources, there is no need to resubmit the entire job. You can simply retry the job, and it will resume from the point of failure. This ensures that all the work that was already done up until the point of the failure isn’t lost.
Future releases may introduce automatic retries for certain failure conditions, which can be useful for handling temporary issues.
Below is an example of a failing job due to a bug in the task’s implementation. The following workflow processes a list of movie titles and queries the OMDb API for each movie’s release date.
Submitting the workflow as a job reveals a bug in the PrintMovieStats task.
One of the PrintMovieStats tasks fails with a KeyError. This error occurs when a movie title is not found by the OMDb API, leading to a response without the Title and Released fields.
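To see why the lookup fails, consider the shape of the two responses. The sketch below is standalone and does not call the OMDb API: the payloads are illustrative, the not-found payload is assumed to carry Response and Error fields instead of Title and Released, and format_stats is a hypothetical helper rather than the PrintMovieStats implementation.

```python
# Illustrative payloads; the not-found shape is an assumption about OMDb's reply.
found = {"Title": "The Matrix", "Released": "31 Mar 1999", "Response": "True"}
not_found = {"Response": "False", "Error": "Movie not found!"}

# Direct indexing raises as soon as the field is missing.
try:
    not_found["Title"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'Title'


def format_stats(movie):
    """Format a release line, tolerating responses without Title or Released."""
    title = movie.get("Title")
    released = movie.get("Released")
    if title is None or released is None:
        return f"Skipping movie: {movie.get('Error', 'unknown error')}"
    return f"{title} was released on {released}"


print(format_stats(found))      # The Matrix was released on 31 Mar 1999
print(format_stats(not_found))  # Skipping movie: Movie not found!
```

Using dict.get with an explicit check keeps a single unknown title from failing the task, which in turn avoids the automatic cancellation of the whole job.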
Console output from the task runners confirms this:
The corrected version of PrintMovieStats is as follows:
With this fix, and after redeploying the task runners with the updated PrintMovieStats implementation, you can retry the job:
Now the console output shows:
The output confirms that only two tasks were executed, resuming from the point of failure instead of re-executing all tasks.
The job was retried and succeeded. The two tasks that completed before the failure were not re-executed.