Debug a failed workflow run

Use this guide when a workflow job fails, runs slower than expected, or stays queued. Tilebox records job state, task state, logs, traces, and runner context so you can identify whether the problem is in task code, task routing, dependencies, or the runner environment.

Find the job

Open the job in the Tilebox Console, or use the Tilebox command-line tool if you already have the job ID.

tilebox job logs <job-id>
tilebox job spans <job-id>

Check task state first

Start with the task graph. A failed task often points to task code or runtime dependencies. A queued task often points to a mismatch in cluster, runner, or task registration. Common checks:

The job was submitted to the intended cluster.
A runner is connected to the same cluster.
The runner advertises the submitted task identifier and compatible version.
Any task dependencies are complete.
Retry limits have not been exhausted.
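The checks above can be sketched as a small triage function. This is only an illustration: TaskSnapshot and its fields are hypothetical names for values you would read off the Console or the command-line output, not Tilebox types.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """Hypothetical summary of one task, filled in by hand from the Console."""
    state: str                        # for example "queued" or "failed"
    intended_cluster: str             # cluster you meant to submit to
    job_cluster: str                  # cluster the job was actually submitted to
    runner_clusters: list[str]        # clusters that have a connected runner
    advertised: set[tuple[str, int]]  # (task identifier, major version) pairs
    task_identifier: str
    task_major: int
    dependencies_complete: bool
    retries_left: int


def triage(t: TaskSnapshot) -> list[str]:
    """Return the checks from the list above that this task does not pass."""
    findings = []
    if t.job_cluster != t.intended_cluster:
        findings.append("job was not submitted to the intended cluster")
    if t.job_cluster not in t.runner_clusters:
        findings.append("no runner is connected to the job's cluster")
    if (t.task_identifier, t.task_major) not in t.advertised:
        findings.append("no runner advertises this task identifier and version")
    if not t.dependencies_complete:
        findings.append("task dependencies are not complete")
    if t.retries_left <= 0:
        findings.append("retry limit is exhausted")
    if not findings and t.state == "failed":
        # Routing looks fine, so the failure is more likely in the task itself.
        findings.append("inspect task code and runtime dependencies")
    return findings
```

An empty result for a queued task suggests the cause lies outside this checklist, so move on to logs and traces.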

Inspect logs

Logs show the messages emitted by task code, together with the runner context that Tilebox attaches to them.

tilebox job logs <job-id>

Use structured log fields in your tasks so the relevant scene ID, product ID, path, or model name appears in the log record.
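A minimal sketch of structured log fields, using only Python's standard logging module. It does not show a Tilebox logging API, which may differ. The field names in CONTEXT_FIELDS and the example values are assumptions chosen to match the fields named above.

```python
import json
import logging

# Assumed field names; adjust them to whatever your tasks actually record.
CONTEXT_FIELDS = ("scene_id", "product_id", "path", "model_name")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including any context fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for name in CONTEXT_FIELDS:
            # Values passed through `extra=` become attributes on the record.
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)


def make_logger(name: str = "task") -> logging.Logger:
    """Build a logger that writes JSON lines to standard error."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = make_logger()
# Hypothetical values for illustration only.
logger.info("downloaded scene", extra={"scene_id": "scene-001", "path": "/data/scene-001.tif"})
```

Keeping the identifiers in separate fields, rather than formatting them into the message string, makes it possible to filter log records by scene or product later.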

Inspect traces

Traces show task timing, parent-child relationships, custom spans, and failures.

tilebox job spans <job-id>

Use traces to find slow subtasks, repeated retries, and failures inside a specific custom span.
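If you have span data available as plain records, the three questions above can be answered with a few lines of Python. The record layout used here (name, kind, task_id, start, end, status) is a hypothetical shape for illustration, not the output format of the Tilebox command-line tool.

```python
from collections import Counter


def slowest_spans(spans: list[dict], top: int = 3) -> list[str]:
    """Names of the spans with the longest duration, slowest first."""
    ranked = sorted(spans, key=lambda s: s["end"] - s["start"], reverse=True)
    return [s["name"] for s in ranked[:top]]


def retried_tasks(spans: list[dict]) -> dict[str, int]:
    """Task IDs that appear in more than one task span, with attempt counts."""
    counts = Counter(s["task_id"] for s in spans if s["kind"] == "task")
    return {task_id: n for task_id, n in counts.items() if n > 1}


def failed_custom_spans(spans: list[dict]) -> list[str]:
    """Names of custom spans that ended with an error status."""
    return [s["name"] for s in spans if s["kind"] == "custom" and s["status"] == "error"]


# Hypothetical sample data; times are seconds since the job started.
SAMPLE_SPANS = [
    {"name": "ProcessScene", "kind": "task", "task_id": "t1", "start": 0.0, "end": 4.0, "status": "error"},
    {"name": "ProcessScene", "kind": "task", "task_id": "t1", "start": 5.0, "end": 14.0, "status": "ok"},
    {"name": "load_model", "kind": "custom", "task_id": "t1", "start": 0.5, "end": 3.5, "status": "error"},
    {"name": "WriteOutput", "kind": "task", "task_id": "t2", "start": 14.0, "end": 15.0, "status": "ok"},
]
```

With the sample data, retried_tasks reports that task t1 ran twice, and failed_custom_spans points at load_model as the place where the first attempt failed.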

Fix and rerun

For direct runners, fix the code and restart the runner process. For release runners, publish a fixed release and deploy it. If the fix is compatible with the failed task's input schema and major version, retry the job. If the change is breaking, submit a new job with a new task version.
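The retry rule above can be written as a small decision helper. This sketch assumes task versions are written as "v<major>.<minor>" strings and that you have already judged whether the input schema is compatible. Both the function names and the version format are assumptions for illustration.

```python
def major_version(version: str) -> int:
    """Extract the major number from a version such as 'v1.4' or '1.4'."""
    return int(version.lstrip("v").split(".")[0])


def next_step(failed_version: str, fixed_version: str, input_schema_compatible: bool) -> str:
    """Decide between retrying the failed job and submitting a new one."""
    same_major = major_version(failed_version) == major_version(fixed_version)
    if input_schema_compatible and same_major:
        return "retry the job"
    # Breaking change: the failed task inputs may not fit the fixed task.
    return "submit a new job with a new task version"
```

For example, next_step("v1.2", "v1.3", True) selects a retry, while a move from "v1.2" to "v2.0" selects a new job even if the input schema happens to be unchanged.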

Retry with a compatible release

Publish a compatible fix, deploy it to the same cluster, and retry failed work.

Inspect workflow runs

Learn how logs, traces, task status, and runner context fit together.
