Vertiqa Help

Walkthrough: handle a failed or blocked agent run

Inspect run events, checkpoints, policy decisions, and budget state before retrying or cancelling.

Use this walkthrough when Command Center shows a failed run, a stuck run, or an agent action that was blocked by policy or budget.

Real-life example

An RFQ draft agent was scheduled to prepare a quote packet for Metro Equipment, but Command Center shows the run failed. The operator needs to know whether to retry, cancel, or fix setup before trying again.

Start from Command Center

  1. Open the failed-run item in Command Center.
  2. Read why the run needs attention.
  3. Open the linked run in Agent Runs.
  4. Do not retry from habit. Inspect the run first.

Inspect the run

Check these panels in order:

  1. Status and timestamps - confirm whether the run failed, stalled, completed, retried, or was cancelled.
  2. Run events - find the last successful step.
  3. Error - read the operator-visible failure reason.
  4. Checkpoints - look for saved progress before the failure.
  5. Policy decisions - see whether the run was blocked by governance.
  6. Budget usage - confirm whether spend or quota limits affected the run.
  7. Artifacts - check whether partial output exists and should be reviewed manually.
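The "find the last successful step" check in the list above can be pictured as a small scan over the event list. This is only a sketch: the RunEvent shape, its field names, and the step names are hypothetical and are not taken from Vertiqa; the real Run events panel may present this differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunEvent:
    # Hypothetical event shape for illustration only.
    step: str
    ok: bool


def last_successful_step(events: list[RunEvent]) -> Optional[str]:
    """Return the last step that succeeded before the first failure, if any."""
    last_ok = None
    for event in events:
        if not event.ok:
            break  # stop at the first failed step
        last_ok = event.step
    return last_ok


# Made-up events for the RFQ draft example.
events = [
    RunEvent("load_rfq", True),
    RunEvent("fetch_pricing", True),
    RunEvent("build_quote_packet", False),
]
print(last_successful_step(events))  # fetch_pricing
print(last_successful_step([]))      # None
```

A result of None corresponds to a run with no successful events at all, which points toward dispatch or worker pickup rather than the agent's own steps.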

Retry, cancel, or fix

Retry when the failure was temporary: dispatch failed, the provider was unavailable, or a worker dependency was down and has since recovered.

Cancel when the work is no longer needed, the wrong record was targeted, or repeated retries would create noise.

Fix setup first when the failure came from missing data, rejected policy, budget overage, malformed input, or an unavailable connector.
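The three rules above can be summarized as a small decision function. This is a sketch under assumptions: the cause labels and the function name are hypothetical, chosen to mirror the wording of this walkthrough, and do not correspond to any documented Vertiqa field or API.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    CANCEL = "cancel"
    FIX_SETUP = "fix setup first"


# Hypothetical cause labels that mirror the wording of this walkthrough.
TEMPORARY_CAUSES = {"dispatch_failed", "provider_unavailable", "worker_dependency_recovered"}
SETUP_CAUSES = {
    "missing_data",
    "policy_rejected",
    "budget_overage",
    "malformed_input",
    "connector_unavailable",
}


def choose_action(
    cause: str,
    still_needed: bool = True,
    right_record: bool = True,
    retries_are_noise: bool = False,
) -> Action:
    # Cancel is checked first: unwanted or mistargeted work should not be retried.
    if not still_needed or not right_record or retries_are_noise:
        return Action.CANCEL
    if cause in SETUP_CAUSES:
        return Action.FIX_SETUP
    if cause in TEMPORARY_CAUSES:
        return Action.RETRY
    # Assumption: an unrecognized cause means the run needs more inspection.
    raise ValueError(f"unrecognized cause {cause!r}: inspect the run before deciding")


print(choose_action("provider_unavailable").value)
print(choose_action("budget_overage").value)
print(choose_action("provider_unavailable", still_needed=False).value)
```

Checking the cancel conditions first is a design choice in this sketch, not a stated product rule: it keeps a temporary failure on an unwanted run from being retried automatically.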

Expected result after action

After retry, a new run attempt should appear in Agent Runs with updated events. If the retry succeeds, Command Center should stop showing the same failure. If it fails again at the same step, escalate rather than retrying repeatedly.
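The escalation rule above amounts to comparing where the original run failed with where the retry failed. A minimal sketch, assuming the failed step is available as a plain string; the function name and return labels are hypothetical, and the "inspect" branch for a failure at a different step is an assumption, since this walkthrough does not cover that case.

```python
from typing import Optional


def after_retry(previous_failed_step: str, retry_failed_step: Optional[str]) -> str:
    """Suggest a next move once a retry attempt has finished."""
    if retry_failed_step is None:
        # The retry succeeded, so the failure item should clear.
        return "resolved"
    if retry_failed_step == previous_failed_step:
        # Same step failed twice: escalate instead of retrying again.
        return "escalate"
    # Assumption: a failure at a different step is a new problem to inspect.
    return "inspect"


print(after_retry("build_quote_packet", None))                  # resolved
print(after_retry("build_quote_packet", "build_quote_packet"))  # escalate
print(after_retry("build_quote_packet", "send_for_review"))     # inspect
```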

After cancel, the run should show a cancelled status and the audit trail should show the operator action.

Failure paths

If there are no run events, the problem may be dispatch or worker pickup. Check schedule state and whether other runs are moving.

If policy blocked the run, update policy only when the attempted action is safe for that agent and entity type.

If budget blocked the run, review whether the output is worth the spend before raising limits.

Success criteria

The run is no longer stuck waiting for an owner decision. It has been retried, has been cancelled, or is waiting on a specific setup fix, with run history and audit context preserved.
