跳转至

Visual Execution Trace

Visual execution trace helps determine whether the model received a screenshot, what action it returned, whether the output was parsed, and whether the action executed.

vision_screenshot_ready

A screenshot was captured successfully for this visual execution turn.

{
  "task_id": "task-20260506-001",
  "size": 384221,
  "attached": true,
  "ts": "2026-05-06T09:00:10.100+0800",
  "event": "vision_screenshot_ready"
}
Field Meaning
size Screenshot size in bytes. Greater than 0 usually means the screenshot is valid.
attached Whether the screenshot was attached to the model request.

If you see vision_screenshot_failed or vision_screenshot_error, screenshot capture failed and the model may not be able to observe the page.

llm_prompt_vision_act

A prompt is about to be sent to the model for visual execution.

{
  "task_id": "task-20260506-001",
  "state": "VISION_ACT",
  "attempt": 1,
  "prompt": "You are controlling an Android phone...",
  "ts": "2026-05-06T09:00:10.200+0800",
  "event": "llm_prompt_vision_act"
}
Field Meaning
state Current phase, usually VISION_ACT.
attempt Attempt number. The model may be retried after parse or request failures.
prompt Prompt sent to the model. Long but useful for difficult issues.

llm_response_vision_act

The model returned a raw response.

{
  "task_id": "task-20260506-001",
  "state": "VISION_ACT",
  "attempt": 1,
  "response": "<Observing>Home page...</Observing><command>TAP 540 1860</command>",
  "ts": "2026-05-06T09:00:12.300+0800",
  "event": "llm_response_vision_act"
}
Field Meaning
response Raw model output. It shows what the model understood and planned to do.
attempt Which attempt this response belongs to.

llm_structured_vision_act

The model output was parsed into structured fields. This is easier to read than the raw response.

{
  "task_id": "task-20260506-001",
  "state": "VISION_ACT",
  "data": {
    "Observing": "The app home page is visible",
    "Ovserve_result": "A check-in entry is visible near the bottom",
    "Thinking": "Need to enter the check-in page",
    "action": "Tap the check-in entry",
    "expected": "The check-in page opens",
    "command": "TAP 540 1860"
  },
  "command": "TAP 540 1860",
  "ts": "2026-05-06T09:00:12.360+0800",
  "event": "llm_structured_vision_act"
}
Field Meaning
data.Observing What the model observed on the screen.
data.Thinking Why the model wants to act this way.
data.action Human-readable action description.
data.expected What the model expects after the action.
command Actual command passed to the executor.

vision_retry

A retryable problem happened, such as model request failure, action execution failure, or invalid output format.

{
  "task_id": "task-20260506-001",
  "state": "VISION_ACT",
  "phase": "parse",
  "attempt": 1,
  "max_attempts": 3,
  "error": "missing <command> tag",
  "retrying": true,
  "ts": "2026-05-06T09:00:12.500+0800",
  "event": "vision_retry"
}
Field Meaning
phase Where the problem happened, such as planner_call, parse, command_args, or action_exec.
attempt Current attempt number.
max_attempts Maximum attempts.
error / reason Failure reason. Field name may vary by phase.
retrying Whether AutoLXB will retry.

vision_instruction_invalid

The model output could not be parsed after retries, so visual execution failed.

{
  "task_id": "task-20260506-001",
  "state": "VISION_ACT",
  "error": "missing <command> tag",
  "ts": "2026-05-06T09:00:15.100+0800",
  "event": "vision_instruction_invalid"
}

If this appears often, the selected model may not follow the required visual execution format, or the prompt/output format is not compatible with that model.