plugins/aem/cloud-service/skills/aem-workflow/workflow-triaging/SKILL.md
Triage AEM Workflow issues on AEM as a Cloud Service by classifying symptoms, gathering the right logs and metrics, and mapping to runbooks or Splunk searches. Use when the user asks for workflow activity/errors on a Cloud Service host, needs to classify a Jira ticket, or wants to know what to collect for workflow debugging.
Classify workflow issues, determine what logs and data to gather, and map to the correct runbook or log search. Optimized for production support on AEM as a Cloud Service.
Map the user's description to a symptom_id and runbook.
| User says / observes | symptom_id | Runbook |
|----------------------|------------|---------|
| Workflow not moving to next step; stuck in Running | workflow_stuck_not_progressing | runbook-workflow-stuck.md |
| Task should be in Inbox but is not visible | task_not_in_inbox | runbook-task-not-in-inbox.md |
| Workflow should start automatically but no instance created | workflow_not_starting_launcher | runbook-launcher-not-starting.md |
| Workflow in Failed state or step shows error | workflow_fails_or_shows_error | runbook-workflow-fails-or-shows-error.md |
| Step failed after retries; failure item in Inbox | step_failed_retries_exhausted | runbook-failed-work-items.md |
| Instance Running but no current work item (inconsistent) | stale_workflow_no_work_item | runbook-stale-workflows.md |
| Too many instances; slow queries; disk/repo bloat | repository_bloat_too_many_instances | runbook-purge-and-cleanup.md |
| User cannot see work item or complete/delegate/return | user_cannot_see_or_complete_item | runbook-inbox-and-permissions.md |
| Cannot delete workflow model (running instances) | cannot_delete_model | runbook-model-delete-and-update.md |
| Jobs queued a long time; slow completion; queue depth high | slow_throughput_queue_backlog | runbook-job-throughput-and-concurrency.md |
| New or changed workflow not starting or step not executing | workflow_setup_validation | runbook-validate-workflow-setup.md |
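For tooling that needs this mapping programmatically, the table can be carried as a plain lookup. The sketch below is only an illustration: the names `RUNBOOKS` and `runbook_for` are hypothetical, and the entries are transcribed from the table above.

```python
# Symptom-to-runbook lookup transcribed from the classification table.
# RUNBOOKS and runbook_for are illustrative names, not part of the skill.
RUNBOOKS = {
    "workflow_stuck_not_progressing": "runbook-workflow-stuck.md",
    "task_not_in_inbox": "runbook-task-not-in-inbox.md",
    "workflow_not_starting_launcher": "runbook-launcher-not-starting.md",
    "workflow_fails_or_shows_error": "runbook-workflow-fails-or-shows-error.md",
    "step_failed_retries_exhausted": "runbook-failed-work-items.md",
    "stale_workflow_no_work_item": "runbook-stale-workflows.md",
    "repository_bloat_too_many_instances": "runbook-purge-and-cleanup.md",
    "user_cannot_see_or_complete_item": "runbook-inbox-and-permissions.md",
    "cannot_delete_model": "runbook-model-delete-and-update.md",
    "slow_throughput_queue_backlog": "runbook-job-throughput-and-concurrency.md",
    "workflow_setup_validation": "runbook-validate-workflow-setup.md",
}


def runbook_for(symptom_id: str) -> str:
    """Return the runbook file for a symptom_id; raise KeyError if it is unknown."""
    try:
        return RUNBOOKS[symptom_id]
    except KeyError:
        raise KeyError(f"Unknown symptom_id: {symptom_id!r}") from None
```

Failing loudly on an unknown `symptom_id` keeps a typo from silently routing to the wrong runbook.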
Before suggesting a runbook or Splunk search, try to obtain:
| Input | Purpose |
|-------|---------|
| Host / instance | e.g. cm-p163724-e1759416 (Cloud Service program-environment format). |
| Time range | e.g. "past 4 hours", "past 10 hours"; sets the log/Splunk search scope. |
| Workflow model or step name | e.g. "Dynamic Media Reupload", "DAM Update Asset", "testmodel". |
| Instance ID (if known) | From Workflow console URL or payload; ties logs to one instance. |
| Payload path (if known) | e.g. /content/dam/...; for path-related errors. |
| Log source | Cloud Manager log download, log streaming, or Splunk index/sourcetype. |
If the user only provides host + time, respond with the generic workflow error searches and note that narrowing by model/instance ID will improve accuracy.
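As a rough sketch of that fallback, the helper below builds a generic error search from host and time range alone and adds narrowing terms only when a model name or instance ID is supplied. The function name `build_error_search`, the `aem` index, and the `host` and `message` field names are assumptions that depend on how your logs are indexed; `earliest=-Nh` is the standard Splunk relative time modifier.

```python
def _quoted_wildcard(value: str) -> str:
    """Wrap a value as a quoted wildcard match on the (assumed) message field."""
    if '"' in value:
        raise ValueError("double quotes are not supported in this sketch")
    return f'message="*{value}*"'


def build_error_search(host, hours, model_name=None, instance_id=None, index="aem"):
    """Return (query, note): a Splunk search string and an optional accuracy note."""
    terms = [
        f"index={index}",
        f"host={host}",
        f"earliest=-{hours}h",
        '("Error executing workflow step" OR WorkflowException)',
    ]
    # Narrow only by the identifiers the user actually provided.
    narrowing = [_quoted_wildcard(v) for v in (model_name, instance_id) if v]
    if narrowing:
        terms.append("(" + " OR ".join(narrowing) + ")")
    query = " ".join(terms) + " | table _time host message | sort - _time"
    note = None if narrowing else (
        "Narrowing by model name or instance ID will improve accuracy."
    )
    return query, note
```

For example, `build_error_search("cm-p163724-e1759416", 4)` yields the generic search plus the narrowing note, while passing `model_name="DAM Update Asset"` adds a quoted `message="*DAM Update Asset*"` term and drops the note.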
Logs on Cloud Service are accessed via Cloud Manager → Environments → Logs (download or streaming). When logs are in Splunk (or any log aggregator), use these patterns.
| Scenario | Primary log pattern(s) | Splunk hint |
|----------|------------------------|-------------|
| Step failed | Error executing workflow step | Add instance ID or model name to narrow. |
| Process not found | getProcess for '*' failed | Extract process name for OSGi check. |
| Stuck at Process step | Same as step failed + getProcess | Combine with payload path. |
| Stale workflow | Cannot archive workitem | Correlate time with instance. |
| Lock / throughput | wait for a lock or refreshing the session since we had to wait | Timechart by host. |
| Permission | Terminate failed / Resume failed / Suspend failed + verifyAccess | Or AccessControlException. |
| Payload path | PathNotFoundException + workflow/payload | Launcher: "launcher config". |
| Launcher not starting | Error adding launcher config / Error retrieving launcher config entries | Path: /conf/global/settings/workflow/launcher/config. |
| Purge failure | Workflow purge '*' : | Filter by repository exception / invalid state. |
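When working from a downloaded log file rather than Splunk, the same patterns can be applied line by line. The following is a minimal sketch under stated assumptions: `classify_line` is a hypothetical helper, the substrings are taken from the table above, and the exact wording of real log messages may differ between AEM versions.

```python
import re

# Assumed shape of the "getProcess for '<name>' failed" message.
_GET_PROCESS = re.compile(r"getProcess for .* failed")
_ACTION_FAILED = ("Terminate failed", "Resume failed", "Suspend failed")


def classify_line(line: str) -> list:
    """Return every scenario from the pattern table whose pattern appears in the line."""
    hits = []
    if "Error executing workflow step" in line:
        hits.append("Step failed")
    if _GET_PROCESS.search(line):
        hits.append("Process not found")
    if "Cannot archive workitem" in line:
        hits.append("Stale workflow")
    if "wait for a lock" in line or "refreshing the session since we had to wait" in line:
        hits.append("Lock / throughput")
    action_failed = any(marker in line for marker in _ACTION_FAILED)
    if (action_failed and "verifyAccess" in line) or "AccessControlException" in line:
        hits.append("Permission")
    if "PathNotFoundException" in line and ("workflow" in line or "payload" in line):
        hits.append("Payload path")
    if "Error adding launcher config" in line or "Error retrieving launcher config entries" in line:
        hits.append("Launcher not starting")
    if "Workflow purge" in line:
        hits.append("Purge failure")
    return hits
```

A line can match more than one scenario, so the function returns a list rather than a single label; a stuck Process step, for instance, may log both a step failure and a `getProcess` failure.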
Example Splunk searches (replace index/sourcetype/field names as needed):

    index=aem sourcetype=aem:error "Error executing workflow step" | table _time host message | sort - _time
    index=aem "getProcess for" "failed" | table _time host message
    index=aem ("Error executing workflow step" OR WorkflowException) (message=*<modelName>* OR message=*<instanceId>*) | sort - _time
    index=aem "wait for a lock" OR "refreshing the session since we had to wait" | table _time host message

| User prompt | Triage response |
|-------------|------------------|
| "Workflow errors on <host> for the past X hours" | Classify as workflow_fails_or_shows_error / step_failed_retries_exhausted. Search Cloud Manager logs or Splunk for "Error executing workflow step", "Error processing workflow job", "getProcess for … failed" on that host. Route to runbook-workflow-fails-or-shows-error. |
| "Workflow activity on <host> for the past X hours" | Clarify: "activity" = counts (started/completed/failed) or list of errors? For errors, use same searches. For counts on Cloud Service, use log aggregation or custom reporting API; no JMX. |
| "Why did <workflow-or-step> fail? Show failure details." | Need: host, time range, and if possible instance ID. Search Cloud Manager logs for "Error executing workflow step" + model/step name or instance ID; return exception type, message, and stack. Route to runbook-workflow-fails-or-shows-error. |
| "Task not in Inbox" | symptom_id: task_not_in_inbox. Route to runbook-task-not-in-inbox. Gather: instance ID, assignee, whether user is initiator/assignee; check Inbox filters and enforceWorkitemAssigneePermissions. |
| "Workflow not starting" | symptom_id: workflow_not_starting_launcher. Route to runbook-launcher-not-starting. Gather: model name, payload path, launcher config path; search logs for launcher errors. |
| "Workflow stuck / not progressing" | symptom_id: workflow_stuck_not_progressing. Route to runbook-workflow-stuck. First: Does instance have a current work item? If no → stale. If yes, follow decision tree by step type. |
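The prompt-to-response table can be approximated with simple keyword routing. This is a sketch only: the keyword rules and the name `route_prompt` are assumptions made for illustration, and a real triage step would still confirm host, time range, and instance ID with the user.

```python
def route_prompt(prompt: str):
    """Return (symptom_id, follow_up); symptom_id is None when clarification is needed."""
    text = prompt.lower()
    if "not in inbox" in text:
        return "task_not_in_inbox", "Gather instance ID and assignee; check Inbox filters."
    if "not starting" in text:
        return "workflow_not_starting_launcher", "Gather model name, payload path, launcher config path."
    if "stuck" in text or "not progressing" in text:
        return "workflow_stuck_not_progressing", "Does the instance have a current work item?"
    if "error" in text or "fail" in text:
        return "workflow_fails_or_shows_error", "Gather host, time range, and instance ID if possible."
    if "activity" in text:
        # "Activity" is ambiguous in the table: counts or a list of errors.
        return None, "Do you want counts (started/completed/failed) or a list of errors?"
    return None, "Describe the symptom, the host, and the time range."
```

The more specific phrases are checked first so that a prompt such as "workflow failed to start, task not in Inbox" is routed by its most concrete symptom rather than by the generic "fail" keyword.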
Can answer (with AEM workflow logs in Cloud Manager / Splunk):
process.label is missing.
Cannot answer directly (Cloud Service limitations):
Always pair log-based triage with the appropriate runbook for actions (retry via Inbox, Purge Scheduler config, pipeline deploy).
- aem-agent-marketplace-workflow-knowledge-base/docs/debugging-index.md
- runbooks/runbook-decision-guide.md
- Workflow-docs/splunk-workflow-triaging.md
- docs/error-patterns.md