skills/agent-desktop-ffi/SKILL.md
C-ABI bindings over agent-desktop's PlatformAdapter. Consumers (Python ctypes, Swift, Node ffi-napi, Go cgo, C++, Ruby fiddle) link libagent_desktop_ffi.{dylib,so,dll} and call `ad_*` functions directly instead of spawning the CLI binary per call.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add lahfir/agent-desktop agent-desktop-ffi
Direct C-ABI access to every PlatformAdapter operation. Build the
cdylib with the workspace's release-ffi profile:
cargo build --profile release-ffi -p agent-desktop-ffi
The output is target/release-ffi/libagent_desktop_ffi.dylib
(.so on Linux, .dll on Windows) plus a committed C header at
crates/ffi/include/agent_desktop.h.
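As a rough illustration of the consumer side, a Python ctypes caller might locate the built library as sketched below. The helper name `library_path` and the platform-to-suffix table are assumptions made for this example; the file names simply follow the build-output description above and should be checked against your actual target directory.

```python
import sys
from pathlib import Path

# Suffixes follow the build-output description above. This mapping is an
# assumption of the sketch; confirm it against what cargo actually emits.
_SUFFIX_BY_PLATFORM = {"darwin": ".dylib", "linux": ".so", "win32": ".dll"}


def library_path(repo_root, platform=sys.platform):
    """Return the expected path of the release-ffi cdylib under repo_root."""
    try:
        suffix = _SUFFIX_BY_PLATFORM[platform]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform!r}") from None
    return (
        Path(repo_root) / "target" / "release-ffi" / f"libagent_desktop_ffi{suffix}"
    )
```

The result can then be passed to `ctypes.CDLL(str(library_path(".")))`. Argument and return types for each `ad_*` function still have to be declared from the committed header before calling them.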
Four reference topics, loaded as needed:
*mut T the FFI hands back to the caller.
Main thread only (macOS). Call every adapter-touching entrypoint
(ad_get_tree, ad_resolve_element, ad_execute_action,
ad_screenshot, clipboard, launch/close, window ops, observation,
notifications, etc.) from the process's main thread. The FFI enforces
this at runtime in every build profile: a worker-thread call
returns AD_RESULT_ERR_INTERNAL with a diagnostic last-error. On
non-macOS platforms the check compiles to a constant true, so there is
no runtime cost.
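The library already rejects worker-thread calls, but a Python caller may want the mistake surfaced on its own side with a clearer traceback. The decorator below is a minimal sketch of that idea. `main_thread_only` and `get_tree_stub` are hypothetical names, and the sketch assumes the Python main thread is also the process's main thread, which holds when the interpreter is started normally rather than embedded.

```python
import functools
import threading


def main_thread_only(func):
    """Raise early if func is called from any thread but the main one."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if threading.current_thread() is not threading.main_thread():
            raise RuntimeError(
                f"{func.__name__} must be called from the main thread"
            )
        return func(*args, **kwargs)

    return wrapper


@main_thread_only
def get_tree_stub():
    # Hypothetical stand-in for a wrapper around an adapter-touching call.
    return "tree"
```

Work that originates on worker threads then has to be handed to the main thread (for example through a queue) rather than calling the wrapped function directly.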
Release profile. cargo build --release builds with
panic = "abort", so any Rust panic inside an extern "C" fn will
SIGABRT the host. Use --profile release-ffi instead, which sets
panic = "unwind". CI enforces this.
Last-error lifetime. Pointers returned by ad_last_error_*
remain valid across any number of subsequent successful FFI calls
on the same thread. Only the next failing call rotates them. Cache
the pointer once, read it as many times as you need.
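Caching the pointer is enough for C callers. A ctypes caller that wants to keep a message past the next failing call may prefer to copy it into a Python string right away. The helper below is a sketch under two assumptions that should be checked against the header: that the relevant `ad_last_error_*` accessor returns a NUL-terminated UTF-8 `char *`, and that its `restype` has been declared as `ctypes.c_void_p` so the raw address reaches Python. The name `read_c_string` is hypothetical.

```python
import ctypes


def read_c_string(address, encoding="utf-8"):
    """Copy a NUL-terminated C string at `address` into a Python str."""
    if not address:  # NULL pointer (0 or None): nothing to read
        return None
    # string_at copies the bytes, so the result no longer depends on the
    # lifetime of the memory the library owns.
    return ctypes.string_at(address).decode(encoding, errors="replace")
```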
Handle release. Every ad_resolve_element result must be
released with ad_free_handle(adapter, handle) on the same adapter
that produced it. On macOS this balances the internal CFRetain;
on Windows/Linux the call is a no-op but safe to issue.
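One way to make the release hard to forget from Python is a context manager that pairs the two calls. This is a sketch only: `resolved_element` is a hypothetical helper, and `resolve` and `free_handle` are caller-supplied callables standing in for whatever ctypes wrappers you write around ad_resolve_element and ad_free_handle. Their Python-level signatures here are assumptions, not the library's API.

```python
from contextlib import contextmanager


@contextmanager
def resolved_element(resolve, free_handle, adapter, ref):
    """Yield a handle from resolve() and always release it on the same adapter."""
    handle = resolve(adapter, ref)
    try:
        yield handle
    finally:
        # Runs on normal exit and on exceptions, using the adapter that
        # produced the handle.
        free_handle(adapter, handle)
```

Used as `with resolved_element(resolve, free_handle, adapter, ref) as handle: ...`, the handle is released even if the body raises.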
Action policy. ad_execute_action uses the headless policy by
default, matching CLI ref commands: no focus stealing and no cursor
movement. Use ad_execute_action_with_policy(..., AD_POLICY_KIND_FOCUS_FALLBACK, ...) only when focus-changing behavior is
intended, and AD_POLICY_KIND_PHYSICAL only for explicit physical/headed
input semantics.
Text input privacy. On macOS, the explicit focus-fallback and physical policies may briefly use the clipboard to insert non-ASCII text. For sensitive text, keep the default headless policy, or set the value directly when the target supports it.
Enum discriminants. Every #[repr(i32)] enum field is validated
at the C boundary — invalid discriminants return
AD_RESULT_ERR_INVALID_ARGS instead of undefined behavior.
ABI is unstable before 1.0. The header lists the exact current shapes. Anything added or reordered in a later patch is a breaking change; pin the version of libagent_desktop_ffi you link against.
ad_get_tree returns a raw adapter tree, not the CLI snapshot.
Ref IDs are always null, no skeleton/drill-down pipeline is wired
through, and interactive_only / compact follow adapter
semantics, which may diverge slightly from the CLI's post-processed
shape. Use ad_find + ad_get / ad_is for point lookups, or
invoke the CLI if you need CLI-parity JSON snapshots.