layers/faculties/vision/SKILL.md
# Vision Faculty — Sense
Install this skill globally with one command: `npx skillsauth add acnlabs/openpersona layers/faculties/vision`. Works with Claude Code, Cursor, and Windsurf.
Perceive and interpret visual content natively through your model's vision capability. You can receive images, screenshots, diagrams, charts, and video frames as part of a conversation — treat them as a natural input channel, not an exception.
## When to Engage Vision
Always engage when the user shares an image — do not ask for a text description if you can perceive the image directly.
Proactively describe relevant visual content when it materially affects your response:
Do not narrate your own perception process ("I am now analyzing the image..."). Engage with the content directly.
When vision is unavailable (model does not support vision, image failed to load, or no image was shared):
node scripts/state-sync.js signal capability_gap '{"need":"vision","reason":"image shared but model cannot process it","priority":"high"}'
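The command above can also be assembled programmatically, which avoids hand-quoting the JSON payload in a shell. The following is a minimal sketch, not part of the skill itself: the helper name `buildCapabilityGapArgs` and its default priority are hypothetical, and only the script path, the `signal capability_gap` arguments, and the payload fields come from the command shown above.

```javascript
// Hypothetical helper (not part of the skill): builds the argument list for
// the state-sync signal command. The payload fields mirror the command above.
function buildCapabilityGapArgs(need, reason, priority = "high") {
  // JSON.stringify handles escaping, so the reason may contain quotes safely.
  const payload = JSON.stringify({ need, reason, priority });
  return ["scripts/state-sync.js", "signal", "capability_gap", payload];
}

const args = buildCapabilityGapArgs(
  "vision",
  "image shared but model cannot process it"
);

// Show the argument list that would be passed to node.
console.log(args);

// To actually emit the signal, assuming scripts/state-sync.js exists relative
// to the current working directory:
// require("node:child_process").execFileSync("node", args, { stdio: "inherit" });
```

Passing the arguments as an array to `execFileSync` bypasses the shell, so the JSON payload reaches the script as a single argument without extra quoting.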
| Scenario | Behavior |
|---|---|
| User shares image with no text | Describe what you perceive, then invite the user's question |
| User shares image with a question | Answer the question using the visual content |
| User asks about an image you cannot see | Acknowledge the limitation, ask for description |
| Multiple images in one message | Address each one, or focus on the one most relevant to the question |
| Image contains text (OCR use case) | Read and use the text; note if portions are illegible |
| Chart or diagram | Interpret the data/structure, not just the visual layout |
Vision capability is declared in `body.runtime.modalities` (e.g. `{ "type": "vision", "provider": "claude-vision" }`). The provider determines what image formats and sizes are accepted. No separate script is required: vision is a native model capability. If the declared provider differs from your active model, emit a `capability_gap` signal.
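The provider check described above might look like the sketch below. Several details are assumptions made for illustration: that `body.runtime.modalities` is an array of `{ type, provider }` entries, that the active model can be identified by a provider string, and the names `checkVision` and `examplePersona`. Only the field path, the entry shape, and the `capability_gap` payload fields come from this document.

```javascript
// Hypothetical persona fragment. Assumes modalities is an array of entries.
const examplePersona = {
  body: {
    runtime: {
      modalities: [{ type: "vision", provider: "claude-vision" }],
    },
  },
};

// Hypothetical check: compares the declared vision provider with the
// provider of the active model (assumed here to be a plain string).
function checkVision(persona, activeProvider) {
  const modalities = persona?.body?.runtime?.modalities ?? [];
  const declared = modalities.find((m) => m.type === "vision");

  if (!declared) {
    // No vision modality declared: nothing to compare against.
    return { available: false, gap: null };
  }
  if (declared.provider !== activeProvider) {
    // Mismatch: return a payload suitable for a capability_gap signal.
    return {
      available: false,
      gap: {
        need: "vision",
        reason: `declared provider "${declared.provider}" differs from active provider "${activeProvider}"`,
        priority: "high",
      },
    };
  }
  return { available: true, gap: null };
}

const matched = checkVision(examplePersona, "claude-vision");
const mismatched = checkVision(examplePersona, "other-provider");
console.log(matched.available, mismatched.gap);
```

Returning the gap payload instead of emitting it directly keeps the check free of side effects, so the caller decides whether and how to send the signal.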