skills/arize-compliance-audit/SKILL.md
INVOKE THIS SKILL when auditing an AI agent or LLM app for regulatory compliance. Covers EU AI Act, GPAI Code of Practice, GDPR, NIST AI RMF, Colorado AI Act, HIPAA, and ISO 42001. Scans the codebase for compliance gaps, cross-references Arize instrumentation for audit trail coverage, and produces an actionable remediation checklist tailored to the selected frameworks.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add arize-ai/arize-skills arize-compliance-audit
Use this skill when the user wants to audit their AI agent or LLM application for regulatory compliance. The skill scans the codebase for compliance gaps, cross-references Arize instrumentation for audit trail coverage, and produces a tailored checklist with optional remediation.
Triggers: "audit my app for compliance", "EU AI Act requirements", "NIST AI RMF checklist", "GDPR for AI", "is my AI app compliant", "compliance checklist", "regulatory audit", "ISO 42001", "AI management system", "AIMS certification".
Before doing anything else, present this disclaimer verbatim to the user:
⚠️ Legal disclaimer
This audit is for guidance only and does not constitute legal advice or a complete compliance assessment. It identifies common technical patterns and gaps based on publicly available regulatory frameworks, but cannot assess your organisation's specific legal obligations, contractual commitments, data processing agreements, or operational processes.
Do not rely on this output as a substitute for qualified legal counsel. Regulatory compliance is a complex, jurisdiction-specific, and fact-dependent determination. Always engage a qualified attorney or compliance specialist for binding assessments.
Before scanning code, determine which compliance frameworks apply.
Use the AskUserQuestion tool to ask the user which frameworks apply. Do not infer or auto-select — always ask explicitly.
Ask:
Which compliance frameworks should this audit cover?
Select all that apply (reply with numbers, e.g. "1, 3"):
1. EU frameworks — EU AI Act, GPAI Code of Practice, GDPR
(choose if end-users or data subjects are located in the EU)
2. US frameworks — NIST AI RMF, state laws (Colorado AI Act, NYC LL144),
HIPAA (if processing health data)
(choose if operating in the United States)
3. ISO 42001 — International AI Management System standard
(choose if pursuing ISO 42001 certification, operating globally,
or wanting an internationally recognised baseline)
You can select any combination. If unsure, select all that seem relevant
and we can narrow down during the audit.
Based on the selection:
Use the AskUserQuestion tool to ask: What does your AI application do?
Based on the use case and selected frameworks:
Present a brief summary:
Frameworks selected: {EU / US / ISO 42001 / combination}
Use case: {category}
Risk tier: {EU tier if applicable} / {US tier if applicable}
Applicable: {list of specific regulations and standards}
ISO 42001 note: {if selected} Audit covers technically-auditable controls only;
organisational clauses will be flagged but not code-audited.
Then proceed directly to Phase 1.
Do not write any code or create any files during this phase.
Systematically scan the codebase for evidence of compliance and gaps across seven domains. For each domain, run the listed searches and record findings.
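The per-domain searches amount to keyword and pattern matching over source files. The sketch below shows one way that could look in Python, assuming a small hypothetical subset of patterns. The names `PATTERNS`, `SOURCE_SUFFIXES`, `scan_text`, and `scan_tree` are illustrative and are not part of this skill or of any Arize tooling. The secret-literal heuristic is deliberately crude: it flags a quoted string assigned to a secret-like name and ignores environment variable lookups.

```python
import re
from pathlib import Path

# Hypothetical pattern subset; extend it to match the searches listed for each domain.
PATTERNS = {
    "transparency": re.compile(
        r"artificial intelligence|generated by|powered by", re.IGNORECASE
    ),
    "data_protection": re.compile(
        r"\b(email|phone|ssn|social_security|date_of_birth)\b", re.IGNORECASE
    ),
    # A quoted literal assigned to a secret-like name (not an env var reference).
    "security": re.compile(
        r"""\w*(api_key|secret|password|token)\w*\s*=\s*["'][^"']+["']""",
        re.IGNORECASE,
    ),
}

# Assumed set of file types worth scanning; adjust per project.
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".html", ".md", ".yaml", ".json"}


def scan_text(text):
    """Return {domain: [(line_number, line), ...]} for every pattern hit in text."""
    hits = {domain: [] for domain in PATTERNS}
    for number, line in enumerate(text.splitlines(), start=1):
        for domain, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[domain].append((number, line.strip()))
    return hits


def scan_tree(root):
    """Scan every source-like file under root and keep only files with hits."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            file_hits = scan_text(text)
            if any(file_hits.values()):
                findings[str(path)] = file_hits
    return findings
```

A hit is only a lead for manual review, and an empty result for a domain is itself worth recording as a possible gap (for example, no AI disclosure wording anywhere in user-facing code).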
A. Transparency
What to look for:
- `AI`, `artificial intelligence`, `automated`, `bot`, `machine learning`, `generated by`, `powered by` in UI templates, API responses, and user-facing code
Signals of concern: Absence of any AI disclosure in user-facing code, especially if the application generates content or makes recommendations.
B. Data protection
What to look for:
- `email`, `phone`, `ssn`, `social_security`, `date_of_birth`, `address`, `name` in prompts, context, or retrieved documents
- [...] `input.value` or `output.value` could contain personal data sent to Arize without redaction
- `consent`, `opt-in`, `opt-out`, `gdpr`, `ccpa` references
- `right_to_access`, `right_to_erasure`, `data_subject_request`, `data_protection_officer`
C. Security
What to look for:
- [...] `guardrails-ai`, `nemo-guardrails`, `rebuff`, `lakera`), content filtering, system prompt protection
- `api_key`, `secret`, `password`, `token` literals in source files (not env var references)
D. Testing
What to look for:
- `pytest`-based evals, experiment infrastructure
E. Documentation
What to look for:
- `MODEL_CARD.md`, `model_card.json`, `model_card.yaml`, or similar
F. Monitoring
What to look for:
- `arize-otel`, `register()`, `TracerProvider`, `opentelemetry`, `openinference` imports
G. Vendor management
What to look for:
- [...] `gpt-4-0613`) or using `latest` / unversioned identifiers
Present a two-part report:
Part 1 — Summary table
| Domain | Evidence found | Gaps identified | Rating |
|---|---|---|---|
| A. Transparency | {findings} | {gaps} | Compliant / Partial / Non-compliant / N/A |
| B. Data protection | {findings} | {gaps} | ... |
| C. Security | {findings} | {gaps} | ... |
| D. Testing | {findings} | {gaps} | ... |
| E. Documentation | {findings} | {gaps} | ... |
| F. Monitoring | {findings} | {gaps} | ... |
| G. Vendor management | {findings} | {gaps} | ... |
Part 2 — Gap detail (required for every Non-compliant or Partial rating)
For each domain rated Non-compliant or Partial, write a dedicated subsection that includes:
- [...] `user_email` before the OTLP exporter fires", not just "add PII redaction").
Minimum one subsection per Non-compliant/Partial domain. Do not omit this section — it is the primary value of the audit for engineering teams.
Then proceed directly to Phase 2.
Using the Phase 1 findings and the template in references/compliance-checklist-template.md, generate a tailored compliance checklist.
- [...] Compliant. Items with gaps: mark as Non-compliant with a concrete remediation suggestion.
- [...] `guardrails-ai` to validate LLM inputs and outputs against your content policy".
Present a single consolidated report with four sections:
Section 1 — Audit scope (Phase 0 summary)
Section 2 — Codebase findings (Phase 1 summary table)
Section 3 — Gap detail (Phase 1 expanded)
Section 4 — Compliance checklist (Phase 2)
When the user asks for a report file, write a single markdown file to /tmp/<app-name>-compliance-audit-<YYYY-MM-DD>.md containing all four sections.
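As a rough illustration of the file naming rule, the sketch below builds the report path and writes the four sections in order. The helper names `report_path` and `write_report` are hypothetical, and lowercasing and hyphenating the app name is an assumption; the skill only specifies the `/tmp/<app-name>-compliance-audit-<YYYY-MM-DD>.md` shape.

```python
import re
from datetime import date
from pathlib import Path


def report_path(app_name, today=None):
    """Build /tmp/<app-name>-compliance-audit-<YYYY-MM-DD>.md for the given app."""
    today = today or date.today()
    # Assumption: normalise the app name to lowercase words joined by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", app_name.lower()).strip("-")
    return Path("/tmp") / f"{slug}-compliance-audit-{today.isoformat()}.md"


def write_report(app_name, sections, today=None):
    """Write the four report sections, in order, to a single markdown file."""
    if len(sections) != 4:
        raise ValueError("expected four sections: scope, findings, gap detail, checklist")
    path = report_path(app_name, today)
    path.write_text("\n\n".join(sections) + "\n", encoding="utf-8")
    return path
```

Passing the date explicitly keeps the file name reproducible when a report is regenerated for an earlier audit.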
After presenting the report, offer Phase 3 remediation.
After presenting the checklist, offer to implement specific fixes. Always use the AskUserQuestion tool to confirm before making any changes.
Add dependencies — offer to install:
- [...] `guardrails-ai`, `nemo-guardrails`)
- [...] `presidio-analyzer`, `scrubadub`)
Insert code — offer to add:
Create documentation templates — offer to scaffold:
Configure monitoring — offer to set up via related skills:
- [...] `arize-evaluator` skill)
- [...] `arize-instrumentation` skill)
[...] `AskUserQuestion` tool to get confirmation before applying.
When gaps identified in Phase 1 or 2 require capabilities from other Arize skills, offer to invoke them. Always use the AskUserQuestion tool to ask before invoking another skill and explain why it is relevant to the compliance gap.
| Gap | Skill to invoke | Why |
|---|---|---|
| No tracing / incomplete audit trail | arize-instrumentation | EU Art. 12 and NIST MAN-2.1 require event logging; Arize tracing provides this |
| No bias or safety evaluation | arize-evaluator | Create LLM-as-judge evaluators for fairness, content safety, or quality monitoring |
| Need trace export for compliance evidence | arize-trace | Export spans for regulatory documentation or incident investigation |
| Need human review for high-risk decisions | arize-annotation | Set up annotation queues for human oversight per EU Art. 14 |
| Need deep link to share compliance evidence | arize-link | Generate URLs to specific traces, spans, or evaluations for stakeholder review |
If Arize tracing is already set up, verify it meets compliance requirements:
| Compliance need | Required trace data | What to check |
|---|---|---|
| Audit trail for AI decisions | All LLM spans with input/output | Verify all LLM client calls are instrumented, not just some |
| Data subject access requests | User ID attribute on spans | Check for user.id or custom user identifier attribute |
| PII in traces | Sensitive data in input.value/output.value | Check if PII passes through unredacted — flag if so |
| Incident investigation | Error spans with full context | Check for exception tracking and error status on spans |
| Retention requirements | Trace data retained for required period | EU: appropriate period (min 6 months for high-risk); HIPAA: 6 years |
| Bias monitoring | Demographic or group attributes | Check for metadata attributes that enable fairness analysis |
If Arize tracing is not set up, this is a significant compliance gap. Offer: "Shall I run the arize-instrumentation skill to set up audit-trail tracing? Regulatory frameworks (EU AI Act Art. 12, NIST AI RMF MAN-2.1) require event logging for AI systems."
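For the "PII in traces" row in the table above, redaction generally has to happen before span attributes leave the process. The sketch below is a minimal, regex-only illustration of that step, assuming span attributes are available as a plain dictionary keyed by `input.value` and `output.value`. The names `PII_PATTERNS`, `redact`, and `redact_span_attributes` are hypothetical, the two patterns are far from exhaustive, and wiring the function into an exporter or span processor is intentionally left out. A dedicated library such as `presidio-analyzer` or `scrubadub` would likely give better coverage.

```python
import re

# Illustrative patterns only; they will miss many real-world PII formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each pattern match in text with a labelled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text


def redact_span_attributes(attributes, keys=("input.value", "output.value")):
    """Return a copy of attributes with the given string values redacted."""
    cleaned = dict(attributes)
    for key in keys:
        value = cleaned.get(key)
        if isinstance(value, str):
            cleaned[key] = redact(value)
    return cleaned
```

Returning a copy rather than mutating the input keeps the original values available to application code while only the redacted version is exported.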
| Resource | URL |
|---|---|
| EU AI Act full text | https://eur-lex.europa.eu/eli/reg/2024/1689/oj |
| GPAI Code of Practice | https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai |
| Code of Practice portal | https://code-of-practice.ai/ |
| NIST AI RMF | https://www.nist.gov/artificial-intelligence/ai-risk-management-framework |
| Colorado AI Act (SB24-205) | https://leg.colorado.gov/bills/sb24-205 |
| NYC Local Law 144 | https://www.nyc.gov/site/dca/about/automated-employment-decision-tools.page |
| HIPAA | https://www.hhs.gov/hipaa/index.html |
| ISO/IEC 42001:2023 | https://www.iso.org/standard/42001.html |
| Arize AX Docs | https://arize.com/docs/ax |