Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

frank-luongt/skills/codex/constitutional-ai

Name: skills/codex/constitutional-ai
Author: frank-luongt

skills/codex/constitutional-ai/SKILL.md

npx skillsauth add frank-luongt/faos-skills-marketplace skills/codex/constitutional-ai

name: constitutional-ai
description: Constitutional AI principles and implementation patterns. Use when implementing AI safety guardrails based on Anthropic's constitutional AI approach.

Constitutional AI - Harmlessness from AI Feedback

Quick start

Constitutional AI (CAI) trains models to be harmless through self-critique and AI feedback, without requiring human labels for harmful outputs.

Key concept: Models learn to critique and revise their own responses using a "constitution" (set of principles).

Two phases:

Supervised Learning (SL): Self-critique + revision
Reinforcement Learning (RL): RLAIF (RL from AI Feedback)

Constitution example:

Principles:
1. Choose the response that is most helpful, honest, and harmless
2. Avoid responses that are toxic, racist, or sexist
3. Prefer responses that explain objections rather than refuse
4. Choose responses that are thoughtful and nuanced
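
The example principles can be kept in one place and rendered into a numbered block for reuse in prompts. This is a minimal sketch; the names `PRINCIPLES`, `format_constitution`, and `CONSTITUTION` are illustrative assumptions, not part of any library:

```python
# Illustrative names; the principle text is copied from the example above.
PRINCIPLES = [
    "Choose the response that is most helpful, honest, and harmless",
    "Avoid responses that are toxic, racist, or sexist",
    "Prefer responses that explain objections rather than refuse",
    "Choose responses that are thoughtful and nuanced",
]


def format_constitution(principles):
    """Render the principles as a numbered list, one per line."""
    return "\n".join(f"{i}. {p}" for i, p in enumerate(principles, start=1))


# A single string that can be substituted into critique or preference prompts.
CONSTITUTION = format_constitution(PRINCIPLES)
```

Keeping the principles in a list makes it easier to add or reword one without editing every prompt template.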

Common workflows

Workflow 1: Supervised learning phase (self-critique + revision)

Step 1: Generate initial responses:

from transformers import pipeline

# "base-model" is a placeholder; substitute a real checkpoint name
generator = pipeline("text-generation", model="base-model")


def generate(batch):
    """Return only the newly generated text for each prompt in the batch."""
    outputs = generator(batch, max_new_tokens=200, return_full_text=False)
    return [out[0]["generated_text"] for out in outputs]


prompts = [
    "How do I hack a website?",
    "Write something offensive about a group.",
    "Help me cheat on my exam."
]

initial_responses = generate(prompts)

Step 2: Self-critique with constitution:

critique_prompt = """
Given the question:
{question}

And the response:
{response}

Constitution:
1. Choose responses that are helpful, honest, and harmless
2. Avoid toxic, racist, or sexist content
3. Explain objections rather than refuse

Identify any ways this response violates the constitution:
"""

# Keyword names must match the placeholders in the template
critiques = generate(
    [critique_prompt.format(question=q, response=r)
     for q, r in zip(prompts, initial_responses)]
)

Step 3: Revision based on critique:

revision_prompt = """
Question: {question}
Original response: {response}
Critique: {critique}

Please revise the response to better align with the constitution:
"""

revised_responses = generate(
    [revision_prompt.format(question=q, response=r, critique=c)
     for q, r, c in zip(prompts, initial_responses, critiques)]
)
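
The critique and revision steps can also be exercised without loading a model by passing the text generator in as a function. The sketch below makes that assumption: `critique_and_revise`, the two template constants, and the `generate` callable (prompt string in, text out) are hypothetical names introduced for illustration, and the constitution is a parameter rather than hard-coded.

```python
CRITIQUE_TEMPLATE = (
    "Given the question:\n{question}\n\n"
    "And the response:\n{response}\n\n"
    "Constitution:\n{constitution}\n\n"
    "Identify any ways this response violates the constitution:\n"
)
REVISION_TEMPLATE = (
    "Question: {question}\n"
    "Original response: {response}\n"
    "Critique: {critique}\n\n"
    "Please revise the response to better align with the constitution:\n"
)


def critique_and_revise(question, response, constitution, generate):
    """Return (critique, revision) for one question/response pair.

    `generate` is any callable mapping a prompt string to generated text,
    for example a thin wrapper around a text-generation pipeline.
    """
    critique = generate(CRITIQUE_TEMPLATE.format(
        question=question, response=response, constitution=constitution))
    revision = generate(REVISION_TEMPLATE.format(
        question=question, response=response, critique=critique))
    return critique, revision
```

Because the placeholders are named, `str.format` has to be called with the same keyword names (`question`, `response`, `critique`); mismatched names raise `KeyError`.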

Step 4: Fine-tune on revised responses:

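from trl import SFTConfig, SFTTrainer

# Create dataset of (prompt, revised_response) pairs.
# create_dataset is a helper that is not defined in this skill.
dataset = create_dataset(prompts, revised_responses)

# The length limit is set on SFTConfig; it is named max_seq_length
# in older TRL releases.
sft_config = SFTConfig(output_dir="constitutional-sft", max_length=1024)

trainer = SFTTrainer(
    model="base-model",  # placeholder checkpoint name, as in Step 1
    args=sft_config,
    train_dataset=dataset
)
trainer.train()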

Workflow 2: RL phase (RLAIF - RL from AI Feedback)

Step 1: Generate comparison pairs:

# Sample two responses per prompt and split each pair into A and B
samples = generator(prompts, num_return_sequences=2, do_sample=True, temperature=0.8, return_full_text=False)
responses_a = [pair[0]["generated_text"] for pair in samples]
responses_b = [pair[1]["generated_text"] for pair in samples]

Step 2: AI preference evaluation:

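preference_prompt = """
Question: {question}

Response A: {response_a}
Response B: {response_b}

Constitution:
{constitution}

Which response better follows the constitution? Explain your reasoning, then choose A or B.
"""

# Get AI preferences (no human labels needed!)
# CONSTITUTION is the principle list as a single string.
outputs = generator(
    [preference_prompt.format(question=q, response_a=ra, response_b=rb,
                              constitution=CONSTITUTION)
     for q, ra, rb in zip(prompts, responses_a, responses_b)],
    max_new_tokens=200, return_full_text=False
)
preferences = [out[0]["generated_text"] for out in outputs]

# Parse preferences (A or B). parse_preferences is a helper that maps each
# verdict to a (chosen, rejected) pair; it is not defined in this skill.
chosen, rejected = parse_preferences(preferences, responses_a, responses_b)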
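
`parse_preferences` is called above but not defined in the source. One possible sketch follows, assuming each preference is a plain string and that the evaluator states its reasoning first and its verdict last, as the prompt requests. The regular expression is a heuristic: it treats the last standalone "A" or "B" as the verdict and can be fooled by free-form text, so a stricter output format may be preferable in practice.

```python
import re


def parse_preferences(preferences, responses_a, responses_b):
    """Split response pairs into aligned (chosen, rejected) lists.

    The last standalone "A" or "B" in each evaluator text is taken as the
    verdict. A text with no verdict raises ValueError so that the output
    lists stay aligned with the prompts.
    """
    chosen, rejected = [], []
    for text, resp_a, resp_b in zip(preferences, responses_a, responses_b):
        verdicts = re.findall(r"\b([AB])\b", text)
        if not verdicts:
            raise ValueError(f"no A/B verdict found in: {text!r}")
        if verdicts[-1] == "A":
            chosen.append(resp_a)
            rejected.append(resp_b)
        else:
            chosen.append(resp_b)
            rejected.append(resp_a)
    return chosen, rejected
```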

Step 3: Train preference model (reward model):

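from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardTrainer, RewardConfig

# A reward model is a sequence classifier with a single scalar output.
# "base-model" is a placeholder checkpoint name.
tokenizer = AutoTokenizer.from_pretrained("base-model")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "base-model", num_labels=1
)

# create_preference_dataset is a helper that is not defined in this skill
preference_dataset = create_preference_dataset(prompts, chosen, rejected)

reward_config = RewardConfig(
    output_dir="constitutional-reward-model",
    learning_rate=1e-5,
    num_train_epochs=1
)

reward_trainer = RewardTrainer(
    model=reward_model,
    args=reward_config,
    train_dataset=preference_dataset,
    processing_class=tokenizer
)
reward_trainer.train()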
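
`create_preference_dataset` is called above but not defined in the source. As a starting point, the sketch below builds plain preference records with the prompt prepended to each response. The function name `build_preference_records` and the `chosen`/`rejected` keys are assumptions; check the dataset format expected by the installed TRL version before training.

```python
def build_preference_records(prompts, chosen, rejected):
    """Return one {"chosen", "rejected"} record per prompt.

    Each text is the prompt followed by the response, so the reward model
    scores the full exchange rather than the response in isolation.
    """
    if not (len(prompts) == len(chosen) == len(rejected)):
        raise ValueError("prompts, chosen and rejected must have equal length")
    return [
        {"chosen": f"{p}\n{c}", "rejected": f"{p}\n{r}"}
        for p, c, r in zip(prompts, chosen, rejected)
    ]
```

The resulting list of dictionaries can be wrapped with `datasets.Dataset.from_list(records)` to obtain a dataset object.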

Step 4: RL training with RLAIF:

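from trl import PPOTrainer, PPOConfig

# PPOTrainer's constructor has changed between TRL releases; this follows the
# args/processing_class style used for RewardTrainer. The tokenizer, the
# policy, reference, reward, and value models, and the tokenized prompt
# dataset are placeholders assumed to be loaded already.
ppo_config = PPOConfig(
    output_dir="constitutional-ppo",
    reward_model_path="constitutional-reward-model",
    learning_rate=1e-6,
    kl_coef=0.05  # strength of the KL penalty against the reference model
)

ppo_trainer = PPOTrainer(
    args=ppo_config,
    processing_class=tokenizer,
    model=policy_model,
    ref_model=ref_model,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=prompt_dataset
)
ppo_trainer.train()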
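
The `kl_coef` setting controls how strongly the policy is penalized for drifting away from the reference model. A common formulation in RLHF-style PPO subtracts a scaled log-probability ratio from the reward-model score. The sketch below illustrates that idea in plain Python; it is not TRL's internal code, and the function name and arguments are illustrative assumptions.

```python
def kl_penalized_reward(reward, policy_logprobs, ref_logprobs, kl_coef=0.05):
    """Subtract a scaled KL estimate from the reward-model score.

    `policy_logprobs` and `ref_logprobs` are per-token log-probabilities of
    the sampled response under the policy and the frozen reference model.
    Their summed difference is a single-sample estimate of the KL divergence.
    """
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return reward - kl_coef * kl_estimate
```

A larger `kl_coef` keeps the policy closer to the reference model, at the cost of slower movement toward high-reward responses.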

Workflow 3: Chain-of-thought critique

Enable reasoning transparency:

cot_critique_prompt = """
Question: {question}
Response: {response}

Let's think step-by-step about whether this response follows our principles:

1. Is it helpful? [Yes/No and reasoning]
2. Is it honest? [Yes/No and reasoning]
3. Is it harmless? [Yes/No and reasoning]
4. Does it avoid toxicity? [Yes/No and reasoning]

Based on this analysis, suggest a revision if needed.
"""

# Keyword names must match the placeholders in the template
cot_critiques = generator(
    [cot_critique_prompt.format(question=q, response=r)
     for q, r in zip(prompts, initial_responses)]
)
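
If the critique text follows the numbered Yes/No layout requested in the prompt, it can be turned into a small checklist for filtering or logging. The parser below is a sketch under that assumption; `CHECKS` and `parse_cot_critique` are hypothetical names, and model output that deviates from the layout is simply left out of the result.

```python
import re

# Maps the question number in the prompt to a short label (illustrative).
CHECKS = {1: "helpful", 2: "honest", 3: "harmless", 4: "avoids_toxicity"}


def parse_cot_critique(critique_text):
    """Map each numbered check to True (Yes) or False (No).

    Only lines that start with a check number and contain a standalone
    "Yes" or "No" are recorded; the first such word on the line wins.
    """
    verdicts = {}
    for line in critique_text.splitlines():
        match = re.match(r"\s*([1-4])\.\s.*?\b(Yes|No)\b", line)
        if match:
            verdicts[CHECKS[int(match.group(1))]] = match.group(2) == "Yes"
    return verdicts
```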

When to use vs alternatives

Use Constitutional AI when:

Want safety alignment without human labels
Need explainable AI decisions
Want to avoid evasive refusals
Have a clear set of principles/constitution
Need scalable safety training

Principles:

RLAIF: AI-generated preferences (scalable, no human labels)
RLHF: Human preferences (more accurate, expensive)
Self-critique: Iterative improvement
Chain-of-thought: Reasoning transparency

Use alternatives instead:

RLHF (PPO): Need human-validated safety
DPO/SimPO: Have human preference data
NeMo Guardrails: Need runtime content filtering
LlamaGuard: Need pre-trained moderation model

Common issues

Issue: Model refuses too much (evasive)

Add constitution principle:

Prefer responses that engage thoughtfully with questions rather than
refusing to answer. Explain concerns while still being helpful.

Issue: Self-critiques are weak

Use stronger critique prompts:

Critically analyze this response for ANY potential issues, however minor.
Be thorough and specific in identifying problems.

Issue: Revisions don't improve quality

Iterate multiple times:

for _ in range(3):  # 3 rounds of critique/revision
    critique = generate_critique(response)
    response = generate_revision(response, critique)
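
`generate_critique` and `generate_revision` are not defined in the source. The sketch below takes them as callables so the loop can be tested with stubs, and it keeps every draft so that rounds can be compared. The name `refine` and its signature are assumptions made for illustration.

```python
def refine(response, generate_critique, generate_revision, rounds=3):
    """Run `rounds` of critique followed by revision.

    Returns the list of drafts, starting with the original response and
    ending with the final revision.
    """
    drafts = [response]
    for _ in range(rounds):
        critique = generate_critique(drafts[-1])
        drafts.append(generate_revision(drafts[-1], critique))
    return drafts
```

Keeping the intermediate drafts makes it possible to check whether later rounds still change the response or whether fewer rounds would do.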

Issue: RLAIF preferences are noisy

Use multiple AI evaluators:

# Get preferences from 3 different models
prefs_1 = model_1.evaluate(responses)
prefs_2 = model_2.evaluate(responses)
prefs_3 = model_3.evaluate(responses)

# Majority vote
final_preference = majority_vote(prefs_1, prefs_2, prefs_3)
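
`majority_vote` is not defined in the source. A minimal sketch, assuming each evaluator returns a list of labels such as "A" or "B" with one label per comparison; ties are reported as `None` so they can be dropped or re-evaluated rather than resolved arbitrarily.

```python
from collections import Counter


def majority_vote(*evaluator_prefs):
    """Return the most common label per comparison, or None on a tie.

    Each argument is one evaluator's list of labels, all of equal length.
    """
    final = []
    for votes in zip(*evaluator_prefs):
        ranked = Counter(votes).most_common()
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        final.append(None if tied else ranked[0][0])
    return final
```

Using an odd number of evaluators, as in the three-model example above, avoids ties when there are only two labels.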

Advanced topics

Constitution design: See references/constitution-design.md for principle selection, trade-offs between helpfulness and harmlessness, and domain-specific constitutions.

RLAIF vs RLHF: See references/rlaif-comparison.md for performance comparison, cost analysis, and when to use AI feedback vs human feedback.

Chain-of-thought reasoning: See references/cot-critique.md for prompt engineering for critiques, multi-step reasoning, and transparency improvements.

Hardware requirements

GPU: NVIDIA A100/H100 recommended
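VRAM:
- SL phase (7B): 1× A100 40GB (realistic with parameter-efficient fine-tuning such as LoRA; full fine-tuning of 7B weights plus optimizer state does not fit in 40GB)
- RL phase (7B): 2× A100 40GB (policy + reward model)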
Single-node: Sufficient for most use cases
Mixed precision: BF16 recommended
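
As a rough cross-check on the VRAM figures, model-state memory can be estimated from bytes per parameter. The sketch below uses a common rule of thumb for mixed-precision AdamW training (about 16 bytes per parameter); it ignores activations, KV cache, and framework overhead, and the constant and function names are illustrative assumptions.

```python
# Bytes per parameter (rule-of-thumb accounting).
BF16_WEIGHTS_ONLY = 2    # inference or a frozen model held in BF16
FULL_ADAMW_MIXED = 16    # weights 2 + grads 2 + FP32 master 4 + Adam m 4 + Adam v 4


def model_state_gb(params_billion, bytes_per_param):
    """Model-state memory in GB (10**9 bytes); activations are excluded."""
    return params_billion * bytes_per_param


weights_7b = model_state_gb(7, BF16_WEIGHTS_ONLY)        # 14 GB
full_finetune_7b = model_state_gb(7, FULL_ADAMW_MIXED)   # 112 GB
```

By this estimate, a 7B model's BF16 weights fit comfortably on one 40GB card, while full-parameter training with AdamW needs sharding across devices or a parameter-efficient method.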

Compute requirements:

SL phase: Similar to standard SFT
RL phase: Similar to PPO (higher than DPO)
AI evaluation: Additional inference for critique/preference generation

Resources

Paper: https://arxiv.org/abs/2212.08073 (Dec 2022)
Anthropic blog: https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback
Implementation: TRL (PPOTrainer + RewardTrainer)
Claude: Uses Constitutional AI for safety

12 stars

development

Updated Apr 21, 2026

$ install --global

skillsauth

npx skillsauth add frank-luongt/faos-skills-marketplace skills/codex/constitutional-ai

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (Container and dependency vulnerability scanner): Clean, 95%
Semgrep (Static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (Open source security scanning): Skipped, 50%
Socket.dev (Supply chain security analysis): Skipped, 50%
VirusTotal (Multi-engine malware detection): Skipped, 50%
CrowdStrike (Advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 21, 2026, 5:50 AM · 56.3s · 2 files scanned

Related Skills

frank-luongt/skills/codex/grpo-rl-training

development

Verified · Trusted · Community

--- name: grpo-rl-training description: GRPO reinforcement learning training with TRL. Use when applying Group Relative Policy Optimization for reasoning and task-specific model training. --- # GRPO/RL Training with TRL Expert-level guidance for implementing Group Relative Policy Optimization (GRPO) using the Transformer Reinforcement Learning (TRL) library. This skill provides battle-tested patterns, critical insights, and production-r

26 · SKILL.md · Updated Jul 9, 2026

frank-luongt/skills/codex/graphql-architect

tools

Verified · Trusted · Community

--- name: graphql-architect description: Master modern GraphQL with federation, performance optimization, --- ## Use this skill when - Working on graphql architect tasks or workflows - Needing guidance, best practices, or checklists for graphql architect ## Do not use this skill when - The task is unrelated to graphql architect - You need a different domain or tool outside this scope ## Instructions - Clarify goals, constraints, and

26 · SKILL.md · Updated Jul 9, 2026

frank-luongt/skills/codex/grafana-dashboards

development

Verified · Trusted · Community

--- name: grafana-dashboards description: Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces. --- # Grafana Dashboards Create and manage production-ready Grafana dashboards for comprehensive system observability. ## Do not use this skill when - The task is unrelated

26 · SKILL.md · Updated Jul 9, 2026

frank-luongt/skills/codex/gptq

development

Verified · Trusted · Community

--- name: gptq description: GPTQ post-training quantization for generative models. Use when quantizing large models to 4-bit with calibration-based weight compression. --- # GPTQ (Generative Pre-trained Transformer Quantization) Post-training quantization method that compresses LLMs to 4-bit with minimal accuracy loss using group-wise quantization. ## When to use GPTQ **Use GPTQ when:** - Need to fit large models (70B+) on limited GPU

26 · SKILL.md · Updated Jul 9, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/frank-luongt/faos-skills-marketplace.git

# Copy into Claude Code skills folder (global)
cp -r faos-skills-marketplace/skills/codex/constitutional-ai ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

frank-luongt/faos-skills-marketplace

12 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT