a5c-ai/homoglyph-detector

Name: homoglyph-detector
Author: a5c-ai

library/specializations/security-compliance/skills/homoglyph-detector/SKILL.md

npx skillsauth add a5c-ai/babysitter homoglyph-detector

Homoglyph Detector

Byte-level forensic analysis of code changes to detect Unicode homoglyph substitutions: characters that render like ASCII in most editors and diff tools but have different codepoints, silently breaking string comparisons, dictionary lookups, and identifier resolution.

Purpose

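Homoglyph attacks are one of the two "Trojan Source" techniques (CVE-2021-42694 covers homoglyphs; CVE-2021-42574 covers bidi control characters) and are among the stealthiest ways to plant a trojan. A Cyrillic р (U+0440) renders identically to a Latin p (U+0070) in most fonts, editors, and diff viewers. Reliable detection therefore means inspecting bytes or codepoints rather than rendered text, for example with hexdump.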

This skill pipes git diffs through hexdump -C and scans for multi-byte UTF-8 sequences where single-byte ASCII is expected, particularly in string literals used as dictionary keys and in identifiers such as variable names.

Capabilities

Confusable Character Detection

Scans for these high-risk Unicode confusables:

| Latin | Cyrillic | Greek | UTF-8 Bytes |
|-------|----------|-------|-------------|
| a (61) | а (D0 B0) | α (CE B1) | 1 vs 2 bytes |
| c (63) | с (D1 81) | — | 1 vs 2 bytes |
| e (65) | е (D0 B5) | ε (CE B5) | 1 vs 2 bytes |
| o (6F) | о (D0 BE) | ο (CE BF) | 1 vs 2 bytes |
| p (70) | р (D1 80) | ρ (CF 81) | 1 vs 2 bytes |
| x (78) | х (D1 85) | χ (CF 87) | 1 vs 2 bytes |
| y (79) | у (D1 83) | — | 1 vs 2 bytes |
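
The table can be expressed as a lookup, as in this minimal Python sketch. The names `CONFUSABLES` and `find_confusables` are illustrative and not part of the skill, and the map covers only the rows listed above.

```python
import unicodedata

# Confusable codepoint -> the ASCII letter it imitates (rows from the table).
CONFUSABLES = {
    "\u0430": "a", "\u03b1": "a",  # Cyrillic a, Greek alpha
    "\u0441": "c",                 # Cyrillic es
    "\u0435": "e", "\u03b5": "e",  # Cyrillic ie, Greek epsilon
    "\u043e": "o", "\u03bf": "o",  # Cyrillic o, Greek omicron
    "\u0440": "p", "\u03c1": "p",  # Cyrillic er, Greek rho
    "\u0445": "x", "\u03c7": "x",  # Cyrillic ha, Greek chi
    "\u0443": "y",                 # Cyrillic u
}

def find_confusables(text):
    """Return (index, char, ascii_lookalike, utf8_hex, unicode_name) per hit."""
    hits = []
    for i, ch in enumerate(text):
        if ch in CONFUSABLES:
            utf8_hex = ch.encode("utf-8").hex(" ")  # e.g. "d1 80"
            hits.append((i, ch, CONFUSABLES[ch], utf8_hex, unicodedata.name(ch)))
    return hits

# A quoted key whose first letter is Cyrillic U+0440 instead of Latin "p".
hits = find_confusables('"\u0440pg"')
```

A broader scan would likely draw on the full Unicode confusables data rather than a hand-written map.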

Zero-Width Character Detection

U+200B — Zero-width space
U+200C — Zero-width non-joiner
U+200D — Zero-width joiner
U+FEFF — Byte order mark (in non-BOM position)
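
A minimal Python sketch of this check, assuming the text has already been decoded from UTF-8. The function name `find_zero_width` is hypothetical, and "non-BOM position" is interpreted here as any index other than the first character.

```python
# The four codepoints listed above, with short labels for reporting.
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "BYTE ORDER MARK",
}

def find_zero_width(text):
    """Return (index, label) for each zero-width character, ignoring a leading BOM."""
    hits = []
    for i, ch in enumerate(text):
        if ch == "\ufeff" and i == 0:
            continue  # a BOM at the very start of a file is legitimate
        if ch in ZERO_WIDTH:
            hits.append((i, ZERO_WIDTH[ch]))
    return hits
```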

Bidi Control Character Detection (Trojan Source)

U+200F — Right-to-left mark
U+200E — Left-to-right mark
U+202A — Left-to-right embedding
U+202B — Right-to-left embedding
U+202C — Pop directional formatting
U+2066 — Left-to-right isolate
U+2067 — Right-to-left isolate
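
The same idea applies to bidi controls. This sketch matches exactly the seven codepoints listed above; `find_bidi_controls` is an illustrative name. The override characters U+202D and U+202E are not in the list and would have to be added to the pattern to be caught.

```python
import re

# Character class built from the seven codepoints in the list above.
BIDI_CONTROLS = re.compile("[\u200e\u200f\u202a\u202b\u202c\u2066\u2067]")

def find_bidi_controls(line):
    """Return (index, 'U+XXXX') for each listed bidi control character in a line."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in BIDI_CONTROLS.finditer(line)]
```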

Context-Aware Analysis

Focuses on string literals (dictionary keys, config values)
Focuses on identifiers (variable names, function names, class names)
Ignores legitimate Unicode in comments, docstrings, and i18n strings
Compares byte patterns between removed (-) and added (+) diff lines
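
For Python sources, the context filtering could be approximated with the standard `tokenize` module, as sketched below. This is an assumption about one possible approach, not the skill's actual implementation: comments are skipped, a string that begins a statement is treated as a docstring, and i18n strings are not handled.

```python
import io
import tokenize

def non_ascii_tokens(source):
    """Yield (line, kind, text) for identifiers and string literals containing non-ASCII."""
    prev = None  # type of the previous significant token
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # comments may legitimately contain Unicode
        # A string that starts a statement is a bare expression, i.e. docstring-like.
        starts_statement = prev in (None, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT)
        if tok.type == tokenize.NAME and not tok.string.isascii():
            yield tok.start[0], "identifier", tok.string
        elif tok.type == tokenize.STRING and not starts_statement and not tok.string.isascii():
            yield tok.start[0], "string", tok.string
        prev = tok.type

# Docstring and comment contain Cyrillic text; only the dictionary key is flagged.
src = (
    "def f(row):\n"
    '    """\u0414\u043e\u043a\u0443\u043c\u0435\u043d\u0442."""\n'
    "    # \u043a\u043e\u043c\u043c\u0435\u043d\u0442\n"
    '    return row.get("\u0440pg", 0)\n'
)
flagged = list(non_ascii_tokens(src))
```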

Input Schema

{
  "type": "object",
  "required": ["projectRoot", "changedFiles"],
  "properties": {
    "projectRoot": {
      "type": "string",
      "description": "Absolute path to the git repository"
    },
    "changedFiles": {
      "type": "array",
      "items": { "type": "string" },
      "description": "List of changed file paths to scan"
    },
    "scanMode": {
      "type": "string",
      "enum": ["uncommitted", "commit-range", "branch-diff"],
      "default": "uncommitted"
    },
    "baseRef": { "type": "string" },
    "headRef": { "type": "string" }
  }
}
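
The schema does not say how each scanMode maps to a git command. The sketch below shows one plausible mapping and is an assumption throughout: `git_diff_args` is a hypothetical helper, `uncommitted` follows the plain `git diff` used under Detection Method, and the two-dot and three-dot ranges for the other modes are guesses.

```python
def git_diff_args(scan_input):
    """Build a git diff argv list from an input object shaped like the schema above."""
    mode = scan_input.get("scanMode", "uncommitted")  # schema default
    if mode == "uncommitted":
        rev = []  # working tree vs index
    elif mode == "commit-range":
        rev = [f'{scan_input["baseRef"]}..{scan_input["headRef"]}']
    elif mode == "branch-diff":
        rev = [f'{scan_input["baseRef"]}...{scan_input["headRef"]}']
    else:
        raise ValueError(f"unknown scanMode: {mode}")
    return ["git", "-C", scan_input["projectRoot"], "diff", *rev, "--", *scan_input["changedFiles"]]

args = git_diff_args({
    "projectRoot": "/path/to/project",
    "changedFiles": ["backend/app/prediction/temporal.py"],
})
```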

Output Schema

{
  "type": "object",
  "required": ["filesScanned", "homoglyphsFound", "verdict"],
  "properties": {
    "filesScanned": { "type": "number" },
    "homoglyphsFound": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "file": { "type": "string" },
          "line": { "type": "number" },
          "byteOffset": { "type": "string" },
          "context": { "type": "string" },
          "expectedAscii": { "type": "string" },
          "actualBytes": { "type": "string" },
          "unicodeCodepoint": { "type": "string" },
          "scriptName": { "type": "string" },
          "impact": { "type": "string" }
        }
      }
    },
    "bidiControlChars": { "type": "array" },
    "verdict": {
      "type": "string",
      "enum": ["CLEAN", "HOMOGLYPH_DETECTED"]
    }
  }
}
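
To make the field names concrete, here is a sketch that assembles a report of this shape. The helpers `make_finding` and `build_report` are hypothetical, the string formats chosen for byteOffset, actualBytes, and unicodeCodepoint are assumptions (the schema only says "string"), and treating bidi hits as HOMOGLYPH_DETECTED is likewise a guess.

```python
import unicodedata

def make_finding(file, line, byte_offset, context, ch, expected_ascii, impact):
    """Build one homoglyphsFound entry for a single suspicious character."""
    return {
        "file": file,
        "line": line,
        "byteOffset": hex(byte_offset),                  # assumed format, e.g. "0x539"
        "context": context,
        "expectedAscii": expected_ascii,
        "actualBytes": ch.encode("utf-8").hex(" "),      # e.g. "d1 80"
        "unicodeCodepoint": f"U+{ord(ch):04X}",          # e.g. "U+0440"
        "scriptName": unicodedata.name(ch).split()[0].title(),  # e.g. "Cyrillic"
        "impact": impact,
    }

def build_report(files_scanned, findings, bidi_hits=()):
    """Assemble the top-level object; verdict is derived from the findings."""
    verdict = "HOMOGLYPH_DETECTED" if findings or bidi_hits else "CLEAN"
    return {
        "filesScanned": files_scanned,
        "homoglyphsFound": list(findings),
        "bidiControlChars": list(bidi_hits),
        "verdict": verdict,
    }
```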

Detection Method

# Step 1: Pipe git diff through hexdump
git diff <file> | hexdump -C

# Step 2: In added (+) lines, look for multi-byte sequences
# where the removed (-) line had single-byte ASCII
#
# Example: Latin 'p' vs Cyrillic 'р':
# Removed: 22 70 70 67 22      |  "ppg"  |   ← 70 = Latin 'p'
# Added:   22 d1 80 70 67 22   |  "..pg" |   ← d1 80 = Cyrillic 'р' (U+0440)
#
# The d1 80 bytes where 70 should be = HOMOGLYPH DETECTED
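
The manual hexdump inspection can be automated. The Python sketch below scans the raw bytes of a unified diff for non-ASCII runs on added lines; `scan_added_lines` is an illustrative name, and the `+++` test means an added line whose content itself begins with `++` would be missed.

```python
import re

# One or more consecutive bytes outside the ASCII range (0x80-0xFF).
NON_ASCII_RUN = re.compile(rb"[\x80-\xff]+")

def scan_added_lines(diff):
    """Return (diff_line_number, byte_column, hex_bytes) for non-ASCII runs on '+' lines."""
    hits = []
    for n, line in enumerate(diff.splitlines(), start=1):
        # Keep added lines only; skip the '+++ b/file' header.
        if line.startswith(b"+") and not line.startswith(b"+++"):
            for m in NON_ASCII_RUN.finditer(line):
                hits.append((n, m.start(), m.group().hex(" ")))
    return hits

# Sample diff bytes mirroring the "ppg" example above.
sample = (
    b"--- a/t.py\n"
    b"+++ b/t.py\n"
    b'-x = d.get("ppg", 0)\n'
    b'+x = d.get("\xd1\x80pg", 0)\n'
)
found = scan_added_lines(sample)
```

In practice the bytes could come from `subprocess.run(["git", "diff", "--", path], capture_output=True).stdout`.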

Usage Example

skill: {
  name: 'homoglyph-detector',
  context: {
    projectRoot: '/path/to/project',
    changedFiles: ['backend/app/prediction/temporal.py'],
    scanMode: 'uncommitted'
  }
}

Real-World Example

From adversarial drill #6:

Attack: Dictionary key "ppg" changed to "рpg" (Cyrillic р + Latin pg)
Camouflage: 4 lines of harmless round() wrappers added as decoy
Impact: All dict.get("ppg") lookups return default 0, disabling trend detection
Detection: hexdump -C revealed bytes d1 80 where 70 was expected
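
The impact is easy to reproduce in Python. In this sketch the dictionary contents and the value 27.4 are made up for illustration; only the key substitution comes from the drill.

```python
# Key written with Cyrillic U+0440 followed by Latin "pg", as in the drill.
stats = {"\u0440pg": 27.4}

# The lookup uses the all-Latin key, so it silently falls back to the default.
value = stats.get("ppg", 0)

# The two keys render alike but differ in both equality and encoded length.
same = "ppg" == "\u0440pg"
lengths = (len("ppg".encode("utf-8")), len("\u0440pg".encode("utf-8")))
```

Here `value` is the default 0 rather than 27.4, and the encoded lengths differ (3 bytes vs 4), which is the signal the byte-level scan relies on.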

Process Files

nation-state-trojan-detection.js — Phase 2: Homoglyph Detection (parallel with semantic analysis)

a5c-ai/homoglyph-detector

Byte-level Unicode homoglyph detection for identifying invisible character substitutions in code

514 stars · development · Updated Apr 2, 2026

Install this skill globally with one command: npx skillsauth add a5c-ai/babysitter homoglyph-detector. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 2, 2026, 2:04 PM · 55.6s · 2 files scanned

Install manually

# Clone the repo
git clone https://github.com/a5c-ai/babysitter.git

# Copy into Claude Code skills folder (global)
cp -r babysitter/library/specializations/security-compliance/skills/homoglyph-detector ~/.claude/skills/

Repository: a5c-ai/babysitter (514 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT